Email: cameron.blackwood@gmail.com
Phone and address on request... |
Subjects included: communicating processes, systems software, advanced operating systems, distributed systems, neural networks, OSI networking, public telecommunications and user interface design.
Initially I came on board to help plan and execute the migration of a number of physical and virtual servers to a SAN, but due to hardware delays that project was placed on hold.
While waiting for arrival of the SAN hardware, I looked at the monitoring system for METS, chiefly to add notifications to the system.
Nagios is somewhat tricky to configure by hand, so as part of this work I wrote an automatic system which used scripts that ran on each server to detect items to monitor and then created a text configuration file for that server. The text configuration files from all the servers were then used to build the final Nagios configuration objects and files.
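A minimal sketch of the generation step in Python, not the original scripts. The `DETECTED` dictionary stands in for the per-server text files (its layout is hypothetical), and the `generic-host` and `generic-service` template names are assumed to exist in the Nagios configuration:

```python
# Hypothetical per-server detection results; the real scripts and file
# format are not described here.
DETECTED = {
    "web01": {"address": "192.0.2.10", "services": ["check_ssh", "check_http"]},
    "nfs01": {"address": "192.0.2.20", "services": ["check_ssh"]},
}


def render_host(name, address):
    """Return a Nagios host object definition as text."""
    return (
        "define host {\n"
        "    use generic-host\n"  # assumed template name
        f"    host_name {name}\n"
        f"    address {address}\n"
        "}\n"
    )


def render_service(host, check_command):
    """Return a Nagios service object definition as text."""
    return (
        "define service {\n"
        "    use generic-service\n"  # assumed template name
        f"    host_name {host}\n"
        f"    service_description {check_command}\n"
        f"    check_command {check_command}\n"
        "}\n"
    )


def build_config(detected):
    """Combine every server's detected items into one configuration text."""
    parts = []
    for name in sorted(detected):
        parts.append(render_host(name, detected[name]["address"]))
        for command in detected[name]["services"]:
            parts.append(render_service(name, command))
    return "\n".join(parts)


print(build_config(DETECTED))
```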
One of the production NFS networks had a few issues, so I wrote a Nagios plug-in to monitor the time to write and verify a file on each file system we used (and graph historical results). We could then plot availability and response time of each filestore to check performance.
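A minimal sketch of such a check in Python, following the usual Nagios plugin conventions (exit codes 0, 1 and 2 for OK, WARNING and CRITICAL, with performance data after the `|`). The thresholds, probe file name and `FILESTORE` label are assumptions for illustration, not the original plug-in:

```python
import os
import time
import uuid

OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def time_write_verify(directory):
    """Write a small file, read it back, and return the elapsed seconds."""
    payload = uuid.uuid4().hex.encode()
    path = os.path.join(directory, f".nagios_probe_{os.getpid()}")
    start = time.monotonic()
    try:
        with open(path, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # push the data out to the file server
        # Note: the read may be served from the client cache on some mounts.
        with open(path, "rb") as handle:
            if handle.read() != payload:
                raise IOError("read-back data did not match")
    finally:
        if os.path.exists(path):
            os.remove(path)
    return time.monotonic() - start


def check(directory, warn=1.0, crit=5.0):
    """Return (exit_code, status_line) in Nagios plugin output format."""
    try:
        elapsed = time_write_verify(directory)
    except OSError as error:
        return CRITICAL, f"FILESTORE CRITICAL - {directory}: {error}"
    perfdata = f"time={elapsed:.3f}s;{warn};{crit}"
    if elapsed >= crit:
        return CRITICAL, f"FILESTORE CRITICAL - {elapsed:.3f}s | {perfdata}"
    if elapsed >= warn:
        return WARNING, f"FILESTORE WARNING - {elapsed:.3f}s | {perfdata}"
    return OK, f"FILESTORE OK - {elapsed:.3f}s | {perfdata}"


def main(argv):
    """Print the status line and return the exit code for sys.exit()."""
    code, line = check(argv[1] if len(argv) > 1 else ".")
    print(line)
    return code
```

Ending the file with `sys.exit(main(sys.argv))` would let Nagios run it as a plug-in, and the `time=` performance data is what a grapher such as pnp4nagios would plot.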
To verify that some of the web applications were working, I wrote a script to log in, then read, write and verify a timestamp through the application. This allowed us to verify the underlying filesystem was working correctly. I also wrote a script to locate the timestamp in the files on the backup site and to plot the time difference between the current and backup timestamps, allowing us to monitor and plot the age of the backups (and to be notified if they became older than a set limit).
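The backup age calculation can be sketched in a few lines of Python. The timestamps and the 24 hour limit below are placeholders chosen for illustration; the actual limit used is not recorded here:

```python
from datetime import datetime, timedelta


def backup_age(live_stamp, backup_stamp):
    """Return how far the backup copy lags behind the live timestamp."""
    return live_stamp - backup_stamp


def backup_status(age, max_age=timedelta(hours=24)):
    """Classify the lag; the 24 hour limit is an assumed placeholder."""
    return "STALE" if age > max_age else "OK"


# Illustrative values only: the live site timestamp and the one found
# in the files on the backup site.
live = datetime(2010, 3, 2, 9, 0)
backup = datetime(2010, 3, 1, 3, 0)
age = backup_age(live, backup)
print(backup_status(age), age)
```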
For historical graphing, I looked at linking Munin to Nagios, but finally settled on pnp4nagios as a way of providing historical RRD graphs for the services.
I was employed in the Technology Services Group as a Senior System Administrator for the 13 Linux servers. My everyday tasks included system troubleshooting, updating of system documentation, security auditing and software upgrades of the core servers as well as a number of major infrastructure projects.
The Faculty was preparing a hardware upgrade and I developed the plan to set up and migrate the existing services to the new hardware. The plan detailed the steps required to roll out the new hardware, set up a disk based backup system, configure the SAN storage, implement virtualisation and port the services to the new system.
As part of this plan, I also worked with the other staff updating the documentation for the existing setup and determining the requirements of the new equipment, in terms of computing resources and physical requirements (e.g. power, server room planning space).
A subproject of the hardware migration was to examine both Xen and VMware as suitable methods of virtualisation for the core services, to allow service migration and easier Disaster Recovery (DR). Another subproject was to explore different disk based backup systems (rsync, rdiff-backup, Bacula) to replace the existing tape backup system.
I implemented a centralised monitoring system for the Linux and Windows servers as well as other critical networked systems. The system comprised two parts: historical performance monitoring (with Munin) and system status/reporting (with Nagios).
The historical performance monitoring allowed for graphing and detection of trends over long periods, which was useful for long term capacity planning. Historical monitoring also aided in tracking down more everyday issues such as identifying abnormal filesystem usage. Due to problems with our Air Conditioning system, I wrote a Munin module to graph the internal server temperature to allow better tracking of environmental conditions. I also expanded the notification system to include an automatic script which attempted to 'Wake on LAN' the lecture theater systems (in case they were turned off after a lecture) before notifying the helpdesk, to ensure the machines were always available during university teaching hours. This script automatically resolved around 90% of the issues with the lecture theater systems, reducing helpdesk calls and allowing a quicker response to issues that needed intervention.
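For reference, the Wake on LAN step amounts to broadcasting a 'magic packet': six 0xFF bytes followed by the target MAC address repeated sixteen times. A minimal Python sketch, in which the function names, broadcast address and port 9 are illustrative choices rather than the original script:

```python
import socket


def magic_packet(mac):
    """Build a Wake on LAN magic packet for a MAC such as '00:11:22:33:44:55'."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError(f"expected a 6 byte MAC address, got {mac!r}")
    # Six 0xFF bytes followed by the MAC address repeated sixteen times.
    return b"\xff" * 6 + raw * 16


def wake(mac, broadcast="255.255.255.255", port=9):
    """Send the magic packet as a UDP broadcast."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))
```

A notification handler would call `wake()` for the affected host, wait, re-check the host and only then escalate to the helpdesk.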
I successfully applied for a $60,000 Faculty Research Infrastructure Scheme grant for funds to build a virtualised research cluster, so that research groups could quickly be allocated pre-imaged computing resources at a predefined cost based on resource use, such as storage and CPU load.
I was employed to help manage the Department's IT infrastructure while one of the Sysadmins was on leave. My general tasks included password resets for staff and students as well as installing and upgrading software on the Solaris systems.
A major project at the department was to document and analyse the existing Network Time Protocol (NTP) servers and determine how to upgrade the system.
NTP is a service which allows computers to determine the time to a high degree of accuracy, which is required for legal timestamping of system logs as well as for a variety of computer services. The Department provides a number of public stratum-1 time servers and provides time synchronisation services to a large number of organisations around Australia, including Optus, ANU, Monash University, Internode and many others. The existing setup included 3 public NTP servers, a custom distribution board (to drive local wall clocks and a nearby department) and 2 GPS systems. All of this was designed and built by people no longer working for the department and there was very little knowledge of how the system was set up.
I explored and documented the current system and worked on planning how to replace the existing BSD based servers with Linux servers. Accurate NTP timing via GPS requires kernel patches to allow more exact synchronisation with the GPS timing pulses, so I applied the patches and worked on configurations for the NTP service which allowed for the most accurate timing.
Another major task was to examine how local software was installed and to create a scripted solution to allow automatic building of software on the variety of Solaris operating system versions that had to be supported.
I explored various solutions, including BSD's ports system and Gentoo's prefix ebuilds, before creating a simple, shell-like scripting system. It allowed short custom build scripts to be written for each package, so that all local packages could be compiled across the variety of operating system versions. The system was geared towards GNU tools, but was customisable to handle other complex build systems.
My area of work is the Aditi network application programming interface (API). This API provides an interface to the Aditi engine that client applications can use to access Aditi functionality, and also provides a transparent network transport to allow a number of clients to access a centralised Aditi engine remotely. The Aditi engine is written in C++ and Mercury requires a C interface, so the API is mostly written in C but does interface to the C++ engine.
Major work for the project began with the design and documentation of the API in cooperation with the Aditi engine programmer and the Mercury design team. The previous interface was poorly documented and implemented and the development of a clean, agreed interface was an important step.
Following the documentation of the API, I designed an incremental, multi-layered transport method to carry data from Mercury to the Aditi engine. The multi-layered design allowed for extensive testing and verification of the building blocks of the final network transport.
One of the key design features of the API I designed and built is the automatic generation of the library code directly from the API document. The design document is used to generate a large percentage of the Aditi API library directly, so the library always reflects the documented API, and a change to the API is made by running the build process on the new API document.
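The general technique can be sketched in Python as follows. This is an illustration only: the `API_SPEC` structure and the `example_*` function names are hypothetical, and the real Aditi API document format and generator are not shown here:

```python
# Hypothetical machine-readable API description standing in for the
# API document; the real format is not shown here.
API_SPEC = [
    {"name": "example_connect", "returns": "int",
     "args": [("const char *", "host"), ("int", "port")],
     "doc": "Open a connection to the engine."},
    {"name": "example_close", "returns": "void",
     "args": [("int", "handle")],
     "doc": "Close a connection."},
]


def render_prototype(entry):
    """Render one spec entry as a documented C prototype."""
    args = ", ".join(
        f"{ctype}{'' if ctype.endswith('*') else ' '}{name}"
        for ctype, name in entry["args"]
    ) or "void"
    return f"/* {entry['doc']} */\n{entry['returns']} {entry['name']}({args});\n"


def render_header(spec, guard="EXAMPLE_API_H"):
    """Render the whole spec as a C header with an include guard."""
    body = "\n".join(render_prototype(entry) for entry in spec)
    return f"#ifndef {guard}\n#define {guard}\n\n{body}\n#endif /* {guard} */\n"


print(render_header(API_SPEC))
```

Because the header is regenerated on every build, the documented interface and the compiled interface cannot drift apart.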
Because of the complexity of Aditi, a lightweight, client-only library was developed to allow client code to be built on systems not yet supporting a full Aditi engine compile.
The API also builds an Aditi engine dynamic library that allows client or server programs to be built without linking the large Aditi engine. To allow simple scripting and easier creation of clients, an Aditi Python module is also created. Both of these are created directly from the documentation.
While working as a research programmer, I also administer 6 Unix machines for our project, other research students and staff, running a variety of operating system and hardware combinations, including Gentoo Linux (ppc, ppc64, sparc, x86), Mandriva Linux (x86) and Mac OS X (ppc). These machines are used for project development and postgraduate research, as well as serving as research student and staff desktop machines.
I was the lone IT officer providing computer support for the department and the Australian Crustal Research Center. Earth Sciences relies heavily on computer modeling, measurement and research and had an advanced computer infrastructure to support this work. I supported the 150 staff and postgraduate students and administered the department's servers, workstations and internal high speed network.
Department infrastructure included:
To improve information available to the department, I created an
extensive documentation system for common issues in the department and
for most common tasks.
I installed a number of online applications to provide information to
the department including a service monitor (Nagios), issue tracking
for IT problems (wreq) and modem usage graphs. The issue tracker
especially helped users understand what was going on inside the
department and with their own issues/problems.
[local copy: http://dev.kmem.info/cameron.blackwood/work/esc/www.earth.monash.edu.au/computer/index.html]
I centralised the data for the department, creating a database of user and host information. I then used that information to generate Unix and desktop passwords, the phone book, mail aliases and university alias maps, and to verify the university LDAP database. The host information was used to automatically create DNS, DHCP and university addhost entries.
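A minimal Python sketch of generating both outputs from one set of host records. The host names and addresses are illustrative, and BIND style zone records and ISC dhcpd host declarations are assumed as the output formats; the actual tools used are not named here:

```python
# Hypothetical host records; names and addresses are illustrative only.
HOSTS = [
    {"name": "quartz", "ip": "192.0.2.11", "mac": "00:11:22:33:44:55"},
    {"name": "basalt", "ip": "192.0.2.12", "mac": "00:11:22:33:44:66"},
]


def dns_records(hosts):
    """Render BIND style A records, one per host."""
    return "".join(f"{h['name']}\tIN\tA\t{h['ip']}\n" for h in hosts)


def dhcp_entries(hosts):
    """Render ISC dhcpd host declarations with fixed addresses."""
    return "".join(
        f"host {h['name']} {{\n"
        f"    hardware ethernet {h['mac']};\n"
        f"    fixed-address {h['ip']};\n"
        "}\n"
        for h in hosts
    )


print(dns_records(HOSTS))
print(dhcp_entries(HOSTS))
```

Keeping a single source of host data means a new machine only has to be entered once for every generated file to stay consistent.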
To increase security, I migrated desktop hosts to a restricted subnet to protect them from external scanning and attack. With proxied services, most users noticed no change in service. Using DHCP leases on the restricted subnet allowed staff with new equipment to use department services quickly. Server scripts which examined system logs allowed me to quickly track down owners and configure newly added systems, usually simply via a 'host configuration page' which updated the host database.
I developed a system of installing local department Unix software by creation of a symlink tree. This allowed for installing some packages onto local disk and accessing the rest via the network on a host by host basis. I also set up a lab of dual boot Mac/Linux hosts with a custom rapid clone/reinstall system (similar to Ghost).
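The symlink tree idea can be sketched in Python as below. This is an assumption-laden illustration, not the original system: `link_package` and its arguments are hypothetical names, and the package directory may sit on local disk or on a network mount, which is what allows the choice to be made host by host:

```python
import os


def link_package(package_dir, target_root):
    """Symlink every file under package_dir into the same path under target_root."""
    linked = []
    for current, _dirs, files in os.walk(package_dir):
        relative = os.path.relpath(current, package_dir)
        destination = os.path.normpath(os.path.join(target_root, relative))
        os.makedirs(destination, exist_ok=True)
        for name in files:
            link = os.path.join(destination, name)
            if os.path.islink(link):
                os.remove(link)  # replace a link left by an earlier version
            os.symlink(os.path.join(os.path.abspath(current), name), link)
            linked.append(link)
    return linked
```

Calling `link_package` once per package, with a local or network path as appropriate for that host, builds up a single tree that users can put on their `PATH`.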
The largest task was the redesign of the server and network structure for the department. This process involved examining the services the department required, rating their importance and ability to be replicated and then designing a structure to provide the services. This provided the opportunity to unify the services (such as printing and network disk) provided to the Windows, Unix and Mac clients into one common 'view' to allow users to migrate hosts easily and share data.
The final design had three servers providing a robust set of services which were either replicated or could be quickly switched from one server to another in the event of failure. The primary server had a RAID file system, mirrored to the secondary servers and backed up to a SCSI tape jukebox.
I consulted extensively with the central Information Technology Services for integration with the university network and IT standards.
I was an emergency warden for the building and underwent warden training.
I worked in two parts of the organisation, first the System Administration group which installed and maintained 25 mission critical HPUX machines which provided the core services to the BoM. Later, I moved to the networking group which was responsible for maintaining network services to BoM sites all over Australia (and beyond). Because both groups were pre-existing and working reasonably well, my work at the BoM was mostly solo tasks designed to implement something new or ease the existing load.
The System Administration group maintained:
My first task at the system administration group was to rewrite the ad hoc backup script system to use a unified script with configuration files. The new script handled fitting the data to the tape size, notifying the operators, creation of table of contents files and emailing status information to interested parties.
The next task was to develop a system to allow for the quick installation of local software on new hosts. I adapted the Linux Red Hat Package Manager (RPM) to HPUX and set up source RPM scripts to build the required binary packages from source. This allowed for very quick installation of new software as well as much easier update of existing servers when new software versions were released.
All through my time in the system admin group, I was heavily involved in security issues, as the BoM did not have a very well developed security culture. My security work involved patching, checking the vendor patches against available vulnerabilities, a lot of user education as well as helping the new commercial web service division attempt to develop secure services.
While with the System Administration group, I attended training on HPUX system administration. I also attended a course on StorageTek tape silo operation, programming and administration (the BoM supercomputers had access to a huge room sized tape storage machine to archive historical and supercomputer generated data).
When I moved to the network section I was tasked with measuring and analysing the BoM's external network usage and with reducing the bandwidth used. Following the network analysis, I designed and developed a failover web proxy system for the BoM head office. I also investigated a 'dynamic get' system for Usenet news with another IT group.
I discovered a design flaw in the network layout that was causing network collapse under the heavy loads initiated by the Cray computer. Locating the problem was made more difficult because the network monitoring system would crash and manual attempts to track the problem would usually 'resolve' the problem. After much investigation and examining the network design, I noticed an asymmetric routing issue which caused broadcast floods.
I also helped the rest of the network team with the physical installation of a redundant Cisco switched backbone in the head office building.
I worked for Fulcrum and was contracted to the PoMA to provide system administration services for the core servers, which included 5 mission critical, FDDI connected servers as well as 15 support servers, 400 users and 120 GB of disk, all interfacing via ISDN lines to a wide set of sites around Victoria's coast.
I was also involved in the outsourcing of the IT area and the relocation of the PoMA head office, the re-networking of all the servers and the addition of a firewall and improved security.
Initially I was contracted to develop a model for web based, online manuals and paperless form submission for administration for the Faculty. This involved taking the existing paper forms and translating them into online forms with inbuilt help to automate the administrative load for the Faculty.
This initial project evolved into the Faculty Web Development group which eventually provided web services to the whole Faculty. I developed the initial templating tools that became an essential part of the day to day operation for the WebDev Group.
Subjects that I tutored or demonstrated in included:
I have attended week long training courses on:
At the Open Source Developers Conference (OSDC) 2004, I presented a lightning talk on "Automatic generation of code". I presented an expanded 30 minute presentation on the same topic at OSDC 2005.
I ran day-long "Introduction to Python Programming" courses for both the Department of Earth Sciences and the Faculty of Business and Economics.
I have been involved in the following published papers, conference presentations and technical reports: