Hello Day of DH!

Just got into work, on a bright, sunny day with a nip of frost in the air. My colleague tells me there are brand-new baby rabbits in front of the Clearihue building — the first of the year, one cinnamon and one black. Now to deal with the email.

TimeCapsule died. Grrr.

First thing this morning, doing some work before breakfast, I looked across the desk to see that the little green light on my Apple TimeCapsule network backup drive had gone out. It seems to be dead. It’s less than two years old, but there have been a lot of reports of TimeCapsules dying at about one and a half years old. I have other backups of key data, but I always feel a bit exposed without the NAS drive on the network. I’ll have to buy another one quickly. I won’t buy any more Apple hardware; we seem to have seen laptop batteries dying without warning, hard drives failing, and all sorts of other problems in the last couple of years with our Apple stuff. I have a couple of Linux-based NAS devices in mind. In the meantime, I’ll be copying lots of data across all my machines to get some peace of mind.

Writing a generic PHP and MySQL db system

For a while now, I’ve been working on something called the “AdaptiveDB”, which is an attempt to create a PHP codebase that will enable us to get a relatively simple database system up and running very quickly when faculty need one. We’ve had an increasing number of researchers who need web-based databases for their projects (usually five or six tables, sometimes more), and we’ve never found an existing system that is flexible enough for our needs and lets us build a db system quickly. With the AdaptiveDB code, I can create a MySQL db in phpMyAdmin, then build a GUI for it in an hour or so, including searches, filters, editing and a few other specialist features such as custom fields.

It’s one of those codebases that starts on one project, then gets pushed forward on another one, then on another, and then at some point you need to merge all the changes back into the older projects that are already using it. PHP isn’t my favourite development platform but I’m getting more comfortable with it. This morning I’m adding new features to the search page, so that users can decide which fields are shown and which are hidden in a table of search results.

A spreadsheet is not a database (and vice versa)

Had a meeting this morning with a researcher who is collecting some fairly complex data on property records, and has been using a spreadsheet so far. He’s one of several who’ve approached us with the same sort of problem in the last couple of years: initially, the spreadsheet looks like a simple and convenient way to collect and collate data, and it’s especially handy when your eventual intent is to run statistical analyses in SPSS, but as the project proceeds, the spreadsheet gets more and more complicated and unwieldy. The most irritating problem is the one-to-many relationship: if your record has (say) some people associated with it, and you can’t predict exactly how many people, you need to create columns for each potential person, resulting in a whole lot of columns, most of which are empty in most records.

Of course, relational databases are designed for exactly this kind of thing, and we’ve moved a few projects from spreadsheets to web-based databases over the years. This gives the added advantage of accessibility from anywhere, and more robust multi-user support where there’s a team working on the data. And you can always spit out a spreadsheet from a database.
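To make the one-to-many point concrete, here is a minimal sketch in Python with SQLite (not the PHP and MySQL we actually use, and not the researcher’s real schema). The table names, column names and sample rows are all made up for illustration. Each person gets a row of their own that points back at a record, so no record needs a bank of mostly empty “person” columns, and a flat spreadsheet can still be produced at the end.

```python
import csv
import io
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical record/person tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE record (
            id INTEGER PRIMARY KEY,
            description TEXT NOT NULL
        );
        -- One row per person; record_id links each person to one record.
        CREATE TABLE person (
            id INTEGER PRIMARY KEY,
            record_id INTEGER NOT NULL REFERENCES record(id),
            name TEXT NOT NULL
        );
        """
    )
    # Invented sample data: three records with 3, 0 and 1 people.
    conn.executemany(
        "INSERT INTO record (id, description) VALUES (?, ?)",
        [(1, "Lot 12"), (2, "Lot 13"), (3, "Lot 14")],
    )
    conn.executemany(
        "INSERT INTO person (record_id, name) VALUES (?, ?)",
        [(1, "Ann Lee"), (1, "Bob Ray"), (1, "Cy Dunn"), (3, "Di Moss")],
    )
    return conn


def people_per_record(conn):
    """Return (description, number of people) for every record."""
    return conn.execute(
        """
        SELECT r.description, COUNT(p.id)
        FROM record r
        LEFT JOIN person p ON p.record_id = r.id
        GROUP BY r.id
        ORDER BY r.id
        """
    ).fetchall()


def to_csv(conn):
    """Flatten the data back into spreadsheet form, one line per record."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["description", "people"])
    records = conn.execute(
        "SELECT id, description FROM record ORDER BY id"
    ).fetchall()
    for rec_id, description in records:
        names = [
            name
            for (name,) in conn.execute(
                "SELECT name FROM person WHERE record_id = ? ORDER BY id",
                (rec_id,),
            )
        ]
        writer.writerow([description, "; ".join(names)])
    return out.getvalue()


if __name__ == "__main__":
    db = build_demo_db()
    print(people_per_record(db))
    print(to_csv(db))
```

The LEFT JOIN is there so that a record with nobody attached still shows up with a count of zero, which is the case the spreadsheet handles with a row of empty cells.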

However, data entry in a web-based database tends to be a more stepped or granular affair than working in a spreadsheet. In a spreadsheet, you can jump from row 95 to row 209 any time you like, to grab some data and copy it, then go back to 95 to paste; the whole array of data is exposed simultaneously. That’s harder to do in a web application based on a db. In particular, although the back-end can handle one-to-many relationships very elegantly, providing a data-entry interface that makes it easy to handle them in a single form can be tricky. The researcher was asking me if we could have all the benefits of the database structure, but with a data entry interface that is as simple as a spreadsheet, while somehow working around the difficulties of actually scrolling around in an enormous spreadsheet.

My tentative answer is “maybe”, but I haven’t seen many examples of such elegance. I can’t (yet) imagine how to do this with his data set.

Where would we be without XSLT?

One of the lovely things about working with XML is that it comes along with XSLT. I’ve just been in a meeting with two of my colleagues who are working on an aboriginal language dictionary. The data is already in XML, converted from a DOS Lexware project a few years ago; now they’re working through the data and re-editing it. Now that they’ve been at it for a couple of weeks, though, it’s becoming clear that some of the editing they’re doing is actually predictable and mechanical. As soon as something like this emerges, I can be tasked with writing XSLT to process all the waiting data and make those changes automatically. A couple of hours of my time doing that can save days for the project as a whole.

It sucks being the only one in the office…

My two colleagues Greg Newton and Stewart Arneil are on vacation this week. That means every phone call and drop-in help request that comes in lands on me, which means my own project work slows to a crawl. One of the biggest issues we face in trying to be a friendly, accessible unit for our users (faculty and staff) is the fact that interruptions have a very bad impact on programming productivity. It takes a while to get your head into the right zone to work on a particular project, and once you’re there, and getting productive, it doesn’t help to have to stop for ad-hoc meetings every fifteen minutes. That’s why I get my best work done before breakfast, and after everyone else has gone home. And that’s why my days are so long. Grumble grumble. Normally, with three of us in the office, you can put the headphones on and give yourself a couple of hours of uninterrupted concentration at least once a day, but this week has been really frustrating. They’re back on Monday. Then I can work.

End-of-the-day tasks

It’s nearly five, and I’m supposed to be going home, but I have to do my usual end-of-day tasks, which take ages these days. First, I have to bring down all the data created and changed during the day by research assistants on three different projects; I validate all the XML to make sure it’s OK; then I push various bits of it up into various eXist databases, so they can see their work tomorrow morning. Then I do another set of backups from my machine to our backup server, covering all of that work (LOCKSS) and the PHP and other stuff I’ve been working on today.
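For anyone curious what the cheapest form of that XML check might look like, here is a rough sketch in Python. It is not my actual end-of-day script, and the function names are invented for this example. It also only tests well-formedness (does the file parse at all?) using the standard library; checking against a schema is a stricter test and would need a third-party library.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def well_formed_error(xml_data):
    """Return None if xml_data (str or bytes) parses, else the parser's message."""
    try:
        ET.fromstring(xml_data)
    except ET.ParseError as err:
        return str(err)
    return None


def find_malformed(folder):
    """Map file name to error message for each .xml file in folder that fails to parse."""
    problems = {}
    for path in sorted(Path(folder).glob("*.xml")):
        # Read bytes so the parser can honour any encoding declaration.
        message = well_formed_error(path.read_bytes())
        if message is not None:
            problems[path.name] = message
    return problems


if __name__ == "__main__":
    # Assumption: the day's files have been copied into the current folder.
    for name, message in find_malformed(".").items():
        print(f"{name}: {message}")
```

An empty result means every file parsed, which is the point at which it would be safe to push things up to the databases.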

By the time that’s done, I’ve done a bunch of unwanted overtime again. Thank god for rsync, though. And I only have this work to do because I’m blessed with lots of productive workers turning out lots of lovely data for my projects, every day.

Backups and more backups

Since my accursed TimeCapsule gave up the ghost this morning, I’m feeling all inadequately backed up, so I’m now rushing around at home making extra copies of stuff on various external drives and spare hard drives. I always put at least two drives in any desktop computer I buy — sometimes three — and I also have one of those drop-in docks for SATA drives that connects over USB, so there’s no shortage of space, but the annoying thing is not being able to use my scripts to do the backups because all the locations and mount points have changed. I have to actually remember the rsync flags, and figure out again how robocopy works on the Windows machines. I shall never forgive Mr Jobs for this.

This is the second NAS drive I’ve had that died. The first was a Maxtor, but that had two HDs in it, and I was able to pull it to bits and get one working drive out of it. The TimeCapsule doesn’t look so easy to dismantle. Maybe I’ll be able to get the drive out, though, and discover that there’s some way to mount it and get the data off it.

I never really trusted the damn thing. It always ran too hot to touch. It was too busy looking cool to do its job properly.