Hello world!

Welcome to Day of DH 2010. This is the first time I am taking part in the Day of DH. I'm looking forward to it, and I'm curious about what the others are doing.

Start of the Day of the DH (I)

Although Day of DH 2010 has not officially started yet (it starts tonight), I had to comment on the Facebook status of a friend of mine who is struggling with the pitfalls of copyright. Working digitally today means you had better be educated in law, especially copyright law. Sigh!

Start of the Day of the DH (II)

Starting the day in the office means waiting a few minutes for the computer to boot. I have had this machine for quite some time, and working "digitally" means testing many pieces of software, which clutters the Windows registry. The upside: after switching on my computer in the morning, I have a nice opportunity to start reading a book, have a coffee or sort some papers.

Which data is good data?

Over the last few days, an intern has started converting one of the remaining card indices we have: a card index of the initia (incipits) of medieval manuscripts. Many of these initia were taken from older publications, and I am wondering whether we should digitise those publications in full and what use this would be to the community. It would definitely be good to have this "historical source" on our manuscripts online, but will we have the time to turn the images into (good!) electronic text? I expect the multiple languages and the italics to cause the OCR some trouble. So the main question is: what data do we want to provide, and what do we need? Is it better to make a start (a concordance of image numbers and manuscript shelfmarks), whatever the quality of the data (a rough, Google-like OCR full of errors), or should we put our energy into something else from the start?

Materials in digital editions

Right before lunch I discussed with a colleague a particular problem in a digital edition project that I supervise (http://diglib.hab.de/edoc/ed000006/start.htm). As part of the edition we provide the images and some structural information (headings, chapters, images etc.) that is meant to be derived from the transcription of the text. However, there are some (large) parts where we don't have the resources to transcribe the entire text, so instead we provide images of existing transcriptions that other researchers have prepared. Our problem was this: suppose we generate a file that generally contains the transcription of the manuscript, but in these parts we replace the transcription with references to the images of the existing transcription and quote some passages from it as "citations". We then end up with bits of text that belong to the transcription of the transcription, not to the transcription of the manuscript, and so should not appear in that file.

All clear? Yes, that’s what digital editing can look like.
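To make the problem a little more concrete, here is a minimal sketch in Python. It assumes a hypothetical model (not how our edition is actually encoded): every segment of the file is labelled with the source it was transcribed from, and the names `Segment`, `source`, `image_ref` and `manuscript_view` are made up for illustration. The manuscript file is then built only from manuscript segments, while text quoted from the existing transcription is reduced to a reference to its image.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One stretch of the edition file (hypothetical model)."""
    source: str          # assumed labels: "manuscript" or "transcription"
    text: str = ""       # transcribed text, or text quoted from the transcription
    image_ref: str = ""  # hypothetical ID of an image of the existing transcription


def manuscript_view(segments):
    """Keep only what belongs to the transcription of the manuscript.

    Text quoted from the existing transcription is dropped; for such
    segments only a pointer to the image of that transcription survives.
    """
    result = []
    for seg in segments:
        if seg.source == "manuscript":
            result.append(seg.text)
        elif seg.image_ref:
            result.append(f"[see image {seg.image_ref}]")
    return result


segments = [
    Segment("manuscript", "Incipit liber primus"),
    Segment("transcription", "quoted passage", "img_0042"),
    Segment("manuscript", "Explicit"),
]
print(manuscript_view(segments))
```

Labelling each segment with its source keeps the two layers apart, so the quoted text can still be kept elsewhere without leaking into the transcription of the manuscript.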