The Unix Way

Last year, I read a book called Linux and the UNIX Philosophy by Mike Gancarz, and it changed my life.

Not completely, of course. I’ve been using UNIX more or less exclusively since the mid-nineties or so (the first program I ever wrote wasn’t for a Mac or a PC, but for an IBM RS/6000 running AIX). I moved from there to Solaris, then Linux, and then (after many years) OS X. Even today, with my fancy MacBook Pro, I spend 90% of my time on the command line dialed into a Linux server.

Spending that much time developing for a single OS influences your thinking as a programmer in ways that you might not be able to identify. Reading Gancarz helped me identify the biases, dispositions, and beliefs that I’d silently absorbed over the last ten years. He presents them as the Nine Precepts:

  1. Small is beautiful.
  2. Make each program do one thing well.
  3. Build a prototype as soon as possible.
  4. Choose portability over efficiency.
  5. Store data in flat text files.
  6. Use software leverage to your advantage.
  7. Use shell scripts to increase leverage and portability.
  8. Avoid captive user interfaces.
  9. Make every program a filter.

It’s amazing how far this list went toward explaining my relationships with other developers (from other platforms). “We’re building a system using the Spring framework, Fedora, Hibernate, and Cocoon. We have the whole thing hooked up to our continuous integration server, and the unit testing framework is automatically called by Maven from Eclipse.”

Sorry, what?

Of course, I know what all of these things are and what they do. I’ve used most of them on one project or another, and they are obviously good at solving the problems they were designed to solve. But the whole thing taken together strikes me—intuitively—as a steaming mass of unnecessary complexity. On the projects I’ve run, I’ve made people demonstrate to me exactly why we need these things right now for where we are in the project right now. Half the time, people look at me like I’m insane. For them, all of these things are just “best practices.”

To me, they’re violations of several UNIX precepts. Even the individual components seem like all-singing, all-dancing examples of way-too-big. They interoperate with each other and with the rest of the world in tightly coupled ways. It’s exceedingly hard (if not impossible) to back out of them if you change your mind. They are “portable” in the same sense that a cargo container is portable (you need something that can do the heavy lifting). When confronted with this sort of thing, I often wonder, “Why can’t it be dumb, flat, fleet, and small?” “Simple as we can make it (but no simpler).” That kind of thing.

When the so-called “agile methods” came along (XP, Scrum, etc.), some of it sounded like the common liturgy of UNIX: “What’s the dumbest thing that can possibly work?” “Write tracer bullets.” “Do as little up-front specification as possible.” “Iterate often.” But before long, all of these things turned into Frameworks and Methodologies. I listened for a long while, and absorbed several of the lessons, but in the end, the whole thing seemed (to paraphrase another curmudgeon) like a carefully planned, formally-specified, bug-ridden, slow implementation of half of the UNIX philosophy.

I realize that my thinking (and Gancarz’s) is open to the charge that this is more ideology than philosophy. Fair enough. But the more pressing issue is this: I still have to write software with people who think Eclipse is something other than a grotesque abuse of precepts 1, 2, 6, 7, 8, and 9, and I still have to use software designed by people who think the Spring framework is a really exciting thing that we should “leverage” on our next project.

I don’t know how many people are converted by Gancarz’s precepts (not many, apparently). But for someone like me (who already believes) the book provides a powerful set of carefully considered counterclaims to the prevailing orthodoxy. I knew it! I’d just never thought of it as a thing I knew.

Gancarz’s eighth precept — “Avoid captive user interfaces” — was probably the most enlightening of all. “Yes!” I thought. “Captive interfaces suck. Down with captive user interfaces. Filters! We need more filters.” Then he proceeded to demonstrate how programs like Mutt violate this principle.

Mutt? Now wait just a doggone (sorry) minute. Mutt is a hard core UNIX program for hard core UNIX nerds. It runs on the command line. It does one thing only (it’s not even an MTA!). It can be configured a zillion different ways using flat text files. How can that not be consistent with the UNIX philosophy?

It holds you captive. It curses you.

Okay, smart guy. I suppose you have a command-line mail program that is as powerful and UNIX-y as Mutt, but which acts as a filter? Actually, he did. It’s called nmh.

The “n” in nmh stands for “new,” but there’s really nothing new about the program at all. In fact, it was originally developed at the RAND Corporation decades ago.

We’re talking old school. Type “inc” and it sends a numbered list of email subject lines to standard out. Type “show” and it will display the first message (in your chosen pager). You could then refile the message (with “refile”) to another mailbox, or archive it, or forward it, and so on. There are thirty-nine separate commands in the nmh toolset, with names like “scan,” “show,” “mark,” “sort,” and “repl.” On a day-to-day basis, you use maybe three or four.

I’ve been using it for months. It is — hands down — the best email program I have ever used.

Why? Because the dead simple things you need to do with mail are dead simple. Because there is no mail client in the world that is as fast. Because it never takes over your life (every time you do something, you’re immediately back at the command prompt ready to do something else). Because everything — from the mailboxes to the mail itself — is just an ordinary plain text file ready to be munged. But most of all, because you can combine the nmh commands with ordinary UNIX commands to create things that would be difficult if not impossible to do with the GUI clients.

I now have a dozen little scripts that do nifty things with mail. I have scripts that archive old mail based on highly idiosyncratic aspects of my email usage. I have scripts that perform dynamic search queries based on analysis of past subject lines. I have scripts that mail todo list items and logs based on cron strings. I have scripts that save attachments to various places based on what’s in my build files. None of these things are “features” of nmh. They’re just little scripts that I hacked together with grep, sed, awk, and the shell. And every time I write one, I feel like a genius. The whole system just delights me. I want everything in my life to work like this program.
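To give the flavor (this is an illustrative sketch, not one of the scripts above): assuming a listing with one message per line and tab-separated number, date, sender, and subject columns, a per-sender tally is one awk expression and a sort. The listing function and its sample rows are made up so the example runs without nmh installed; real scan output is laid out differently.

```shell
# Stand-in for a mail listing (hypothetical data and layout):
# number, date, sender, subject, separated by tabs.
listing() {
  printf '%s\t%s\t%s\t%s\n' \
    1 03/14 alice 'Grant budget draft' \
    2 03/15 bob 'Re: Grant budget draft' \
    3 03/15 alice 'Merit review file' \
    4 03/16 carol 'Lunch?'
}

# Count messages per sender, busiest first, ties broken alphabetically.
per_sender() {
  listing |
    awk -F '\t' '{ n[$3]++ } END { for (s in n) print n[s], s }' |
    sort -k1,1nr -k2,2
}

per_sender
```

With the sample rows above, alice comes out on top with two messages. Nothing in the pipeline knows it is dealing with mail; it only sees lines of text.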

What am I going to do on the Day of DH? I’m going to use my Patented Development Methodology (I’m calling it SA for “screwing around”) to make one small aspect of my computational universe work like nmh.

I’m going to spend the whole day writing a program called “nth.”

The nth hour (12:00:22 CDT)

nth, of course, is my “new tweet handler.” It tries to emulate the way nmh works in most respects, though it’s sort of a clean room implementation, in that I’m not poking around in the nmh code for inspiration. I’m really just observing the way nmh works and trying to imitate it. My idea is not just to create a command-line twitter client (there are a few of those out there), but one that embraces the UNIX philosophy.

I know I’ll have hit that mark when it becomes trivially easy to do a negative filter on a hashtag. In other words, you should be able to tell my system not to put tweets bearing a particular hashtag (for a conference you don’t care about, say) into your main timeline. But here’s the thing: I don’t want to build that functionality into the system explicitly. I want it to flow naturally as a consequence of the way the tool integrates with other UNIX tools.
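A rough sketch of what that could look like, assuming nth ends up printing one tweet per line the way inc prints one message per line. The timeline function, its sample tweets, and the line format are invented stand-ins for illustration, not nth’s actual output; the only real tool here is grep.

```shell
# Stand-in for the tweet listing (hypothetical format and data):
# number, screen name, then the tweet text, one tweet per line.
timeline() {
  cat <<'EOF'
1 alice Off to the archives today
2 bob Great keynote #bigconf
3 carol Reading Gancarz again #unix
4 dave Lunch queue is enormous #BigConf
EOF
}

# Negative filter: -v drops matching lines, -i ignores case.
filtered_timeline() {
  timeline | grep -vi '#bigconf'
}

filtered_timeline
```

Only the tweets from alice and carol survive. The client never has to know what a “muted hashtag” is, because the filtering happens one step down the pipe.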

I started working on it a few days ago, because I didn’t want to spend the entire Day of DH working through silly problems. Turns out that was a good call, because OAuth authentication is a total pain. Development started moving along briskly last night. In fact, you can watch the outbound messages by following srtestsream. (But don’t. It’s boring. I have the stream embedded in this page if you’d like to see the goofy things I put in test messages.)

Did I mention I’m writing it in Clojure? I am. Actually, this exercise is mostly about learning Clojure (even though I do desperately want a twitter client that doesn’t irritate me).

More tomorrow. Happy Day of DH!

Clojure

So, I’m planning on spending the whole day writing code. I already feel guilty. I have a grant deadline in a few days, a merit review file that’s probably a month late, and various other commitments. But honestly, I’ve spent my entire spring break working on stuff that would bore the paint off a wall. I would really like to do something fun.

I was skeptical of this whole Day of DH thing when Geoffrey Rockwell proposed it last year. But Geoffrey is adorable, as we all know, and so how could I refuse?

I’m still not convinced that “autoethnography” is a useful thing (or a thing at all), but last year’s Day of DH was — to my astonishment — a total blast. I feel like what we’re really doing here is having a virtual flash mob that celebrates our collective endeavor. I’ll admit it: I’ve been looking forward to it for weeks.

I mentioned in the last post that I’m writing code in Clojure. Let me say a few words about that.

Lisp is not only the best language I’ve ever used, but the best language I ever hope to use. But I don’t use it. And I don’t use it, because even though it is the greatest language ever, it always manages to break my heart in some way. Usually, the moment of betrayal happens with some library. For example, I work with XML a lot, and so I really need a good XML library. All of the major Lisp implementations (whether Common Lisp or Scheme) have XML libraries. At first, it seems like the best way you could possibly do it. Most of them translate XML into s-expressions. It’s like the music of the spheres.

But then you quickly discover that these libraries are (a) poorly documented or (b) impossibly slow or (c) not really complete. It’s that last one that’s the killer, and it happens no matter which type of Lisp I run. Even Chicken — a remarkably practical implementation with a lot of good UNIX integration — broke my heart in the end.

But I’m back. This time with Clojure (a dialect of Lisp that runs on the JVM). Honestly, this might be the one.

I think that not because it does concurrency really well, or because it has multimethods, or because it has the most beautiful data abstraction I’ve ever seen (sequences), but because of how things went when I tried to get things going with a library.

nth is going to be a twitter client, and so naturally, I went to look for a twitter library. Big fail. I spent two days trying to get someone’s client library working, and I just could not do it. The library seems like a work in progress. There’s very little documentation, very few examples out in the wild. In other words, it feels like your average Lisp library.

But the thing is, Clojure can use any Java library. Any Java library. Any of them.

(I’ll pause here so you can contemplate the idea of a Lisp that has a complete, working library for anything you could possibly want to do with a computer).

So, I downloaded twitter4j, and started writing. It works! It works like a charm. But more than that, it works in a way that doesn’t make you feel like you’re calling Java from Clojure. It all just feels like Lisp. Here, for example, is a little function I wrote a few days ago for serializing authentication objects in nth:


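(import '(java.io FileOutputStream ObjectOutputStream))

;; Write any Serializable object to filename; with-open closes the stream.
(defn serialize [object filename]
  (let [os (new FileOutputStream filename)]
    (with-open [oo (new ObjectOutputStream os)]
      (.writeObject oo object))))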

This is mostly Java, really, but it’s not at all like calling Java from, say, JRuby. There’s no big signpost saying, “Now we’re calling out to a Java library.” A “new” here, a dot there. That’s it.

There are lots of reasons to be wild about Clojure (particularly if you are, like me, a smug Lisp weenie). But as we all know, languages don’t succeed or fail based on whether they have immutable objects, software transactional memory, first-class functions, or any other language buzzword. They succeed when hard-hatted hackers can parse XML, talk to databases, munge files, and whatever else they need to do without having to write a theorem prover.

Clojure — finally — gets it.

Code (10:40:33 CDT)

I’ve popped some code into a paste bin.

Specifically, auth.clj and tweet.clj.

Morning Papers

Here are the sites I visit just about every day:

The Lisp Moment

Just had one of those Lisp moments where you say: “I know! I’ll create a struct in which all the values are actually lexical closures with function names that are dynamically created based on examination of the symbols used in the parameterized object.”

Time to go do something else.

Winding Down (but still going)

Herding Cats

Seriously, that’s like the worst wine you will ever have. It says it has notes of “green apple and fig.” I think it’s more like “compost and silage.”

The Day of DH is winding down, and I have to say, I feel very fortunate to have so many friends in this field. And the people I don’t know just seem so completely fascinating. Tweetup at my house, y’all.

And just remember . . .

Algorithms are Thoughts

twinc!

Well, I wrote some more. Code is here.

It’s called “twinc” (by analogy with the nmh command “inc”), and it produces output that is almost identical:

What do I think of Clojure after a solid day of hacking in it?

I think it’s the most fun you can have with Java, certainly, and it has a lot of the same feel as Common Lisp. But really, I think I like it better than CL. It’s a little hard to judge, because I’m mostly calling out to a Java library (and I’m not sure I’m always using the proper idiom). Overall, though, it’s just a blast to program.

There’s something about coding Lisp. Even though the code is often extremely dense, the unified syntax doesn’t punish you for writing this way. I don’t get lost in it, even though if I tried to write like this in C++ or Java, I wouldn’t be able to read my own code twenty minutes later.

I have greatly enjoyed the Day of DH, and I look forward to digging into everyone’s blogs in the coming weeks. Thanks to everyone at the University of Alberta for another good year!