Category Archives: programming

Can’t Think

I’m struck by how visceral my reaction to this is:

For my part, I don’t see any reason why the Lisp community should constrain itself to having exactly one defsystem facility.

It misses the heart of the problem so completely that it unintentionally makes it worse.

cl-ppcre as a tokenizer

CL-PPCRE is a Lisp clone of the Perl regular expression subsystem (I gather it’s even faster).  I love it because lots of things I used to do in Perl or AWK I can now do in Lisp.  Each language has its coding tricks, and there are a lot for Perl’s regular expressions.  The Perl regex manpage is 30 pages, and the FAQ is 20 pages.  That’s a lot of clever, and double or triple that once you recall how concise Perl is!  I often use all this to build lexical analyzers, using regular expressions to nibble tokens off a string. (One variation on that is outlined in the Perl FAQ.)

Both CL-PPCRE and Perl have two notations for their regular expressions.  The concise one you’ve certainly used; the other is much more verbose.  For example, you might write “alpha|beta” to match either of the two strings in the concise form, while in CL-PPCRE’s long form you’d write (:alternation “alpha” “beta”).  In Lisp you might write something like (setf (parse-tree-synonym :tokens) ‘(:sequence :start-anchor (:alternation :begin :end :if …))) and then nibble off tokens by doing (scan :tokens program-text :start start).

The only problem with that is you lose track of which token was recognized.  My trick for that is to define the individual tokens so that when they are matched they set a variable in the dynamic extent established for tokenizing.  Along these lines: (setf (parse-tree-synonym :if) ‘(:sequence “if” (:function set-token-kind-to-if))). The :function form is used to do call-outs in the midst of the pattern matching.  It is the analog of Perl’s (?{ code }) construct.

That of course all gets wrapped up in macros so it reads nicely, e.g.: (deftoken :constant (“\\\d+”) (parse-integer (token-text))).
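Putting the pieces together, a minimal sketch of the scheme might look like the following. The names *token-kind* and mark are mine, invented for illustration, and I use CL-PPCRE’s :filter call-out form, which I believe is the current spelling of this kind of call-out:

```lisp
;; A sketch of the side-effect trick, assuming CL-PPCRE is loaded.
;; *TOKEN-KIND* and MARK are illustrative names, not from the post.
(defvar *token-kind* nil
  "Set as a side effect while the scanner runs.")

(defun mark (kind)
  "Return a call-out that records KIND without consuming any input."
  (lambda (pos) (setf *token-kind* kind) pos))

(setf (cl-ppcre:parse-tree-synonym :if)
      `(:sequence "if" (:filter ,(mark :if))))

(setf (cl-ppcre:parse-tree-synonym :number)
      `(:sequence (:greedy-repetition 1 nil :digit-class)
                  (:filter ,(mark :number))))

(setf (cl-ppcre:parse-tree-synonym :token)
      '(:sequence :start-anchor (:alternation :if :number)))

;; Nibble one token off the front of the program text; afterwards
;; *token-kind* tells you which alternative matched.
(multiple-value-bind (start end)
    (cl-ppcre:scan :token "if x then 1" :start 0)
  (values *token-kind* start end))
```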

You can juice this up by having parse-tree-synonyms that keep track of the line number in the file you’re parsing.

If you get the details right this can be very efficient, both in time to code up and at runtime.  Combined with CL-YACC you can whip up a parser for yet another language in a few hours.  Yesterday I built one for XDR.  Here is a simplified example.

What you say?

I believe it was Ray Kurzweil, circa 1989, who advised encouraging a private jargon inside your new company.  I remember because that was just about the time I was starting to think open would totally trump closed in our industry.  The advice seemed to my ears a bit old-fashioned.  But at the same time I suspected he meant that it was a good way to tighten the bonds inside the team.  By then I knew enough about cults to recognize that’s common inside cults; it supports two of the keys to running a good cult – information control, and thought-stopping processes.

But don’t take any of that too seriously.  It is good advice nonetheless.  I’ll admit to being a Whorfian: the words you highlight affect your thinking.

These days I’d add to all that, though.  Language is the ultimate exchange standard.  So when you decide to innovate a new private language you’re cutting yourself off and creating friction, or trade barriers, with your outside partners.  Importantly, the advantage a new group has is that they can pick and choose what to emphasize.  They can take a run at leveraging some particular competitive advantage.  As Dave Winer says: You can’t win by zigging when he zigs. You have to zag to beat him.  Ray’s advice can be viewed as a bit of implementation advice for that.

So it was with some interest that I saw Google revealing their in-house standard for serializing data.  It’s not hard to see that Protocol Buffers are an alternative to XML.  And it is amusing, at least to me, to think that they did this in the hope of reducing the frictions that occur when they must translate from their in-house argot into the dialects used by the outside world.  It’s fun to note that if your startup is as successful as Google you get to promulgate your private jargon.  It is one of the spoils of war.  You can push that friction onto your complements, make them pay the switching costs.

Protocol buffers aren’t anything special: messages are lists of key/value pairs, keys can repeat, and there is a small set of value types, including Unicode strings and, recursively, other messages. They are very practical, close to the metal.  Choices were made and they are what they are.  They are quite dense, and easy to parse.  Many messages can be serialized in one pass.  But in classic length-prefix style, nested structures have their size mentioned in their header, and that makes one-pass serialization hard.
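To make the “dense and easy to parse” claim concrete, here is a toy sketch of the base-128 varint encoding protocol buffers use for integers and field tags. This is my own illustrative code, not Google’s:

```lisp
(defun encode-varint (n)
  "Encode non-negative integer N as a protobuf-style base-128 varint:
seven bits per byte, least significant group first, with the high bit
set on every byte except the last."
  (loop collect (let ((low (ldb (byte 7 0) n)))
                  (setf n (ash n -7))
                  (if (zerop n) low (logior low #x80)))
        until (zerop n)))

;; 300 encodes as two bytes, #xAC #x02:
(encode-varint 300)                    ; => (172 2)

;; A field tag is itself a varint: (field-number << 3) | wire-type.
;; Field 1 with wire type 0 (varint) is the single byte 8:
(encode-varint (logior (ash 1 3) 0))   ; => (8)
```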

Given an array of bytes it wouldn’t be child’s play to guess that you’re holding a protocol buffer; you could do it heuristically but it would still be a guess.    You need a protocol buffer’s declaration to parse it.  For example, absent the declaration, you can’t know whether you’ve got a sint32 or an int64, etc.  All that disappointed me.  It disappointed my inner archivist and the inner peeping tom (who has often debugged tough problems by watching the bytes fly by on the wire).

There is a nice detail that allows optional keys, which in turn makes it somewhat easier to add new message variants.  With luck the old message handlers can just ignore the additions.  It made me smile to note that this mechanism can be used to pad messages, which in turn makes it more likely that you can serialize in a single pass.

There is another nice detail that allows a key to appear more than once in spite of the metadata saying it is single valued.  The last occurrence wins.  This lets you implement a kind of inheritance/defaulting.  For example, if you’re implementing CSS style sheets you read the default style message, then read the variations from the default, and you’re ready to go.  They call that merging.
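A toy illustration of that last-occurrence-wins defaulting, modeling messages as plists. The plist representation is my modeling choice for the sketch, nothing to do with the actual wire format:

```lisp
(defun merge-messages (&rest messages)
  "Merge plist MESSAGES; fields in later messages override earlier
fields with the same key, mimicking last-occurrence-wins."
  (let ((result '()))
    (dolist (msg messages result)
      (loop for (key value) on msg by #'cddr
            do (setf (getf result key) value)))))

;; The default style, then the variation: :color ends up "red",
;; while :size keeps its default of 12.
(merge-messages '(:color "black" :size 12)
                '(:color "red"))
```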

Given the declarative information for a given protocol buffer it’s not hard to convert it to XML or whatever else you like.  The observers and archivists will just have to be careful not to lose that metadata; and some of them will no doubt build clever heuristics to cobble together substitute metadata.  Interestingly, in spite of efforts to the contrary, you can’t really work with XML without additional metadata to help either.  And that stuff is horribly complex.

As I like to emphasize, what really matters with an exchange standard is how many transactions per second move over it.  No doubt this one has critical mass, at least inside the Google cloud.  How important this standard becomes depends on how much adoption it sees from non-Google actors.  But I suspect we will be speaking this dialect more often, and for quite a while.  Of course, the rudest way to say that is that we will be chasing their tail lights.  But I don’t really feel that way, since I’ve never particularly liked XML and I welcome a substitute.

Better data, fewer customers

Here’s an amusing business tactic: lock out the customers. You know the drill. You visit the website, log in, and the vendor inserts an extra page forcing you to provide your missing zip code, or whatever. “Our data quality is more important than your time.”

So this company, a health club, locked their patrons out of the club. If they went to the desk they were told that their account was missing their email address.

It’s always good to keep things neat and tidy, but it’s impossible to get across how much this kind of thing drives customers away. Particularly because it drives off the most lightly connected customers. The ones that are hardest to model. The ones you desperately need going forward. There is probably some deep design principle here. You need to design the system to maximize the amount of chaos in your data. Homogeneity is a false God.

Meanwhile, if you enjoy keeping account data tidy you might want to drop by the Useless Account web site. Sign up! Edit your account profile to your heart’s content.

Conditions in Common Lisp

Dan Weinreb wrote a long blog post recently outlining what Common Lisp’s “Conditions (exceptions) are really about.”  He argues, and I won’t disagree, that they provide a clean, even elegant, way to handle all the long tail of situations that arise when you’re trying to be fastidious about the contract for your functions.  One of the things that used to be unique about Lisp was its elegant support for unwinding the stack when odd things forced you to abandon whatever you were doing.  It was a lot easier to see to it that the printer handle was relinquished, the file closed, the gate shut in Lisp than it was in Fortran or C.  These days most languages are better about admitting that these things happen and that something should be in the language to help.

Interestingly, though, the Common Lisp condition system remains a much more elegant thing than what you find in other languages.  I don’t really understand why.  The condition system doesn’t just provide a way to unwind the stack, passing along an object instance to describe what ‘exceptional’ condition arose.  It also allows the handler for these conditions to resolve them.  In effect this provides an elegant way to implement callbacks, i.e. that situation where a low-level body of code needs to ask the higher, or later-written, code how to resolve a problem.  For example, you can use the condition system to implement the classic callback scenario where a GUI widget wants to provide its users with the ability to do additional checking on user input.

There’s more, but what I really wanted to get off my chest (and that’s what a blog’s for, eh?) is that I sometimes like to muse about how raising a condition is yet another example of how complex function calling really is.  Back in the day we used to be able to figure out what function would implement a call site at link time.  Later we got used to having dynamic linking do some of that, and it was common in memory-constrained systems to load and flush code on the fly, under the covers, during the function call.  I’ve seen systems where you switched interpreters in the function call for performance and space trade-off reasons.  These days people are used to having a complex method dispatch that puzzles out what method will take care of the request.  The method dispatch is a messy beast when you allow new classes to be loaded.  It was with some pleasure that I first read The Mythical Man-Month’s description of how some ornate linker came to be obsoleted; at the time I was struggling with how loading compiled Lisp files was an amazingly disruptive event, since it upset all your method tables.

Generally method dispatch is done by searching upstream in the directed acyclic graph of types.  That is, the behavior of the function is inherited from the class hierarchy.  But the class hierarchy is not the only hierarchy in most programs.  For example, there is almost always a containment hierarchy: the OK button is inside this dialog; or this file write is taking place inside the task of writing the log, inside the task of handling this web request.

At one point in my life I was quite enthusiastic about the idea of language designs that might allow these other hierarchies more respect.  During that period I built a few class systems where behavior was inherited first from the containment hierarchy and only secondarily from the class hierarchy.

What I like about conditions is how, when a condition is raised and the system goes off seeking a handler for that condition, it’s just another variant of method dispatch – and the call hierarchy is where we search for the handler.  For me the fact that these calls often don’t return, or return in such unusual ways, is part of what’s delightful about the condition system.
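A toy sketch of the handler-resolves-the-condition point, using nothing beyond standard Common Lisp. The condition, function, and restart names are mine, for illustration:

```lisp
;; Low-level code signals a condition but offers a restart, a way
;; for someone higher up the call hierarchy to resolve the problem.
(define-condition bad-input (error)
  ((text :initarg :text :reader bad-input-text)))

(defun parse-field (string)
  (or (parse-integer string :junk-allowed t)
      (restart-case (error 'bad-input :text string)
        (use-value (v) v))))      ; the offered fix-up

;; Higher-level code decides how to resolve it.  The handler runs
;; before the stack unwinds, picks the USE-VALUE restart, and the
;; low-level call returns normally with the substituted value.
(defun parse-record (strings)
  (handler-bind ((bad-input
                   (lambda (c)
                     (declare (ignore c))
                     (invoke-restart 'use-value 0))))
    (mapcar #'parse-field strings)))

(parse-record '("1" "oops" "3"))   ; => (1 0 3)
```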

Google App Engine

Finally Google is revealing some of the means they will provide to allow third parties to run code closer to their assets. Last week was good for Python, since we learned that Google’s leading language for developers will be Python. While they assert that the platform is language neutral, it certainly looks like only Google can “harden” and deploy a new language. The crowd demanded both an unknown language and Perl; the presenter flinched. That is going to be very rough on the long tail of languages.

I found the 1st and 2nd segments of their announcement videos interesting. Everybody got very nice hair cuts.

There is a lot of value in this offering, and the “you don’t have to” theme that runs throughout their patter is a strong way to present it. They really are taking a lot of pain out of the developer’s job! So while they are probably encouraging the die-off of a lot of interesting languages, they are enabling a lot of tiny applications.

Their persistent store is cute (note you can’t write to the file system, so it’s the only place to put data). I suspect this is a lock-in point, though the semantics of the store look reasonably simple.
In addition to no file system writes, there are no raw TCP/IP interfaces and no threads. That reflects the nature of the underlying computer. Amusingly this is a non-preemptive computing model, just like the old Macintosh. You have to do all your work triggered by HTTP requests, and those can’t run too long.

I’m fascinated by my emotional reaction to all this. I’ve rationally seen this coming for a very long time; in fact I’m surprised it’s taken so long! It makes me sad to see the old craft skills become less necessary. It makes me excited to see the opportunities this opens up; it’s always fun when a large population is empowered to author software. That’s one of the big hard problems, letting more people be authors.

That kind of somewhat discombobulated emotional reaction in an expert is a good sign in a platform offering. This is going to be big, and it is a nice counterpoint to Amazon’s approach.

Echonest

Wrote a small Common Lisp library for accessing the echonest music analysis API. All the other APIs are jealous.
Writing this was fun because I got to use Relax NG for the first time in actual code. By inspecting the XML that comes back from echonest I guessed at what the schema was and wrote it up in the compact form of RNG (see here). Isn’t that easy to read?

Inserting cxml-rng, the library for validating your XML against a Relax NG schema, into the SAX pipeline took less than a half dozen tokens (see here). My reward: the handler that loads the XML into a useful data structure doesn’t need to be as paranoid. How relaxing!

clbuild cl-xmpp ejabberd

Assorted brush clearing.

Adding additional libraries to a clbuild setup is easy. Just add lines to your wnpp-projects file. For example:

cl-json get_darcs http://common-lisp.net/project/cl-json/darcs/cl-json

Give it a try and add additional things until everything you need is included. I’ll admit to editing the dependencies file by hand.

Sooner or later you get to things for which this doesn’t work. You can teach clbuild about additional projects by just adding them to the source directory and then using the register-asd command on that project. Given that you’re stuck doing that, you will need to tell clbuild not to attempt to download them. The command called skip marks the project to prevent any download attempt.

That scenario arises if you want to use cl-xmpp, which depends on cl-sasl. Cl-sasl uses yet another source control system, GNU arch, which clbuild doesn’t handle yet. Clbuild is really pretty awesome at dealing with the excessive diversity of source control systems members of the Lisp community have embraced. We do love our exceptions.

I did try using get_tarball, which is deprecated, but that didn’t work out. Possibly because the tarball filename is cl-sasl_0.3; and that code appears to want it to be cl-sasl. I didn’t look further.

Cl-xmpp did not work out of the box with my ejabberd. That problem appears to be explained in this archive of the development mailing list. The resolution involves passing :mechanism :sasl-digest-md5 to xmpp:auth, but it also requires a patch to ejabberd to improve its XML conformance. I haven’t made the ejabberd patch yet; or maybe I need to upgrade ejabberd.

Ben Laurie recently observed “people don’t really talk much about the experience of writing and debugging code.” I suspect if that void were filled we would see many dull postings like the above; or more likely postings like the ones Jason S. Cornez makes in the archive above.

This posting is, in part, triggered by Ben’s comment. But it also provides a checkpoint. I need to context switch to other activities at this point. I certainly didn’t expect to devote quite so much time to this when I added it to my wnpp-projects file Saturday afternoon.

cl-xmpp get_cvs_clnet

Update: ejabberd is probably doing the right thing (see here)… Hm, looks like the code forked and a better version is to be had like so:

cl-xmpp get_git http://www.lichteblau.com/git/cl-xmpp.git/

Maybe I should always read the entire transcript of the development mailing list before I start anything.

Secret Project Killers

I’ve been musing recently that there are a few words which you can use to kill most any software project:

  • Safe – As in “What could possibly go wrong?”
  • Scalable – As in, “How do you plan to make that approach scale?”
  • Social – “Shouldn’t the account management include a social network?”
  • Socialize – As in “But first, let’s socialize this idea in the organization.”
  • Semantic
  • Sustainable – “I don’t understand your business model.”
  • Simple
  • Standard

I needed to post this so I stop seeking more S words to add to the list.