Back in the 1970s when I first learned Unix, the key idea was that you built lots of clever tools which you’d then hook up in pipelines and shell scripts to solve larger problems. Tools like grep, sed, and awk were exemplars of this approach. Tools like lex and yacc (now flex and bison) made it trivial to invent new micro-languages for the problem at hand. It was common to work on systems with a dozen lexers and parsers. The Unix macro language m4 provides yet another scheme for building a micro-language.
One reason I got into Lisp was that after building dozens of systems using those design patterns I began to yearn for something that would unify it all. Perl, which came later, is one example of that urge bearing fruit. Two things really got me interested in Lisp: symbols and macros.
Rainer Joswig put up a video (QuickTime via BitTorrent) showing a simple example of how you can use macros in Lisp to build a micro-language. It looks like he stumbled across an article illustrating the micro-language approach and was saddened to see how horribly complex the approach is in other languages.
The example sets up a micro-language for parsing call records dumped from a phone switch.
You get to watch a not-atypical Lisp coding session. He grabs some example data to play with. Then he grabs the text from the article that outlines the record formats. Through the demo that text gets reformatted into statements in the new micro-language. The largest change to that text happens early, when he renames all the tokens to conform to Lisp coding conventions.
He then writes a bit of code to parse one line. After testing that code he reformats it into a Lisp macro so he can stamp out parsers for each flavor of record. This is the cool part. For a non-Lisp programmer the going gets a bit rough at this point. The tricks of the trade for writing macros aren’t explained, so it’s probably hard to see how the macros work. For example, he uses something called “backquote” to create templates that guide the expansions. Backquote templates are denoted with a single character, nearly the smallest mark that could be used: the backquote. When the templates are expanded they are copied, and as they are copied bits are filled in wherever commas (and some other syntactic sugar) appear.
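To give a flavor of what that looks like, here’s a minimal sketch of the pattern. The names and field layout are mine, not the ones from the video: a macro that stamps out a fixed-width record parser from a declarative description, with a backquote template whose commas mark the slots to fill in.

```lisp
;; Hypothetical illustration of the "stamp out parsers" pattern;
;; Rainer's actual macro and record layout differ.
;; FIELDS is a list of (field-name start end) triples giving
;; character positions within a fixed-width record line.
(defmacro define-record-parser (name &rest fields)
  `(defun ,name (line)
     (list ,@(loop for (field start end) in fields
                   collect `(cons ',field (subseq line ,start ,end))))))

;; One micro-language statement per record flavor:
(define-record-parser parse-call-record
  (caller    0 10)
  (callee   10 20)
  (duration 20 26))

(parse-call-record "55512345675557654321000042")
;; => ((CALLER . "5551234567") (CALLEE . "5557654321") (DURATION . "000042"))
```

The backquoted `defun` form is the template; each comma splices in a piece computed at macro-expansion time, so every `define-record-parser` statement expands into an ordinary function definition.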
Lots of other things are glossed over, for example that when you compile code like this the macros are all expanded at compile time and don’t necessarily even survive into the runtime environment.
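You can see this for yourself at the REPL: macroexpand-1 shows the ordinary code the compiler actually sees. The swap! macro below is just an illustrative stand-in, not something from the video.

```lisp
;; A throwaway macro: swap two places using ROTATEF.
(defmacro swap! (a b)
  `(rotatef ,a ,b))

;; The macro call vanishes at expansion time; only the
;; expansion survives into the compiled code.
(macroexpand-1 '(swap! x y))
;; => (ROTATEF X Y), T
```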
But the finished micro-language is small and concise, so it makes a nice introduction to the kind of programming I miss when working in other languages.
Thanks to Zach Beane for putting together the BitTorrent; it only took 7 minutes to download from 20+ peers in the swarm. Rainer Joswig’s original post is here.
Meanwhile: can it really be true that there isn’t a version of lex/flex that handles UTF-8 or 16-bit characters? That is too bizarre. It’s as if the cornerstone of an entire culture of programming hasn’t made the transition to the world of international data handling.