
Data Dealer

Oh what fun, I bought a scanner!

Now I can insert hand drawings in my blog postings. This is a picture of yesterday's fun idea: an application that runs alongside your browser and does matchmaking for data exchange on your behalf.

The user is the happy face.

  1. The user browses to a site that provides data (The drawing should say Browser rather than Browse).
  2. The user downloads a description of the data offered by the site; that is handed off to the helper application “Data Deal Maker.”
  3. Later the user visits a site that consumes data.
  4. He downloads a description of that site's data-consuming interests, which is again handed off to the Data Deal Maker.
  5. At this point, or some later one, the Data Deal Maker notices the chance to make a deal between the providing site and the consuming site – a deal that presumably might make the user's life better. It offers the deal to the user (a sketch of this matching step follows the list).
  6. The happy user approves the deal. I don't show this, but the Data Deal Maker then introduces the sites to each other.
  7. The two sites proceed to exchange data on behalf of the user. The angels sing.
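
Here's a minimal sketch of that matchmaking step in Python. Everything in it is my invention for illustration – the description format, the field names, and the idea that a description boils down to a topic plus a provide/consume role; a real deal maker would need a much richer schema.

    # A hypothetical sketch of the matchmaking step. The description
    # format (a site, a provide/consume role, and a topic) is invented
    # for illustration; real sites would need a richer schema.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class SiteDescription:
        site: str    # e.g. "http://blog.example"
        role: str    # "provides" or "consumes"
        topic: str   # the kind of data, e.g. "blog-entries"

    class DataDealMaker:
        """The helper application: collects descriptions, spots deals."""

        def __init__(self):
            self.descriptions = []

        def add(self, desc):
            """Hand a freshly downloaded site description to the deal maker."""
            deals = self._match(desc)
            self.descriptions.append(desc)
            return deals

        def _match(self, new):
            """Pair the new description with complementary ones on the same topic."""
            deals = []
            for old in self.descriptions:
                if old.topic != new.topic or old.role == new.role:
                    continue
                provider = old if old.role == "provides" else new
                consumer = new if provider is old else old
                deals.append((provider, consumer))
            return deals

    dealer = DataDealMaker()
    dealer.add(SiteDescription("http://blog.example", "provides", "blog-entries"))
    for provider, consumer in dealer.add(
            SiteDescription("http://aggregator.example", "consumes", "blog-entries")):
        print(f"Offer the user a deal: {provider.site} -> {consumer.site}")

The interesting design work all hides in the schema: the deal maker can only notice deals as rich as the descriptions the sites publish.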

In the pedestrian case of subscribing to a blog, the data-providing site is the blog and the data-consuming site is the user's blog aggregator (a web site or a local program, it doesn't matter).

I love how this keeps the user in the middle and how the Data Deal Maker becomes the user's personal middleman. That goes nicely with the story of how I bought this scanner, the details of which are outrageous.

Professional Programmer – huh?

Patrick Logan writes:

Agreed…

We’re not so much building on the programming state of the art as continually have each generation of programmers rediscover it.
Bill de hOra

This old fart agrees too. A far more interesting question: why? Why hasn't a core of professional knowledge emerged in this industry? Isn't it normal, even natural, for a craft to transition into a profession? Why am I not a member of the Association for Computing Machinery? My friends aren't either. Why, when hiring, do we strongly prefer people who have built things over people who are well certified? Why do project managers see little, if any, value in having a few doctors of Computer Science on their teams?

I see three reasons for the absence of a professional class: fast growth, a culture of anti-professionalism, and competing institutions. I'm sure there are other reasons. I'm sure that at this point I wouldn't pick one of these as dominant.

This outcome is not necessarily a bad thing. The craft is much more egalitarian than most highly technical crafts. It's easier to get into this field. The training barrier is lower. The tools tend to be simple. They have to be. I see forces in play which keep it that way.

Fast growth has meant that the demand for skilled craftsmen, tools, and knowledge has continually outstripped the supply. The rapidly expanding frontier of the industry continually creates new territory where amateurs can achieve huge success. In these situations it's much more important to get there and build something than it is to build it well. In new markets the quantity of your customer relationships always dominates the quality of your technical execution. The fresh frontiers plus scarce labor create a demand for simpler tools.

Anti-professionalism – man, you could write a whole book about this! The mythology of the hacker, open source, the American cowboy, libertarianism, the '60s youth culture, etc. But possibly I can say something a bit new. The scarcity of skill results in loose social networks. On the frontier everybody is new in town. So the fabric, the social networks, that interconnects the craftsmen is thin. But new technology – network-based social interaction tools – has done much to compensate for that. One of the theories about the function of a profession is that it acts to create knowledge pools. The network's social tools have allowed knowledge pooling in spite of thin social networks. This is new, and it might well cause other professional networks to erode as they become less necessary.

Another story people tell about professions is that they are a form of union, which naturally leads to realizing that any profession competes against its complementary institutions. Other institutions in high tech would like to be the source of legitimization in the computing industry. This is a pattern I first noticed in the medical profession. Medical doctors managed, over the course of the 20th century, to gain hegemony over their industry. Today that control is falling apart as other players – insurance companies, drug companies, and so on – compete to take control of the huge amounts of money in flux. Today my HMO sees to it that a person who's not even a nurse does any minor surgery. In high tech, large vendors play a similar game, and they don't have to bother to compete with an existing strong profession. Microsoft, Oracle, Sun, and the rest all provide certification programs that substitute for the legitimacy of a professional society working in tandem with universities.

Some of this I think is bad, but other aspects of it are great. It's very bad for the respect and income that highly skilled practitioners can command. And while it certainly holds back the median level of skill, it appears to entrain a larger pool of practitioners. We get a longer tail. And, as open source projects demonstrate, we are getting better at aggregating knowledge from an extended tail.

Mostly I think it is great that we remain a craft with a reasonably low barrier to entry. It makes my coworkers a more interesting, diverse lot. I think it's healthy to keep the problem solving closer to the problems. Down in the mud, not up in the ivory tower.

It is healthy that the righteous, prideful, status-riddled behaviors of most professions are somewhat rarer in this line of work.

Scarecrow: I haven’t got a brain… only straw.

Dorothy: How can you talk if you haven’t got a brain?

Scarecrow: I don’t know… But some people without brains do an awful lot of talking… don’t they?

… much later …

Wizard of Oz: Why, anybody can have a brain. That’s a very mediocre commodity. Every pusillanimous creature that crawls on the Earth or slinks through slimy seas has a brain. Back where I come from, we have universities, seats of great learning, where men go to become great thinkers. And when they come out, they think deep thoughts and with no more brains than you have. But they have one thing you haven’t got: a diploma.

the calculus of distributed tag space

Running a bit further with the ideas in this posting about revealing a small programming language over your web site's data.

Today I’m thinking that this is an interesting variation on the idea of a data exchange standard; something much richer.

Consider this URL, which assembles a page from the union of URIs tagged with a single token at both del.icio.us and flickr.


     http://oddiophile.com/taggregator/index.php?tag=market (try it)

Or the same idea as executed by Technorati.


     http://www.technorati.com/tag/market (try it)

In both cases the ceiling on what's possible isn't very high, because the query language available is limited to "Which URIs does site FOO have tagged BAR?" Clearly a richer query language is desirable, one that lets you make queries like: "Which URIs do sites S1 and S2 have tagged T1 and T2 but not T3, that site S4 hasn't tagged T7, excluding any tags made by untrusted or commercial actors?" Rich queries demand more consensus about the data model and the operations upon it.
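
To make that concrete, here's a rough Python sketch of the set algebra such a query language has to stand on. The sites are faked as local dictionaries mapping tags to sets of URIs; a real deployment would answer these lookups over HTTP, and the trust and exclusion clauses would need per-actor data I don't model here.

    # A rough sketch of the set algebra behind richer cross-site tag
    # queries. The sites are faked as dictionaries mapping tag -> set
    # of URIs; a real deployment would answer these lookups over HTTP.

    delicious = {
        "market": {"http://a.example", "http://b.example"},
        "spam":   {"http://c.example"},
    }
    flickr = {
        "market": {"http://b.example", "http://c.example"},
    }

    def tagged(site, tag):
        """The set of URIs the site has tagged with the given tag."""
        return site.get(tag, set())

    # "Which URIs do both sites have tagged 'market', excluding anything
    # delicious has tagged 'spam'?"
    result = (tagged(delicious, "market") & tagged(flickr, "market")) \
             - tagged(delicious, "spam")
    print(result)  # {'http://b.example'}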

The current bloom of fun illustrated by the two examples above stands on a very small consensus, i.e. "Yeah, let's mark up URIs with single-token tags!" A slightly larger consensus would enable a larger bloom.

A small consensus, like the tags, is a lot easier to achieve. It is less likely to fall victim to the IP tar pit. It’s maximally likely to be easy for the N sites to adopt and rendezvous around. It allows one site to set an example (as I’d argue del.icio.us did with tagging) so that the other sites can mimic the behavior.

I wonder if the idea of a simple query language, along the lines of the one in my earlier posting, could enable that. Interesting design problem.

URL micro languages.

Silly, fun, idea from a sleepless moment last night:

This really doesn’t have anything to do with del.icio.us, but I’ll use it as an example. Consider this URL from del.icio.us:


   http://del.icio.us/tag/forth+language (try it?)

That URL returns a list of the links bookmarked at del.icio.us which have been tagged with both “forth” and “language” by one of the users.

The phrase “forth+language” in there is pretty much the extent of the query language provided by delicious (well that’s not really fair).

Meanwhile I've been shopping for a calculator and remembering with fondness an old programmable stack-based HP calculator.

So that's the silly idea: why not support a stack-based query language for systems like delicious?


   all          -- pushes the set of all bookmarks
   "forth"      -- pushes the tag "forth"
   query        -- pops 2, pushes the set of forth bookmarks
   "language"   -- pushes the tag "language"
   query        -- pops 2, pushes the final set

We have two operators there: all and query. All pushes the set of all bookmarks. Query takes a tag and a set of bookmarks off the stack and pushes the subset of those bookmarks that carry that tag.

We might pack that script into a URL, using underscore to denote tag literals.


    http://del.icio.us/do/all/_forth/query/_language/query

The fun thing is adding more operators on the bookmark sets: operators for popularity, user, intersection, etc. Operators that configure the output formatting would be another class of fun. Like all good micro-languages, the key is to stand on some really big data types – in this case that data type is set-of-bookmarks – since that allows a small set of operators to do big useful things.
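
Here's a minimal sketch in Python of how a site might evaluate such a URL. The underscore convention and the two operators follow the example above; the bookmark data, and everything else, is invented.

    # A minimal interpreter for the stack-based query URL above. Path
    # segments starting with "_" push a tag literal; other segments name
    # operators. The bookmark data is invented for illustration.

    BOOKMARKS = {
        "http://a.example": {"forth", "language"},
        "http://b.example": {"forth"},
        "http://c.example": {"language", "python"},
    }

    def op_all(stack):
        stack.append(set(BOOKMARKS))        # push the set of all bookmarks

    def op_query(stack):
        tag = stack.pop()                   # pop a tag...
        bookmarks = stack.pop()             # ...and a set of bookmarks
        stack.append({b for b in bookmarks if tag in BOOKMARKS[b]})

    OPERATORS = {"all": op_all, "query": op_query}

    def evaluate(path):
        """Evaluate a path like 'all/_forth/query/_language/query'."""
        stack = []
        for segment in path.split("/"):
            if segment.startswith("_"):
                stack.append(segment[1:])   # tag literal
            else:
                OPERATORS[segment](stack)   # named operator
        return stack.pop()

    print(evaluate("all/_forth/query/_language/query"))
    # {'http://a.example'}

Adding an operator is just adding an entry to the table, which is exactly what makes micro-languages like this fun.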

Systems like this one, which reveal a micro-language for the use of an untrusted population, have abuse problems. Lots of schemes for managing that risk exist, for example limiting the resources you provide to a given query or actor, or preflight analysis of the program.

Of course, if your micro-language is sufficiently micro you can write optimizers and lazy evaluators and other fun stuff.

So the fun idea here is to reveal tiny micro-language APIs on web sites in a style like this.

Dynamic Typing

I have been reading with sympathy Patrick Logan's blog, wherein he is fighting the good fight for more dynamic languages against the righteous forces of less dynamic languages. It's a fight that many in my generation fought, and lost. I've never buried the hatchet. It's in the basement someplace. The outcome of the last round wasn't in anybody's best interest. It's good to see that people are still spending rhetorical calories attempting to address the problem.

At ApacheCon I had a short but high-bandwidth conversation about RDF with one of my friends. First we quickly toured the reasons we like RDF, and then we had an equally fun tour of the things that frustrate us about it. One of the frustrating things has been rattling around in my head ever since.

RDF is strangely type-less. This is good and bad. For example, let's say you're collecting metadata about URIs. You collect a lump of data about each URI: size, format, time of last update, and so on. Good fun is had. Time passes and you discover another source of data about the URIs; his data model is different. RDF is very helpful at this point. His lumps of data and your lumps of data can mix and match casually in a slurry of assertions about the URIs. All kinds of conflicts between the data models are acceptable. Both of you may have a bit of data about the last update; neither of you has full knowledge about the last update time – you only have approximations, you only have partial data, and in some cases his last-update time and yours are different. You used different time formats, and while you were very fastidious about describing the characteristics of your clock, he never thought about that, so you really don't know whether, when his data says 11:13 AM, that's ±30 seconds or ±a day.

RDF allows your program to take both sets of information, pour them into your system, and get back to work. It's very, very tolerant of the kinds of models that appear in real-world problem solving.

This is good, because it’s real.
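
Here's a toy illustration of that mixing, with assertions modeled as plain tuples; the URIs, predicates, and sources are invented, and a real RDF store is of course far richer than a Python list.

    # A toy illustration of mixing two sources' assertions about one URI.
    # Assertions are (subject, predicate, object, source) tuples; nothing
    # forces the sources to agree, overlap, or share formats.

    mine = [
        ("http://x.example", "size", 1024, "me"),
        ("http://x.example", "last-update", "2004-12-01T11:13:00-05:00", "me"),
    ]
    his = [
        ("http://x.example", "format", "text/html", "him"),
        ("http://x.example", "last-update", "11:13AM", "him"),  # vaguer clock
    ]

    store = mine + his   # pour both in; a conflict is just another assertion

    def about(store, subject, predicate):
        return [(obj, src) for (s, p, obj, src) in store
                if s == subject and p == predicate]

    # Both opinions about the last update survive, tagged by source;
    # deciding what to believe is left to the code reading the store.
    print(about(store, "http://x.example", "last-update"))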

But it drives people crazy! The flexible nature of these models makes them very subtle to reason about. Many of the exceptions in the data model propagate into exceptions in the code that chews on that data. The programmer, and even his clients, need to learn a set of practices that on the one hand leverage the power of a model so deeply informed by the nature of real-world problems, and on the other hand are respectful of the soft and porous ground he's building his system on.

One of the unspoken threads that runs thru the dynamic/static typing debates has always been the way the two camps suspect each other of being insane. The dynamic typing crowd suspects that the static typing crowd is delusional about the real nature of the data they are working on; real data just isn't particularly strongly typed. The static typing crowd thinks that the dynamic typing crowd is trying to build their house on a foundation of marshmallows.

Displacement and Common Lisp

Displacement is an economic or cultural process whereby a community of practice wakes up one morning to discover that the tide of history has left it high and dry. The displaced community is not necessarily at fault in these stories. The archetypal example of displacement was the introduction of a new technology into northern England that displaced the tenant farmers from their landholdings. In that case the displacement unfolded pretty quickly because the farmers' legal claim to their land was based on leases; when the leases came due, the landlords (who wanted to convert to the new paradigm) displaced the residents. The technical innovation that displaced the farmers was a more robust breed of sheep, coupled with a business fad for sheep.

In the software industry, displacement happens when an existing language community – Cobol, Fortran, whatever – wakes up one morning to discover that the industry's current fast-growing network effect is taking place outside that community. Some communities manage to catch up by quickly piling on the tools, design patterns, and so forth required to play in the new world.

The Common Lisp community made one mistake back in the 1980s that helped with its displacement. This mistake was around graphical user interfaces. When the Mac came out, it redefined how graphical user interface interaction would take place. It set a standard for the interaction, and this standard featured the idea of a current selection. First the user would use his mouse to accumulate the selection: he'd select the window/document to work on, adjust the view to bring the thing he wished to modify into view, and then select that object. Only then would he browse a selection of commands to affect the object. Who knows if that's the best design, but it certainly became the standard approach.

Meanwhile, over in the older graphical user interface communities, the command loop worked quite differently. Selection, for example, was often entangled with mouse location: if you moved the mouse over a window or an object, the object was automatically selected – but that's only one example.

In the Common Lisp community a really unbelievably elegant user interface toolkit emerged, known as CLIM (the Common Lisp Interface Manager). But this beautiful, elegant thing had no concept of "the selection." As a result it was totally irrelevant to building the kinds of user interfaces demanded by those working where the action was. A great idea displaced through no particular fault of its own.

Microsoft had an analogous brush with displacement when the Internet broke out and the desktop suddenly became marginalized. Microsoft has, historically, been better at mustering the sense of fear and panic necessary to respond to displacement events. So when it became clear they were at risk, they reacted.

The Common Lisp community endures. I still use and prefer it for all kinds of tasks.

But recently I've been concerned by what looks to me like another displacement threat – character sets.

Emacs is a key complement to the Lisp community, and quite a few others. Emacs has amazing character set support in both major variants (GNU Emacs and XEmacs). The support is uniquely powerful. The approach emerged in a branch of emacs known as Mule and more recently has been getting folded back into Emacs. All the input/output streams can be configured to declare their encoding schemes. The internal strings and buffers are especially clever. The usual trick for systems having to tackle this problem is to normalize all the characters into one standard format, typically Unicode. Mule's approach is different: buffers in a Mule-enhanced emacs retain their character encoding. Load a file of Big5 characters, point at an arbitrary character in the buffer, and Mule emacs knows it's a Big5 character; paste a string of Unicode characters into that buffer and now you have a buffer whose characters are in assorted character-set encodings – analogous (though with a different implementation) to the way you can have a buffer with characters in assorted fonts. That I find the Mule design so cool reminds me a bit of how cool I found the CLIM design; at this point I'm feeling a bit paranoid.

So on the emacs front things are in pretty good, even very good, shape. It's all a bit rough around the edges, though. The emacs communities are still holding Mule emacs at arm's length, so you often have to build the Mule variant by hand. You often need a version of emacs that's ahead of the stable release curve to capture the features you need. Font support is both amazing and frustrating. If you're running under X you can get a large set of international fonts, and after a mess of suffering you can get your GNU emacs or XEmacs to use them. One curiosity of the Mule buffer design is that a character encoded in one character set may have no font to render it, only because the one font you have installed that's able to render that cute character happens to be laid out using a different character set. That's a big pain on the Mac, which has beautiful fonts, but I can't see how to get at them from Unicode character sets in emacs.

Over on the Lisp side of things the story is slowly resolving itself. There are a _lot_ of really fine commercial and open Common Lisp implementations. Each one has a slightly different story about how and when the Unicode problem is getting addressed. The best Unicode support in an open implementation happens to be in the slowest implementation. The implementation I'm using today (Steel Bank Common Lisp, or SBCL) has very fresh Unicode support.

It's taken me almost two weeks to get a working tool chain for this stuff. I have tried a lot of combinations and experienced a lot of crashes where both Lisp and Emacs die horrible recursive deaths, choking as they try to display or transport characters down pipelines. Currently I'm running the latest released beta version of XEmacs, built from scratch to get Mule support. I'm running that under X on my Mac, so I'm using the open international X fonts. I'm running the bleeding-edge versions (CVS HEAD) of both SBCL and Slime (the emacs<->lisp interaction mode). [Hint: (setf slime-net-coding-system 'utf-8-unix)]

I'm happy to report that I can now stream unpredictable UTF-8 streams thru reasonably long chains of tools and it all works. Everything in my tool chain except the fonts is beta or bleeding edge. I'll be really happy when I've got the database linkages working.

If this were 1995 I'd be less concerned about displacement; but it's 2004. The good news is that the problem is getting solved.

Shun those robots

I added the authimage hack to my blog's WordPress installation some time ago. It forces those who leave comments to pass a small test showing they are human. It has been a complete success. Previously I was doing lots of hand work as well as maintaining lots of overly clever code.

Highly recommended!

Push/Pull

The web is mostly pull. For example, last night when I was attempting to get a price match from a vendor via their online web-based chat, the silly thing was designed to poll every two seconds, checking to see if they had added anything to the discussion. It didn't work very well.

There are lots of different words for the push/pull distinction. In programming languages, folks talk about lazy evaluation, which is a kind of pull – more rarely you even hear people talk about both eager and lazy evaluation. In rule-based expert systems, people talk about forward chaining and backward chaining.

Instant messaging is possibly the only major protocol that works by push, though email comes close.
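
The distinction is easy to see in code. Here's a toy sketch in Python; the chat example is mine, not the vendor's actual implementation, and the two-second poll is shrunk so the example runs quickly.

    # A toy contrast between pull (polling) and push (callbacks).
    import time

    class Chat:
        def __init__(self):
            self.messages = []
            self.listeners = []

        # push: the producer drops new data into the consumer's lap
        def subscribe(self, callback):
            self.listeners.append(callback)

        def post(self, msg):
            self.messages.append(msg)
            for listener in self.listeners:
                listener(msg)              # consumer hears about it at once

    # pull: the consumer checks back on its own schedule
    def poll_loop(chat, seen, rounds=3, interval=0.01):
        for _ in range(rounds):
            for msg in chat.messages[seen:]:   # anything new since last look?
                print("pulled:", msg)
            seen = len(chat.messages)
            time.sleep(interval)               # the two-second poll, shrunk

    chat = Chat()
    chat.subscribe(lambda msg: print("pushed:", msg))
    chat.post("hello")        # push: printed the instant it is posted
    poll_loop(chat, seen=0)   # pull: prints "pulled: hello" on the first pass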

Protocols are only half the issue, of course – the arcs in the distributed data graph. The nodes are the other half, and there we find what in AI circles is called the "truth maintenance" problem. In source control circles (and some expert systems crowds) this is called conflict resolution. For example, if two developers make conflicting changes to the shared code base, then at some point that conflict will be pushed or pulled into view.

One thing I'd not seen exactly until today is how push and pull each make a presumption about how the conflict will spread. In a pull system the conflict sits until each of the distributed parties decides to accept the conflict for resolution. In a push system the conflict is dropped into the lap of the distributed party, and he's expected to juggle the resulting mess until it's cleared up.

Systems with push or pull make differing assumptions about how the conflict resolution will get framed. I found that interesting, because in some discussions about push vs. pull the conversation is framed more around which will be more thrifty with bandwidth or latency.

All this was triggered by thinking about Jabber and Growl (which is entirely Ted's fault). At least in the data space, things like Jabber help you get problems dropped faster into the laps of clients. It may be that there really isn't much demand from clients for that; or it may be that once you find willing clients for that kind of functionality, they almost immediately run into the amazingly hard problems that all truth maintenance and conflict resolution systems encounter.

Well, that was fun, now I’ll go back to seeing if I can empty my email inbox.

Maps Links

These are cool, clever animated maps showing bicycle trips. A beautiful example of innovation on an open substrate. Maps like these are possible because the US Census created maps for the entire nation back in 1980 and then publicly released them. Systems like MapQuest are built on that public good. Get them here, for example. The code stands on a mess of Perl modules for manipulating such things. Just a guy having fun with the opportunities all that created.

You can get the cartogram software here (zip file). I've not used it yet, but the command-line API appears quite straightforward. Input regions and population counts for each region; run the software; cartogram regions are output. In addition you get two PostScript files so you can check the input and output. Each region is described by its polygon(s).

There is really no end to the things you could use that for. Some examples: cartograms of where members, employees, or customers reside; organization charts scaled by various metrics (cost, income, depreciation, education level); transaction volumes by classification in your web site or your firewall. In many of these cases the topology of the input regions doesn't necessarily need to reflect anything real, but if you have something at hand you can use that.

I'd love to see a cartogram of income over the shelf space of a grocery store, or the gates of an airline, or turnover rates.

Or how about a cartogram showing the click-thru numbers for various links on a web page! If you project an image thru a mapping like this, you get a homunculus.

Electric promises

Isn't there something unnatural about a venture capitalist who's obviously a hard-core geek?

A long time back Tim Orton posted a list of links on systems that allow delegation of rights to other parties. I've been meaning to get around to writing about this kind of thing, because while lots of people know about public/private key encryption and other approaches to the identity problem, these methods seem to have been largely forgotten.

These are such a nostalgia trip for me. Back in the early-to-mid '70s at CMU I had some light involvement in the development of a multiprocessor system known as C.mmp. One unique feature of that toy's operating system was that it managed permissions using "capabilities" rather than "access control lists." Capabilities are a much better design, but nobody has ever really managed to make them very practical.

Access control lists, or ACLs, protect things by marking objects with a list of who is allowed access. For example, if the question arises whether Bob is allowed to update the payroll data, the ACL system works by checking a list associated with that data and then checking that the person about to make the change has proved he is Bob. ACLs are like the guest list at an exclusive party: each time another guest shows up, the doorman checks their ID against the guest list before letting them in. Maintaining the guest lists is a pain.

Capabilities allow Bob to delegate the task of updating the payroll to one of his acquaintances.

In a capability system Bob carries around a set of capabilities; think of them as one of those huge key rings that building maintenance guys carry around. When Bob wants to modify the payroll data, he pulls out the right key/capability, presents it, and the system checks it. If Bob wants to delegate, he can hand the key to his assistant. Keeping all these keys under control is a pain.

Real-world systems work with a hybrid of keys and guest lists. In some situations you present an ID card (for example, to withdraw books from the public library) and in other scenarios you're provided with a key (for example, when you rent a car).
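
Here's a toy sketch of the two designs in Python, following the payroll example. A real capability is an unforgeable, often cryptographic, object; a plain Python instance only stands in for one here.

    # A toy contrast between ACL-style and capability-style protection.

    # ACL: the object carries a guest list; every access checks identity.
    class PayrollACL:
        def __init__(self, allowed):
            self.allowed = set(allowed)

        def update(self, who, data):
            if who not in self.allowed:    # the doorman checks the list
                raise PermissionError(who)
            print(f"{who} updated payroll: {data}")

    # Capability: holding the key *is* the permission; keys can be handed off.
    class PayrollCapability:
        def update(self, data):
            print(f"payroll updated: {data}")

    acl = PayrollACL(allowed=["bob"])
    acl.update("bob", {"alice": 100})      # ok: Bob is on the list
    # acl.update("assistant", ...)         # denied until the list is edited

    cap = PayrollCapability()              # a key issued to Bob somehow
    cap.update({"alice": 100})             # Bob presents his key
    assistants_key = cap                   # delegation is just handing it over
    assistants_key.update({"alice": 120})  # no guest list had to change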

One of the systems Tim points to is really just too cute: a programming language called E.

Most programming languages consist of a handful of tricks. For example, one of Lisp's tricks is to bring the syntax and the parse tree so close to each other that at program build time you can do powerful transformations of your program. One of Python's tricks is to leverage the source text's indenting for syntax. One of Perl's tricks is to weave a number of powerful micro-languages together into one dense, rich stew. One of Erlang's clever tricks is designing extremely lightweight distributed tasks into the heart of the language.

E seems to have three key tricks. The most minor of these is that it stands on top of the Java VM. The second trick is to fold the idea of capabilities into the language: while in a traditional programming language you can manipulate an object if you can get a pointer to it, in E you need to get the capability, not the pointer. All kinds of clever cryptography is built in, enabling these capabilities to do clever things, like working only for an interval of time.

The last clever thing, the fun one, is an approach to multitasking based on "promises." Rather than the usual collection of processes, micro-tasks, threads, locks, semaphores, queues, event handling, and messages that most languages cobble together in various combinations to create a multitasking system, we get one construct: the promise.

Here is a very amusing description of why this might be “a better way(tm)”:

Let us look at a conventional human distributed computation. Alice, the CEO of Evernet Computing, needs a new version of the budget including R&D numbers from the VP of Engineering, Bob. Alice calls Bob: “Could you get me those numbers?”

Bob jots Alice’s request on his to-do list. “Sure thing, Alice, I promise I’ll get them for you after I solve this engineering problem.”

Bob has handed Alice a promise for the answer. He has not handed her the answer. But neither Bob nor Alice sits on their hands, blocked, waiting for the resolution.

Rather, Bob continues to work his current problem. And Alice goes to Carol, the CFO: “Carol, when Bob gets those numbers, plug ’em into the spreadsheet and give me the new budget, okay?”

Carol: “No problem.” Carol writes Alice’s request on her own to-do list, but does not put it either first or last in the list. Rather, she puts it in the conditional part of the list, to be done when the condition is met–in this case, when Bob fulfills his promise.

Conceptually, Alice has handed to Carol a copy of Bob’s promise for numbers, and Carol has handed to Alice a promise for a new integrated spreadsheet. Once again, no one waits around, blocked. Carol ambles down the hall for a contract negotiation, Alice goes back to preparing for the IPO.

When Bob finishes his calculations, he signals that his promise has been fulfilled; when Carol receives the signal, she uses Bob’s fulfilled promise to fulfill her own promise; when Carol fulfills her promise, Alice gets her spreadsheet. A sophisticated distributed computation has been completed so simply that no one realizes an advanced degree in computer science should have been required.

Very cute.
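
The story maps fairly directly onto futures. Here's a sketch in Python using concurrent.futures as a stand-in for E's promises – a loose stand-in, since E's promises also pipeline across machine boundaries and never block a thread, which futures don't give you.

    # The Alice/Bob/Carol computation sketched with Python futures.
    # concurrent.futures is only a loose stand-in: E's promises also
    # pipeline across machine boundaries and never block a thread.

    from concurrent.futures import ThreadPoolExecutor

    def bobs_numbers():
        return {"r_and_d": 42}             # Bob's engineering numbers

    def carols_spreadsheet(numbers):
        return {"budget": 1000 + numbers["r_and_d"]}

    with ThreadPoolExecutor(max_workers=2) as pool:
        # Alice asks Bob; she gets a promise, not an answer, and moves on.
        numbers_promise = pool.submit(bobs_numbers)

        # Alice hands Bob's promise to Carol, who promises a spreadsheet
        # conditional on Bob's promise being fulfilled.
        budget_promise = pool.submit(
            lambda: carols_spreadsheet(numbers_promise.result()))

        # ... Alice goes back to preparing for the IPO ...

        # Only when Alice finally needs the spreadsheet does anyone wait.
        print(budget_promise.result())     # {'budget': 1042}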

While I presume that Tim's interest in all of this is so he can make, break, keep, and delegate responsibility for promises with greatly increased efficiency, I suspect that it's really just about money.