Category Archives: programming

Diagnostic Typing

One of the dialectics in computer science is between dynamic and static typing. Dialectics are like professional wrestling: cheap fun. But they lead to category blindness. So let me blather a bit about a “third way” that I call diagnostic typing.

At one point in my career I spent a few years deeply committed to the static typing camp. The height of that experience was an amazing day when we successfully linked, for the first time, a huge complex program. It consisted of hundreds of thousands of parts written by numerous authors and code generators. It worked! First time! Getting to that point had required tremendous labor, since the static type checking demanded by the language we were using had forced us to fix lots of stuff that might have been left for later. At that moment it seemed worthwhile that we had deferred so much immediate gratification for so long.

Late in the project we had some really amazing bugs. Bugs that took weeks and teams to fix. Fun bugs with long interesting stories to tell about each of them.

During that later period I found myself writing what I came to call diagnostic typing code. This code would work to prove a complex declaration about the nature of a data structure. These declarations were put forward by the team members. For example somebody might say “All the records of type A are in the dictionary D.” and then somebody else would say “Ah, I thought the core set of those aren’t in the dictionary.” At that point I’d go off and write some code to check if these declarations were true. It was fun because the truth was almost always much more complex than anybody thought. The bugs all lived around the edges of these declarations.
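
For flavor, here’s a minimal sketch of what one of those checks might look like in Common Lisp; the record structure and the :a type tag are invented stand-ins for whatever your system actually has:

;; Sketch: disprove "all the records of type A are in the dictionary D."
;; The RECORD structure and the :A tag are invented for illustration.
(defstruct record key type)

(defun type-a-records-missing-from (records dictionary)
  "Return the type-A RECORDS absent from the hash table DICTIONARY.
NIL means the declaration held."
  (loop for r in records
        when (and (eq (record-type r) :a)
                  (not (nth-value 1 (gethash (record-key r) dictionary))))
          collect r))

The interesting output is rarely NIL; it’s the list of exceptions, and that list is where the real conversation starts.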

So the dialectic between dynamic and static typing is actually a kind of layered thing, with at least three layers: static, dynamic, and diagnostic. Static type checking is done at compile time. Dynamic type checking and dispatching is done at runtime. Diagnostic type checking is done intermittently, usually in response to a demand or a fear. It was extremely valuable to become explicit about some of the declarations that had been implicit.

Diagnostic type checking can be very, very expensive. That makes it a lot of fun! It lets you write all kinds of assertions about your program that would never be practical to check or enforce at compile or run time. You can get out the big guns: graph theory, statistics, coverage, grammars, budgets, spelling correctors. For example: all the window components form a strongly connected component via their child/parent arcs. For example: the elements of this hash table are uniformly distributed. (As an aside, I don’t think I have ever found a hash table that was well behaved in the wild.)
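
To make the hash table example concrete, here’s a toy version of that check; it assumes keys hash via sxhash into some bucket count, which may or may not match what your table’s implementation really does:

;; Toy check: are the keys spread uniformly over the buckets?
;; Assumes SXHASH mod N-BUCKETS approximates the table's real hashing.
(defun bucket-loads (keys n-buckets)
  (let ((loads (make-array n-buckets :initial-element 0)))
    (dolist (key keys loads)
      (incf (aref loads (mod (sxhash key) n-buckets))))))

(defun chi-squared-vs-uniform (loads n-keys)
  "Chi-squared statistic of LOADS against the uniform expectation."
  (let ((expected (/ n-keys (length loads))))
    (loop for observed across loads
          sum (/ (expt (- observed expected) 2) expected))))

A statistic many times larger than the bucket count means the table isn’t as uniform as everyone assumed.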

One of my favorite examples: this isn’t just for data structures; you can do this on program traces too. Back in the 1970s somebody at CMU wrote a paper about using the ideas from language grammars to declare the patterns over calls on class instances. Things like: x:foo <- {create(x); {open(x); {update(x)}+; close(x)}+; destroy(x)}*. In a later life I would sometimes write code to diagnostically check statements like that by using the tracing facilities in Common Lisp.
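
Here’s roughly what checking such a statement can look like, assuming the trace has already been boiled down to a list of operation keywords for a single instance; a little state machine stands in for the grammar:

;; Sketch: check one instance's trace against the grammar
;; create; {open; {update}+; close}+; destroy.
(defun grammar-error (op state)
  (error "Operation ~S is illegal in state ~S" op state))

(defun check-lifecycle (trace)
  "TRACE is a list like (:create :open :update :update :close :destroy)."
  (let ((state :start))
    (dolist (op trace)
      (setf state
            (ecase state
              (:start    (if (eq op :create) :created (grammar-error op state)))
              (:created  (if (eq op :open) :open (grammar-error op state)))
              (:open     (if (eq op :update) :updating (grammar-error op state)))
              (:updating (case op
                           (:update :updating)
                           (:close  :closed)
                           (t (grammar-error op state))))
              (:closed   (case op
                           (:open    :open)
                           (:destroy :done)
                           (t (grammar-error op state))))
              (:done     (grammar-error op state)))))
    (or (eq state :done) (error "Trace ended early, in state ~S" state))))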

I wrote a lot of this diagnostic typing code for the persistent store using Prolog. I would dump the entire persistent store into a suite of Prolog assertions and then write the diagnostic typing declarations as small Prolog programs whose execution would prove or disprove them. While that found a lot of very, very subtle bugs, I found it more fascinating how it raised the level of discussion about the work.
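
I can’t reproduce the Prolog here, but the shape of it translates easily into Lisp. The assertions below are invented for illustration; a declaration like “every order’s customer appears in the dump” becomes a small program over them:

;; Invented example dump: the store flattened into
;; (subject relation object) assertions, Prolog-fact style.
(defparameter *assertions*
  '((order-1 :customer cust-7)
    (order-2 :customer cust-9)
    (cust-7  :status   :active)))

(defun orders-with-phantom-customers ()
  "Disprove the declaration by collecting orders whose customer
is the subject of no assertion in the dump."
  (loop for (subject relation object) in *assertions*
        when (and (eq relation :customer)
                  (not (assoc object *assertions*)))
          collect subject))

(Here it would report order-2, whose customer never shows up as a subject.)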

This kind of approach will, I suspect, become more common real soon now. So much of the data sloshing around on the net is full of surprises that diagnostic typing declarations would reveal. Moving the data across organizational boundaries creates a demand for tools that can frame the discussion between the parties.

One of the reasons I’m suffering a fit of enthusiasm for RDF is how it appears to offer a normal form, much as Prolog assertions did for me in my previous experience, for just this kind of problem solving.

This trio of static, dynamic, and diagnostic typing is all about shifting around the work, the trust, and the gratification. Don’t overlook the gratification. There is a lot of fun to be had in diagnostic typing approaches. I doubt you can write down all the declarations about the data before it starts flowing. Why defer the fun of flow?

Collaborative model synchronization

Jim Winstead wrote a nice note on “decentralized web(site|log) update notifications and content distribution.” He plays some cards into the design game that I found fun to toy with. One of these cards is the cloud: the idea being that as blogs update, things (notifications of update, entire postings) are injected into the cloud. Blog readers tap into the cloud to gain access to that traffic. Signatures are another card he plays; this draws our attention to the spam problem, for example bad actors posting false notifications about other people’s blogs.

Here’s some commentary from the sidelines.

On the input side: better, I think, if a distinction is made between asserting that Ted:actor wrote P:posting vs. asserting that Sam:actor inserted P:posting into the cloud.

Interesting challenge: how to avoid condensation of the cloud into a small number of players. Splitting the Sam set out from the Ted set actually encourages condensation.

Inside the cloud: how much persistent state is in the cloud, and how far out of synch can it become from the origin data?

One branch in the design space makes the cloud just a notification mechanism. That’s tempting, and seems like the right choice. Understanding if it’s a workable choice demands that we understand what the recipients of these notifications are doing with the data.

Each client of the cloud’s output, Carol say, maintains a model of N data sources. After service interruptions the clients have to bring their models back into synch. They might just go poll the original sources, or they might patiently wait for the cloud to send them additional notifications. They can also work with other players who maintain models of their own, e.g. their peer Cathy or the specialized caching middleman Charlie.

The real design problem here is “collaborative model synchronization.” The service interruptions are what allow the models to get out of synch. Some of those service interruptions are apparent (i.e. you were offline); some of them aren’t (i.e. an unknown portion of the cloud was offline).

Watchdog timers are the usual means by which a client becomes suspicious that a service interruption has taken place in some unknown portion of the cloud. That implies a background of dogs barking. Who signs these barks? You want the dogs as close to the edge as possible, i.e. outside the cloud. Sam obviously barks. But what about Cathy and Charlie? If we are collaborating with them in our model synchronization they clearly need to bark as well. When they fall silent we can then go seek other model maintenance collaborators.
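
A sketch of the client’s side of the listening, with the barking reduced to its bones; the timeout value and the notion of a source name are made up for illustration:

;; Sketch: track the last bark heard from each collaborator and
;; flag the ones that have fallen suspiciously silent.
(defparameter *bark-timeout* 300
  "Made-up threshold, in seconds, after which silence is suspicious.")

(defparameter *last-bark* (make-hash-table :test #'equal))

(defun heard-bark (source)
  (setf (gethash source *last-bark*) (get-universal-time)))

(defun silent-collaborators ()
  "Sources whose dogs have stopped barking; time to go find other
model maintenance collaborators."
  (loop for source being the hash-keys of *last-bark*
        using (hash-value last-heard)
        when (> (- (get-universal-time) last-heard) *bark-timeout*)
          collect source))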

Two other issues come to mind. How critical is it to merge multiple atomic updates into single messages? How much privacy can we offer the cloud’s listeners? If I subscribe into the cloud for updates I’m revealing too much information about my reading habits. To whom? Cathy and Charlie at least, and of course to the cloud’s P2P presence scheme. Both these issues will need work.

In one sense this isn’t a new problem. The real-time control industry has lots of design patterns for how to distribute and keep models in synch across the many machines that are doing the control. Expert systems are full of analogous design patterns for inferring models via backward and forward chaining. The scale, privacy, spam, and reputation issues add spice to this version of those problems.

I often use the term “pedigree” when discussing problems of this kind, because in many cases the data at hand traveled. It passed thru various hands. It underwent some number of transformations. It took some time to get here. I expect we will see an explosion of scraping over the next few years. Well, maybe we shouldn’t call it scraping; maybe we should call it automated data representation transforming. Signatures help to remove some of the reasons you need a pedigree, but not all of them. If your data about Ted’s site came via Sam, thru the cloud, via Cathy’s cached model, you are happy that Ted signed the posting. But to know how it came to be four hours late you want at least Sam and Cathy to extend the pedigree.

Fun problem.

Hm, interesting: I wrote about this almost exactly a year ago.

Versions of the Data Cloud

Dump Translation Unit

I was inspired by:

   Reflectance from C/C++ code (pdf)
   Duncan Temple Lang
   Statistics and Data Mining Research,
   Bell Labs, Lucent Technologies January 15, 2003

Fun things based on leveraging the GCC compiler’s willingness to dump what it’s compiling after it has done the parsing and such (the -fdump-translation-unit flag). So I wasted a few hours writing this hack.

Common Lisp is fun. For example, this code takes an XPath mimic like “//function-decl/name/strn” and converts it into the SEXP ((:class tu-classes::tu-function-decl) (:attribute tu-classes::tu-name) (:attribute tu-classes::tu-strn)).


(defun parse-path (path-string)
  ;; Rewrite the path text into SEXP text: each "//" opens a
  ;; (:class ...) term, each "/" opens an (:attribute ...) term,
  ;; then a trailing ")" is added, the stray leading ")" trimmed,
  ;; and the whole thing wrapped in one outer pair of parens.
  (let ((path-from-string
          (with-perl-style (path-string)
            (sg "//" ")(:class tu-")
            (sg "/" ") (:attribute tu-")
            (s "$" ")")
            (s "^)" "")
            (s "(.*)" "(1)"))))
    ;; Read the result back as Lisp, interning symbols in TU-CLASSES.
    (let ((*package* (find-package "TU-CLASSES")))
      (read-from-string path-from-string))))

Then later I can write things like (do-path (x "//function-decl/name/strn") (print x)) to print the names of the functions declared in whatever GCC was asked to compile.

Some things about Lisp have gotten better since I was last playing with it. For example, with-perl-style is a little macro I wrote that depends on CL-PPCRE. So now we have a solid regular expression package that’s Lisp native. Installing extensions is easier too. Installing CL-PPCRE? Type: (asdf-install:install 'cl-ppcre).

Darwin Port Packages

Of interest to Mac-owning geeks: Open Darwin Ports has a set of Mac OS X install packages for most, but not all, of their ports. There are just short of fifteen hundred of these. Lots of fun esoteric stuff: erlang, maxima, openmcl. All the misc. libraries for python, java, etc., etc.

Until yesterday I was mounting their WebDAV server via the Finder when I wanted stuff; but that is often really slow.

Here’s an example of doing the same kind of thing from the command line.

   mkdir mpoint
   mount_webdav packages.opendarwin.org/ mpoint
   cd mpoint
   find . -mindepth 1 -maxdepth 2 -name '*.mpkg' > package_list.txt
   umount mpoint
   rmdir mpoint

That gives you a list of all the available packages.

Here’s an example that installs aquaterm. That’s found in the ports that are grouped under the name aqua, i.e. the things that require Mac OS X’s Aqua.

   mkdir mpoint
   mount_webdav packages.opendarwin.org/ mpoint
   /Applications/Utilities/Installer.app/Contents/MacOS/Installer mpoint/aqua/aquaterm-0.3.2.mpkg
   umount mpoint
   rmdir mpoint

That long, overly clever way of firing up the Installer has the benefit that, if this were a script, the umount would happen only after the Installer exits.

Sticky, it’s not just data.

I underestimated how sticky Movable Type is.

Vendors love things that make their product sticky. If developers really appreciated this, software products would be even more sticky. Instead developers hate sticky; they call it things like “backward compatibility” or “legacy.” Maintaining the sticky bits is a pain. Platform developers have the worst of it, because the software that stands on their platforms was written by very, very clever dudes who find and depend on every curiosity of your API. The software those clever guys write is extremely brittle. The platform’s vendor has to work very, very hard to maintain every bizarre detail.

When I switched to WordPress my web server logs suddenly blossomed in a torrent of broken links. I’d arranged to reroute the obvious links before the switch over. But it turns out that my site’s users are as devious about finding interfaces into my blog as platform developers. It looks a bit like every URL that you could possibly generate for reaching into the blog was used by somebody. This was particularly hairy for the various subscription feeds. I notice that a lot of subscription readers aren’t particularly interested in paying attention when my server notifies them that a resource has moved. So now I’m serving up the subscription feeds from the old locations. I wonder how many subscribers I lost during the service interruption?

The backward compatibility breakage that I didn’t see coming was with Google. All my page names changed to something new, and so Google’s model of what’s on my pages evaporated. All my Google ads suddenly became extremely lame. Imagine how sticky things would be if you depended on the revenue from such ads.

The third interface where backward compatibility is turning out to be very rough is the blog author’s user interface. I don’t mind switching to a new user interface. Some things are better, some are worse. But what about the other folks? The folks whose blogs I host. I think they are going to hate it. In general they get to suffer the cost of changing, but for them the benefit of the switch over is pretty obscure. In particular, the photo upload in WordPress is much more tedious. Imagine if those people were paying me for their blog hosting!

I used to think that the #1 thing to worry about in buying software was whether I would be able to rescue my data, retire the software, and adopt something else. Apparently, in the modern world of software embedded in the marvelously messy open platform that is the Internet, it’s much more complex. We are all platform vendors now.

Movable Type -> WordPress

With luck I have now switched my blog from Movable Type to WordPress. I had taken a run at this some time ago and then decided that if I waited I could freeload on the work of others who would be making the transition easier, but a few months passed and it’s just as hard.

It’s not hard, it’s just tedious. Particularly getting my old links so they still mostly work, reproducing my old formatting, plus the usual minor upgrading, and getting all the MySQL support working in the staging area and on the live site. Just work. I really like phpMyAdmin for managing MySQL!

Now I can start discovering all the things that are broken. If you notice anything please leave a comment.

Curiously I discover I have about twenty posts I never actually published but left in draft form.

Tasty Redux

New improved version of the Tasty? bookmarklet!

Instead of bouncing off my server, this one bounces off the del.icio.us server using a newly revealed (at least to me) mechanism. This is better: you’ll only be revealing your curiosity to del.icio.us rather than to both me and del.icio.us.

Drag this to your bookmark bar; discard your old version if this doesn’t overwrite it.


        Tasty?

Upgrade today!

Earlier posting here.

Cool! Version in pure JavaScript.

Oh piffle, the Tasty bookmarklet above is messed up by WordPress. You need to strip two characters off the front and the back after you install it.