Category Archives: programming

Drumbeat

One of my favorite managerial tricks is to introduce the idea of ‘coordination cost,’ as in “How are we going to manage the coordination costs here?”  or “What would you estimate the coordination cost on this is going to be?”  or “Looks like a coordination problem.”

I find this helps people step back from the problem at hand and reframe things, looking at it more as a dance.

When I’m feeling particularly silly I like to be sure everybody gets to listen to this amazing recording: Postal Workers Canceling Stamps at the University of Ghana Post Office.  And then we can talk about the possibility that we just might achieve that imaginary happy state.

There is a huge amount of friction in organizations around coordination costs.  And most schemes for organizing things are really, at their heart, just a bundle of choices about how to lower that friction.  In a sense that’s all that Paul Graham’s nice recent essay “maker’s schedule, manager’s schedule” is about.  Though in this case he’s highlighting a sort of meta-friction that arises when the standards that achieve low friction for one group run up against the standards that achieve low friction for another.  I was pleased that he is much less polarizing in this essay than he often is.

Reading his essay I was reminded of a posting I made a long time ago about what makes for a great environment to assure high programmer productivity.  The environment outlined there is much like the one Paul appears to be advocating.  But I have my doubts.  Bill Tozer’s comment raises some of those concerns.  He’s more confident of what’s right than I am.  I tend to think what works varies a lot.  Which if true only makes for more friction as what works for this part of the process or people rubs against what works for something else.

I recently had a friend recommend what I should read about Scrum, and he pointed me to an excellent long essay (pdf).  Given my allergies to single-minded frameworks for solving all the world’s problems I was emotionally prepared to dislike what I read.  But I loved it.  I mostly found myself recalling when in my life I puzzled out this or that rule of thumb that I see these folks have adopted.  So instead of rolling my eyes and saying “Oh please?”  I found myself saying “Oh yeah! Ain’t that the truth.”

So for some large class of software development activities I think you could do much worse than to adopt those methods.  But these methods are quite disconnected from the list of what makes for a great environment for programming, and there is a reason for that.

The Scrum methods reside in a middle ground between Graham’s manager and maker time.  In the middle, between the social network managers live in and the work of individual craftsmen, is a place where focused teams reside.  So these rules are about managing the coordination costs that arise there.  Between team members, between product and program managers, between designers and engineers.  In that world you need to strike a balance between coordination costs – interrupts, meetings, et al. – and pure versions of maker or manager time.

Hanging Darwin Ports

To install unix software on my Mac my preferred tool is Darwin’s port.  Which mostly works.  But there is a problem.  For quite a while now I’ve been unable to upgrade my collection of installed software, a lot of stuff.  I think I finally figured out the problem.  Or at least I appear to have fixed it.

First off it was hanging at the moment of activating a small simple package.  I have no idea why.  But I finally gave up trying to figure it out and forced it.  That failed too, but the next time it worked.  So no idea what that was about.

More interestingly there is a serious performance bug which happens when doing an upgrade, say by doing ‘sudo port upgrade installed‘.  As a subplot of doing a mess of upgrades it may decide to uninstall something.  Uninstalling can take approximately forever.  It looks like it’s doing some complex calculation in a shell script (maybe finding all the dependencies or something).  The upgrade finally finished when I let it run for a few hours.  There are a few emails on the relevant mailing list that suggest the shell needs a huge amount of memory to do this calculation.  So I shut down everything else; no idea if that actually helped or not.

This kind of thing always makes me feel stupid.  Like I should be willing to go depth first until the problem is diagnosed and resolved.  But really, I have enough trouble staying focused.

Cascades of Surprise

We build monitoring frameworks like the one I outlined in “Listening to the System” for at least four reasons.  There may be legal requirements that we keep records for later auditing and dispute resolution.  We may want to monitor the system so we can remain in control.  We may want to collect data in service of tuning the system, say to reduce cost or improve latency.  And then there is debugging.  Audit, control, tuning, and debugging are, of course, not disjoint categories.

Good monitoring will draw our attention to surprising behaviors.  Surprising behaviors trigger debugging projects.  The universe of tools for gleaning surprising behavior from systems is very large.  Years ago, when I worked at BBN, the acoustics guys were working on a system that listened to the machine-room noise on a ship hoping to sense that something anomalous was happening.

I attended a talk “Using Influence to Understand Complex Systems” this morning by Adam Oliner (the same talk performed by his coauthor Alex Aiken is on YouTube) where I was again reminded of how you can often do surprisingly effective things with surprisingly simple schemes.

Adam and Alex are tackling an increasingly common problem.  You have a huge system with numerous modules.  It is acting in surprising ways.  You’ve got vast piles of logging data from some of those modules.  Now what do you do?

Their scheme works as follows.  For each of the data streams convert the stream into a metric that roughly measures how surprising the behavior was at each interval in time.  Then do time-series correlation between the modules.  That lets you draw a graph: module A influences B (i.e. surprising behavior in A tends to precede surprising behavior in B).  You can also have arcs that say A and B tend to behave surprisingly at the same time.  These arcs are the influence mentioned in their title.
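
To make that concrete, here’s a toy sketch in Python of how those arcs might be computed from per-module surprise series.  To be clear, the lag window, the threshold, and the normalization are all my inventions for illustration, not their actual algorithm:

import numpy as np

def lagged_corr(x, y, lag):
    # correlation of x[t] against y[t + lag], for pre-normalized series
    if lag > 0:
        return float(np.mean(x[:-lag] * y[lag:]))
    if lag < 0:
        return float(np.mean(x[-lag:] * y[:lag]))
    return float(np.mean(x * y))

def influence_edges(surprise, max_lag=5, threshold=0.5):
    # surprise: dict of module name -> numpy array of per-interval surprise
    # scores; assumes all series cover the same intervals
    norm = {m: (s - s.mean()) / (s.std() + 1e-9) for m, s in surprise.items()}
    edges = []
    mods = sorted(norm)
    for i, a in enumerate(mods):
        for b in mods[i + 1:]:
            corrs = {lag: lagged_corr(norm[a], norm[b], lag)
                     for lag in range(-max_lag, max_lag + 1)}
            best = max(corrs, key=lambda lag: abs(corrs[lag]))
            if abs(corrs[best]) < threshold:
                continue
            if best == 0:
                edges.append((a, b, 'same time'))    # A and B act up together
            elif best > 0:
                edges.append((a, b, 'influences'))   # surprise in A tends to precede B
            else:
                edges.append((b, a, 'influences'))   # surprise in B tends to precede A
    return edges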

If you add a pseudo-module to include the anomalous behavior you’re investigating, then the graph can give you some hints for where to investigate further.

At first blush you’d think that you need domain expertise to convert each log into a metric of how surprising the log appears at that point in time.  But statistics is fun.  So they adopted a very naive scheme for converting logs into time series of surprise.

They discard everything in the log except the intervals between the messages.  Then they keep a long-term and a short-term histogram.  The surprise is a measure of how different these appear.  The only domain knowledge is setting up what short- and long-term mean.
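
In that spirit, here’s my own toy rendition (the log-scale buckets, window sizes, and L1 distance are guesses on my part, not their choices):

import math
from collections import Counter, deque

def bucket(gap):
    # log-scale buckets, so millisecond gaps and minute gaps each land somewhere sensible
    return int(math.log2(gap)) if gap > 0 else -99

class SurpriseMeter:
    def __init__(self, short=50, long=5000):
        self.short = deque(maxlen=short)   # recent inter-message gaps
        self.long = deque(maxlen=long)     # the long-term baseline

    def _hist(self, gaps):
        counts = Counter(bucket(g) for g in gaps)
        total = sum(counts.values())
        return {b: c / total for b, c in counts.items()}

    def observe(self, gap):
        self.short.append(gap)
        self.long.append(gap)
        if len(self.short) < self.short.maxlen:
            return 0.0                     # still warming up
        s, l = self._hist(self.short), self._hist(self.long)
        # L1 distance between the two histograms: 0 means identical, 2 means disjoint
        return sum(abs(s.get(b, 0.0) - l.get(b, 0.0)) for b in set(s) | set(l))

Feed it the gap between each pair of consecutive log timestamps and out drops a time series of surprise, which is just what the correlation step above wants.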

The talk includes a delightful story about applying this to a complex robot’s naughty behaviors, drawing attention first to the portion of the system at fault and further revealing the existence of a hidden component where the problem actually was hiding out.  Good fun!

I gather that they don’t currently have a code base you can download and apply in-house, but the system seems simple enough that cloning it looks straightforward.

They would love to have more data to work on.  So if you have a vast pile of logs for a system with lots and lots of modules, and you’re willing to reveal the inter-message timestamps, module names, and some information about when mysterious things were happening, I suspect they would be enthusiastic about sending you back some pretty influence graphs to help illuminate your mysterious behaviors.

It would be fun to apply this to some social interaction data (email/im/commit-logs).  I suspect the histograms would need to be tinkered with a bit to better match the distributions seen in such natural systems.  Just trying various signals as to what denotes a surprising behavior on the part of the participants in the social network would be fun.  But it would be cool to reveal that when Alice acts in a surprising way shortly thereafter Bob does; and a bit later the entire group descends into a flame war.

Wave – Part 3

Google has signaled that they would like to see wave widely adopted.  In service of that goal they have a dot-org site which reveals a rough draft for a design that enables wave servers to interoperate so their users can collaborate.  But the whole story is much more complex.  There is a lot to do before their signaled desire can become a widespread reality.

Let’s start with that picture.  Google has revealed some of the plumbing that might form a standard on the boundary between Wave Service providers and any Wave Federations that may emerge.  Lower in the drawing we have some hints about how the traffic between members of these federations might move.  The term XMPP is mentioned.  But that is really not enough.  I’m not bothered by that, it’s early days and rough specs are to be expected.

Let’s move up in the drawing, into the Wave Service Providers box.  It would have been a more credible industrial standard move if Google had one or two other players willing to run up on stage and signal their intent to become Wave Service Providers.  Alternatively they might have released a complete reference implementation for a Wave Service Provider and, of course, they should place it under an open source license.  The word providers is plural, but at the moment we can only be confident that Google will deploy.  Until some other names signal that they are, at minimum, seriously considering wave I think it is fair to say: Google’s open platform signal isn’t really credible.  It’s not a party if your peers don’t come.  It’s a certain kind of federation if all your partners are tiny and you are huge.  But, again, it’s early days and it’s a big world out there so I’m sure Google can find somebody to come on board.

All the cut points in that layer cake are full of make-or-break options for Google.  Take a historical example: up at the top, when the Macintosh shipped in 1984 Apple set a stellar example of getting that bit right.  They provided beautiful example applications and they provided clear and concise user interface guidelines.  Right now all we have is an example application, one which is almost a research proof of concept.  It certainly isn’t as elegant a user experience as the other Google web apps.

Much as my layer cake implies we will see multiple wave service providers, it implies we will see multiple wave-aware applications.  How many?  I think, and hope, it’s many.  But where is the signal from Google about this?  One wonders where the Google Search, Maps, Mail, Calendar, Docs, Voice etc. teams stand on all this?  Now, I think it would be insane for Google to push all and sundry throughout the company into a forced march to adopt wave.  But it’s very odd that you could make the argument that there will be only one wave-aware application.  We need a much clearer signal about this.

Say you wanted to build an application for collaborative bill payment.  Questions start popping up fast.  Do you design it as a plugin to some master wave application’s UI?  There was a period in the 80s and early 90s when a lot of the desktop OS vendors tried to create unified application frameworks; these didn’t work out.  Often that was due to market power dynamics between the app vendors and the OS vendors, and to a lesser degree due to execution issues; but it looks to me like the same questions arise here.

Say I’m a vendor of web forum software; my customers install it on their systems.  Say I’m contemplating building a next generation version that’s wave-aware: can it be installed on any of N wave service providers?  Is that sentence even meaningful?  This looks like a rerun.  We spent a lot of time in the 80s building software so it could run across multiple platforms.  These wave service providers look to me very similar to cloud computing vendors, very similar to platform vendors, and very similar to desktop OS vendors with their minimally interoperable user interface conventions.

Google has revealed a bit of the API provided by their sandbox Wave Service Provider: how to build automated participants (aka robots), and how to build smallish widgets that provide little visualizations and games.  There is a bit of a hint that it will be possible to build entirely new kinds of wave documents.  That is one of the signs that it would be possible to build, say, a wave-aware accounting application, a wave-aware college admissions system, a wave-aware travel agency.

It’s early days and none of the above should be taken as critical.  It is my intent to see if I can block out what I’d like to see happen.  Or maybe to just start to block out where to ask questions about what’s to be desired.

Wave – Part 2

This posting isn’t really about Wave, it’s about a lovely detail in the protocol design, in particular in the crypto spec.

Imagine that Germany, Britain and France are running wave servers for their citizens, and of course many wave documents have participants from all three countries.  These three servers are all federated and a torrent of traffic is flowing between the servers as citizens of the various countries collaborate.

Federation is always a bit like Goldilocks (not too hot, not too cold).  These three trust each other enough to exchange messages but they don’t really trust each other not to mess with those messages.  So obviously it would be nice if the crypto design assured that all the messages had a cryptographic signature on them.  Then when the German server got a message from a French user by way of the Brits it could check to be sure the Brits didn’t mess with it.  Nice idea, sure, but these messages typically constitute a single character, so signing every message is going to be awfully expensive.

The design has a lovely trick for solving this problem.  The French server bundles up batches of messages, signs the batch, and sends it off to the Brits.  This batch will include the edits of numerous citizens on numerous wavelets.  These batches will also have conflict resolutions for edits on wavelets the French server is hosting.  The Brits chew on that bundle and some of it, but not all of it, gets sent on to the Germans.  So the Brits clip out all the parts of the bundle that the Germans don’t need (and in fact shouldn’t see) and pass it on.  The trick?  Well, usually if you delete a portion of a signed message the signature breaks, but by designing the bundles just right the design avoids that.

They do this with hash trees.  The bundles are a tree; the messages are the leaves.  Each node in the tree has a hash of its immediate children.  At the root we sign just the topmost hash.  You can clip off any branch in this tree as long as you leave its hash behind.  Cute.
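
Here’s a toy rendition in Python showing why the clipped bundle still verifies.  The leaf/node framing bytes and the shape of the bundle are my own, not the spec’s encoding:

import hashlib

def tree_hash(node):
    kind, payload = node
    if kind == 'leaf':
        return hashlib.sha256(b'leaf:' + payload).digest()
    if kind == 'hash':
        return payload                   # a clipped branch: only its hash remains
    return hashlib.sha256(b'node:' + b''.join(tree_hash(c) for c in payload)).digest()

bundle = [('leaf', b'edit one'), ('leaf', b'private edit two'), ('leaf', b'edit three')]
root = tree_hash(('node', bundle))       # the French sign just this root hash

# The Brits clip out the part the Germans shouldn't see, leaving its hash behind:
clipped = [bundle[0], ('hash', tree_hash(bundle[1])), bundle[2]]
assert tree_hash(('node', clipped)) == root   # the signature over root still verifies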

I’ve been having fun thinking of other applications for this trick.  It lets you sign a document and then selectively reveal portions of that document.  For example you could use it to play battleship; signing the overall board at the beginning and then revealing each square one at a time as the game proceeds.  You could use it to sign a secret document which is later declassified, but with portions of the document censored out.

Wave – Part 1?

Wave is neat and I currently think it will be very widely adopted.  This note is a quick summary of what it appears to be.  This is very impressionistic.  The specifications are amazingly rough!  I’m not sure I’d call this a platform but I’m sure other people will.  It certainly creates a large option space for building things.  Wave certainly meets one of the requirements of a great platform; it opens up so many options for things to do that if you ask “What is it for?” the right answer is: “I don’t know.”  Or as a friend of mine once observed: a good platform is about nothing.  The demos at Google IO obscure that.  That always happens.  For example when they first demo’d the Macintosh they had built one or two spectacular applications, MacPaint for example.  People would look at those demos and think: “Oh, so this is a machine for doing drawings.”

Wave provides the tools to do realtime distributed coordination of a complex activity.  For example that activity might be a game of checkers, developing a product plan, a conversation, or the distribution of a todo list.  So Wave provides tools to solve the coordination problems that arise when you have a data structure distributed around and multiple parties all modifying it.  Wave adopts the same technique we use for source control, optimistic concurrency.  Everybody edits their local copy.  These edits may turn out to conflict with each other.  The resulting conflicts are resolved by some mechanism, which in the Wave terminology is given the math-like name operational transforms.  In source control systems I’ve always called that conflict resolution.

A Wave document is said to consist of a set of wavelets which in turn contain one or more XML documents.  For example a Wave document representing a game might have wavelets for all the players, spectators, officials, the score board, game state, the moves, advertising, discussion threads, individual comments, etc.  Nothing in the Wave specification blocks out how all those wavelets manage to relate to each other.  Different activities will, of course, have different kinds of constituent parts.  Nothing I’ve read yet specifies even the building blocks for things like users, bits of HTML text, etc.

But the spec does block out what the primitives are for editing the XML documents that constitute the atomic elements of the Wave.  Those operations are a small set of editing operations: move pointer, insert text, insert XML tag, split XML element.  It reminds you of using a text editor.

These are the operations which might give rise to conflict.  If Alice and Bob are working on the plan for a party and both change the budget for snacks, those edits might both be represented by a series of operations (move to char 120, delete 5 characters, insert “25.00”); with Alice entering “25.00” and Bob entering “45.00”.  The protocol has a scheme for resolving this conflict.  It does not average the two!  It just picks one, deterministically, and moves on.
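
To make the deterministic pick concrete, here’s a crude sketch.  Wave’s actual operational transforms work on the cursor-style operations above; this flattened span representation is mine, just for illustration:

def overlaps(e1, e2):
    return (e1['pos'] < e2['pos'] + e2['old_len'] and
            e2['pos'] < e1['pos'] + e1['old_len'])

def resolve(doc, edits):
    # deterministic order: sort by participant id, the first edit wins a conflict
    kept = []
    for e in sorted(edits, key=lambda e: e['who']):
        if any(overlaps(e, k) for k in kept):
            continue                     # the losing edit is dropped, not averaged
        kept.append(e)
    for e in sorted(kept, key=lambda e: -e['pos']):  # apply right to left so positions stay valid
        doc = doc[:e['pos']] + e['new'] + doc[e['pos'] + e['old_len']:]
    return doc

doc = 'snacks budget: 15.00'
alice = {'who': 'alice', 'pos': 15, 'old_len': 5, 'new': '25.00'}
bob   = {'who': 'bob',   'pos': 15, 'old_len': 5, 'new': '45.00'}
print(resolve(doc, [alice, bob]))        # snacks budget: 25.00 -- one edit wins, deterministically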

That’s about it.  But there are some entertaining bits piled on top that are fun, and necessary.  I’ll mention three of these: historical background, how this all gets federated, how access rights are managed.

Optimistic concurrency goes back at least into the 1970s; at least that’s the first time I saw it.  I think the first time I saw it used for a realtime application with human users was a drawing system out of Parc in the 1990s – and one of Google’s whitepapers on Wave mentions that.  These days there are two very nice applications that I’ve used to coordinate activities: Subethaedit and Etherpad.  I highly recommend Etherpad to anybody who’s working on an agenda or meeting notes jointly with other people – it’s fun.

While it is possible to imagine implementing Wave entirely as a peer-to-peer system with no central coordination – Subethaedit actually does that – Wave implementors are all going to have a server that labors on behalf of the users participating in the activity the Wave represents: storing the Wave document, orchestrating the ongoing edits, and naively resolving conflicts as they arise.  The plan is to allow a user’s wave server to collaborate with other such servers.  That works by having one server act as master for each wavelet.  It’s worth noting that every participant in a Wave document is not necessarily a participant in every wavelet of that document.  In our example game, two spectators at a game can have a private chat within the game’s Wave document.  To be responsive each server caches copies of the wavelets his users are participating in, and for reasons of thrift and privacy these are limited to just those.  The authoritative server is responsible for retaining the master copy of the wavelet and for resolving conflicts.  So every edit flows to this master and then back out to the other participating servers.  There is a bit of crypto complexity layered on top of that to assure that bad actors can’t masquerade as another participant.
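
In cartoon form the flow might look like this; to be clear, this is only my sketch of the shape of the thing, not the actual protocol:

class Server:
    def __init__(self, name):
        self.name = name
        self.cache = {}                  # wavelet id -> local copy, only for wavelets with local participants

class MasterWavelet:
    # the authoritative copy: every edit flows here, then fans back out
    def __init__(self, wid, participant_servers):
        self.wid = wid
        self.servers = participant_servers
        self.ops = []

    def submit(self, op):
        self.ops.append(op)              # conflict resolution would happen here in the real thing
        for s in self.servers:           # fan the accepted op back out to the caches
            s.cache[self.wid] = list(self.ops)

french, british = Server('wave.example.fr'), Server('wave.example.uk')
chat = MasterWavelet('game/spectator-chat', [french, british])
chat.submit(('insert', 'nice goal!'))    # the edit reaches the master, both caches now agree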

It is very unclear at this point how access rights to waves are managed.  Obviously wavelets will have participating users.  The participation will be asserted along with some rights; for example the right to view but not the right to modify.  In addition there will be groups of users, and these too will be asserted as participating in a wavelet.  If that’s the case then determining if a user has the right to do something will involve searching the user and group assertions on the wavelet.  Remember from above that a Wave Document consists of just a set of Wavelets.  Obviously for any given kind of Wave Document there will be more structure than just a set.  For example our spectators’ private conversation would consist of the conversation and then a thread of comments.  It is totally unclear how access rights are propagated or computed across those structures.

Everything is amazingly raw at this point.  Which signals virgin territory.  It’s unclear how good a landlord Google will be, but no doubt a lot of people are going to go forth and attempt to stake some claims on this landscape.

Webmachine

Recently I’ve been taking a recreational climb up the Erlang learning curve.  I’ve been up this hill before; it’s a lovely walk.  This afternoon I took a restful detour to see one of the sights.  Gosh, it’s so pretty.  Webmachine is an elegant implementation of an HTTP server (code and wiki, blog, talk and video) written by Andy Gross and Justin Sheehy from Basho Technologies along with numerous contributors.  Bryan Fink added this lovely view, a flow chart for visualizing an HTTP request:

The slightly darker line in it traces out a single very simple request; it starts on the lower left, rises up and flows down in a staircase until right at the end it turns back and into a box which you might notice is outlined in red.  That is the end state, the 200 OK response.

Back when I was actually sculpting web servers out of blocks of raw bits my whiteboard had a baked-on drawing like that, but mine wasn’t as nicely laid out.  There are other ways to get it right, but most serious HTTP server implementors have, or at least had, some version of that haunting their dreams.  These folks have been doing their homework.

What makes Bryan’s drawing totally amazing is that it traces out an actual request, one my instance of Webmachine did a few minutes ago.  It’s drawn onto a canvas element.  Tracing is cool, but wait, there’s more!

You’ve no doubt heard about how Erlang’s a functional language, which to the naive listener means no side effects.  But here you get to see what that can enable.

Unsurprisingly, as that request is handled the HTTP server has a data structure which represents the request, and as the trace proceeds that state matures until finally a response can be sent.  For example at some point it has to pick what character set to respond in, while at another point it needs to pick if and how to compress the result.  So all HTTP servers I’m familiar with have a request data structure around which this process is organized.

Because Erlang is a functional language each time we augment the request state we actually create a fresh data structure (bla, bla, bla, sharing substructure, whatever).  Which makes it easy for the trace to record how the request state looked at each step along the way.
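
Here’s the idea as a toy in Python rather than Erlang (the field names and steps are mine, not Webmachine’s): every step returns a fresh state, so keeping the whole history for the trace costs nothing extra:

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ReqState:
    path: str
    charset: str | None = None
    encoding: str | None = None
    code: int | None = None

def choose_charset(state):
    return replace(state, charset='utf-8')     # a *new* state; the old one is untouched

def choose_encoding(state):
    return replace(state, encoding='identity')

def respond(state):
    return replace(state, code=200)

def handle(state):
    trace = [state]
    for step in (choose_charset, choose_encoding, respond):
        state = step(state)
        trace.append(state)                    # every intermediate state survives intact
    return state, trace

final, trace = handle(ReqState(path='/hello'))
for s in trace:
    print(s)                                   # the whole history, node by node, like Bryan's tracer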

Bryan’s trace is alive.  If you click on nodes in that graph it will show you what request state each function got and produced.  Which I must say takes all the challenge out of writing this stuff.

The win here is that you can quickly make sure your resources are served up in ways that actually use all the sweet features in HTTP.

QA: Impulse Control

People lose control and act impulsively all the time.  It is important to forgive ’em.  If you never act impulsively then you seem humorless, uptight, officious, bureaucratic.  When professionals act impulsively we wonder: should we let this guy steer the ship?  One scheme to temper impulses is to smooth things a bit using a group.  When your managerial team acts out impulsively it’s a signal to go short.

Here is an example that is kind of meta.

“I have a friend who works in a small company who have just put the three testers on notice of redundancy. The developers have been told that they will have to do all the testing. The testers have been told they have five days to write a letter justifying why they should be kept.

I need as many reasons possible why getting rid of the testers is a bad idea in a scrum environment please.”

The managers in question probably should have run that idea thru a bit of quality assurance.

Boy is that bad personnel management, totally unprofessional.  It’s going to be impossible to get much enthusiasm for the job going forward.  I’d file that under “examples of trying to control behavior by raising the stakes.”

Whenever you do layoffs the remaining staff is soured.  Dissipating the emotional cloud that falls over the team is critical.  It is really hard.  The people laid off are the most susceptible.  That is one reason firms try to get them out of the building as quickly as possible.  That bum’s rush is another example of what the public health guys call social distancing, and the tactic outlined above just about assures maximal infection.

It also signals that management lacks much clue about the role of QA, or that things are much worse than they are saying.

I can’t help thinking that these poor testers have been invited to write their own obituary.  It could be like one of those scenes in a comedy of bad behavior where the lawyer reads a will that goes on for pages as the deceased enumerates each and every flaw of his descendants.  But yeah, here’s an impulsive suggestion: once they are gone, who will fill the role of canary for their code miners?

Git

I’m really blown away by how nice a bit-o-work git is.

What Eric von Hippel taught me works both ways.  Real innovation requires close contact between an interesting problem and talent.  When you encounter innovation it signals an interesting problem and engaged talent.  Ignore the story told.  Look for that problem and why the talent had to fix it.  Ask, without the snark: “so what’s his problem?”

It’s a guess, but I think Linus’ problem was twofold.  First was a deep passionate desperate need to encourage other developers to take risks with the code.  I think his guilty foxy phrase for this is: “They do the work so I can take the credit.”  He wants to encourage forking!  That’s obvious, once I recognized it.  But it’s an insight that was denied me because forking has such a bad reputation.  I knew a guy once.  He forked; later he had a nervous breakdown trying to rejoin the main branch.  An exaggerated story, sure, but I have suffered dozens of cases wherein good labor branched off and nothing came back.  So given those experiences the insight that forking is something an Open Source project would want to encourage, vs. temper, has left me gobsmacked.

But it’s absolutely true.  To suppress forking is scarcity thinking, fit for a closed system where you need to husband resources; in an open system you need to court it.  I know that, I just didn’t get it!  Almost the whole point of open source is to cast forth the code so a million eyes and hands can improve it.  And every one of those improvements will be a fork.  It would be insane to try and keep that from happening.  If you don’t enable billions of tiny edits/forks then you’re killing the seed corn.  Since the entire cascade starts there (and it’s scale free) failure to encourage forking undermines the flow back toward the main branch(es).

I didn’t see that, at first.  I came to it in a roundabout way.  And damn if I did not have to puzzle out the second insight in a really roundabout way.  I’m embarrassed to admit I was not trying to figure out what “his problem” is.  No, I was confused by this scenario that appears in most of the tutorials.

You’re working on some complex change and suddenly your Boss steps into the room and demands a quick bug fix.  What do you do?

# ... working on complex change ...
git stash                              # set the half-done work aside; a dirty tree can block the checkout
git checkout deployed_version
# ... make quick fix, commit it ...
git checkout branch_of_complex_change
git stash pop                          # pick the half-done work back up
# ... back to work ...

My reaction to that was “Huh, what? you don’t got any disk space?”  Just check out the main branch into a fresh directory and do the work there.  In fact I’d be surprised if you didn’t already have a copy checked out.  So it took me a while to accept that the shocking part was that switching between branches in the same working directory is a common operation.  It was only then that I asked “why would Linus want that?”  That was the “what’s his problem” moment.

This story is a lie.  Linus doesn’t have a boss like the one in that story.  Linus lives on the boundary between “they do the work” and “I take the credit.”  His boss, and this is critical, is “they.”  “They” burst into his virtual office and make demands, in the form of patches.  Each of those demands/patches is a branch.  Managing them is Linus’s problem.  At any given time you might have a hundred, thousands even, of such demands/branches.  It’s not your Boss coming thru the door that triggers switching from one branch to another; it’s email, irc, and the whims of your attention that do it.  Whenever your brain thinks “Oh, I wonder if patch Foo does Bar?” you do git checkout Foo and look into the Bar question.  A moment later, buffeted by another boss/demand/patch, you switch off to another branch.

These two are complementary.  That git encourages forking energizes the periphery of your project; that it empowers you to manage a blizzard of patches lets you deal with the consequences.  But even if you don’t have a vast army of contributors I find that rapid context switching useful.  My damn brain is full of contributors too.  I can give all these fever’d demons their own branch.  You can cast those hot ideas out of your head and into git, stew them over time.  It may be a chaotic mess, but git provides the tools to help manage all that.

While this is a totally different model of branching and forking from the one in traditional source control systems, it is absolutely better.  It is better at assuring the improvements are enabled, captured, managed, and nurtured.  Full stop.

There is a social aspect to git that deserves its own posting.  But suffice it to say that it’s actually brilliant, from the point of view of somebody more familiar with the ASF’s development models, because it enables and encourages the forming of small groups of common interest around forks.  Brilliant because it’s scale free.  Brilliant because it creates a locus for socially constructed validation tied to that common interest.  Brilliant because it distills out the flow of commits in a canonical form that enables the forks to bud off and remerge smoothly.  Brilliant because it removes a huge “ask permission” cost; i.e. in this system you don’t submit patches, you merely reveal them.  Notice that word “submit.”

I wrote an essay years ago about what could be done to improve the dynamism of open source.  I wrote that there was a virtuous cycle between the code base and the user/developers, and one thing that we seriously needed was to look at all the friction in that cycle and see if better tooling and practices couldn’t ease it.  Git delivers!