Eight Dollars a Month

The New York Times has an article about prepaid cell phone plans.  Mostly it’s a PR piece triggered by the rollout of MetroPCS in New York (and Boston).  The article is pretty worthless.  But the chart is pretty good.

It’s interesting how a decade ago discriminatory pricing was primarily done by targeting different buyer groups via branding.  T-Mobile’s prepaid was targeted to poor urban youth, for example.  That’s not me, and I had trouble even finding somebody to sell it to me.  These days the virtual mobile phone operators do that, more so than the majors.  Page Plus Cellular is a good example, and they sell mostly thru small urban storefronts.

When I bought the T-Mobile service all those years ago I was routing around their discriminatory filters.  They, of course, would prefer that I not be able to do that.  The rise of text messaging and data services is making that easier for them.  Three services (text, voice, data) make for a rich palette for market segmentation.  I love Adam’s term for industries that depend on these practices: confusopolies.

Twisted Nevermind

I have been looking at Python’s Twisted.  It’s a framework for writing event-driven systems.  I have built far too many event-driven frameworks in my day.

One of the core abstractions in Twisted is an object they call a “Deferred.”  I really wish they had called it a Promise.  Routines that need time to gather their results immediately return a Deferred object.  Later they give that object a kick to signal a result.  Here’s an imaginary bit of code.  remind_me returns a Deferred that presumably will get a kick next Tuesday.

pending_reminder = remind_me("next Tuesday")

Back in the 1960s I learned a language called SNOBOL, a precursor to awk, Perl, Icon, and the amazing SL5.  One of SNOBOL’s amusing features was that you were encouraged to write, for every statement, a clause to handle both success and failure.  It’s a good fit if you are doing a lot of pattern matching.

These days we use exceptions.  Every time you call a function you know that it can either work out, or it will throw an exception.  The methodical Twisted programmer has to worry like a SNOBOL programmer.  But unlike SNOBOL you’re forced to bundle up everything in functions.  And, since Python’s lambda expressions are quite limited, those tend to be named functions.

pending_reminder.addCallback(pay_the_bills)
pending_reminder.addErrback(reschedule_the_bill_payments)

These Deferred instances are, in effect, a pipe between the process working on getting your result and the party that requested that result.  The Deferred is effectively a contract between these two parties; call the two parties the buyer and the seller.  In the above example the buyer purchased his wake-up call, and the seller has promised to either deliver that wake-up, or to at least signal an error.
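
A toy version in plain Python may make the contract concrete.  This is only a sketch and not how Twisted implements it: ToyDeferred and its method names are made up here, and the real Deferred threads results through a chain of callbacks, which this ignores.

```python
class ToyDeferred:
    """A stand-in for a promise: the seller fires it once, the buyer listens."""

    def __init__(self):
        self._callbacks = []   # handlers to run on success
        self._errbacks = []    # handlers to run on failure
        self._outcome = None   # ("ok", value) or ("err", error) once fired

    def add_callback(self, fn):
        self._callbacks.append(fn)
        self._deliver()
        return self

    def add_errback(self, fn):
        self._errbacks.append(fn)
        self._deliver()
        return self

    def callback(self, value):
        # Seller side: signal success.
        self._fire("ok", value)

    def errback(self, error):
        # Seller side: signal failure.
        self._fire("err", error)

    def _fire(self, kind, payload):
        if self._outcome is not None:
            raise RuntimeError("already fired")
        self._outcome = (kind, payload)
        self._deliver()

    def _deliver(self):
        # Nothing to do until the seller has fired.
        if self._outcome is None:
            return
        kind, payload = self._outcome
        handlers = self._callbacks if kind == "ok" else self._errbacks
        while handlers:
            handlers.pop(0)(payload)


log = []
pending_reminder = ToyDeferred()
pending_reminder.add_callback(lambda when: log.append("pay the bills: " + when))
pending_reminder.add_errback(lambda err: log.append("reschedule: " + str(err)))
pending_reminder.callback("next Tuesday")   # the seller's kick
```

Only the success handler runs here; had the seller called errback instead, only the reschedule handler would have.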

What Twisted is doing solves only half the problem.  What if I want to cancel my wake-up call?  Twisted has no defined mechanism for the buyer to notify the seller that his order should be canceled.  Say I’m building a web page for the user.  I ask a number of parties to go gin up bits of that display.  Before they finish the user loses interest and closes the window.  I haven’t any way to tell the providers that they should stop.

Supply chains like this are typically more complex than order/wait/error, or even order/wait/error/cancel.  The more general problem is flow control.  The quality of the communication channels varies by orders of magnitude.  It is invaluable to be able to tell the downstream provider about that variability.  An extreme case is when the user’s client goes to sleep.

No doubt these problems can be layered in on top of Twisted’s abstractions.  But I found it discomforting that there is no way for the buyer to say ‘never mind’ to the seller.
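
Here is one way that layering might look, sketched in plain Python rather than on Twisted itself.  Everything in it is hypothetical: Order, never_mind, and on_cancel are names invented for this illustration.

```python
class Order:
    """A promise with a 'never mind' channel from the buyer back to the seller."""

    def __init__(self, on_cancel=None):
        self._on_cancel = on_cancel   # hook supplied by the seller
        self.state = "pending"        # pending, delivered, or cancelled
        self.result = None

    def deliver(self, result):
        # Seller side: a late delivery is dropped if the buyer walked away.
        if self.state == "pending":
            self.state = "delivered"
            self.result = result

    def never_mind(self):
        # Buyer side: tell the seller to stop, if it is still working.
        if self.state == "pending":
            self.state = "cancelled"
            if self._on_cancel is not None:
                self._on_cancel()


stopped = []
order = Order(on_cancel=lambda: stopped.append("render sidebar"))
order.never_mind()                     # the user closed the window
order.deliver("<div>sidebar</div>")    # arrives too late and is ignored
```

The seller hands over the hook when it takes the order, so the buyer never needs to know how the work gets stopped.  Flow control would need a richer channel than this single signal.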

Glimpse, a missing Unix Command

Here is a Unix command I often yearn for.  Let’s call it glimpse.  You put it into a pipeline and it provides a glimpse of what’s happening in that pipeline.  In its simplest form it would just use termcap to show what’s happening, overwriting a single line in the terminal with the lines flowing thru the pipeline.  If a window system were available it could pop up a transient window showing a bit of what’s flowing past.  Extra points for charting, progress displays, filtering, etc.

I have a very primitive version:

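#!/bin/sh
# glimpse: copy stdin to stdout while showing the stream in an xterm.
FIFO=/tmp/glimpse.$$.fifo
trap 'rm -f "$FIFO"' EXIT
mkfifo "$FIFO" || exit 1
# The xterm reads from the FIFO and lingers 30 seconds after the stream ends.
xterm -title glimpse -fg blue -e sh -c "cat $FIFO ; sleep 30" &
tee "$FIFO"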

Which presumes you have X11 at hand. This has the advantage that it will work across ssh connections.  For example: tail -f /var/log/messages | glimpse > /dev/null
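
The single-line flavor, the one that overwrites one line of the terminal, can be roughed out in Python.  Treat this as a sketch and not a finished tool: the function name, its parameters, and the 78-column width are assumptions of mine, and it leans on a bare carriage return instead of termcap.

```python
import io


def glimpse(source, sink, display, width=78):
    """Copy lines from source to sink, overwriting one display line as they pass."""
    count = 0
    for line in source:
        sink.write(line)
        sink.flush()
        count += 1
        shown = line.rstrip("\n")[:width]
        # \r returns to column 0; the padding wipes what is left of a longer line.
        display.write("\r" + shown.ljust(width))
        display.flush()
    display.write("\n")
    return count


# Demo with in-memory streams standing in for stdin, stdout, and the terminal.
source = io.StringIO("first line\nsecond\n")
sink, display = io.StringIO(), io.StringIO()
count = glimpse(source, sink, display)
```

As a pipeline filter you would call glimpse(sys.stdin, sys.stdout, sys.stderr), so the data carries on down the pipe while the glimpse lands on the terminal via stderr.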

Update: pv mentioned in the comments is sweet!  Thanks Ted!

Distracting Embarrassing UI

I’m sure you’ve been in many a meeting where the presenter’s IM client pops up a message “Dude? Had Lunch Yet — sexyGril12”.  Ok, usually it’s LoserFriend3; but still.  It’s fascinating how far we need to travel yet before we get this stuff right.  The stupid computer has no idea that its owner is standing in front of two hundred people trying to make a good impression.

This stuff is distracting.  Some time ago I was watching a screencast only to notice, as the author cycled past his email client, that he had a message from me!  I still haven’t figured out why.  Someday advertisers will puzzle out how to dynamically slip affiliation clues like that into their ads; bleck!

Too often I’m ignoring the presentation and watching these distractions.  What is that widget in their menu bar?  Golly they have a lot of apps open, oh wait why are they still using that?

But the most distracting thing by far is the feature in Firefox where it reveals little bits of your browsing history as you type.  I hate that!  It’s amazing how your entire audience will suddenly grow quiet and attentive as you start to type into the URL bar.  I need a Firefox plugin.  Like the old Unix fortune script it would populate the pull-down with on-topic aphorisms.  It would amuse the audience while advancing my point.

Someday we will learn how to create user interfaces that aren’t so blind to the context we are working in.  Interfaces that don’t strive to distract us from what we are working on.  That don’t distract our audience.  That don’t leak private, even embarrassing, information all the time.

Meanwhile, somebody must have made a checklist of things to do before you give a presentation.  Turn off IM; set browser.urlbar.maxRichResults to zero; turn off the sound and video …  Better yet there really ought to be a tool.

See also: the Firefox HistoryBlock plugin, Caffeine, RescueTime, productivity

Listening to the System

I have a rule of thumb that data in motion is more interesting than data at rest.  Both from a business architecture point of view and when designing, managing, or diagnosing a system.  Thus my interest in middlemen, who intermediate transaction flows.  Thus my interest in the ping problem, aka how to do forward chaining on the Internet.  Content isn’t king, the hubs are king.  The conversation is more important than the library.

Recently I’ve been kicking the tires on a bit of technology that goes by the name AMQP, or Advanced Message Queuing Protocol.  It is for all intents an open standard for building your enterprise message bus.  There are a couple of reasonably mature open source implementations at this point.  Active communities.  An active standards process which, and this is important, is driven by the users of the system and hasn’t yet been co-opted by the vendors.

Regularly throughout my life I’ve worked on real time control systems.  So I have a big tangled set of design patterns for how those get built.  Big sophisticated industrial control systems have three properties I find interesting.  They are very heterogeneous, they are all about data in motion, and they feature power-law distributions in the event rates.  Recently I’ve been finding it amusing to observe how much cloud computing is full of the same tangles.  There is a hell of a lot of commonality across these problems: real time control, enterprise message bussing, managing all the moving parts in your cloud computing application.

To stay sane you can say there are three design patterns that stand atop your message bus: broadcast, enqueuing work in progress, and the ever popular remote procedure call.

Work in progress Q’s are everywhere.  You see them at the bank when you Q up for a teller, at the grocery store with the check out lines, or when you stick your mail into the mailbox on the corner.  There is a nice term of art: “Fire and Forget”.  When things go according to plan you slip your mail into the mailbox, the magic happens, and your valentine gets your card.  Fire and forget is great because you can decouple the slow bits from the quick (user response time) bits.  It also enables separation of concerns (you don’t have to run a postal system).  It also is trivial to add scaling (just hire a few more clerks, or spin up a few more computers).  So one thing you can do with AMQP is set up virtual simulations of the queue at the bank.  And AMQP implementations provide dials you can adjust to decide how reliable (vs. fast) you want that to be.  For example you might set the dials to assure the messages are replicated across disk drives in multiple geographic locations.  You might set the dials so the messages never leave wire and RAM.  There is a latency/reliability trade-off here.
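
The queue at the bank is easy to mock up inside a single process.  The sketch below uses Python’s standard queue and threading modules as a stand-in; it is not AMQP and has none of the reliability dials, and run_tellers and the customer names are invented for the example.

```python
import queue
import threading


def run_tellers(customers, tellers=3):
    """Queue up customers, let a pool of tellers serve them, return who was served."""
    line = queue.Queue()
    served = []
    lock = threading.Lock()

    def teller():
        while True:
            customer = line.get()
            if customer is None:       # sentinel: the bank is closing
                line.task_done()
                return
            with lock:
                served.append(customer)
            line.task_done()

    workers = [threading.Thread(target=teller) for _ in range(tellers)]
    for w in workers:
        w.start()
    for c in customers:
        line.put(c)                    # fire and forget: put() returns at once
    for _ in workers:
        line.put(None)                 # one closing notice per teller
    line.join()                        # only this demo waits for the work
    for w in workers:
        w.join()
    return served


served = run_tellers(["ann", "bob", "cy", "dee"], tellers=2)
```

Scaling is the tellers argument.  The order in which customers finish is not guaranteed once there is more than one teller.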

The fire and forget pattern doesn’t work of course.  We all love to worry.  You buy something online.  You fire off your order and then you forget about it.  Ah, no you don’t.  You put it on the back burner.  You get a tracking number.  From time to time you poll to see how it’s going.  Sometimes the vendor sends you status reports.  Sometimes he sends you bad news.  While AMQP has lots of nice and necessary mechanism it doesn’t have tools for handling the range of semi-forget modalities: monitoring, tracking, status reporting, raising exceptions.  (As an aside, it is interesting to tease apart the attempts to address these found in SMTP.)

In any case systems built around the Queue of Tasks design pattern are everywhere.  This is the model seen in factories for everything: batch production, forms processing, continuous production lines, unix pipelines, etc. etc.  I once heard a wonderful story about a big factory at the end of a pipeline.  Pretty regularly the sun would come out and warm the pipeline.  At that point a vast slug of vile material would rapidly explode out of the pipe and into a multi-million dollar holding tank.  They wished the tank was larger.

When you build realtime control systems you often arrive after the fact.  The factory already is chugging along and your goal is to try and make it run better, faster, etc.  The first thing you do is try to get some visibility on what’s going on.  At first you thrash around looking for any info that’s available.  In software systems we look at the logs.  We write code to monitor their tails.  We tap into the logging system, which is actually just yet another message bus.

That logging and monitoring are similar but different is, I find, a source of frustration.  It is common to find systems with lots of logs but very little monitoring.  What monitoring is going on is retrospective.  Online, live monitoring is sufficiently different from logging that it drives you toward a different architecture.  It is one of the places that data in motion becomes distinct from data at rest.

One of the textbook examples of AMQP usage is the distribution of market data.  A vast amount of data flows out of the world’s financial markets.  Traders in those markets need to tap selectively into that flood so their trading systems can react.  Which is exactly what you need when doing real time control.  The architecture for this pushes the flood of data, in contrast to what is commonly seen in log analysis.  There you see a roll up of logs into an aggregated, archival, set where offline processes can then do analysis.

 

The distinction between data at rest vs. data in motion is identical to the distinction between recording and broadcasting.  I find you need both.  In real time control systems it tends to be common to find good infrastructure for the broadcast.  In software systems I seem to encounter good infrastructure for the recording side.  What the drawing demonstrates is how many more moving parts a system accretes as soon as you start to address these issues.  In the drawing our simple ping-pong between workers and task queues has now sprouted a fur of mechanism so we can get a handle on what it’s doing.  Each component of the system needs to participate in that.  Each part has to cough up a useful log; which we then have to capture, record, and broadcast.  Standardizing all that would be good; but it tends to be at minimum tedious and at worst intractable.  First off, it is a lot to ask of any component that it enumerate all possible situations it might fall into.  Exceptions, and hence logging, are all about the long tail.  Secondly a good log is likely to run at many times the frequency of the work; i.e. when the worker does one task he will generate multiple log messages.

The long-tail nature of log entries means that our online monitoring, etc. has to be very forgiving and heuristic.  One common trick for solving the problem that logging runs at higher rates than the work is to situate this part of the system at a lower-latency less-reliable point when you set the dials on your messaging hub.  All that said it’s often a problem that these things get built, and spec’d out, late in the game.

AMQP has some nice technology for implementing that messaging hub for the broadcasting side of things.  One of the core abstractions in AMQP is the exchange, a place that accepts messages and dispatches them.  Exchanges do not store messages; that is done by queues.  In a typical broadcast setup market data floods into an exchange where different consumers of that data have subscribed to get what they are interested in.

For example I’ve recently been playing around with a system for keeping a handle on a mess-o-components running at EC2.  I flood the logs from every component to a single AMQP exchange which I call the workroom.  For example to get the machine’s syslog I add a line to syslog’s configuration so it routes a copy of every logging message to a unix pipe.  On the other end of that pipe I run a python program that pumps the messages to the workroom.  These messages are labeled with what AMQP calls a routing key, for example “log.syslog.crawler.i-234513”.  At the same time I have daemons running on each machine that are mumbling at regular intervals into the workroom messages about swapping, process counts, etc.  If I want to listen in on all the messages about a single machine then I subscribe to the workroom asking for messages whose routing key matches “#.i-234513.#” or, if I want to listen in on all the syslog traffic, I can tap into the “#.syslog.#” messages.  That for example revealed that one of my machines was suffering a dictionary attack on its ssh port.  This framework makes it easy to write simple scripts that raise the alarm if there is a sudden change in the swapping, or process counts.
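
The matching rule behind those subscriptions is simple enough to spell out.  In an AMQP topic exchange the routing key is a string of dot-separated words; in a binding pattern “*” stands for exactly one word and “#” for zero or more.  The broker does this matching for you, so the function below is only a local sketch for illustration, and topic_matches is a name I made up.

```python
def topic_matches(pattern, routing_key):
    """AMQP topic-exchange matching: '*' is one word, '#' is zero or more words."""

    def match(p, k):
        if not p:
            return not k               # pattern used up: key must be too
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may swallow zero words, or one word and then try again.
            return match(rest, k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        return (head == "*" or head == k[0]) and match(rest, k[1:])

    return match(pattern.split("."), routing_key.split("."))


key = "log.syslog.crawler.i-234513"
one_machine = topic_matches("#.i-234513.#", key)   # everything about one machine
all_syslog = topic_matches("#.syslog.#", key)      # all the syslog traffic
too_short = topic_matches("log.*", key)            # '*' covers only one word
```

Both of the subscriptions described above match the example key, while the shorter “log.*” pattern does not because the key has four words.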

One thing I like to do is to attempt to assure that every component mumble a bit.  That way I can listen to the workroom to see who’s gone missing; and as new components are brought on line I can notice their arrival.  I like to use jstat, vmstat, even dtrace, to get the temperature of various system components.  It’s nice to know when that Java process descends into a garbage collection tar pit.
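
The bookkeeping for noticing arrivals and absences is small.  A sketch, assuming each mumble carries a component name and that sixty seconds of silence is worth worrying about; RollCall, the threshold, and the component names are all made up for the example.

```python
class RollCall:
    """Track when each component last mumbled; report arrivals and absences."""

    def __init__(self, quiet_after=60.0):
        self.quiet_after = quiet_after   # seconds of silence before we worry
        self.last_heard = {}

    def heard(self, component, now):
        """Record a mumble; return True if this component is a new arrival."""
        is_new = component not in self.last_heard
        self.last_heard[component] = now
        return is_new

    def missing(self, now):
        """Components that have been silent for longer than quiet_after."""
        return sorted(c for c, t in self.last_heard.items()
                      if now - t > self.quiet_after)


roll = RollCall(quiet_after=60)
first = roll.heard("crawler.i-234513", now=0)    # a new arrival
roll.heard("indexer.i-777777", now=0)
again = roll.heard("crawler.i-234513", now=50)   # still mumbling
gone = roll.missing(now=100)                     # the indexer has gone quiet
```

Time is passed in explicitly so the same code can run against live messages or a replayed log.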

The workroom message hub is a huge help in getting some modularity into the system.  It’s easier to write single purpose scripts that tap into the workroom to keep an eye on this or that aspect of the system.

Broken Windows :: Alice’s Restaurant

This morning’s paper triggers the realization that the Broken Windows Theory is the conservative backlash to Alice’s Restaurant.

“… And I, I walked over to the, to the bench there, and there is, Group W’s where they put you if you may not be moral enough to join the army after committing your special crime, and there was all kinds of mean nasty ugly looking people on the bench there. Mother rapers. Father stabbers. Father rapers! Father rapers sitting right there on the bench next to me! And they was mean and nasty and ugly and horrible crime-type guys sitting on the bench next to me. And the meanest, ugliest, nastiest one, the meanest father raper of them all, was coming over to me and he was mean ‘n’ ugly ‘n’ nasty ‘n’ horrible and all kind of things and he sat down next to me and said, “Kid, whad’ya get?” I said, “I didn’t get nothing, I had to pay $50 and pick up the garbage.” He said, “What were you arrested for, kid?” And I said, “Littering.” And they all moved away from me on the bench there, and the hairy eyeball and all kinds of mean nasty things, till I said, “And creating a nuisance.” And they all came back, shook my hand, and we had a great time on the bench, talkin about crime, mother stabbing, father raping, all kinds of groovy things that we was talking about on the bench…. “

I’m too proud of having collected this newspaper clipping:

Stimulus Watch

StimulusWatch.Org is very thought provoking.  Some folks really had their act together; Brockton, MA and Blue Island, IL are two random examples.  Meanwhile some places go unmentioned, Lowell, MA for example (our fourth largest city).  Since I presume the Senators from Maine are now nearly infinitely powerful I’m surprised how little money Maine appears to have gotten.  $/citizen by various voting areas would be fun.  Then it would be easier to see which cities got nothing.  They have a ranking for most discussed, but I think it would be fun to have a score for most stupid discussion – i.e. most comments per dollar spent.  There are plenty of errors in the data.  For example double entries or projects that create 1 job for every five thousand dollars.