Category Archives: programming

grep -o

If only one could know which skills will be worth learning! The time I invested in learning about APL, Ada, Cobol, RSX/11, Focal, Snobol, 6809 assembler, etc. etc. didn’t return much value beyond amusement, entertainment, and experience.

Learning all the Unix tools back in 1977 was fun, and it has been extremely valuable too!

Sometimes I discover that since then somebody has added a new tool, or a new switch to an existing one. Today I learned about the -o switch to grep. It prints out only the matching string, not the whole line.


536$ grep -o '[a-z][a-z]*:[a-z][a-z]*' cascades.rdf | sort | uniq
foaf:name
my:author
my:editor
ow:address
ow:author
ow:booktitle
ow:editor
ow:institution
ow:journal
ow:month
...

Useful.

Bummer, my Car is just a Software Application

John Robb asks if Cars are becoming O/Ss. Absolutely not. Operating systems are intermediaries between hardware and software applications, i.e. two-sided network effects. Cars with 700-page user manuals are becoming applications. Applications with very weak network effects. (As an aside, their collision-avoidance systems can have network effects, since the more cars/planes/etc. that adopt them the more effective they become.)

Failing to find an auto-OS they can rendezvous around is not a particularly good architecture for the auto industry. Retaining control over that software frustrates innovation on both the hardware and user-facing sides. I’m reminded of the old story of Adobe, who wiped out a lot of in-house printer software teams. Maybe somebody will do the same to the in-house auto application teams. A similar mess frustrates scale and innovation in the real-time control industry.

Of course the most amusing element of all this would be terminology. Imagine a future where you can swap out your car’s engine for an alternate just as we swap graphics cards and disk drives on computers today. Of course at that point you couldn’t sit down and drive away until you loaded up some new drivers.

Lower Latency Secondary Storage

These three items come together in my head.

Saw this today:

Samsung Electronics said Tuesday that it will launch two mobile computers in early June that will do away with hard drives altogether, replacing them with 32 gigabytes of NAND flash memory. … Unfortunately for U.S. consumers, both will be sold in Korea only.

While previously I saw this go by:

Owners of Apple Computer’s new MacBook consumer notebooks will find that upgrading or replacing the computer’s hard disk is as simple as adding more memory.

Together those reminded me of this…

…prompted South Korea’s Samsung to offer Apple a deep discount and be willing to dedicate 40% of its flash-memory manufacturing capacity to seal the deal.

Even though I recall the details of that last item being very murky, I think it’s pretty clear we can expect the first US release of a hard-drive-less portable to be from Apple. August MacWorld?

Flash-based secondary storage creates some interesting options in physicality, power, programming, database, and OS design; it will be interesting to see how that shakes out. Can anybody think of some new applications it enables?

Whatever you’re doing, it’s wrong!

This delightful summary of Extreme Programming goes a long way toward explaining why it is so attractive to a certain managerial personality type.

  • Integration is a nightmare, so integrate continuously
  • Writing tests is tedious, so write them first
  • Social interaction for developers is often difficult, so pair all the time
  • If it ain’t broke don’t fix it, so refactor mercilessly

It has Puritan overtones; e.g., whatever your staff is doing, it’s very likely selfish and they are ignoring the real work.

Framing things up in that manner is good fun until somebody gets a stick in their eye. It’s easy to wound your enterprise by taking those rules too much to heart. For me that framing of the principles of extreme programming is dangerous. It counsels that you ought not play to the strengths of your talent; you should strive to suppress them. That’s unlikely to either attract talent or build momentum.

  • integrating continuously assures you take only small steps
  • writing tests first assumes the code and platform have nothing to say about the problem
  • pairing programmers assures you leave a large swath of good talent for your competitors to hire
  • refactoring mercilessly assumes you have large code bases rather than large installed bases, poor you.

That said, all the above are true to a surprising degree. In fact these same four have analogues in open source. For example we don’t pair program, but we know that situating the work in a highly public way is effective for social, inspection, and testing reasons.

Refactoring, though, is a challenge in open source because we are more typically entangled in a larger, more vocal installed base. The often lower and more diffuse code ownership also creates a web of entanglement (a social web) that makes refactoring a more costly exercise. In extreme programming the test suite substitutes for having a live installed base; so there the immovable installed base problem is replaced by the immovable test suite. The good news for open source is that the public process can increase the number of voices advocating a serious refactoring.

It’s worth noting that encouraging refactoring runs counter to the advice to integrate continuously. Refactoring is fundamentally a choice to buy a bag of disintegration, so the pair of them is a cruel catch-22. In open source, developers can operate in private and they can fork, so we do license these kinds of disintegration moments, though we probably don’t encourage enough of them and we lack rituals for bringing them home.

I’ve got to thinking recently that there are times in the life of an open source project where the community would be wise to encourage forking in search of a way out of the box of their current architecture.

Screen

Since I got some positive reinforcement for my posting on Nagios I’ll go ahead and point to this posting on screen, which I use as well. I use screen on headless servers, a lot.

Screen is much lighter than what I used to do. I’d run an X server on the headless machines with an in-memory virtual display device. On that server I’d run an XEmacs to which I would typically connect using gnuclient. Occasionally I’d connect to the X server using VNC. The nice thing about this approach is you can run a rich visual display of the server’s status in the memory-resident X server and pop it up on demand.

HA! I see that Justin Mason has some screen tricks as well.

The Perception of Risk

Another fun item from Chandler Howell’s blog about how people manage risk. People try to get what they perceive to be the right amount of risk into their lives, but they do this on really really lousy data. So there are all kinds of breakdowns.

For example you get unfortunate scenarios where actors suit up in safety equipment; this makes them feel safer, so they take more risks, and after all is said and done the accident rates go up. Bummer!

I’ve written about how Jane Jacobs offers a model for why Toronto overtook Montreal as the largest city in Canada. After the second world war Toronto was young and naive with a large appetite for risk, while Montreal was more mature and wise. To Toronto’s benefit and Montreal’s distress the decades after the second world war were a particularly good time to take risks and a bad time to be risk averse.

I’ve also written about how limited liability is a delightful scheme to shape the risk so that corporations will take more of it. All based on a social/political calculation that risk taking is a public good that we ought to encourage.

What I hadn’t appreciated previously is how this kind of thinking is entirely scale free. Consider the fetish for testing in many of the fads about software development. The tests are like safety equipment, they encourage greater risk taking. Who knows if the total result is a safer development process?

Selenium

Yoav’s post on Nagios reminds me I’d been meaning to post about Selenium. He’s right, by the way, about Nagios.

The best open source emerges when a group of “buyers” have a desperate need and no patience or budget to wait for a vendor to show up and bumble around cluelessly trying to figure out why they are miserable and how to make money off that. OpenQA looks like just such a project.

If you develop web apps and you’re not aware of OpenQA then you’re not happy. Particularly Selenium IDE, a Firefox plugin that will record and play back automated tests.

This system is marvelously hokey, plenty of “worse is better” here! But the price is right and it works! In a perfect example of how open source is more likely to have the features you must have vs. the features that would make the salesman happy, this system has horrible, horrible doc but actually works in most browsers.

These testing systems work by exercising the web app using JavaScript. The tests can be stored in a few formats, but the original format was an HTML table. The testing harness steps thru the HTML table interpreting instructions on how to do the test found in the table rows. Starting from there you get assorted hackery to let you write and run your tests in whatever way you find more comfortable.
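To make that table-driven idea concrete, here’s a toy sketch, in Python rather than Selenium’s actual JavaScript harness, of what “step thru the rows and interpret each one” amounts to. The command names, page elements, and the fake_driver function are just placeholders, not Selenium’s real API.

# Toy illustration of a table-driven test harness (not Selenium's actual code).
# Each table row holds a command, a target, and a value; the harness walks
# the rows and hands each instruction to a driver.
import re

TEST_TABLE = """
<table>
  <tr><td>open</td><td>/login</td><td></td></tr>
  <tr><td>type</td><td>username</td><td>alice</td></tr>
  <tr><td>click</td><td>submit</td><td></td></tr>
  <tr><td>assertText</td><td>banner</td><td>Welcome</td></tr>
</table>
"""

def run_table(html, driver):
    rows = re.findall(r"<tr>(.*?)</tr>", html, re.S)
    for row in rows:
        command, target, value = re.findall(r"<td>(.*?)</td>", row, re.S)
        driver(command, target, value)   # interpret one instruction per row

def fake_driver(command, target, value):
    # Stand-in for the JavaScript that would actually poke the web app.
    print(f"{command:12} {target:10} {value}")

run_table(TEST_TABLE, fake_driver)

The nice property of the format is exactly what the post describes: the test is just data, so recording, hand editing, and replaying it in different harnesses all stay cheap.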

Highly recommended.

The Process is the Product

Darn! I missed this conference.

Come to the Waterfall 2006 conference and see how a sequential development process can benefit your next project. Learn how slow, deliberate handoffs (with signatures!) between groups can slow the rate of change on any project so that development teams have more time to spend on anticipating user needs through big, upfront design.

IP Piracy – the business model

Nice sophisticated article from the Business section of the LA Times about how Microsoft appears to be very conscious that software piracy is good for them.

“They’ll get sort of addicted, and then we’ll somehow figure out how to collect sometime in the next decade.” — Bill Gates

I’ve written about that before, both to illustrate how exactly the pirate price appears to be set and to argue that some nations might prefer to enforce IP rights as a form of classic home-industry protectionism.

I particularly like this quote from an IP lawyer.

“Is widespread piracy simply foregone revenue, a business model by accident or a business model by design?” he asked. “Maybe all three.”

I love that because when your firm is executing on a model like that it becomes totally Keystone Kops in-house, with people running in all directions. The article outlines how the people who are chasing lost revenue managed to encourage adoption of free software in the Islamic world.

The effort even prompted Islamic clerics in Saudi Arabia and Egypt to declare fatwas, or religious edicts, against software piracy.

I have no doubt that if Microsoft could strictly enforce their software licenses that would be great for open source. It would raise barriers to entry high enough that the majority of users on the planet could not get over them. Given the necessity of regular security patches it is also clear to me that Microsoft is intentionally not enforcing their licenses.

Popularity Hashing

As regulars know my favorite distribution is power-law.

I think there is a lot of system design that would play out a lot better if people admitted that one or another key statistic about the load on the system will be power-law distributed. It troubles me to read about system designs that implicitly assume that the traffic loads will be uniform. In most cases I suspect the designers haven’t thought this thru.

One way that plays out is that I suspect the folks that designed most of the internet just did not understand that graphs of communication links will condense into power-law graphs. So if you want to build a resilient network you’re going to have to take steps to make sure that the hubs don’t become single points of failure. Similarly I don’t think they understood that, for similar reasons, only a few vendors would capture most of the DNS, most of the email, etc. etc.

I got to rant about this to Ben Laurie the other day. One example I was giving was that the distributed hash table designs all appear to assume that the frequency of looking up individual keys is uniform. I think that’s vanishingly unlikely. I presume that a handful of elite keys will get looked up a lot more than others – for example looking up Star Wars is a lot more common than looking up Julie London’s version of Sway. Since I presume the traffic is power-law distributed then the elite keys will account for a disproportionate amount of the traffic. If your node in the peer-to-peer network implementing the distributed hash table happens to get stuck with one of the elite keys, your reward is, in effect, a denial-of-service attack.
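A back-of-the-envelope calculation shows how lopsided this gets if you assume lookups follow a Zipf distribution (a common power-law model; the choice of a million keys here is just an illustration):

# If the r-th most popular key gets traffic proportional to 1/r (Zipf's law),
# what share of all lookups lands on the most popular keys?
def zipf_share(top, total_keys):
    weights = [1.0 / r for r in range(1, total_keys + 1)]
    return sum(weights[:top]) / sum(weights)

for top in (1, 10, 100):
    print(f"top {top:4d} of 1,000,000 keys: {zipf_share(top, 1_000_000):.1%} of lookups")

Under that assumption the single most popular key alone absorbs several percent of all lookups and the top hundred take roughly a third, which is exactly the denial-of-service-by-popularity problem for whichever nodes hold them.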

The obvious solution is to spray the popular keys across more nodes in the network. Ben had the clever idea of looking up the popular stuff one way and the regular stuff in the traditional way. In effect you run two or more hash tables, one for each tier of popularity. Clients of the distributed hash table would, of course, start by trying the popular table and then fall back on the less popular one. Servers of particular keys would monitor their traffic and shift load onto their neighboring tables as necessary.

Tricks like this would presumably be useful even for some simple single-process, in-memory hash tables. A two-tiered hash table is likely to get the elite entries densely packed into fast cache memory where they are never paged out.
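A minimal single-process sketch of the idea follows; the promotion threshold is made up, and a real system (or the distributed version, which would do the same thing across two routing tables) would tune it or derive it from observed traffic.

# Sketch of a two-tiered ("popularity") hash table: a small hot tier for
# elite keys plus an ordinary tier for everything else.
class TieredTable:
    def __init__(self, promote_after=1000):
        self.hot = {}                    # small, stays cache-resident
        self.cold = {}                   # the bulk of the keys
        self.hits = {}                   # per-key lookup counts
        self.promote_after = promote_after

    def put(self, key, value):
        (self.hot if key in self.hot else self.cold)[key] = value

    def get(self, key):
        if key in self.hot:              # clients try the popular tier first...
            return self.hot[key]
        value = self.cold[key]           # ...then fall back to the ordinary tier
        self.hits[key] = self.hits.get(key, 0) + 1
        if self.hits[key] >= self.promote_after:
            self.hot[key] = self.cold.pop(key)   # shift the load over
        return value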

It pulls my cord that designers continue to ignore the prevalence of power-law distributions in the populations they are designing for. For example all the economics textbooks show the price/volume curve as a straight line. Setting aside my irritation, I bet there are some really cool algorithms to be discovered that take this to heart.