Author Archives: bhyde

Market Signals

Getting my annual pre-Thanksgiving haircut, I amused myself by trying to convince the barber that he should introduce surge pricing.  I mean!  How are Uber drivers to know they should lay down their keys and take up their clippers?  Who is he to deny the market its signal?  Does not the market have rights?

Optical character recognition for PDF files.

Pypdfocr is very nice.   The input is a PDF file, for example the output of a scanner.  The output is another PDF, which looks like the original but now has the words recognized in it.  That lets you search it, and if you index all your documents then that’s very useful.  Spotlight on my Mac sees into these.

You can extract the raw text using pdftotext, which is nice for reading on the train.

I was delighted that it understands columns pretty well.  It is not so good at paragraph breaks though.

I gather that some people use this to scan all the paper, receipts, etc. that come into their home.  It has some clever switches to help with that use case.

It is a bit of a pain to install, lots of homebrew packages and pip packages are required; and then – at least in my case – it works but it complains that I didn’t get it right.  There are pages that talk about these things; but I’m happy enough now.

Claw Back

Recently I’ve added Ponzi Tracker to my RSS reader and it’s fun in that way that we all enjoy things that feed our confirmation bias.  And who doesn’t love a story about a criminal?  Today’s post included this bit:

“… the court-appointed Receiver, Kenneth D. Bell, begins his quest to recover “false profits” from thousands of victims that were fortunate enough to profit from their investment.  The receiver’s efforts to recover these “false profits” will become markedly easier in the event that Burks pleads guilty to the fraud, since the guilty plea or conviction of a Ponzi schemer allow the use of the “Ponzi presumption” that significantly simplifies the burden of proof required in the so-called “clawback” actions.”

I didn’t know that.  It seems like a big gaping hole in the investor protections that encourage corporate risk taking.  The reason we have limits on investor liability is that it lets the investors delegate risk taking to the corporation while avoiding the worst-case scenario, in which they are held responsible for the evil the firm does.  Their risk is limited to the amount of their investment.  Back in the day only the king had the power to get away with murder, but then it devolved to his friends.

So I’d love to know why Ponzi schemes are unique in this regard.  And I’d love to know that if we convicted a few large financial firms of just the right crimes we could then claw back the money from the “lucky” ones who cashed out early.

Any amateur social scientist knows the next question: What about incentives?  If you threaten the investor class it creates an incentive.  Presumably the king’s friends let this loophole appear because the victims of Ponzi schemes are somehow unique when compared to the other victims of corporate malfeasance.  Maybe it’s about affinity.  Which is ironic, as affinity is a common feature of Ponzi schemes, but in this case I think it might be that the Ponzi victims are called “investors.”

If only the victims of the mortgage crisis had called themselves investors.  If only we could learn to use that phrase “false profits” more.

Bash Quoting

For years I’ve been frustrated by my inability to puzzle out how to write this in bash:

H=$(dirname $0)

so it’s safe if $0 has spaces in it.

Clearly

H="$(dirname $0)"

is better, but still the $0 isn’t usefully quoted.

I finally complained to the shellcheck author that this should be in the FAQ, though there isn’t a FAQ. And he assured me if you do it right it isn’t a problem.  He also happened to mention the answer:

H="$(dirname "$0")"

I complained that just because it wasn’t a question you’d ask if you knew the answer didn’t mean that it wasn’t a commonly asked question.  So he added (or updated) a wiki page to say that the $(…) creates a new “context”.
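To see the difference, here is a small sketch.  The path is made up and stands in for $0; nothing needs to exist on disk, since dirname only manipulates the string.  Inside $(…) the quoting starts fresh, so the inner double quotes do not end the outer ones.

```shell
#!/bin/bash
# Hypothetical path containing spaces, standing in for $0.
script="/tmp/my scripts/deploy tool/run.sh"

# Unquoted: the shell splits the path into several words before dirname
# runs, so dirname never sees the real path.  What it prints (or whether it
# complains) depends on the dirname implementation, so errors are hidden.
unquoted=$(dirname $script 2>/dev/null)

# Quoted inside and outside the substitution: dirname gets one argument.
quoted="$(dirname "$script")"

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```

The quoted form yields the directory with its spaces intact; the unquoted form yields whatever dirname makes of the separate pieces.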

So there you go.

stand up an instance

The phrase “stand up an instance” crossed my awareness the other day.  Here’s an example usage “I could really use some guidance / hand-holding to figure out how to stand up an instance of SureStep Online on top of a SharePoint repository.”

Phrases like this are markers.  They tell us what group a person is a member of.  Or, trying to be.  Or, where he’s coming from.  This article at Slate had sensitized me to that.

The Slate author talks about fingerprint words.  We all tend to pick unusual words and turns of phrase. They spread like fads thru our networks.  Micro information cascades.

“…I went home after work and asked my wife if there were any weird, fingerprint-type words I used often.

“You mean like iteration?” she said, without the slightest pause. Then the floodgates opened. “You also say tangential all the time. Oh, antiquated, too! And you’re always talking about the extent to which someone did this or that.”

“Stand up an instance?”  Where did that one come from?   We have stand up desks, stand up comedians, stand up guys, standing up for someone/something, stand up to the boss, and the list goes on.

It took me a surprisingly long time to track down what metaphor “stand up an instance” is referencing.  I wasted some time thinking it might be a sports term.  Cricket maybe?

It’s from the military.  Here’s an example picked at random from Google Books: “Needless to say, the 9th Engineer Battalion was ordered to stand-down on 19 July, 1970 for preparation to redeploy to Camp Pendleton as part of President Nixon’s Phase IV redeployment schedule.”   Examples of stand up are harder to find since military writing is full of standing up against various opponents, or in trenches.  But here’s one: “With a staff of fewer than fifty personnel in late 2003, CMATT had to stand up the Iraqi armed forces with completely inadequate resources.”

I find myself thinking this is part of the swing back toward the data center (see this from 2005).  Big planning is back, I guess.


And now for our regularly scheduled clickbait:

  • Good essay about cloud service security, using Apple as a counter example.
  • Unbelievably good essay about how sometimes a trap appears on the boundary of the evolutionary niche you’re living inside of, and it swallows your entire species.
  • Somebody must have enjoyed making this collection of gifs suitable for many occasions, but I couldn’t find one for the reaction we all have to clickbait.
  • Most hated industry.

 

When ten commandments are not enough.

There are good reasons why people love a good set of rules about how to go about their jobs.  Here’s a new one:  12 Factor Micro-Services.  It’s part of the enthusiasm for containerizing everything, and it seems to live off the energy produced by the eternal tension between development and operations.

  • Codebase: One codebase tracked in revision control, many deploys
  • Dependencies: Explicitly declare and isolate dependencies
  • Config: Store config in the environment
  • Backing Services: Treat backing services as attached resources
  • Build, release, run: Strictly separate build and run stages
  • Processes: Execute the app as one or more stateless processes
  • Port binding: Export services via port binding
  • Concurrency: Scale out via the process model
  • Disposability: Maximize robustness with fast startup and graceful shutdown
  • Dev/prod parity: Keep development, staging, and production as similar as possible
  • Logs: Treat logs as event streams
  • Admin processes: Run admin/management tasks as one-off processes

I find this an odd list.  I don’t per se have much issue with it, but are these really the top twelve things to keep in mind when designing the modern cloud-based monster?

For example that point about codebase.  We all build systems out of many many 3rd party components, which means there are numerous codebases.  Why would that modularity be appropriate only at the project or firm boundary?

Which brings us to the question of how you take on fresh releases of third party components, and more generally how you manage refresh to the design of the data stores or the logging architecture, or the backup strategy.  All these are pretty central but they aren’t really on this list.

Which may be why this is a list about “micro-services.”  Which again, I’m totally on board with.  And yet, it seems to me like a fetishization of modularity.  Modularity is hard, it’s not cheap, and damn it, sometimes it just is not cost effective.  This is not about the dialectic between dev and ops, this is about the dialectic between doing and planning, or something.

It is like the way people assume there is an optimal size for a function when it is the outliers that are full of interest.

Generally, when designing, the outliers are where the interest is.


Meanwhile, who says I can’t have teasers after my blog postings:

Graphical Programming and yEd

Graphical programming languages are like red sports cars.  They have lots of curb appeal, but they are rarely safe and reliable.

I long worked for a company whose product featured very rich graphical programming.  It allowed an extremely effective sales process.  The salesman would visit the customer, who would sketch a picture of his problem on the whiteboard, and the salesman would enquire about how bad things would get if the problem didn’t get solved.

Meanwhile in the corner the sales engineer would copy the drawing into his notebook.  That night he would create an app in our product whose front page looked as much like that drawing as possible.  It didn’t really matter if it did anything, but it usually did a little simulation and some icons would animate and some charts would scroll.  The customers would be very excited by these little demos.

I consider those last two paragraphs a delightful bit of sardonic humor.  But such products do sell well.   Customers like how pretty they look.  Sales likes them.  Engineering gets to have mixed feelings.  The maintenance contracts can be lucrative.  That helps with business model volatility.  So yeah, there is plenty of value in graphical programming.

So one of the lightning talks at ILC 2014 caught my attention.  The speaker, Paul Tarvydas, mentioned in passing that he had a little hack based on a free drawing application called yEd.  That evening I wrote a similar little hack.

Using yEd you can make illustrations, like this one showing the software release process for most startups.

My few lines of code will extract the topology from the drawing, at which point you can build whatever strikes your fancy: code, ontologies, data structures.  (Have I mentioned how much fun it is to use Optima to digest into a glob of XML?  Why yes I have.)

I was also provoked by Fare Rideau’s talk.  Fare is evangelizing the idea that we ought to start using Lisp for scripting.   He has a package, cl-launch, intended to support this.  Here’s an example script.   Let’s dump the edges in that drawing:

bash-3.2$ ./topology.sh abc.graphml
Alpha -> Beta
Beta -> Cancel
Beta -> Beta
Beta -> Beta
bash-3.2$

I’ve noticed, dear Reader, that you are very observant.  It’s one of the things I admire about you.  So you are wondering: “Yeah Ben, you found too many edges!”   Well, I warned you that these sports cars are rarely safe.  Didn’t I?

Precariat watch

This long profile of a reasonably successful member of the precariat in the Times is worth reading, if you are curious about this trend.

“If you did the calculations, many of these people would be earning less than minimum wage,” says Dean Baker, an economist who is the co-director of the Center for Economic and Policy Research in Washington. “You are getting people to self-exploit in ways we have regulations in place to prevent.”

“These are not jobs, jobs that have any future, jobs that have the possibility of upgrading; this is contingent, arbitrary work,” says Stanley Aronowitz, director of the Center for the Study of Culture, Technology and Work at the Graduate Center of the City University of New York. “It might as well be called wage slavery in which all the cards are held, mediated by technology, by the employer, whether it is the intermediary company or the customer.”

fake bots and standards

I read this morning about bots that pretend to be Google.   I’m surprised to realise that I’m unaware of any standard scheme for a bot (or other HTTP client) to assert its identity in a secure way.  This seems like a kind of authentication, i.e. some sites would prefer to know they are being crawled by the authentic bot vs. an imposter.

There is a list of the standard authentication schemes.  But none of them handle this use case.

This doesn’t look too difficult.  You need a way for agents to sign their requests.  So, you make another auth scheme.  Requests using this scheme include a few fields.  A URL to denote who is signing, and presumably the document associated with that URL has the public key for the agent.  A field that allows the server to infer exactly what the agent signed.  That would need to include enough stuff to frustrate various replay attempts (the requested URL and the time might be sufficient).
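Here is a rough sketch of the “what the agent signed” part, and of the freshness check a server might use to frustrate replays.  The function names, the line-per-field layout, and the 300 second window are all my own assumptions for illustration; none of this comes from an existing standard.  The signature itself is left out: it would be computed over this text with the agent’s private key and checked with the public key found at the signer’s URL.

```shell
#!/bin/sh
# Hypothetical helpers; the names and layout are illustrative only.

# Build the exact text the agent would sign: method, URL, and timestamp,
# one per line, so the server can rebuild the same bytes from the request.
string_to_sign() {
    printf '%s\n%s\n%s\n' "$1" "$2" "$3"
}

# Server side: accept a request timestamp only if it is within the allowed
# skew of the server's clock.
# Arguments: request timestamp, server time, allowed skew (all in seconds).
is_fresh() {
    age=$(( $2 - $1 ))
    if [ "$age" -lt 0 ]; then
        age=$(( 0 - age ))
    fi
    [ "$age" -le "$3" ]
}

now=$(date +%s)
string_to_sign GET "https://example.com/robots.txt" "$now"

if is_fresh "$now" "$now" 300; then
    echo "timestamp accepted"
fi
```

Binding the URL and the time into the signed text means a captured signature is only good for that one URL, and only for a few minutes.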

More interesting, at least to me, is why we do not appear to have such a standard already.  We can play the cost benefit game.   There are at least four players in that calculation.

The standards gauntlet is a PIA.  For an individual this would be a pretty small feather in one’s cap.  And a long haul.  So the cost/benefit for the technologist is weak.  And this isn’t just a matter of technology; you also have to convince the other players to play the game.

What about the sites the bot is visiting?  They pay a tax for serving the bad bots.  The size of that tax is the benefit they might capture after running the gauntlet.  But meanwhile they have alternatives.  They aren’t very good alternatives, but they are probably sufficient; for example, they can whitelist the IP address ranges they think the good bots are using.  That’s tedious, but it’s in their comfort zone vs. the standards gauntlet.

What about the operators of the big “high quality” bots?  It might be argued that the fraudulent bots are doing them some sort of reputation damage, but I find that hard to believe.  A slightly disconcerting thought is that they might pay some people in their standards office to run the gauntlet because this would create a little barrier to entry for other spidering businesses.

The fourth constituency that might care is the internet architecture crowd.  I wonder if they are actually somewhat opposed, or at least ambivalent, to this kind of authentication, since it has the smell of an attempt to undermine anonymity.