
A Good Day

A few misc. items…

Happiness & Economics … What a wonderfully weird chart this is.  It’s weird in two ways.  First off, what the heck is going on in the US?  Secondly, this is basically the inverse of the chart of happiness vs. income.

Programming languages – There is a very nice dialect of Lisp built on top of the Python ecosystem, sort of analogous to the way Clojure is built on top of the Java ecosystem.  It’s called hy.  Very smooth interoperability with Python, across many Python implementations.  For example, you can casually load libraries written in hy into Python code and vice versa.  Macros, backquote, real lambdas, everything is value returning, etc.  Surprisingly, it even works pretty well with the Python debugger, such as it is.

Pricing games – As a collector of amusing pricing games, this article that attempts to puzzle out the details of MTA ticket pricing is fun.

Programming – I wish I could find a standard tool that would let me make a compressed archive and then insert a descriptive header of unpredictable size at the front of it.  Something suitable for when you are building an archive by streaming and after the fact you want to prepend the cataloging metadata.  I guess I’m just a bit surprised that this use case isn’t common enough to have produced a widely used tool that supports it.
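Lacking such a tool, one workaround leans on the fact that concatenated gzip members decompress as a single stream, so a metadata member can be glued onto the front after the archive is finished.  A sketch, not a real catalog format; the paths and contents here are invented:

```shell
# Stream the archive first, knowing nothing about the catalog yet.
mkdir -p /tmp/demo && echo "payload" > /tmp/demo/file.txt
tar -C /tmp -czf - demo > /tmp/body.tar.gz

# Compress the metadata as its own gzip member, then prepend it.
printf 'catalog: 1 file\n' | gzip > /tmp/header.gz
cat /tmp/header.gz /tmp/body.tar.gz > /tmp/archive.gz

# gunzip treats the concatenation as one stream; the metadata comes out first.
gunzip -c /tmp/archive.gz | head -c 15   # prints: catalog: 1 file
```

The catch is that the decompressed stream is then metadata followed by raw tar bytes, so a reader has to split on a known delimiter or length; tar itself won’t extract the combined file directly.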

Current events – NYPD?  What a bunch of babies!

Tourist Info:  The Brooklyn Art Museum is amazing.

Optical character recognition for PDF files.

Pypdfocr is very nice.   The input is a PDF file, for example the output of a scanner.  The output is another PDF, which looks like the original but now has the recognized words in it.  That lets you search it, and if you index all your documents then that’s very useful.  Spotlight on my Mac sees into these.

You can extract the raw text using pdftotext, which is nice for reading on the train.

I was delighted that it understands columns pretty well.  It is not so good at paragraph breaks though.

I gather that a number of people use this to scan all the paper, receipts, et al. that comes into their home.  It has some clever switches to help with that use case.

It is a bit of a pain to install; lots of homebrew packages and pip packages are required, and then – at least in my case – it works but complains that I didn’t get it right.  There are pages that talk about these things, but I’m happy enough now.

Claw Back

Recently I’ve added Ponzi Tracker to my RSS reader and it’s fun in that way that we all enjoy things that feed our confirmation bias.  And who doesn’t love a story about a criminal?  Today’s post included this bit:

“… the court-appointed Receiver, Kenneth D. Bell, begins his quest to recover “false profits” from thousands of victims that were fortunate enough to profit from their investment.  The receiver’s efforts to recover these “false profits” will become markedly easier in the event that Burks pleads guilty to the fraud, since the guilty plea or conviction of a Ponzi schemer allow the use of the “Ponzi presumption” that significantly simplifies the burden of proof required in the so-called “clawback” actions.”

I didn’t know that.  It seems like a big gaping hole in the investor protections that encourage corporate risk taking.  The reason we have limits on investor liability is that it lets investors delegate risk taking to the corporation while avoiding the worst-case scenario in which they are held responsible for the evil the firm does.  Their risk is limited to the amount of their investment.  Back in the day only the king had the power to get away with murder, but then it devolved to his friends.

So I’d love to know why Ponzi schemes are unique in this regard.  And I’d love to know whether, if we convicted a few large financial firms of just the right crimes, we could then claw back the money from the “lucky” ones who cashed out early.

Any amateur social scientist knows the next question: what about incentives?  If you threaten the investor class it creates an incentive.  Presumably the king’s friends let this loophole appear because the victims of Ponzi schemes are somehow unique when compared to the other victims of corporate malfeasance.  Maybe it’s about affinity.  Which is ironic, since affinity is a common feature of Ponzi schemes, but in this case I think it might be that the Ponzi victims are called “investors.”

If only the victims of the mortgage crisis had called themselves investors.  If only we could learn to use that phrase “false profits” more.

Bash Quoting

For years I’ve been frustrated by my inability to puzzle out how to write this in bash:

H=$(dirname $0)

so it’s safe if $0 has spaces in it.

Clearly

H="$(dirname $0)"

is better, but still the $0 isn’t usefully quoted.

I finally complained to the shellcheck author that this should be in the FAQ, though there isn’t a FAQ.  And he assured me that if you do it right it isn’t a problem.  He also happened to mention the answer:

H="$(dirname "$0")"

I complained that just because it isn’t a question you’d ask if you knew the answer doesn’t mean it isn’t a commonly asked question.  So he added (or updated) a wiki page to say that $(…) creates a new quoting “context”.
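A quick demonstration of why the inner quotes are safe, using an ordinary variable in place of $0 (the path here is made up):

```shell
p="/tmp/my dir/script.sh"     # stand-in for $0, invented for the demo

# H=$(dirname $p)             # broken: $p word-splits at the space,
#                             # so dirname sees two arguments
H="$(dirname "$p")"           # correct: $(…) opens a new quoting context,
                              # so the inner quotes survive inside the outer ones
echo "$H"                     # prints: /tmp/my dir
```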

So there you go.

Precariat watch

This long profile in the Times of a reasonably successful member of the precariat is worth reading, if you are curious about this trend.

“If you did the calculations, many of these people would be earning less than minimum wage,” says Dean Baker, an economist who is the co-director of the Center for Economic and Policy Research in Washington. “You are getting people to self-exploit in ways we have regulations in place to prevent.”

“These are not jobs, jobs that have any future, jobs that have the possibility of upgrading; this is contingent, arbitrary work,” says Stanley Aronowitz, director of the Center for the Study of Culture, Technology and Work at the Graduate Center of the City University of New York. “It might as well be called wage slavery in which all the cards are held, mediated by technology, by the employer, whether it is the intermediary company or the customer.”

fake bots and standards

I read this morning about bots that pretend to be Google.   I’m surprised to realise that I’m unaware of any standard scheme for a bot (or other HTTP client) to assert its identity in a secure way.  This seems like a kind of authentication, i.e. some sites would prefer to know they are being crawled by the authentic bot vs. an imposter.

There is a list of the standard authentication schemes.  But none of them handle this use case.

This doesn’t look too difficult.  You need a way for agents to sign their requests.  So, you make another auth scheme.  Requests using this scheme would include a few fields.  A URL to denote who is signing, where presumably the document at that URL has the public key for the agent.  A field that allows the server to infer exactly what the agent signed.  That would need to include enough stuff to frustrate various replay attacks (the requested URL and the time might be sufficient).
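A minimal sketch of what that could look like, using openssl; the header layout, key URL, and file names are all invented for illustration, not any standard:

```shell
# Generate the bot's keypair (in real life this exists already, with the
# public half published at a well-known URL).
openssl genrsa -out bot-private.pem 2048 2>/dev/null
openssl rsa -in bot-private.pem -pubout -out bot-public.pem 2>/dev/null

# The bot signs the requested URL plus a timestamp with its private key.
url="https://example.com/page"
ts="2024-01-01T00:00:00Z"
printf '%s\n%s\n' "$url" "$ts" > signed.txt
openssl dgst -sha256 -sign bot-private.pem -out sig.bin signed.txt

# The request might then carry: the key URL, the timestamp, and base64 of
# sig.bin.  The server fetches the public key and verifies what was signed:
openssl dgst -sha256 -verify bot-public.pem -signature sig.bin signed.txt
# prints: Verified OK
```

Including the URL and timestamp in the signed text is what frustrates replay: a captured signature is only good for that URL, in that time window.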

More interesting, at least to me, is why we do not appear to have such a standard already.  We can play the cost benefit game.   There are at least four players in that calculation.

The standards gauntlet is a pain.  For an individual this would be a pretty small feather in one’s cap.  And a long haul.  So the cost/benefit for the technologist is weak.  And this isn’t just a matter of technology; you also have to convince the other players to get into the game.

What about the sites the bot is visiting?  They pay a tax for serving the bad bots.  The size of that tax is the benefit they might capture after running the gauntlet.  But meanwhile they have alternatives.  They aren’t very good alternatives, but they are probably sufficient; for example they can whitelist the IP address ranges they think the good bots are using.  That’s tedious, but it’s in their comfort zone vs. the standards gauntlet.

What about the operators of the big “high quality” bots?  It might be argued that the fraudulent bots are doing them some sort of reputation damage, but I find that hard to believe.  A slightly disconcerting thought is that they might pay some people in their standards office to run the gauntlet because this would create a little barrier to entry for other spidering businesses.

The fourth constituency that might care is the internet architecture crowd.  I wonder if they are actually somewhat opposed, or at least ambivalent, since this kind of authentication has the smell of an attempt to undermine anonymity.

Docker, part 2

The San Francisco Hook

I played with Docker some more.  It’s still in beta so, unsurprisingly, I ran into some problems.   It’s cool, nonetheless.

I made a repository for running OpenMCL, aka ccl, inside a container.   I set this up so the Lisp process expects to be managed using slime/swank.  So it exposes the port where swank listens for clients to connect.  When you run it you map that port, i.e. “-p 1234:4005” in the example below.

Docker shines at making it easy to try things like this.  Fire it up: “docker run --name=my_ccl -i -d -p 1234:4005 bhyde/crate-of-ccl”.   Docker will spontaneously fetch everything you need.   Then you M-x slime-connect to :1234 and you are all set.  Well, almost; the hard part is getting access to the exported port.

I have run this in two ways: on my Mac, and on DigitalOcean.  On the Mac you need to have a virtual machine running Linux that will hold your containers – the usual way to do that is the boot2docker package.  On DigitalOcean you can either run a Linux droplet and then install Docker, or you can use the application image which bundles that for you.

I ran into lots of challenges getting access to the exported port.  In the end I settled on using good old ssh LocalForward statements in my ~/.ssh/config to bring the exported port back to my workstation.  Something like “LocalForward 91234 172.17.42.1:1234” where that IP address is that of an interface (docker0 for example) on the machine where the container is running.  Lots of other things look like they should work, but didn’t.
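For the record, the relevant ~/.ssh/config stanza looks something like this (the host name and addresses here are invented):

```
Host droplet
    HostName 203.0.113.5
    # bring the container's swank port back to localhost:91234
    LocalForward 91234 172.17.42.1:1234
```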

Docker consists of a client and a server (i.e. daemon).  Both are implemented in the same executable.  The client chats with the server using HTTP (approximately).  This usually happens over a Unix socket.  But you can ask the daemon to listen on a TCP port, and if you LocalForward that back to your workstation you can manage everything from there.  This is nice since you can avoid cluttering your container-hosting machine with source files.  I have bash functions like this one, “dfc () { docker -H tcp://localhost:2376 "$@" ; }”, which provides a shorthand for chatting with the docker daemon on my DigitalOcean machine.

OpenMCL/ccl doesn’t really like to be run as a server.   People work around that by running it under something like screen (or tmux, detachtty, etc.).  Docker bundles this functionality; that’s what the -i switch (for interactive) requests in that docker run command.  Having done that you can then use “docker logs my_ccl” or “docker attach my_ccl” to dump the output or open a connection to the Lisp process’ REPL.   You exit a docker attach session using control-C.  That can be difficult if you are inside an Emacs comint session, in which case M-x comint-kill-subjob is sometimes helpful.

For reasons beyond my ken, doing “echo ‘(print :hi)’ | docker attach my_ccl” gets slightly different results on DigitalOcean vs. boot2docker.  Still, you can use that to do assorted simple things.   UIOP is included in the image along with Quicklisp, so you can make uiop:run-program calls … for example to apt-get etc.

Of course if you really want to do apt-get, install a bundle of Lisp code, etc. you ought to create a new container built on this one.  That kind of layering is another place where Docker shines.

So far I haven’t puzzled out how to run one-liners.  Something like “docker run --rm bhyde/crate-of-ccl ccl -e ‘(print :hi)’” doesn’t work out as I’d expect.  It appears that argument pass-through, argument quoting, and the plumbing of standard IO et al. are full of personality which I haven’t comprehended.  Or maybe there are bugs.

That’s frustrating; it undermines my desire to do sterile testing.


Docker is interesting

Somebody mentioned Docker during a recent phone interview, so I went off to have a look.  It’s interesting.

We all love sandboxing.  Sandboxing is the idea that you could run your computations inside of a box.  The box would then protect us from whatever vile thing the computation might do.  Vice versa, it might protect the computation from whatever attacks the outside world might inflict upon it.   There are many ways to build a sandbox.   Operating systems devote lots of calories to this problem.  I recall a setting in an old Univac operating system that set a limit on how many pages a user could print on the line printer.   Caja tries to wrap a box around arbitrary JavaScript so it can’t snoop on the rest of the web page.  My favorite framework for thinking about this kind of thing is capabilities, probably because I was exposed to them back at CMU in the 1970s.

Docker is yet another scheme for running stuff in a sandbox.  They call these containers, like a standardized shipping container.  I wonder if they actually took the time to read “The Box: …“, since it’s an amazing book.

Docker is also the usual hybrid open-source/commercial/VC-funded kind of thing.  Of course it has an online hub/repository, sort of like the package managers have, but in this case run by the firm; sort of like github.  The business model is interesting, but that’s – maybe – for another post.

Docker stands on a huge amount of work done over the last decades by operating system folks on the sandboxing problem.  It’s really, really hard to retrofit sandboxing into an existing operating system, and the retrofitted thing is likely to have a lot of rough edges.  So – on the one hand – Docker is a system to reduce the rough edges, letting mere mortals play with sandboxes, finally.  But it is also trying to build a single unified API across the diversity of operating systems.  In theory that would let me make a container which I can then run “everywhere.”   “Run everywhere” is perennial, eh?

Most people describe docker as an alternative to virtual hosting (ec2, vmware, etc. etc.).  And that’s true.  But it’s also an alternative to package managers (yum, apt, homebrew, etc. etc.).   For example say I want to try out “Tiny Tiny RSS,” which is a web app for reading RSS feeds.  I “just” do this:

docker run --name=my_db -d nornagon/postgres
docker run -d --link my_db:db -p 80:80 clue/ttrss
open http://localhost/

Those three lines create two containers, one for the database and one for the RSS reader.  The second line links the database into the RSS container and exposes the RSS reader’s http service on the localhost.  The database and RSS reader containers are filled in with images that are downloaded from the central repository and cached.  Disposing of these applications is simple.

That all works on a sufficiently modern Linux, since that’s where the sandboxing support Docker depends on is found.  If you are on Windows or the Mac then you can install a virtual machine and run inside of that.  The installers will set everything up for you.

Sorting Hat

I enjoyed this graphic showing the association between a person’s profession and what they studied in school.  If you click on one of the boxes on the left or right you can focus on that. Reload to return to the first view.

Where do salesmen come from?

The distributions are what I found interesting.  If you study “mass media” it is hard to predict what you’ll be doing down the road.  If you study electrical engineering it’s easier to predict.  You’d think that would be something schools ought to tell their students.  If you meet somebody who’s a lawyer, a manager, or in sales it’s hard to predict what they studied in school.  It makes me wonder if there is unmet demand for schools aligned to some of these professions.

The distributions inside of professions must say something about the dynamics for workers in the profession.  Obviously if 50% of your peers all studied the same thing in school, it must make it a bit tedious for the 50% who didn’t study that narrow speciality.  This kind of data lets you measure how “professionalized” a trade is.

Looking at these for a bit you start thinking that some of the categories are pretty arbitrary.  For example, about a third of the salesmen in the sample above studied variations on “business.”  That just leads to wondering how much difference there is between these category names.