fake bots and standards

I read this morning about bots that pretend to be Google.   I’m surprised to realise that I’m unaware of any standard scheme for a bot (or other HTTP client) to assert it’s identity in a secure way.  This seems like a kind of authentication, i.e. some sites would prefer to know they are being crawled by the authentic bot v.s. an imposter.

There is a list of the standard authentication schemes.  But none of them handle this use case.

This doesn’t look too difficult.  You need a way for agents to sign their requests.  So, you make another auth scheme.  Authentication schemes using this scheme include a few fields.  A URL to denote who is signing, and presumably the document associated with that URL has the public key for the agent.  A field that allows the server to infer exactly what the agent signed.  That would need to include enough stuff to frustrate various reply attempts (the requested url and the time might be sufficient).

More interesting, at least to me, is why we do not appear to have such a standard already.  We can play the cost benefit game.   There are at least four players in that calculation.

The standard’s gauntlet is a PIA.  For an individual this would be a pretty small feather in one’s cap.  And a long haul.  So the cost/benefit for the technologist is weak.  And this isn’t just a matter of technology, you also have to convince the other players to play into the game.

What about the sites the bot is visiting.  They play a tax for serving the bad bots.  The size of that tax is the benefit they might capture after running the gauntlet.  But meanwhile have alternatives. They aren’t very good alternatives, but they are probably sufficient for example they can whitelist the IP address ranges they think the good bots are using.  That’s tedious, but it’s in their comfort zone v.s. the standards gauntlet.

What about the operators of the big “high quality” bots.  It might be argued that the fraudulent bots are doing them some sort of reputation damage, but I find that hard to believe.  An slightly disconcerting thought is that they might pay some people in their standards office to run the gauntlet because this would create little barrier to entry for other spidering businesses.

The fourth constituency that might care is the internet architecture crowd.  I wonder if they are actually somewhat opposed, or at least ambivalent, to this kind of authentication.  Since it has the smell of an attempt to undermine anonymity.

ssh-keyscan and waiting for servers to come online

Here’s a little trick.  Ssh-keyscan is useful for asking ssh daemons for their host keys.  People use it to provision their known_hosts file.

You can also use it to poll a ssh daemon – very use when waiting for a new server to boot up, including cloud servers.  That might look like this:

LOG waiting for sshd and get hostkey
while echo ".." ; do
    HOSTKEY="$( ssh-keyscan -T 20 $IP 2> /dev/null )"
    if [[ ouch != "ouch$HOSTKEY" ]] ; then
        echo "$HOSTKEY"

LOG Add $NAME to known_hosts
cp ~/.ssh/known_hosts ~/.ssh/known_hosts.old
sed "/^$IP /d" ~/.ssh/known_hosts.old > ~/.ssh/known_hosts
echo "$HOSTKEY" >> ~/.ssh/known_hosts

That code is too optimistic, it assumes that the server will start.

And also: there are scenarios where ssh’s timeout parameters don’t work right.   So you can hang, inspite of that -T timeout.  Fixing that requires getting fresher versions of sshd.

auth-source: getting my secrets out of my emacs init file

I do not reveal my emacs init file publicly because it has secrets in it. For passwords (particularly for various APIs), and decryption keys in it.

But, the other day I discovered auth-source.   I used in this example to launch my IRC setup:

(defun start-erc ()
  "Wrapper for ERC that get's password via auth-source."
  (let* ((server (erc-compute-server))
         (port (erc-compute-port))
         (credentials (auth-source-search :host server 
                                          :port (format "%s" port)
                                          :max-tokens 1)))
      (erc :password (funcall (plist-get (car credentials) :secret))))
      (message "auth-source-search failed to find necessary credentials for irc server")))))

Auth-source-search will find my credentials in ~/.authinfo.gpg.  A line there like that looks like this: “machine irc.example.org port 12345 login luser password aPasWurd“.

Curious about hard it would be to fold that directly into the M-x erc</code> I read enough code to discover it calls thru to a function which does in fact call auth-source-search; so you can revise my function like so:

(defun start-erc ()
  "Start erc computing all the default connection details, which might get the password via auth-source."
  (let ((password? nil))
    (erc-open (erc-compute-server)
              t  ;; connect

I'm delighted.  But it, looks like this facility isn't used as much as I'd expect.

I found it because the helm-delicious package advised using it for my delicious password.

I was making good progress getting all my secrets out of the init file by have a function that would load all the secrets on demand loading an encrypted elisp file (load "secrets.el.gpg"). That works nicely too.

Maybe I should go read up on the secret storage scheme of vree desktop.

Docker #5 – benchmark? not really…

Given a Docker image you can spin up a container in lots of places.  For example on my Mac under Boot2Docker, at Orchard, or on Digital Ocean.   I don’t have any bare metal at hand, so these all involve the slight tax of virtual machine.

I ran the same experiment on these three.  The experiment launches 18 containers, serially.  The jobs they varied; but they are not very large.

180 seconds Boot2Docker
189 seconds Orchard
149 seconds Digital Ocean

These numbers are almost certainly meaningless!  I don’t even know what might be slowing things down: CPU, I/O, Swap, etc.

Interestingly if I launch all 18 containers in parallel I get similar results +/-10%.  The numbers vary only a few percent if I run these experiments repeatedly.  I warmed up the machines a bit by running a few jobs first.

Yeah.  Adding Google App Engine and EC2 would be interesting.

While Orchard charges $10/month v.s. Digital Ocean’s $5 their billing granularity is better.  You purchase 10 minutes, and then a minute at a time, v.s. Digital Ocean which bills an hour at a time.  Orchard is a little more convenient to use v.s. Digital Ocean.  A bit-o-scripting could fix that.

I’m using this for batch jobs.  Hence I have an itch: a batch Q manager for container runs.  That would, presumably assure that machines are spun up and down to balance cost and throughput.

Queue of Containers

Docker #4: Accessing a remote docker daemon using socat&ssh

Well!  This unix tool is pretty amazing.  Socat let’s you connect two things together, there the two things are pretty much anything that might behave like a stream.  There is a nice overview article written in 2009 over here.  You can do crazy things like make a device on machine A available on a machine B.  Running this command on A will bring us to machine B’s /dev/random:

socat \
  PIPE:/tmp/machine_a_urandom  \
  SYSTEM:"ssh machine_a socat - /dev/urandom"

What brought this up, you ask.

I have been up machines to run Docker Containers on, at Digital Ocean, for short periods of time to run batch jobs.  Docker’s deamon listens on a unix socket, /var/run/docker.sock, for your instructions.  I develop on my Mac, so I need to transmit my instructions to the VM at Digital Ocean.  Let’s call him mr-doh.

One option is to reconfigure mr-doh’s Docker deamon  to listening on localhost tcp port.   Having done that you can have ssh forward that back to your Mac and then your set/export the DOCKER_HOST environment variable and your good to go.

The problem with that is it adds the work of spinning up mr-doh, which if your only going to have him running for a short period of time adds to the tedium.

Well, as we can see in the /dev/urandom example above you can use socat to forward things. That might look like this:

socat \
    "UNIX-LISTEN:/tmp/mr-doh-docker.sock,reuseaddr,fork" \
    "EXEC:'ssh -kTax root@mr-doh.example.com socat STDIO UNIX-CONNECT\:/var/run/docker.sock'" &

Which will fork a soccat to manages /tmp/mr-doh-docker.sock.  We can then teach the docker client on the Mac to use that by doing:

export DOCKER_HOST=unix:///tmp/mr-doh-docker.sock
When the client uses it socat will fire up ssh and connect to the docker deamon's socket on mr-doh.  Of course for this to work you'll want to have your ssh key installed in root@mr-doh's authorized_keys etc.

For your enjoyment is a somewhat raw script which will arrange to bring the /var/run/docker.sock from mr-doh back home. get-docker-socket-from-remote prints the export command you’ll need.

It’s cool that docker supports TLS.  You have to setup and manage keys etc.  So, that’s another approach.

Docker, part 3

Docker containers don’t support multicast, at least not easily.   I find that a bummer.

It’s unclear why not.  Well, the most immediate reason is that the networking interfaces they create for the containers don’t have the necessary flag to enable multicast.  That, at least according to the issue, is because that’s how Linux defaulted these interfaces.    Why did they did they do that?

This means that any number of P2P or (masterless) solutions don’t work.  For example zeroconf/mdns is out.  I guess this explains the handful of custom service discovery tools.  Reinventing the wheel.

In other news… Once you have Boot2Docker setup you need to tell the docker command were the docker daemon is listening for instructions. You do that with the -H switch to docker, or via the DOCKER_HOST environment variable.   Typically you’d do:

export DOCKER_HOST=tcp://

But if your feeling fastidious you might want to ask boot2docker for the IP and port.

export "DOCKER_HOST=tcp://$(boot2docker ip 2> /dev/null):$(boot2docker info | sed 's/^.*DockerPort.:\([0-9]*\).*$/\1/')"


Boot2Docker lets you run Docker Containers on your Mac by using VirtualBox to create a stripped down Linux Box (call that DH) where the Docker daemon can run.   DH and your Mac have a networking interface on a software defined network (named vboxnet) created by Virtual Box.  The containers and DH have networking interfaces on a software defined network created by the Docker daemon.  Call it SDN-D, since they didn’t name it.

The authors of boot2docker did not set things up so your Mac to you connect directly to the containers on sdn-d.  Presumably they didn’t think it wise to adjust the Mac routing tables.  But you can.  This is very convenient.  It lets you avoid most of the elegant, but tedious, -publish or -publish-all (aka -p, -P) switches when running a container.  They hand-craft special plumbing for ports when running with containers.  It also nice because DH is very stripped down making it painful to work on.

So, I give you this little shell script: establish-routing-to-boot2docker-container-network.   It adds routing on the Mac to SDN-D via DH on vboxnet.  This is risky if SDN-D happens to overlap a network that the Mac is already routing to, and the script does not guard against that.  See bellow for how to deal if you have that problem.

If your containers have ssh listeners then you can put this in your ~/.ssh/config to avoid the PIA around host keys.  But notice how it hardwires the numbers for SDN-D.

Host 172.17.0.*
  StrictHostKeyChecking no
  UserKnownHostsFile /dev/null
  User root

The numbers of SDN-D are bound when the Docker daemon launches on DH.  The –bip switch, used when the docker daemon launches, can adjusts that.  You setting it in /var/lib/boot2docker/profile on DH via EXTRA_ARGS.    Do that if you have the overlap problem mentioned above.  I do it because I want SDN-D to be small.  That let’s nmap can scan it quickly.

If you’ve not used ~/.ssh/config before, well you should!  But in that case you may find it useful to know that ssh uses the first setting it finds that Host block should appear before your global defaults.


Demand Generation

The standard model for economic recessions is that people become nervous and then proceed to hide under the mattress with their money.  That reduces economic demand, which in turn makes everybody more nervous.  What you need is to tempt them, and their money, out of hiding.

If you are a commercial enterprise and you want to tempt people to spend their money you labor to generate more demand for your product.  Demos, freebies, events to entice ‘em into the store, advertising, coupons, etc. etc.

Recently I stumbled upon an interesting variation on this.  Like most vendor loyalty clubs, Sears/KMart has a loyalty club where you earn points.  I’m quite impressed by the range of tricks they play with this micro-currency.

For example they use classic intermittent conditioning to train customers to come into the store.  When you enter the store their app, on your smart phone, randomly rewards you with some points.  I got $10 the first time, presumably so I’d be sure to know that visiting the store might be valuable.

For example they used the class trick learned during the great depression that if your want your micro currency to work well you should design it so it loses value if you horde it.  So many of the points you get have a limited lifetime.

One trick they use that I find particularly thought provoking how they pay you to shop.  They have dozens of “contests” where you answer a few questions (“How old is your Vacuum Cleaner?”  “How often do you use your Vacuum Cleaner?” …) and then you must “like” a few items in the Vacuum cleaner section of the catalog.  Having done this you are then entered into a drawing for – oh – a 100$ worth of points.   They actually tell you how many people have entered, so I can see that for a very low cost they got a thousand people to think about their Vacuum Cleaner and browse the catalog for a new one.

It’s a glimpse of the future.  For example the micro currency lets them make micro payments for all kinds of activities that aid their business.  For example 50 points (5 cents) for reporting your weight to their little fitness club.  They also let you register your fitness device or app and get micro-payments for exercise, which only raises the question why it is taking so long for the fitness device makers to sell that service to health insurance companies?

Why don’t airlines or the grocery stores play these games?  Why doesn’t Amazon?  I think mostly because they haven’t realized the potential in their loyalty card/points programs to engage in behavioral gimmicks to train customers and draw customers into the shopping.  But of course to a limited extent they do play these games.

I find the idea that the Government could use techniques along these lines to generate address a recession extremely provocative.   Hire 5% more workers and it enters your firm into a sweepstake for $250 thousand dollars, etc. etc.



Things like this make me nervous.

A quick install via a bash script pulled from a URL:

curl https://mr-trusty.org/install-fun-thing.sh | sudo bash

New to me: this quick trick to install something into your /usr/local/bin with docker’s help.

docker run -v /usr/local/bin:/target my-trusty/fun-thing

And then we have this handy way to install via emacs…

 (lambda (s)

It’s all about short term benefits and hardly about the risks.