Category Archives: programming

Why FIFO?

Amazon’s AWS has a message queue service, SQS, to which they have recently started adding a variant that assures your messages are delivered in the same order they were sent, i.e. first in, first out.

If you are surprised that they don’t do that by default, you may enjoy thinking about what would be involved if the Post Office were to decide to offer this feature.

That said, I’ve been surprised that it took so long for this to show up, and I remain surprised at how slowly they are rolling it out.

So it was with some amusement that I read, in their docs, an example of why you might want this feature.

“Display the correct product price by sending price modifications in the right order.”

Which I think helps explain why Seattle is one of the few AWS sites that supports it so far.

Tarsnap Notes

I set up tarsnap to back up one of my small, small cloud servers.  Some notes on the hiccups:

  1. Tarsnap’s install involves compiling it – that tells you about the overall tone :).  The compile requires this include file: “ext2fs/ext2_fs.h”.  My little server lacked that, and it took a while to figure out how to get it.  In this case the answer was: yum install e2fsprogs-devel
  2. There are two keys.  One is used to access your account on his server.  The second is used to encrypt, etc., your backups.  I was puzzled about this second file, since I’d assumed it would encrypt the backups with one key (which would be installed on the machine(s) you’re backing up), and then a second key (the private key) would be used to decrypt them later.  Turns out that behavior is – sort of – optional.  The 2nd key you get fills both roles, and you need to use the key management tool if you want to make that distinction (a sketch follows below).
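
For example, you can split the roles apart with tarsnap-keymgmt.  This is just a sketch from memory – check the man page before trusting it – assuming your full key lives in tarsnap.key:

# On a safe machine, derive a key that can only write new archives
# from the full key (which can also read and delete):
tarsnap-keymgmt --outkeyfile tarsnap-write.key -w tarsnap.key

# Install tarsnap-write.key on the machine being backed up;
# keep the full tarsnap.key somewhere offline for restores.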

Estimating is hard.

Guesstimate is a delightful first draft of a tool to help clarify why we don’t know the answer to your question.  Here, for example, the user has tried to get a handle on the question: “Taking down the tree, how long?”

[Screenshot: taking down the Christmas tree]

I was a bit taken aback by my immediate reaction to this: a flashback to Simula.  What a lovely language.  I used it back in the late 1960s, and then once more in the mid 70s.  It had things that didn’t become common in other languages for a long, long time, like threads and classes.  It was very good at this kind of computation.  Simulation accounted for a big slice of all computing, back in the day.

I didn’t see any loops.

JMESPath is sweet

Back in the 60s, card images were the canonical format for data.  Unix inherited that, and tools like sed, awk, and even perl carry that legacy forward, though they call ’em lines.

There have been many attempts to get something more meaty to pump thru our pipe fittings.  For example: Erlang’s got something, Lisp’s got something, then there’s XML, Protocol Buffers, JSON, and many, many others.  JSON is quite popular this morning.

JMESPath is analogous to XML’s XPath, but for JSON.  I first encountered JMESPath in the AWS CLI tool’s --query argument, but it is totally a standalone thing.  There are implementations in many popular languages and the licenses are quite generous.  I like using it to pluck data out of API endpoints.
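
For instance – a small sketch, assuming you have the AWS CLI configured and a few EC2 instances to look at – the --query argument takes a JMESPath expression:

# Pluck instance ids and their states out of the large
# describe-instances JSON blob.
aws ec2 describe-instances \
    --query 'Reservations[].Instances[].[InstanceId,State.Name]' \
    --output table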

One aspect of network effects is that when you capture one, your reward (or possibly your curse) is an immovable installed base.  JSON (and XML) have succeeded where better schemes designed in the past have failed because they both now have an installed base, i.e. the vast number of HTTP based API endpoints.  Getting those to switch to something better would be a PIA.  Worse is better and all that.

While JMESPath is targeted at JSON, you can often use it to dig data out of the native data structures in whatever language you’re using today.

It was a nerd prank when I was a freshman in college to blow large quantities of punch card chad under another student’s dorm room door.

Like any good DSL, JMESPath takes a bit of getting used to.  The interactive tutorial is sweet.

Frequency-hopping a server’s port

Here’s one of those ideas you have when you are not sleeping: why don’t we use frequency hopping to make it hard for attackers to find listeners to attack?

In scenarios where you want to keep the port number a secret, you could randomly vary its location.  You could use TOTP, so both sides can rendezvous.  Seems this wouldn’t be that hard to add to ssh.  The sshd_config file might look something like this:

# Enable dynamic port listening, and the TOTP secret
Port dynamic 6000 16000
PortSecret 12345678901234567890

And the user’s ~/.ssh/config file would then have something like this in it

Host crazy.example.com
   Port dynamic
   PortSecret 12345678901234567890

You could let the PortSecret default to something derived from the host key.
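
To make the rendezvous arithmetic concrete, here is a rough client-side sketch.  It assumes oathtool is installed, that the hypothetical PortSecret is a base32 TOTP seed, and it uses the 6000–16000 range from the config above:

# Derive the current port from a TOTP code so both ends land on the
# same number within a given time window.  The seed below is made up.
SECRET="MZXW6YTBOI======"
CODE="$( oathtool --totp -b "$SECRET" )"
PORT=$(( 6000 + 10#$CODE % 10000 ))   # force base 10; map into 6000-15999
ssh -p "$PORT" crazy.example.com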

When ten commandments are not enough.

There are good reasons why people love a good set of rules about how to go about their jobs.  Here’s a new one:  12 Factor Micro-Services.  It’s part of the enthusiasm for containerizing everything, and it seems to live off the energy produced by the eternal tension between development and operations.

  • Codebase: One codebase tracked in revision control, many deploys
  • Dependencies: Explicitly declare and isolate dependencies
  • Config: Store config in the environment
  • Backing Services: Treat backing services as attached resources
  • Build, release, run: Strictly separate build and run stages
  • Processes: Execute the app as one or more stateless processes
  • Port binding: Export services via port binding
  • Concurrency: Scale out via the process model
  • Disposability: Maximize robustness with fast startup and graceful shutdown
  • Dev/prod parity: Keep development, staging, and production as similar as possible
  • Logs: Treat logs as event streams
  • Admin processes: Run admin/management tasks as one-off processes

I find this an odd list.  I don’t, per se, have much issue with it, but are these really the top twelve things to keep in mind when designing the modern cloud-based monster?

For example, that point about codebase.  We all build systems out of many, many 3rd-party components, which means there are numerous codebases.  Why would that modularity be appropriate only at the project or firm boundary?

Which brings us to the question of how you take on fresh releases of third-party components, and more generally how you manage changes to the design of the data stores, the logging architecture, or the backup strategy.  All of these are pretty central, but they aren’t really on this list.

Which may be why this is a list about “micro-services.”  Which, again, I’m totally on board with.  And yet, it seems to me like a fetishization of modularity.  Modularity is hard, it’s not cheap, and damn it, sometimes it just is not cost effective.  This is not about the dialectic between dev and ops; this is about the dialectic between doing and planning, or something.

It is like the way people assume there is an optimal size for a function, when it is the outliers that are full of interest.

Generally, when designing, the outliers are where the interest is.


Meanwhile, who says I can’t have teasers after my blog postings:

ssh-keyscan and waiting for servers to come online

Here’s a little trick.  Ssh-keyscan is useful for asking ssh daemons for their host keys.  People use it to provision their known_hosts file.

You can also use it to poll an ssh daemon – very useful when waiting for a new server to boot up, including cloud servers.  That might look like this:

LOG waiting for sshd and get hostkey
# Poll until ssh-keyscan returns a host key, i.e. until sshd is answering.
while echo ".." ; do
    HOSTKEY="$( ssh-keyscan -T 20 "$IP" 2> /dev/null )"
    if [[ -n "$HOSTKEY" ]] ; then
        echo "$HOSTKEY"
        break
    fi
done

LOG Add $NAME to known_hosts
cp ~/.ssh/known_hosts ~/.ssh/known_hosts.old
sed "/^$IP /d" ~/.ssh/known_hosts.old > ~/.ssh/known_hosts
echo "$HOSTKEY" >> ~/.ssh/known_hosts

That code is too optimistic; it assumes that the server will eventually start.
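
A slightly less optimistic variant – just a sketch, reusing the $IP and LOG conventions from above – caps the number of attempts:

LOG waiting for sshd and get hostkey
# Give up after 30 attempts (roughly ten minutes at the 20 second timeout).
for attempt in $( seq 1 30 ) ; do
    HOSTKEY="$( ssh-keyscan -T 20 "$IP" 2> /dev/null )"
    [[ -n "$HOSTKEY" ]] && break
    echo ".."
done
if [[ -z "$HOSTKEY" ]] ; then
    LOG "giving up: $IP never answered"
    exit 1
fi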

And also: there are scenarios where ssh’s timeout parameters don’t work right.  So you can hang, in spite of that -T timeout.  Fixing that requires getting fresher versions of sshd.

auth-source: getting my secrets out of my emacs init file

I do not reveal my emacs init file publicly because it has secrets in it: passwords (particularly for various APIs) and decryption keys.

But the other day I discovered auth-source.  I use it in this example to launch my IRC setup:

(defun start-erc ()
  "Wrapper for ERC that get's password via auth-source."
  (interactive)
  (let* ((server (erc-compute-server))
         (port (erc-compute-port))
         (credentials (auth-source-search :host server 
                                          :port (format "%s" port)
                                          :max-tokens 1)))
    (cond
     (credentials
      (erc :password (funcall (plist-get (car credentials) :secret))))
     (t
      (message "auth-source-search failed to find necessary credentials for irc server")))))

Auth-source-search will find my credentials in ~/.authinfo.gpg.  A matching line there looks like this: “machine irc.example.org port 12345 login luser password aPasWurd”.

Curious how hard it would be to fold that directly into M-x erc, I read enough code to discover it calls thru to a function which does in fact call auth-source-search; so you can revise my function like so:

(defun start-erc ()
  "Start erc computing all the default connection details, which might get the password via auth-source."
  (interactive)
  (let ((password? nil))
    (erc-open (erc-compute-server)
              (erc-compute-port)
              (erc-compute-nick)
              (erc-compute-full-name)
              t  ;; connect
              password?)))

I'm delighted.  But it looks like this facility isn't used as much as I'd expect.

I found it because the helm-delicious package advised using it for my delicious password.

I was making good progress getting all my secrets out of the init file by having a function that would, on demand, load an encrypted elisp file of secrets – (load "secrets.el.gpg").  That works nicely too.

Maybe I should go read up on the freedesktop secret storage scheme.

Docker #5 – benchmark? not really…

Given a Docker image you can spin up a container in lots of places.  For example on my Mac under Boot2Docker, at Orchard, or on Digital Ocean.  I don’t have any bare metal at hand, so these all involve the slight tax of a virtual machine.

I ran the same experiment on these three.  The experiment launches 18 containers, serially.  The jobs varied, but they are not very large.

180 seconds Boot2Docker
189 seconds Orchard
149 seconds Digital Ocean

These numbers are almost certainly meaningless!  I don’t even know what might be slowing things down: CPU, I/O, Swap, etc.

Interestingly, if I launch all 18 containers in parallel I get similar results, +/-10%.  The numbers vary only a few percent if I run these experiments repeatedly.  I warmed up the machines a bit by running a few jobs first.

Yeah.  Adding Google App Engine and EC2 would be interesting.

While Orchard charges $10/month vs. Digital Ocean’s $5, its billing granularity is better: you purchase 10 minutes, and then a minute at a time, vs. Digital Ocean, which bills an hour at a time.  Orchard is also a little more convenient to use than Digital Ocean, though a bit-o-scripting could fix that.

I’m using this for batch jobs.  Hence I have an itch: a batch queue manager for container runs.  That would, presumably, assure that machines are spun up and down to balance cost and throughput.

[Image: queue of containers]

Docker #4: Accessing a remote docker daemon using socat&ssh

Well!  This unix tool is pretty amazing.  Socat lets you connect two things together, where the two things are pretty much anything that might behave like a stream.  There is a nice overview article written in 2009 over here.  You can do crazy things like make a device on machine A available on machine B.  Running this command on machine B will bring machine A’s /dev/urandom over to a local pipe:

socat \
  PIPE:/tmp/machine_a_urandom  \
  SYSTEM:"ssh machine_a socat - /dev/urandom"

What brought this up, you ask.

I have been spinning up machines to run Docker containers on, at Digital Ocean, for short periods of time to run batch jobs.  Docker’s daemon listens on a unix socket, /var/run/docker.sock, for your instructions.  I develop on my Mac, so I need to transmit my instructions to the VM at Digital Ocean.  Let’s call him mr-doh.

One option is to reconfigure mr-doh’s Docker daemon to listen on a localhost TCP port.  Having done that, you can have ssh forward that port back to your Mac, then set/export the DOCKER_HOST environment variable, and you’re good to go.
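
That approach might look roughly like this – a sketch only, and note that the daemon flags have been spelled differently across Docker releases:

# On mr-doh: have the daemon also listen on a loopback TCP port.
docker -d -H unix:///var/run/docker.sock -H tcp://127.0.0.1:2375 &

# On the Mac: forward that port home and point the client at it.
ssh -N -L 2375:127.0.0.1:2375 root@mr-doh.example.com &
export DOCKER_HOST=tcp://127.0.0.1:2375
docker ps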

The problem with that is it adds to the work of spinning up mr-doh, which, if you’re only going to have him running for a short period of time, adds to the tedium.

Well, as we can see in the /dev/urandom example above, you can use socat to forward things.  That might look like this:

socat \
    "UNIX-LISTEN:/tmp/mr-doh-docker.sock,reuseaddr,fork" \
    "EXEC:'ssh -kTax root@mr-doh.example.com socat STDIO UNIX-CONNECT\:/var/run/docker.sock'" &

Which will fork a socat to manage /tmp/mr-doh-docker.sock.  We can then teach the docker client on the Mac to use that by doing:

export DOCKER_HOST=unix:///tmp/mr-doh-docker.sock

When the client uses it, socat will fire up ssh and connect to the docker daemon's socket on mr-doh.  Of course, for this to work you'll want to have your ssh key installed in root@mr-doh's authorized_keys, etc.

For your enjoyment, here is a somewhat raw script which will arrange to bring /var/run/docker.sock back home from mr-doh; get-docker-socket-from-remote prints the export command you’ll need.

It’s cool that docker supports TLS.  You have to set up and manage keys, etc.  So, that’s another approach.