Category Archives: identity

Identity – Magic Happens

The simple business model for all businesses in the identity space:

IDBizModel.png

The identity business resides inside the cloud labeled “magic happens.” Identity businesses fulfill demand that comes from web sites. They only indirectly serve users. The business thrives if it can generate sufficient magic to satisfy the web site owner. He, of course, won’t be satisfied if the user isn’t reasonably happy. And if the business can’t attract enough users, nobody will be happy.

Sufficient magic is actually the sum of all magic over the set of all users served. So a little magic per user will be enough if the number of users is huge.

That drawing is a bit misleading. Treat the idea of a customized experience very flexibly. A web site that denies a credit charge based on the model the magic builds of the user is customizing the experience for the thief. Other users might not even notice that kind of customization.

Privacy Illusion or Quiz?

The marvelously clueful Jon Udell writes about the return of Hailstorm-like systems. One line caught my eye.

“Re-entering the basic facts each time perpetuates an illusion of privacy. Yet the reality, for many of us, is that these facts are public.”

Yes! “The illusion of privacy.” Very nice.

But… four additional things too.

1. It’s very, very hard to pre-fill forms with 100% accuracy. Even small error rates are enough to overwhelm any increase in sales that might arise from greater ease of use, particularly if the blame for those errors falls on the vendor.

2. The vendor wishes to frame the relationship with the customer as respectful of the customer. Both vendor and customer may know that such information is widely available, but polite people don’t bring it up. If you do, it’s unclear whether you can be trusted with the more semi-private information. The vendor that pre-fills your home phone number seems more likely to reveal your pants size or your color preferences to strangers.

3. Sales is like poker. You don’t want to reveal anything during the negotiation that you’re not absolutely positive is going to advance the deal toward closure. For example, suppose you are buying a car and it has floor mats with a stain-resistant coating on them. No salesman in his right mind will tell you that unless you explicitly reveal that you’re worried about the carpets getting dirty. Unless he’s sure the information will fill a need the customer has mentioned, it goes unmentioned. Who knows, maybe this customer is afraid of cancers caused by such coatings. Vendors are very ambivalent about revealing their hand.

4. Finally, some of these forms are actually part of a quiz. The form is a means of reducing identity theft: the credit card company can use any information the vendor collects at the point of sale to reduce the chance of fraud.

You have to get authorization prior to any form pre-filling; not just from the customer but from everybody who has a stake in the data. For example, you’re going to have trouble getting medical records without the caregiver’s permission as well as the patient’s.

Even very mundane-seeming revelations of information face strong systemic barriers to casual revealing.

Anonymous Reputation

I’m finding it very interesting to look at the challenges of creating a reputation system that allows its participants to remain anonymous. I think this is key. The right solution to the Internet identity design problem must support keeping the user’s identity compartmentalized. Only that can maintain privacy. If on the one hand we want communications that are more usefully tied to an actor’s reputation, while on the other hand we want to keep that actor’s total identity fragmented, then we must find a way for him to maintain a number of personas on the net. The basic persona he adopts should be quite private, quite anonymous.

Consider as a benchmark the spam problem. This is the problem of guarding open systems from bad actions and bad actors. This problem arises in open comment systems (e.g. blog comments), open web site editing systems (e.g. wikis), open messaging systems (e.g. internet email), and of course open source and open science.

All the solutions focus on sorting. Sorting actions into good ones and bad ones. Sorting actors into good ones and bad ones.

Lots of tricks exist for sorting the actions. For example, filtering out postings with bad words or links to bad sites. Or training statistical recognizers to let good things through and shuttle bad-looking things off for further analysis or disposal. Or having a moderator or editor pass judgment on the individual actions.

The bad-actor mechanisms work by building models of the various actors. Then, when sorting actions, we inform that sorting with the reputation of the actors involved. “Oh look, it’s the 10th posting from the same IP address in 30 seconds.” You might glance at the sender of an email message and say “Oh, Bob. He’s a good egg.” or “Ah, email from apache.org, they’re cool.”

By design, for privacy reasons, most internet protocols make the mapping from actions back to actors very sloppy. It wouldn’t be hard, technically, to fix this. For example, senders could sign every message using a private key. Then recipients could, with the help of some directory services, map the signature back to the sender and from there to any number of services that could vouch for his reputation.
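To make that concrete, here is a minimal sketch in Python of what signing and verifying might look like, using the third-party cryptography package. The framing and the directory-lookup step are my own invention for illustration, not any existing mail standard.

```python
# Sketch only: signing a message so recipients can tie it to a stable key.
# Assumes the third-party "cryptography" package; a real mail system would
# use an established standard rather than this toy framing.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Sender side: a long-lived keypair stands in for the sender's identity.
sender_key = Ed25519PrivateKey.generate()
message = b"Subject: hello\n\nLunch on Tuesday?"
signature = sender_key.sign(message)

# Recipient side: verify the signature, then (hypothetically) ask a
# directory service which reputation services vouch for this key.
sender_public = sender_key.public_key()
try:
    sender_public.verify(signature, message)
    print("signature checks out; look up reputation for this key")
except InvalidSignature:
    print("signature is bad; treat the message with suspicion")
```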

This hasn’t happened, partly because shifting the installed base to some standard solution would be hard, but more so because it would assure the total collapse of any privacy for senders. It would make every message they send part of their record. That this record is highly distributed today is small comfort. It would enable Big Brother.

Any system that is going to be popular with real people for casual usage needs to allow for anonymous senders. And it’s not just the senders who desire this. If I’m running any one of the many kinds of open systems enumerated at the head of this message, I don’t wish to demand full disclosure from my contributors. I only want two things: lots of contributions, and a way to temper the damage done to my systems by bad actors. If I’m running a retail store I don’t want to demand that my visitors reveal their entire persona just to browse my offerings!

Is it possible to have useful actor reputation systems without demanding that the actors give up their privacy? This is a key design problem.

It appears that the answer is yes. Consider an example. Let’s say I have an excellent reputation in some community. I request that the community write me a letter of introduction to the anonymous community. This letter says nothing more than that the bearer is a good guy. I take the note to the anonymous community and they provide me with a reputation/identity that I can use for anonymous actions. Recipients of those actions can then check that anonymous reputation. If I act badly in that persona they place bad marks on the anonymous reputation; but these do not flow back to my original reputation – there is no back pointer. The only back pointer available is the link to the original community. I have damaged the reputation of my home community, and only that.
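Here is a toy sketch of that flow, assuming Python and the cryptography package. To be clear, this only illustrates the moving parts; a real design would need something like blind or group signatures so that the two communities cannot link the personas by colluding.

```python
# Toy sketch of the "letter of introduction" flow described above.
# NOT a real anonymous-credential scheme: if the home community and the
# anonymous community collude they can link the two personas.
import os
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

home_community_key = Ed25519PrivateKey.generate()       # community I have a reputation in
# Step 1: the home community writes a letter that names nobody, just a random note id.
letter_body = b"the bearer of this note is in good standing:" + os.urandom(16)
letter_signature = home_community_key.sign(letter_body)

# Step 2: the anonymous community checks the letter came from a community it trusts...
home_community_key.public_key().verify(letter_signature, letter_body)

# ...and mints a fresh pseudonym with its own, separate reputation record.
pseudonym = os.urandom(16).hex()
reputation = {pseudonym: {"issued_under": "home-community", "bad_marks": 0}}

# Step 3: if the pseudonym misbehaves, only the pseudonym and (indirectly) the
# issuing community take the hit; there is no pointer back to the person.
reputation[pseudonym]["bad_marks"] += 1
print(pseudonym, reputation[pseudonym])
```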

It’s an interesting cryptographic design problem. Could we design a system where sufficiently bad actions on the part of the anonymous actor can be fed back to his original persona, but that does not require us to trust the anonymous reputation communities to guard his privacy otherwise?

privacy stories

This morning I read an article about the resignation of a VP at a discount airline. Let’s call it airline B. I loved the story as a marvelous example of secrecy and privacy in the modern world.

Three airlines. The guy once worked for airline C, which was later acquired by airline A. So this guy was allowed a nice discount for flights on airline A. Favored folks, like this guy, were allowed to fly cheap on seats that would otherwise be empty. How was he to know which seats were empty? He could log into a special website.

So this guy goes to work for airline B, which competes with airline A. Time passes and the folks at airline B write some software to log into the special website; using that info they know which routes are heavily booked and which aren’t. Presumably they use that in their own route planning.

Airline A finds out about Airline B’s clever scheme. What do they do? They hire some guys to pick up the trash at the home of some VP at Airline B. Curious about his eating habits? Nope, they are looking for paper.

Oh no! The trash has been through a shredder! No problem. They hire a company to digitally reconstruct his trash. (Don’t you just love that phrase: “digitally reconstructed his trash”?)

Of course this is all hearsay, but I read it in the New York Times.

Where’s the crime? Is it a crime to use data you glean from a web site? Is it a crime to collect somebody’s trash? Are there limits to what you can do with that trash? Well?

LENS

I’m making a collection of model aggregators; i.e. firms that collect information about a group of people from many parties and then turn around and reveal that to another group.

The New York State driver’s license system is interesting. The regulations that govern the revealing of driver’s license information are found in a law: the DPPA, or Driver’s Privacy Protection Act. Its “permissible use” section enumerates who’s allowed to get the data. Boy! They permit revealing to a lot of parties.

I was particularly struck by LENS, or the License Event Notification System. It appears that you can get a speeding ticket on the way to work and, before you get into the office, your boss’s Human Resources department can know about it. And people complain about inefficient government!

Here are some other examples:

I’d love to know about other examples!

Making a collection of members of this species helps in discovering the attributes they tend to have. Here’s a very short list, as an example.

  • Event notification.
  • Permissible use.
  • Dispute resolution.
  • Foo Privacy Protection Act (e.g. DPPA, HIPAA, …)

A future so bright there is no place to hide!

Identity/Privacy – This week’s model

I’ve spent much of the week playing with a different model of the identity problem than I usually use. This model arose because I wanted to draw some pictures to help people visualize how Joe’s internet identity is the union of models held by firms: your bank has one, DoubleClick has another, etc.

I spun this story: Joe couldn’t get a mortgage from Mort, his mortgage company. Mort found something troubling in his credit report. Mort bought that report from Cret, a credit reporting firm. Cret got the troubling information about Joe from his bank. Seems Joe opened an account at the bank and then immediately had an NSF (non-sufficient funds) event. I happen to know that this happened because the bank charged Joe for his new checks before this very same bank had cleared the funds for his initial deposit; but of course Mort and Cret don’t know that.
Now initially my plan was just to use that story to point out that there were four models of Joe in the story: Joe’s, the bank’s, Cret’s, and Mort’s. But my attention wandered. I drew the picture below of the relationships. At that point I got interested in the relationships, the transaction flows, and the governance rules around them. Notice how Cret has no relationship with Joe.
JoeMortgage4Players.png
This helps you think clearly about the rules and the privacy puzzles in this story. For example Joe, probably unknowingly, licensed the bank to reveal information when he signed up for his account. In addition to the contracts that govern the relationships between pairs in that drawing, there are the laws and regulations of various governments and industry consortia in play; for example my state has laws on the books that give Joe at least a chance to deal with Cret. Of course there are also ethical and cultural norms.
We can also add in the various models and the cycle of revealing that got Joe’s mortgage request declined.
JoeMortgageWithModelsEtc.png
When thinking about problems like this I try to sympathize with each of the roles. The mortgage company, for example, is seeking to do due diligence, or a background check, on Joe. They are looking for a trusted third party, or at least a disinterested third party. Notice how Cret’s lack of a direct relationship with Joe adds to their claim of being a disinterested third party.

I also like to engage in various exaggerations of the model. See what turns up when you look at edge cases. So here’s another story. Later that day Joe got turned down for a date with Sally. Sally asked her friends about Joe. The rumor mill reported back that Joe was a slob. Since this is my story I happen to know that the rumor mill came to know this because fastidious Mary once saw Joe when Joe was having a particularly bad hair day. This story has the same four players and the same schematic as Joe’s tough time with the mortgage company.
JoeDate.png
Joe desired to fix the credit reporting firm’s model of him. Now he wants to fix the model the rumor mill has of him. Joe’s got some real challenges ahead! My state has rules and regulations to help him with the first. My culture, the American one, has rules too – we love to hand out second chances.

In the dating story the rumor mill fills the role of disinterested third party when Sally goes off to get a background check on Joe. Joe’s model of that is that they are talking behind his back. Which they are.

As a break from the fun of playing with the stories, here are possible names for the roles shown in these examples (besides Joe himself, the subject of all these models).

  • Model Revealer – aka Joe’s Bank, or Fastidious Mary.
  • Model Aggregator – aka the credit reporting firm, or the rumor mill.
  • Model Builder – aka the mortgage company, or Sally.

Then I read this article about alibi clubs (see also). Average Joes solving these problems for themselves!

So a third story, this time with an alibi club. Joe turns out to be a jerk. In fact he’s married! His wife suspects he’s running around. She accuses him of trying to get a date with Sally. But wait! Joe is a member of an alibi club. He sends a text message to the club: “Quick, need alibi!” One of the members volunteers and gives him a call. They work out an alibi and later, as the argument with his wife proceeds, he says: “Look, you don’t trust me? Call the garage! They’ll back me up. Here, damn it, I’ll even get you the damn number.” The wife calls the ‘garage’ and the alibi is delivered.

In this story the alibi club has captured the role of trusted disinterested third party. Joe and his friends in the alibi club have cut out the middleman and fraudulently simulated the top part of the revealing cycle. Oh my God! It’s another example of the Internet disintermediating!

As one of my friends pointed out at about this point, this model is an excellent generator of stories and crimes.

The Five Roles of Identity

At its heart the problem of network identity is how to manage the model of the user available to web sites. Users dream of a design that’s explicit, practical, and respects their privacy. Web sites covet different aspects of the user model. The fashion web site may desire to know the user’s hair color. The travel web site may desire to know when your employer is planning a summer shutdown. The bank site may desire a statement of account for your current mortgage.

The demand for better models of visitors is what drives the market for identity solutions. For example, it’s what keeps DoubleClick in business. DoubleClick aggregates statistical models of users from their browsing habits and then sells those to web sites. Web sites then use them to target their marketing. For all I know, if you tell one of their clients your hair color, DoubleClick may well add that to their model.

Such implicit, statistical models of users don’t scale up to handle the revealing of more serious information (e.g. medical records, mortgage statements, video rental records), because of regulatory protections. Sadly, in some cases these regulatory protections are no more solid than community expectations. I would certainly make a fuss if L.L. Bean sold information about my pants size to Amazon; but I wouldn’t actually be surprised. Few of us are surprised that if you reveal a wish list or rate a product at Amazon it affects how they customize the web site for you.

The design challenge here is how to make the management of this revealing more explicit. Something that users can understand, manage, manipulate, control. Something that regulators can then write practical rules about. Something that can be governed well. Something that tempers how much power concentrates into a few hands. If such hubs are absolutely necessary we presumably want to assure they are well governed. Tough problems.

Any solution will have to respect and balance the concerns of all the market participants. Broadly, there are five roles in this passion play. In the long run none of them is weak. Users, though, tend to be slow in exercising their power.

IDFiveRoles.png

Intermediaries get a lot of the attention here. DoubleClick, Passport, and Gator are commercial examples. These players dream of solutions that tend to concentrate the market’s power into their hands. The regulatory foundation also gets a lot of attention. That includes standards bodies like the Liberty Alliance; pseudo-standards certification organizations like eTrust; and governments (e.g. the EU’s privacy regulations). The regulators tend to dream of getting a single standard to “rule them all.” They also tend to work to limit how much market concentration emerges in the roles above them.

The solution vendors, i.e. the folks who don’t actually run services but instead provide tools to those that do, may dream of owning the entire market, but they are also very interested in assuring that a large number of customers for their tools survive. There are a huge number of examples in this role; just to pick two random ones: the authentication tools found throughout the open source middleware community, and Novell’s Oblex solution that is widely used inside firms. There is a notable subgroup in the solution provider space, the patent holders. Note also that standards bodies often provide a means to aggregate a patent portfolio.

Many real-world examples are hybrids of these five classes. For example Yahoo, which is primarily a site, also does authentication a la Passport for some partners. These hybrids seem to have internal tensions between their roles.

Market concentration in all these categories is, presumably, power-law distributed. For example, DoubleClick and Passport are both among the top hundred sites by traffic.

Interesting market, interesting design problem.

Obfuscated Revealing, Hiding Identity, LOAF

LOAF is a clever attempt to design a system that would assign a reputation score to an email address. You use these scores to help reduce spam. The reputations are assembled out of contributions made by your correspondents. They contribute their address books. Well, not quite; that would be far too revealing. Instead they obfuscate their address books before revealing them.

How does this work? When correspondents volunteer to reveal their address books via LOAF they actually reveal just a useful bit vector. You collect these bit vectors. Later, when you get email from sender X, whom you’ve never heard of, and you’re wondering whether X is a spammer, you can use these bit vectors and LOAF to make a rough approximate check to see if X is already in any of your friends’ address books. The LOAF calculation might reply that there is an 80% chance X is in Bob’s address book. Obviously this is useful input to automation trying to decide if X is a spammer. Obviously there are limits on its value. For all you know some spammer has forged X’s address.
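Assuming the bit vectors behave like a Bloom filter (LOAF’s actual parameters and encoding may differ), a small Python sketch shows the basic mechanics: membership tests answer “probably yes” or “definitely no”, never an exact list of addresses.

```python
# Sketch of a Bloom-filter-style bit vector for an address book.
# LOAF's real wire format may differ; the behaviour is the point.
import hashlib

SIZE = 4096    # bits in the filter
HASHES = 4     # hash positions per address

def _positions(address: str):
    for i in range(HASHES):
        digest = hashlib.sha256(f"{i}:{address.lower()}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % SIZE

def make_filter(address_book):
    bits = [False] * SIZE
    for address in address_book:
        for pos in _positions(address):
            bits[pos] = True
    return bits

def probably_contains(bits, address):
    # True means "probably in the book" (false positives possible);
    # False means "definitely not in the book".
    return all(bits[pos] for pos in _positions(address))

bobs_bits = make_filter(["alice@example.com", "x@example.net"])
print(probably_contains(bobs_bits, "x@example.net"))         # True
print(probably_contains(bobs_bits, "stranger@example.org"))  # almost certainly False
```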

The trouble in that scenario is that you also just invaded Bob’s privacy a little. You now know with 80% certainty that Bob and X are correspondents. At least enough so that Bob put X into his address book. But wait! X is your mistress! What’s Bob doing corresponding with her?

Consider how you could map over your own address book to get an approximation of who in your address book is also in Bob’s. What! Bob doesn’t have the boss in his address book?!? Yeah, it looks like Bob’s got the same bookie I do! Of course you can use other sets of email addresses to attempt to distill a better model of Bob’s address book; for example, the addresses in mailing lists you think he might lurk in.
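Using the sketch above, that mapping attack is only a couple of lines; the addresses here are, of course, hypothetical.

```python
# Reusing the Bloom-filter sketch above: approximate the overlap between my
# address book and Bob's by testing each of my contacts against his bit vector.
my_address_book = ["boss@example.com", "bookie@example.net", "mistress@example.org"]
overlap = [a for a in my_address_book if probably_contains(bobs_bits, a)]
print(overlap)  # an approximate, false-positive-prone peek into Bob's address book
```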

So, the question this raises: how well can such a design work? How much of one’s identity can you keep private while contributing to such an enterprise? If you could get this design pattern to work well it might be a big help. It is clear, though, that it’s easy to do this badly.

The key to thinking through how to attack such designs is recognizing that the attacker will have a very large set of obfuscated revealings to work with. That’s why the hashed email addresses in FOAF files are a lousy solution. It’s just too easy to start with that data; with a little sophistication and a bit of brute force the attacker can usually reverse the obfuscation.
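For example, here is a sketch of the brute-force attack on hashed addresses. As I understand it, FOAF’s mbox_sha1sum is the SHA-1 of the mailto: URI; an attacker who has harvested a big list of candidate addresses simply hashes them all and looks for matches.

```python
# Sketch of reversing hashed email addresses by dictionary/brute force.
import hashlib

def mbox_sha1sum(email: str) -> str:
    # Assumes the FOAF convention of hashing the full "mailto:" URI.
    return hashlib.sha1(f"mailto:{email}".encode()).hexdigest()

def crack(target_hashes, candidate_addresses):
    # Hash every candidate address and look for matches among the targets.
    table = {mbox_sha1sum(a): a for a in candidate_addresses}
    return {h: table[h] for h in target_hashes if h in table}

# Hypothetical example: hashes scraped from FOAF files, candidates scraped
# from mailing-list archives, web pages, old spam runs, and so on.
targets = {mbox_sha1sum("victim@example.com")}
print(crack(targets, ["victim@example.com", "someone@example.org"]))
```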

This paper about how to figure out if a given laboratory is engaged in secret bio-weapons research is another example of how hard it is to avoid incidental revealing.

I’d be very interested to read some material that critiques the LOAF design! I’d be even more interested in reading something about how to design systems that do obfuscated public revealing, enabling various kinds of questions to be asked without letting aggregation overwhelm the partial privacy the obfuscation provides. It’s not clear to me that it’s actually possible to design such systems.

Update: All the attacks on Bob’s privacy work because we know that we have Bob’s bit vector. For the scheme to be useful, what we need are bit vectors from presumably trustworthy people. We don’t need to know which bit vector goes with which person. So if we can launder the bit vectors to remove the identity of who contributed them, while keeping some degree of confidence that the contributor was trustworthy, then the system works again. For example, if you’re a member of a club, you let everybody in the club drop their bit vector anonymously into a box. Then club members could use those to filter their incoming mail. That might tell you that three people in your club might be in correspondence with your mistress, but not which of the 200 members they might be. If the club is large enough then you need only trust the members on average for the scheme to work.
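Continuing the earlier Bloom-filter sketch, the club’s box of anonymous bit vectors might be queried like this (again, purely illustrative):

```python
# Reusing make_filter/probably_contains: a club pools bit vectors with no
# names attached. A member can ask "how many club members seem to know this
# sender?" without learning which members those are.
club_filters = [
    make_filter(["alice@example.com", "x@example.net"]),
    make_filter(["carol@example.org"]),
    make_filter(["x@example.net", "dave@example.com"]),
]  # dropped anonymously into the box

def vouch_count(filters, address):
    return sum(probably_contains(bits, address) for bits in filters)

print(vouch_count(club_filters, "x@example.net"))  # e.g. 2 of the 3 anonymous members
```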

TypeKey – The Central Authority

Let me take a stab at solving the second of the two large problems I
see with the current TypeKey design.

The Problem

In its most brutal form the second problem with TypeKey is that it is a land grab. It puts Six Apart in the position, intentionally or not, of making exactly the same mistake that Microsoft made with Passport.

The design, presumably for simplicity, assumes that there is one authority that everybody using TypeKey will turn to for their authentication services; i.e. www.typekey.com. That makes typekey.com the central authority for some universe of authentication. Today this is blog comments. Tomorrow it might be wiki contributions. The next day – who knows?

While at first blush this appears to be very valuable turf to grab, if you are too greedy you destroy the value of the turf you’re getting. That’s the lesson that Microsoft has hopefully learned from Passport.

Sure, you can build a system with a single central authority. Yes, people will sign up for it. The trouble is that you then force other serious players into a subordinate position. Other powerful players don’t like that. They have no interest in playing a subordinate role to you.

The second problem with having a single central authority is that it encourages the emergence of a monopoly. While I may think very highly of the folks at Six Apart, I don’t think so highly of them as to believe they should be encouraged to grab a dominant role in the authentication of contributors of open/free content.

Solving the problem, technically, isn’t that hard. It is harder than solving the design flaw of revealing a globally unique identifier for everybody, though.

A Solution

What is required?

Users and sites need to be able to sign up with multiple authentication services.

The more the merrier. In fact, it would be best to design with the presumption that there will be a few hundred or even thousands of these “authorities”.

The TypeKey design bounces the user over to the single central authority.

In a design that avoids a single central authority the site needs to
infer one or more authorities to bounce the user over to.

The key added complexity is getting a list of these authorities before we bounce the user over to them for authentication. This takes two steps, where the simpler TypeKey design requires only one. First we look up the user’s preferred authorities. Second, we ask one or more of them to authenticate/vouch for the user.
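A rough sketch of those two steps, in Python with entirely hypothetical service names and URL parameters (this is not the actual TypeKey protocol):

```python
# Hypothetical sketch of the two-step flow: (1) look up the user's preferred
# authorities, (2) bounce the user to one of them for authentication.
from urllib.parse import urlencode

# Step 1: a discovery service, a shared cookie, or similar maps a user hint
# to that user's preferred authorities.
PREFERRED_AUTHORITIES = {
    "joe@example.com": ["https://auth-a.example", "https://auth-b.example"],
}

def lookup_authorities(user_hint: str):
    return PREFERRED_AUTHORITIES.get(user_hint, ["https://default-authority.example"])

# Step 2: redirect the commenter to the first authority that will vouch for them.
def login_redirect(user_hint: str, return_to: str) -> str:
    authority = lookup_authorities(user_hint)[0]
    return f"{authority}/login?{urlencode({'return_to': return_to})}"

print(login_redirect("joe@example.com", "https://blog.example/comment"))
```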

A Possible Implementation

How do we look up the user’s preferred list of authorities?

There are lots of fanciful ideas for how to do this – most are not practical. We might modify the installed base of web browsers so that users could send their preferred set of authorities as part of their browsing. We might have the user run through a proxy server provided by his ISP, and that proxy server could insert the list of preferred authorities.

There are two reasonably practical approaches.

First, we could introduce a central authority whose only role is to return the user’s list of preferred authorities. The TypeKey folks could volunteer to do that, in effect offering to redirect queries about a given user to other authorities if that user has asked for that.

Alternately, we could use tricks involving browser cookies. Each site would then use these cookies to get the user’s authority list. This solution is somewhat better, at least it’s faster, than the first. It has a similar design challenge in that somebody would have to manage the domain used to hold the cookies shared across all these sites.
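A sketch of the cookie variant, again with made-up names: the shared-domain cookie simply carries the user’s authority list, so the site can read it without an extra round trip.

```python
# Hypothetical sketch: a cookie set on a shared domain carries the user's
# authority list, delimited and URL-encoded; each site just parses it.
from urllib.parse import unquote

def authorities_from_cookie(cookie_value: str):
    # e.g. "https%3A%2F%2Fauth-a.example|https%3A%2F%2Fauth-b.example"
    return [unquote(part) for part in cookie_value.split("|") if part]

print(authorities_from_cookie("https%3A%2F%2Fauth-a.example|https%3A%2F%2Fauth-b.example"))
```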

Neither of these solutions is too hard to implement. Both solve the problem of a single dominant vendor capturing the role that TypeKey is working to fill.

One final point. I don’t believe that introducing these mechanisms will reduce TypeKey’s success as a major player in the blog authentication industry. In fact, I suspect that making changes along these lines will accelerate adoption, because it will reduce the paranoia created by making the role of authentication server scarce.