Obfuscated Revealing, Hiding Identity, LOAF

LOAF is a clever attempt to design a system that assigns a reputation score to an email address; you can use these scores to help reduce spam. The reputations are assembled out of contributions made by your correspondents. They contribute their address books. Well, not quite: that would be far too revealing. Instead they obfuscate their address books before revealing them.

How’s this work? When correspondents volunteer to reveal their address books via LOAF, they actually reveal just a useful bit vector. You collect these bit vectors. Later, when you get email from some sender X you’ve never heard of and you’re wondering whether X is a spammer, you can use these bit vectors to make a rough, approximate check of whether X is already in any of your friends’ address books. The LOAF calculation might reply that there is an 80% chance X is in Bob’s address book. Obviously that is useful input to automation trying to decide if X is a spammer. Obviously there are also limits on its value: for all you know, some spammer has forged X’s address.
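The bit vector is essentially a Bloom filter. Here’s a minimal sketch of the idea in Python; the vector size, the hash construction, and names like make_vector and probably_contains are my own illustration, not LOAF’s actual format:

```python
import hashlib

NUM_BITS = 1024   # size of the shared bit vector; a real deployment would tune this
NUM_HASHES = 4    # how many bit positions each address sets

def _positions(address: str) -> list[int]:
    # Derive NUM_HASHES pseudo-random bit positions from the address.
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{address}".encode()).digest(), "big") % NUM_BITS
        for i in range(NUM_HASHES)
    ]

def make_vector(address_book: list[str]) -> int:
    # Obfuscate a whole address book into one bit vector.
    bits = 0
    for addr in address_book:
        for p in _positions(addr):
            bits |= 1 << p
    return bits

def probably_contains(bits: int, address: str) -> bool:
    # False negatives are impossible; false positives happen at some rate,
    # which is where answers like "80% chance" come from.
    return all(bits >> p & 1 for p in _positions(address))

# Bob reveals only the bit vector, never the addresses themselves.
bobs_vector = make_vector(["alice@example.com", "x@example.net"])
print(probably_contains(bobs_vector, "x@example.net"))         # True
print(probably_contains(bobs_vector, "stranger@example.org"))  # almost certainly False
```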

The trouble in that scenario is that you also just invaded Bob’s privacy a little, since you now know with 80% certainty that Bob and X are correspondents; at least enough so that Bob put X into his address book. But wait! X is your mistress! What’s Bob doing corresponding with her?

Consider how you could map over your own address book to get an approximation of who’s in your address book that’s also in Bob’s. What! Bob doesn’t have the boss in his address book?!? Yeah, it looks like Bob’s got the same bookie I do! Of course you can use other sets of email addresses to try to distill a better model of Bob’s address book; for example, the addresses on mailing lists you think he might lurk on.
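Concretely, the intersection attack is just a loop. Reusing make_vector and probably_contains from the sketch above (the address lists here are made up):

```python
# Map my own address book over Bob's published vector to approximate
# the intersection of our address books (false positives included).
my_address_book = ["alice@example.com", "boss@example.com", "bookie@example.net"]
overlap = [a for a in my_address_book if probably_contains(bobs_vector, a)]
print(overlap)  # every hit is a confident guess about who Bob knows
```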

So, the questions this raises: How well can such a design work? How much of one’s identity can you keep private while contributing to such an enterprise? If you could get this design pattern to work well it might be a big help. It is clear, though, that it’s easy to do this badly.

The key to thinking through how to attack such designs is recognizing that the attacker will have a very large set of obfuscated revealings to work with. That’s why the hashed email addresses in FOAF files are a lousy solution: it’s just too easy to start with that data, and with a little sophistication and a bit of brute force the attacker can usually reverse the obfuscation.
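For example, FOAF files publish foaf:mbox_sha1sum, the SHA-1 of an address’s mailto: URI. A plain dictionary attack reverses it; a sketch, with a hypothetical candidate list standing in for the harvested addresses an attacker would really use:

```python
import hashlib

def mbox_sha1(address: str) -> str:
    # FOAF publishes the SHA-1 of the "mailto:" URI as foaf:mbox_sha1sum.
    return hashlib.sha1(f"mailto:{address}".encode()).hexdigest()

# The attacker starts from a large candidate list: harvested addresses,
# common names crossed with common domains, and so on.
candidates = ["alice@example.com", "bob@example.org", "carol@example.net"]
lookup = {mbox_sha1(a): a for a in candidates}

published = mbox_sha1("bob@example.org")   # pretend this was scraped from a FOAF file
print(lookup.get(published))               # 'bob@example.org': obfuscation reversed
```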

This paper about how to figure out if a given laboratory is engaged in secret bio-weapons research is another example of how hard it is to avoid incidental revealing.

I’d be very interested to read some material that critiques the LOAF design! I’d be even more interested in reading something about how to design systems that do obfuscated public revealing: systems that enable various kinds of questions to be asked without enabling aggregation to overwhelm the partial revealing implicit in the obfuscation. It’s not clear to me that it’s actually possible to design such systems.

Update: All the attacks on Bob’s privacy work because we know which bit vector is Bob’s. For the scheme to be useful, what we need is bit vectors from presumably trustworthy people; we don’t need to know which bit vector goes with which person. So if we can launder the bit vectors to remove the identity of who contributed them, while keeping some degree of confidence that the contributor was trustworthy, then the system works again. For example, if you’re a member of a club, you let everybody in the club drop their bit vector anonymously into a box. Then club members can use those vectors to filter their incoming mail. That might tell you that three people in your club might be in correspondence with your mistress, but not which of the 200 members those might be. If the club is large enough, then you need only trust the members on average for the scheme to work.
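Again reusing the earlier sketch, the ballot box is just an unattributed pile of vectors, and a query returns a count rather than a name (the club’s address books here are invented):

```python
# Each member drops a bit vector into the box; no names are attached.
club_books = [
    ["alice@example.com", "x@example.net"],
    ["carol@example.org", "dave@example.com"],
    ["x@example.net", "erin@example.org"],
]
ballot_box = [make_vector(book) for book in club_books]

def endorsements(sender: str) -> int:
    # How many anonymous vectors may contain the sender: a count like
    # "3 of 200", with nothing linking a match back to any one member.
    return sum(probably_contains(v, sender) for v in ballot_box)

print(endorsements("x@example.net"))  # 2, but the box can't say which two
```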
