Inference-based Identity

The AOL-search disclosure debacle is interesting in two ways. First, I’m amused that people are shocked by how easy it is to determine someone’s identity from their search queries. Second, I can’t fathom how some faction within AOL thought releasing this data would be a good idea. For some background, The New York Times has a nice article on how a reporter fairly easily tracked down the real-world identity of user 4417749.

But it got me thinking… If search history alone can fairly easily reveal someone’s identity, an IDS / IPS within an enterprise can reveal that and more. These capabilities within IDS and other “big brother” technologies have raised privacy concerns among the users they monitor. “Corporations are not democracies” is the general response: the computer you are using is not your property, nor are the software applications, the network you access, and so on.

Setting aside what sorts of private things your employer could already know about you from your network use, imagine using the same data to establish identity. Let’s call it “inference-based” identity. An inference-based identity system could identify users or devices without their overt participation. Such a system could operate much like an IDS. Companies like Great Bay can already do some of this: they determine whether a given MAC address belongs to a Windows PC, a fax machine, a security card reader, or nearly anything else. They do this by increasing the confidence of their classification with each new piece of data they gather until, voilà, they make an identification.
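To make the “increase confidence with each new piece of data” idea concrete, here is a minimal Python sketch of one way it could work: a naive-Bayes style update over clues observed for a single MAC address, with an identification reported only once one class crosses a confidence threshold. It is not meant to represent Great Bay’s actual technique. The device classes, clue names, likelihood values, and the `DeviceClassifier` class are all hypothetical, and the sketch assumes clues are independent and only updates on clues that are actually observed.

```python
import math
from typing import Dict, Optional

# Hypothetical likelihoods P(clue | device class). Illustrative values only,
# not measurements and not any vendor's real fingerprint data.
LIKELIHOODS: Dict[str, Dict[str, float]] = {
    "windows_pc": {"netbios_name": 0.90, "dhcp_vendor_msft": 0.95, "lpd_port_open": 0.02},
    "fax_machine": {"netbios_name": 0.05, "dhcp_vendor_msft": 0.02, "lpd_port_open": 0.70},
    "card_reader": {"netbios_name": 0.02, "dhcp_vendor_msft": 0.02, "lpd_port_open": 0.05},
}
# Assumed likelihood for a clue that a class has no entry for.
DEFAULT_LIKELIHOOD = 0.1


class DeviceClassifier:
    """Accumulates evidence about one MAC address until one class is confident enough."""

    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold
        # Uniform prior over classes, kept in log space to avoid underflow.
        prior = math.log(1.0 / len(LIKELIHOODS))
        self.log_post = {cls: prior for cls in LIKELIHOODS}

    def observe(self, clue: str) -> None:
        # Naive-Bayes update: add log P(clue | class) to every class's score.
        for cls, table in LIKELIHOODS.items():
            self.log_post[cls] += math.log(table.get(clue, DEFAULT_LIKELIHOOD))

    def posteriors(self) -> Dict[str, float]:
        # Normalize the scores into probabilities (log-sum-exp for stability).
        top = max(self.log_post.values())
        weights = {cls: math.exp(v - top) for cls, v in self.log_post.items()}
        total = sum(weights.values())
        return {cls: w / total for cls, w in weights.items()}

    def identification(self) -> Optional[str]:
        # Report a class only once its posterior clears the threshold.
        post = self.posteriors()
        best = max(post, key=post.get)
        return best if post[best] >= self.threshold else None


if __name__ == "__main__":
    clf = DeviceClassifier(threshold=0.95)
    for clue in ["netbios_name", "dhcp_vendor_msft"]:
        clf.observe(clue)
        print(clue, "->", clf.identification())
    # netbios_name -> None
    # dhcp_vendor_msft -> windows_pc
```

With these made-up numbers, the first clue leaves the Windows PC hypothesis at roughly 93%, below the threshold, so no identification is made yet; the second clue pushes it above 99%. That gap is a small-scale version of the identification delay any inference-based system would have.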

Great Bay is doing this to help organizations migrate to 802.1X, but a system built specifically for the broader identity problem seems more feasible all the time. It would be similar to what Arbor Networks’ DoS prevention does with NetFlow, but working packet-by-packet with the goal of inferring identity. Each packet or flow inspected would provide clues to the identity of the user and host. IDS certainly becomes more powerful when informed by identity, but an IDS that infers identity is something else entirely.

The trend towards encrypted communication certainly makes this sort of classification harder, but I imagine enough is still sent in the clear from most workstations to make a clear identification, and to trend that identification over time as users roam the network. How hard would it be for a network to determine that Tony from accounting isn’t at his normal workstation but is instead using a machine in a conference center? Technologies like 802.1X support mobility today through an overt user action, but paired with an inference-based identity system they could increase security and also cover devices that don’t support 802.1X. Any inference-based system would suffer from delays in determining identity, because the first few packets from a host are rarely enough to establish a reasonable identity. The false-positive problem of an IDS might also appear, but since each individual clue only feeds a broader identification decision rather than triggering an action on its own, the problem might be mitigated. I would expect to see more work in this space in the next couple of years.
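A hypothetical sketch of the Tony-from-accounting case: given (user, host) pairings that some inference stage has already produced, track which hosts each user usually appears on and flag sightings elsewhere. The `RoamingTracker` class, its `min_sightings` and `usual_share` parameters, and the host names are all made up for illustration. `min_sightings` stands in for the identification delay: nothing is flagged until enough history has accumulated.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Optional


class RoamingTracker:
    """Hypothetical sketch: trends inferred (user, host) pairings over time."""

    def __init__(self, min_sightings: int = 3, usual_share: float = 0.2) -> None:
        # Sightings needed before a user's history is trusted enough to flag anything.
        self.min_sightings = min_sightings
        # A host is "usual" for a user if it accounts for at least this share of sightings.
        self.usual_share = usual_share
        self.history: DefaultDict[str, Counter] = defaultdict(Counter)

    def usual_hosts(self, user: str) -> List[str]:
        counts = self.history[user]
        total = sum(counts.values())
        if total == 0:
            return []
        return sorted(h for h, c in counts.items() if c / total >= self.usual_share)

    def record(self, user: str, host: str) -> Optional[str]:
        """Record one inferred sighting; return an alert string if the user has roamed."""
        usual = self.usual_hosts(user)
        enough_history = sum(self.history[user].values()) >= self.min_sightings
        self.history[user][host] += 1
        if enough_history and host not in usual:
            return f"{user} seen on {host}; usual hosts: {usual}"
        return None


if __name__ == "__main__":
    tracker = RoamingTracker()
    for _ in range(5):
        tracker.record("tony", "acct-ws-12")
    print(tracker.record("tony", "conf-room-3"))
    # tony seen on conf-room-3; usual hosts: ['acct-ws-12']
```

A real system would also need to age out old sightings and cope with shared machines; this sketch does neither.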

As an aside, I can’t help feeling that this same data could also be used over time for frightening things. Imagine your employer trending your network access as you do more and more online. They might be able to spot early signs of depression, for example. Imagine the phone call from HR: “We’ve noticed a trend in your Internet use which might indicate you could use some help with your personal life. As a caring employer we’d like to do what we can.” Swap out your employer for your ISP or government and things get even scarier. Big brother indeed.


One Response to “Inference-based Identity”

  1. Sean Convery » Schneier’s Wide-Open Wireless Argument Says:

    [...] Second, Schneier seems to think that the risks to him are as follows: someone breaks into his machine or someone does something illegal using his network. There is a significant third risk he doesn’t cover: the increased risk of identity theft / profiling. Watching the Internet use and search habits of a machine is very easy over an open wireless network. Watching that use over a long period of time could be very revealing (and profitable, just ask Google). What I find borderline hilarious is that the blogosphere proponents of open networks are the very same folks who rightly went a bit bonkers when AOL released the search data of 650,000 users. This data was partially anonymized by removing the screen name of the searcher, but as the New York Times and others reported, it is fairly trivial to analyze searches and derive identity. I wrote about how the same techniques might apply to enterprise identity. What I find funny is that while the damage is at least self-inflicted in the open wireless case, the repercussions could be even more disastrous. With a persistent log of not just your searches but your Internet traffic in total over a period of time, it would be very easy to tell an awful lot about you. If you think the bad guys need to be parked out front to do this, you haven’t spent enough time looking at snack-food wireless antennas. [...]
