So, it’s been about two years since I added anything to this blog. I’ve been busy! The awesome folks at SOURCE gave me a speaking slot at SOURCE Boston 2010, which kicked off a series of talks on the methods consumer-facing companies and websites use to protect customers from online threats. Later in 2010 I was able to participate in some discussions on different types of threat modeling and the situations in which modeling techniques can be useful.
In 2011 I wanted to talk about some more concrete topics, so I spent some time researching how threats and impacts can be better measured. This is an area I’d like to spend more time on, because there’s still a gap between what we can do with high-frequency/lower-impact events (which seem to be easier to instrument, measure, and predict) and lower-frequency/high-impact events (which are very difficult to instrument, measure, or predict). I think the key is that high-impact events usually represent a series or cascade of smaller failures, but there’s more research into change management and economics to be done.
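To see why that gap is so stubborn, here’s a toy Monte Carlo sketch (all the frequencies and dollar figures below are made up for illustration, not real loss data): a few years of observations pin down the frequent/cheap event class reasonably well, while the rare/expensive class mostly shows zero loss and tells you almost nothing about true exposure.

```python
import random

random.seed(7)

def simulate_annual_losses(events_per_year, mean_impact, years):
    """Toy loss model: Bernoulli-per-day event arrivals, exponential impact sizes."""
    p_day = events_per_year / 365.0
    annual = []
    for _ in range(years):
        n_events = sum(1 for _ in range(365) if random.random() < p_day)
        annual.append(sum(random.expovariate(1.0 / mean_impact) for _ in range(n_events)))
    return annual

# High-frequency / low-impact: ~50 events a year at ~$1k each
hf = simulate_annual_losses(events_per_year=50, mean_impact=1_000, years=10)

# Low-frequency / high-impact: ~1 event a decade at ~$500k each
lf = simulate_annual_losses(events_per_year=0.1, mean_impact=500_000, years=10)

# Ten years of the first series gives a usable average; ten years of the
# second is mostly zeros, so any estimate of expected loss is dominated
# by whether the rare event happened to land in the sample window.
print("high-freq yearly losses:", [round(x) for x in hf])
print("low-freq  yearly losses:", [round(x) for x in lf])
```

The instrumentation asymmetry falls out directly: the high-frequency series converges with modest data, while the rare series needs far more history than anyone has.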
Later in 2011 I switched over to describing how analytics can be used to study and automate security event detection. I hope in the process I didn’t blind anyone with data science. (haha…where’s that cowbell?) So here’s what I did:
Risk management at a systemic level is complicated enough that many organizations deem it practically impossible. The mistake many risk managers make is to try to identify every potential exposure in the system, every possible scenario that could lead to loss. This is how risk managers go crazy, since not even Kafka could describe every possibility. Risk management as a discipline does line up nicely with probability theory, but holistic approaches to risk management deviate from the sister science of insurance.
Venice. Yeah. Try and get flood insurance *there*.
Insurance prices the expected value of specific events taking place: what is the probability this car and this driver will be involved in a collision, and how much will the resulting damage cost to repair or replace? Factors include the age and quality of the car as well as the age and quality of the driver, average distance driven per day, geographic area, and traffic conditions. The value of the vehicle is estimated; ranges of collision costs are assumed. Flood insurance is similarly specific: what is the probability this property will sustain damage in flood conditions, and how much will it cost to protect or fix the property? Average precipitation, elevation, foundation quality, and assessed property value are all factored into the decision.
As complicated as actuarial science is, insurance can be written because insurance is specific. Risk management is not specific: it is systemic.
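The “insurance is specific” arithmetic above reduces to a simple expected-value calculation, probability times cost. A minimal sketch, with entirely made-up illustrative numbers (these are not actuarial figures):

```python
def expected_annual_loss(p_event, avg_cost):
    """Expected value of a specific insurable event: probability x cost."""
    return p_event * avg_cost

# Hypothetical auto policy: 5% chance of a collision in a year,
# average claim of $8,000
collision = expected_annual_loss(p_event=0.05, avg_cost=8_000)   # 400.0

# Hypothetical flood policy: 1% chance of flood damage in a year,
# average repair cost of $60,000
flood = expected_annual_loss(p_event=0.01, avg_cost=60_000)      # 600.0

# A premium priced above the expected loss (plus overhead) lets the
# insurer write the policy profitably -- possible only because the
# event is specific enough to estimate both terms.
print(collision, flood)
```

Systemic risk resists exactly this formula: when you can’t enumerate the scenarios, you can’t assign each one a probability and a cost to multiply.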
This post is the first in a series I will be exchanging with Ohad Samet (OK, the second; he’s a much quicker blogger than I am), one of my esteemed colleagues in PayPal Risk and the mastermind behind the Fraud Backstage blog. Read Ohad’s article here.
Despite best efforts to protect systems and assets using a defense-in-depth approach, many layers of controls are defeated simply by exploiting access granted to users. Thus the industry is trying to determine not only how we protect our platforms from external threats, but also how we keep user accounts from being attacked. With user credentials being the “keys” (haha) guarding valuable access to both user accounts and our platforms, a popular topic among the security-minded these days centers on alternatives to standard authentication methods. Typically, the discussion is not about how an enterprise secures its own assets and users, but about arming consumers who come and go across ISPs, search sites, online banking, and social networks, and are vulnerable to identity theft and privacy invasions wherever they roam.
How many information security professionals does it take to keep a secret?
While there are a number of alternatives out there, focusing on authentication as if it’s a silver bullet misses the point. When we assume that keeping our users secure means protecting (only, or above all other things) the shared secret between us, it leaves us over-reliant on simple access control (the fortress mentality) when as an industry we already know that coordinating layers of protection working together is a more effective model for managing risk. To clarify our exposure to this single point of failure, let’s consider:
1) How much exposed (public, or near-public) data is needed to carry out reserved (private) activities? That is, how much private information does a masquerader need in order to approximate an identity?
– and –
2) How does our risk model change if we assume all credentials have been compromised?
Shall We Play a Game…of Twenty Questions?
Really, all this nonsense started when we began teaching users to treat “items that identify us” as “items that authenticate us”. Two examples: SSNs and credit card numbers. The SSN, we know, has been used by employers, banks, and credit reporting agencies, as well as for its original purpose, to identify participation in Social Security (this legislation being considered in Georgia may limit use of SSN and DOB as *usernames* or *identifiers*, although it is silent on using SSN/DOB to verify or authenticate identity).
I’d never heard the term data exhaust (or digital exhaust, thank you Wikipedia), but it’s a handy idea. The proliferation of social media and the internet transformed “media” (generally considered a one-way push system) into “social” (the personalization and intimacy of narrowcast, or at least a many-to-many set of connections). Everyone is set on broadcast, and all broadcasts are stored in perpetuity. If you like data (and I do) and you are interested in how people act, interact, and think (and who isn’t?), the idea that all those feeds and updates and comments would go to waste, well, it’s heartbreaking.
The basic premise of the panel appealed to the entrepreneurial marketer in all web 2.0 hopefuls — consumers are telling you their preferences (loves, hopes, fears, feedback) faster than can be processed, and companies that analyze and parse the data the best will cash in by designing the best promotions and products for the hottest, most profitable market segments. In addition to the new faddish feeds of facebook, twitter, foursquare, and blippy, some classic sources of consumer preference were mentioned: credit card data, point-of-sale purchase (hello grocery store shopper’s club), and government-held spending data. Everyone knows there’s gold in dem dere hills — like mashing up maps and apartments for sale, or creating opt-in rewards programs so you can figure out not just the best offer for Sally Ann, but also for all of her friends.
Philosophically, the reason we’ve been stuck with an advertising model (broadcast brand-centric messages rather than customer-centric, tailored promotions) is that we’ve never had a mechanism for scaling up business-to-consumer communication that’s tuned to the consumer. Now, with the Web and current delivery mechanisms (inbox, feed, banner, search placement, etc.), not only can companies talk to their consumers; the consumers will engage in conversation. The better you know your customers, the better your ability to reach them (promote) and meet their needs (product). As long as you know how to interpret the conversation correctly.
“What people say has no correlation whatsoever with real life, but what they do has every correlation with real life…” Mark Breier, In-Q-Tel
Quickly the conversation moved into some of the techier topics: while there’s a promise of reward for the cleverest analysts, there are some tricky issues too. For one thing, processing exhaust data in a way that makes sense, at scale, requires some serious processing horsepower and an architecture that can accommodate “big data”. While less time was spent on the technical issues than I expected (with DJ from LinkedIn and Jeff from Cloudera on the panel, a full-on Hadoop immersion was a distinct possibility), time was spent both on the computer science research in the area (machine learning, natural language processing) and on the (more interesting to me) current thinking on the most appropriate analytic methods. More time on sentiment analysis and entity extraction would have been great, but would probably require follow-up sessions.
Magoulas from ORA did a nice job outlining some of the issues consumers are having with social marketing: besides basic privacy and data ownership questions (who owns the pics you’ve uploaded to FB? And how should filters be set up so consumers remain in control of information they want to keep semi-private?), there are issues emerging around security, spoofing, and hacking: namely, how much public information is available for consumers to be targeted with. (Marketers can guess your age and who your friends are; more nefarious entities can try the same techniques to learn more private details.) My favorite issue described was the “creepiness factor”, which I can definitely relate to: the idea that consumers may not react happily to services that know them *too* well.
Ultimately, analytic marketing is here to stay. Since consumers create data and leave digital detritus wherever they surf, leveraging that data will continue to pay dividends as service providers nimbly design customer-centric offers and products. Still, concerns about how personal data (whether private or public) is used by unknown intermediaries could either drive withdrawal from social media (if consumers opt to lock down their profiles and share only in private circles) or, a more hopeful thought, drive further innovation in this space. Look for savvy organizations to weave opt-in, consumer-driven agent technology with the power to aggregate (and segment) the behavior and preferences of wide communities of people.