Email Hashing: What Could Go Wrong?

You can’t un-ring a bell, but you might be able to un-hash an email, depending on whom you ask. In order for marketers to safely use hashes for targeted advertising, they must stay abreast of the latest hashing formats.

Email hashing involves converting an email address to a hexadecimal string. Each time an email address is run through a hashing algorithm, it produces a jumble of numbers and letters which, in theory, cannot be tied back to that email address or the individual it belongs to.
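As a rough illustration of the conversion described above, the following Python sketch hashes an email address into a hexadecimal string. The choice of SHA-256 and the trim-and-lowercase normalization step are assumptions made for the example; the programs mentioned in this article may specify their own rules.

```python
import hashlib


def hash_email(email: str) -> str:
    """Return the SHA-256 hex digest of an email address.

    Normalization (trim whitespace, lowercase) is an assumption for this
    sketch, so that cosmetic differences do not change the digest.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


digest = hash_email("  My.Name@EmailAddress.com ")
print(digest)       # 64 hexadecimal characters
print(len(digest))  # 64
```

The same input always yields the same digest, which is what makes the hash usable as a matching key and is also the source of the privacy questions discussed below.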

Facebook and Twitter use it for their respective CRM matching programs, Custom Audiences and Tailored Audiences.

AdExchanger asked a handful of technology chiefs, security researchers and privacy executives to explain the privacy risks associated with email hashing.

To what extent can email hashing be cracked, and what can the industry do to respect consumer PII?


Santiago Pontiroli, security researcher, Kaspersky Lab

"A problem that arises from adopting this procedure is that, as more and more services become part of the [logging-in] trend, the consumer’s online history will be revealed slowly but surely. Since consumers rarely change their primary email address and use them for several years, marketing companies can know users' current and past online behavior. Email resembles a digital passport that many people can't hide, leaving breadcrumbs of information that targeted advertisers can use to follow consumers' activity."

Richard Maathey, business information security officer, Experian

"There are several different types of email hashing, and there are a few that we typically recommend based on the industry standard. MD5 and SHA1 are widely acknowledged to be insufficient today. The type of hashing we recommend as a company is SHA2, and there are other cutting-edge companies using things like bcrypt and scrypt. Those are generally the industry standard for good hashes today. … The processing power of the computers we use has grown exponentially. It’s important to keep in mind what’s currently considered acceptable. You could argue that SHA2 is probably on its way out, but for today, it’s fine. … As the speed of computer processing increases, some forms of hashing that were previously thought to be unbreakable are breakable. … But the bigger concern in our industry is how we protect those emails in general, from a security standpoint. Security throughout the whole chain of custody is only as good as the weakest link. You have to look at it in totality. There is no silver bullet."
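One way to see why fast hashes become "breakable" as computing power grows: an attacker does not need to invert the function, only to hash plausible guesses and compare. The Python sketch below shows that guessing approach against an unsalted MD5 hash. The candidate names and domains are hypothetical, and a real attack would try vastly larger lists.

```python
import hashlib
from typing import Iterable, Optional


def md5_hex(email: str) -> str:
    """Unsalted MD5 of a lowercased email address (for illustration only)."""
    return hashlib.md5(email.lower().encode("utf-8")).hexdigest()


def guess_email(target_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash each candidate and return the first one that matches the target."""
    for candidate in candidates:
        if md5_hex(candidate) == target_hash:
            return candidate
    return None


# Hypothetical guess list built from common name patterns and domains.
names = ["jane.doe", "john.smith", "jdoe"]
domains = ["example.com", "example.org"]
candidates = [f"{name}@{domain}" for name in names for domain in domains]

leaked_hash = md5_hex("jane.doe@example.com")
print(guess_email(leaked_hash, candidates))  # jane.doe@example.com
```

The faster the hash function, the more guesses per second an attacker can test, which is one reason deliberately slow functions such as bcrypt and scrypt come up in this discussion.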

Jeff Northrop, chief technology officer, International Association of Privacy Professionals

"Hashing masks the email itself and prevents it from being used or infiltrated in a breach. But the hash still uniquely identifies an email, even if you don’t know the email address. … If you hash an email and you have a home address along with it, it’s not hard to personally identify the individual attached to that hashed information. Hashing is a good solution, but it’s by no means a perfect way to permanently mask someone’s identity. … There are different hashing algorithms and methods of reversing those algorithms, but I don’t think that’s the real risk with hashing. If you use the proper hash algorithm, it can be very difficult to extract the original value, and the value of the email address probably isn’t worth the effort. … If you have a unique identifier tied to an individual, you’re not going to necessarily be able to recover what the email address is, but that doesn’t mean you can’t use that identifier with other information that’s collected along with it to re-identify an individual. That’s the real privacy risk."
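The re-identification risk described here does not require recovering the email at all. In the Python sketch below, two hypothetical datasets that share only a hashed email are joined into one profile. All field names and values are invented for the example, and SHA-256 is an assumed choice of algorithm.

```python
import hashlib


def hash_email(email: str) -> str:
    """SHA-256 hex digest of a trimmed, lowercased email address."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


key = hash_email("my.name@emailaddress.com")

# Hypothetical dataset 1: browsing activity keyed by hashed email.
site_visits = {key: ["sports-news.example", "car-reviews.example"]}

# Hypothetical dataset 2: offline records keyed by the same hash.
loyalty_records = {key: {"street": "1 Hypothetical Way", "city": "Springfield"}}

# Join on the hash: the email stays hidden, yet the person is identifiable.
profiles = {
    h: {"visits": site_visits[h], **loyalty_records[h]}
    for h in site_visits.keys() & loyalty_records.keys()
}

for hashed, profile in profiles.items():
    print(hashed[:12], profile)
```

Nothing in the joined profile contains the address itself, but a home address sitting next to a browsing history identifies the person just as well.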

Aaron Kechley, SVP of products, DataXu

"There are always risks associated with handling sensitive information, and email hashing is no exception. If you are going to trust someone to do it, the most important thing is that they are following best practices for data encryption and access control. If someone is working with hashed emails in this way, then I believe the risk of something going wrong is low. But marketers should understand that as a category, they tend not to be especially security-minded, and so vendors are not typically asked lots of hard questions about data security. I believe this has led to complacency in the vendor community, which could imply there is higher risk than is technically necessary."

Anneka Gupta, VP of product, LiveRamp/Acxiom

"Since the same data going through a hashing recipe produces the same result, when two parties use the same hashing recipe, it produces the same representation. These one-way hashing functions are not reversible, which makes hashing technologies a very secure approach to protecting data. … In the online advertising space, it's essential to keep the privacy promises made to consumers, yet deliver advertising experiences that are relevant and positively impact the consumer journey. This means we protect consumers’ online anonymity, where promised, by creating technical barriers preventing associating PII with online devices. Hashing email addresses is a piece of the process that allows us to provide advertising capabilities in an anonymous world."
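The point about two parties sharing a "hashing recipe" can be sketched in Python as follows. The email lists are hypothetical, and the recipe shown here (trim, lowercase, SHA-256) is an assumption for the example rather than any vendor's documented specification.

```python
import hashlib


def recipe(email: str, algorithm: str = "sha256") -> str:
    """Normalize an email, then hash it with the named algorithm."""
    normalized = email.strip().lower().encode("utf-8")
    return hashlib.new(algorithm, normalized).hexdigest()


# Hypothetical lists held separately by a brand and a publisher.
brand_crm = ["Ann@Example.com", "bob@example.com", "cara@example.com"]
publisher_logins = ["ann@example.com ", "dave@example.com", "cara@example.com"]

brand_hashes = {recipe(e) for e in brand_crm}
publisher_hashes = {recipe(e) for e in publisher_logins}

# Same recipe on both sides: shared customers produce identical hashes.
matched = brand_hashes & publisher_hashes
print(len(matched))  # 2

# Different recipe on one side: nothing lines up, even for shared customers.
mismatched = brand_hashes & {recipe(e, "sha1") for e in publisher_logins}
print(len(mismatched))  # 0
```

Only the hashes need to change hands for the match to work, which is why both sides have to agree on the algorithm and the normalization beforehand.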

Dave Hendricks, president, LiveIntent

"Hashing is a routine procedure that provides the safest way to share so-called ‘de-identified’ email address data between two parties. It happens when you log in to a website, for example. Hashed emails can’t be mailed – merely matched for what is called CRM retargeting. Hashing, when used for CRM remarketing purposes, has no soft spots. It’s only useful to match user data between two parties, typically a brand and a publisher. If you found a random thumb drive on the street and it contained a file consisting of hashes, it would be useless. Marketers should not fear the hash. Instead, they should revere its ability to rescue us from the world of third-party cookie targeting."

2 Comments

  1. We have used MD5 hash almost since our inception. As Richard states above, the leaders in the industry are now using SHA1 or SHA2 hashing. We have been using SHA2 for the past couple of years now. We are ready and eager to move to bcrypt. The challenge of moving to the better bcrypt or scrypt methods is education and adoption. Many companies just don't have the in-house experience with these stronger hashing algorithms.

  2. To add to Dave's thoughts, a hash is only meaningful as a key between two data tables. And even WITH access to those two data tables, the data inside is still meaningless because those sets of hashes are only useful in an extremely specific context and timespan. It would be as if I pasted a product key for Windows 95 into this comment -- it's a useless string of characters, the value of which long ago dwindled to zero.

    Vendors and advertisers can "salt" email addresses (add random characters to them) before hashing so as to make one hash of my.name@emailaddress.com COMPLETELY different than any prior or subsequent hash of my.name@emailaddress.com, meaning that if you find that hypothetical thumb drive of email hashes (mwuh ha ha ha), unless you can locate its source data tables and hack into them, and THEN painstakingly track the daisy-chain of hashes, referenced data tables, API calls, and company firewalls back to non-obfuscated and decrypted P.I.I. (all before your little trove of hashes get re-salted or deleted), all you've got is a bunch of random characters and a free thumb drive. There's WAY more (i.e., greater than zero) potential payoff in cracking directly into companies' in-house CRM systems or user DBs, but that's old news.

    So, moral of the story: if you salt your hashes (and refresh those salts as often as is feasible), anyone's potential ability to maliciously track consumers online is no greater than it was before CRM targeting.
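The salting idea raised in the comment above can be sketched in Python as follows. Prepending a random 16-byte salt before SHA-256 is an assumed construction chosen for brevity, and the function and variable names are hypothetical.

```python
import hashlib
import secrets


def salted_hash(email: str, salt: bytes) -> str:
    """SHA-256 over salt + normalized email (a simple illustrative construction)."""
    normalized = email.strip().lower().encode("utf-8")
    return hashlib.sha256(salt + normalized).hexdigest()


email = "my.name@emailaddress.com"

# Two independently generated salts, e.g. before and after a salt refresh.
salt_a = secrets.token_bytes(16)
salt_b = secrets.token_bytes(16)

first = salted_hash(email, salt_a)
second = salted_hash(email, salt_b)

print(first == second)                      # False: the salts differ
print(first == salted_hash(email, salt_a))  # True: same salt, same hash
```

The trade-off is that two parties can only match salted hashes if they both know the salt, so the salt has to be shared securely between them and rotated together.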
