A new service was just announced at the RSA conference that takes an interesting approach to hashing passwords. Called “Blind Hashing,” from TapLink, the technology is fully buzzword-compliant, promising to “completely secure your passwords against offline attack.” Pretty grandiose claims, but from I’ve been able to see in their patent so far, it seems like it has some promise. With a few caveats.
Traditionally, passwords are hashed and stored in place. First we had the the Unix cyrpt() function, which, though it was specifically designed to be “slow” on systems at the time, is now hopelessly outdated and should be killed with fire at every opportunity. That gave way to unsalted MD5-based hashes (also a candidate for immediate incendiary measures), salted SHA hashes, and today’s state of the art functions bcrypt, scrypt, and PBKDF2. The common goal throughout this progression of algorithms has been to make the hashing function expensive, in either CPU time or memory requirements (or both), thus making a brute force attack to guess a user’s password prohibitive.
So far, we seem to have accomplished that goal, but a downside is that a slow hash is still, well, slow. Which can potentially add up, when you’ve got a site that processes huge numbers of logins every day.
The “Blind Hashing” system takes a different approach. Rather than handling the entire hash locally, the user’s password is, essentially, hashed a second time using data from a cloud-based service. Here’s an excerpt from the patent summary:
A blind hashing system and method are provided in which blind hashing is used for data encryption and secure data storage such as in password authentication, symmetric key encryption, revocable encryption keys, etc. The system and method include using a hash function output (digest) as an index or pointer into a huge block of random data, extracting a value from the indexed location within the random data block, using that value to salt the original password or message, and then hashing it to produce a second digest that is used to verify the password or message, encrypt or decrypt a document, and so on. A different hash function can be used at each stage in the process. The blind hashing algorithm typical runs on a dedicated server and only sees the digest and never sees the password, message, key, or the salt used to generate the digest.
Thinking through the process, here’s one way this might work:
- The user provides their userid and password to the system.
- The password is hashed (optionally using a locally-stored salt, unique to the user)
- Traditionally, the hash is then stored locally, and this is what’s used to compare against the identically-generated hash at next login.
- In Blind Hashing, the hash is then sent to a remote service.
- This service uses the hash as an index into a massive (petabyte-sized) database, to retrieve a random number. Each hash thus points to some unique random number.
- The number is returned to the server, and used as a salt to hash the password a second time.
- This second hash is stored locally and used for future logins.
Put in a more functional notation, this might look like:
Salt1 = salt_lookup(Userid) Hash1 = Hash(Salt1, Password) Salt2 = remote_blind_hash_lookup(Hash1) Hash2 = Hash(Salt2, Hash1)
In the event of a compromise on the server, the attacker may recover all the Salt1 and Hash2 values. However, they will not be able to retrieve Salt2 without the involvement of the remote blind hash service. So a brute force attack will require cycling through all possible passwords and, for each password tested, requesting Salt2 from the remote service. This should, in theory, be significantly slower than a local hash / salt computation, and can also be rate-limited at the service to further protect against attacks.
On its surface, this seems a pretty solid idea. The second salt is deterministically derived from the first hash, but not in an algorithmic manner, so there isn’t a short-circuit that allows for immediate recovery of the salt. The database used to store Salt2 values is too large to be copied by an attacker. And the round trip process is (presumably) too slow to be practical for a brute force attack. Finally, the user’s password isn’t actually sent to the blind hash lookup service, only a hash of the password (salted with a value that is not sent to the service).
An attacker who compromises the (website) server gains only a collection of password hashes that are uncrackable without the correct password and the cooporation of the blind hash service. If they are able to collect all blind hash responses, they could build a dictionary of secondary salts to use in brute force attacks, but that would still be very slow (for a large site), as each password tested would be multiplied by the length of this secondary salt list. (Of course, if they can intercept the blind hash response data, then the attacker can probably also intercept the initial login process and just grab the passwords in plaintext.) Finally, an attacker who compromises the blind hash service gains access to a database too large to exfiltrate, and to an inbound stream of passwords hashed with unknown salts.
So in theory, at least, I can’t see anything seriously wrong with the idea.
But is it worth it? The only argument I’ve heard against “slow” hash algorithms like bcrypt or scrypt is that it may present too big a load to busy sites. But wouldn’t the constant communication with the blind hash service also present a fairly large load, both for CPU and especially for network traffic? What happens if the remote service goes down, for example, because of a DDOS attack, or network problems? This service protects against future breakthroughs that make modern hash algorithms easy to brute force, but I think we already know how to deal with that eventuality.
I think the biggest problem we have today, with regards to securely hashing passwords, isn’t the technology available, but the fact that sites still use the older, less secure approaches. If a site cares enough to move to a blind hash service, they’d certainly be able to move to bcrypt. If they haven’t already moved away from MD5 or SHA hashes, then I really don’t see them paying for a blind hashing service, either.
In the end, though I think it’s a very interesting and intriguing idea, I’m just not sure I see anything to recommend this over modern bcrypt, scrypt, or PBKDF-based password hashes.