Most of us use numerous services available on the net such as mails, social networking services, and blogs. So it is obvious to have the question “Can a database administrator working on Facebook can get the password of my Facebook account if he wishes?”. A worthy question it is. But the answer is “No”. Your passwords are never stored as it is. (Well, I don’t know the policy of Facebook regarding this; I mention it just for an example).
When you enter your password for login it should be verified by the server, at the same time, the password should not be stored as it is in the databases. Thinking about encryption is a good option here, but where will you store the key then? “If key can be stored securely in the database, then we could’ve stored password in the same manner!”.
The solution is Hashing. What is a hash function? A hash function is one in which when you pass data, it will produce an output string of a particular length.
Let A be the data and h(n) be the hash function.
h(A) = X
where X is called the message digest of the data A.
h(n) does not have an inverse function. So given X, you cannot arrive at A using any algorithm ideally.
i.e., h-1(X) does not exist.
At the same time, the algorithm should be as such that it does not produce the same message digest for two different messages. But achieving this is impossible as the message space is larger than the digest space, so the probability of getting same message digest for two different messages should be minimum (Pigeon hole principle explains why it is impossible to avoid collision). Two different messages having a same message digest is called a collision. There are various standard hash functions such as MD2, MD4, MD5, SHA0, SHA1, SHA2 are available which has the above mentioned property. I am not a very good mathematician so it is inappropriate for me to explain more about the mathematical properties of hash functions further, so I stop myself here.
MD5(‘Hi’) = c1a5298f939e87e8f962a5edfc206918 (Hexadecimal representation)
MD5 is one of the most famous hash functions and it gives the above output for ‘Hi’.
Now how to achieve our target? i.e., saving our password in the database server in a way the DBA can’t get it.
We saw that hash is a one way function and no inverse function exists for a hash function. So if we store the message digest of our password in the database, no one can convert it back into password even if they had full access to the database. When we login, the password we enter is converted into message digest and it is compared with the message digest of our original password which is already stored at the database.
It is much simpler to implement this as all the database management systems have built in functions to do this. For example, I will go with MySQL in which a specially designed hash function for this purpose exist called PASSWORD. Instead of entering the password into the database, all we have to do is to enter the password digest (message digest of password) into the database.
If ‘fareez’ is my username and ‘123456’ is my password then,
INSERT INTO user VALUES ('fareez',PASSWORD('123456'));
INSERT INTO user VALUES ('fareez',MD5('123456'));
Any standard hash function can be used for this purpose. However, using functions such as
SHA for this purpose are not encouraged as they are designed especially to work fast on larger amount of data. Hence brute force attack on this hash functions can be done on lesser time when compared to specially designed hash functions such as MySQL’s
PASSWORD is supported by MySQL alone, other database servers may have different hash functions.
Well, like most other security systems, this can be broken too. I would like to share some attacks on this technique here.
Hashing algorithms are open to everyone, hence if you have a message digest in your hand and you can break it with brute force assuming that you know the algorithm with which the message digest was generated. It is only a matter of time. Yes, but the time taken will not be small. “Well, What you do in brute force attack exactly?” You have the algorithm & password digest in your hand; all you have to do is to find a password which is having the same message digest as the one you have. So, you have to generate the hashes of all the possible passwords until you get the same one. Simple, but time consuming.
I tried an experiment with MySQL’s
OLD_PASSWORD function which has only 8 bytes of message digest (MD5 has 16 bytes). I made a program the produces random passwords and save the passwords & its message digest into the database. I did this to find at least one collision.
OLD_PASSWORD has only 8 bytes of message digest and so I thought it is easy to find a collision. But the fact is, I produced more than one lakh passwords and I was unable to find a single collision. If finding a collision in an 8 byte hashing algorithm is this difficult, then think about the strength of MD5, SHA1 and SHA512 etc.
There are methods which make use of the weakness of the hashing algorithm to find a collision. But I am not an expert to comment on that.