Feel the Binary Code

Jul 23, 2011

“There are 10 kinds of people: those who know binary and those who don’t.”

In the early days, when electronic computers were first built, input was given in binary using tapes, and output was also obtained in binary. So the people who used those computers had to know the binary ‘language’. Now we have gigabytes of movies, pictures, audio, documents, text and so many other types of files that we couldn’t even make a full list of them. Do we ever think about how 0s and 1s form these things? At the least, when we start to work with bitwise operators, we should know how data is represented in binary. In this article we will see how binary works with numbers and text.

Now I’ll explain how to write a program that prints the binary equivalent of numbers and characters. We have all seen decimal-to-binary programs. Their output is 101 if you give 5. What if you give -5 as input? 101 again? That is not the exact representation. Our aim is to show the binary equivalent of any data type exactly as it resides in the memory of a computer.

character 'A' is 01000001
integer  5 is 00000000 00000101
integer -1 is 11111111 11111111

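Confused by -1? Negative numbers are stored in 2’s complement representation. The MSB (Most Significant Bit) is the sign bit: if it is 1 the number is negative, otherwise it is zero or positive. (The integers above are shown as 2 bytes to keep them short; a 4-byte int simply has two more bytes in front, filled with copies of the sign bit.) Characters are encoded using ASCII codes. First I’ll start with a character.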
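Before we get to the character, a quick aside on the 2’s complement rule. Here is a minimal sketch restricted to a single byte. The helper names twos_complement8 and show_minus_five are my own, not from any library. The first one inverts every bit and adds 1, which is how the pattern for -x is formed from x.

```c
#include <stdio.h>

/* Hypothetical helper: the 8-bit two's complement of x (-x modulo 256).
   Invert every bit, add 1, and keep only the low 8 bits. */
unsigned char twos_complement8(unsigned char x)
{
    return (unsigned char)(~x + 1);
}

/* Prints the one-byte patterns for 5 and -5 in hexadecimal. */
void show_minus_five(void)
{
    printf("%02X %02X\n", 5u, (unsigned)twos_complement8(5));
}
```

Taking 5 as 0000 0101, inverting gives 1111 1010, and adding 1 gives 1111 1011 (FB in hexadecimal), which is the byte stored for -5.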

char c = 'A'  // A is ASCII 65, represented as 0100 0001
Step 1 : c & (1 << 7)
        0100 0001
     &  1000 0000
        ---------
 Result 0000 0000

 IF Result = 0 THEN PRINT "0" ELSE PRINT "1"

 Output of Step 1 : 0

 Step 2 : c & (1 << 6)
        0100 0001
     &  0100 0000
        ---------
 Result 0100 0000

 Output of Step 2 : 1

 Step 3 : c & (1 << 5)
        0100 0001
     &  0010 0000
        ---------
 Result 0000 0000

 Output of Step 3 : 0
 ...
 ...
 ...
 Step 8 : c & (1 << 0)
        0100 0001
     &  0000 0001
        ---------
 Result 0000 0001

 Output of Step 8 : 1

 Output : 01000001


#include <stdio.h>

/* Print the 8 bits of c, most significant bit first. */
void print_chartobin(char c)
{
    int i;
    for (i = 7; i >= 0; i--) {
        (c & (1 << i)) ? printf("1") : printf("0");
    }
}

The function above carries out exactly these steps. The loop runs from i=7 down to i=0 simply because we want to print in order from MSB to LSB.

I hope the ternary operator doesn’t confuse you. I used it for convenience; it is handy in many situations, so it is worth learning. Whenever I give you an algorithm, it will be in C unless I specify another language. C has one handy advantage for this problem: it treats every non-zero value as TRUE and zero as FALSE, so you can pass integer values directly to conditional statements. In languages like Java this is not allowed. An expression placed in a condition must evaluate to a boolean (Java has a boolean datatype that holds true or false and never converts to or from any other datatype, whereas bool in C++ converts implicitly to and from integers). So when you use Java, change only one thing: add a relational operator to c&(1<<i) and make it (c&(1<<i))!=0. The outer parentheses matter, because != binds more tightly than &. That’s it.
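The explicit comparison is valid in C as well, so the bit test can be written in a form that carries over to Java almost unchanged. A small sketch (the function name bit_is_set is hypothetical):

```c
/* Hypothetical helper: returns 1 if bit i (0 = least significant) of c
   is set, otherwise 0. The parentheses around the & are required,
   because != binds more tightly than & in both C and Java. */
int bit_is_set(unsigned char c, int i)
{
    return (c & (1u << i)) != 0;
}
```

For 'A' (ASCII 65), bit_is_set('A', 6) returns 1 and bit_is_set('A', 5) returns 0.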

We have printed a character successfully, and an integer can be printed in the same way. But what about a double or a float, to which we cannot apply bitwise operators directly? In fact, it is the binary code of advanced data types like these that we really want to feel. This is why I chose character conversion first and not integer. A char is the minimal unit of data that can be addressed, i.e., 1 byte, so any other type can be viewed as a sequence of chars: typically an int as 4 chars (2 on older 16-bit compilers), a float as 4 chars, a double as 8 chars, etc. The following is the storage representation in RAM of a 4-byte integer holding the value 4.

Addr     1 byte data
          +--------+
2000  --> |00000100|
          +--------+
2001  --> |00000000|
          +--------+
2002  --> |00000000|
          +--------+
2003  --> |00000000|
          +--------+

Above is an integer of four bytes. The address of the integer is 2000. On a little-endian machine (such as an x86 PC), the least significant byte (LSByte) of the integer is stored at 2000, the next byte at the next address, and so on. So we have to print each of the four chars (1 byte each) stored at 2000, 2001, 2002 and 2003. For this, we are going to use the same function print_chartobin that we wrote before. The fact is that a four-byte integer can be treated as an array of four characters by using pointers. Similarly, a double can be treated as an array of 8 characters. The following function can print the binary equivalent of any datatype. For your amusement, it can be a structure or a union too.

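/* Print the bytes of any object, highest address first.
   On a little-endian machine that is most significant byte first. */
void print_binary(const void *ptr, int size)
{
    const char *bytes = ptr;   /* view the object as raw bytes */
    int i;
    for (i = size - 1; i >= 0; i--) {
        print_chartobin(bytes[i]);
        printf(" ");
    }
}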

/* Usage (1): */
int i=5;
print_binary(&i,sizeof(int));


/* Usage (2): */
struct emp
{
 int empid;
 char name[20];
};

struct emp f={17,"Fareez Ahamed K.N."};

print_binary(&f,sizeof(struct emp));
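The loop in print_binary starts at the last byte because the machine in the storage diagram keeps the least significant byte at the lowest address (little-endian order). If you are not sure about your own machine, here is a minimal sketch to check it. The function name is_little_endian is hypothetical.

```c
/* Hypothetical helper: returns 1 if the least significant byte of an
   unsigned int is stored at its lowest address (little-endian), else 0. */
int is_little_endian(void)
{
    unsigned int value = 1;
    const unsigned char *first_byte = (const unsigned char *)&value;
    return *first_byte == 1;
}
```

If this returns 1, the output of print_binary for an int reads MSByte first, just like the examples at the start of the article.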

In print_binary, we want to print from the MSByte to the LSByte, and so the loop runs from i=(size-1) down to i=0. (This assumes a little-endian machine; on a big-endian one the bytes are already stored MSByte first.) Now you can see through the binary code of any datatype in C. I wish to give you an exercise. Please try the function with any float or double value and note down the binary equivalent that it produces. Then take the IEEE standard for floating-point numbers (it is described in Computer Organization by Carl Hamacher) and verify the binary code against the IEEE specification.
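As a starting point for the exercise, here is a rough sketch that splits a float into the three fields that the IEEE 754 single-precision format defines: 1 sign bit, 8 exponent bits and 23 fraction bits. The names float_fields and split_float are my own, and the code assumes that float is 32 bits wide on your machine.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of an IEEE 754 single-precision number. */
struct float_fields {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t fraction;  /* 23 bits */
};

/* Hypothetical helper. Assumes float is a 32-bit IEEE 754 single. */
struct float_fields split_float(float f)
{
    uint32_t bits;
    struct float_fields out;

    memcpy(&bits, &f, sizeof bits);   /* copy the raw bytes, no conversion */
    out.sign     = bits >> 31;
    out.exponent = (bits >> 23) & 0xFFu;
    out.fraction = bits & 0x7FFFFFu;
    return out;
}
```

For 1.0f this gives sign 0, exponent 127 and fraction 0. Compare that with the bits that print_binary shows for the same value.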

Note that the above code was tested with the GCC compiler on Ubuntu 10.10. That’s why the integers I discuss here are 4 bytes; in Turbo C an int is just 2 bytes. I strongly recommend that all of you use GCC if you want to code in C, because it complies with the ANSI C standard. Remember that Turbo C does not follow the ANSI C standard and, from my point of view, it is useless now except for simple graphics programs. What I have covered here is indeed a critical topic; if you don’t understand it, as I said before, “Please read again and again”. I hope you now understand the meaning of the quote that I put at the beginning…