Encoding, Collation and Character Sets

Many people don’t seem to know the difference between these terms, or what each of them means and does. So here is a brief explanation of encoding, collation and character sets.

Encoding:

Encoding is the algorithm that translates the numbers assigned to characters into binary data so they can be stored on disk. For example, UTF-8 would translate the number sequence 1, 2, 3, 4 into these four bytes: “00000001 00000010 00000011 00000100”. Think of it as assigning numbers to characters, converting those numbers into binary data and storing the result on disk.
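A quick way to see this in practice is Python’s built-in `str.encode`. The sketch below uses a hypothetical helper, `to_bits`, to reproduce the example above. It also adds one non-ASCII character to show that the same number can become different bytes under different encodings: UTF-8 uses one byte for small numbers like 1 to 4 but two to four bytes for larger ones, while Latin-1 always uses a single byte.

```python
def to_bits(text: str, encoding: str) -> str:
    """Encode text with the given codec and show each byte as 8 binary digits."""
    return " ".join(f"{byte:08b}" for byte in text.encode(encoding))

# The characters whose numbers (code points) are 1, 2, 3 and 4.
sample = "".join(chr(n) for n in [1, 2, 3, 4])
print(to_bits(sample, "utf-8"))    # 00000001 00000010 00000011 00000100

# "é" has the number 233 (U+00E9). UTF-8 needs two bytes for it,
# while Latin-1 stores the same number in a single byte.
print(to_bits("é", "utf-8"))       # 11000011 10101001
print(to_bits("é", "latin-1"))     # 11101001
```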

Character Set:

Suppose that we have an alphabet with four letters: “A”, “B”, “a”, “b”. We give each letter a number: “A” = 0, “B” = 1, “a” = 2, “b” = 3. The letter “A” is a symbol, the number 0 is the encoding for “A”, and the combination of all four letters and their encodings is a character set.
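The same four-letter character set can be written down as a pair of lookup tables. This is only a sketch: the names `CHARSET`, `SYMBOLS`, `to_numbers` and `from_numbers` are made up for illustration, and a real character set such as Unicode contains far more symbols.

```python
from typing import Dict, List

# The toy character set from the text: each symbol paired with its number.
CHARSET: Dict[str, int] = {"A": 0, "B": 1, "a": 2, "b": 3}
# Reverse table so numbers can be turned back into symbols.
SYMBOLS: Dict[int, str] = {number: symbol for symbol, number in CHARSET.items()}

def to_numbers(text: str) -> List[int]:
    """Replace each symbol with its number; raises KeyError for symbols outside the set."""
    return [CHARSET[symbol] for symbol in text]

def from_numbers(numbers: List[int]) -> str:
    """Turn a list of numbers back into symbols."""
    return "".join(SYMBOLS[n] for n in numbers)

print(to_numbers("Bab"))      # [1, 2, 3]
print(from_numbers([0, 3]))   # Ab
```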

Collation:

Collation is about comparison between characters. It defines a set of rules to compare characters of a character set. Suppose that we want to compare two string values, “A” and “B”. The simplest way to do this is to look at the encodings: 0 for “A” and 1 for “B”. Because 0 is less than 1, we say “A” is less than “B”. What we’ve just done is apply a collation to our character set. Ref.
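In code, applying a collation amounts to choosing the key that strings are compared by. The sketch below (helper names are hypothetical) implements the comparison just described, using each character’s number, and adds a second, case-insensitive rule purely for illustration, to show that the same character set can be ordered differently depending on which collation is applied.

```python
from typing import List

# Same toy character set as in the text.
CHARSET = {"A": 0, "B": 1, "a": 2, "b": 3}

def binary_key(text: str) -> List[int]:
    """Compare strings by their characters' numbers, as described above."""
    return [CHARSET[ch] for ch in text]

def case_insensitive_key(text: str) -> List[int]:
    """Illustrative second rule: treat each lowercase letter like its uppercase form."""
    return [CHARSET[ch.upper()] for ch in text]

# "A" (0) is less than "B" (1).
print(binary_key("A") < binary_key("B"))          # True

words = ["b", "A", "a", "B"]
print(sorted(words, key=binary_key))              # ['A', 'B', 'a', 'b']
print(sorted(words, key=case_insensitive_key))    # ['A', 'a', 'b', 'B']
```

Because Python’s `sorted` is stable, letters that the case-insensitive rule treats as equal (“A” and “a”, “b” and “B”) keep their original relative order.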

Hope this helps.