oooh Huffman coding (that takes me back)

Ok, let's see if I remember it all. Basically, most data will not have a random distribution in the frequency count of its symbols. For instance, in English text the letter 'e' occurs more frequently than the letter 'h', which in turn occurs more frequently than 'q', and so on.

Information theory (this is where entropy comes in) basically says that if you have N symbols, and the probability of symbol Si occurring in the data is Pi, then the average information content, in bits per symbol, is:

H = - sum (i = 1 to N) of Pi * log2(Pi)

Remember that if all the symbols occur with equal probability, then Pi = 1/N for every i, and the sum works out to -log2(1/N) = log2(N), which is the upper bound on the entropy.
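That formula is easy to check numerically. A small sketch of the entropy sum, with both a uniform distribution (which hits the log2(N) upper bound) and a skewed one (which comes in lower):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(Pi * log2(Pi))."""
    # skip zero-probability symbols, which contribute nothing
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Uniform case: N symbols, each with probability 1/N -> H = log2(N)
N = 8
uniform = [1 / N] * N
print(entropy(uniform))  # 3.0, i.e. log2(8)

# Skewed case: same 4 symbols' worth of choices, but lower entropy
skewed = [0.5, 0.25, 0.125, 0.125]
print(entropy(skewed))  # 1.75, well under log2(4) = 2
```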

Also remember that if the frequency distribution in the data is non-uniform, the information conveyed per symbol decreases, even though you are using the same number of bits to store each symbol. That redundancy is exactly what Huffman coding removes, by giving frequent symbols shorter codes and rare symbols longer ones.
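To tie that back to Huffman coding itself: here's a minimal sketch of building a Huffman tree with Python's heapq and reading off the code lengths (the helper name and the toy input are mine, not anything standard; a real implementation would also emit the actual bit codes and handle the one-symbol edge case):

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Return {symbol: Huffman code length in bits} for the given text.
    Repeatedly merges the two lowest-frequency nodes until one tree remains."""
    freqs = Counter(text)
    # heap entries: (frequency, tiebreaker, {symbol: depth so far})
    heap = [(f, i, {sym: 0}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        # every symbol in the merged subtree moves one level deeper
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

lengths = huffman_code_lengths("aaaabbc")
# the most frequent symbol gets the shortest code: a=1 bit, b=c=2 bits
print(lengths["a"] <= lengths["b"] <= lengths["c"])  # True
```

The average code length this produces always lands between the entropy H and H + 1 bits per symbol, which is why the entropy formula above matters in practice.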

Hope that makes sense. Sorry if I haven't made it clear, and I hope I haven't missed anything out, because I did do it quite a while ago. You'll definitely have to check out some books as well to give you a firm grounding in the topic, but overall it's not too bad and is pretty basic once you get the overall concepts.