Module 1: Uncertainty, Compression and Entropy

Uncertainty, Compression and Entropy - Lesson Summary

Uncertainty

Uncertainty, or simply a lack of certainty, is a manifestation of deficient, imperfect or unknown information. Because the available knowledge is limited, it is not possible to fully explain the current state or to predict future outcomes.

Measuring Uncertainty

One way to capture the uncertainty in a random variable is to characterize the number of bits required to represent it. In that context, several measures of how compressible a random variable is can be considered:
• The minimum average length of a one-to-one map that can be used to represent the random variable, where the average length is the expected number of bits in the encoded sequence.
• The minimum number of bits required to store almost all the sequences, which can be written in terms of the minimum cardinality of a set A that captures nearly all the probability.
• The minimum number of bits needed to store every sequence in the set A.
The number of bits to which a sequence can be compressed is governed by the entropy, so the fundamental limit for measuring uncertainty is the entropy of the random variable (a small numerical sketch of the second measure is given below).
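
As a concrete illustration of the second measure, here is a minimal Python sketch for a Bernoulli(p) source; the values of p, n and eps are illustrative assumptions, not from the lesson. It counts the "typical" sequences, those whose per-symbol log-probability is close to the entropy H, and checks that indexing that set takes roughly H bits per symbol while still covering almost all the probability.

    import math

    # Illustrative parameters (assumptions, not from the lesson): a Bernoulli(p)
    # source, sequence length n, and a typicality tolerance eps.
    p, n, eps = 0.2, 200, 0.1

    # Entropy per symbol: H = -p*log2(p) - (1-p)*log2(1-p).
    h = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    # A length-n sequence with k ones has probability p**k * (1-p)**(n-k),
    # and there are C(n, k) such sequences. Collect those whose per-symbol
    # log-probability is within eps of h -- the typical set A.
    size_A = 0
    prob_mass = 0.0
    for k in range(n + 1):
        per_symbol = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
        if abs(per_symbol - h) <= eps:
            size_A += math.comb(n, k)
            prob_mass += math.comb(n, k) * p**k * (1 - p)**(n - k)

    print(f"H = {h:.3f} bits/symbol")
    print(f"log2|A|/n = {math.log2(size_A) / n:.3f} bits/symbol to index A")
    print(f"A covers {prob_mass:.1%} of the total probability")

As n grows and eps shrinks, log2|A|/n approaches H, which is the sense in which entropy is the fundamental limit.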

Limits of Compression

Data compression is the process of encoding information using fewer bits than the original representation. It can be either lossless or lossy. Lossless compression means no information is lost: the original data can be reconstructed exactly. Lossy compression reduces the bit count further by discarding unnecessary or less important information. The limits of compression are dictated by the randomness of the source.
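
As a minimal sketch of this limit, the following Python snippet (sizes and inputs are illustrative) losslessly compresses a highly redundant byte string and an equal-sized block of random bytes using the standard-library zlib module: the redundant data shrinks dramatically, while the random data barely compresses at all.

    import os
    import zlib

    n = 100_000                      # illustrative data size in bytes
    redundant = b"abab" * (n // 4)   # a highly predictable, low-entropy source
    random_bytes = os.urandom(n)     # (nearly) incompressible random bytes

    for name, data in [("redundant", redundant), ("random", random_bytes)]:
        packed = zlib.compress(data, level=9)
        # Lossless: decompression recovers the original exactly.
        assert zlib.decompress(packed) == data
        print(f"{name}: {len(data)} -> {len(packed)} bytes "
              f"({len(packed) / len(data):.1%} of original)")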

Shannon Entropy

Entropy is the average number of bits required to represent or transmit an event drawn from the probability distribution of a random variable: H(X) = -Σ p(x) log2 p(x), summed over the possible outcomes x. Shannon entropy is therefore a measure of the uncertainty associated with a random variable.
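
A minimal Python sketch of this definition (the example distributions are illustrative):

    import math

    def shannon_entropy(probs):
        """H(X) = -sum over x of p(x) * log2(p(x)), skipping zero-probability terms."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(shannon_entropy([0.5, 0.5]))    # fair coin: 1.0 bit
    print(shannon_entropy([0.9, 0.1]))    # biased coin: about 0.469 bits
    print(shannon_entropy([0.125] * 8))   # 8 equally likely outcomes: 3.0 bits

The more uniform the distribution, the higher the entropy; the more predictable the outcome, the fewer bits are needed on average.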

Random Hashing

Random hashing simply means picking the hash function at random from a family of possible hash functions. A collision of a hash function is a pair of distinct inputs that produce the same output. Weak collision resistance means that, given a particular input, it is hard to find a different input with the same hash value; strong collision resistance means it is hard to find any two distinct inputs that collide.
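
A minimal sketch of random hashing, using the classic Carter-Wegman family h(x) = ((a*x + b) mod p) mod m over integer keys; the prime p, the bucket count m, and the sample keys are illustrative assumptions, not from the lesson.

    import random

    p = 2_147_483_647   # a large prime (2**31 - 1); keys are assumed smaller than p
    m = 64              # number of hash buckets

    def pick_random_hash():
        """Draw one function from the family by choosing a and b uniformly at random."""
        a = random.randrange(1, p)
        b = random.randrange(0, p)
        return lambda x: ((a * x + b) % p) % m

    h = pick_random_hash()
    x, y = 12345, 67890
    # A collision is a pair x != y with h(x) == h(y). For a function drawn at
    # random from this family, the collision probability for any fixed pair of
    # distinct keys is at most roughly 1/m.
    print(h(x), h(y), "collision" if h(x) == h(y) else "no collision")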