Entropy

Basic idea

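Shannon entropy measures the average information content of a random variable. The unit depends on the base of the logarithm: bits for base 2, nats for the natural logarithm. It is also a lower bound on the expected code length of any lossless code for the variable's outcomes.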

Shannon entropy

$H(X)=-\sum_{i=1}^{n} P(x_i) \log{P(x_i)}$
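
A minimal sketch of the formula above, assuming base-2 logarithms (so the result is in bits) and a distribution passed as a plain list of probabilities. The function name `entropy` is illustrative only.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution (sequence of probabilities)."""
    # Outcomes with zero probability are skipped, using the convention 0 * log 0 = 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair_coin = entropy([0.5, 0.5])      # maximally uncertain two-outcome variable
biased_coin = entropy([0.9, 0.1])    # more predictable, so lower entropy
uniform_four = entropy([0.25] * 4)   # four equally likely outcomes

print(fair_coin, biased_coin, uniform_four)
```

The fair coin comes out at 1 bit and the uniform four-outcome variable at 2 bits, while the biased coin falls below 1 bit because its outcome is easier to predict.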

Joint entropy

$H(X,Y)=-\sum_{x \in X}\sum_{y \in Y} P(x,y) \log{P(x,y)}$
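
As a rough illustration, the joint entropy can be computed directly from a joint probability table. The sketch below assumes base-2 logarithms and a table stored as a dictionary keyed by `(x, y)` pairs. Both the representation and the name `joint_entropy` are choices made for this example.

```python
import math

def joint_entropy(joint):
    """H(X,Y) in bits for a joint distribution given as {(x, y): P(x, y)}."""
    # Zero-probability cells are skipped (0 * log 0 = 0 by convention).
    return -sum(p * math.log2(p) for p in joint.values() if p > 0)

# Two independent fair bits: four equally likely pairs.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
# Two perfectly correlated bits: Y always equals X.
correlated = {(0, 0): 0.5, (1, 1): 0.5}

h_independent = joint_entropy(independent)
h_correlated = joint_entropy(correlated)
print(h_independent, h_correlated)
```

The independent pair carries 2 bits, while the correlated pair carries only 1 bit, since knowing one variable fully determines the other.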

Conditional entropy

$H(Y|X)=-\sum_{x \in X}\sum_{y \in Y} P(x,y) \log{P(y|x)}$
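
The conditional probability in the formula above is $P(y|x) = P(x,y) / P(x)$, where $P(x)$ is the marginal obtained by summing the joint table over $y$. A small sketch under the same assumptions as before (base-2 logarithms, a dictionary keyed by `(x, y)`, and an illustrative function name):

```python
import math
from collections import defaultdict

def conditional_entropy(joint):
    """H(Y|X) in bits for a joint distribution given as {(x, y): P(x, y)}."""
    # Marginal P(x) = sum over y of P(x, y).
    px = defaultdict(float)
    for (x, _y), p in joint.items():
        px[x] += p
    # Each term is P(x, y) * log P(y|x), with P(y|x) = P(x, y) / P(x).
    return -sum(
        p * math.log2(p / px[x])
        for (x, _y), p in joint.items()
        if p > 0
    )

independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
correlated = {(0, 0): 0.5, (1, 1): 0.5}

h_given_independent = conditional_entropy(independent)
h_given_correlated = conditional_entropy(correlated)
print(h_given_independent, h_given_correlated)
```

For independent fair bits, knowing $X$ does not help, so $H(Y|X)$ stays at 1 bit. For perfectly correlated bits it drops to 0. This is consistent with the chain rule $H(X,Y) = H(X) + H(Y|X)$.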

Mutual information

$I(X;Y)=H(X)+H(Y)-H(X,Y)$
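
One way to evaluate this identity numerically is to derive both marginals from a joint table and combine the three entropies. The sketch assumes base-2 logarithms and a joint distribution stored as a dictionary keyed by `(x, y)`. The helper names are hypothetical.

```python
import math
from collections import defaultdict

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for joint {(x, y): P(x, y)}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p  # marginal of X
        py[y] += p  # marginal of Y
    return entropy(px.values()) + entropy(py.values()) - entropy(joint.values())

independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
correlated = {(0, 0): 0.5, (1, 1): 0.5}

mi_independent = mutual_information(independent)
mi_correlated = mutual_information(correlated)
print(mi_independent, mi_correlated)
```

Independent variables share no information (0 bits), whereas two perfectly correlated fair bits share exactly 1 bit.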

Kullback–Leibler divergence

$D_{KL}(P\|Q)=\sum_{x \in X} P(x) \log{\frac{P(x)}{Q(x)}}$
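
A hedged sketch of the divergence, assuming base-2 logarithms and two distributions given as equal-length lists over the same outcomes. The function name is illustrative. It returns infinity when $P$ puts mass on an outcome that $Q$ assigns zero probability, which is the usual convention.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence of Q from P, in bits."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue          # 0 * log(0 / q) = 0 by convention
        if qi == 0:
            return math.inf   # P has mass where Q has none
        total += pi * math.log2(pi / qi)
    return total

p = [0.5, 0.5]
q = [0.9, 0.1]

d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
d_pp = kl_divergence(p, p)
print(d_pq, d_qp, d_pp)
```

The two directions give different values, which shows that the divergence is not symmetric and is therefore not a distance metric. It is zero when the two distributions are identical.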

Cross-entropy

$H(P,Q)=-\sum_{x \in X} P(x) \log{Q(x)}$
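
Cross-entropy can be sketched the same way, again assuming base-2 logarithms and list-valued distributions (the function name is an assumption of this example). Since $H(P,P)$ is just the entropy of $P$, the gap $H(P,Q) - H(P,P)$ equals the Kullback–Leibler divergence of $Q$ from $P$.

```python
import math

def cross_entropy(p, q):
    """Cross-entropy H(P,Q) in bits for equal-length probability sequences."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue          # outcomes P never produces contribute nothing
        if qi == 0:
            return math.inf   # Q rules out an outcome that P produces
        total -= pi * math.log2(qi)
    return total

p = [0.5, 0.5]
q = [0.9, 0.1]

h_pq = cross_entropy(p, q)   # cost of coding P's outcomes with a code built for Q
h_pp = cross_entropy(p, p)   # equals the entropy of P
gap = h_pq - h_pp            # extra bits paid for using the wrong distribution
print(h_pq, h_pp, gap)
```

The gap is never negative, so cross-entropy is minimized over $Q$ when $Q$ matches $P$.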

Siblings

Compression