Entropy

Basic idea

Shannon entropy measures the average information content (in bits) of a random variable. It is the lower bound on the average code length for lossless compression.

Shannon entropy

H(X)=-\sum_{i=1}^{n} P(x_i) \log{P(x_i)}

Joint entropy

H(X,Y)=-\sum_{x \in X}\sum_{y \in Y} P(x,y) \log{P(x,y)}

Conditional entropy

H(Y|X)=-\sum_{x \in X}\sum_{y \in Y} P(x,y) \log{P(y|x)}

Mutual information

I(X;Y)=H(X)+H(Y)-H(X,Y)

Kullback–Leibler divergence

D_{KL}(P|Q)=\sum_{x \in X} P(x) \log{\frac{P(x)}{Q(x)}}

Cross-entropy

H(P,Q)=-\sum_{x \in X} P(x) \log{Q(x)}

Siblings