Logistic Regression

Maximum Likelihood Estimation

Consider the following dataset:

Z a b
0 0 1
0 0 1
0 0 0
0 1 1
1 0 0
1 1 1

Design matrix X (one row per distinct covariate pattern of (a, b), with intercept column x0):

x0 x1 x2
1 0 1
1 0 0
1 1 1

Equations to solve:
$$ \log\left(\frac{\pi_i}{1 - \pi_i}\right)=(x_{i0}, x_{i1}, x_{i2}) \cdot \beta, \quad i = 1,\ldots, N $$
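
Written out for the design matrix above, with $\beta = (\beta_0, \beta_1, \beta_2)^\top$ matching the columns $x_0, x_1, x_2$, these are the three equations
$$ \log\left(\frac{\pi_1}{1 - \pi_1}\right) = \beta_0 + \beta_2, \qquad \log\left(\frac{\pi_2}{1 - \pi_2}\right) = \beta_0, \qquad \log\left(\frac{\pi_3}{1 - \pi_3}\right) = \beta_0 + \beta_1 + \beta_2 $$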

The joint probability mass function of the dependent variable Y, where $y_i$ is the number of successes out of $n_i$ trials for covariate pattern $i$:

$$ f(y | \beta) = \prod_{i=1}^N \binom{n_i}{y_i}\pi_i^{y_i}(1 - \pi_i)^{n_i - y_i} $$
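
As a concrete sketch, this likelihood can be maximized numerically. The code below assumes one reading of the tables: $Z$ is the binary response, $a$ and $b$ are the predictors, each row of $X$ is a distinct covariate pattern with $n_i = 2$ trials, and $y_i$ counts the rows with $Z = 1$ in that pattern; the use of `scipy` and the variable names are illustrative, not prescribed by the text.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

# Design matrix X: intercept x0, x1 = a, x2 = b (one row per covariate pattern).
X = np.array([[1., 0., 1.],
              [1., 0., 0.],
              [1., 1., 1.]])
y = np.array([0., 1., 1.])   # assumed: number of rows with Z = 1 in each pattern
n = np.array([2., 2., 2.])   # assumed: number of rows (trials) in each pattern

def neg_log_likelihood(beta):
    """Negative log of f(y | beta) = prod_i C(n_i, y_i) pi_i^y_i (1 - pi_i)^(n_i - y_i)."""
    eta = X @ beta                        # linear predictor (x_i0, x_i1, x_i2) . beta
    log_pi = -np.logaddexp(0.0, -eta)     # log pi_i, computed stably
    log_1m_pi = -np.logaddexp(0.0, eta)   # log(1 - pi_i)
    log_binom = gammaln(n + 1) - gammaln(y + 1) - gammaln(n - y + 1)
    return -np.sum(log_binom + y * log_pi + (n - y) * log_1m_pi)

# The first pattern has 0 successes in 2 trials, so the unpenalized MLE can drift
# toward large coefficients (quasi-separation); the optimizer still stops once the
# gradient is essentially flat.
beta_hat = minimize(neg_log_likelihood, np.zeros(3), method="BFGS").x
print("estimated beta:", beta_hat)
print("fitted pi_i:", 1.0 / (1.0 + np.exp(-X @ beta_hat)))
```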

Cross-Entropy

The cross-entropy for two distributions p and q over a given set is defined as $H(p, q) = \mathbb E_p[-\log q]$.
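
A small numerical sketch of this definition; the distributions p and q below are made up for illustration:

```python
import numpy as np

def cross_entropy(p, q):
    """H(p, q) = E_p[-log q] for two discrete distributions over the same set (in nats)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q))

p = [0.5, 0.25, 0.25]        # illustrative distributions
q = [0.25, 0.5, 0.25]
print(cross_entropy(p, p))   # H(p, p) = H(p), the entropy of p (about 1.04 nats)
print(cross_entropy(p, q))   # H(p, q) >= H(p), larger the more q differs from p
```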

Cross-entropy error function and logistic regression

See Wikipedia. With the logistic function $g(z) = \frac{1}{1 + e^{-z}}$ and labels $y \in \{0, 1\}$, the predicted probability of $y = 1$ is given by:
$$ q_{y = 1} = \hat y \equiv g(\mathbf w \cdot \mathbf x) $$

Let $y \in \{0, 1\}$ be the true label and $\hat y$ its estimate. Writing the true distribution as $p \in \{y, 1 - y\}$ and the predicted distribution as $q \in \{\hat y, 1 - \hat y\}$, the cross-entropy gives a measure of the dissimilarity between $p$ and $q$:
$$ H(p,q) = -\sum_i p_i \log q_i = -y \log \hat y - (1 - y) \log(1- \hat y) $$

The loss function typically used in logistic regression is the average of these cross-entropies over the sample:
$$ L(\mathbf w) = \frac 1 N \sum_{n=1}^{N} H(p_{n},q_{n}) $$
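
A minimal NumPy sketch of this average loss; the sample values and the name `logistic_loss` are illustrative:

```python
import numpy as np

def logistic_loss(w, X, y):
    """L(w) = (1/N) sum_n H(p_n, q_n), with y_hat = g(w . x) and g the logistic function."""
    y_hat = 1.0 / (1.0 + np.exp(-X @ w))   # q_{y=1} for each sample
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Made-up sample: 4 observations, a bias column plus two features.
X = np.array([[1.0,  0.0,  1.0],
              [1.0,  0.5, -1.0],
              [1.0, -1.0,  0.5],
              [1.0,  2.0,  0.0]])
y = np.array([1.0, 0.0, 0.0, 1.0])
w = np.array([0.1, 1.0, -0.5])
print(logistic_loss(w, X, y))   # average cross-entropy of the predictions
```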

This logistic loss is also known as the cross-entropy loss or the log loss.

Further reading