Consider the following dataset, where Z is the binary response and a, b are the two predictors:
| Z | a | b |
|---|---|---|
| 0 | 0 | 1 |
| 0 | 0 | 1 |
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |
Design matrix X, with one row per distinct covariate pattern (a, b) and an intercept column x0:
| x0 | x1 | x2 |
|----|----|----|
| 1  | 0  | 1  |
| 1  | 0  | 0  |
| 1  | 1  | 1  |
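To make the grouping explicit, here is a minimal NumPy sketch (the variable names are my own, not from any particular library) that adds the intercept and collapses identical covariate patterns; the collapse also yields the binomial counts $n_i$ (trials) and $y_i$ (successes) used below.

```python
import numpy as np

# Raw data: Z is the binary response, a and b are the predictors.
Z = np.array([0, 0, 0, 0, 1, 1])
a = np.array([0, 0, 0, 1, 0, 1])
b = np.array([1, 1, 0, 1, 0, 1])

# Prepend the intercept column x0 = 1.
rows = np.column_stack([np.ones_like(a), a, b])

# Collapse identical covariate patterns: each unique row of X corresponds
# to one binomial observation with n_i trials and y_i successes.
X, inverse = np.unique(rows, axis=0, return_inverse=True)
inverse = inverse.ravel()                # guard against shape quirks in some NumPy versions
n = np.bincount(inverse)                 # trials per pattern, n_i
y = np.bincount(inverse, weights=Z)      # successes per pattern, y_i

print(X)  # the 3x3 design matrix from the table above (row order may differ)
print(n)  # [2 2 2]
print(y)  # [1. 0. 1.]
```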
Equations to solve, one per row of the design matrix:
$$
\log\left(\frac{\pi_i}{1 - \pi_i}\right)=(x_{i0}, x_{i1}, x_{i2}) \cdot \beta, \quad i = 1,\ldots, N
$$
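Inverting the logit link gives $\pi_i = \frac{1}{1 + e^{-x_i \cdot \beta}}$; a small sketch, reusing `X` from the snippet above (`predict_pi` and the value of `beta` are purely illustrative):

```python
# Invert the logit link: pi_i = 1 / (1 + exp(-x_i . beta)).
def predict_pi(X, beta):
    return 1.0 / (1.0 + np.exp(-X @ beta))

beta = np.zeros(3)          # arbitrary illustrative value, not a fitted estimate
print(predict_pi(X, beta))  # all 0.5 for beta = 0
```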
Joint probability density function of dependent variable Y:
$$
f(y | \beta) = \prod_{i=1}^N \binom{n_i}{y_i}\pi_i^{y_i}(1 - \pi_i)^{n_i - y_i}
$$
Likelihood function: $L(\beta | y) = f(y | \beta)$
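Numerically one usually works with the logarithm of the likelihood, which turns the product into a sum and has the same maximizer. A sketch building on the arrays defined above (the function name `log_likelihood` is my own):

```python
from scipy.special import comb  # binomial coefficient C(n_i, y_i)

# Logarithm of f(y | beta) for the grouped (binomial) data.
def log_likelihood(beta, X, n, y):
    pi = predict_pi(X, beta)
    return np.sum(np.log(comb(n, y)) + y * np.log(pi) + (n - y) * np.log(1 - pi))
```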
$\beta_{\max}$: the parameter vector for which the probability of the observed data is greatest:
$$\beta_{\max} = \underset{\beta}{\operatorname{argmax}}\; L(\beta | y)$$
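One way to compute $\beta_{\max}$ is to minimize the negative log-likelihood with a generic optimizer; the sketch below uses SciPy's BFGS, which is just one possible choice. A caveat for this toy dataset: the pattern $(a, b) = (0, 1)$ has zero successes, so the likelihood only approaches its supremum as some coefficients grow without bound, and the optimizer stops at large but finite values.

```python
from scipy.optimize import minimize

# Maximize L(beta | y) by minimizing the negative log-likelihood.
result = minimize(
    lambda beta: -log_likelihood(beta, X, n, y),
    x0=np.zeros(X.shape[1]),
    method="BFGS",
)
beta_max = result.x
print(beta_max, -result.fun)  # estimate and the attained log-likelihood
```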
The cross entropy of two distributions p and q over the same set is defined as $H(p, q) = \mathbb E_p[-\log q]$ (see Wikipedia).
With the logistic function $g(z) = \frac 1 {1 + e^{-z}}$ and $y \in \{0, 1\}$, the probability of finding $y = 1$ is given by:
$$
q_{y = 1} = \hat y \equiv g(\mathbf w \cdot \mathbf x)
$$
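A small sketch of $g$ and the per-row prediction $\hat y$ on the raw data (the weight vector `w` is made up for illustration, not a fitted value):

```python
# Logistic (sigmoid) function and the per-row prediction y_hat = g(w . x).
def g(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.1, -0.2, 0.3])  # arbitrary weights, for illustration only
y_hat = g(rows @ w)             # q_{y=1} for every row of the original dataset
```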
Let $y \in \{0, 1\}$ be the true label and $\hat y$ the estimate. With the notation $p \in \{y, 1 - y\}$ and $q \in \{\hat y, 1 - \hat y\}$, the cross entropy gives a measure of the similarity between $p$ and $q$:
$$
H(p,q) = -\sum_i p_i \log q_i = -y \log \hat y - (1 - y) \log(1 - \hat y)
$$
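As a sketch (the clipping constant `eps` is an extra numerical safeguard against $\log 0$, not part of the formula above):

```python
# Cross entropy between p = (y, 1 - y) and q = (y_hat, 1 - y_hat), per example.
def cross_entropy(y, y_hat, eps=1e-12):
    y_hat = np.clip(y_hat, eps, 1 - eps)  # avoid log(0)
    return -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)

H = cross_entropy(Z, y_hat)  # one value per row of the dataset
```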
The loss function typically used in logistic regression is the average of the cross-entropies over the sample:
$$
L(\mathbf w) = \frac 1 N \sum_{n=1}^{N} H(p_{n},q_{n})
$$
The logistic loss is sometimes called the cross-entropy loss; it is also known as log loss.
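Continuing the sketch, the loss is just the mean of the per-example cross-entropies; up to its internal clipping of probabilities, this should agree with what e.g. `sklearn.metrics.log_loss` reports:

```python
# Average cross entropy over the sample: the logistic-regression loss L(w).
def loss(w, X_rows, labels):
    return np.mean(cross_entropy(labels, g(X_rows @ w)))

print(loss(w, rows, Z))
```

Because the binomial coefficients do not depend on $\beta$, minimizing this loss over `w` is equivalent to maximizing the likelihood from the grouped formulation above.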