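A collection of ways to speed up Python code. Contributions of other optimization techniques are welcome, so that we can exchange ideas and learn from each other. If anything here infringes a copyright, please let me know and it will be removed.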
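(a) As described in the doc, $y$ is a one-hot vector with $y_o = 1$ and every other entry equal to $0$, so only the $o$-th term of the sum survives:
$\begin{aligned} - \sum_{w\in Vocab}y_w\log(\hat{y}_w) &= - [y_1\log(\hat{y}_1) + \cdots + y_o\log(\hat{y}_o) + \cdots + y_{|Vocab|}\log(\hat{y}_{|Vocab|})] \\ & = - y_o\log(\hat{y}_o) \\ & = -\log(\hat{y}_o) \\ & = -\log \mathrm{P}(O = o | C = c) \end{aligned}$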
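As a quick numeric sanity check of (a), the sketch below computes the full cross-entropy sum for a one-hot `y` and compares it with the single term `-log(y_hat[o])`. The scores, the vocabulary size, and the index `o` are made-up values for illustration, not taken from the assignment.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y, y_hat):
    """Full sum: -sum_w y_w * log(y_hat_w)."""
    return -sum(y_w * math.log(p) for y_w, p in zip(y, y_hat))

# Hypothetical values, for illustration only.
scores = [1.0, -0.5, 2.0, 0.3]   # one score per vocabulary word
o = 2                            # index of the true outside word
y = [1.0 if w == o else 0.0 for w in range(len(scores))]  # one-hot label
y_hat = softmax(scores)

full_sum = cross_entropy(y, y_hat)   # left-hand side of (a)
single_term = -math.log(y_hat[o])    # right-hand side of (a)
print(full_sum, single_term)
```

The two printed numbers should agree up to floating-point rounding, because every term with `y_w = 0` contributes nothing to the sum.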
(b) We know this derivative: $$\begin{aligned} \because\ & J = CE(y, \hat{y}), \quad \hat{y} = \mathrm{softmax}(\theta) \\ \therefore\ & \frac{\partial J}{\partial \theta} = (\hat{y} - y)^T \end{aligned}$$
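The identity in (b) can be checked numerically. The following is a minimal sketch, assuming a small hypothetical `theta` and true index `o`: it compares $\hat{y} - y$ with a central finite-difference estimate of $\partial J / \partial \theta$.

```python
import math

def softmax(theta):
    """Numerically stable softmax over a list of floats."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def loss(theta, o):
    """J = CE(y, softmax(theta)) for a one-hot y with a 1 at index o."""
    return -math.log(softmax(theta)[o])

def numeric_grad(theta, o, eps=1e-6):
    """Central finite-difference estimate of dJ/dtheta."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += eps
        minus[i] -= eps
        grad.append((loss(plus, o) - loss(minus, o)) / (2 * eps))
    return grad

# Hypothetical values, for illustration only.
theta = [0.2, -1.0, 0.7]
o = 1
y = [1.0 if i == o else 0.0 for i in range(len(theta))]

analytic = [p - t for p, t in zip(softmax(theta), y)]  # y_hat - y
numeric = numeric_grad(theta, o)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_diff)
```

A very small `max_diff` indicates that the analytic gradient and the finite-difference estimate agree for this input.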
(c) Similar to the equation above, with $\theta = U^Tv_c$ and the result written in the same shape as $U$: $$\begin{aligned} \frac{\partial J}{\partial U} &= \frac{\partial J}{\partial \theta} \frac{\partial \theta}{\partial U} \\ &= (\hat{y} - y)^T \frac{\partial U^Tv_c}{\partial U} \\ &= v_c(\hat{y} - y)^T \end{aligned}$$
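The outer-product form $v_c(\hat{y} - y)^T$ can be checked the same way. This sketch assumes $U$ is a $d \times |Vocab|$ matrix whose columns are the outside vectors, so that $\theta = U^Tv_c$. The matrix, the center vector, and the index `o` are hypothetical values chosen only for illustration.

```python
import math

def softmax(theta):
    """Numerically stable softmax over a list of floats."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def scores(U, v_c):
    """theta = U^T v_c, with U stored as a list of d rows of length |Vocab|."""
    d, n = len(U), len(U[0])
    return [sum(U[k][w] * v_c[k] for k in range(d)) for w in range(n)]

def loss(U, v_c, o):
    """J = -log softmax(U^T v_c)[o]."""
    return -math.log(softmax(scores(U, v_c))[o])

# Hypothetical values: d = 2, |Vocab| = 3.
U = [[0.1, -0.3, 0.5],
     [0.4, 0.2, -0.6]]
v_c = [0.7, -0.2]
o = 0

# Analytic gradient: outer product v_c (y_hat - y)^T, same shape as U.
y_hat = softmax(scores(U, v_c))
delta = [p - (1.0 if w == o else 0.0) for w, p in enumerate(y_hat)]
analytic = [[v_c[k] * delta[w] for w in range(len(delta))]
            for k in range(len(v_c))]

# Central finite differences, one entry of U at a time.
eps = 1e-6
max_diff = 0.0
for k in range(len(U)):
    for w in range(len(U[0])):
        U_plus = [row[:] for row in U]
        U_minus = [row[:] for row in U]
        U_plus[k][w] += eps
        U_minus[k][w] -= eps
        numeric = (loss(U_plus, v_c, o) - loss(U_minus, v_c, o)) / (2 * eps)
        max_diff = max(max_diff, abs(numeric - analytic[k][w]))
print(max_diff)
```

Each column `w` of `analytic` is $(\hat{y}_w - y_w)\,v_c$, so only the column for the true word `o` has a negative coefficient.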
(d)