raskr / rust-autograd

Tensors and differentiable operations (like TensorFlow) in Rust

softmax_cross_entropy outputs shape [-1], when it should output shape [-1, 1].

AngelOfSol opened this issue

I was attempting to optimize something with the softmax_cross_entropy loss function, but I kept getting broadcast shape errors, which confused me. After spending some time digging into the code, I realized that the output here doesn't follow the API documentation, which says the op should output a rank-2 tensor; instead it outputs a rank-1 tensor.

fn compute(&self, ctx: &mut crate::op::ComputeContext<T>) -> Result<(), crate::op::OpError> {
    let x = &ctx.input(0);
    let log_x: NdArray<T> = x - &tensor_ops::math_ops::logsumexp_forward(x, 1, true);
    // `t` must be one-hot
    let t = &ctx.input(1);
    assert_eq!(log_x.ndim(), 2, "x must be 2-ranked tensor");
    assert_eq!(t.ndim(), 2, "t must be 2-ranked tensor");
    // - t log x ( =(batch, num_classes))
    let minus_one = T::one().neg();
    // `sum_axis` drops the class axis here, so this first output has
    // shape [batch] (rank 1) rather than the documented [batch, 1]
    ctx.append_output(
        (t * &log_x)
            .sum_axis(ndarray::Axis(1))
            .mapv(move |elem| elem * minus_one),
    );
    ctx.append_output(log_x);
    Ok(())
}
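
For reference, this is just how ndarray's sum_axis behaves: it removes the axis it sums over, so the first appended output above ends up rank-1 even though both inputs are rank-2. A minimal standalone sketch (plain ndarray, independent of the op above) showing the rank change and how an explicit axis restores the documented shape:

use ndarray::{array, Axis};

fn main() {
    // (batch, num_classes) = (2, 3), standing in for t * log_x above
    let t_log_x = array![[0.1_f64, 0.2, 0.3], [0.4, 0.5, 0.6]];

    // sum_axis removes the axis it sums over: shape goes from [2, 3] to [2]
    let per_example = t_log_x.sum_axis(Axis(1));
    assert_eq!(per_example.shape(), &[2]);

    // The documented output shape is [batch, 1]; restoring it needs an
    // explicit reshape or an inserted length-1 axis
    let reshaped = per_example.insert_axis(Axis(1));
    assert_eq!(reshaped.shape(), &[2, 1]);
}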

I fixed it in my local copy by just reshaping the array, which seems to work, but I'm not sure whether this is how the output shape is usually handled in cross entropy implementations.

    let minus_one = T::one().neg();
    let result = (t * &log_x)
        .sum_axis(ndarray::Axis(1))
        .mapv(move |elem| elem * minus_one)
        // reshape the rank-1 [batch] sum back to the documented [batch, 1]
        .into_shape(ndarray::IxDyn(&[log_x.shape()[0], 1]))
        .unwrap();

    assert_eq!(result.ndim(), 2, "result must be 2-ranked tensor");

    ctx.append_output(result);
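
Not required for the fix, but ndarray also has insert_axis, which adds a length-1 axis without the fallible into_shape + unwrap. A rough, untested sketch of the same change using it (same behavior assumed):

    let minus_one = T::one().neg();
    let result = (t * &log_x)
        .sum_axis(ndarray::Axis(1))
        .mapv(move |elem| elem * minus_one)
        // append a trailing length-1 axis: [batch] -> [batch, 1]
        .insert_axis(ndarray::Axis(1));

    assert_eq!(result.ndim(), 2, "result must be 2-ranked tensor");

    ctx.append_output(result);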

@AngelOfSol Could you send a PR? Your fix looks good to me. Thanks!