LabelSelect_Batch stream output type
maltanar opened this issue
Yaman Umuroglu commented
In the LabelSelect_Batch op, I see that Out_T is a template parameter, probably to allow a narrower output stream (e.g. 16-bit ints are enough to represent 1000 classes), which is good. However, the actual output dtype of the layer always seems to be 32 bits:
template<
        // tensor size parameters
        unsigned int NumClasses,
        unsigned int PECount,
        unsigned int NumTop,
        typename In_T,
        typename Out_T>
void LabelSelect_Batch(stream<ap_uint<PECount * In_T::width> > & in,
        stream<ap_uint<32> > & out, const unsigned int numReps) {
The output also isn't packed; each label is written directly to the output stream:
for(unsigned int topx = 0; topx < NumTop; topx++){
    out.write(toplabels[NumTop - topx - 1]);
}
Would it make sense to change the output stream type to stream<Out_T>, or is there some special reason we cast it to a 32-bit uint first?
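For illustration, here is a minimal sketch of what the proposed signature could look like. It is not the library code and not necessarily what any fix ended up doing: `MockStream` and `LabelSelectSketch` are hypothetical names, `MockStream` is a plain `std::queue` wrapper standing in for the HLS `stream` so the example compiles without the HLS headers, the `PECount` input packing is left out, and the top-K search is a naive argmax loop that resolves ties in favour of the lower label.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <queue>
#include <vector>

// Hypothetical stand-in for the HLS stream class (assumption, not the real API).
template <typename T>
class MockStream {
public:
    void write(const T& v) { q_.push(v); }
    T read() {
        assert(!q_.empty());  // reading an empty stream is a usage error here
        T v = q_.front();
        q_.pop();
        return v;
    }
    bool empty() const { return q_.empty(); }
private:
    std::queue<T> q_;
};

// Simplified top-K label selection whose output stream carries Out_T
// directly instead of a fixed 32-bit word. One input value per class.
template <unsigned int NumClasses, unsigned int NumTop,
          typename In_T, typename Out_T>
void LabelSelectSketch(MockStream<In_T>& in, MockStream<Out_T>& out,
                       const unsigned int numReps) {
    static_assert(NumClasses > 0 && NumTop <= NumClasses,
                  "need at least NumTop classes");
    // Out_T must be wide enough to hold the largest label, NumClasses - 1.
    static_assert(static_cast<unsigned long long>(NumClasses) - 1 <=
                  static_cast<unsigned long long>(std::numeric_limits<Out_T>::max()),
                  "Out_T is too narrow for NumClasses labels");

    for (unsigned int rep = 0; rep < numReps; rep++) {
        // Read one score per class for this repetition.
        std::vector<In_T> vals(NumClasses);
        for (unsigned int c = 0; c < NumClasses; c++) {
            vals[c] = in.read();
        }
        std::vector<bool> taken(NumClasses, false);
        // Emit the labels of the NumTop largest scores, best first.
        for (unsigned int topx = 0; topx < NumTop; topx++) {
            bool found = false;
            unsigned int best = 0;
            for (unsigned int c = 0; c < NumClasses; c++) {
                if (taken[c]) continue;
                if (!found || vals[c] > vals[best]) {
                    best = c;
                    found = true;
                }
            }
            taken[best] = true;
            out.write(static_cast<Out_T>(best));  // written as Out_T, not 32 bits
        }
    }
}
```

With `Out_T = std::uint16_t`, a caller would declare the output as `MockStream<std::uint16_t>` and each label would occupy 16 bits on the stream. The `static_assert` on the label range is an extra guard added for this sketch only.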
giuliogamba commented
Fixed with PR #26