Xilinx / xfopencv

Bug in xf::resize after upgrading from 2018.3 to 2019.1

3togo opened this issue

commented

prj_xf_dummy_2019.1.zip
The following call synthesizes with xfopencv 2018.3 but fails with 2019.1. Why?
What does "Call parameter type does not match function signature!" mean?

xf::resize <INTERPOLATION, XF_8UC1, wk_rows, wk_cols, dst_rows, dst_cols, XF_NPPC1, MAXDOWNSCALE> (wk_disp, disp);

Call parameter type does not match function signature!
  %wk_disp.data.V = alloca [38400 x i8], align 1
 [307200 x i8]*  call fastcc void @"xf::resize<0, 0, 160, 240, 480, 640, 1, 20>"(i32* %wk_disp.rows, i32* %wk_disp.cols, [38400 x i8]* %wk_disp.data.V, i32* %disp.rows, i32* %disp.cols, [307200 x i8]* %disp.data.V), !dbg !31033
Broken module found, compilation aborted!
Stack dump:
0.	Running pass 'Function Pass Manager' on module '/home/eli/git/bjp/sfsfsfsfsf/ips/xf_test/prj_xf_test_2019.1/solution1/.autopilot/db/a.g.1.bc'.
1.	Running pass 'Module Verifier' on function '@xf_filter_simple'
/tools/Xilinx/Vivado/2019.1/bin/rdiArgs.sh: line 267:  8517 Aborted                 (core dumped) "$RDI_PROG" "$@"
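One way to read the verifier message above: in LLVM's output, the lines after "Call parameter type does not match function signature!" print the offending argument, then the parameter type the callee declares, then the call itself. Here the argument is `[38400 x i8]` while the callee expects `[307200 x i8]`. Assuming one byte per pixel for an XF_8UC1 image at XF_NPPC1 (an assumption about the xf::Mat storage layout), those lengths are exactly the sizes of the 160 x 240 `wk_disp` and the 480 x 640 `disp` images, so the source parameter of the called function appears to be typed with the destination size. The arithmetic, checked at compile time (`mat8uc1Bytes` is a hypothetical helper):

```cpp
#include <cstddef>

// Length of the pixel array of an 8-bit, single-channel image processed one
// pixel per clock (XF_8UC1, XF_NPPC1), assuming one byte per pixel.
constexpr std::size_t mat8uc1Bytes(std::size_t rows, std::size_t cols) {
    return rows * cols;
}

// Sizes from the template arguments xf::resize<0, 0, 160, 240, 480, 640, 1, 20>.
static_assert(mat8uc1Bytes(160, 240) == 38400, "wk_disp data: 160 x 240");
static_assert(mat8uc1Bytes(480, 640) == 307200, "disp data: 480 x 640");
```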

commented

The error may be caused by this restriction:

A component cannot include virtual functions, function pointers, or bit fields.
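If that restriction were the cause, the usual C++ alternative is compile-time dispatch: pass the variant as a template parameter, the way xf::resize takes INTERPOLATION_TYPE, instead of calling through a virtual function or a function pointer. A minimal plain C++ sketch; the names `Interp` and `blend` and the enum values are hypothetical and not part of xfopencv, and whether this restriction is actually behind the error here is not established:

```cpp
#include <cstdint>

// Hypothetical interpolation modes; xfopencv defines its own constants.
enum Interp : int { kNearest = 0, kBilinear = 1 };

// The mode is a template parameter instead of a virtual method or function
// pointer: each instantiation is an ordinary function, and the compiler can
// discard the branch that is never taken.
template <int MODE>
std::uint8_t blend(std::uint8_t a, std::uint8_t b, double w) {
    if (MODE == kNearest) {
        return a;  // nearest neighbour: keep the first sample
    }
    // Linear blend, truncated toward zero; w in [0, 1] keeps it within [a, b].
    return static_cast<std::uint8_t>(a + (b - a) * w);
}
```

For example, `blend<kBilinear>(10, 30, 0.5)` yields 20, and no vtable or pointer is involved.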

commented

Below is the output dump:

INFO: [XFORM 203-602] Inlining function 'xf::Mat<0, 480, 640, 1>::write' into 'resizeNNBilinear<0, 160, 480, 1, 480, 640, 1, 2>' (/opt/xfopencv/2019.1/include/imgproc/xf_resize_nn_bilinear.hpp:423) automatically.
INFO: [XFORM 203-602] Inlining function 'xf::Mat<0, 480, 640, 1>::read' into 'xf::xfMat2AXIvideo<24, 0, 480, 640, 1>' (/opt/xfopencv/2019.1/include/common/xf_infra.h:70->/opt/xfopencv/2019.1/include/common/xf_infra.h:83->/opt/xfopencv/2019.1/include/common/xf_infra.h:202) automatically.
Call parameter type does not match function signature!
  %ss.data.V = alloca [76800 x i8], align 1
 [307200 x i8]*  call fastcc void @"xf::resize<1, 0, 480, 640, 160, 480, 1, 2>"(i32* %src.rows, i32* %src.cols, [307200 x i8]* %src.data.V, i32* %ss.rows, i32* %ss.cols, [76800 x i8]* %ss.data.V), !dbg !26192
Call parameter type does not match function signature!
  %ss.data.V = alloca [76800 x i8], align 1
 [307200 x i8]*  call fastcc void @"xf::resize<1, 0, 160, 480, 480, 640, 1, 2>"(i32* %ss.rows, i32* %ss.cols, [76800 x i8]* %ss.data.V, i32* %disp.rows, i32* %disp.cols, [307200 x i8]* %disp.data.V), !dbg !26193
Broken module found, compilation aborted!
Stack dump:
0.	Running pass 'Function Pass Manager' on module '/home/eli/git/bjp/sfsfsfsfsf/ips/xf_dummy/prj_xf_dummy_2019.1/solution1/.autopilot/db/a.g.1.bc'.
1.	Running pass 'Module Verifier' on function '@xf_dummy_filter'
/tools/Xilinx/Vivado/2019.1/bin/rdiArgs.sh: line 267:  7313 Aborted                 (core dumped) "$RDI_PROG" "$@"
Warning: HLS Process returned an error, skipping report opening!
Aborted!

commented

I have attached a test program, prj_xf_dummy_2019.1.zip, above. Can anyone confirm whether this bug affects everyone or only me?

Attached is the synthesis log.
syn.log

@3togo
We acknowledge this issue. It is a bug in the 2019.1 Vivado HLS synthesis flow: it throws the error you reported when a function with xf::Mat interfaces is instantiated more than once with different parameters (which creates different RTL for the two instances). In your case, xf::resize is called twice with different template parameters.
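To see why two calls mean two instances: each distinct set of template arguments makes the compiler generate a separate function, and in HLS each becomes its own RTL module. The plain C++ sketch below uses a hypothetical `callCount` as a stand-in for a templated kernel such as xf::resize; its function-local static counter exists once per instantiation, which makes the separation visible with the two size combinations from the log above (480 x 640 to 160 x 480, and back):

```cpp
// Hypothetical stand-in for a templated kernel such as xf::resize. Every
// distinct set of template arguments is a separate function, so the static
// counter below exists once per instantiation.
template <int SRC_ROWS, int SRC_COLS, int DST_ROWS, int DST_COLS>
int callCount() {
    static int calls = 0;
    return ++calls;
}
```

Calls with identical arguments share one instantiation (and its counter); the reverse size combination gets its own.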

The fix has to come from the Vivado HLS tool, which may take some time. In the meantime, you can consider the following workarounds:

  1. Create a class and paste xf::resize into it as a member function. Create a new object of the class for each call that needs to be made to the function.

    Example:

class resizeWrapper
{
public:
    // < copy the resize function definition here >

};

In the accel function:

resizeWrapper obj1, obj2;

obj1.resize<., 1080, 1920,..> (... , ...);
obj2.resize<., 2160, 3840,..> (... , ...);

  2. Copy the entire resize function into another function with a different name, such as resize2, and use that function for the second call.

    Example:

resize <., 1080, 1920,..> (... , ...);
resize2<., 2160, 3840,..> (... , ...);
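Both workarounds can be sketched in standard C++. The function `resize1d` below is a hypothetical stand-in for xf::resize: a nearest-neighbour resize of a 1-D array whose sizes are template parameters, like the xf::Mat dimensions. `ResizeWrapper` holds its own copy of the body rather than forwarding to the free function, since a wrapper that only forwards to xf::resize does not help, as the rest of this thread shows. In ordinary C++ all three versions behave identically; the duplication only matters to the HLS front end.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for xf::resize: nearest-neighbour resize of a 1-D
// array whose input and output lengths are template parameters.
template <std::size_t IN, std::size_t OUT>
std::array<int, OUT> resize1d(const std::array<int, IN>& src) {
    std::array<int, OUT> dst{};
    for (std::size_t i = 0; i < OUT; ++i) dst[i] = src[i * IN / OUT];
    return dst;
}

// Workaround 1: a class carrying its own full copy of the body as a member
// template; each call site uses its own object.
class ResizeWrapper {
public:
    template <std::size_t IN, std::size_t OUT>
    std::array<int, OUT> resize1d(const std::array<int, IN>& src) {
        std::array<int, OUT> dst{};
        for (std::size_t i = 0; i < OUT; ++i) dst[i] = src[i * IN / OUT];
        return dst;
    }
};

// Workaround 2: a verbatim copy under a different name for the second call.
template <std::size_t IN, std::size_t OUT>
std::array<int, OUT> resize1d_2(const std::array<int, IN>& src) {
    std::array<int, OUT> dst{};
    for (std::size_t i = 0; i < OUT; ++i) dst[i] = src[i * IN / OUT];
    return dst;
}
```

Usage mirrors the two calls in this thread: `obj1.resize1d<4, 2>(src)` for the downscale and `obj2.resize1d<2, 4>(down)` (or `resize1d_2<2, 4>(down)`) for the upscale.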

commented

@bgouthamb ,

Many thanks for your reply. I modified the code as suggested in your workaround 1, but it makes no difference: the errors are still there. Below is the modified code.

Eli

#include "xf_dummy_accel.h"
#include "xf_config_params.h"
#include "imgproc/xf_resize.hpp"

#define MAXDOWNSCALE 2
#define INTERPOLATION   1
class JJ {
    public:
        template<int INTERPOLATION_TYPE, int TYPE, int SRC_ROWS, int SRC_COLS, int DST_ROWS, int DST_COLS, int NPC, int MAX_DOWN_SCALE>
        void resize (xf::Mat<TYPE, SRC_ROWS, SRC_COLS, NPC> & _src, xf::Mat<TYPE, DST_ROWS, DST_COLS, NPC> & _dst) {
            xf::resize<INTERPOLATION_TYPE, TYPE, SRC_ROWS, SRC_COLS, DST_ROWS, DST_COLS, NPC, MAX_DOWN_SCALE> (_src, _dst);
        }
};

void xf_dummy_filter(AXI_STREAM& src_data, AXI_STREAM& src_data_d)
{
#pragma HLS INTERFACE axis port=src_data
#pragma HLS INTERFACE axis port=src_data_d


XGRAY_SRC_IMAGE src(src_rows, src_cols);
XGRAY_SS_IMAGE ss(ss_rows, ss_cols);
XGRAY_DST_IMAGE disp(dst_rows, dst_cols);
JJ jj1, jj2;


#pragma HLS dataflow
xf::AXIvideo2xfMat(src_data, src);
//xf::resize <INTERPOLATION, XF_8UC1, src_rows, src_cols, ss_rows, ss_cols, XF_NPPC1, MAXDOWNSCALE> (src, ss);
//xf::resize <INTERPOLATION, XF_8UC1, ss_rows, ss_cols, dst_rows, dst_cols, XF_NPPC1, MAXDOWNSCALE> (ss, disp);
jj1.resize <INTERPOLATION, XF_8UC1, src_rows, src_cols, ss_rows, ss_cols, XF_NPPC1, MAXDOWNSCALE> (src, ss);
jj2.resize <INTERPOLATION, XF_8UC1, ss_rows, ss_cols, dst_rows, dst_cols, XF_NPPC1, MAXDOWNSCALE> (ss, disp);

xf::xfMat2AXIvideo(disp, src_data_d);
}

@3togo ,

The class JJ must contain the whole definition of xf::resize, including any sub-functions it calls internally, and #include "imgproc/xf_resize.hpp" should be removed.
Assuming that you use the BILINEAR interpolation method, here is how the class should be defined:


class JJ {

    public:

	/***********************************************************************/

	template<int DEPTH, int INTERPOLATION_TYPE, int NPPC>
	void interpolatePixel(XF_CTUNAME(DEPTH,NPPC) A0, XF_CTUNAME(DEPTH,NPPC) B0, XF_CTUNAME(DEPTH,NPPC) A1, XF_CTUNAME(DEPTH,NPPC) B1, ap_ufixed<12,2> Wx, ap_ufixed<12,2> Wy, XF_CTUNAME(DEPTH,NPPC) &pixel)
	{
	#pragma HLS inline
		if(INTERPOLATION_TYPE==XF_INTERPOLATION_NN)
		{
			pixel = A0;
		}
		else
		{
			ap_ufixed<12,2> Wxy;
			ap_int<16> val0,val1,val2;
			ap_fixed<28,18> P1,P2,P3,P4;
			ap_ufixed<28,18> one_num = 1.0;

			Wxy = (Wx*Wy);    // Wx - 0.32, Wy-0.32  (Wx*Wy-0.64)  Wxy - 0.32
			val0 = (A0+B1-(B0+A1));
			val1 = (B0-A0);
			val2 = (A1-A0);

			P1 = (val0*Wxy);		// val0(16.0) * Wxy(0.32) = P1(16.32)
			P2 = (val1*Wy);		// val1(16.0) * Wy(0.32) = P2(16.32)
			P3 = (val2*Wx);		// val2(16.0) * Wx(0.32) = P3(16.32)
			P4 = (A0);					// A0(8.0) P4(8.32)

			pixel = (XF_CTUNAME(DEPTH,NPPC))((ap_fixed<32,22>)(P1  + P2 + P3 + P4));
			// to get only integer part from sum of 8.32's , right shift by 32
		}
	}
	template<int DEPTH, int INTERPOLATION_TYPE, int NPPC, int T_INDEX_INT, int NUMBEROFINPUTWORDS>
	void computeOutputPixel(XF_TNAME(DEPTH,NPPC) A0[NUMBEROFINPUTWORDS], XF_TNAME(DEPTH,NPPC) B0[NUMBEROFINPUTWORDS], ap_uint<T_INDEX_INT> initIndex, ap_uint<T_INDEX_INT> indexx[XF_NPIXPERCYCLE(NPPC)], ap_ufixed<12,2> Wx[XF_NPIXPERCYCLE(NPPC)], ap_ufixed<12,2> Wy, XF_TNAME(DEPTH,NPPC) &pixel)
	{
	#pragma HLS inline
		const int PIXELDEPTH = XF_DTPIXELDEPTH(DEPTH,NPPC);
		/*if(indexx[XF_NPIXPERCYCLE(NPPC)-1] > (initIndex+NUMBEROFINPUTWORDS*XF_NPIXPERCYCLE(NPPC)-1))
			{
				std::cout << "Insufficient number of words to resize in X" << std::endl;
				return;
			}*/
		assert((indexx[XF_NPIXPERCYCLE(NPPC)-1] < (initIndex+NUMBEROFINPUTWORDS*XF_NPIXPERCYCLE(NPPC)-1)) && "Insufficient number of words to resize in X");

		XF_PTUNAME(DEPTH) unpackX1[XF_NPIXPERCYCLE(NPPC)*NUMBEROFINPUTWORDS];
	#pragma HLS ARRAY_PARTITION variable=unpackX1 complete dim=1
		XF_PTUNAME(DEPTH) unpackX2[XF_NPIXPERCYCLE(NPPC)*NUMBEROFINPUTWORDS];
	#pragma HLS ARRAY_PARTITION variable=unpackX2 complete dim=1
		XF_PTUNAME(DEPTH) outputPixel[XF_NPIXPERCYCLE(NPPC)];
	#pragma HLS ARRAY_PARTITION variable=outputPixel complete dim=1
		for(int k=0; k<NUMBEROFINPUTWORDS; k++)
		{
	#pragma HLS UNROLL
			for(int i=0; i<XF_NPIXPERCYCLE(NPPC); i++)
			{
	#pragma HLS UNROLL
				unpackX1[k*XF_NPIXPERCYCLE(NPPC)+i] = A0[k].range((i+1)*XF_DTPIXELDEPTH(DEPTH,NPPC)*XF_CHANNELS(DEPTH,NPPC)-1,i*XF_DTPIXELDEPTH(DEPTH,NPPC)*XF_CHANNELS(DEPTH,NPPC));
				unpackX2[k*XF_NPIXPERCYCLE(NPPC)+i] = B0[k].range((i+1)*XF_DTPIXELDEPTH(DEPTH,NPPC)*XF_CHANNELS(DEPTH,NPPC)-1,i*XF_DTPIXELDEPTH(DEPTH,NPPC)*XF_CHANNELS(DEPTH,NPPC));
			}
		}
		for(int i=0; i<XF_NPIXPERCYCLE(NPPC); i++)
		{
	#pragma HLS UNROLL

			for(int k=0; k<XF_CHANNELS(DEPTH,NPPC); k++)
			{
	#pragma HLS UNROLL
				XF_CTUNAME(DEPTH,NPPC) unpackX1temp[XF_NPIXPERCYCLE(NPPC)*NUMBEROFINPUTWORDS];
	#pragma HLS ARRAY_PARTITION variable=unpackX1temp complete dim=1
				XF_CTUNAME(DEPTH,NPPC) unpackX2temp[XF_NPIXPERCYCLE(NPPC)*NUMBEROFINPUTWORDS];
	#pragma HLS ARRAY_PARTITION variable=unpackX2temp complete dim=1
				for(int l=0; l<XF_NPIXPERCYCLE(NPPC)*NUMBEROFINPUTWORDS; l++)
				{
	#pragma HLS UNROLL
					unpackX1temp[l] = unpackX1[l].range((k+1)*PIXELDEPTH-1,k*PIXELDEPTH);
					unpackX2temp[l] = unpackX2[l].range((k+1)*PIXELDEPTH-1,k*PIXELDEPTH);
				}
				XF_CTUNAME(DEPTH,NPPC) currentoutput;
				interpolatePixel<DEPTH, INTERPOLATION_TYPE, NPPC>(unpackX1temp[indexx[i]-initIndex], unpackX2temp[indexx[i]-initIndex], unpackX1temp[indexx[i]-initIndex+1], unpackX2temp[indexx[i]-initIndex+1], Wx[i], Wy, currentoutput);
				outputPixel[i].range((k+1)*PIXELDEPTH-1,k*PIXELDEPTH) = currentoutput;
			}
		}

		for(int i=0; i<XF_NPIXPERCYCLE(NPPC); i++)
		{
	#pragma HLS UNROLL
			pixel.range((i+1)*XF_DTPIXELDEPTH(DEPTH,NPPC)*XF_CHANNELS(DEPTH,NPPC)-1,i*XF_DTPIXELDEPTH(DEPTH,NPPC)*XF_CHANNELS(DEPTH,NPPC)) = outputPixel[i];
		}
	}
	static uint64_t xfUDivResize (uint64_t in_n, unsigned short in_d)
	{
	#pragma HLS INLINE OFF
		uint32_t out_res = in_n/in_d;
		return out_res;
	}

	template<int NPPC, int T_SCALE_WIDTH, int T_SCALE_INT, int T_COMP_INDEX_WIDTH, int T_COMP_INDEX_INT>
	void scaleMult(ap_ufixed<T_SCALE_WIDTH,T_SCALE_INT> scalex, ap_fixed<T_COMP_INDEX_WIDTH,T_COMP_INDEX_INT> scaleXParallel[XF_NPIXPERCYCLE(NPPC)])
	{
	#pragma HLS INLINE
		for(int i=0; i<XF_NPIXPERCYCLE(NPPC); i++)
		{
	#pragma HLS PIPELINE
			scaleXParallel[i] = (ap_fixed<T_COMP_INDEX_WIDTH,T_COMP_INDEX_INT>)scalex*(ap_uint<8>)i;
		}
		return;
	}
	template<int T_INDEX_INT, int T_COMP_INDEX_WIDTH, int T_COMP_INDEX_INT, int T_SCALE_WIDTH, int T_SCALE_INT, int INTERPOLATION_TYPE>
	void scaleCompute(int currindex, ap_ufixed<T_SCALE_WIDTH,T_SCALE_INT> inscale, ap_fixed<T_COMP_INDEX_WIDTH,T_COMP_INDEX_INT> &ind_pre)
	{
		if(INTERPOLATION_TYPE==XF_INTERPOLATION_NN)
		{
			ind_pre = (ap_fixed<T_COMP_INDEX_WIDTH,T_COMP_INDEX_INT>)currindex*inscale + (ap_fixed<T_COMP_INDEX_WIDTH,T_COMP_INDEX_INT>)0.001;

		}
		else
		{
			ind_pre = ((ap_fixed<T_COMP_INDEX_WIDTH,T_COMP_INDEX_INT>)currindex + (ap_fixed<T_COMP_INDEX_WIDTH,T_COMP_INDEX_INT>)0.5)*inscale - (ap_fixed<T_COMP_INDEX_WIDTH,T_COMP_INDEX_INT>)0.5;
		}
	}
	template <int INTERPOLATION_TYPE, int T_COMP_INDEX_WIDTH, int T_COMP_INDEX_INT, int T_INDEX_INT, int T_SCALE_WIDTH, int T_SCALE_INT, int T_WEIGHT_WIDTH, int T_WEIGHT_INT, int NPPC>
	void computeInterpolation(int inrows, int incols, int j, int output_rows_count, ap_ufixed<T_SCALE_WIDTH,T_SCALE_INT> scalex, ap_fixed<T_COMP_INDEX_WIDTH,T_COMP_INDEX_INT> scaleXParallel[XF_NPIXPERCYCLE(NPPC)], ap_ufixed<T_SCALE_WIDTH,T_SCALE_INT> scaley, ap_uint<T_INDEX_INT> indexx[XF_NPIXPERCYCLE(NPPC)], ap_uint<T_INDEX_INT> &indexy, ap_uint<T_INDEX_INT> &nextYScale, ap_ufixed<T_WEIGHT_WIDTH,T_WEIGHT_INT> WeightX[XF_NPIXPERCYCLE(NPPC)], ap_ufixed<T_WEIGHT_WIDTH,T_WEIGHT_INT> &WeightY, ap_fixed<T_COMP_INDEX_WIDTH,T_COMP_INDEX_INT> indexx_pre_comp, ap_fixed<T_COMP_INDEX_WIDTH,T_COMP_INDEX_INT> indexy_pre_comp)
	{
		const int INDEX_INT = T_INDEX_INT;
		const int WEIGHT_WIDTH = T_WEIGHT_WIDTH;
		const int WEIGHT_INT = T_WEIGHT_INT;
		const int SCALE_WIDTH = T_SCALE_WIDTH;
		const int SCALE_INT = T_SCALE_INT;
		const int COMP_INDEX_WIDTH = T_COMP_INDEX_WIDTH;
		const int COMP_INDEX_INT = T_COMP_INDEX_INT;

		ap_fixed<COMP_INDEX_WIDTH,COMP_INDEX_INT> indexx_pre = 0;
		ap_fixed<COMP_INDEX_WIDTH,COMP_INDEX_INT> indexy_pre = 0;
		if(INTERPOLATION_TYPE==XF_INTERPOLATION_NN)
		{
			indexy_pre = indexy_pre_comp;
			nextYScale = indexy_pre+scaley;
			indexy = (ap_uint<INDEX_INT>)indexy_pre;
		}
		else
		{
			indexy_pre = indexy_pre_comp;
			nextYScale = indexy_pre+(ap_fixed<COMP_INDEX_WIDTH,COMP_INDEX_INT>)scaley;
			if(indexy_pre < 0)
			{
				indexy_pre = 0;
			}
			else if(indexy_pre > inrows-1)
			{
				indexy_pre = inrows-1;
			}
			indexy = (ap_uint<INDEX_INT>)indexy_pre;
			WeightY = ((ap_fixed<COMP_INDEX_WIDTH,COMP_INDEX_INT>)indexy_pre - (ap_fixed<COMP_INDEX_WIDTH,COMP_INDEX_INT>)indexy);
		}
		for(int i=0; i<XF_NPIXPERCYCLE(NPPC); i++)
		{
			ap_fixed<COMP_INDEX_WIDTH,COMP_INDEX_INT> indexy_pre = 0;
			if(INTERPOLATION_TYPE==XF_INTERPOLATION_NN)
			{
				indexx_pre = indexx_pre_comp + scaleXParallel[i];
				indexx[i] = (ap_uint<INDEX_INT>)indexx_pre;
			}
			else
			{
				indexx_pre = indexx_pre_comp + scaleXParallel[i];
				if(indexx_pre < 0)
				{
					indexx_pre = 0;
				}
				else if(indexx_pre > incols-1)
				{
					indexx_pre = incols-1;
				}
				indexx[i] = (ap_uint<INDEX_INT>)indexx_pre;
				WeightX[i] = ((ap_fixed<COMP_INDEX_WIDTH,COMP_INDEX_INT>)indexx_pre - (ap_fixed<COMP_INDEX_WIDTH,COMP_INDEX_INT>)indexx[i]);
			}
		}
	}

	template<int SRC_TYPE, int INHEIGHT, int INWIDTH, int NPPC, int OUTHEIGHT, int OUTWIDTH, int INTERPOLATION_TYPE, int MAX_DOWN_SCALE>
	void resizeNNBilinear(xf::Mat<SRC_TYPE, INHEIGHT, INWIDTH, NPPC> &imgInput,xf::Mat<SRC_TYPE, OUTHEIGHT, OUTWIDTH, NPPC> &imgOutput)
	{
	#pragma HLS ALLOCATION instances=scaleCompute limit=1 function
	#pragma HLS ALLOCATION instances=xfUDivResize limit=1 function
		const int INDEX_INT = 17;
		const int WEIGHT_WIDTH = 12;
		const int WEIGHT_INT = 2;
		const int SCALE_WIDTH = 32;
		const int SCALE_INT = 3;
		const int PRE_INDEX_WIDTH = 10;
		const int PRE_INDEX_INT = 17;
		const int COMP_INDEX_WIDTH = SCALE_WIDTH+PRE_INDEX_WIDTH;
		const int COMP_INDEX_INT = SCALE_INT+PRE_INDEX_INT;
		const int BUFFER_WORDS = MAX_DOWN_SCALE;
		const int BUFFER_DUP_FACTOR = (BUFFER_WORDS+1)>>1;

		uint64_t xnew,ynew;

		xnew = (imgInput.cols);///(float)(out_width<<XF_BITSHIFT(NPPC));
		ynew = (imgInput.rows);//(float)(out_height);

		xnew = xnew << 28;
		ynew = ynew << 28;
		ap_ufixed<SCALE_WIDTH,SCALE_INT> scalex,scaley;
		uint64_t Xscale64,Yscale64;
		Xscale64 = xfUDivResize (xnew , (imgOutput.cols));
		Yscale64 = xfUDivResize (ynew , (imgOutput.rows));
		ap_ufixed<64,32> temp_scale_conv;

		temp_scale_conv = Xscale64;
		temp_scale_conv = temp_scale_conv >> 28;
		scalex = temp_scale_conv;

		temp_scale_conv = Yscale64;
		temp_scale_conv = temp_scale_conv >> 28;
		scaley = temp_scale_conv;

		ap_fixed<COMP_INDEX_WIDTH,COMP_INDEX_INT> scaleXParallel[XF_NPIXPERCYCLE(NPPC)];
	#pragma HLS ARRAY_PARTITION variable=scaleXParallel complete dim=1
		scaleMult<NPPC,SCALE_WIDTH,SCALE_INT,COMP_INDEX_WIDTH,COMP_INDEX_INT>(scalex,scaleXParallel);

		XF_TNAME(SRC_TYPE,NPPC) line_buffer[3][BUFFER_DUP_FACTOR][INWIDTH>>(XF_BITSHIFT(NPPC))];
	#pragma HLS ARRAY_PARTITION variable=line_buffer complete dim=1
	#pragma HLS ARRAY_PARTITION variable=line_buffer complete dim=2
		int input_read_pointer=0;
		int read_rows_count = 0;
		int output_write_pointer = 0;
		for(int i=0; i<2; i++) //read two rows
		{
	#pragma HLS LOOP_TRIPCOUNT min=1 max=2
			for(int j=0; j<(imgInput.cols>>(XF_BITSHIFT(NPPC))); j++)
			{
	#pragma HLS PIPELINE
	#pragma HLS LOOP_TRIPCOUNT min=1 max=INWIDTH/NPPC
				for(int k=0; k<BUFFER_DUP_FACTOR ; k++)
				{
					line_buffer[i][k][j] = imgInput.read(input_read_pointer);
				}
				input_read_pointer++;
			}
			read_rows_count++;
		}
		int output_rows_count = 0;
		int first_row_index = 0;
		int second_row_index = 1;
		int read_row_index = 2;
		int loop_row_count = (imgOutput.rows > imgInput.rows)? imgOutput.rows : imgInput.rows;
		int loop_col_count = (imgOutput.cols > imgInput.cols)? imgOutput.cols : imgInput.cols;
		const int LOOPCOUNTROW = (INHEIGHT>OUTHEIGHT)? INHEIGHT: OUTHEIGHT;
		const int LOOPCOUNTCOL = (INWIDTH>OUTWIDTH)? INWIDTH: OUTWIDTH;
		ap_uint<INDEX_INT> indexx[XF_NPIXPERCYCLE(NPPC)];
	#pragma HLS ARRAY_PARTITION variable=indexx complete dim=1
		ap_uint<INDEX_INT> indexy = 0;
		ap_uint<INDEX_INT> nextYScale = 0;
		ap_ufixed<WEIGHT_WIDTH,WEIGHT_INT> WeightX[XF_NPIXPERCYCLE(NPPC)];
	#pragma HLS ARRAY_PARTITION variable=WeightX complete dim=1
		ap_ufixed<WEIGHT_WIDTH,WEIGHT_INT> WeightY = 0;
		XF_TNAME(SRC_TYPE,NPPC) P0Buf[BUFFER_DUP_FACTOR<<1];
	#pragma HLS ARRAY_PARTITION variable=P0Buf complete dim=1
		XF_TNAME(SRC_TYPE,NPPC) P1Buf[BUFFER_DUP_FACTOR<<1];
	#pragma HLS ARRAY_PARTITION variable=P1Buf complete dim=1

		ap_fixed<COMP_INDEX_WIDTH,COMP_INDEX_INT> indexx_pre_comp = 0;
		ap_fixed<COMP_INDEX_WIDTH,COMP_INDEX_INT> indexy_pre_comp = 0;

		for(int i=0; i<loop_row_count; i++)
		{
	#pragma HLS LOOP_TRIPCOUNT min=1 max=LOOPCOUNTROW
			scaleCompute<INDEX_INT, COMP_INDEX_WIDTH, COMP_INDEX_INT, SCALE_WIDTH, SCALE_INT, INTERPOLATION_TYPE>(output_rows_count, scaley, indexy_pre_comp);
			for(int j=0; j<(loop_col_count>>(XF_BITSHIFT(NPPC))); j++)
			{
	#pragma HLS PIPELINE
	#pragma HLS LOOP_TRIPCOUNT min=1 max=LOOPCOUNTCOL/NPPC

				scaleCompute<INDEX_INT, COMP_INDEX_WIDTH, COMP_INDEX_INT, SCALE_WIDTH, SCALE_INT, INTERPOLATION_TYPE>(j<<(XF_BITSHIFT(NPPC)), scalex, indexx_pre_comp);
				computeInterpolation<INTERPOLATION_TYPE, COMP_INDEX_WIDTH, COMP_INDEX_INT, INDEX_INT, SCALE_WIDTH, SCALE_INT, WEIGHT_WIDTH, WEIGHT_INT, NPPC>(imgInput.rows, imgInput.cols, j<<(XF_BITSHIFT(NPPC)), output_rows_count, scalex, scaleXParallel, scaley, indexx, indexy, nextYScale, WeightX, WeightY, indexx_pre_comp, indexy_pre_comp);
				int indexstores = first_row_index;
				XF_TNAME(SRC_TYPE,NPPC) read_pixel;
				bool flag_write = 0;
				if(read_rows_count != imgInput.rows)
				{
					if((nextYScale >= read_rows_count-1)) //check if the next index y needed needs to be read.
					{
						if(j<(imgInput.cols>>(XF_BITSHIFT(NPPC))))
						{
							read_pixel = imgInput.read(input_read_pointer);
							flag_write = 1;
							input_read_pointer++;
						}
						else
						{
							flag_write = 0;
						}
					}
					else
					{
						flag_write = 0;
					}
				}
				else
				{
					flag_write = 0;
				}

				if(indexstores == 0)
				{
					for(int k=0; k<BUFFER_DUP_FACTOR; k++)
					{
	#pragma HLS UNROLL
						int idx = (indexx[0]>>XF_BITSHIFT(NPPC))+(k<<1);
						int idx_nxt = idx + (indexx[0] == (imgInput.cols-1) ? 0 : 1);

						P0Buf[(k<<1)]   = line_buffer[0][k][idx];
						P0Buf[(k<<1)+1] = line_buffer[0][k][idx_nxt];
						P1Buf[(k<<1)]   = line_buffer[1][k][idx];
						P1Buf[(k<<1)+1] = line_buffer[1][k][idx_nxt];
					}
					if(flag_write)
					{
						for(int k=0; k<BUFFER_DUP_FACTOR; k++)
						{
	#pragma HLS UNROLL
							line_buffer[2][k][j] = read_pixel;
						}
					}
				}
				else if(indexstores == 1)
				{
					for(int k=0; k<BUFFER_DUP_FACTOR; k++)
					{
	#pragma HLS UNROLL
						int idx = (indexx[0]>>XF_BITSHIFT(NPPC))+(k<<1);
						int idx_nxt = idx + (indexx[0] == (imgInput.cols-1) ? 0 : 1);

						P0Buf[(k<<1)]   = line_buffer[1][k][idx];
						P0Buf[(k<<1)+1] = line_buffer[1][k][idx_nxt];
						P1Buf[(k<<1)]   = line_buffer[2][k][idx];
						P1Buf[(k<<1)+1] = line_buffer[2][k][idx_nxt];
					}
					if(flag_write)
					{
						for(int k=0; k<BUFFER_DUP_FACTOR; k++)
						{
	#pragma HLS UNROLL
							line_buffer[0][k][j] = read_pixel;
						}
					}
				}
				else
				{
					for(int k=0; k<BUFFER_DUP_FACTOR; k++)
					{
	#pragma HLS UNROLL
						int idx = (indexx[0]>>XF_BITSHIFT(NPPC))+(k<<1);
						int idx_nxt = idx + (indexx[0] == (imgInput.cols-1) ? 0 : 1);

						P0Buf[(k<<1)]   = line_buffer[2][k][idx];
						P0Buf[(k<<1)+1] = line_buffer[2][k][idx_nxt];
						P1Buf[(k<<1)]   = line_buffer[0][k][idx];
						P1Buf[(k<<1)+1] = line_buffer[0][k][idx_nxt];
					}
					if(flag_write)
					{
						for(int k=0; k<BUFFER_DUP_FACTOR; k++)
						{
	#pragma HLS UNROLL
							line_buffer[1][k][j] = read_pixel;
						}
					}
				}
				if((output_rows_count <= imgOutput.rows-1) && (((indexy == read_rows_count-1) && (read_rows_count == imgInput.rows)) || (indexy == read_rows_count-2)))
				{
					if(j<(imgOutput.cols>>(XF_BITSHIFT(NPPC))))
					{
						if(indexy == read_rows_count-1)
						{
							for(int k=0; k<BUFFER_WORDS; k++)
							{
	#pragma HLS UNROLL
								P0Buf[k] = P1Buf[k];
							}
						}
						XF_TNAME(SRC_TYPE,NPPC) temp_store_output;
						computeOutputPixel<SRC_TYPE,INTERPOLATION_TYPE,NPPC,INDEX_INT,BUFFER_WORDS>(P0Buf,P1Buf,((indexx[0]>>XF_BITSHIFT(NPPC))<<XF_BITSHIFT(NPPC)),indexx,WeightX,WeightY,temp_store_output);
						imgOutput.write(output_write_pointer,temp_store_output);
						output_write_pointer++;
					}
				}
			}
			if((output_rows_count <= imgOutput.rows-1) && (((indexy == read_rows_count-1) && (read_rows_count == imgInput.rows)) || (indexy == read_rows_count-2)))
			{
				output_rows_count++;
			}
			if(read_rows_count != imgInput.rows)
			{
				if((nextYScale >= read_rows_count-1)) //check if the next index y needed needs to be read.
				{
					first_row_index++;
					second_row_index++;
					read_row_index++;
					if(read_row_index == 3)
					{
						read_row_index = 0;
					}
					if(first_row_index == 3)
					{
						first_row_index = 0;
					}
					if(second_row_index == 3)
					{
						second_row_index = 0;
					}
					read_rows_count++;
				}
			}
		}
	}





	/***********************************************************************/

        template<int INTERPOLATION_TYPE, int TYPE, int SRC_ROWS, int SRC_COLS, int DST_ROWS, int DST_COLS, int NPC, int MAX_DOWN_SCALE>
        void resize (xf::Mat<TYPE, SRC_ROWS, SRC_COLS, NPC> & _src, xf::Mat<TYPE, DST_ROWS, DST_COLS, NPC> & _dst) {

#pragma HLS INLINE OFF

	assert(  ((INTERPOLATION_TYPE == XF_INTERPOLATION_NN)
	        ||(INTERPOLATION_TYPE == XF_INTERPOLATION_BILINEAR)
			||(INTERPOLATION_TYPE == XF_INTERPOLATION_AREA)) && "Incorrect parameters interpolation type");
	
	if(INTERPOLATION_TYPE == XF_INTERPOLATION_AREA)
		assert( (NPC == XF_NPPC1)  && "Supported Operation Mode for Area Interpolation is XF_NPPC1. XF_NPPC2, XF_NPPC4 and XF_NPPC8 are not supported ");
	else
		assert( ((NPC == XF_NPPC8) || (NPC == XF_NPPC4) || (NPC == XF_NPPC2) || (NPC == XF_NPPC1) )  && "Supported Operation Modes XF_NPPC8, XF_NPPC4, XF_NPPC2 and XF_NPPC1");

	if(NPC == XF_NPPC2)
		assert((((_src.cols & 1) == 0) && ((_dst.cols & 1) == 0)) && "Input and output image widths should be multiples of 2 in NPPC2 mode");
	if(NPC == XF_NPPC4)
		assert((((_src.cols & 3) == 0) && ((_dst.cols & 3) == 0)) && "Input and output image widths should be multiples of 4 in NPPC4 mode");
	if(NPC == XF_NPPC8)
		assert((((_src.cols & 7) == 0) && ((_dst.cols & 7) == 0)) && "Input and output image widths should be multiples of 8 in NPPC8 mode");

        	resizeNNBilinear<TYPE, SRC_ROWS, SRC_COLS, NPC, DST_ROWS, DST_COLS, INTERPOLATION_TYPE, MAX_DOWN_SCALE>(_src,_dst);
        }
};
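The fixed-point arithmetic in resizeNNBilinear can be checked on the host with ordinary floating point. The sketch below mirrors three steps of the class above: the scale factor ((in << 28) / out, read back as a Q28 value), the bilinear source-coordinate mapping in scaleCompute and computeInterpolation (pos = (dst + 0.5) * scale - 0.5, clamped to [0, in - 1], integer part as index, fraction as weight), and the expansion used by interpolatePixel, pixel = A0 + (B0 - A0)·Wy + (A1 - A0)·Wx + (A0 + B1 - B0 - A1)·Wx·Wy. It uses `double` in place of `ap_fixed`/`ap_ufixed`, so results may differ from the hardware in the last bit; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Scale factor as resizeNNBilinear computes it: shift the input size left by
// 28 bits, divide by the output size, then interpret the quotient as Q28.
inline double resizeScale(std::uint32_t inSize, std::uint32_t outSize) {
    std::uint64_t q28 = (static_cast<std::uint64_t>(inSize) << 28) / outSize;
    return static_cast<double>(q28) / static_cast<double>(std::uint64_t{1} << 28);
}

struct SrcCoord {
    int index;      // integer part: left/top neighbour
    double weight;  // fractional part: weight of the right/bottom neighbour
};

// Bilinear source coordinate for one output index (half-pixel-centre mapping,
// clamped to the valid input range as in computeInterpolation).
inline SrcCoord mapBilinear(int dstIndex, double scale, int srcSize) {
    double pos = (dstIndex + 0.5) * scale - 0.5;
    pos = std::max(0.0, std::min(pos, static_cast<double>(srcSize - 1)));
    int index = static_cast<int>(pos);  // pos >= 0, so truncation is floor
    return {index, pos - index};
}

// Same expansion as interpolatePixel: A0/A1 are neighbours in the first row,
// B0/B1 in the second row; Wx/Wy are the horizontal/vertical weights.
// The caller supplies the index + 1 neighbours, clamped at the border.
inline std::uint8_t interpolateBilinear(std::uint8_t a0, std::uint8_t b0,
                                        std::uint8_t a1, std::uint8_t b1,
                                        double wx, double wy) {
    int val0 = a0 + b1 - (b0 + a1);
    int val1 = b0 - a0;
    int val2 = a1 - a0;
    double p = val0 * (wx * wy) + val1 * wy + val2 * wx + a0;
    // Weights in [0, 1] keep p within the four inputs; truncating matches
    // ap_fixed's default quantization for non-negative values.
    return static_cast<std::uint8_t>(p);
}
```

For the 480 x 640 to 160 x 480 downscale in this thread, the row scale comes out as exactly 3, so output row 0 samples input row 1 with zero weight.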
commented

@bgouthamb,
Your workaround works.

Many thanks

Joe

commented

A similar problem happens again when I call xf::duplicateMat and xf::equalizeHist twice in a program.
Using a class does not work this time. I guess this is because these functions call other xf:: functions internally.
Are there any better workarounds?

commented

Any workaround?