YangLing0818 / RPG-DiffusionMaster

During the regional latent fusion stage, does the method really resize each region's latent to its corresponding position? From the code, it looks like only the latents at the corresponding positions of each regional image are fused, which I find confusing.

Yes, I have the same question. The latent features of each sub-region are cropped directly, not resized.

RPG-DiffusionMaster/cross_attention.py

Lines 127 to 128 in d2a26e9

    
    out = out[:,int(latent_h*drow.start) + addout:int(latent_h*drow.end),
              int(latent_w*dcell.start) + addin:int(latent_w*dcell.end),:]

Then, the cropped features are fused with the corresponding positions of the base latent features.

RPG-DiffusionMaster/cross_attention.py

Lines 129 to 133 in d2a26e9

    
    if self.usebase:
        # outb_t = outb[:,:,int(latent_w*drow.start):int(latent_w*drow.end),:].clone()
        outb_t = outb[:,int(latent_h*drow.start) + addout:int(latent_h*drow.end),
                      int(latent_w*dcell.start) + addin:int(latent_w*dcell.end),:].clone()
        out = out * (1 - dcell.base) + outb_t * dcell.base
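To make the two snippets concrete, here is a minimal sketch of what they appear to do, written against a toy 2D latent. It uses plain Python lists instead of tensors so it runs without dependencies. The helpers `crop` and `blend`, the 4x4 grid, and the `base_ratio` of 0.25 are all made up for illustration. The batch and channel dimensions are dropped, and `addout`/`addin` are assumed to be 0.

```python
def crop(latent, row_start, row_end, col_start, col_end):
    """Slice a window out of a 2D latent using fractional bounds,
    mirroring int(latent_h*drow.start):int(latent_h*drow.end) and the
    equivalent column slice (addout/addin assumed to be 0)."""
    h, w = len(latent), len(latent[0])
    r0, r1 = int(h * row_start), int(h * row_end)
    c0, c1 = int(w * col_start), int(w * col_end)
    return [row[c0:c1] for row in latent[r0:r1]]


def blend(region, base, base_ratio):
    """Element-wise out * (1 - base_ratio) + base * base_ratio."""
    return [
        [r * (1 - base_ratio) + b * base_ratio for r, b in zip(r_row, b_row)]
        for r_row, b_row in zip(region, base)
    ]


# Toy stand-ins: `out` is the regional output, `outb` the base output.
out = [[float(4 * i + j) for j in range(4)] for i in range(4)]  # 0.0 .. 15.0
outb = [[100.0] * 4 for _ in range(4)]

# Hypothetical region: all rows, right half of the columns.
bounds = (0.0, 1.0, 0.5, 1.0)

out_crop = crop(out, *bounds)    # same window from the regional output
outb_crop = crop(outb, *bounds)  # same window from the base output
fused = blend(out_crop, outb_crop, base_ratio=0.25)

print(out_crop)
print(fused)
```

In this sketch both `out` and `outb` are sliced with identical bounds, so the values kept from `out` are the ones already sitting at the region's position. Everything outside the window is discarded and nothing is rescaled.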

So the features do not seem to be resized as the paper says. I'd like to know why it is done this way: is it because resizing doesn't make sense here?

    out = out[:,int(latent_h*drow.start) + addout:int(latent_h*drow.end),
              int(latent_w*dcell.start) + addin:int(latent_w*dcell.end),:]

    if self.usebase:
        # outb_t = outb[:,:,int(latent_w*drow.start):int(latent_w*drow.end),:].clone()
        outb_t = outb[:,int(latent_h*drow.start) + addout:int(latent_h*drow.end),
                      int(latent_w*dcell.start) + addin:int(latent_w*dcell.end),:].clone()
        out = out * (1 - dcell.base) + outb_t * dcell.base
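For anyone trying to pin down the difference being asked about, here is a small sketch contrasting the two behaviours on a toy 4x4 latent. It uses plain Python lists so it runs as-is. `crop_window` and `resize_nearest` are hypothetical helpers, not functions from the repository, and nearest-neighbour downsampling is only one possible reading of "resize". I am not claiming it is what the paper specifies.

```python
def crop_window(latent, c0, c1):
    """Keep only columns c0:c1 of every row (what the slicing code does)."""
    return [row[c0:c1] for row in latent]


def resize_nearest(latent, new_h, new_w):
    """Nearest-neighbour resize of the whole latent to new_h x new_w
    (a hypothetical alternative, shown only for comparison)."""
    h, w = len(latent), len(latent[0])
    return [
        [latent[i * h // new_h][j * w // new_w] for j in range(new_w)]
        for i in range(new_h)
    ]


full = [[float(4 * i + j) for j in range(4)] for i in range(4)]  # 0.0 .. 15.0

# Region = right half of the image, i.e. a 4x2 window.
cropped = crop_window(full, 2, 4)     # only the right-half columns survive
resized = resize_nearest(full, 4, 2)  # whole map squeezed into 4x2

print(cropped)
print(resized)
```

With cropping, the region only ever sees the part of its own latent that lies inside its window. With resizing, a subsampled view of the whole latent would be placed in the window. The two give different values here, which is the gap between the code and the paper that this thread is asking about.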

Is It Resizing or Just Fusion at Corresponding Positions?