princeton-vl / RAFT

In the paper, the full (RAFT 4.8M) encoder has residual blocks with output sizes of 64 (1 x 64), 128 (2 x 64), and 192 (3 x 64). This follows the same pattern as the smaller (RAFT-S 1M) encoder, which uses 32 (1 x 32), 64 (2 x 32), and 96 (3 x 32).

In the code, this matches for the small encoder:

RAFT/core/extractor.py

Lines 216 to 218 in aac9dd5

    
    self.layer1 = self._make_layer(32, stride=1)
    self.layer2 = self._make_layer(64, stride=2)
    self.layer3 = self._make_layer(96, stride=2)

but for the full encoder, those values do not match:

RAFT/core/extractor.py

Lines 139 to 141 in aac9dd5

    
    self.layer1 = self._make_layer(64, stride=1)
    self.layer2 = self._make_layer(96, stride=2)
    self.layer3 = self._make_layer(128, stride=2)

Instead of [1, 2, 3] x 64, the pattern here is [2, 3, 4] x 32. So rather than doubling the size of the feature vectors from the small to the full encoder, each layer's output size increases only by a constant 32.
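To make the comparison concrete, here is a small sketch of the arithmetic. It uses only the numbers quoted above. The variable and function names (`small_widths`, `paper_full_widths`, `code_full_widths`, `as_multiples`, `differences`, `ratios`) are my own and do not come from the repository.

```python
# Output sizes quoted in this issue. All names below are illustrative only.
small_widths = [32, 64, 96]        # small encoder (paper and code agree)
paper_full_widths = [64, 128, 192] # full encoder as described in the paper
code_full_widths = [64, 96, 128]   # full encoder as found in core/extractor.py


def as_multiples(widths, base):
    """Express each width as an integer multiple of `base`."""
    if any(w % base != 0 for w in widths):
        raise ValueError(f"widths {widths} are not all multiples of {base}")
    return [w // base for w in widths]


def differences(full, small):
    """Per-layer difference between the full and small encoder widths."""
    return [f - s for f, s in zip(full, small)]


def ratios(full, small):
    """Per-layer ratio between the full and small encoder widths."""
    return [f / s for f, s in zip(full, small)]


if __name__ == "__main__":
    print("paper, full, as multiples of 64:", as_multiples(paper_full_widths, 64))
    print("code,  full, as multiples of 32:", as_multiples(code_full_widths, 32))
    print("paper, full / small:", ratios(paper_full_widths, small_widths))
    print("code,  full - small:", differences(code_full_widths, small_widths))
```

With the paper's numbers, every layer of the full encoder is exactly twice as wide as in the small one. With the numbers in the code, the per-layer difference is constant instead.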

Is this intended?


BasicEncoder ResidualBlock dimensions do not match with paper