Unable to replicate results
Julianwustl opened this issue
Hello,
I am currently trying to test your model's weights on some data, but I am running into issues. I have set up the final linear layer as per your design and am using the CLIP ViT-L/14 weights from Hugging Face. Despite following the methodology outlined in your paper and implementing it in my own model, the results do not match the expected outcomes.
Here is a brief outline of my current setup:
```python
import torch
from torch import nn
from typing import Any, Dict
from pytorch_lightning import LightningModule


class SingleLayerHead(nn.Module):
    def __init__(self, input_size: int, output_size: int):
        super().__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, features, **kwargs):
        return self.fc(features)


class ImageBatchClassifier(LightningModule):
    def __init__(
        self,
        config: Dict[str, Any],
        encoder: nn.Module,
        head: nn.Module,
        loss: nn.Module,
    ):
        super().__init__()
        self.config = config
        self.encoder = encoder
        self.head = head
        self.loss = loss
        # Here we freeze all the weights of the encoder
        for param in self.encoder.parameters():
            param.requires_grad = False


if __name__ == "__main__":
    # load_model and args are placeholders (see the edit below)
    model = load_model(args)
    state_dict = torch.load("your_weights", map_location="cpu")
    # Prefix the checkpoint keys with "fc." so they match
    # SingleLayerHead's parameter names ("fc.weight", "fc.bias")
    state_dict = {"fc." + k: v for k, v in state_dict.items()}
    model.head.load_state_dict(state_dict)
```
As can be seen, I've frozen all of the encoder's weights. When I run the code, the initial binary cross-entropy (BCE) loss is surprisingly high, starting at around 5.
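For what it's worth, one quick check I can think of is whether the Hugging Face encoder output even lives in the feature space the head expects; the sketch below uses the standard `transformers` API and is not my actual pipeline:

```python
# Sanity check: the plain vision model and the projected variant live in
# different feature spaces. The linear head must match whichever one the
# encoder actually produces.
import torch
from transformers import CLIPVisionModel, CLIPVisionModelWithProjection

MODEL_ID = "openai/clip-vit-large-patch14"
dummy = torch.zeros(1, 3, 224, 224)  # stand-in for a preprocessed image

with torch.no_grad():
    pooled = CLIPVisionModel.from_pretrained(MODEL_ID)(pixel_values=dummy).pooler_output
    embeds = CLIPVisionModelWithProjection.from_pretrained(MODEL_ID)(pixel_values=dummy).image_embeds

print(pooled.shape)  # torch.Size([1, 1024]) -- unprojected pooled features
print(embeds.shape)  # torch.Size([1, 768])  -- projected, like OpenAI's encode_image
```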
I'd appreciate any guidance or suggestions on what could be the possible reasons for this issue, and how to resolve it.
Thank you in advance
Edit: This is not working code, just a simple sketch of the implementation idea.
Apologies for the delayed response. We used the official CLIP models released by OpenAI (see the beginning of models/clip/clip.py for the links to the weights). I suspect the weights released by Hugging Face differ from the OpenAI ones; that is most likely the main issue. If the issue somehow still persists, please elaborate on which particular results do not match the ones in the paper.
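For reference, here is a minimal sketch of that route using the official `clip` package (`pip install git+https://github.com/openai/CLIP.git`). The checkpoint filename, the image path, and the single-logit head are placeholders/assumptions, not names from our repo:

```python
# Minimal sketch: official OpenAI CLIP encoder + a released linear head.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
encoder, preprocess = clip.load("ViT-L/14", device=device)  # official OpenAI weights

head = torch.nn.Linear(768, 1)  # ViT-L/14 image embeddings are 768-d
head.load_state_dict(torch.load("detector_weights.pth", map_location="cpu"))
head.to(device).eval()

image = preprocess(Image.open("example.png")).unsqueeze(0).to(device)
with torch.no_grad():
    features = encoder.encode_image(image).float()  # fp16 on GPU, cast to fp32
    prob = torch.sigmoid(head(features))
print(prob.item())  # probability of the positive class (assuming BCE training)
```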
Have you been able to reproduce the results yet?