salesforce / progen

Official release of the ProGen models

Duplicate Sequences Across Different Runs

jamesrgraham opened this issue

Hello,

I had some GPU RAM issues when trying to generate sequences. I found I can reliably produce only 15 sequences at a time.

But when I do separate runs to get up to 100 sequences, I find that some of the output sequences are 100% identical to sequences produced by other runs.

Is this process deterministic?

I ran six runs of --num-samples 15 and one of --num-samples 10 to get 100 sequences.

When I remove duplicates, I get a total of 21 unique sequences.
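
For anyone who wants to check the same thing on their own output, here is a rough sketch of how the overlap between runs could be tallied. It assumes each run's sequences have already been loaded into a plain Python list of strings. The function name `summarize_runs` and the toy sequences are made up for illustration and are not part of ProGen.

```python
from collections import Counter


def summarize_runs(runs):
    """Tally total, unique, and cross-run shared sequences.

    `runs` is a list of lists of sequence strings, one inner list per run
    (hypothetical input format; adapt to however the output was saved).
    """
    counts = Counter()
    for run in runs:
        # Count each sequence once per run, so repeats inside a single run
        # do not inflate the cross-run tally.
        counts.update(set(run))
    total = sum(len(run) for run in runs)
    # Sequences that appear in more than one run.
    shared = {seq: n for seq, n in counts.items() if n > 1}
    return {"total": total, "unique": len(counts), "shared_across_runs": shared}


# Toy data standing in for three separate sampling runs.
runs = [["MKV", "MAA", "MKV"], ["MKV", "MTT"], ["MAA", "MGG"]]
summary = summarize_runs(runs)
print(summary["total"], summary["unique"], summary["shared_across_runs"])
```

In general, a sampler that starts every run from the same random seed will reproduce the same draws, which would explain this kind of overlap. Whether that is what happens here is not established by anything above.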