Cerebras / modelzoo



Seeking data management advice

nuoma opened this issue

Hi! I'm trying to load SlimPajama into MongoDB for easier post-processing, but it would be too large for a single database. Could you give me some advice on a better way of doing this? I'm thinking of splitting it into separate collections based either on source type or simply on your raw 'chunk1-10' split.

Many thanks!
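The collection-splitting idea in the question can be sketched roughly as below. This is a minimal, hypothetical sketch, not an official Cerebras or MongoDB recipe. It assumes each record is an already-decompressed JSON line shaped like `{"text": ..., "meta": {"redpajama_set_name": ...}}`, and the `slimpajama_` collection prefix and helper names are made up for illustration. It routes each record either by its source name or by a chunk label you pass in, and inserts in batches through `db[name].insert_many(...)`, which a pymongo `Database` supports.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Assumption: SlimPajama-style records carry their source under meta.redpajama_set_name.
SOURCE_KEY = "redpajama_set_name"


def collection_name(record: dict, chunk: Optional[str] = None) -> str:
    """Hypothetical naming scheme: one collection per chunk if given, else per source."""
    if chunk is not None:
        return f"slimpajama_{chunk}"
    source = record.get("meta", {}).get(SOURCE_KEY, "unknown")
    return f"slimpajama_{source}"


def batch_by_collection(
    lines: Iterable[str], chunk: Optional[str] = None, batch_size: int = 1000
) -> Iterator[Tuple[str, List[dict]]]:
    """Parse JSON lines and yield (collection_name, records) batches of up to batch_size."""
    buffers: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        name = collection_name(record, chunk)
        buffers[name].append(record)
        if len(buffers[name]) >= batch_size:
            yield name, buffers.pop(name)  # flush a full batch
    # Flush whatever is left in each partially filled buffer.
    for name, records in buffers.items():
        if records:
            yield name, records


def load_into(db, lines: Iterable[str], chunk: Optional[str] = None, batch_size: int = 1000) -> int:
    """Insert all batches; `db` is anything indexable by name whose items have insert_many."""
    total = 0
    for name, records in batch_by_collection(lines, chunk, batch_size):
        db[name].insert_many(records)
        total += len(records)
    return total
```

With pymongo this would be called as something like `load_into(MongoClient()["slimpajama"], open(path))` (database name and path hypothetical). Splitting by source keeps each collection semantically uniform for post-processing, while splitting by chunk mirrors the raw file layout. Neither choice by itself spreads data across machines; that would still need separate databases or MongoDB sharding.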

Hello @nuoma, I think this is a great question for our Cerebras Discord channel. Have you posted it there yet?

Thank you. I will close the issue when I find a satisfying answer.