mlfoundations / dclm

DataComp for Language Models

mlfoundations/dclm Issues

Missing scale configs?
Closed 4 days ago (1 comment)
How to train and fine-tuning model
Updated 5 days ago
BFF code？
Updated 6 days ago
Missing files or bugs in evaluation code?
Updated 8 days ago
Any web demo?
Updated 8 days ago
botocore.exceptions.NoCredentialsError: Unable to locate credentials
Closed 8 days ago (1 comment)
Which data file correspond to table 4 fasttext?
Closed a month ago (7 comments)
Unable to run `eval/eval_openlm_ckpt.py`
Closed 13 days ago (10 comments)
Ray Actor dies during tokenization process
Updated 14 days ago (1 comment)
ArrowConversionError when running tokenization
Closed 15 days ago (12 comments)
Causal Transformer for Perplexity
Closed 18 days ago (3 comments)
Would you share the 0.28T token dataset for achieve highest scores in 7B-2x experiment?
Closed 18 days ago (1 comment)
Tokenization file missing
Closed 24 days ago (2 comments)
Accessing S3 bucket dcnlp-west
Closed a month ago (4 comments)
Data download script
Closed 25 days ago (2 comments)
Request to DCLM-Pool
Closed a month ago (3 comments)
Duplicated licenses
Closed a month ago (1 comment)
How to find CORE, MMLU, EXTENDED values in the eval json?
Closed a month ago (2 comments)
Why are all of these leaderboard empty?
Closed a month ago (1 comment)
Question regarding the evaluation
Closed a month ago (1 comment)