EleutherAI / the-pile

Make treemaps

leogao2 opened this issue · comments

We should find a way to generate nice treemaps. I think it would be a great way of visualizing how space is allocated in the Pile. Features we'd want include a color-coded two-level hierarchy (i.e., first we split by category, then by dataset), and it should actually look nice and not like someone drew it in Paint.

Something like this but with words and programmatically generated would be perfect:
[image]

I know how to do this and can do it easily once we have the final sizes.
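
For reference, here is a rough sketch of what a programmatically generated, color-coded two-level treemap could look like. It is not the approach proposed above, just one possible starting point: it uses a simple slice-and-dice layout (categories split left to right, datasets stacked top to bottom inside each category) and writes plain SVG using only the standard library. The function names, the palette, and the sizes in `example` are all hypothetical placeholders, not real Pile numbers. A squarified layout would probably look nicer than slice-and-dice.

```python
from html import escape

# Hypothetical palette; one color per top-level category.
PALETTE = ["#4c78a8", "#f58518", "#54a24b", "#e45756", "#72b7b2", "#b279a2"]


def slice_rects(items, x, y, w, h, horizontal):
    """Split the rectangle (x, y, w, h) along one axis, proportionally to size.

    items is a list of (name, size) pairs; returns (name, x, y, w, h) tuples.
    """
    total = sum(size for _, size in items)
    rects, offset = [], 0.0
    for name, size in items:
        frac = size / total
        if horizontal:
            rects.append((name, x + offset, y, w * frac, h))
            offset += w * frac
        else:
            rects.append((name, x, y + offset, w, h * frac))
            offset += h * frac
    return rects


def treemap_layout(sizes, width=800, height=500):
    """Lay out a {category: {dataset: size}} dict as a two-level treemap.

    Returns a list of (category, dataset, x, y, w, h) cells.
    """
    cat_totals = [(cat, sum(ds.values())) for cat, ds in sizes.items()]
    cells = []
    # Level 1: categories side by side. Level 2: datasets stacked vertically.
    for cat, cx, cy, cw, ch in slice_rects(cat_totals, 0, 0, width, height, True):
        datasets = list(sizes[cat].items())
        for name, x, y, w, h in slice_rects(datasets, cx, cy, cw, ch, False):
            cells.append((cat, name, x, y, w, h))
    return cells


def treemap_svg(sizes, width=800, height=500):
    """Render the layout as an SVG string, colored by category."""
    colors = {cat: PALETTE[i % len(PALETTE)] for i, cat in enumerate(sizes)}
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
    ]
    for cat, name, x, y, w, h in treemap_layout(sizes, width, height):
        parts.append(
            f'<rect x="{x:.1f}" y="{y:.1f}" width="{w:.1f}" height="{h:.1f}" '
            f'fill="{colors[cat]}" stroke="white"/>'
        )
        parts.append(
            f'<text x="{x + 4:.1f}" y="{y + 16:.1f}" font-size="12" '
            f'fill="white">{escape(name)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Placeholder sizes for illustration only (not the real dataset sizes).
example = {
    "category_a": {"dataset_1": 40, "dataset_2": 20},
    "category_b": {"dataset_3": 30},
    "category_c": {"dataset_4": 10},
}

if __name__ == "__main__":
    with open("treemap.svg", "w") as f:
        f.write(treemap_svg(example))
```

Once the final sizes are known, swapping them into the nested dict should be the only change needed to regenerate the figure.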