An implementation of the principles from the Good Research Code Handbook
https://goodresearch.dev/index.html
https://github.com/patrickmineault/zipf
It uses poetry rather than conda to package and manage dependencies.
I've created a command-line entry point for running the analysis. It is declared in this section of pyproject.toml:
[tool.poetry.scripts]
zipfs-law = "zipfs_law.cli:main"
The cli module (zipfs_law.cli) contains the workflow DAG that produces the analysis.
I've split most of the input and output into separate functions, because the original script had grown quite large.
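As a rough sketch of how an entry point and the I/O split could fit together (this is not the actual contents of zipfs_law/cli.py): poetry calls main() with no arguments, so main() parses the command line itself. The helper names load_text, count_words and save_counts, the argument names, and the CSV output format are all assumptions made for illustration.

```python
import argparse
import collections
import pathlib
import re


def load_text(path):
    """Input only: read one text file (hypothetical helper)."""
    return pathlib.Path(path).read_text(encoding="utf-8")


def count_words(text):
    """Pure computation: no file access, so it is easy to unit test."""
    words = re.findall(r"[a-z']+", text.lower())
    return collections.Counter(words)


def save_counts(counts, path):
    """Output only: write 'word,count' lines, most frequent first."""
    lines = [f"{word},{n}" for word, n in counts.most_common()]
    pathlib.Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def main(argv=None):
    """Entry point; argv=None makes argparse read sys.argv."""
    parser = argparse.ArgumentParser(prog="zipfs-law")
    # Two positional arguments, assumed from the `zipfs-law data output` call.
    parser.add_argument("data_dir", type=pathlib.Path)
    parser.add_argument("output_dir", type=pathlib.Path)
    args = parser.parse_args(argv)

    args.output_dir.mkdir(parents=True, exist_ok=True)
    for txt in sorted(args.data_dir.glob("*.txt")):
        counts = count_words(load_text(txt))
        save_counts(counts, args.output_dir / f"{txt.stem}.csv")
    return 0
```

Keeping count_words free of file access means it can be tested with a plain string, while the two I/O helpers stay small enough to swap out later.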
I've downloaded these sample texts into the data directory:
- Dracula → data/dracula.txt
- Frankenstein → data/frankenstein.txt
- Jane Eyre → data/jane_eyre.txt
Now run the following in a terminal where the poetry environment is activated:
zipfs-law data output
Open the notebook (jupyter notebook) and run all cells to check that the empirical distributions roughly look like the theoretical ones.
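The same kind of comparison can be sketched numerically. The snippet below is a minimal illustration, not the notebook's code: it assumes the classic form of Zipf's law, where a word's frequency is proportional to 1/rank (exponent 1), and the function names are hypothetical.

```python
import math


def rank_frequency(counts):
    """Return (rank, frequency) pairs, most frequent word first."""
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def zipf_expected(top_frequency, rank, alpha=1.0):
    """Theoretical frequency at a rank, assuming f(r) = f(1) / r**alpha."""
    return top_frequency / rank ** alpha


def loglog_slope(pairs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```

Under that assumption, a fitted slope close to -1 on the log-log scale suggests the empirical distribution roughly follows the theoretical one. Real texts usually deviate at the highest and lowest ranks, so treat it as a sanity check rather than a test.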