Miscellaneous data manipulation tools
Either install with pip using the command
pip install .
or use
python setup.py install
and test with
python setup.py test
Ideally, the function name should be used as both the filename and the argparse group name. In the following example, the function name is remove_dat_junk.
- Add your script to the datafunk directory, e.g. datafunk/remove_dat_junk.py
- Update datafunk/__init__.py and datafunk/subcommands/__init__.py by adding the command name to the __all__ lists
- Add a new command line parameter section to datafunk/__main__.py
This should start by defining a new argparse group, e.g.
subparser_remove_dat_junk = subparsers.add_parser(
"remove_dat_junk",
usage="datafunk remove_dat_junk -i <input>",
help="Example command",
)
then include all the arguments, e.g.
subparser_remove_dat_junk.add_argument(
"-i",
"--input_file",
dest="input_file",
action="store",
type=str,
help="Input file: something about the input file format",
)
and end with the entry point
subparser_remove_dat_junk.set_defaults(func=datafunk.subcommands.remove_dat_junk.run)
- Create the file datafunk/subcommands/remove_dat_junk.py, which defines the run function that executes the command given the parsed command line parameters. Alternatively, specify the entry point within the main script file.
- If you have tests, add the test data to a subdirectory, e.g. tests/data/remove_dat_junk, and add the test file tests/remove_dat_junk_test.py. This file should contain unit tests with names of the form test_*, ideally informative about which function they test and the expected result.
- If the script has any new dependencies, update the install_requires section of setup.py, so that the package can be pip installed from a conda environment file without a hitch.
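The steps above can be sketched end to end in a single file. This is only an illustration, not the actual datafunk code. In the real layout, run would live in datafunk/subcommands/remove_dat_junk.py and the parser section in datafunk/__main__.py. The helper names build_parser and main, the line-counting body of run, and the test name are assumptions made for the example.

```python
import argparse
import os
import tempfile


def run(options):
    """Hypothetical entry point; stands in for datafunk.subcommands.remove_dat_junk.run."""
    # Placeholder logic (an assumption): count the non-blank lines of the input file.
    with open(options.input_file) as handle:
        return sum(1 for line in handle if line.strip())


def build_parser():
    """Build the top-level parser with one argparse group per subcommand."""
    parser = argparse.ArgumentParser(prog="datafunk")
    subparsers = parser.add_subparsers(title="Available subcommands", dest="command")

    # Define the new argparse group, named after the function.
    subparser_remove_dat_junk = subparsers.add_parser(
        "remove_dat_junk",
        usage="datafunk remove_dat_junk -i <input>",
        help="Example command",
    )
    # Include all the arguments.
    subparser_remove_dat_junk.add_argument(
        "-i",
        "--input_file",
        dest="input_file",
        action="store",
        type=str,
        help="Input file: something about the input file format",
    )
    # End with the entry point.
    subparser_remove_dat_junk.set_defaults(func=run)
    return parser


def main(args=None):
    """Parse the arguments and dispatch to the selected subcommand."""
    parser = build_parser()
    options = parser.parse_args(args)
    if not hasattr(options, "func"):
        # No subcommand was given, so there is nothing to dispatch to.
        parser.print_help()
        return None
    return options.func(options)


def test_remove_dat_junk_counts_non_blank_lines():
    """Example of a test_* style unit test, as it might appear in tests/remove_dat_junk_test.py."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "example.txt")
        with open(path, "w") as handle:
            handle.write("first\n\nsecond\n")
        assert main(["remove_dat_junk", "-i", path]) == 2
```

Because set_defaults(func=...) stores the entry point on the parsed options, main does not need to know which subcommand was chosen. It calls options.func(options), so a new command only needs its own parser section and run function.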
Function List
- clean_names
- merge_fasta
- remove_fasta
- filter_low_coverage
- process_gisaid_data
- sam_2_fasta
- phylotype_consensus