jezcope / pyrefine

Execute OpenRefine JSON scripts without OpenRefine (or Java)

PyRefine

OpenRefine is a great tool for exploring and cleaning datasets prior to analysing them. It also records an undo history of all actions that you can export as a sort of script in JSON format. However, in order to execute that script on a new dataset, you need to manually import it through the graphical interface or set up a BatchRefine server, neither of which is quick.

PyRefine allows you to execute OpenRefine JSON scripts against datasets without firing up a full Java/OpenRefine server. It provides a command-line tool for quick use, and it can also be used as a library to integrate into your pandas-based data analysis pipeline.

More details in this blog post.

Please note: PyRefine is still very much alpha-quality, and it probably doesn't work exactly as you expect right now. That said, please try it out, and consider contributing!

Features

  • Execute OpenRefine JSON against a dataset from the command line
  • Execute OpenRefine JSON from a Python script
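
To make the idea behind both features concrete, here is a minimal sketch of what "executing an OpenRefine JSON script" involves: parse the exported operation list and apply each operation in order. This is not PyRefine's API or implementation. The names `apply_script`, `rename_column`, `remove_column` and `HANDLERS` are hypothetical, the data is held as a plain list of dicts rather than a pandas DataFrame, and only two operations are handled, assuming the field names OpenRefine uses in its exported history.

```python
import json

# A small operation history in the shape OpenRefine exports.
# Assumption: only these two operation types appear in the script.
SCRIPT = """
[
  {"op": "core/column-rename", "oldColumnName": "nm", "newColumnName": "name"},
  {"op": "core/column-removal", "columnName": "scratch"}
]
"""


def rename_column(rows, op):
    """Rename one column in every row, keeping the column order."""
    old, new = op["oldColumnName"], op["newColumnName"]
    return [
        {(new if key == old else key): value for key, value in row.items()}
        for row in rows
    ]


def remove_column(rows, op):
    """Drop one column from every row."""
    name = op["columnName"]
    return [
        {key: value for key, value in row.items() if key != name}
        for row in rows
    ]


# Hypothetical dispatch table: operation name -> function that applies it.
HANDLERS = {
    "core/column-rename": rename_column,
    "core/column-removal": remove_column,
}


def apply_script(rows, script_text):
    """Apply each operation of the JSON script to the rows, in order."""
    for op in json.loads(script_text):
        handler = HANDLERS.get(op["op"])
        if handler is None:
            # Fail loudly rather than silently skipping an unknown step.
            raise NotImplementedError(f"Unsupported operation: {op['op']}")
        rows = handler(rows, op)
    return rows


rows = [
    {"nm": "Ada", "scratch": "x", "year": 1815},
    {"nm": "Grace", "scratch": "y", "year": 1906},
]
cleaned = apply_script(rows, SCRIPT)
```

Raising on unknown operations is a deliberate choice in this sketch: a cleaning script that silently skips a step would produce data that looks processed but is not.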

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

About

License: MIT License

