recrm / ArchiveTools

A collection of tools for archiving and analysing the internet.

The Python scripts are not importable as modules

ThomasA opened this issue

I am developing a small tool that detects and classifies objects in images extracted from WARC archives. I use some functionality from warc-extractor.py in my Python code. In order to do that, I have to rename 'warc-extractor.py' to 'warc_extractor.py', since the dash is not a valid character in a module name.
I propose changing the names of the Python scripts, replacing '-' with '_'. I can submit a pull request if you like.
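
Until the scripts are renamed, a file with a dash in its name can still be loaded by path through the standard library's importlib machinery. The following is a minimal sketch of that workaround, not something taken from ArchiveTools; the helper name `load_script_as_module` is hypothetical.

```python
import importlib.util
import sys
from pathlib import Path


def load_script_as_module(path, module_name=None):
    """Load a Python file as a module, even if its file name contains '-'.

    `load_script_as_module` is a hypothetical helper, not part of ArchiveTools.
    """
    path = Path(path)
    if module_name is None:
        # Derive a valid identifier, e.g. 'warc-extractor' -> 'warc_extractor'.
        module_name = path.stem.replace("-", "_")
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path} as a module")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so the module can be found under its new name.
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module
```

Calling `load_script_as_module("warc-extractor.py")` would then return a module object whose functions can be used directly. Note that loading executes the script's top-level code, so this only works cleanly if the script guards its command-line entry point with `if __name__ == "__main__":`; whether warc-extractor.py does so is an assumption to check.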

Yes, I would accept that pull request.

However, this project was never really intended to be an importable WARC library. Ideally, I would modify it to use the warcio library instead. I would suggest checking whether warcio (https://github.com/webrecorder/warcio) has what you need, as it is more actively maintained.

Thanks for that pointer. I was looking for a WARC library and yours was the first I came across. I will try that first then.
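
For readers who follow the same pointer, here is a rough sketch of how image payloads might be pulled from a WARC file with warcio. It assumes warcio is installed (`pip install warcio`); the function names `is_image` and `iter_images` are hypothetical and only illustrate the use case described in this issue.

```python
def is_image(content_type):
    """Return True for Content-Type values such as 'image/png; charset=binary'."""
    if not content_type:
        return False
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("image/")


def iter_images(warc_path):
    """Yield (target URI, payload bytes) for each image response in a WARC file."""
    # Imported here so that is_image() stays usable without warcio installed.
    from warcio.archiveiterator import ArchiveIterator

    with open(warc_path, "rb") as stream:
        for record in ArchiveIterator(stream):
            # Only HTTP responses carry the captured resource body.
            if record.rec_type != "response" or record.http_headers is None:
                continue
            if is_image(record.http_headers.get_header("Content-Type")):
                uri = record.rec_headers.get_header("WARC-Target-URI")
                yield uri, record.content_stream().read()
```

Each yielded payload is the raw image bytes, which could then be handed to whatever detection or classification step comes next.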