timhutton / twitter-archive-parser

Python code to parse a Twitter archive and output in various ways

Convert this to a Python package so it becomes a CLI installable by pipx

mikeckennedy opened this issue · comments

I recommend that you convert this to a Python package so it becomes a CLI installable by pipx. This will dramatically simplify usage for end users.

Let's assume you want to name it twitter_archive on pypi.org, then:

# Once pipx is installed (via Homebrew, or via
# 'python -m pip install --user --upgrade pipx' if Python is installed):

$ pipx install twitter_archive
$ twitter_archive_markdown # <-- run in the archive directory
$ twitter_archive_images   # <-- run in the archive directory

I created a proof of concept with these CLI names, but they are just names in a file; we can call them whatever you'd like.

https://github.com/mikeckennedy/twitter-archive-parser/tree/installable

Here it is in action:

[Screenshot: Screen Shot 2022-11-14 at 9 27 48 PM]

Are you interested in this? Just let me know what you want the package name and CLI command names to be and I'll send you a PR.

Hello. Thanks for suggesting this! I am totally on board with the wish to make the project easier to use for less technical people.

But does this suggestion simplify the installation? Previously they had to:

  • install Python
  • download the script (e.g. right-click save-as)
  • open a command-line and navigate to the right place (people seem to manage this)
  • run python parser.py

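With this suggestion they have to:

  • install Python
  • install pipx using the command-line: instructions for macOS, Linux, Windows
  • run pipx install twitter_archive
  • run twitter_archive_markdown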

It does avoid the save-as issue (#17) that some people are hitting. But now people have to read two sets of installation instructions: for Python and then for pipx. And for devs, is there a maintenance burden of keeping the pipx version up to date?

Edit: Also, it makes it harder for users to tweak the script. You don't even need to understand Python to change "Replying to" to "Répondre à", for example, or "Tweet-Archive" to "Archivo-de-tuits". The same objection applies to #12 and even to my thoughts about having this as a client-side web page.

I'd appreciate others' thoughts on this.

From the perspective of someone not used to handling Python scripts: the first (and current) scenario is simpler to understand in my eyes because it doesn't require any Python-specific tools (what's pipx?).

  • open a command-line and navigate to the right place (people seem to manage this)

Yes, because you also have to do this when dealing with scripts written in other languages. That's common ground.

Hi @timhutton

You're welcome to take it or leave it depending on how you see it improving or degrading the experience. But I think it is simpler in this regard. You suggested the steps are:

  • install Python
  • install pipx using the command-line: instructions for macOS, Linux, Windows
  • run pipx install twitter_archive
  • run twitter_archive_markdown

But pipx can be installed by package managers so it's:

  • brew install pipx (this installs Python if necessary)
  • pipx install twitter_archive (this creates a venv and installs requests, puts the command in the path)
  • twitter_archive_markdown

Compared to the current:

  • install python
  • put python in the path
  • install requests (into the correct Python, with the right permissions [--user, not global])
  • copy the python files
  • python parser.py (using the correct Python)

That said, I don't know if pipx is available in winget (though it would be easy to make it so, I'm sure).


One other thing that occurred to me after posting this:

This is not a replacement of how things work. It's flexibility. There is no reason you couldn't say:

To use this tool, just download parser.py and .... Or if you use pipx, you can simply pipx install twitter_archive (done).

Having it available to install as a CLI tool for Python doesn't mean they can't do what you are recommending now.

@mikeckennedy Yes, it could live alongside our current instructions, it's a good point.

Can we talk about the maintenance of a pipx package? It's not something I've done before. Does it require manual updates? Would we need to set up a GitHub Action for it?

That's a good question. pipx doesn't own or maintain packages. It is simply a tool that installs Python packages from pypi.org by automatically creating isolated venvs and putting their commands in the path.

So the question becomes: how is a Python package on pypi.org maintained?
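To make the "isolated venv" part concrete, here is a minimal sketch using Python's standard venv module. This is not pipx's actual implementation; the function name create_isolated_env and the directory name demo-venv are made up for illustration, and the sketch skips the steps of installing a package and linking its commands into the path.

```python
import tempfile
import venv
from pathlib import Path


def create_isolated_env(parent_dir):
    """Create a bare virtual environment under parent_dir and return its path."""
    env_dir = Path(parent_dir) / "demo-venv"  # hypothetical directory name
    # with_pip=False keeps the sketch fast; a real installer would need pip
    # inside the environment to install the package and its dependencies.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Usage: build the environment in a throwaway directory and confirm that
# the marker file every venv carries (pyvenv.cfg) was written.
with tempfile.TemporaryDirectory() as tmp:
    env = create_isolated_env(tmp)
    print("venv created:", (env / "pyvenv.cfg").is_file())
```

Anything installed into such an environment stays out of the system Python, which is the isolation being described here.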

It can be automated. Have a look at these GH Action details:

So if you can get that action set up (sorry I have zero experience with it), then it would be pretty automatic I think.

Alternatively, you could have a more complex install statement:

$ pipx install git+https://github.com/timhutton/twitter-archive-parser.git

If you use that rather than just pipx install name, then maintenance is zero. It just installs straight out of this repo. That might be worthwhile to keep it simple.

The more I think about this, I would recommend just letting it pipx install out of the GitHub repo. It adds zero extra effort to maintain and keep in sync. And it still allows the pipx install + global command experience.

See #58 for an idea of how to implement it (it works on my machine :-)).

Thanks @jankatins. I already have it implemented with pyproject.toml and hatch (the latest recommendation from python) in the branch I referenced before:

https://github.com/mikeckennedy/twitter-archive-parser/tree/installable

I just needed two things from @timhutton before I did a PR.

  1. Is this a welcome contribution?
  2. What are the CLI names he'd like to use?

BTW, yours and mine are quite similar but I chose shorter CLI names and use hatchling rather than setuptools (based on this guidance from python). But I don't really care which one Tim wants to adopt.

What am I doing wrong on Ubuntu WSL?

$ sudo apt-get install python3-pip python3-venv
$ python3 -m pip install --user pipx
$ python3 -m pipx ensurepath
$ pipx install git+https://github.com/jankatins/twitter-archive-parser.git@jan-make-installable
$ twitter-archive-parser
Traceback (most recent call last):
  File "/home/tim/.local/bin/twitter-archive-parser", line 5, in <module>
    from parser import main
ImportError: cannot import name 'main' from 'parser' (/usr/lib/python3.8/lib-dynload/parser.cpython-38-x86_64-linux-gnu.so)

Edit: Ah, this one worked:

$ pipx install git+https://github.com/mikeckennedy/twitter-archive-parser.git@installable
$ twitter_archive_markdown
Error: Failed to load ./data/account.js. Start this script in the root folder of your Twitter archive.
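For anyone wondering what the installed command actually calls: the traceback above shows the generated launcher doing "from parser import main", so the command is just a thin wrapper around a Python function. Below is a minimal sketch of such a function. It is not the project's real code; the archive_root parameter and the success message are assumptions, and only the data/account.js check is taken from the error output above.

```python
import sys
from pathlib import Path


def main(archive_root="."):
    """Hypothetical entry point: return 0 on success, 1 if no archive is found."""
    # The transcript above suggests the tool looks for data/account.js
    # relative to the directory it is started in.
    account_file = Path(archive_root) / "data" / "account.js"
    if not account_file.is_file():
        print(
            f"Error: Failed to load {account_file}. "
            "Start this script in the root folder of your Twitter archive.",
            file=sys.stderr,
        )
        return 1
    print(f"Found {account_file}, parsing would start here.")
    return 0


# A console-script launcher would typically end with sys.exit(main());
# here the return code is only printed so the sketch can be run anywhere.
print("exit code:", main())
```

Because the function takes the directory from where it is run, the same installed command works in any archive folder without copying files around.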

So I have it running via pipx, which is great. But I must be missing something, because this seems much easier:

wget https://raw.githubusercontent.com/timhutton/twitter-archive-parser/main/parser.py
python3 parser.py

And has the advantage that the user gets the source code and can edit it locally.

Is the problem that python is harder to install than pipx on macOS?

ImportError: cannot import name 'main' from 'parser' (/usr/lib/python3.8/lib-dynload/parser.cpython-38-x86_64-linux-gnu.so)

I think this has a name clash on your system: it tries to import it from the .so file instead of from parser.py :-( The other branch puts parser.py into a subdir, and that's why that one works.

So the problematic part is that the file is named 'parser.py' and in the root of the repo instead of a package or having a more specific name.
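A quick way to see which file a module name resolves to is importlib.util.find_spec from the standard library. The sketch below is only a diagnostic aid under that assumption; the helper name resolve_origin is made up. On the Python 3.8 system above, asking for "parser" would presumably report the .so path from the traceback rather than the repo's parser.py.

```python
import importlib.util


def resolve_origin(module_name):
    """Return where Python would load module_name from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    # origin is a file path for ordinary modules, or a marker such as
    # "built-in" for modules compiled into the interpreter.
    return spec.origin


# Usage: compare a name you plan to use against what is already importable.
for name in ("json", "parser"):
    print(name, "->", resolve_origin(name))
```

If the printed path is not the file you expected, the name is being shadowed, which is the situation described above.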

Advantages of pipx from my perspective:

  • I have pipx installed for other reasons, so it's just another package
  • For my taste, having a nice name to call on the CLI feels much "cleaner" than using python with a wget-ed file :-) (I don't want to hack on it, just use it... :-))
  • I can use pipx to install updates with the same tools/commands as all other pipx installed packages (instead of remembering wget)
  • Pipx uses a venv and so keeps my system python clean (ok, probably a non-argument, as pip also comes with requests)

Hey @timhutton

Agree with everything @jankatins said above. And, it's easier in a sense. But you're assuming several things are preconditions.

  1. Python is installed
  2. Python is in the path
  3. The right version of Python is in the path
  4. requests is installed
  5. the user has permissions to run the command pip install requests
  6. the user is in the correct directory when they run wget
  7. they have wget installed

pipx handles all of these in a single pipx install twitter-archive-parser call. All of them. Provided pipx arrived via a package manager. The first is necessary only if you want to pip install pipx instead, which is what I do because I always have Python around.

Moreover, as Jan said about the clean API, it also allows them to run the command in multiple directories without copying the parser.py over and over. It becomes a global command.
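To illustrate how many of those preconditions can silently fail, here is a rough sketch that checks a few of them from Python. It is only an illustration, not part of either branch: the function name check_preconditions and the min_version default are assumptions (the thread does not state a minimum Python version), and the permission and working-directory items from the list are not covered.

```python
import importlib.util
import shutil
import sys


def check_preconditions(min_version=(3, 8)):
    """Report which assumptions of the wget-and-run route hold on this machine."""
    return {
        # Is some Python launcher reachable through the path?
        "python_on_path": (shutil.which("python3") or shutil.which("python")) is not None,
        # Is the interpreter running this check new enough? (assumed minimum)
        "python_version_ok": sys.version_info >= min_version,
        # Is requests importable by this interpreter?
        "requests_installed": importlib.util.find_spec("requests") is not None,
        # Is wget available for downloading the script?
        "wget_on_path": shutil.which("wget") is not None,
    }


# Usage: print one line per precondition.
for check, ok in check_preconditions().items():
    print(f"{check}: {'ok' if ok else 'MISSING'}")
```

Any line reporting MISSING is a step a less technical user would have to fix by hand before python3 parser.py works.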