chriskuehl / dumb-pypi

PyPI generator, backed entirely by static files

Home Page: https://chriskuehl.github.io/dumb-pypi/test-repo/

Uploader name via package-list

modem7 opened this issue

Heya,

Following on from #20 and #22, is there a way to set the uploader name via --package-list instead of --package-list-json?

I currently use --package-list <(ls packages) to build the index I publish to GitHub Pages, which works quite well, but being able to set a static "person who uploaded" value would be quite a nice feature if this is possible!

E.g. adding an "--uploaded-by" flag or similar.

Cheers!

The JSON is used as a way to pass the metadata of who the uploader is (and other metadata like file hashes if passed) since dumb-pypi otherwise doesn't have a way to know it if it's just passed a simple list of files. Is there some other way it would be able to get that information?
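
For illustration, here is a minimal Python sketch of what one entry in such a JSON package list might look like. The keys `filename`, `uploaded_by` and `upload_timestamp` match the script posted later in this thread; the `hash` key and its `sha256=<hex digest>` form are an assumption based on the dumb-pypi README, and `make_entry` is a hypothetical helper, not part of dumb-pypi.

```python
import hashlib
import json


def make_entry(filename, content, uploaded_by, upload_timestamp):
    """Build one JSON line describing a package file.

    `content` is the raw bytes of the file. The "sha256=<hex>" hash
    format is an assumption; check the dumb-pypi README before relying on it.
    """
    entry = {
        "filename": filename,
        "hash": "sha256=" + hashlib.sha256(content).hexdigest(),
        "uploaded_by": uploaded_by,
        "upload_timestamp": upload_timestamp,
    }
    return json.dumps(entry)


if __name__ == "__main__":
    # Placeholder values, for demonstration only.
    print(make_entry("example-1.0-py3-none-any.whl", b"not a real wheel", "someone", 1655000000))
```

A plain list of filenames carries none of the fields other than `filename`, which is why they have to be supplied from outside.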

Unfortunately not to my knowledge (at least natively).

So far, I've looked at jc which looks quite interesting.

Doing something like ls packages | jc --ls presents the JSON output:

[{"filename":"attrs-21.4.0-py2.py3-none-any.whl"},{"filename":"borgbackup-1.2.0-cp310-cp310-linux_aarch64.whl"},{"filename":"borgbackup-1.2.0-cp310-cp310-linux_x86_64.whl"},{"filename":"borgbackup-1.2.1-cp310-cp310-linux_aarch64.whl"},{"filename":"borgbackup-1.2.1-cp310-cp310-linux_armv7l.whl"},{"filename":"borgbackup-1.2.1-cp310-cp310-linux_x86_64.whl"},{"filename":"borgmatic-1.6.2-py3-none-any.whl"},{"filename":"borgmatic-1.6.3-py3-none-any.whl"},{"filename":"certifi-2022.5.18.1-py3-none-any.whl"},{"filename":"charset_normalizer-2.0.12-py3-none-any.whl"},{"filename":"colorama-0.4.4-py2.py3-none-any.whl"},{"filename":"distlib-0.3.4-py2.py3-none-any.whl"},{"filename":"dumb_pypi-1.9.0-py2.py3-none-any.whl"},{"filename":"idna-3.3-py3-none-any.whl"},{"filename":"Jinja2-3.1.2-py3-none-any.whl"},{"filename":"jsonschema-4.6.0-py3-none-any.whl"},{"filename":"llfuse-1.4.2-cp310-cp310-linux_aarch64.whl"},{"filename":"llfuse-1.4.2-cp310-cp310-linux_armv7l.whl"},{"filename":"llfuse-1.4.2-cp310-cp310-linux_x86_64.whl"},{"filename":"MarkupSafe-2.1.1-cp310-cp310-linux_armv7l.whl"},{"filename":"MarkupSafe-2.1.1-cp310-cp310-musllinux_1_1_aarch64.whl"},{"filename":"MarkupSafe-2.1.1-cp310-cp310-musllinux_1_1_x86_64.whl"},{"filename":"msgpack-1.0.3-cp310-cp310-linux_aarch64.whl"},{"filename":"msgpack-1.0.3-cp310-cp310-linux_x86_64.whl"},{"filename":"msgpack-1.0.4-cp310-cp310-linux_armv7l.whl"},{"filename":"msgpack-1.0.4-cp310-cp310-musllinux_1_1_aarch64.whl"},{"filename":"msgpack-1.0.4-cp310-cp310-musllinux_1_1_x86_64.whl"},{"filename":"packaging-21.3-py3-none-any.whl"},{"filename":"pyparsing-3.0.9-py3-none-any.whl"},{"filename":"pyrsistent-0.18.1-cp310-cp310-linux_aarch64.whl"},{"filename":"pyrsistent-0.18.1-cp310-cp310-linux_armv7l.whl"},{"filename":"pyrsistent-0.18.1-cp310-cp310-linux_x86_64.whl"},{"filename":"requests-2.27.1-py2.py3-none-any.whl"},{"filename":"requests-2.28.0-py3-none-any.whl"},{"filename":"ruamel.yaml-0.17.21-py3-none-any.whl"},{"filename":"ruamel.yaml.clib-0.2.6-cp310-cp310-linux_aarch64.whl"},{"filename":"ruamel.yaml.clib-0.2.6-cp310-cp310-linux_armv7l.whl"},{"filename":"ruamel.yaml.clib-0.2.6-cp310-cp310-linux_x86_64.whl"},{"filename":"setuptools-62.3.2-py3-none-any.whl"},{"filename":"setuptools-62.3.3-py3-none-any.whl"},{"filename":"urllib3-1.26.9-py2.py3-none-any.whl"}]

But obviously this doesn't really help much regarding the uploader/hash etc (and it's an extra tool which isn't optimal for most).

In regard to --package-list <(ls packages), how does dumb-pypi handle the list? Does it take the filenames one at a time and then do its thing?

If so, whilst it's not the nicest thing from a package list perspective (but useful for programmatic methods in small 1-2 man teams), is there a way to tack on an arbitrary owner to everything that is listed?

In order to have any of the additional information, you need to synthesize it yourself and pass it as JSON.

dumb-pypi cannot generate that information.

Basically the trade-off of dumb-pypi is that it only handles the generation of the index; it doesn't manage any of the metadata or store files itself like most other registries do. That means dumb-pypi is very simple but requires you to manage the metadata yourself and call it with the appropriate data.

If you just have a list of files, --package-list can be used to generate a working registry but you won't have the optional metadata like uploader name.

For most people with complex setups dumb-pypi is likely only a part of their registry. For example at my day job, we have a small Python script that fetches the package list and metadata from S3 and shapes it into JSON so that it can call dumb-pypi.
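
The kind of glue script described above might look roughly like the following Python sketch. It is not the script mentioned in the comment: the `records` list stands in for whatever a storage listing (S3 or otherwise) returns, and the function name `to_package_list` and the record fields `key`, `owner` and `last_modified` are all hypothetical.

```python
import json
from datetime import datetime, timezone


def to_package_list(records):
    """Turn storage records into text with one JSON object per line."""
    lines = []
    for rec in sorted(records, key=lambda r: r["key"]):
        entry = {
            # Keep only the file name, dropping any storage prefix.
            "filename": rec["key"].rsplit("/", 1)[-1],
            "uploaded_by": rec["owner"],
            # Unix timestamp in whole seconds.
            "upload_timestamp": int(rec["last_modified"].timestamp()),
        }
        lines.append(json.dumps(entry))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    example = [
        {
            "key": "wheels/example-1.0-py3-none-any.whl",
            "owner": "alice",
            "last_modified": datetime(2022, 6, 1, tzinfo=timezone.utc),
        },
    ]
    print(to_package_list(example), end="")
```

The resulting text would be written to a file and passed to dumb-pypi with --package-list-json, as in the commands later in this thread.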

Ahhah,

That makes a lot of sense!

Thank you!

I think I've figured out a really dirty way of generating the required info. If it works, I'll post it here in case anyone has the same question.

I'll close this for now, thank you for the elaboration!

@chriskuehl I've done a dirty script that emulates what I'm after and it seems to work quite nicely.

: > packages.json
for FILE in packages/*
do
    # Strip the directory and escape any double quotes in the filename
    FILE=$(basename "$FILE" | sed -e 's/"/\\"/g')
    printf '{"filename": "%s", "uploaded_by": "%s", "upload_timestamp": %s}\n' "$FILE" "$UPLOADER" "$(date +%s)" >> packages.json
done

Full script (bar the building part):

# Create packages.json (one JSON object per line)
: > packages.json
for FILE in packages/*
do
    # Strip the directory and escape any double quotes in the filename
    FILE=$(basename "$FILE" | sed -e 's/"/\\"/g')
    printf '{"filename": "%s", "uploaded_by": "%s", "upload_timestamp": %s}\n' "$FILE" "$UPLOADER" "$(date +%s)" >> packages.json
done

# Create index
docker run --rm -v "$(pwd)":/data -w /data -e PKG_URL=$PKG_URL -it modem7/dumb-pypi sh -c 'dumb-pypi --package-list-json packages.json \
   --packages-url $PKG_URL \
   --output-dir .'

It's not pretty, and it has a couple of issues (for example, $UPLOADER and $(date +%s) will be the same for every file), but it does the job for my particular use case for now.

If the uploader and time are always the same, what's the point? What are you trying to accomplish?

In this particular case, purely for prettiness in all fairness. I wish I had a better reason!

But I do agree that there needs to be a better method. At some point I'll figure something out (maybe a temp folder that new wheels are put into, where the timestamp is generated, then using the --previous-package-list-json option to create a changelog or something).
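
One way to make that idea concrete, sketched in Python under stated assumptions: keep the entries from the previously generated packages.json and only stamp files that were not listed before, so existing timestamps stay stable between runs. `merge_package_list` is a hypothetical helper (not a dumb-pypi feature), and the `now` argument exists only to make the behaviour easy to test.

```python
import json
import time


def merge_package_list(previous_lines, filenames, uploader, now=None):
    """Return JSON lines for `filenames`, reusing metadata for known files."""
    now = int(time.time()) if now is None else now

    # Index the previous entries by filename, skipping blank lines.
    known = {}
    for line in previous_lines:
        line = line.strip()
        if line:
            entry = json.loads(line)
            known[entry["filename"]] = entry

    merged = []
    for name in sorted(filenames):
        # Only files that were not listed before get a fresh timestamp.
        entry = known.get(name) or {
            "filename": name,
            "uploaded_by": uploader,
            "upload_timestamp": now,
        }
        merged.append(json.dumps(entry))
    return merged
```

Files that have disappeared from the directory are dropped from the result, since only names in `filenames` are emitted.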

With the way I'm building wheels for this particular project (python3 -m pip install -U pip setuptools wheel && python3 -m pip wheel --no-cache-dir --wheel-dir ./packages/ -r borgmatic/requirements.txt -f ./packages/), I'm not quite sure how to get proper metadata out of it at the moment.

Basically, I'm a noob trying to paw his way through things currently!

OK, because it seems like you're doing it without knowing why, lol -- you probably shouldn't, since you'll produce a diff every single time you regenerate the index, for no reason and no benefit.

The uploader is specifically meant to be used when you're in a multi-user environment and want to track who added things -- if it's always you and you don't keep track of the timestamp, then there's no reason to use that feature.

Cheers for the push.

I've edited my particular script to get the appropriate build time of the files (turns out git clone makes all the file dates the same, real useful).

More for others:

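# Create packages.json (one JSON object per line)
# First restore each tracked file's mtime from its last commit, since git clone resets them
git ls-files | xargs -I{} git log -1 --date=format:%Y%m%d%H%M.%S --format='touch -t %ad "{}"' "{}" | $SHELL
: > packages.json
for FILE in packages/*
do
    # date -r FILE prints the file's modification time (GNU date)
    TIMESTAMP=$(date -r "$FILE" +%s)
    # Strip the directory and escape any double quotes in the filename
    FILE=$(basename "$FILE" | sed -e 's/"/\\"/g')
    printf '{"filename": "%s", "uploaded_by": "%s", "upload_timestamp": %s}\n' "$FILE" "$UPLOADER" "$TIMESTAMP" >> packages.json
done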

# Create index
docker run --rm --user=$PUID:$PGID -v "$(pwd)":/data -w /data -e PKG_URL=$PKG_URL -it modem7/dumb-pypi sh -c 'dumb-pypi \
   --package-list-json packages.json \
   --packages-url $PKG_URL \
   --output-dir .'
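
For anyone who would rather not hand-escape JSON in the shell, the generation loop above could also be sketched in Python. This is an illustrative equivalent, not the script used in this thread: `build_package_list` is a hypothetical name, and it assumes the file modification times have already been restored (for example with the git command above).

```python
import json
import os


def build_package_list(package_dir, uploader):
    """Return one JSON line per file in `package_dir`, using mtime as the timestamp."""
    lines = []
    for name in sorted(os.listdir(package_dir)):
        path = os.path.join(package_dir, name)
        if not os.path.isfile(path):
            continue  # ignore subdirectories
        entry = {
            "filename": name,
            "uploaded_by": uploader,
            "upload_timestamp": int(os.path.getmtime(path)),
        }
        # json.dumps takes care of quoting and escaping.
        lines.append(json.dumps(entry))
    return lines


if __name__ == "__main__":
    # UPLOADER mirrors the shell variable above; "unknown" is an arbitrary fallback.
    uploader = os.environ.get("UPLOADER", "unknown")
    if os.path.isdir("packages"):
        with open("packages.json", "w") as fh:
            for line in build_package_list("packages", uploader):
                fh.write(line + "\n")
```

Sorting the names keeps the output order stable between runs, which helps avoid spurious diffs when the result is committed.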

Yes, that still leaves the uploaded_by hurdle, but the date was the most important thing to actually track for my mostly single-person use case on GitHub Pages.