Uploader name via package-list
modem7 opened this issue
Heya,
Off the back of #20 and #22, is there a way to do this via package-list instead of package-list-json?
I currently use --package-list <(ls packages) to upload to GitHub Pages, which works quite well, but being able to set a static "person who uploaded" would be quite a nice feature if this is possible!
E.g. adding an "--uploaded-by" flag or similar.
Cheers!
The JSON is used as a way to pass the metadata of who the uploader is (and other metadata like file hashes if passed) since dumb-pypi otherwise doesn't have a way to know it if it's just passed a simple list of files. Is there some other way it would be able to get that information?
Unfortunately not to my knowledge (at least natively).
So far, I've looked at jc which looks quite interesting.
Doing something like ls packages | jc --ls presents the JSON output:
[{"filename":"attrs-21.4.0-py2.py3-none-any.whl"},{"filename":"borgbackup-1.2.0-cp310-cp310-linux_aarch64.whl"},{"filename":"borgbackup-1.2.0-cp310-cp310-linux_x86_64.whl"},{"filename":"borgbackup-1.2.1-cp310-cp310-linux_aarch64.whl"},{"filename":"borgbackup-1.2.1-cp310-cp310-linux_armv7l.whl"},{"filename":"borgbackup-1.2.1-cp310-cp310-linux_x86_64.whl"},{"filename":"borgmatic-1.6.2-py3-none-any.whl"},{"filename":"borgmatic-1.6.3-py3-none-any.whl"},{"filename":"certifi-2022.5.18.1-py3-none-any.whl"},{"filename":"charset_normalizer-2.0.12-py3-none-any.whl"},{"filename":"colorama-0.4.4-py2.py3-none-any.whl"},{"filename":"distlib-0.3.4-py2.py3-none-any.whl"},{"filename":"dumb_pypi-1.9.0-py2.py3-none-any.whl"},{"filename":"idna-3.3-py3-none-any.whl"},{"filename":"Jinja2-3.1.2-py3-none-any.whl"},{"filename":"jsonschema-4.6.0-py3-none-any.whl"},{"filename":"llfuse-1.4.2-cp310-cp310-linux_aarch64.whl"},{"filename":"llfuse-1.4.2-cp310-cp310-linux_armv7l.whl"},{"filename":"llfuse-1.4.2-cp310-cp310-linux_x86_64.whl"},{"filename":"MarkupSafe-2.1.1-cp310-cp310-linux_armv7l.whl"},{"filename":"MarkupSafe-2.1.1-cp310-cp310-musllinux_1_1_aarch64.whl"},{"filename":"MarkupSafe-2.1.1-cp310-cp310-musllinux_1_1_x86_64.whl"},{"filename":"msgpack-1.0.3-cp310-cp310-linux_aarch64.whl"},{"filename":"msgpack-1.0.3-cp310-cp310-linux_x86_64.whl"},{"filename":"msgpack-1.0.4-cp310-cp310-linux_armv7l.whl"},{"filename":"msgpack-1.0.4-cp310-cp310-musllinux_1_1_aarch64.whl"},{"filename":"msgpack-1.0.4-cp310-cp310-musllinux_1_1_x86_64.whl"},{"filename":"packaging-21.3-py3-none-any.whl"},{"filename":"pyparsing-3.0.9-py3-none-any.whl"},{"filename":"pyrsistent-0.18.1-cp310-cp310-linux_aarch64.whl"},{"filename":"pyrsistent-0.18.1-cp310-cp310-linux_armv7l.whl"},{"filename":"pyrsistent-0.18.1-cp310-cp310-linux_x86_64.whl"},{"filename":"requests-2.27.1-py2.py3-none-any.whl"},{"filename":"requests-2.28.0-py3-none-any.whl"},{"filename":"ruamel.yaml-0.17.21-py3-none-any.whl"},{"filename":"ruamel.yaml.clib-0.2.6-cp310-cp310-linux_aarch64.whl"},{"filename":"ruamel.yaml.clib-0.2.6-cp310-cp310-linux_armv7l.whl"},{"filename":"ruamel.yaml.clib-0.2.6-cp310-cp310-linux_x86_64.whl"},{"filename":"setuptools-62.3.2-py3-none-any.whl"},{"filename":"setuptools-62.3.3-py3-none-any.whl"},{"filename":"urllib3-1.26.9-py2.py3-none-any.whl"}]
But obviously this doesn't really help much regarding the uploader/hash etc (and it's an extra tool which isn't optimal for most).
Regarding --package-list <(ls packages), how does dumb-pypi handle the list? Does it take the filenames one at a time and then do its thing?
If so, whilst it's not the nicest thing from a package list perspective (but useful for programmatic methods in small 1-2 man teams), is there a way to tack on an arbitrary owner to everything that is listed?
In order to have any of the additional information, you need to synthesize it yourself and pass it as JSON.
dumb-pypi cannot generate that information.
Basically, the trade-off of dumb-pypi is that it only handles generating the index; it doesn't manage any of the metadata or store files itself like most other registries do. That means dumb-pypi is very simple, but it requires you to manage the metadata yourself and call it with the appropriate data.
If you just have a list of files, --package-list can be used to generate a working registry, but you won't have the optional metadata like uploader name.
For most people with complex setups dumb-pypi is likely only a part of their registry. For example at my day job, we have a small Python script that fetches the package list and metadata from S3 and shapes it into JSON so that it can call dumb-pypi.
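As a rough illustration of the "shape it into JSON" step described above, here is a minimal shell sketch. It assumes a hypothetical tab-separated manifest (filename, uploader, Unix timestamp per row), which is not a format dumb-pypi defines, and the helper names json_escape and manifest_to_json are made up for the example. It emits one JSON object per line with the filename, uploaded_by and upload_timestamp keys that appear in this thread.

```shell
#!/bin/sh
# Sketch only: the manifest layout (filename<TAB>uploader<TAB>unix timestamp)
# is a hypothetical format, not something dumb-pypi defines.

# Escape backslashes and double quotes so a value fits inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Read manifest rows from stdin and print one JSON object per row.
# The body runs in a subshell, so its variables stay local.
manifest_to_json() (
  tab=$(printf '\t')
  while IFS="$tab" read -r filename uploader timestamp || [ -n "$filename" ]; do
    [ -n "$filename" ] || continue            # skip blank lines
    case $timestamp in
      ''|*[!0-9]*)                            # timestamp must be all digits
        echo "skipping $filename: bad timestamp" >&2
        continue ;;
    esac
    printf '{"filename": "%s", "uploaded_by": "%s", "upload_timestamp": %s}\n' \
      "$(json_escape "$filename")" "$(json_escape "$uploader")" "$timestamp"
  done
)
```

Usage would be something like manifest_to_json < manifest.tsv > packages.json, followed by the usual dumb-pypi --package-list-json packages.json call. Because tab counts as whitespace for read, an empty uploader column shifts the fields, so a real script would want a stricter parser.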
Ahhah,
That makes a lot of sense!
Thank you!
I think I've figured out a really dirty way of generating the required info. If it works, I'll post it here in case anyone has the same question.
I'll close this for now, thank you for the elaboration!
@chriskuehl I've done a dirty script that emulates what I'm after and it seems to work quite nicely.
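# Start with an empty package list
echo -n > packages.json
# Glob instead of parsing ls, so filenames with spaces survive
for PKG in packages/*
do
# Escape double quotes so the filename is a valid JSON string
FILE=$(basename "$PKG" | sed -e 's/"/\\"/g')
printf '{"filename": "%s", "uploaded_by": "%s", "upload_timestamp": %s}\n' "$FILE" "$UPLOADER" "$(date +%s)" >> packages.json
done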
Full script (bar the building part):
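# Create packages.json
echo -n > packages.json
# Glob instead of parsing ls, so filenames with spaces survive
for PKG in packages/*
do
# Escape double quotes so the filename is a valid JSON string
FILE=$(basename "$PKG" | sed -e 's/"/\\"/g')
printf '{"filename": "%s", "uploaded_by": "%s", "upload_timestamp": %s}\n' "$FILE" "$UPLOADER" "$(date +%s)" >> packages.json
done
# Create index (PKG_URL is expanded inside the container, hence the single quotes)
docker run --rm -v "$(pwd)":/data -w /data -e PKG_URL="$PKG_URL" -it modem7/dumb-pypi sh -c 'dumb-pypi --package-list-json packages.json \
--packages-url "$PKG_URL" \
--output-dir .'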
It's not pretty, and a couple of issues (things like setting
if the uploader and time are always the same what's the point? what are you trying to accomplish?
In this particular case, purely for prettiness in all fairness. I wish I had a better reason!
But I do agree that there needs to be a better method. At some point I'll figure something out (maybe a temp folder where new wheels are put into, timestamp is generated, then use the --previous-package-list-json option to create a changelog or something).
With the way I'm building wheels for this particular project (python3 -m pip install -U pip setuptools wheel && python3 -m pip wheel --no-cache-dir --wheel-dir ./packages/ -r borgmatic/requirements.txt -f ./packages/), I'm not quite sure how to get proper metadata out of it at the moment.
Basically, I'm a noob trying to paw his way through things currently!
ok because it seems like you're doing it but don't know why lol -- you probably shouldn't do that since you're going to generate diffs every single generation for no reason and no benefit
the uploader is specifically meant to be used when you're in a multi-user environment and want to track who added things -- if it's always you and you don't keep track of the timestamp then there's no reason to use that feature
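One way to avoid the needless diffs mentioned above is to keep the entries from the previous run and only stamp files that are new. A minimal sketch, assuming the previous list was written one object per line beginning with {"filename": "NAME", (the layout the loop in this thread produces); refresh_package_list and json_escape are hypothetical helper names, not part of dumb-pypi:

```shell
#!/bin/sh
# Sketch only: relies on every line of the previous list starting with
# '{"filename": "NAME",' exactly as written by the loop in this thread.

# Escape backslashes and double quotes so a value fits inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# refresh_package_list DIR PREVIOUS_JSON UPLOADER
# Prints a package list for the files in DIR. Entries already present in
# PREVIOUS_JSON are copied verbatim; new files get the current time.
refresh_package_list() (
  dir=$1 previous=$2 uploader=$3
  now=$(date +%s)
  for path in "$dir"/*; do
    [ -f "$path" ] || continue                # also covers an empty directory
    name=$(json_escape "${path##*/}")
    prefix="{\"filename\": \"$name\","
    # grep -F matches the prefix literally; keep the first match, if any.
    old=$(grep -F -- "$prefix" "$previous" 2>/dev/null | head -n 1)
    if [ -n "$old" ]; then
      printf '%s\n' "$old"
    else
      printf '{"filename": "%s", "uploaded_by": "%s", "upload_timestamp": %s}\n' \
        "$name" "$(json_escape "$uploader")" "$now"
    fi
  done
)
```

Since the function reads the old list while producing the new one, it is safer to write to a temporary file first, e.g. refresh_package_list packages packages.json "$UPLOADER" > packages.json.new && mv packages.json.new packages.json. Unchanged wheels then keep their original uploader and timestamp, so regenerating the index only changes lines for new files.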
Cheers for the push.
I've edited my particular script to get the appropriate build time of the files (turns out git clone makes all the file dates the same, real useful).
More for others:
# Create packages.json
# Reset each tracked file's mtime to the date of its last commit
git ls-files | xargs -I{} git log -1 --date=format:%Y%m%d%H%M.%S --format='touch -t %ad "{}"' "{}" | $SHELL
echo -n > packages.json
# Glob instead of parsing ls, so filenames with spaces survive
for PKG in packages/*
do
# Read the mtime from the real path before escaping the name for JSON
TIMESTAMP=$(date -r "$PKG" +%s)
FILE=$(basename "$PKG" | sed -e 's/"/\\"/g')
printf '{"filename": "%s", "uploaded_by": "%s", "upload_timestamp": %s}\n' "$FILE" "$UPLOADER" "$TIMESTAMP" >> packages.json
done
# Create index (PKG_URL is expanded inside the container, hence the single quotes)
docker run --rm --user="$PUID:$PGID" -v "$(pwd)":/data -w /data -e PKG_URL="$PKG_URL" -it modem7/dumb-pypi sh -c 'dumb-pypi \
--package-list-json packages.json \
--packages-url "$PKG_URL" \
--output-dir .'
Yes, that still has the hurdle of the uploaded_by, but the date was the most important thing to actually track for my mostly single-person use case using GitHub Pages.