Exceen / 4chan-downloader

Python3 script to continuously download all images/webms of multiple 4chan threads simultaneously - without installation

Filenames...

vt-idiot opened this issue

I've run into a couple of issues with filenames. The first isn't Windows-specific, and I have an idea of what needs to be done to fix it but not how to do it properly: when several files in a thread share a filename, each download simply overwrites the previous one. This usually happens with Spoiler_Image or file.png.

The other issue is Windows-specific, and I've managed to solve it, at least locally, by passing the filename through Django's get_valid_filename:

from django.utils.text import get_valid_filename
...snip...
                img_path = ntpath.join(directory, get_valid_filename(img))
...snip...

Filenames like the one below used to make the script halt on Windows:

C:\vtai>inb4404.py -c -d -l -t https://boards.4channel.org/vt/thread/34806758
[2022-10-11 07:50:46 PM] [  9/278] vt/34806758/[sound=files.catbox.moe%2F7a2f1v.m4a]{{takanashi kiara}}, {{{1girl}}}, {begging},[[[lipstick]]],[[[lip gloss]]],kusogaki,closed eyes,{{pov}}, {{incoming kiss}}, close-up, {{horny}}, blushing, solo, puffy sleeves, orange skirt, aqua choker, orange hair, ba.png
Traceback (most recent call last):
  File "C:\vtai\inb4404.py", line 171, in <module>
    main()
  File "C:\vtai\inb4404.py", line 31, in main
    download_thread(thread, args)
  File "C:\vtai\inb4404.py", line 112, in download_thread
    with open(img_path, 'wb') as f:
OSError: [Errno 22] Invalid argument: 'C:\\vtai\\downloads\\vt\\34806758\\[sound=files.catbox.moe%2F7a2f1v.m4a]{{takanashi kiara}}, {{{1girl}}}, {begging},[[[lipstick]]],[[[lip gloss]]],kusogaki,closed eyes,{{pov}}, {{incoming kiss}}, close-up, {{horny}}, blushing, solo, puffy sleeves, orange skirt, aqua choker, orange hair, ba.png'

After importing get_valid_filename and applying it to the filename, the download appears to work.
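
For anyone who wants the same effect without installing Django, here is a minimal stdlib-only sketch of a strict sanitizer in the same spirit. The function name `strict_filename` is hypothetical, and the rules are an approximation (trim, spaces to underscores, drop everything except word characters, dashes, and dots), not a guaranteed match for Django's behaviour.

```python
import re


def strict_filename(name: str) -> str:
    """Hypothetical helper: keep only word characters, '-' and '.'."""
    # Trim surrounding whitespace and turn inner spaces into underscores.
    cleaned = str(name).strip().replace(" ", "_")
    # Drop every character that is not a word character, a dash, or a dot.
    cleaned = re.sub(r"[^-\w.]", "", cleaned)
    # Refuse results that cannot be used as a file name at all.
    if cleaned in {"", ".", ".."}:
        raise ValueError(f"could not derive a valid filename from {name!r}")
    return cleaned


if __name__ == "__main__":
    print(strict_filename("{{pov}}, close-up, orange hair, ba.png"))
    # prints: pov_close-up_orange_hair_ba.png
```

Note how much of the original name disappears; brackets, braces, and commas are all removed.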

And a gigabrain fix for my other issue:

                img_path = ntpath.join(directory, str(regex_result_cnt) + "-" + get_valid_filename(img))
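
An alternative to prefixing every file with a counter would be to number only the names that actually repeat. The sketch below is an assumption about how that could look, not code from inb4404.py; `number_duplicates` is a hypothetical helper that takes the filenames of one thread in post order. Like the counter prefix, it relies on that order staying stable between runs.

```python
import os


def number_duplicates(filenames):
    """Hypothetical helper: suffix repeated names so no two entries collide."""
    used = set()
    result = []
    for name in filenames:
        stem, ext = os.path.splitext(name)
        candidate, n = name, 1
        # Compare case-insensitively, since Windows treats A.png and a.png
        # as the same file.
        while candidate.lower() in used:
            n += 1
            candidate = f"{stem}_{n}{ext}"
        used.add(candidate.lower())
        result.append(candidate)
    return result


if __name__ == "__main__":
    names = ["file.png", "Spoiler_Image.png", "file.png", "file.png"]
    print(number_duplicates(names))
    # prints: ['file.png', 'Spoiler_Image.png', 'file_2.png', 'file_3.png']
```

Unique names stay untouched, so only the Spoiler_Image and file.png style collisions get a suffix.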

Thanks a lot for contributing! I'm a little confused, though: you said you have no idea how to fix the issue, but you've shown two fixes!
If you want me to include these snippets of code in the program, that should be pretty easy.

I was going to open a PR myself, but didn't because the first solution isn't quite what I'd hoped for. It's a little too aggressive with the filenames. The second one works great though. I'm happily running it locally now with both changes, plus all os.path calls changed to ntpath and the block of code that handles the downloaded/new folder copy removed.
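
A gentler sanitizer might look like the sketch below. It is a guess at what "less aggressive" could mean, under two assumptions the thread does not confirm: that the failures come either from characters Windows forbids in file names or from over-long names. `gentle_filename` and the `max_length` default of 200 are hypothetical, and reserved device names such as CON or NUL are not handled.

```python
import os
import re

# Characters Windows rejects in a file name, plus ASCII control characters.
_WINDOWS_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def gentle_filename(name, max_length=200):
    """Hypothetical helper: change only what Windows is known to reject."""
    # Replace forbidden characters instead of dropping them.
    cleaned = _WINDOWS_FORBIDDEN.sub("_", name)
    # Windows also rejects names that end in a space or a dot.
    cleaned = cleaned.rstrip(" .")
    stem, ext = os.path.splitext(cleaned)
    if len(cleaned) > max_length:
        # Shorten the stem but keep the extension intact.
        keep = max(1, max_length - len(ext))
        stem = stem[:keep].rstrip(" .")
    return (stem + ext) or "_"


if __name__ == "__main__":
    print(gentle_filename("{{pov}}, close-up, orange hair, ba.png"))
    # prints: {{pov}}, close-up, orange hair, ba.png
    print(gentle_filename('a<b>:c?.png'))
    # prints: a_b__c_.png
```

Brackets, braces, commas, and spaces survive, so the saved names stay close to what was posted.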

Oh, and there's definitely a way to properly parse Spoiler_Image filenames since 4chanx and archive sites seem to do it, but I wouldn't know where to start for that one.

I added your call of get_valid_filename to the script and added the issue of duplicate filenames as a TODO to the README.