agourlay / dlm

Minimal HTTP download manager

[Bug] dlm doesn't use the original file name of redirected URL

chromer030 opened this issue · comments

user@user ~/Desktop % ./dlm -i dlm.txt -M 1 -o . --proxy http://127.0.0.1:10809 -U 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36'
[2022-02-02 16:24:30] Starting dlm with at most 1 concurrent downloads
[2022-02-02 16:24:30] Found 1 URLs in input file dlm.txt
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 0/1
get_video?id=Jw9Bym8mVafjmoX&expire [00:00:29] [>---------------------------------------] 9.00MiB/390.31MiB (speed:810.02KiB/s) (eta:10m)

dlm uses the URL as the filename if the file comes from a redirected URL.
As you can see in the output, get_video?id=Jw9Bym8mVafjmoX&expire is part of the URL, not a file name.

Was the original URL in the input file something like http://domain.com/get_video?id=Jw9Bym8mVafjmoX&expire?
That is, the actual filename is not part of the URL.

Another site to test with:

Original URL : https://code.visualstudio.com/sha/download?build=stable&os=linux-x64

Redirected URL : https://az764295.vo.msecnd.net/stable/899d46d82c4c95423fb7e10e68eba52050e30ba3/code-stable-x64-1639562789.tar.gz

I tested other tools like aria2c. They save the file based on the redirected URL, with the proper filename: code-stable-x64-1639562789.tar.gz

Likewise in browsers such as Chrome, clicking the original URL gives us the file with the proper filename.

But dlm saves it as download?build=stable&os=linux-x64part, which is just the raw tail of the original URL, without even a file extension.

Another site :

https://atom.io/download/deb

💐 Thanks a million for all your efforts in improving this useful tool. 💐

Thanks for the examples 👍

For https://code.visualstudio.com/sha/download?build=stable&os=linux-x64 I am able to retrieve the proper file name via a HEAD request, whose response contains the following header:

"content-disposition": "attachment; filename=\"code-stable-x64-1639562789.tar.gz\""

So I can work this out.
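For reference, here is a minimal Python sketch of pulling the name out of that header value. It is not how dlm does it; the helper name `filename_from_content_disposition` is made up for illustration, and the parsing is delegated to the standard library's `email.message.Message`.

```python
from email.message import Message


def filename_from_content_disposition(value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    # Hypothetical helper: let the stdlib parse the header parameters
    # (quoting included) instead of splitting the string by hand.
    msg = Message()
    msg["content-disposition"] = value
    return msg.get_filename()


header = 'attachment; filename="code-stable-x64-1639562789.tar.gz"'
print(filename_from_content_disposition(header))
# code-stable-x64-1639562789.tar.gz
```

A response without a filename parameter, like the atom.io one, makes the helper return None, so a caller still needs a fallback.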

However, for https://atom.io/download/deb there is no filename information in the headers, so I don't know where the filename comes from.

Those are the headers retrieved:

{"connection": "keep-alive", "content-length": "137468500", "content-type": "application/octet-stream", "content-md5": "MjEM8AbNi9zFmSmtOmtYKw==", "last-modified": "Tue, 27 Jul 2021 05:51:40 GMT", "etag": "0x8D950C29B079281", "server": "Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0", "x-ms-request-id": "bc5c0b65-901e-0009-6a02-1967b6000000", "x-ms-version": "2009-09-19", "x-ms-lease-status": "unlocked", "x-ms-blob-type": "BlockBlob", "accept-ranges": "bytes", "age": "833", "date": "Thu, 03 Feb 2022 19:35:57 GMT", "via": "1.1 varnish", "x-served-by": "cache-hhn4028-HHN", "x-cache": "HIT", "x-cache-hits": "0", "x-timer": "S1643899557.434528,VS0,VE1"}

I will at least implement a fix for the first case.

Fix for the first case implemented in ec74c4d
It is released as part of 0.2.4

I do not know yet how to handle the second redirect case.

I will test the other cases and report here.

It is still not able to detect and assign the correct filename in some cases:

░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 0/1
get_video?id=1xR4wp8e8DtdwW&expires [00:00:10] [>---------------------------------------] 7.25MiB/311.03MiB (speed:732.60KiB/s) (eta:7m)

The above case is: https://streamtape.com/

I'm wondering how aria2c can detect and assign the name for any redirected URL 🤔, even https://atom.io/download/deb or https://streamtape.com/

Output of aria2c (I hope it helps):

user@user ~/Desktop % aria2c 'https://atom.io/download/deb'

02/04 14:09:26 [NOTICE] Downloading 1 item(s)
[#98b0bb 0B/0B CN:1 DL:0B]                                                                                                                                                                         
02/04 14:09:28 [NOTICE] CUID#7 - Redirecting to https://atom-installer.github.com/v1.58.0/atom-amd64.deb?s=1627025597&ext=.deb
[#98b0bb 896KiB/131MiB(0%) CN:4 DL:0B]                                                                                                                                                             

02/04 14:09:34 [NOTICE] Download GID#98b0bbd9992214d0 not complete: /home/user/Desktop/atom-amd64.deb

Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
98b0bb|INPR|       0B/s|/home/user/Desktop/atom-amd64.deb

Thanks for the investigation. I will have to look into the source code of aria2c.

I suspect it takes the filename from the Location header of the redirect response, before following the redirect.

curl -i https://atom.io/download/deb 
HTTP/1.1 302 Found
Status: 302 Found
Location: https://atom-installer.github.com/v1.58.0/atom-amd64.deb?s=1627025597&ext=.deb
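
If that guess is right, the name can be derived from the Location value by taking the last path segment and ignoring the query string. A rough Python sketch under that assumption (the helper name `filename_from_url` is hypothetical, and this is not taken from aria2c or dlm):

```python
from posixpath import basename
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Return the percent-decoded last path segment of a URL, or None."""
    # urlsplit separates the path from the query, so "?s=...&ext=.deb" is dropped.
    path = urlsplit(url).path
    # unquote turns escapes such as %20 back into the original characters.
    return unquote(basename(path)) or None


location = "https://atom-installer.github.com/v1.58.0/atom-amd64.deb?s=1627025597&ext=.deb"
print(filename_from_url(location))
# atom-amd64.deb
```

A URL whose path ends in a slash has no last segment, so the helper returns None in that case.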

Another test case : https://download.mozilla.org/?product=firefox-latest-ssl&os=osx&lang=en-US

Result :

user@user ~/Desktop/ % ./dlm -i dlm.txt -M 1 -o . --proxy http://127.0.0.1:10809 -U 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Safari/537.36'
[2022-02-04 14:29:13] Starting dlm with at most 1 concurrent downloads
[2022-02-04 14:29:13] Found 1 URLs in input file d.txt
[2022-02-04 14:29:17] Could not determine file extension for https://download.mozilla.org/?product=firefox-latest-ssl&os=osx&lang=en-US
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 0/1
?product=firefox-latest-ssl&os=osx& [00:00:11] [>---------------------------------------] 814.95KiB/120.90MiB (speed:63.31KiB/s) (eta:32m)

Current output name : ?product=firefox-latest-ssl&os=osx&lang=en-US.part

True desired name : Firefox 96.0.3.dmg

aria2c output :

user@user ~ % aria2c 'https://download.mozilla.org/?product=firefox-latest-ssl&os=osx&lang=en-US'

02/04 14:33:58 [NOTICE] Downloading 1 item(s)
[#fe3133 0B/0B CN:1 DL:0B]                                                                                                                                  
02/04 14:33:59 [NOTICE] CUID#7 - Redirecting to https://download-installer.cdn.mozilla.net/pub/firefox/releases/96.0.3/mac/en-US/Firefox%2096.0.3.dmg                                                                                                             
[#fe3133 16KiB/120MiB(0%) CN:3 DL:60KiB ETA:34m10s]                                                                                                         
                                                                                            
02/04 14:34:03 [NOTICE] Download GID#fe313386b4ecf94a not complete: /home/user/Firefox 96.0.3.dmg

Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
fe3133|ERR |   369KiB/s|/home/user/Firefox 96.0.3.dmg

I pushed a fix for using the Location header if possible in case of redirect.
You can give it a try in 0.2.5.
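
To make the combined behaviour concrete, here is one possible fallback order sketched in Python: Content-Disposition first, then the Location header, then the original URL. The order, the function names, the two header dicts, and the final "download" default are all assumptions for illustration, not a description of the actual dlm code.

```python
from email.message import Message
from posixpath import basename
from urllib.parse import unquote, urlsplit


def name_from_url(url):
    """Percent-decoded last path segment of a URL, or None if there is none."""
    return unquote(basename(urlsplit(url).path)) or None


def choose_filename(original_url, redirect_headers, final_headers):
    """Pick a file name from headers, falling back to the original URL."""
    # Header names are assumed to be lower-cased, as in the dump quoted above.
    disposition = final_headers.get("content-disposition")
    if disposition:
        msg = Message()
        msg["content-disposition"] = disposition
        if msg.get_filename():
            return msg.get_filename()
    # redirect_headers stands for the headers of the 3xx response, if any.
    location = redirect_headers.get("location")
    if location and name_from_url(location):
        return name_from_url(location)
    # Assumed last resort when the URL path has no usable segment.
    return name_from_url(original_url) or "download"


print(choose_filename(
    "https://atom.io/download/deb",
    {"location": "https://atom-installer.github.com/v1.58.0/atom-amd64.deb?s=1627025597&ext=.deb"},
    {"content-type": "application/octet-stream"},
))
# atom-amd64.deb
```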

Test result of 0.2.5 :

StreamTape : SUCCESSFUL name assigning ✅
VSCode : SUCCESSFUL name assigning ✅
Atom : saved as atom-amd64.deb?s=1627025597&ext=.part, not clean but acceptable ✅
Firefox : SUCCESSFUL name assigning ✅

I think the issue can be closed. dlm now handles this well. Thanks a million for your efforts.

Great to hear 👍

The atom-amd64.deb?s=1627025597&ext=.part case would not be very clean to handle, and it does not look standard.
Happy to leave it as it is.