Download process doesn't work on macOS
dayvsonsales opened this issue
Describe the bug
Hello,
I tried to use your application to download an m3u8 playlist, but it seems the code fails during the download step. Apparently the method sched_getaffinity doesn't exist in the os module on macOS.
press ctrl+c or ctrl+z if parsed headers of type http2 False are incorrect.
Starting Download process MainProcess.
Traceback (most recent call last):
  File "/Users/dayvsonsales/m3u8-dl/core/download_process.py", line 27, in download_process
    download_manager = DownloadProcess(links, total_links, session, http2,
  File "/Users/dayvsonsales/m3u8-dl/core/download_process.py", line 55, in __init__
    self.__process_num = 4 if platform.system() == "Windows" else len(os.sched_getaffinity(os.getpid()))
AttributeError: module 'os' has no attribute 'sched_getaffinity'
To Reproduce
Steps to reproduce the behavior:
- Run python3 main.py https://<mywebsite>.com/scripts/m3us/playlist.m3u
- See error
Expected behavior
I expected my playlist to be downloaded.
Desktop (please complete the following information):
- OS: macOS
- Version: 10.13.6
- Python version: 3.8.5
The fix has been made. Let me know if you face another issue @dayvsonsales; if not, this issue can be closed.
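For reference, the portable way to pick the process count looks roughly like this (a sketch; the actual committed fix may differ slightly):

import os

def get_process_count(default=4):
    # os.sched_getaffinity exists only on Linux, so guard with hasattr
    # instead of assuming every non-Windows platform has it.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(os.getpid()))
    # os.cpu_count() can return None, hence the fallback default.
    return os.cpu_count() or default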
@excalibur-kvrv it works now, thank you. One more question: is the download handled entirely in memory? I ask because I downloaded a 730 MB playlist and noticed via top that the Python process kept growing and growing.
It is designed to write the data as soon as it is downloaded (this may be a problem if each individual chunk is big). I did notice the growing size; I'm still working on identifying the areas where the memory is growing. Btw @dayvsonsales, how much download speed (in megabytes per second) were you getting while using m3u8-dl? Was it close to your internet bandwidth (in megabytes per second)?
@excalibur-kvrv the download speed is fine. The only problem is memory usage; my playlists have big single files (usually more than 100 MB). I have a simple script that I wrote using only curl, and it worked fine (no memory usage problem), but it doesn't scale (unlike your script, which uses 4 parallel processes).
100 MB per file in the playlist? @dayvsonsales, then I think I know what the issue is. The playlists I have encountered so far only contained small files (10 MB max), so I designed my program to download the entire file and then write it. The fix is quite simple: I'll just need to write the data in chunks. It'll take a few hours to fix.
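For context, the current whole-file pattern is roughly this (a sketch from memory, not the exact code in the repo):

# Buffers the entire response in memory before anything is written,
# so a 100 MB segment costs roughly 100 MB of RAM per download.
response = session.get(download_url, timeout=timeout)
write_file_no_gil(file_path, response.content)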
@excalibur-kvrv I think I solved my problem. Inspecting the fetch.py file, I noticed that you use session.get without passing the stream option. So I added that option and dealt with the chunks, writing to the file_path file using Python's default file I/O (not your write_file_no_gil). The code is below:
with session.get(download_url, timeout=timeout, stream=True) as r:
    r.raise_for_status()
    if r.status_code == 302:
        r = redirect_handler(session, r.content)
    # stream=True defers the body download; iter_content then pulls it
    # from the socket in 1 KB chunks instead of buffering it all at once.
    with open(file_path, "wb") as f:
        for chunk in r.iter_content(1024):
            if not chunk:
                break
            f.write(chunk)
The memory usage now seems lower than before.
But there's a check that I had to ignore:
if type(request_data) == bytes:
    data = request_data
else:
    data = request_data.content
I don't know what you were trying to do with this type check. Could you explain it to me, please?
The type check was simply for compatibility, in the event redirect_handler were to run, since it returns bytes. The if/else ensures that .content isn't called on a bytes object, only on a response object. Also, try experimenting with the number of bytes passed into r.iter_content: passing a small value increases the overall file write time, since writing to the OS is faster when it's given a larger value. The custom write_file_no_gil was there to ensure faster write times by taking advantage of the fact that the GIL gets dropped during the write.
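For example, handling both cases with a bigger chunk size could look something like this (an untested sketch; isinstance is just the more idiomatic form of the same check, and the 64 KB value is only a starting point to experiment with):

with open(file_path, "wb") as f:
    if isinstance(request_data, bytes):
        # redirect_handler already returned the full payload;
        # bytes objects have no iter_content, so write directly.
        f.write(request_data)
    else:
        # A live response: stream it in 64 KB chunks, far fewer
        # write calls than 1024-byte chunks, memory still bounded.
        for chunk in request_data.iter_content(64 * 1024):
            f.write(chunk)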
Nice. So @dayvsonsales, I take it your issue has been resolved?
@excalibur-kvrv it's solved. Just to clarify: if redirect_handler returns bytes, there's no iter_content, right? Removing this check could cause more problems, I think. I'll make a pull request just to record the code from this issue in the history, but I think it should be investigated more before merging it.
Well, if you were to remove the type check, it would cause a lot of problems whenever the redirect handler runs. But you are on the right path; with a few more changes and a bit of restructuring of the code, your fix will work. I'll take a look and notify you of the changes you need to make. Oh, and do make sure your code passes the Codacy checks.