Download process doesn't work on macOS
dayvsonsales opened this issue
Describe the bug
Hello,
I tried to use your application to download an m3u8 playlist, but it seems the code fails during the download step. Apparently the method sched_getaffinity doesn't exist in the os module on macOS.
press ctrl+c or ctrl+z if parsed headers of type http2 False are incorrect.
Starting Download process MainProcess.
Traceback (most recent call last):
  File "/Users/dayvsonsales/m3u8-dl/core/download_process.py", line 27, in download_process
    download_manager = DownloadProcess(links, total_links, session, http2,
  File "/Users/dayvsonsales/m3u8-dl/core/download_process.py", line 55, in __init__
    self.__process_num = 4 if platform.system() == "Windows" else len(os.sched_getaffinity(os.getpid()))
AttributeError: module 'os' has no attribute 'sched_getaffinity'
To Reproduce
Steps to reproduce the behavior:
- Run python3 main.py https://<mywebsite>.com/scripts/m3us/playlist.m3u
- See error
Expected behavior
I expected my playlist to be downloaded.
Desktop (please complete the following information):
- OS: macOS
- Version: 10.13.6
- Python version: 3.8.5
The fix has been made. Let me know if you face another issue @dayvsonsales; if not, this issue can be closed.
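For reference, the portable way to pick the process count looks roughly like this (a sketch; the actual committed fix may differ slightly):

import os

def get_process_count(default=4):
    # os.sched_getaffinity exists only on Linux, so guard with hasattr
    # instead of assuming every non-Windows platform has it.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(os.getpid()))
    # os.cpu_count() can return None, hence the fallback default.
    return os.cpu_count() or default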
@excalibur-kvrv it works now, thank you. One more question: is the download handled entirely in memory? I ask because I downloaded a 730 MB playlist and noticed via top that the Python process kept growing and growing.
It is designed to write the data as soon as it is downloaded (this may be a problem if each individual chunk is big). I did notice the growing size; I'm still working on identifying the areas where the memory is growing. Btw @dayvsonsales, how much download speed (in megabytes per second) were you getting while using m3u8-dl? Was it close to your internet bandwidth (in megabytes per second)?
@excalibur-kvrv the download speed is fine. The only problem is memory usage; my playlists have big single files (usually more than 100 MB). I have a simple script that I wrote using only curl, and it worked fine (no memory usage problem), but it doesn't scale (unlike your script, which uses 4 parallel processes).
100 MB per file in the playlist? @dayvsonsales, then I think I know what the issue is. The playlists I have encountered so far only contained small files (10 MB max), so I designed my program to download the entire file and then write it. The fix is quite simple: I'll just need to write the data in chunks. It'll take a few hours to fix.
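For context, the current whole-file pattern is roughly this (a sketch from memory, not the exact code in the repo):

# Buffers the entire response in memory before anything is written,
# so a 100 MB segment costs roughly 100 MB of RAM per download.
response = session.get(download_url, timeout=timeout)
write_file_no_gil(file_path, response.content)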
@excalibur-kvrv I think I solved my problem. Inspecting the fetch.py file, I noticed that you use session.get without passing the stream option. So I added that option and dealt with the chunks, writing to the file_path file using Python's default file I/O (not your write_file_no_gil). The code is below:
with session.get(download_url, timeout=timeout, stream=True) as r:
    r.raise_for_status()
    if r.status_code == 302:
        r = redirect_handler(session, r.content)
    # stream=True defers the body download; iter_content then pulls it
    # from the socket in 1 KB chunks instead of buffering it all at once.
    with open(file_path, "wb") as f:
        for chunk in r.iter_content(1024):
            if not chunk:
                break
            f.write(chunk)
The memory usage now seems lower than before.
But there's a check that I had to ignore:
if type(request_data) == bytes:
    data = request_data
else:
    data = request_data.content
I don't know what you were trying to do with this type check. Could you explain it to me, please?
The type check was simply for compatibility, in the event redirect_handler were to run, since it returns bytes. The if/else ensures that .content isn't called on a bytes object, only on a response object. Also, try experimenting with the number of bytes passed into r.iter_content: passing a small value increases the overall file write time, since writing to the OS is faster when it's given a larger value. The custom write_file_no_gil was there to ensure faster write times by taking advantage of the fact that the GIL gets dropped during the write.
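For example, handling both cases with a bigger chunk size could look something like this (an untested sketch; isinstance is just the more idiomatic form of the same check, and the 64 KB value is only a starting point to experiment with):

with open(file_path, "wb") as f:
    if isinstance(request_data, bytes):
        # redirect_handler already returned the full payload;
        # bytes objects have no iter_content, so write directly.
        f.write(request_data)
    else:
        # A live response: stream it in 64 KB chunks, far fewer
        # write calls than 1024-byte chunks, memory still bounded.
        for chunk in request_data.iter_content(64 * 1024):
            f.write(chunk)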
Nice. So @dayvsonsales, I take it your issue has been resolved?
@excalibur-kvrv it's solved. Just to clarify: if redirect_handler returns bytes, there's no iter_content, right? Removing this check could cause more problems, I think. I'll make a pull request just to record the code from this issue in the history, but I think it should be investigated more before merging it.
Well, if you were to remove the type check, it would cause a lot of problems whenever the redirect handler runs. But you are on the right path; with a few more changes and a bit of restructuring of the code, your fix will work. I'll take a look and notify you of the changes you need to make. Oh, and do make sure your code passes the Codacy checks.