Errors while running cell 5 in V2

JohnDavid07 opened this issue 4 years ago · comments

Svsaikiran2457183 · Answer 1 · Wed Jul 29 2020 00:06:00 GMT+0800 (China Standard Time)

```python
import json
import os

# crawler_v2 and get_filesize are assumed to be defined in earlier notebook cells.

exclusions = ['__MACOSX/']

destination = "/content/drive/My Drive/"
download_tasks = [
    {
        'folder': 'gdppci',
        # private URL, redacted by the poster
        'url': 'https://........................workers.dev/0:/..................................',
    },
]

print('##################################')
print('# Crawling all downloadable urls #')
print('##################################', end='\n\n')
tasks = []
for task in download_tasks:
    tasks += crawler_v2(task['url'], [], os.path.join(destination, task['folder']), 0, exclusions, verbose=False)

print(json.dumps(tasks, indent=2), end='\n\n')

total_size = get_filesize(sum([int(task['size']) for task in tasks]))

print(json.dumps(tasks, indent=2))
print('\nTotal Task:', len(tasks))
print('Total size: %.3fGB' % total_size, end='\n\n')
```

Svsaikiran2457183 · Answer 2 · Wed Jul 29 2020 00:08:35 GMT+0800 (China Standard Time)

Could you please post a guide on how to use it?

NullBruce · Answer 3 · Sat Aug 22 2020 22:34:20 GMT+0800 (China Standard Time)

@atlonxp Can you please look into this? I can't find the problem with "tasks".

atlonxp · Answer 4 · Sun Aug 23 2020 04:49:52 GMT+0800 (China Standard Time)

@JohnDavid07 @NullBruce Could you provide me with the GoIndex link? I will try it when I have time.

NullBruce · Answer 5 · Sun Aug 23 2020 21:45:17 GMT+0800 (China Standard Time)

@atlonxp literally any link.

##################################
# Crawling all downloadable urls #
##################################

https://tutnetflix.mlwdl.workers.dev/FrontEndMasters%20-%20Complete%20Intro%20to%20Containers/
retry #2 https://tutnetflix.mlwdl.workers.dev/FrontEndMasters%20-%20Complete%20Intro%20to%20Containers/
retry #3 https://tutnetflix.mlwdl.workers.dev/FrontEndMasters%20-%20Complete%20Intro%20to%20Containers/
retry #4 https://tutnetflix.mlwdl.workers.dev/FrontEndMasters%20-%20Complete%20Intro%20to%20Containers/
retry #5 https://tutnetflix.mlwdl.workers.dev/FrontEndMasters%20-%20Complete%20Intro%20to%20Containers/

Data is missing! change a plan -
use terminal CURL -
Nah, something went wrong!

JSONDecodeError Traceback (most recent call last)

in crawler_v2(url, downloading_dict, path, level, exclusions, verbose)
55 response = os.popen("curl --globoff {} -d ''".format(url.geturl())).read()
---> 56 response_json = json.loads(response)
57 except Exception as e:

4 frames

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)

in crawler_v2(url, downloading_dict, path, level, exclusions, verbose)
57 except Exception as e:
58 print('Nah, something went wrong!')
---> 59 print(e.args())
60 return []
61 except Exception as e:

TypeError: 'tuple' object is not callable
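
For readers hitting the same traceback: the second exception comes from the error handler itself. `e.args` is a tuple attribute, so calling it as `e.args()` raises the `TypeError` and hides the original `JSONDecodeError`. A minimal sketch of a handler that avoids this is shown here. The helper name `decode_response` is hypothetical and is not part of the notebook; only the `json.loads` call and the two printed lines mirror the traceback.

```python
import json


def decode_response(response):
    """Parse a response body as JSON; return [] when the body is not valid JSON.

    Hypothetical helper, not the notebook's crawler_v2.
    """
    try:
        return json.loads(response)
    except json.JSONDecodeError as e:
        print('Nah, something went wrong!')
        # e.args is a tuple attribute: read it, do not call it.
        print(e.args)
        return []


# An empty body (what a dead link tends to give) no longer crashes the handler.
print(decode_response(''))
print(decode_response('{"name": "example"}'))
```

With this change the crawl would still return nothing for a dead link, but the message printed would be the real JSON error rather than a secondary `TypeError`.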

atlonxp · Answer 6 · Mon Aug 24 2020 00:54:47 GMT+0800 (China Standard Time)

Huh! You don't seem to be aware that tutflix (aka tutnetflix) has been banned from Cloudflare. The links you provided stopped being available a long time ago.

An easy way to check whether a link is working is to visit the GoIndex website.

If it displays its contents --> it is working.
If it does not display anything, just the loading progress bar --> it is not working at all.
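
The same check can be roughly approximated from Python. The sketch below assumes, based only on the `curl --globoff {} -d ''` line in the traceback above, that a working index answers an empty POST with a JSON body; the function names `is_json` and `check_link` are hypothetical and not part of the notebook.

```python
import json
import urllib.request


def is_json(body):
    """Return True if the text parses as JSON (assumed sign of a working index)."""
    try:
        json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


def check_link(url, timeout=10):
    """Send an empty POST, like `curl -d ''`, and report whether JSON came back."""
    request = urllib.request.Request(url, data=b"", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return False
    return is_json(body)
```

Calling `check_link(url)` before adding a URL to `download_tasks` would give a quick yes/no answer, though a `True` result only means JSON was returned, not that the crawler will accept its structure.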

NullBruce · Answer 7 · Mon Aug 24 2020 20:43:16 GMT+0800 (China Standard Time)

@atlonxp Here's a link that doesn't work. I also tried multiple other links that are up.

##################################
# Crawling all downloadable urls #
##################################

https://manga.td-index.workers.dev/0:/
retry #2 https://manga.td-index.workers.dev/0:/
retry #3 https://manga.td-index.workers.dev/0:/
retry #4 https://manga.td-index.workers.dev/0:/
retry #5 https://manga.td-index.workers.dev/0:/

Data is missing! change a plan -
use terminal CURL -
Nah, something went wrong!