Synthesize warc using regular vs raw stream
lesleyodu opened this issue
The synthesize warc command unintentionally switches back to the original stream instead of using the raw stream. Making deep copies of all variables read from the original stream seems to resolve the bug.
Affected lines in hypercane/hypercane/synthesize/warcs.py:
76 - headers_list = copy.deepcopy(resp.raw.headers.items())
81 - warc_target_uri = str(resp.links[link]['url'])
88 - mdt = str(resp.headers['memento-datetime'])
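For illustration, the three changes above can be sketched as one helper that copies every value out of the response before anything else touches it. This is a minimal sketch, not the actual code in warcs.py: the function name `snapshot_response_fields`, the `link_rel` parameter, and the `SimpleNamespace` stand-in for a `requests.Response` are all assumptions made so the example runs on its own.

```python
import copy
from types import SimpleNamespace


def snapshot_response_fields(resp, link_rel="original"):
    """Copy the values needed for a WARC record out of a response-like object.

    Hypothetical helper: it mirrors the three edited lines, but the name
    and the link_rel parameter are not taken from hypercane.
    """
    # Materialize the header items as a list, then deep copy the list so
    # it no longer shares state with the response's raw headers.
    headers_list = copy.deepcopy(list(resp.raw.headers.items()))
    # str() makes sure plain string values are held, not response-bound objects.
    warc_target_uri = str(resp.links[link_rel]["url"])
    mdt = str(resp.headers["memento-datetime"])
    return headers_list, warc_target_uri, mdt


# Stand-in for a requests.Response (assumption, for demonstration only).
fake_resp = SimpleNamespace(
    raw=SimpleNamespace(headers={"Content-Type": "text/html"}),
    links={"original": {"url": "http://example.com/"}},
    headers={"memento-datetime": "Thu, 01 Jan 2015 00:00:00 GMT"},
)

headers_list, warc_target_uri, mdt = snapshot_response_fields(fake_resp)

# Mutating the response afterwards does not affect the snapshot.
fake_resp.raw.headers["Content-Type"] = "changed"
```

The snapshot only shows that the copied values are decoupled from later changes to the response object. Whether that is enough to keep hypercane on the raw stream in every case is a separate question.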
Thank you for this. I'll fix it soon.
Update: unfortunately, hypercane still switches streams for some WARCs even with these changes. I will let you know if I find more code edits that fix this issue.
Add after line 60 in synthesize/warcs.py:
if 'rel' in link.attrs and 'stylesheet' in link['rel']:
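To show what this guard does, here is a small self-contained sketch that only keeps `<link>` tags that have a `rel` attribute containing `stylesheet`. It uses the standard library's `html.parser` rather than the parser used in warcs.py, so that it runs without extra dependencies. The class name `StylesheetLinkCollector` and the sample HTML are assumptions for illustration.

```python
from html.parser import HTMLParser


class StylesheetLinkCollector(HTMLParser):
    """Collect href values of <link> tags whose rel includes 'stylesheet'.

    Hypothetical stand-in for the check in warcs.py, written against the
    standard library instead of the parser hypercane uses.
    """

    def __init__(self):
        super().__init__()
        self.stylesheet_hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # A <link> tag may have no rel attribute at all; treat that as empty.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "stylesheet" in rel_tokens and attr_map.get("href"):
            self.stylesheet_hrefs.append(attr_map["href"])


sample_html = """
<html><head>
<link href="/no-rel.css">
<link rel="icon" href="/favicon.ico">
<link rel="stylesheet" href="/style.css">
</head></html>
"""

collector = StylesheetLinkCollector()
collector.feed(sample_html)
stylesheet_hrefs = collector.stylesheet_hrefs
```

Checking for the presence of `rel` first means that tags without the attribute are skipped instead of causing a lookup error, and icons or other link types are not treated as stylesheets.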