oduwsdl / hypercane

A toolkit for developing algorithms that sample mementos from a web archive collection.

Home Page: https://oduwsdl.github.io/hypercane

Add functionality to synthesize warcs from archive.today

lesleyodu opened this issue

Feeding the mementos from a TimeMap generated by MemGator to the "synthesize warcs" action in Hypercane raises exceptions for mementos from archive.today. archive.today appears to present a captcha.

This turned out to be far more complicated than we expected. @lesleyodu -- could you summarize our email conversation as a comment here so we have it available when I can work on Hypercane again? Thanks.

  • The captcha does not appear when Hypercane runs from a research network that archive.today has approved and whitelisted
  • The archive.today zip files containing the original resources are no longer available
  • HTML: we need to recreate the head (the title is OK) and undo the rewriting in the body to mimic the raw memento capability of other archives
  • Replace the otmt raw memento calculations with MementoEmbed
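
For the "undo rewriting in body" item, here is a rough Python sketch of what that step might look like. It is not Hypercane code. It assumes that rewritten links have the form `<archive host>/o/<id>/<original URL>`; that pattern, the host list, and the names `REWRITTEN_LINK` and `undo_link_rewriting` are illustrative assumptions that would need to be checked against real archive.today mementos.

```python
import re

# ASSUMPTION: rewritten links look like "<archive host>/o/<id>/<original URL>".
# The host list and path layout are illustrative, not confirmed behavior.
REWRITTEN_LINK = re.compile(
    r'https?://archive\.(?:today|ph|is)/o/[A-Za-z0-9]+/(https?://[^"\'\s>]+)'
)


def undo_link_rewriting(html: str) -> str:
    """Replace each rewritten link with the original URL it wraps."""
    return REWRITTEN_LINK.sub(lambda match: match.group(1), html)


if __name__ == "__main__":
    sample = '<a href="https://archive.ph/o/abc12/https://example.com/page">x</a>'
    print(undo_link_rewriting(sample))
```

A regular expression keeps the sketch short, but a real implementation would probably parse the HTML and would also have to rebuild the head, which this does not attempt.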