How to extract files in html

Question

wankio opened this issue 2 years ago

With MHT I can still extract the files inside it, but when I try to extract an HTML file created by this app, I can't.

Sunshine · Answer 1 · Thu Sep 08 2022 20:01:23 GMT+0800 (China Standard Time)

Could you elaborate a little more? What exactly do you mean by "extract"? Are you talking about going into the source of the saved HTML page and copying strings from there? The embedded assets are base64 data URLs, and some editors highlight them so they are clickable. Also, if you open the file in a browser, you should be able to use "Save image as..." and access embedded assets via the inspection tool.
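Since the embedded assets are data URLs inside the HTML source, they can also be pulled out with a short script. The following is only a minimal sketch, not part of monolith: the names `extract_data_urls` and `save_assets` are hypothetical, and the regular expression assumes each data URL ends at a quote, whitespace, or closing parenthesis, which may not hold for every saved page.

```python
import base64
import binascii
import mimetypes
import re
from pathlib import Path
from urllib.parse import unquote_to_bytes

# Assumption: a data URL ends at a quote, whitespace, or ")" (CSS url(...)).
DATA_URL_RE = re.compile(
    r"data:(?P<mime>[\w.+-]+/[\w.+-]+)"      # media type, e.g. image/png
    r"(?P<params>(?:;[\w-]+=[\w.+-]+)*)"     # optional parameters, e.g. ;charset=utf-8
    r"(?P<b64>;base64)?,"                    # optional base64 marker
    r"(?P<data>[^\"'\s)]*)"                  # payload
)


def extract_data_urls(html: str) -> list[tuple[str, bytes]]:
    """Return (mime type, decoded bytes) for every data URL found in html."""
    assets = []
    for match in DATA_URL_RE.finditer(html):
        payload = match.group("data")
        if match.group("b64"):
            try:
                raw = base64.b64decode(payload, validate=True)
            except (binascii.Error, ValueError):
                continue  # skip payloads that are not valid base64
        else:
            raw = unquote_to_bytes(payload)  # percent-encoded payload
        assets.append((match.group("mime"), raw))
    return assets


def save_assets(html_path: str, out_dir: str) -> list[Path]:
    """Write every embedded asset of a saved page into out_dir."""
    html = Path(html_path).read_text(encoding="utf-8", errors="replace")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, (mime, raw) in enumerate(extract_data_urls(html)):
        ext = mimetypes.guess_extension(mime) or ".bin"
        target = out / f"asset_{index:04d}{ext}"
        target.write_bytes(raw)
        written.append(target)
    return written
```

Calling `save_assets("page.html", "page_assets")` (both paths are placeholders) would write one numbered file per embedded asset. The original file names are not recoverable this way, because a data URL only carries the media type and the content.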

wankio · Answer 2 · Sat Oct 01 2022 10:52:55 GMT+0800 (China Standard Time)

Before, I was using WebScrapBook and some MHT extension that let me download a page and save it. I could easily right-click the final file and use 7-Zip or WinRAR to extract the contents of the saved file. That way I can check which files are missing.

With monolith on the CLI, archiving pages feels much faster, but it lacks:

the ability to extract files
automatically setting the download directory from the URL's domain
e.g. https://github.com/Y2Z/monolith/issues/311 > saved in a github.com dir, or as github.com/Y2Z/monolith/issues/311/pagetitle.extension (filename is the page title)
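The directory layout requested here can be approximated outside of monolith with a small wrapper script. This is only a sketch under stated assumptions: `archive_path` is a hypothetical helper, the page title is assumed to be known already, and the set of characters replaced in file names is an arbitrary choice.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Assumption: these characters (and whitespace) are unsafe in file names.
UNSAFE_CHARS = re.compile(r'[\\/:*?"<>|\s]+')


def _safe(name: str, fallback: str) -> str:
    """Replace unsafe characters with underscores; use fallback if empty."""
    cleaned = UNSAFE_CHARS.sub("_", name).strip("_")
    return cleaned or fallback


def archive_path(url: str, title: str, ext: str = ".html") -> PurePosixPath:
    """Build <domain>/<url path>/<page title><ext> for a page URL."""
    parts = urlsplit(url)
    host = parts.hostname or "unknown-host"
    # Drop empty segments and "." / ".." so the path stays under the host dir.
    segments = [
        _safe(segment, "_")
        for segment in parts.path.split("/")
        if segment and segment not in (".", "..")
    ]
    return PurePosixPath(host, *segments, _safe(title, "index") + ext)


print(archive_path("https://github.com/Y2Z/monolith/issues/311", "pagetitle"))
# prints: github.com/Y2Z/monolith/issues/311/pagetitle.html
```

After creating the parent directories, a wrapper could pass the resulting path to monolith's `-o` output option.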