JimmXinu / FanFicFare

FanFicFare is a tool for making eBooks from stories on fanfiction and other web sites.

Downloads from scifistories.com and other WLPC sites fail after title page

jabofh opened this issue

The same credentials work flawlessly in my browser. On Friday (I think) downloads still worked without any problem; now a download starts and then fails before it parses any chapters.

I've also tried other WLPC sites and they fail at the same stage.

As for SOL specifically, updates fail, too. The log window says:

Status | Title | Author | Comment | URL
-- | -- | -- | -- | --
Skipped | AMA: The Boyfriend | BreaktheBar | Existing epub contains 147 chapters, web site only has 1. Use Overwrite or force_update_epub_always to force update. | https://storiesonline.net/s/31154/ama-the-boyfriend/i

Relevant parts from the debug log:

FF: DEBUG: 2023-11-19 14:10:58,777: story.py(737): use_flaresolverr_proxy:
FFF: DEBUG: 2023-11-19 14:11:00,874: calibre_plugins.fanficfare_plugin.fff_plugin(1125): FanFicFare v4.29.0
FFF: INFO: 2023-11-19 14:11:01,033: calibre_plugins.fanficfare_plugin.prefs(216): Using default settings
FFF: DEBUG: 2023-11-19 14:11:01,058: story.py(737): use_flaresolverr_proxy:
FFF: DEBUG: 2023-11-19 14:11:01,069: configurable.py(1078): use_browser_cache:
FFF: DEBUG: 2023-11-19 14:11:01,070: configurable.py(1098): use_basic_cache:true
FFF: DEBUG: 2023-11-19 14:11:01,072: adapter_storiesonlinenet.py(155): URL: https://storiesonline.net/s/31154/ama-the-boyfriend
FFF: DEBUG: 2023-11-19 14:11:01,072: cache_basic.py(116): 
========== MISS (GET) BasicCache
https://storiesonline.net/s/31154/ama-the-boyfriend
FFF: DEBUG: 2023-11-19 14:11:01,072: fetcher_requests.py(114): 
---------- REQ (GET) RequestsFetcher
https://storiesonline.net/s/31154/ama-the-boyfriend
FFF: DEBUG: 2023-11-19 14:11:01,929: fetcher_requests.py(127): response code:200
FFF: DEBUG: 2023-11-19 14:11:01,929: decorators.py(112): fromcache:False
FFF: DEBUG: 2023-11-19 14:11:01,930: decorators.py(123): random sleep(0.50-1.50):1.00
FFF: DEBUG: 2023-11-19 14:11:02,930: requestable.py(55): Encoding:utf8
FFF: DEBUG: 2023-11-19 14:11:02,931: adapter_storiesonlinenet.py(116): Will now login to URL (https://storiesonline.net/sol-secure/login.php) as (***@***.***)
FFF: DEBUG: 2023-11-19 14:11:02,931: cache_basic.py(116): 
========== MISS (GET) BasicCache
https://storiesonline.net/sol-secure/login.php
FFF: DEBUG: 2023-11-19 14:11:02,931: fetcher_requests.py(114): 
---------- REQ (GET) RequestsFetcher
https://storiesonline.net/sol-secure/login.php
FFF: DEBUG: 2023-11-19 14:11:03,809: fetcher_requests.py(127): response code:200
FFF: DEBUG: 2023-11-19 14:11:03,810: decorators.py(112): fromcache:False
FFF: DEBUG: 2023-11-19 14:11:03,810: decorators.py(123): random sleep(0.50-1.50):0.86
FFF: DEBUG: 2023-11-19 14:11:04,673: requestable.py(55): Encoding:utf8
FFF: DEBUG: 2023-11-19 14:11:04,690: cache_basic.py(116): 
========== MISS (POST) BasicCache
https://login.wlpc.com/index.php
FFF: DEBUG: 2023-11-19 14:11:04,690: fetcher_requests.py(114): 
---------- REQ (POST) RequestsFetcher
https://login.wlpc.com/index.php
FFF: DEBUG: 2023-11-19 14:11:05,008: fetcher_requests.py(127): response code:200
FFF: DEBUG: 2023-11-19 14:11:05,008: decorators.py(112): fromcache:False
FFF: DEBUG: 2023-11-19 14:11:05,008: decorators.py(123): random sleep(0.50-1.50):1.12
FFF: DEBUG: 2023-11-19 14:11:06,128: requestable.py(55): Encoding:utf8
FFF: DEBUG: 2023-11-19 14:11:06,129: cache_basic.py(116): 
========== MISS (GET) BasicCache
https://storiesonline.net/s/31154/ama-the-boyfriend
FFF: DEBUG: 2023-11-19 14:11:06,130: fetcher_requests.py(114): 
---------- REQ (GET) RequestsFetcher
https://storiesonline.net/s/31154/ama-the-boyfriend
FFF: DEBUG: 2023-11-19 14:11:10,546: fetcher_requests.py(127): response code:200
FFF: DEBUG: 2023-11-19 14:11:10,547: decorators.py(112): fromcache:False
FFF: DEBUG: 2023-11-19 14:11:10,547: decorators.py(123): random sleep(0.50-1.50):0.77
FFF: DEBUG: 2023-11-19 14:11:11,314: requestable.py(55): Encoding:utf8
FFF: INFO: 2023-11-19 14:11:11,331: adapter_storiesonlinenet.py(189): use url: https://storiesonline.net/s/31154/ama-the-boyfriend/i?ind=1
FFF: DEBUG: 2023-11-19 14:11:11,331: cache_basic.py(116): 
========== MISS (GET) BasicCache
https://storiesonline.net/s/31154/ama-the-boyfriend/i?ind=1
FFF: DEBUG: 2023-11-19 14:11:11,332: fetcher_requests.py(114): 
---------- REQ (GET) RequestsFetcher
https://storiesonline.net/s/31154/ama-the-boyfriend/i?ind=1
FFF: DEBUG: 2023-11-19 14:11:15,639: fetcher_requests.py(127): response code:200
FFF: DEBUG: 2023-11-19 14:11:15,640: decorators.py(112): fromcache:False
FFF: DEBUG: 2023-11-19 14:11:15,641: decorators.py(123): random sleep(0.50-1.50):0.79
FFF: DEBUG: 2023-11-19 14:11:16,429: requestable.py(55): Encoding:utf8
FFF: DEBUG: 2023-11-19 14:11:16,528: cache_basic.py(116): 
========== MISS (GET) BasicCache
https://storiesonline.net/a/breakthebar/1
FFF: DEBUG: 2023-11-19 14:11:16,528: fetcher_requests.py(114): 
---------- REQ (GET) RequestsFetcher
https://storiesonline.net/a/breakthebar/1
FFF: DEBUG: 2023-11-19 14:11:16,663: fetcher_requests.py(127): response code:200
FFF: DEBUG: 2023-11-19 14:11:16,663: decorators.py(112): fromcache:False
FFF: DEBUG: 2023-11-19 14:11:16,663: decorators.py(123): random sleep(0.50-1.50):0.87
FFF: DEBUG: 2023-11-19 14:11:17,538: requestable.py(55): Encoding:utf8
FFF: DEBUG: 2023-11-19 14:11:17,587: adapter_storiesonlinenet.py(318): Found story row on page 1
FFF: DEBUG: 2023-11-19 14:11:17,657: calibre_plugins.fanficfare_plugin.fff_plugin(1435): update existing id:327
bs4\builder\__init__.py:545: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument `features="xml"` into the BeautifulSoup constructor.

What's Changed

These sites have changed their story chapter URL format.

Before: https://finestories.com/s/1111:2222/chapter-1-story-title
After: https://finestories.com/s/1111/story-title/1

It looks like they've changed it for all stories, not just new ones. The new form drops a (presumably unique) chapter ID number (2222 in the example above) and the chapter title in favor of the story title and chapter number.
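To make the difference concrete, here is a minimal sketch that tells the two URL shapes apart. It assumes the 'before' and 'after' examples above are representative of every story; the pattern names and `classify_chapter_url` are hypothetical and are not taken from FanFicFare's code.

```python
import re

# Old form (assumed from the example): /s/<story id>:<chapter id>/<slug>
OLD_CHAPTER_RE = re.compile(
    r"^https?://[^/]+/s/(?P<story>\d+):(?P<chapter_id>\d+)/(?P<slug>[^/?#]+)$"
)
# New form (assumed from the example): /s/<story id>/<story slug>/<chapter number>
NEW_CHAPTER_RE = re.compile(
    r"^https?://[^/]+/s/(?P<story>\d+)/(?P<slug>[^/?#]+)/(?P<number>\d+)$"
)


def classify_chapter_url(url):
    """Return 'old', 'new', or 'unknown' for a chapter URL."""
    if OLD_CHAPTER_RE.match(url):
        return "old"
    if NEW_CHAPTER_RE.match(url):
        return "new"
    return "unknown"
```

Story and index URLs such as https://storiesonline.net/s/31154/ama-the-boyfriend/i match neither pattern, so they come back as 'unknown' rather than being mistaken for chapters.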

Fixing It

This has a straightforward fix: just look for the new chapter URLs. It's a one-line change.

The Complication - Updating Existing EPUBs

But that fix alone invalidates existing chapters. When you update an existing EPUB, FFF will download all chapters again because the chapter URLs in existing EPUB chapters won't match what the site now shows.

FFF has a mechanism (adapters' normalize_chapterurl() method) to avoid that, but I don't have enough examples of 'before' stories from these sites to implement it. Is chapter-1 in the 'before' URL the chapter title as chosen by the author, or simply chapter-#?
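The question matters because a normalizer can only rewrite an old URL if the chapter number can be read out of it. The sketch below is a standalone, hypothetical function (not the real adapter method, and not its signature) that assumes the old slug is always `chapter-<number>-<story slug>`, which is exactly the unconfirmed part. Any URL that doesn't fit that assumption is returned unchanged.

```python
import re

# ASSUMPTION: old chapter slugs look like "chapter-<number>-<story slug>".
# This is the open question above, not a confirmed fact about the sites.
OLD_URL_RE = re.compile(
    r"^(?P<base>https?://[^/]+/s/)(?P<story>\d+):(?P<chapter_id>\d+)/"
    r"chapter-(?P<number>\d+)-(?P<story_slug>[^/?#]+)$"
)


def normalize_old_chapter_url(url):
    """Rewrite an old-style chapter URL to the new form when possible.

    Returns the URL unchanged if the chapter number can't be recovered.
    """
    m = OLD_URL_RE.match(url)
    if m is None:
        return url
    # Extra named groups (chapter_id) are simply ignored by format().
    return "{base}{story}/{story_slug}/{number}".format(**m.groupdict())
```

Under that assumption, https://finestories.com/s/1111:2222/chapter-1-story-title becomes https://finestories.com/s/1111/story-title/1. If the slug instead comes from an author-chosen title, there is no number to extract and the URL is left alone, so that chapter would still be treated as changed.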

How to Help

Can either of you look inside some pre-existing EPUBs for these sites to help answer that? Specifically, stories with chapter names that aren't just 'Chapter 1', 'Chapter 2'.

  1. Open EPUB in Edit book (assuming Calibre)
  2. In the EPUB, open file0001.xhtml (or other chapter file)
  3. Look for HTML near the top that looks like:
<meta name="chapterurl" content="XXXXXXX" />
<meta name="chapterorigtitle" content="1. Chapter 1" />
<meta name="chaptertoctitle" content="1. Chapter 1" />
<meta name="chaptertitle" content="1. Chapter 1" />
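If opening chapter files by hand is tedious, the same values can be pulled out with a short script, since an EPUB is a zip archive. This is a rough sketch, assuming the meta tags are written exactly as shown above (`name` before `content`, double quotes) and that chapter files end in `.xhtml`; `chapter_metadata` is a hypothetical helper, not part of FanFicFare.

```python
import html
import re
import zipfile

# Matches the two meta tags of interest, in the attribute order shown above.
META_RE = re.compile(
    r'<meta\s+name="(chapterurl|chapterorigtitle)"\s+content="([^"]*)"\s*/?>'
)


def chapter_metadata(epub):
    """Return (file name, chapterurl, chapterorigtitle) for each chapter file.

    `epub` may be a path or a binary file-like object.
    """
    rows = []
    with zipfile.ZipFile(epub) as zf:
        for name in sorted(zf.namelist()):
            if not name.endswith(".xhtml"):
                continue
            text = zf.read(name).decode("utf-8", errors="replace")
            found = dict(META_RE.findall(text))
            # Files without a chapterurl (e.g. a title page) are skipped.
            if "chapterurl" in found:
                rows.append((
                    name,
                    html.unescape(found["chapterurl"]),
                    html.unescape(found.get("chapterorigtitle", "")),
                ))
    return rows
```

Calling `chapter_metadata("story.epub")` and printing each row gives the chapter URL next to its original title for every chapter in the book.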

What I want to know is the value of chapterurl when chapterorigtitle isn't just 'Chapter 1': for example, when it's 'Prologue', 'Epilogue', 'Chapter XIV', or 'Chapter 1: It all goes wrong'. Ideally, I'd like to see several examples.

Thank you! That's all I need to see.

The first case, 'Prologue', proves we can't normalize the pre-existing chapter URLs to match the new chapter URL form.

Warning

Don't click the chapter links above. Not only do they not work, they also cause the site to block that story and its chapters from loading in the same browser session.

Test versions posted in the usual places.