JimmXinu / FanFicFare

FanFicFare is a tool for making eBooks from stories on fanfiction and other web sites.


RoyalRoad Inserting Invisitext Into Chapters

Chirishman opened this issue · comments

Some time in the last week or so Royal Road started inserting "this work has been stolen" paragraphs with what looks like some kind of randomly generated alphanumeric class tag to hide them with CSS.

I've seen authors manually slip these into their works before when they're having issues, but as far as I'm aware this is the first time I'm seeing it inserted by the site itself. I twigged that this was the site and not the authors when I noticed that the message was word-for-word identical in new chapters of stories by different authors. Interestingly, the location where the watermark message appears in a given chapter seems to be deterministic even if the class tag value is randomized.

It's irritating when listening to the stories via TTS especially since it occurs at least once in every chapter but it's not unusable if nothing is changed. I'm not sure if you'll want to write a defeat for this or just mark it as known but either way here's what I've found:

As viewed in a normal browser

image

As seen when inspecting the source of an individual chapter in a browser

image

image

and the source of the same snip of the resulting ebook

image

Story URL so I can see it?


Those screenshots are from this story but like I said it appears to be sitewide. It is present in every new chapter that I've downloaded in the last 24 or 48 hours including all of these:

https://www.royalroad.com/fiction/77821
https://www.royalroad.com/fiction/63759
https://www.royalroad.com/fiction/77820
https://www.royalroad.com/fiction/47826
https://www.royalroad.com/fiction/63762

All by different authors, all with identical injected text with a randomized class name in the <p> tag that rolls over every time the page is loaded fresh.

I'm reluctant to implement a solution for this because I agree that authors shouldn't have their works published by others. And as such things go, this is very innocuous.

Technologically, finding all paragraphs containing a fixed string and removing them is easy. Until that string or tag is changed, that is. Then it could become an escalating race of change and counter change.

FYI, in my TTS reader, that paragraph is read aloud even with display:none on the tag.
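
For illustration only, the fixed-string removal described above might look like the sketch below. The marker text and the `drop_paragraphs_containing` helper are assumptions made for this example, not FanFicFare code, and the approach stops working as soon as the site changes the wording.

```python
import re

# Hypothetical marker text; the real wording on the site may differ and can change.
MARKER = "this work has been stolen"

# Match a whole <p>...</p> element non-greedily, across newlines.
P_ELEMENT = re.compile(r"<p\b[^>]*>.*?</p>", re.IGNORECASE | re.DOTALL)


def drop_paragraphs_containing(html: str, marker: str = MARKER) -> str:
    """Remove every <p> element whose markup contains `marker` (case-insensitive)."""
    def keep_or_drop(match):
        paragraph = match.group(0)
        return "" if marker.lower() in paragraph.lower() else paragraph

    return P_ELEMENT.sub(keep_or_drop, html)


sample = (
    '<p>Real story text.</p>'
    '<p class="a1b2">Notice: this work has been stolen, report it.</p>'
    '<p>More story.</p>'
)
cleaned = drop_paragraphs_containing(sample)
```

Because the match is on the message text alone, the randomized class name does not matter here, but any rewording of the message does.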

I agree on all points. I feel ambivalent about a solution here too, given the relative innocuousness.

On a technical level I don't expect it to actually achieve much in the way of stopping the content theft in the long term, especially since the actual content of the message isn't randomized in any way so it's super easy for a bad actor to find and replace.

Mostly what it's going to do for my personal reading workflow is cause me to start tracking the copies of stories on SpaceBattles and Scribblehub instead of RR for the ones that are crossposted.

I mostly wanted to bring it to your attention so that there would be a record of it here so it's clear it's a [Known Issue/Won'tFix] when other people notice it. It's also worth being aware of as a possible start of a trend of RR changes that might follow, and if they increased the frequency of the insertion to more times per chapter or something the decision might need to be revisited.

And yes, TTS still reading it even when hidden by CSS has been my experience too.

Fair enough.

I'm going to leave this issue open, for the increased visibility.

Just an FYI, there are multiple strings that they are adding with it - it's not the exact same text every time. I am weird about this as well - it is definitely annoying to me, but it's probably healthier for the ecosystem.

Agreed with @MrTyton - It's most likely healthier for those authors using the platform, but it's highly irritating.
And they enter the scenario where thieves will still find a way around it, so it just begins to harm "regular" users.

I've noticed that besides increasing the variability of the messages, they are also now putting a class onto every single <p> tag in the code, to make it more difficult. Currently it can be curated out, as only 1 of those <p> tags has a class within the style tag that says "display:none".
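
A rough sketch of that curation step, assuming the page carries its rules in a `<style>` block as `.classname { ... }` and each chapter paragraph has a plain `class="..."` attribute. The function names and sample markup are made up for this example and are not taken from FanFicFare or from any PR.

```python
import re


def hidden_classes(html: str) -> set:
    """Return class names whose rule inside a <style> block sets display:none."""
    hidden = set()
    for style_body in re.findall(r"<style\b[^>]*>(.*?)</style>", html, re.I | re.S):
        # Assumed rule shape: ".name { declaration; declaration }".
        for name, body in re.findall(r"\.([\w-]+)\s*\{([^}]*)\}", style_body):
            if re.search(r"display\s*:\s*none", body, re.I):
                hidden.add(name)
    return hidden


def strip_hidden_paragraphs(html: str) -> str:
    """Remove <p> elements whose class attribute names one of the hidden classes."""
    hidden = hidden_classes(html)

    def keep_or_drop(match):
        classes = match.group(1).split()
        return "" if hidden.intersection(classes) else match.group(0)

    pattern = r'<p\b[^>]*\bclass="([^"]*)"[^>]*>.*?</p>'
    return re.sub(pattern, keep_or_drop, html, flags=re.I | re.S)


# Invented sample page: every <p> has a class, only one class is hidden.
page = (
    '<style>.cnAA{margin:0}.cnBB{display: none;}</style>'
    '<p class="cnAA">Story text.</p>'
    '<p class="cnBB">Injected notice.</p>'
    '<p class="cnAA">More story.</p>'
)
result = strip_hidden_paragraphs(page)
```

Keying on the style rule rather than the message text means the wording can vary freely, but it depends on the hiding still being done through a class rule.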

All this does is irritate the people using TTS and other accessibility functions.
I say that if this does turn into a game of cat and mouse, we should go as often as they do. I very much doubt they'll go nuclear, and this causes a significantly worse experience, especially for the disabled.
As of right now you have a solution to the current implementation. I say merge it (or remake it and merge, because my code is pretty terrible), and if they change how it works, just keep patching fixes.

They are now inserting text inline in existing paragraphs and without spaces after periods, making the accessibility/TTS impact worse 🙄

image

image

Well, I'm going through a few chapters of a story ATM and every time I come across one of these warnings I add them to my apps list of word replacement (making them effectively disappear from the body of the text for good). If anyone wants I can provide a list of the different variations I've come across so far. Maybe then we can just manually delete it from the epub on calibre... (though I don't personally know how to go about doing that.)

> They are now inserting text inline in existing paragraphs and without spaces after periods, making the accessibility/TTS impact worse 🙄

The pr in limbo still catches these btw.

> I'm reluctant to implement a solution for this because I agree that authors shouldn't have their works published by others. And as such things go, this is very innocuous.
>
> Technologically, finding all paragraphs containing a fixed string and removing them is easy. Until that string or tag is changed, that is. Then it could become an escalating race of change and counter change.
>
> FYI, in my TTS reader, that paragraph is read aloud even with display:none on the tag.

What about (at least) a middle road of keeping enough information around that it's possible for users who want to do post-processing to handle this? Right now, it seems that display: none is stripped from the css styling, and so there's no way to inspect the resulting epub and determine which blocks were hidden. Could the plugin at least keep all the display: none mappings in the css file it generates for the epub? (Or am I mistaken and it's just that "update epub" does not regenerate the css file, but if I redownloaded the book from scratch I'd get an updated css?)


No reason you couldn't do this https://github.com/JimmXinu/FanFicFare/compare/main...grenskul:FanFicFare:main?diff=split&w= but, instead of extracting the paragraph, set some special flag or CSS on it.
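
As a sketch of that "flag instead of extract" idea: once the hidden class names are known, each affected paragraph could be tagged with one stable class that a user stylesheet or a post-processing step can target. The `site-hidden` class name and the `flag_hidden_paragraphs` helper are hypothetical and do not correspond to any existing FanFicFare option.

```python
import re

# Hypothetical stable marker class; not an existing FanFicFare class or setting.
MARKER_CLASS = "site-hidden"


def flag_hidden_paragraphs(html: str, hidden: set) -> str:
    """Append MARKER_CLASS to <p> tags whose class list includes a name in `hidden`."""
    def retag(match):
        classes = match.group(2).split()
        if hidden.intersection(classes) and MARKER_CLASS not in classes:
            classes.append(MARKER_CLASS)
        return match.group(1) + " ".join(classes) + match.group(3)

    # Only the opening tag's class attribute is rewritten; the content is untouched.
    return re.sub(r'(<p\b[^>]*\bclass=")([^"]*)(")', retag, html, flags=re.I)


chapter = '<p class="cnAA">Story.</p><p class="cnBB">Notice.</p>'
flagged = flag_hidden_paragraphs(chapter, {"cnBB"})
```

A reader who wants the paragraphs gone could then delete everything carrying the marker class, while the default output keeps the text in place.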

As a general rule, FFF discards site styling in favor of consistent styling (as customized by the user) for all stories and sites.

Since this has been stable for a week, I've merged @grenskul's PR (#1031) and uploaded test versions in the usual places.

The fix isn't working for me - Tested with https://www.royalroad.com/fiction/61228/i-will-touch-the-skies-a-pokemon-fanfiction

python3.9 -m fanficfare.cli -u "royalroad.com/fiction/61228" --update-cover --non-interactive

python3.9 -m fanficfare.cli --version
Version: 4.30.6

Hmm. I don't read RoyalRoad, so I didn't test it myself.

Probably has to do with that CSS class also containing speak: never; now. @grenskul?

Changed the regex for the new style. I made it with possible minor changes in mind, so anything they do using this style method can be beaten within minutes by anyone who knows basic regex.
Made the PR
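
To illustrate why an extra declaration such as `speak: never;` can defeat a narrow pattern, here is a comparison of two hypothetical regexes. Neither is the actual pattern from the PRs, which is not reproduced in this thread, and the sample CSS strings are invented.

```python
import re

# Hypothetical strict pattern: matches only a rule whose sole declaration is display:none.
STRICT = re.compile(r"\.([\w-]+)\s*\{\s*display\s*:\s*none\s*;?\s*\}", re.I)

# Hypothetical tolerant pattern: display:none may sit anywhere among the declarations.
TOLERANT = re.compile(r"\.([\w-]+)\s*\{[^}]*?\bdisplay\s*:\s*none\b[^}]*\}", re.I)

old_css = ".cnAA{display:none;}"
new_css = ".cnAA{display:none;speak:never;}"

strict_old = STRICT.findall(old_css)      # rule with a single declaration
strict_new = STRICT.findall(new_css)      # extra declaration breaks the strict match
tolerant_old = TOLERANT.findall(old_css)
tolerant_new = TOLERANT.findall(new_css)
```

The tolerant form survives reordered or added declarations, but it still assumes the hiding is expressed as `display:none` inside a class rule.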

#1033 merged in, test versions posted again.

Test version seems to work now. Thanks.

So this appears to still happen.
I just downloaded this series:
https://www.royalroad.com/fiction/78230/magical-girl-platinum

and in the downloaded file it keeps repeating the text:
"This work was stolen from Magical Girl Platinum on Royal Road. Please message me there."

The text is not there in the normal browser page.

I think that was inserted by the author, as it uses a different way of doing it (style set inline on the tag vs. referenced in the head by class name). You can change the regex to match that, but I wouldn't go out of my way unless it shows up in other stories.

I tested this just now using their writer tool, and it's 100% possible the author did it.
All he has to do is paste "<p class="cnNmN2FkMTI3NTcxZjQ3NDM5NWZkN2FkOGU1ZjFmYjIz" style="display: none">This work was stolen from Magical Girl Platinum by Samantha Nelson on Royal Road. Please message me there.</p> "
into the source code (which rrl will let you edit directly), and it exhibits the behavior you just described.
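
For the inline variant shown above, a pattern keyed on the `style` attribute rather than on a class rule in the head would be needed. A minimal sketch, assuming double-quoted attributes; the pattern and helper name are illustrative and not part of FanFicFare.

```python
import re

# Matches a <p> whose inline style attribute contains display:none (assumes double quotes).
INLINE_HIDDEN_P = re.compile(
    r'<p\b[^>]*\bstyle="[^"]*\bdisplay\s*:\s*none\b[^"]*"[^>]*>.*?</p>',
    re.I | re.S,
)


def strip_inline_hidden(html: str) -> str:
    """Remove <p> elements hidden through an inline style attribute."""
    return INLINE_HIDDEN_P.sub("", html)


# Invented sample; the class value is a placeholder, not the one quoted above.
chapter = (
    '<p>Story.</p>'
    '<p class="cnXYZ" style="display: none">This work was stolen.</p>'
    '<p>More.</p>'
)
stripped = strip_inline_hidden(chapter)
```

Since an author can paste any markup into the source editor, this only covers the one form described here.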