RoyalRoad Inserting Invisitext Into Chapters

Question

Chirishman opened this issue 6 months ago

Some time in the last week or so Royal Road started inserting "this work has been stolen" paragraphs with what looks like some kind of randomly generated alphanumeric class tag to hide them with CSS.

I've seen authors manually slip these into their works before when they're having issues, but as far as I'm aware this is the first time I'm seeing it inserted by the site itself. I twigged that this was the site and not the authors when I noticed that the message was word-for-word identical in new chapters of stories by different authors. Interestingly, the location where the watermark message appears in a given chapter seems to be deterministic even though the class tag value is randomized.

It's irritating when listening to the stories via TTS especially since it occurs at least once in every chapter but it's not unusable if nothing is changed. I'm not sure if you'll want to write a defeat for this or just mark it as known but either way here's what I've found:

[Screenshot: the chapter as viewed in a normal browser]

[Screenshot: the source of an individual chapter as seen when inspecting it in a browser]

[Screenshot: the source of the same snip of the resulting ebook]

Jim Miller · Answer 1 · Tue Jan 16 2024 22:48:02 GMT+0800 (China Standard Time)

Story URL so I can see it?

Chirishman · Answer 2 · Tue Jan 16 2024 22:57:28 GMT+0800 (China Standard Time)

> Story URL so I can see it?

Those screenshots are from this story but like I said it appears to be sitewide. It is present in every new chapter that I've downloaded in the last 24 or 48 hours including all of these:

https://www.royalroad.com/fiction/77821
https://www.royalroad.com/fiction/63759
https://www.royalroad.com/fiction/77820
https://www.royalroad.com/fiction/47826
https://www.royalroad.com/fiction/63762

All by different authors, all with identical injected text with a randomized class name in the <p> tag that rolls over every time the page is loaded fresh.

Jim Miller · Answer 3 · Tue Jan 16 2024 23:33:05 GMT+0800 (China Standard Time)

I'm reluctant to implement a solution for this because I agree that authors shouldn't have their works published by others. And as such things go, this is very innocuous.

Technologically, finding all paragraphs containing a fixed string and removing them is easy. Until that string or tag is changed, that is. Then it could become an escalating race of change and counter change.
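For illustration, the fixed-string approach described above could look something like the following. This is only a minimal sketch using the Python standard library: the `MARKER` phrase and the function name are assumptions made for the example, not anything FanFicFare or Royal Road defines, and it breaks as soon as the wording changes, which is exactly the weakness noted above.

```python
import re

# Assumed marker phrase; the real injected wording may differ or change.
MARKER = "has been stolen"

# Matches one <p>...</p> element, non-greedily, across newlines.
# The \b keeps tags such as <pre> from being treated as paragraphs.
PARAGRAPH = re.compile(r"<p\b[^>]*>.*?</p>", re.IGNORECASE | re.DOTALL)


def drop_paragraphs_containing(html, marker=MARKER):
    """Remove every <p> element whose markup contains `marker` (case-insensitive)."""
    def keep_or_drop(match):
        paragraph = match.group(0)
        return "" if marker.lower() in paragraph.lower() else paragraph
    return PARAGRAPH.sub(keep_or_drop, html)


if __name__ == "__main__":
    sample = '<p>Real text.</p><p class="x1">This work has been stolen; report it.</p>'
    print(drop_paragraphs_containing(sample))
```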

FYI, in my TTS reader, that paragraph is read aloud even with display:none on the tag.

Chirishman · Answer 4 · Tue Jan 16 2024 23:53:27 GMT+0800 (China Standard Time)

I agree on all points. I feel ambivalent about a solution here too, given the relative innocuousness.

On a technical level I don't expect it to actually achieve much in the way of stopping the content theft in the long term, especially since the actual content of the message isn't randomized in any way so it's super easy for a bad actor to find and replace.

Mostly what it's going to do for my personal reading workflow is cause me to start tracking the copies of stories on SpaceBattles and Scribblehub instead of RR for the ones that are crossposted.

I mostly wanted to bring it to your attention so that there would be a record of it here so it's clear it's a [Known Issue/Won'tFix] when other people notice it. It's also worth being aware of as a possible start of a trend of RR changes that might follow, and if they increased the frequency of the insertion to more times per chapter or something the decision might need to be revisited.

And yes, TTS still reading it even when hidden by CSS has been my experience too.

Jim Miller · Answer 5 · Wed Jan 17 2024 00:10:17 GMT+0800 (China Standard Time)

Fair enough.

I'm going to leave this issue open, for the increased visibility.

MrTyton · Answer 6 · Fri Jan 19 2024 01:04:19 GMT+0800 (China Standard Time)

Just an FYI, there are multiple strings that they are adding with it - it's not the exact same text every time. I am weird about this as well - it is definitely annoying to me, but it's probably healthier for the ecosystem.

Harvey Barnes · Answer 7 · Sat Jan 20 2024 03:05:39 GMT+0800 (China Standard Time)

Agreed with @MrTyton - It's most likely healthier for those authors using the platform, but it's highly irritating.
And they enter the scenario where thieves will still find a way around it, so it just begins to harm "regular" users.

I've noticed that besides increasing the variability of the messages, they are also now putting a class onto every single <p> tag in the code, to make it more difficult. Currently it can be curated out, as only 1 of those <p> tags has a class within the style tag that says "display:none".
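A rough sketch of that filtering idea, assuming the page carries a `<style>` block with one simple rule per class and that every paragraph has a `class` attribute as described. The function names are made up for the example, and this is not the code from any PR in this thread.

```python
import re

# One simple CSS rule: "selector { declarations }" (no nested blocks).
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def hidden_classes(css):
    """Return the class names whose rule sets display to none."""
    hidden = set()
    for selector, body in RULE.findall(css):
        declarations = {}
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                declarations[prop.strip().lower()] = value.strip().lower()
        if declarations.get("display", "").startswith("none"):
            hidden.update(re.findall(r"\.([\w-]+)", selector))
    return hidden


def strip_hidden_paragraphs(html):
    """Drop <p> elements whose class is hidden by a <style> block in the same page."""
    css = "\n".join(re.findall(r"<style\b[^>]*>(.*?)</style>", html, re.I | re.S))
    hidden = hidden_classes(css)

    def keep_or_drop(match):
        classes = set(match.group(1).split())
        return "" if classes & hidden else match.group(0)

    return re.sub(r'<p\b[^>]*\bclass="([^"]*)"[^>]*>.*?</p>',
                  keep_or_drop, html, flags=re.I | re.S)
```

Because the declarations are parsed individually, extra properties in the same rule do not stop the class from being recognised as hidden.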

grenskul · Answer 8 · Sat Jan 20 2024 08:39:05 GMT+0800 (China Standard Time)

All this does is irritate the people using TTS and other accessibility functions.
I say that if this does turn into a game of cat and mouse, we should go as often as they do. I very much doubt they'll go nuclear, and this causes a significantly worse experience, especially for the disabled.
As of right now you have a solution to the current implementation. I say merge it (or remake it and merge, because my code is pretty terrible), and if they change how it works, just keep patching fixes.

Edocsil · Answer 9 · Sat Jan 20 2024 14:22:07 GMT+0800 (China Standard Time)

Or just ignore it? It's not the end of the world to have that one line in the chapters, you can easily clean them up yourself after downloading if you really care, there's no reason for FFF to fight RR over this. And if this is affecting people with a disability while reading on the website, then it should be reported to RR. That's it.

Chirishman · Answer 10 · Thu Jan 25 2024 01:12:39 GMT+0800 (China Standard Time)

They are now inserting text inline in existing paragraphs and without spaces after periods, making the accessibility/TTS impact worse 🙄

Amatsune · Answer 11 · Thu Jan 25 2024 01:30:01 GMT+0800 (China Standard Time)

Well, I'm going through a few chapters of a story ATM, and every time I come across one of these warnings I add it to my app's word replacement list (making it effectively disappear from the body of the text for good). If anyone wants, I can provide a list of the different variations I've come across so far. Maybe then we can just manually delete it from the epub in calibre... (though I don't personally know how to go about doing that.)

grenskul · Answer 12 · Thu Jan 25 2024 01:54:19 GMT+0800 (China Standard Time)

> They are now inserting text inline in existing paragraphs and without spaces after periods, making the accessibility/TTS impact worse 🙄

The PR in limbo still catches these, btw.

Jason Gross · Answer 13 · Thu Jan 25 2024 03:50:04 GMT+0800 (China Standard Time)

> I'm reluctant to implement a solution for this because I agree that authors shouldn't have their works published by others. And as such things go, this is very innocuous.
>
> Technologically, finding all paragraphs containing a fixed string and removing them is easy. Until that string or tag is changed, that is. Then it could become an escalating race of change and counter change.
>
> FYI, in my TTS reader, that paragraph is read aloud even with display:none on the tag.

What about (at least) a middle road of keeping enough information around that it's possible for users who want to do post-processing to handle this? Right now, it seems that display: none is stripped from the css styling, and so there's no way to inspect the resulting epub and determine which blocks were hidden. Could the plugin at least keep all the display: none mappings in the css file it generates for the epub? (Or am I mistaken and it's just that "update epub" does not regenerate the css file, but if I redownloaded the book from scratch I'd get an updated css?)

grenskul · Answer 14 · Thu Jan 25 2024 03:53:14 GMT+0800 (China Standard Time)

> > I'm reluctant to implement a solution for this because I agree that authors shouldn't have their works published by others. And as such things go, this is very innocuous.
> > Technologically, finding all paragraphs containing a fixed string and removing them is easy. Until that string or tag is changed, that is. Then it could become an escalating race of change and counter change.
> > FYI, in my TTS reader, that paragraph is read aloud even with display:none on the tag.
>
> What about (at least) a middle road of keeping enough information around that it's possible for users who want to do post-processing to handle this? Right now, it seems that display: none is stripped from the css styling, and so there's no way to inspect the resulting epub and determine which blocks were hidden. Could the plugin at least keep all the display: none mappings in the css file it generates for the epub? (Or am I mistaken and it's just that "update epub" does not regenerate the css file, but if I redownloaded the book from scratch I'd get an updated css?)

No reason you couldn't do this https://github.com/JimmXinu/FanFicFare/compare/main...grenskul:FanFicFare:main?diff=split&w= but, instead of extracting the element, set some special flag or CSS on it.
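To make the "flag it instead of extracting it" idea concrete, here is a hedged sketch. It assumes the set of hidden class names has already been worked out from the site's CSS, and `site-hidden` is a hypothetical marker class invented for this example, not an existing FanFicFare setting or output.

```python
import re

# Hypothetical stable marker that post-processing tools could look for.
MARKER_CLASS = "site-hidden"


def flag_hidden_paragraphs(html, hidden):
    """Append MARKER_CLASS to every <p> whose class list intersects `hidden` (a set)."""
    def retag(match):
        classes = match.group(2).split()
        if hidden.intersection(classes):
            classes.append(MARKER_CLASS)
        return match.group(1) + " ".join(classes) + match.group(3)

    # Only the class attribute of <p> tags is rewritten; content is left untouched.
    return re.sub(r'(<p\b[^>]*\bclass=")([^"]*)(")', retag, html, flags=re.I)
```

With something like this, the randomized class could be discarded along with the rest of the site styling while a user who wants to post-process the epub still has one fixed class to search for.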

Jim Miller · Answer 15 · Sat Jan 27 2024 02:39:05 GMT+0800 (China Standard Time)

As a general rule, FFF discards site styling in favor of consistent styling (as customized by the user) for all stories and sites.

Since this has been stable for a week, I've merged @grenskul's PR(#1031) and uploaded test versions in the usual places.

MrTyton · Answer 16 · Sat Jan 27 2024 03:09:26 GMT+0800 (China Standard Time)

The fix isn't working for me - Tested with https://www.royalroad.com/fiction/61228/i-will-touch-the-skies-a-pokemon-fanfiction

python3.9 -m fanficfare.cli -u "royalroad.com/fiction/61228" --update-cover --non-interactive

python3.9 -m fanficfare.cli --version
Version: 4.30.6

Jim Miller · Answer 17 · Sat Jan 27 2024 04:10:57 GMT+0800 (China Standard Time)

Hmm. I don't read RoyalRoad, so I didn't test it myself.

Probably has to do with that CSS class also containing speak: never; now. @grenskul?

grenskul · Answer 18 · Sat Jan 27 2024 04:49:19 GMT+0800 (China Standard Time)

Changed the regex for the new style. I made it with possible minor changes in mind, so anything they do using this style method can be beaten within minutes by anyone who knows basic regex.
Made the PR.
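As an illustration of why an extra declaration such as speak: never; can defeat a narrow pattern, compare the two regexes below. Both are hypothetical patterns written for this example and are not taken from either PR. The sample CSS strings are invented as well.

```python
import re

# Hypothetical strict pattern: display:none must be the rule's only declaration.
STRICT = re.compile(r"\.([\w-]+)\s*\{\s*display\s*:\s*none\s*;?\s*\}", re.I)

# Hypothetical tolerant pattern: other declarations may come before or after.
TOLERANT = re.compile(r"\.([\w-]+)\s*\{[^}]*?\bdisplay\s*:\s*none\b[^}]*\}", re.I)

old_css = ".ab12{display:none;}"              # invented sample, original style
new_css = ".cd34{display:none;speak:never;}"  # invented sample, with the extra property

for label, css in (("old", old_css), ("new", new_css)):
    print(label, "strict:", STRICT.findall(css), "tolerant:", TOLERANT.findall(css))
```

The strict pattern finds the class only in the first sample, while the tolerant one finds it in both, which is the kind of slack that lets a small site-side change be absorbed without another patch.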

Jim Miller · Answer 19 · Sat Jan 27 2024 06:00:43 GMT+0800 (China Standard Time)

#1033 merged in, test versions posted again.

MrTyton · Answer 20 · Sat Jan 27 2024 06:36:29 GMT+0800 (China Standard Time)

Test version seems to work now. Thanks.

TheOne320 · Answer 21 · Sat Feb 17 2024 06:43:01 GMT+0800 (China Standard Time)

So this appears to still happen.
I just downloaded this series:
https://www.royalroad.com/fiction/78230/magical-girl-platinum

and in the downloaded file it keeps repeating the text:
"This work was stolen from Magical Girl Platinum on Royal Road. Please message me there."

The text is not there in the normal browser page.

grenskul · Answer 22 · Sat Feb 17 2024 09:31:20 GMT+0800 (China Standard Time)

I think that was inserted by the author, as it uses a different way of doing it (a style set inline on the tag vs. one referenced in the head by class name). You can change the regex to match that, but I wouldn't go out of my way unless it shows up in other stories.

I tested this just now using their writer tool and it's 100% possible the author did it.
All he has to do is paste
<p class="cnNmN2FkMTI3NTcxZjQ3NDM5NWZkN2FkOGU1ZjFmYjIz" style="display: none">This work was stolen from Magical Girl Platinum by Samantha Nelson on Royal Road. Please message me there.</p>
into the source code (which rrl will let you edit directly) and it exhibits the behavior you just described.
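For anyone who wants to clean such author-inserted paragraphs out of a downloaded file themselves, a minimal sketch of matching the inline-style variant follows. It assumes double-quoted attributes and no nesting of the same tag inside the hidden element. The names are illustrative and this is not what FanFicFare does.

```python
import re

# An element whose own style attribute contains display:none, up to its closing tag.
# \1 refers back to the tag name so <p ...> is closed by </p>, <span ...> by </span>, etc.
INLINE_HIDDEN = re.compile(
    r'<(\w+)\b[^>]*\bstyle="[^"]*\bdisplay\s*:\s*none\b[^"]*"[^>]*>.*?</\1>',
    re.I | re.S,
)


def strip_inline_hidden(html):
    """Remove elements hidden with an inline display:none style."""
    return INLINE_HIDDEN.sub("", html)


if __name__ == "__main__":
    sample = '<p>Keep.</p><p class="x" style="display: none">This work was stolen.</p>'
    print(strip_inline_hidden(sample))
```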