dipu-bd / lightnovel-crawler

Generate and download e-books from online sources.

Home Page: https://pypi.org/project/lightnovel-crawler/

Novelbin

SirGryphin opened this issue

This happens on all novelbin sites: novelbin.me, novelbin.com, novelbin.net, and any of the multiple forks of this site. I've now noticed that when you actually open a chapter, the books are all hosted on novelbin.novel-online.org.

The Problem

It's the fancy text. If you don't know what that is, a quick Google search will show you what I mean.

So, if you are talking about fancy symbols such as '𝓐𝓑𝓒', those are Unicode characters. There are a large number of Unicode characters, and some of them (mathematical, Latin, etc.) are treated as fancy text symbols.
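To make the point about Unicode concrete, here is a minimal Python sketch. It assumes the fancy letters come from the Mathematical Alphanumeric Symbols block (U+1D400 to U+1D7FF), which is where characters like '𝓐' live; the watermark on the site may well use other ranges too. The helper name `is_fancy` is made up for illustration and is not part of lncrawler.

```python
import unicodedata

# Bounds of the Unicode "Mathematical Alphanumeric Symbols" block.
MATH_ALNUM_START = 0x1D400
MATH_ALNUM_END = 0x1D7FF


def is_fancy(ch: str) -> bool:
    """Return True if ch falls in the Mathematical Alphanumeric Symbols block.

    Hypothetical helper for illustration only; other "fancy" ranges
    (fullwidth forms, enclosed letters, ...) are not covered.
    """
    return MATH_ALNUM_START <= ord(ch) <= MATH_ALNUM_END


# Three bold-script capitals, written as escapes so the code points are explicit.
sample = "\U0001D4D0\U0001D4D1\U0001D4D2"

# Print the code point and official Unicode name of each character.
for ch in sample:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# NFKC compatibility normalization folds these styled letters to plain ones.
plain = unicodedata.normalize("NFKC", sample)
print(plain)
```

Note that normalizing only turns the watermark into ordinary letters; it does not remove it. Detecting the characters (as `is_fancy` does) is probably the more useful half for stripping them.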

The site has a feature where, if you scrape it using lncrawler or another tool, it randomly adds this text as a watermark all over the chapters. You can see it on the site if you're quick enough: when you click on a chapter, watch for a quick flash of text that disappears while the page is loading. I don't know how it works, but it's annoying.

Solution so far

I have to use find and replace in Sigil, first for " and ', then for a few other characters, and then run the regex [^\x00-\x7F]+ to find any others and remove them.
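The same manual workaround could be scripted roughly like this. It is only a sketch: it assumes the " and ' replacements mean mapping curly quotes to their ASCII equivalents, and the names `QUOTE_MAP`, `NON_ASCII`, and `clean_text` are invented here, not anything lncrawler provides.

```python
import re

# Assumed mapping: curly quotes to plain ASCII quotes, done before stripping
# so that dialogue punctuation is not lost.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / apostrophe
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})

# The regex from the post: one or more characters outside the ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def clean_text(text: str) -> str:
    """Replace curly quotes, then drop every remaining non-ASCII character."""
    text = text.translate(QUOTE_MAP)
    return NON_ASCII.sub("", text)


dirty = "\u201cHi\u201d \U0001D4D0\U0001D4D1 it\u2019s"
print(clean_text(dirty))
```

Be aware this is a blunt tool: it also deletes legitimate non-ASCII text such as accented letters or ellipsis characters, so it only suits novels that are otherwise plain English.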

Help

I wish there was a way to remove these before the EPUB is made, or a way to block the site from adding them. I'm not even sure this is something anyone can fix. It's just that novelbin always has the latest chapters, and I don't know if there is a better site with clean chapters.

Can’t we use a «cleaner» for this?

Similar to self.cleaner.bad_css.update([".thumbnail"])?

Or prepare a similar tool for this.

@CryZFix I think using cleaner should fix this.