packt-cli / Packt-Publishing-Free-Learning

Scripts that automatically claim and download free daily eBooks from https://www.packtpub.com/packt/offers/free-learning

Incorrectly names files with special characters

RiaanvWyk opened this issue

Hi there,
The folders and files created from book titles containing special characters such as # or + are not named correctly. For example, "Learn C# in 7 days" is saved as "Learn C in 7 days", and "The Modern C++ Challenge" is saved as "The Modern C Challenge".

We are using python-slugify to convert book titles to file names. According to StackOverflow, the characters you mention are valid in file names.

We could whitelist the # and + characters so that python-slugify doesn't remove them. Are there any other characters we should whitelist?
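For illustration, here is a minimal whitelist-based sketch using only the standard library. It is not python-slugify and not necessarily what the eventual fix does; the function name `safe_filename` and the exact whitelist (`#`, `+`, `.`, `-`, `–`, taken from the characters requested in this thread) are assumptions. Anything outside word characters, spaces and the whitelist is dropped.

```python
import re

# Hypothetical whitelist: the extra characters requested in this thread.
ALLOWED_EXTRA = "#+.-–"

# Matches any character that is NOT a word character, a space,
# or one of the whitelisted extras.
_DISALLOWED = re.compile(r"[^\w " + re.escape(ALLOWED_EXTRA) + r"]")


def safe_filename(title: str) -> str:
    """Turn a book title into a file name, keeping whitelisted characters."""
    cleaned = _DISALLOWED.sub("", title)
    # Collapse whitespace runs left behind by removed characters.
    return re.sub(r"\s+", " ", cleaned).strip()


print(safe_filename("Learn C# in 7 days"))  # Learn C# in 7 days
print(safe_filename("Python: A/B Testing"))  # Python AB Testing
```

Characters such as `:` and `/` are still removed because they are invalid in file names on Windows (and `/` on POSIX systems), while the whitelisted ones survive.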

Excellent, thanks for the prompt reply! I had also concluded that the python-slugify package was the cause and started playing around with its source. It would be great if you could whitelist # and +. It would also be nice if the period ".", the hyphen "-" and the en dash "–" were whitelisted (if that isn't asking too much), since some book titles contain them, e.g. "C# 7.1 and .NET Core 2.0 – Modern Cross-Platform Development".

Actually, I forked the repository and simply removed slugify so that the file name is kept exactly as returned by Packtpub. Since this is a personal preference, I don't think these changes are needed on the main branch, so I'm closing the ticket!

@RiaanvWyk Please, have a look at #180 - does it look OK?

@mjenczmyk Looks good, thanks!

I'll merge it and release a new patch version. Thanks for reporting this bug, I wasn't aware of it.

An amusing side effect of the change: when I did my monthly download of claimed books, it downloaded some older titles again, because their file names now include the additional characters. Hardly an issue, just an observation.

Makes sense; previously, file names could easily conflict with each other.
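To illustrate why the old names could conflict, here is a small sketch. `strip_specials` is a hypothetical approximation of the old behaviour (keep only ASCII letters, digits, spaces and hyphens), not python-slugify itself, and the cookbook titles are made-up examples. Once `#` and `+` are stripped, distinct titles map to the same file name.

```python
import re
from collections import defaultdict


def strip_specials(title: str) -> str:
    """Hypothetical approximation of the old behaviour: drop everything
    except ASCII letters, digits, spaces and hyphens."""
    cleaned = re.sub(r"[^A-Za-z0-9 -]", "", title)
    # Collapse whitespace runs left behind by removed characters.
    return re.sub(r"\s+", " ", cleaned).strip()


def find_collisions(titles):
    """Group titles by their stripped name; return groups with >1 title."""
    by_name = defaultdict(list)
    for title in titles:
        by_name[strip_specials(title)].append(title)
    return {name: group for name, group in by_name.items() if len(group) > 1}


titles = ["C Programming Cookbook", "C# Programming Cookbook",
          "C++ Programming Cookbook"]
print(find_collisions(titles))
```

All three hypothetical titles collapse to "C Programming Cookbook", which is the kind of clash that keeping the special characters avoids.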