Mati365 / upolujksiazke.pl

📖 Open-source platform that aggregates reviews, book ratings and brochures written in React + TypeScript + NestJS + Redis + ElasticSearch

Home Page: https://upolujksiazke.pl

upolujksiazke.pl

A real-world, open-source book review aggregator, something like Metacritic / Digg for books. It also lets you compare book prices across different shops.

Available websites

🇵🇱 Poland

  • Wykop.pl (#bookmeter tag)
  • Gildia.pl
  • Literatura Gildia
  • Granice.pl
  • Matras.pl
  • Bonito.pl
  • Skupszop.pl
  • Dadada.pl
  • Aros.pl
  • Publio.pl
  • Hrosskar.blogspot.com
  • krytycznymokiem.blogspot.com
  • Madbooks.pl
  • Gandalf.com.pl
  • ibuk.pl
  • Woblink.com
  • Taniaksiazka.pl
  • Bryk.pl
  • Streszczenia.pl
  • klp.pl
  • legimi.pl

To be added soon:

🇵🇱 Poland

  • polskina5.pl
  • Virtualo.pl
  • tantis.pl
  • Znak.com.pl
  • Swiatksiazki.pl
  • wbibliotece.pl
  • Wolnelektury.pl
  • LitRes.pl
  • audible.com
  • Chodnikliteracki.pl
  • czeskieklimaty.pl
  • paskarz.pl
  • selkar.pl
  • promocjeksiazkowe.pl (Blog Post)
  • eczytanie-eksiazki.blogspot.com (Blog Post)
  • Gandalf.com
  • Booklips.pl
  • Allegro.pl
  • Cyfroteka.pl
  • Amazon.com
  • Nieprzeczytane.pl
  • bookbook.pl
  • nakanapie.pl
  • opracowania.pl
  • ksiegarnia-armoryka.pl

๐ŸŒ World

  • Reddit
  • Goodreads

Goals

  • wykop #książki as blog
  • Book summary
  • Changes history
  • Mark as school reading
  • Book summary aggregation
  • Free readings download button
  • Discover all books by an author (link discovery queue: all books in a series, all books by an author)
  • Add article scraping (wykop, reddit, etc.)
  • Book series tree (à la tree box)
  • Allegro.pl / Amazon.pl / Skąpiec.pl price synchronization integration
  • Wikipedia style edit info proposals
  • Automatic daily summary tag posting (wykop.pl, #bookmeter tag)
  • Notifications about new reviews
  • Front page customization (pin sections)
  • Read list
  • Category books RSS
  • Price, activity diagram, notifications
  • Category filters
  • Trending books
  • Emoji reactions
  • Add comment after publishing entry on wykop.pl with links to shops, add comment to verify matched book
  • Add current user library link to wykop comment
  • Add website spiders (as separate module that appends content to redis)
  • Facebook bot that publishes posts with top offers
  • Product basket: compare prices of multiple books in a table and summarize the basket price per shop
  • RSS integration
  • E-Book readers price section and reviews
  • Section: Top Books/Reviews from Wykop.pl
  • Machine learning for book (review) picking
  • "Users who bought this book also bought" section
  • Automatic blog posts
  • SEO links on blog posts / reviews
  • Tinder alternative but for books
  • Wykop charts in comment
  • Add trending stats
  • Books summaries
  • Dynamically create e-leaflets from books grouped by shop
  • Add an "add store link" button to the availability table and, if the user adds a link, try to parse it
  • Video reviews
  • Users can create their own bookshelves
  • Allow users to add a book store by configuring JSON / XML (https://news.ycombinator.com/item?id=27739568)
  • Add e-leaflets
  • YouTube reviews
  • Add coupons
  • Book cons table
  • Lookup in Empik Go, Legimi

Development

Setup

cp .env.example .env # edit .env config
yarn install

yarn run migration:run
yarn run seed:run
gulp entity:reindex:all

[yarn run console]:
  await app.select(ScrapperModule).get('BookParentCategoryService').findAndAssignMissingParentCategories();
  await app.select(ScrapperModule).get('BookCategoryRankingService').refreshCategoryRanking();
  await app.select(ScrapperModule).get('BookStatsService').refreshAllBooksStats();
[/console]

yarn run develop
gulp scrapper:refresh

Remote connect

Forward local port 9201 to the remote Elasticsearch instance (port 9200):

ssh -g -L 9201:localhost:9200 -f -N deploy@upolujksiazke.pl

REPL

A NestJS application context is available on window under the name app. All entities are exported to the context.

yarn console

REPL Examples

โš ๏ธ Use services to remove records! (TypeORM async callbacks are buggy)

Remove book:

app.select(ScrapperModule).get('BookService').delete([13])

Reindex all records of a particular type (for example, after an index structure change):

app.select(ScrapperModule).get('EsBookIndex').reindexAllEntities();

Tasks

Sitemap:

gulp sitemap:refresh

Fetchers:

# Reindex all records
gulp entity:reindex:all

# Fetches single review by id
gulp scrapper:refresh:single --kind BOOK_REVIEW --remoteId 123 --website wykop.pl

# Fetches single book by url
gulp scrapper:refresh:single --remoteId szepty-spoza-nicosci-remigiusz-mroz,p697692.html --website www.publio.pl

# Fetches all reviews from scrapper
gulp scrapper:refresh:all --kind BOOK_REVIEW --website wykop.pl

# Refreshes only first remote reviews page using all scrappers
gulp scrapper:refresh:latest --kind BOOK_REVIEW
gulp scrapper:refresh:latest --kind BOOK_REVIEW --website wykop.pl

# Fetches all reviews pages from websites using all scrappers
gulp scrapper:refresh:all --kind BOOK_REVIEW

# Fetches missing favicons
gulp entity:website:fetch-missing-logos

# Refreshes promotion value in categories
gulp entity:category:refresh-ranking

# After adding a new scrapper, fetch availability for books
gulp scrapper:loader:fetch-availability --scrapperGroupId=26
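
When these tasks are triggered from scripts rather than typed by hand, it can help to assemble the argument list in one place. The sketch below is a hypothetical helper, not part of the repository: `buildRefreshArgs`, `RefreshTask` and `RefreshOptions` are illustrative names, and it only emits the `--kind`, `--remoteId` and `--website` flags that appear in the commands above.

```typescript
// Hypothetical helper (not part of the repository): builds the argument list
// for the gulp scrapper:refresh:* tasks using only the flags listed above.
type RefreshTask = "single" | "latest" | "all";

interface RefreshOptions {
  kind?: string;     // e.g. "BOOK_REVIEW"
  remoteId?: string; // remote id or URL slug of the record
  website?: string;  // e.g. "wykop.pl"
}

function buildRefreshArgs(task: RefreshTask, opts: RefreshOptions = {}): string[] {
  const args: string[] = [`scrapper:refresh:${task}`];
  if (opts.kind !== undefined) args.push("--kind", opts.kind);
  if (opts.remoteId !== undefined) args.push("--remoteId", opts.remoteId);
  if (opts.website !== undefined) args.push("--website", opts.website);
  return args;
}

// Mirrors the "Fetches single review by id" command above.
const singleReviewArgs = buildRefreshArgs("single", {
  kind: "BOOK_REVIEW",
  remoteId: "123",
  website: "wykop.pl",
});
console.log(["gulp", ...singleReviewArgs].join(" "));
```

Keeping the arguments as an array (instead of one concatenated string) means they can be handed to a process spawner without extra shell quoting, which matters for remote ids that contain commas or other special characters.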

Analyzers:

# Dangerous! Iterates over all records and reparses them;
# records that are no longer classified as reviews after analysis are removed
gulp scrapper:reanalyze:all --kind BOOK_REVIEW

# Parses again single record
gulp scrapper:reanalyze:single --remoteId szepty-spoza-nicosci-remigiusz-mroz,p697692.html --website www.publio.pl

Stats (console):

app.select(BookModule).get('BookStatsService').refreshBooksStats(R.pluck('id', books))

Spiders:

 gulp scrapper:spider:run

Scrappers:

Refresh all reviews from a given website:

 node_modules/.bin/gulp scrapper:refresh:all --kind BOOK_REVIEW --initialPage 1 --website wykop.pl
 node_modules/.bin/gulp scrapper:refresh:all --kind BOOK_REVIEW --website hrosskar.blogspot.com

Locks

If this lock file exists, Redis is not flushed during warmup (used for long-running tasks):

dist/locks/redis_warmup_flushdb.lock
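
The lock-file idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual implementation: it assumes the lock is a plain file whose mere existence blocks the flush, and `canFlushRedisOnWarmup` and `withWarmupLock` are hypothetical names. The demo uses a temporary directory rather than the real dist/locks/ path.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Assumption: the warmup code only checks whether the lock file exists.
function canFlushRedisOnWarmup(lockPath: string): boolean {
  return !fs.existsSync(lockPath);
}

// Hypothetical wrapper for a long task: create the lock, run the task,
// and always remove the lock afterwards, even if the task throws.
function withWarmupLock<T>(lockPath: string, task: () => T): T {
  fs.mkdirSync(path.dirname(lockPath), { recursive: true });
  fs.writeFileSync(lockPath, String(process.pid));
  try {
    return task();
  } finally {
    fs.rmSync(lockPath, { force: true });
  }
}

// Demo in a temporary directory instead of dist/locks/.
const demoLockDir = fs.mkdtempSync(path.join(os.tmpdir(), "locks-"));
const demoLockPath = path.join(demoLockDir, "redis_warmup_flushdb.lock");

const flushAllowedDuringTask = withWarmupLock(demoLockPath, () =>
  canFlushRedisOnWarmup(demoLockPath),
);
const flushAllowedAfterTask = canFlushRedisOnWarmup(demoLockPath);
console.log({ flushAllowedDuringTask, flushAllowedAfterTask });
```

Removing the lock in a `finally` block keeps a crashed task from leaving a stale lock behind that would silently disable the flush on later warmups.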

Importers

Flow

  1. Running scrapper tasks such as refreshLatest or refreshSingle fetches new records into the scrapper_metadata table. All of these functions live in ServiceModule -> ScrapperService. After a page of scrapped content has been fetched successfully, ScrapperService creates a new background job, stored in Redis, that runs the database and book matchers.

  2. Each job is executed later: MetadataDbLoaderService tries to match the book in the database and saves it.
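
The two steps above can be pictured with a small in-memory sketch. Everything here is a simplification under stated assumptions: arrays stand in for the scrapper_metadata table and the Redis-backed job queue, matching by normalized title is only a placeholder for the real matchers, creating a new book when nothing matches is an assumption, and all function and type names are hypothetical.

```typescript
// Simplified, in-memory sketch of the import flow; all names are hypothetical.
interface ScrappedRecord {
  website: string;
  remoteId: string;
  title: string;
}

interface Book {
  id: number;
  title: string;
}

const scrapperMetadata: ScrappedRecord[] = []; // stands in for the scrapper_metadata table
const jobQueue: Array<() => void> = [];        // stands in for background jobs stored in Redis
const books: Book[] = [{ id: 1, title: "Szepty spoza nicosci" }];
const matchedBookIds = new Map<string, number>(); // remoteId -> matched book id

// Step 1: store the fetched page and enqueue a job that will match it later.
function onPageFetched(page: ScrappedRecord[]): void {
  scrapperMetadata.push(...page);
  jobQueue.push(() => page.forEach(matchAndSaveBook));
}

// Step 2: the job tries to match each record to a book in the "database".
// Assumption: a new book is created when no existing book matches.
function matchAndSaveBook(record: ScrappedRecord): void {
  const normalized = record.title.trim().toLowerCase();
  let book = books.find((b) => b.title.toLowerCase() === normalized);
  if (!book) {
    book = { id: books.length + 1, title: record.title.trim() };
    books.push(book);
  }
  matchedBookIds.set(record.remoteId, book.id);
}

// Executes queued jobs in FIFO order until the queue is empty.
function runPendingJobs(): void {
  for (let job = jobQueue.shift(); job !== undefined; job = jobQueue.shift()) {
    job();
  }
}

onPageFetched([
  { website: "www.publio.pl", remoteId: "r1", title: " szepty spoza nicosci " },
  { website: "wykop.pl", remoteId: "r2", title: "Another book" },
]);
runPendingJobs();
console.log(matchedBookIds.get("r1"), matchedBookIds.get("r2"), books.length);
```

Separating fetching from matching, as the flow describes, lets slow matching work run in the background without blocking the next page fetch.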

Scrappers

Adding new scrapper:

  1. Create scrapper file
  cd ./src/server/modules/importer/sites/
  mkdir example-scrapper/
  touch example-scrapper/ExampleScrapperGroup.ts
  2. Assign scrapper to scrappersGroups variable inside ScrapperService
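
As a rough illustration of these two steps, the sketch below shows what a scrapper group and its registration might look like. The shapes are hypothetical: the `ScrapperGroup` interface, its `website` field, `fetchLatestReviews` and `findScrapperGroup` are invented for this example, and the repository's real base classes and method names may differ. Only the names ExampleScrapperGroup and scrappersGroups come from the steps above.

```typescript
// Hypothetical shapes: the repository's real base classes and method
// names may differ from what is shown here.
interface ScrappedReview {
  remoteId: string;
  bookTitle: string;
  content: string;
}

interface ScrapperGroup {
  website: string; // host name used with the --website flag
  fetchLatestReviews(): Promise<ScrappedReview[]>;
}

// Step 1: possible contents of example-scrapper/ExampleScrapperGroup.ts
class ExampleScrapperGroup implements ScrapperGroup {
  readonly website = "example.com";

  async fetchLatestReviews(): Promise<ScrappedReview[]> {
    // A real scrapper would download and parse a remote page here;
    // this placeholder returns a fixed record instead.
    return [{ remoteId: "1", bookTitle: "Example book", content: "Example review" }];
  }
}

// Step 2: register the group in the list the service iterates over.
const scrappersGroups: ScrapperGroup[] = [new ExampleScrapperGroup()];

// Hypothetical lookup used when a task is limited to a single website.
function findScrapperGroup(website: string): ScrapperGroup | undefined {
  return scrappersGroups.find((group) => group.website === website);
}

console.log(findScrapperGroup("example.com")?.website);
```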

Stack

Real World Nest.JS + TypeORM app.

  • Node.JS
  • Nest.JS
  • TypeORM
  • React
  • nginx
  • Nomad

About

License: GNU General Public License v3.0


Languages

  • TypeScript 94.9%
  • SCSS 4.2%
  • JavaScript 0.4%
  • HCL 0.3%
  • Pug 0.1%
  • Dockerfile 0.0%
  • Shell 0.0%