Code42Cate / News-Aggregator

News Aggregator

This project is not even close to being where I want it to be, but it's already usable in a local environment.

This project is my answer to a frustration that most of the news sites and aggregators I have used caused me for way too long.

I read a lot of news articles every day, but I usually don't have the time to scour multiple news sites for articles that interest me.

The problem with many news sites is that their algorithms are way too freaking sensitive and like to put me into annoying bubbles.

I don't know about you, but I don't like bubbles :(

Okay, enough ranting. We have a problem, so what's the solution?

Let's build our own news aggregator!

Supported sites: Hackernews, Hackernoon, Reddit, TechCrunch, TechRepublic, Wired, XKCD, yCombinator, Slashdot, Digg, ReadWrite, thenextweb, Techradar

Planned additional sites: The Verge, Mashable, Engadget, ArsTechnica, Vox, MakeUseOf, TechMeme, Buzzfeednews, Inquisitr, businessInsider, nytimes, time, bbc, cnet

How does it work?

Yeyee, coming tonight :D

Tech Stack

Screenshots

Since I am only running it locally in my dev environment, here is what the frontend looks like:

[Screenshot: the frontend]

If you click on 'Add Label', this shows up:

[Screenshot: what appears after clicking 'Add Label']

Requirements for a local environment

  • An installation of MongoDB
  • Python 3.7+
  • Node.js (I am using v10 and am not sure how far back you can go; ES6 support is required!)
  • A few more things; an installation script is coming at some point in the future
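If you want to check these requirements quickly before starting, a small script along these lines could help. It is only a sketch: the file name check_env.py is hypothetical, and it assumes MongoDB is listening on its default port 27017 on localhost and that the node binary is on your PATH. The MongoDB check only confirms that something accepts TCP connections on that port.

```python
# Hypothetical check_env.py: a quick sanity check of the local requirements.
# Assumptions: MongoDB listens on its default port 27017 on localhost,
# and the `node` binary is on your PATH.
import shutil
import socket
import subprocess
import sys


def parse_node_version(text):
    """Turn `node --version` output like 'v10.16.3' into (10, 16, 3)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def mongo_reachable(host="localhost", port=27017, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def main():
    print("Python 3.7+:", "ok" if sys.version_info >= (3, 7) else "too old")

    node = shutil.which("node")
    if node is None:
        print("Node.js: not found on PATH")
    else:
        out = subprocess.run([node, "--version"], stdout=subprocess.PIPE,
                             universal_newlines=True).stdout
        # v10 is what the README author uses; older versions are untested.
        major = parse_node_version(out)[0]
        print("Node.js:", out.strip(), "ok" if major >= 10 else "older than v10")

    print("MongoDB on port 27017:", "ok" if mongo_reachable() else "not reachable")


if __name__ == "__main__":
    main()
```

Run it with python3 check_env.py from anywhere; it only reads versions and opens one TCP connection, so it does not touch your data.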

You want to add your own scrapers for your favourite site?

All the scrapers are in the scraper/ directory.

If you want to add another scraper, all you need to do is create a class that inherits from SiteScraper and implements the scrape() method. scrape.py will then pick up and use the new scraper dynamically.
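A minimal sketch of such a scraper, under several assumptions: the real SiteScraper in scraper/ may have a different constructor and return type, so a stand-in base class is defined here; scrape() is assumed to return a list of dicts with "title" and "url" keys; and ExampleRssScraper, parse_rss, discover_scrapers and the feed URL are all hypothetical. The example parses an RSS 2.0 feed with the standard library, and discover_scrapers shows one possible way a runner could find subclasses dynamically, not necessarily what scrape.py does.

```python
# Sketch only. The real SiteScraper base class lives in scraper/ and its exact
# interface may differ; here we ASSUME scrape() takes no arguments and returns
# a list of {"title": ..., "url": ...} dicts.
from abc import ABC, abstractmethod
import urllib.request
import xml.etree.ElementTree as ET


class SiteScraper(ABC):
    """Stand-in for the project's base class."""

    @abstractmethod
    def scrape(self):
        """Return a list of {"title": ..., "url": ...} dicts."""


def parse_rss(xml_text):
    """Extract title/link pairs from every <item> of an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        {"title": item.findtext("title", ""), "url": item.findtext("link", "")}
        for item in root.iter("item")
    ]


class ExampleRssScraper(SiteScraper):
    """Hypothetical scraper for an imaginary site that offers an RSS feed."""

    FEED_URL = "https://example.com/rss"  # placeholder, not a real feed

    def scrape(self):
        with urllib.request.urlopen(self.FEED_URL, timeout=10) as response:
            return parse_rss(response.read())


def discover_scrapers(base=SiteScraper):
    """One possible discovery mechanism (an assumption, not scrape.py's code):
    instantiate every direct subclass of the base class that is imported."""
    return [cls() for cls in base.__subclasses__()]
```

Note that __subclasses__() only sees classes whose modules have already been imported, so a runner built this way would first have to import every module in scraper/, for example with pkgutil.iter_modules and importlib.import_module.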

If you need inspiration, take a look at the existing scrapers. Some use RSS feeds (wired_scraper.py), some use JSON (reddit_scraper.py), and the others use BeautifulSoup. AND please, if you make more than one request, make them asynchronously! Before you push, make sure your scrapers pass the provided unit tests! :)
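One way to make several requests concurrently using only the standard library is to run the blocking urllib calls in a thread pool and await them together with asyncio.gather. This is a sketch, not necessarily the pattern the existing scrapers use; fetch and fetch_all are hypothetical names, and a library such as aiohttp would be the fully asynchronous alternative.

```python
# Sketch: run several blocking HTTP requests concurrently with asyncio.
# Standard library only (urllib + the default thread pool via
# run_in_executor), so it works on Python 3.7+.
import asyncio
import urllib.request


def fetch(url, timeout=10):
    """Blocking download of one URL; returns the body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


async def fetch_all(urls, fetch_fn=fetch):
    """Start all requests at once and wait for them together."""
    loop = asyncio.get_running_loop()
    # Each blocking fetch runs in a worker thread of the default executor.
    tasks = [loop.run_in_executor(None, fetch_fn, url) for url in urls]
    # gather() returns the results in the same order as `urls`.
    return await asyncio.gather(*tasks)
```

Call it as asyncio.run(fetch_all(urls)). If any request raises, gather() propagates the first exception; pass return_exceptions=True to gather() if you would rather collect errors alongside the successful results.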

Testing

Feel free to add more tests:D

This project uses pyunit (Python's built-in unittest module) for unit tests. Go to the root directory of this project and run:

python3 -m unittest discover
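A scraper test could look roughly like the sketch below. The module name and the ExampleScraper stand-in are hypothetical (import your real scraper instead), and the checks assume scrape() returns a list of dicts with "title" and "url" keys. By default, unittest discover only collects files whose names match test*.py.

```python
# Hypothetical test module, e.g. test_example_scraper.py.
# ExampleScraper is a stand-in; import your real scraper instead.
import unittest


class ExampleScraper:
    def scrape(self):
        return [{"title": "Hello world", "url": "https://example.com/hello"}]


class TestExampleScraper(unittest.TestCase):
    def setUp(self):
        self.articles = ExampleScraper().scrape()

    def test_returns_articles(self):
        # The scraper should return a non-empty list.
        self.assertIsInstance(self.articles, list)
        self.assertGreater(len(self.articles), 0)

    def test_articles_have_title_and_url(self):
        # Every article needs a non-empty title and an absolute URL.
        for article in self.articles:
            self.assertTrue(article.get("title"))
            self.assertTrue(article["url"].startswith("http"))
```

Tests that hit the real site will be slow and can fail when the site is down; saving a sample response and testing the parsing against it keeps the suite stable.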

I am also using Node.js and Puppeteer for some frontend testing. Run them with npm run in the testing folder. (You need to start the API with python3 api.py first, and you might want to fill your database with python3 scrape.py before that!)

TODO:

Backend:

  • Figure out how to prevent timeouts (and handle other errors) for all the sites (might wanna use proxies?)
  • Figure out how the fuck asyncio works so I can make the main scraping loop async :D
  • Figure out how to close failing connections
  • Figure out how to handle exceptions in the scrapers; ideally, people who implement scrapers shouldn't need to worry about that
  • Make it production ready!
  • Write more tests
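For the timeout, async-loop and exception-handling items above, one possible direction is sketched here: each scraper's blocking scrape() runs in a worker thread, asyncio.wait_for enforces a per-scraper timeout, and any exception is logged and replaced by an empty result, so scraper authors don't have to handle errors themselves. run_one and scrape_all are hypothetical names, and the scrape() interface is assumed. One caveat: on timeout the worker thread is not killed; it keeps running until scrape() returns, so this does not solve closing failing connections.

```python
# Sketch: an async main scraping loop that isolates failures, so one slow or
# broken scraper cannot take down the whole run. ASSUMES each scraper object
# has a blocking scrape() method returning a list of articles.
import asyncio
import logging

logger = logging.getLogger("scrape")


async def run_one(scraper, timeout):
    """Run a blocking scrape() in a worker thread with a timeout.
    Errors and timeouts are logged and turned into an empty result."""
    loop = asyncio.get_running_loop()
    name = type(scraper).__name__
    try:
        return await asyncio.wait_for(
            loop.run_in_executor(None, scraper.scrape), timeout)
    except asyncio.CancelledError:
        raise  # never swallow cancellation of the surrounding task
    except asyncio.TimeoutError:
        # The thread keeps running in the background; we just stop waiting.
        logger.warning("%s timed out after %ss", name, timeout)
    except Exception:
        logger.exception("%s failed", name)
    return []


async def scrape_all(scrapers, timeout=30):
    """Run every scraper (given as a list) concurrently.
    Returns {scraper class name: list of articles}."""
    results = await asyncio.gather(*(run_one(s, timeout) for s in scrapers))
    return {type(s).__name__: r for s, r in zip(scrapers, results)}
```

The main script would then call asyncio.run(scrape_all(list_of_scrapers)) and store whatever came back; a failing site simply contributes an empty list plus a log entry.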

Frontend:

  • Write more tests
  • Add filtering options for frontend
  • Work on site performance
  • Make it possible to scroll while dragging
  • Send the delete requests in batches

Authors


Languages

Python 64.2%, JavaScript 24.6%, HTML 8.6%, CSS 2.5%