MasterGrant137 / Creepy_Crawler

Creepy Crawler is a search engine wrapped in a fully customizable browser. Users can search the web, have their history logged, and set custom themes.


Creepy_Crawler

Creepy Crawler is a full-stack search engine application inspired by popular search engines. It allows the user to make queries, review their search history, and set a custom theme.

Python SQLAlchemy Flask JavaScript React Redux Scrapy HTML CSS AWS

Crawl the web 🕷

*(screenshot: search)*

  • Queries from the frontend are received by Flask and, with help from the Crochet library, run asynchronously through the Scrapy spiders.
    import re

    import crochet
    from pydispatch import dispatcher
    from scrapy import signals

    crochet.setup()

    @crochet.wait_for(timeout=200.0)
    def scrape_with_crochet(raw_query):
        partitioned_query = ...
        query_regex = re.compile(...)
        # Collect each scraped item via the _crawler_result callback.
        dispatcher.connect(_crawler_result, signal=signals.item_scraped)
        spiders = [...]
        if partitioned_query:
            for spider in spiders:
                crawl_runner.crawl(spider, query_regex=query_regex)
            # Return the Deferred so crochet blocks until all crawls finish.
            return crawl_runner.join()
  • Settings are passed from the Flask backend to the Scrapy framework through a configuration object.
    import json

    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.project import get_project_settings

    # Overlay the project settings with values from a JSON file.
    settings = get_project_settings()
    with open('app/api/routes/settings.json') as settings_file:
        settings.update(json.load(settings_file))
    crawl_runner = CrawlerRunner(settings)
  • Each spider runs a broad crawl through the web, starting from a seed URL.
    import re

    import scrapy

    class BroadCrawler2(scrapy.Spider):
        """Broad crawling spider."""

        name = 'broad_crawler_2'
        start_urls = ['https://example.com/']

        def parse(self, response):
            """Yield text nodes that match the query, then follow links."""
            try:
                # Grab every text node except script and style contents.
                all_text = response.css('*:not(script):not(style)::text')
                for text in all_text:
                    if re.search(self.query_regex, text.get()):
                        yield {'url': response.request.url, 'text': text.get()}
            except Exception:
                self.logger.exception('End of the line error for %s.', self.name)

            yield from response.follow_all(css='a::attr(href)', callback=self.parse)
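The snippets above elide how `partitioned_query` and `query_regex` are actually built. As a rough sketch only (`build_query_regex` is a hypothetical helper for illustration, not the repo's partitioning code), one way to turn a raw query into a case-insensitive pattern the spiders can match against:

```python
import re


def build_query_regex(raw_query):
    """Compile a regex matching any whitespace-separated query term,
    case-insensitively. Hypothetical helper for illustration only."""
    terms = [re.escape(term) for term in raw_query.split()]
    if not terms:
        raise ValueError('empty query')
    return re.compile('|'.join(terms), re.IGNORECASE)


pattern = build_query_regex('creepy crawler')
```

A spider would then call `pattern.search(text)` on each text node, as `parse` does with `self.query_regex`.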

Create custom themes 🎨

*(screenshot: custom themes)*

  • AWS integration allows users to add backgrounds and profile images of their choice.
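The README does not show the upload path itself. As a sketch under stated assumptions (the bucket URL, allow-list, and key scheme below are all hypothetical, not the repo's real AWS configuration), the backend might validate an image and derive a unique per-user S3 object key like this:

```python
import mimetypes
import uuid

# Hypothetical values -- the repo's real bucket and allow-list are not shown.
ALLOWED_TYPES = {'image/png', 'image/jpeg', 'image/gif'}
BUCKET_URL = 'https://creepy-crawler-uploads.s3.amazonaws.com'


def build_upload_key(username, filename):
    """Validate an image filename and build a per-user S3 object key."""
    content_type, _ = mimetypes.guess_type(filename)
    if content_type not in ALLOWED_TYPES:
        raise ValueError(f'unsupported upload type: {content_type}')
    extension = filename.rsplit('.', 1)[-1].lower()
    key = f'{username}/{uuid.uuid4().hex}.{extension}'
    return key, f'{BUCKET_URL}/{key}'
```

The upload itself would hand `key` to an S3 client (e.g. boto3's `put_object`); the random hex segment keeps one user's background from overwriting another's.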

Look over your search history 🔍

*(screenshot: history)*

  • The user can conveniently switch between 24- and 12-hour time.
  • NATO time-zone abbreviations are also parsed specially for users whose locale settings differ from the default.
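A minimal sketch of that time handling, assuming stored timestamps are UTC (`military_offset` and `format_history_time` are illustrative helpers, not the repo's actual code):

```python
from datetime import datetime, timedelta, timezone


def military_offset(letter):
    """Map a NATO/military time-zone letter to its UTC offset in hours.
    'J' is unassigned (it denotes local time). Illustrative helper only."""
    letter = letter.upper()
    if letter == 'Z':
        return 0
    if letter in 'ABCDEFGHI':    # A..I -> +1..+9
        return ord(letter) - ord('A') + 1
    if letter in 'KLM':          # K..M -> +10..+12 (J is skipped)
        return ord(letter) - ord('K') + 10
    if letter in 'NOPQRSTUVWXY': # N..Y -> -1..-12
        return -(ord(letter) - ord('N') + 1)
    raise ValueError(f'not a military zone letter: {letter}')


def format_history_time(dt, zone_letter, use_24h=True):
    """Render a UTC timestamp in the given zone, 24- or 12-hour."""
    zone = timezone(timedelta(hours=military_offset(zone_letter)))
    return dt.astimezone(zone).strftime('%H:%M' if use_24h else '%I:%M %p')
```

For example, 13:00 UTC rendered in zone `A` (UTC+1) gives `14:00` in 24-hour mode.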

Enjoy advanced interactions with your themes 🧮

*(screenshot: theme interaction)*

Contact

Errors I encountered and conquered:


Languages

Python 61.4% · CSS 36.4% · HTML 0.7% · Dockerfile 0.5% · Mako 0.4% · JavaScript 0.4% · Shell 0.2%