There is 1 repository under the nutch topic.
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Viewers for statistics and dashboarding of Domain Search Engine data
An OCR search engine with Tesseract, Nutch, Solr, and PHP
MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, and Hive
An ultra-small PoC showing how to combine Apache Nutch and Apache Solr, crawling web pages and storing the results in Solr for querying
Apache Nutch is an extensible and scalable web crawler
Python port of Nutch that allows controlling Apache Nutch via its REST API.
A very simple search engine "specialised" in searching financial news.
Simple crawler using Apache Nutch and Elasticsearch
Quickly and easily launch Apache Solr linked with Apache Nutch in separate Docker containers.
A simple web crawler inside a Docker container using Apache Nutch 1 and Solr.
Nutch 1.x Indexer Plugin that runs against ES6.7
A spatial search website that allows users to search documents from the FBI Vault website. It extracts the most frequently occurring location in each document, loads the geo-tagged data into Apache Solr to index the documents, and visualizes search results using the Google Maps API.
Search Engine project for Information Retrieval class.
A Vapor app consisting of a simple search engine built for my information retrieval course project.
Nutch 2.3.1 plugin for Whitelisting/Blacklisting specific HTML elements
Nutch with Cassandra and Elasticsearch on Docker
Search engine knowledge systems (搜索引擎知识体系).
DataHarvest: Dockerized Web Crawling, Indexing, and Storage Solution
Rest Service for Spring/Solr backed search engine.
Apache Nutch Plugin for Viglet Turing Search
Developed as part of Information Retrieval coursework, this project showcases a search engine that efficiently indexes and retrieves information from a given dataset.
Apache Nutch system adapter for ORCA