Web Search and Mining - StackOverflow Search
-
Install the following dependencies with Python 3 (>= 3.5).
- For Linux (python3-dev is also needed)
sudo apt install python3-dev
pip install -r requirements.txt
- For Windows
pip install -r requirements_win.txt
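As a quick sanity check before installing, the platform split above can be sketched in Python. The helper names here are illustrative and not part of the project; only the two requirements file names and the minimum version come from the steps above.

```python
import sys

MIN_PYTHON = (3, 5)  # minimum version stated in this README


def python_is_supported(version_info):
    """Return True if the (major, minor, ...) tuple meets the minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def requirements_file(platform_name):
    """Pick the requirements file for a sys.platform value such as 'linux' or 'win32'."""
    if platform_name.startswith("win"):
        return "requirements_win.txt"
    return "requirements.txt"


print("Python version supported:", python_is_supported(sys.version_info))
print("Install with: pip install -r", requirements_file(sys.platform))
```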
-
(Optional) Install NLTK data. If you don't want to build the index yourself, you can skip this step.
python -m nltk.downloader stopwords punkt
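The same two packages can also be fetched from inside Python with `nltk.download`. This is a minimal sketch, assuming NLTK is already installed from the requirements file; the wrapper function and its `download` parameter are illustrative, added only so the call can be swapped out when no network is available.

```python
NLTK_PACKAGES = ["stopwords", "punkt"]  # the two packages named in the command above


def download_nltk_data(packages=NLTK_PACKAGES, download=None):
    """Download each package and return a {name: success} mapping."""
    if download is None:
        import nltk  # imported lazily so this file loads even without NLTK
        download = nltk.download
    return {name: download(name) for name in packages}
```

Calling `download_nltk_data()` with no arguments performs the real download.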
-
Download the database (and config file) from this link (extraction code: qgao) and place it under the SOSearch/ directory. We provide two types of database: one containing only raw data, and one containing both data and indexes (as well as Django fields). Each type comes in three sizes: small (5k), medium (100k), and large (300k).
-
Change the settings in the following files according to the database you choose.
SOSearch/SOSearch/settings.py
SOSearch/test_backend.py
SOSearch/test_indexer.py
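For orientation, a Django project usually selects a SQLite file through the `DATABASES` setting. The fragment below is a sketch of what the entry in settings.py might look like, assuming the downloaded database is a SQLite file; the file name is a placeholder, and the settings expected by test_backend.py and test_indexer.py are not shown here, so check those files directly.

```python
from pathlib import Path

# Assumption: BASE_DIR is the SOSearch/ directory that contains manage.py.
BASE_DIR = Path("SOSearch")

# Hypothetical file name; replace it with the database file you downloaded.
DATABASE_FILE = "your_database.sqlite3"

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",  # Django's built-in SQLite backend
        "NAME": str(BASE_DIR / DATABASE_FILE),
    }
}
```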
-
(Optional) Add additional fields to the database and build the index by running the following commands (at SOSearch/). If you chose the database with indexes (and Django fields), you can skip this step.
python manage.py makemigrations
python manage.py migrate
python manage.py rebuild_index
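If you prefer to run the three commands as one step, they can be chained from a small script that stops at the first failure. This is a sketch, not something shipped with the project; the function names and the `runner` parameter are illustrative.

```python
import subprocess
import sys

# The three management commands from this step, in the order given.
COMMANDS = ["makemigrations", "migrate", "rebuild_index"]


def build_argv(command, python=sys.executable):
    """Build the argument list for one manage.py command."""
    return [python, "manage.py", command]


def run_all(cwd="SOSearch", runner=subprocess.check_call):
    """Run each command in turn; check_call raises if a command exits non-zero."""
    for command in COMMANDS:
        runner(build_argv(command), cwd=cwd)
```

Calling `run_all()` from the repository root runs the commands inside SOSearch/.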
-
(Optional) Modify the file SOSearch/config.txt according to the data in SOSearch/question_index and SOSearch/answer_index. If you chose the database with indexes (and Django fields), you can skip this step.
-
(Optional) Add static file service by running the following command (at SOSearch/). If you are in debug mode (set DEBUG = True in SOSearch/SOSearch/settings.py), you can skip this step.
python manage.py collectstatic
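Note that collectstatic copies static files into the directory named by the `STATIC_ROOT` setting, so that setting must be defined when debug mode is off. The fragment below sketches the relevant settings, assuming the project's settings.py does not already define them; the directory name `static_collected` is an illustrative choice.

```python
from pathlib import Path

# Assumption: BASE_DIR is the SOSearch/ directory that contains manage.py.
BASE_DIR = Path("SOSearch")

DEBUG = False  # with DEBUG = True this step can be skipped, as noted above

STATIC_URL = "/static/"  # URL prefix under which static files are served
STATIC_ROOT = str(BASE_DIR / "static_collected")  # hypothetical target directory
```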
-
Finally, run the following command to start the server locally.
python manage.py runserver
Or make it public by binding to all network interfaces.
python manage.py runserver 0.0.0.0:8000
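Binding to 0.0.0.0 only controls where the server listens. Django also checks each request's Host header against `ALLOWED_HOSTS`: in recent Django versions, an empty list with DEBUG = True accepts only localhost-style hosts, and with DEBUG = False the list must not be empty. A sketch of the setting follows, where 203.0.113.10 is a placeholder address to be replaced with your own host name or IP.

```python
# Fragment for SOSearch/SOSearch/settings.py.
# 203.0.113.10 is a placeholder from a documentation-only IP range.
ALLOWED_HOSTS = ["localhost", "127.0.0.1", "203.0.113.10"]
```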
-
question_list_spider.py
python question_list_spider.py --l 100500
-
question_answer_spider.py
python question_answer_spider.py --l 5000797
Create the sqlite3 database
See SOSearch/readme.md for more details.