LiamTalberg/StuQuestions-Question-Generator

Installation and Setup:

- Ensure you are using Python version 3.9

- On Windows and Linux to install the required libraries perform the following steps:
1) Run the command: "pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html"
- NOTE: It is VITAL that the "-f https://download.pytorch.org/whl/torch_stable.html" is included in the command.
2) After the packages have been installed run the command: "python3 -m spacy download en_core_web_sm"
- This will download the english package for Spacy.

- On Mac to install the required libraries perform the following steps:
1) Alter the requirements.txt file by removing the "+cpu" from torch and torchvision.
2) Run the command: "pip install -r requirements.txt"
3) After the libraries have been installed run the command: "python3 -m spacy download en_core_web_sm"
- This will download the english package for Spacy.

- Fixes to some common erors:
- Pip can not find "torch version 1.7.1" package:
- Make sure that Python version 3.9 is being used.
- This can be done using "python3.9 -m pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html"
- Or using the actual path to the the python 3.9 installation: "/usr/bin/python3.9 -m pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html"

- Error when building wheel for jsonnet:
- Install the python dev tools for python3.9 with the command: "sudo apt-get install python3.9-dev"

Descriptions of Included Folders/Files:

- "Transcripts" : A folder that contains 12 weeks of lecture transcripts, each in a different folder corresponding the week.
- "Sample Questions" : A folder that contains the 12 weeks of questions related to transcripts. Along with a merged and cleaned version of the all the questions in seperate .txt files.
- "Evaluation Data": A folder than contains the excel files with the results of the survey and internal evaluation.
- "semanticQG.py" : A Python script that contains the Semantic QG system.
- "templateQG.py" : A Python script that contains the Template QG system.
- "requirements.txt" : A text file containing the names and versions of the required libraries.
- "demoContent.txt" : A text file containing an example of a lecture transcript that can be used for testing/demo.
- "structured-prediction-srl-bert.2020.12.15.tar.gz" : The model from AllenNLP that performs the SRL.
- "allQuestions.txt" : A file containing 594 questions that have been cleaned and extracted from the dataset.
- "validQuestions.txt" A file containing the 81 questions from which valid templates were extracted.
- NOTE: The "validQuestions.txt" file can be used to quickly run the program for testing/demo.
- NOTE: However, due to the clustering the output will be different to that evalauted in the report.

Required Format of the Input Files:

- NOTE: All input files should be placed at the same level as the Python scripts.

- Content File:
- This is the file that contains the text from which questions will be generated.
- This can be any passage of text.
- The text can be of any resonable length (longer text = longer to generate questions)
- It can contain multiple paragraphs.
- The format can be seen in "demoContent.txt"

- Sample Questions File:
- The is the file containing the sample questions from which templates are extracted.
- The file should contain each question on a new line.
- The format can be seen in "allQuestions.txt" and "validQuestions.txt"

How to Run the Semantic QG System:

- NOTE: If the "structured-prediction-srl-bert.2020.12.15.tar.gz" model is not already downloaded (i.e.: not at same level as the "semanticQG.py" script):
- Comment out line 166.
- Uncomment line 167.
- This will download the model when the program is executed.
- However, this will require the model to re-downloaded every time the program is run.
- Thus, the model can be manually be downloaded from: "https://storage.googleapis.com/allennlp-public-models/structured-prediction-srl-bert.2020.12.15.tar.gz"
- After downlading, place the model at the same level as the semanticQG.py script.

- The code for the semantic QG system is found within "semanticQG.py"
- The program can be executed by running "python3 semanticQG.py"
- Once executed the program will begin by loading in the SRL model (this can take a few seconds).
- NOTE: If running on Windows, an error will be displayed in the console. This can be ignored.
- On the first time running the code it may automatically install an NLTK Wordnet package.
- Once the model is loaded you will be prompted to enter the name of a text file from which you want questions to be generated.
- The file should be placed at the same level as the "semanticQG.py" script.
- You should enter the file name EXCLUDING the ".txt" extension (i.e.: demoContent NOT demoContent.txt).
- If the file CANNOT be found, an error message be presented.
- If the file CAN be found, the system will begin to generate questions.
- Upon completion, some statistics regarding the amount generated questions will be displayed in the terminal.
- The generated questions will be output to a file with the same name as the input file with the suffix "SemanticQuestions"
- i.e.: If the input file was "demoContent.txt" the output file will be "demoContentSemanticQuestions.txt"
- This output file will contain the sentence followed by the question/s generated from it by the system.
- To terminate the program you can simply type "stop"

How to Run the Template QG System:

- NOTE: If the "structured-prediction-srl-bert.2020.12.15.tar.gz" model is not already downloaded (i.e.: not at same level as the "semanticQG.py" script):
- Comment out line 166.
- Uncomment line 167.
- This will download the model when the program is executed.
- However, this will require the model to re-downloaded every time the program is run.
- Thus, the model can manually be downloaded from: "https://storage.googleapis.com/allennlp-public-models/structured-prediction-srl-bert.2020.12.15.tar.gz"
- After downlading, place the model at the same level as the semanticQG.py script.

- The code for the Template QG system is found within "templateQG.py"
- The program can be executed by running "python3 templateQG.py"
- Once executed the program will begin by loading in the SRL model (this can take a few seconds)
- On the first time running the code it may automatically install an NLTK Wordnet package.
- NOTE: If running on Windows, an error will be displayed in the console. This can be ignored.
- Once the model is loaded the program will request the name of the file containing the sample questions.
- These sample questions are the questions used to extract templates (The format of the file is explained above).
- You should enter the name of the file containing the questions EXCLUDING the ".txt" extension (i.e.: allQuestions NOT allQuestions.txt)
- The file should be placed at the same level as the "templateQG.py" script.
- If the file CANNOT be found, an error message be presented.
- If the file CAN be found, the system will beging to cluster and extract templates from the questions.
- This can take some time depending on the amount of questions.
- Once completed, the system will display the number of templates extracted.
- Two files will also be created: one with the suffix "Clustered" and another with the suffix "TemplatesClustered"
- e.g.: If the input file was called "allQuestions.txt":
- "allQuestionsClustered.txt" : Would contain all the sample questions listed in their respective clusters.
- "allQuestionsTemplatesClustered.txt" : Would contain the extracted templates in their respective clusters.
- NOTE: The templates in this file will only be the valid templates that have met the extraction criteria.
- After template extaction you will be prompted to enter the name of a text file from which you want questions to be generated.
- You should enter the file name EXCLUDING the ".txt" extension (i.e.: demoContent NOT demoContent.txt).
- The file should be placed at the same level as the "templateQG.py" script.
- If the file CANNOT be found, an error message be presented.
- If the file CAN be found, the system will begin to generate questions.
- Upon completion, some statistics regarding the generated questions will be displayed in the terminal.
- The generated questions will be output to a file with the same name as the input file with the suffix "TemplateQuestions"
- i.e.: If the input file was "demoContent.txt" the output file will be "demoContentTemplateQuestions.txt"
- This output file will contain the sentence followed by the question/s generated from it by the system, as well as the template used.
- To terminate the program you can simply type the "stop"

LiamTalberg / StuQuestions-Question-Generator

About

Languages