automatic-annotation gec grammar-checker grammar-errors learner-corpus l1-interference spacy-pipeline spacy

✒️ Essay Grammar Checker

An essay grammar checker trained on the Russian Error-Annotated Learner English Corpus (REALEC) using spaCy.

Training

The checker consists of six pipelines, each trained on a specific group of error types. The error categories are mapped to pipelines as follows:

    "spelling": {"Spelling", "Capitalisation"},
    "punctuation": {"Punctuation"},
    "articles": {"Articles"},
    "vocabulary": {"lex_item_choice", "lex_part_choice",
                   "Category_confusion", "Formational_affixes"},
    "grammar_major": {"Tense_choice", "Prepositions", "Agreement_errors", "Redundant_comp"},
    "grammar_minor": {"Word_order", "Noun_number", "Numerals", "Verb_pattern", "Determiners"}
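To find which pipeline handles a given error label, the mapping above can be inverted. The following is a minimal sketch: the names `PIPELINE_LABELS`, `LABEL_TO_PIPELINE`, and `pipeline_for` are hypothetical and introduced here for illustration only; they are not taken from the project code.

```python
# Mapping copied from the README: pipeline name -> error labels it is trained on.
PIPELINE_LABELS = {
    "spelling": {"Spelling", "Capitalisation"},
    "punctuation": {"Punctuation"},
    "articles": {"Articles"},
    "vocabulary": {"lex_item_choice", "lex_part_choice",
                   "Category_confusion", "Formational_affixes"},
    "grammar_major": {"Tense_choice", "Prepositions",
                      "Agreement_errors", "Redundant_comp"},
    "grammar_minor": {"Word_order", "Noun_number", "Numerals",
                      "Verb_pattern", "Determiners"},
}

# Hypothetical helper structure: error label -> pipeline name.
LABEL_TO_PIPELINE = {
    label: pipeline
    for pipeline, labels in PIPELINE_LABELS.items()
    for label in labels
}


def pipeline_for(label: str) -> str:
    """Return the name of the pipeline responsible for an error label."""
    return LABEL_TO_PIPELINE[label]


if __name__ == "__main__":
    print(pipeline_for("Word_order"))
```

Because each label belongs to exactly one pipeline, the inverted dictionary has one entry per error label.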

The `project.yml` defines the data assets required by the project, as well as the available commands and workflows. For details, see the spaCy projects documentation. The `nlp.rehearse` method can also be used to update trained models.

Commands

The following commands are defined by the project. They can be executed using `spacy project run [name]`. Commands are only re-run if their inputs have changed.

| Command | Description |
| --- | --- |
| `preprocess` | Convert the data to the required spaCy format |
| `generate_configs` | Update class weights in the configs |
| `train_pipelines` | Launch training |
| `evaluate_pipelines` | Evaluate the models |
| `assemble_pipelines` | Assemble the model |
| `package` | Package the resulting model |

Workflows

The following workflows are defined by the project. They can be executed using `spacy project run [name]` and will run the specified commands in order. Commands are only re-run if their inputs have changed.

| Workflow | Steps |
| --- | --- |
| `all` | `preprocess` → `generate_configs` → `train_pipelines` → `evaluate_pipelines` → `assemble_pipelines` → `package` |

Assets

The following assets are defined by the project. They can be fetched by running `spacy project assets` in the project directory. The data used for training can be extracted from the corpus using the following code.

| File | Source |
| --- | --- |
| `assets/realec/data_realec.tar.bz2` | REALEC |

Performance

F1-scores per error category:

| Error category | F1-score |
| --- | --- |
| `punctuation` | 0.779 |
| `spelling` | 0.939 |
| `capitalisation` | 0.902 |
| `articles` | 0.852 |
| `lex_part_choice` | 0.235 |
| `lex_item_choice` | 0.685 |
| `Category_confusion` | 0.705 |
| `Formational_affixes` | 0.742 |
| `Verb_pattern` | 0.629 |
| `Noun_number` | 0.920 |
| `Word_order` | 0.527 |
| `Numerals` | 0.736 |
| `Determiners` | 0.044 |
| `Agreement_errors` | 0.835 |
| `Prepositions` | 0.710 |
| `Redundant_comp` | 0.495 |
| `Tense_choice` | 0.825 |
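The per-category scores above vary widely, so it can help to summarise them. The sketch below computes an unweighted (macro) average and lists the weakest categories. Both are derived here from the reported numbers and are not figures published by the project; the function names `macro_f1` and `weakest` are hypothetical.

```python
# F1-scores copied verbatim from the Performance section.
F1_SCORES = {
    "punctuation": 0.779, "spelling": 0.939, "capitalisation": 0.902,
    "articles": 0.852, "lex_part_choice": 0.235, "lex_item_choice": 0.685,
    "Category_confusion": 0.705, "Formational_affixes": 0.742,
    "Verb_pattern": 0.629, "Noun_number": 0.920, "Word_order": 0.527,
    "Numerals": 0.736, "Determiners": 0.044, "Agreement_errors": 0.835,
    "Prepositions": 0.710, "Redundant_comp": 0.495, "Tense_choice": 0.825,
}


def macro_f1(scores):
    """Unweighted mean of the per-category F1-scores."""
    return sum(scores.values()) / len(scores)


def weakest(scores, n=3):
    """Return the n category names with the lowest F1, lowest first."""
    return sorted(scores, key=scores.get)[:n]


if __name__ == "__main__":
    print(f"macro F1: {macro_f1(F1_SCORES):.3f}")
    print("weakest categories:", weakest(F1_SCORES))
```

A macro average treats every category equally regardless of how often it occurs in the corpus, so it should be read as a rough summary only.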

Usage

Install

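    !pip install https://huggingface.co/iproskurina/en_grammar_checker/resolve/main/en_grammar_checker-any-py3-none-any.whl

The leading `!` is Jupyter notebook syntax; omit it when installing from a shell. Once installed, the model can be loaded in either of two ways:

    # Using spacy.load().
    import spacy
    nlp = spacy.load("en_grammar_checker")

    # Importing as a module.
    import en_grammar_checker
    nlp = en_grammar_checker.load()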
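Once the model is loaded, the detected errors still have to be read off the processed document. The sketch below assumes the pipelines store their predictions as labelled span groups in `doc.spans`, which is where spaCy's SpanCategorizer writes its output. The span-group keys used by this model are not documented here, so the hypothetical helper `list_flagged_spans` simply walks every group it finds.

```python
def list_flagged_spans(doc):
    """Collect (group key, label, text, start_char, end_char) for every
    span in every span group attached to a spaCy Doc."""
    rows = []
    for key, group in doc.spans.items():
        for span in group:
            rows.append(
                (key, span.label_, span.text, span.start_char, span.end_char)
            )
    return rows


# Intended usage (requires the installed model, so it is left commented out):
#
#   import spacy
#   nlp = spacy.load("en_grammar_checker")
#   doc = nlp("Text of the essay to check.")
#   for row in list_flagged_spans(doc):
#       print(row)
```

Iterating over all groups avoids hard-coding a key, at the cost of also returning any span groups that other components may have added.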

Streamlit

    streamlit run streamlit_app.py

SpanCategorizer-based rendering
