There are 5 repositories under the sentence-segmentation topic.
Underthesea - Vietnamese NLP Toolkit
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
A complete guide to natural language processing (NLP) in Python, covering techniques such as parsing and text processing, and showing how to use NLP for text feature engineering.
A toolkit for discourse segmentation (EDU segmentation).
🦜 Containerized HTTP API for industrial-strength NLP via spaCy and sense2vec
A sentence segmentation library with wide language support optimized for speed and utility.
Port of PragmaticSegmenter for sentence boundary detection
Deep neural approach to Boundary and Disfluency Detection - Based on my Master's work
Pre-trained models for tokenization, sentence segmentation and so on
A sentence splitting (sentence boundary disambiguation) library for Go. It is rule-based and works out-of-the-box.
Corpus processing library
Vietnamese Sentence Boundary Detection
A tool to perform sentence segmentation on Japanese text
Several benchmarks on sentence splitting and language identification
Sentence segmentation for the Burmese language using a rule-based method
Sentence segmenter for legal texts
A Python wrapper for VnCoreNLP
A Python3 package for extracting syntactic complexity measures from CoNLL-U annotations.
Semantic-based search using word embedding to help the medical community develop answers to high priority scientific questions using Kaggle's CORD-19 dataset. This repository is part of Kaggle's CORD-19 challenge: https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge
Extracts sentences from .txt files.
Document preprocessing scripts for the Nature of EU Rules project