Takahashi Kanji's starred repositories
wikipedia-utils
Utility scripts for preprocessing Wikipedia texts for NLP
llama_index
LlamaIndex is a data framework for your LLM applications
whisper.cpp
Port of OpenAI's Whisper model in C/C++
entity-recognition-datasets
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
pytorch-partial-crf
CRF, Partial CRF and Marginal CRF in PyTorch
applied-ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
ML-Workflow-with-SageMaker-and-StepFunctions
Example of ML Workflow using SageMaker and StepFunctions
sample-codes-for-aiml
Sample code for AI/ML using AWS is published here.
best-of-ml-python
🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.
LSTM-CRF-pytorch-faster
A parallelized LSTM-CRF implementation, more than 1000x faster than the slower version in the official PyTorch tutorial from which it was modified (URL: https://pytorch.org/tutorials/beginner/nlp/advanced_tutorial.html).
ner-wikipedia-dataset
A Japanese named entity recognition dataset built from Wikipedia
100-nlp-papers
100 Must-Read NLP Papers
UD_Japanese-GSD
Japanese data from the Google UDT 2.0.
awesome-mlops
A curated list of references for MLOps
inappropriate-words-ja
A collection of inappropriate expressions in Japanese. It may be useful for purposes such as data cleaning in natural language processing.