Megagon Labs (megagonlabs)

Home page: https://www.megagon.ai

Megagon Labs's repositories

ginza

A Japanese NLP Library using spaCy as framework based on Universal Dependencies

Language: Python · License: MIT · Stargazers: 727 · Issues: 32 · Issues: 81

ditto

Code for the paper "Deep Entity Matching with Pre-trained Language Models"

Language: Python · License: Apache-2.0 · Stargazers: 244 · Issues: 6 · Issues: 24

bunkai

Sentence boundary disambiguation tool for Japanese texts (日本語文境界判定器)

Language: Python · License: Apache-2.0 · Stargazers: 180 · Issues: 5 · Issues: 3

sato

Code and data for Sato https://arxiv.org/abs/1911.06311.

Language: Python · License: Apache-2.0 · Stargazers: 108 · Issues: 13 · Issues: 13

jrte-corpus

Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020)

Language: Python · License: NOASSERTION · Stargazers: 75 · Issues: 5 · Issues: 0

opiniondigest

OpinionDigest: A Simple Framework for Opinion Summarization (ACL 2020)

Language: Python · License: Apache-2.0 · Stargazers: 56 · Issues: 3 · Issues: 3

SubjQA

A question-answering dataset with a focus on subjective information

t5-japanese

Code to pre-train Japanese T5 models

Language: Python · License: Apache-2.0 · Stargazers: 39 · Issues: 4 · Issues: 1

ruler

Data Programming by Demonstration (DPBD) for Document Classification

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 36 · Issues: 6 · Issues: 6

tagruler

Data programming by demonstration for information extraction and span annotation

Language: JavaScript · License: Apache-2.0 · Stargazers: 35 · Issues: 5 · Issues: 6

coop

☘️ Code for Convex Aggregation for Opinion Summarization (Iso et al.; Findings of EMNLP 2021)

Language: Python · License: BSD-3-Clause · Stargazers: 33 · Issues: 5 · Issues: 4

UD_Japanese-GSD

Japanese data from the Google UDT 2.0.

Language: Python · License: NOASSERTION · Stargazers: 28 · Issues: 1 · Issues: 0

doduo

Annotating Columns with Pre-trained Language Models

Language: Python · License: Apache-2.0 · Stargazers: 25 · Issues: 10 · Issues: 8

asdc

Accommodation Search Dialog Corpus (宿泊施設探索対話コーパス)

Language: Python · License: CC-BY-4.0 · Stargazers: 23 · Issues: 6 · Issues: 4

cocosum

🥥 Code & data for Comparative Opinion Summarization via Collaborative Decoding (Iso et al.; Findings of ACL 2022)

Language: Python · License: Apache-2.0 · Stargazers: 21 · Issues: 7 · Issues: 4

rotom

Code for the paper "Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond"

Language: Roff · License: BSD-3-Clause · Stargazers: 20 · Issues: 3 · Issues: 2

ebe-dataset

Evidence-based Explanation Dataset (AACL-IJCNLP 2020)

Language: PLSQL · License: NOASSERTION · Stargazers: 17 · Issues: 6 · Issues: 0

ginza-transformers

Use custom tokenizers in spacy-transformers

Language: Python · License: MIT · Stargazers: 17 · Issues: 5 · Issues: 3

starmie

Resources for PVLDB 2023 submission

machamp

The dataset for the paper "Machamp: A Generalized Entity Matching Benchmark" published in CIKM 2021

License: BSD-3-Clause · Stargazers: 15 · Issues: 5 · Issues: 0

teddy

Code and data for Teddy https://arxiv.org/abs/2001.05171.

Language: Python · License: Apache-2.0 · Stargazers: 15 · Issues: 7 · Issues: 0

sudowoodo

The source code of the Sudowoodo paper in ICDE 2023

Language: Jupyter Notebook · License: BSD-3-Clause · Stargazers: 7 · Issues: 6 · Issues: 4

desuwa

Feature annotator for morphemes and phrases based on KNP rule files (pure Python)

Language: Emacs Lisp · License: Apache-2.0 · Stargazers: 6 · Issues: 6 · Issues: 0
Language: Python · License: Apache-2.0 · Stargazers: 5 · Issues: 4 · Issues: 1
Language: Python · License: BSD-3-Clause · Stargazers: 5 · Issues: 6 · Issues: 0

jrte-corpus_example

Example code for the Japanese Realistic Textual Entailment Corpus

Language: Python · License: Apache-2.0 · Stargazers: 3 · Issues: 5 · Issues: 0

leam

Source code and demo for Leam

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 3 · Issues: 5 · Issues: 0

minun

Evaluating Counterfactual Explanations for Entity Matching

Language: Python · License: BSD-3-Clause · Stargazers: 3 · Issues: 4 · Issues: 0
Language: Python · License: BSD-3-Clause · Stargazers: 0 · Issues: 5 · Issues: 1

albert

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 1 · Issues: 0