ugmatcha suite

ugmatcha suite consists of these sub projects:

See each project for details.

Prerequisites

java-ugmatcha-suite/trietree is available on GitHub Packages. (Japanese version)

For Maven

Create a personal access token with the read:packages permission at https://github.com/settings/tokens

Add your username and token to your ~/.m2/settings.xml file in a <server> element.

<settings>
  <servers>
    <server>
      <id>github</id>
      <username>USERNAME</username>
      <password>YOUR_PERSONAL_ACCESS_TOKEN_WITH_READ</password>
    </server>
  </servers>
</settings>

Add the following repository to the <repositories> section of your project's pom.xml file.

<repository>
  <id>github</id>
  <url>https://maven.pkg.github.com/koron/java-ugmatcha-suite</url>
</repository>

Add a <dependency> element to your <dependencies> section.

<dependency>
  <groupId>net.kaoriya.ugmatcha</groupId>
  <artifactId>wikidict</artifactId>
  <version>0.0.3</version>
</dependency>

Please also read the public documentation (Japanese).

For Gradle

Create a personal access token with the read:packages permission at https://github.com/settings/tokens

Add your username and token to your ~/.gradle/gradle.properties file.

gpr.user=YOUR_USERNAME
gpr.key=YOUR_PERSONAL_ACCESS_TOKEN_WITH_READ:PACKAGES

Add the following repository to the repositories section of your build.gradle file.

maven {
    url = uri("https://maven.pkg.github.com/koron/java-ugmatcha-suite")
    credentials {
        username = project.findProperty("gpr.user") ?: System.getenv("USERNAME")
        password = project.findProperty("gpr.key") ?: System.getenv("TOKEN")
    }
}
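In the credentials block, the `?:` (Elvis) operator takes each value from the Gradle property when it is set and otherwise falls back to an environment variable (`USERNAME` or `TOKEN`), so either gradle.properties or those variables can supply the credentials. The following is a minimal Java sketch of that lookup order only, not Gradle's implementation; the class and method names are hypothetical, and a `Properties` object stands in for the project properties loaded from gradle.properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch of the fallback performed by the Gradle credentials block. */
public final class GprCredentials {
    private GprCredentials() {}

    /**
     * Mirrors `project.findProperty(key) ?: System.getenv(envName)`:
     * Groovy's Elvis operator falls back when the left side is null or empty.
     */
    static String resolve(Properties props, String key, Map<String, String> env, String envName) {
        String value = props.getProperty(key);
        return (value == null || value.isEmpty()) ? env.get(envName) : value;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for ~/.gradle/gradle.properties; only gpr.user is set here.
        Properties props = new Properties();
        props.load(new StringReader("gpr.user=alice\n"));

        // Stand-in for the process environment.
        Map<String, String> env = Map.of("USERNAME", "env-user", "TOKEN", "env-token");

        System.out.println(resolve(props, "gpr.user", env, "USERNAME")); // alice (from properties)
        System.out.println(resolve(props, "gpr.key", env, "TOKEN"));     // env-token (fallback)
    }
}
```

Because the property wins whenever it is present, a value in gradle.properties overrides `USERNAME`/`TOKEN`, which can matter on systems where `USERNAME` is already set to the login name.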

Add an implementation dependency to your dependencies section.

implementation 'net.kaoriya.ugmatcha:wikidict:0.0.3'

Please also read the public documentation (Japanese).

Development memo

Put wikiwords.stt and wikiwords.stw in tmp/. Both files are generated with https://github.com/koron/wpwordtool.

Put in.txt in tmp/, then run:

$ ./gradlew wikidict:matchDemo -Pargs='../tmp/in.txt' > tmp/out.txt
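Before running the demo, it can help to confirm that the three input files are in place. The following is a hypothetical Java helper (not part of the repository) that lists which of wikiwords.stt, wikiwords.stw and in.txt are missing from tmp/, assuming it is run from the repository root. The `../tmp/in.txt` argument above suggests the demo itself resolves paths relative to the wikidict/ subproject directory, which is why the helper and the Gradle argument use different relative paths.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical preflight check for the matchDemo inputs (not part of the repository). */
public final class MatchDemoPreflight {
    // Files the development memo asks you to place in tmp/.
    static final List<String> REQUIRED = List.of("wikiwords.stt", "wikiwords.stw", "in.txt");

    private MatchDemoPreflight() {}

    /** Returns the required file names that are not regular files under tmpDir. */
    static List<String> missing(Path tmpDir) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!Files.isRegularFile(tmpDir.resolve(name))) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumes the working directory is the repository root.
        List<String> absent = missing(Path.of("tmp"));
        if (absent.isEmpty()) {
            System.out.println("All inputs present in tmp/.");
        } else {
            System.out.println("Missing in tmp/: " + absent);
        }
    }
}
```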

Benchmark

The input data consists of the abstracts of all pages in the Japanese Wikipedia. See https://github.com/koron/wpwordtool#abstract-sub-command for details.

$ ./gradlew wikidict-benchmark:benchmarkMatcher -Pargs=../tmp/abstract.txt
benchmark with file:../tmp/abstract.txt
control:
  total: 0.504289 seconds
  average_per_line: 435 nanoseconds
  lineCount: 1157686
matcher:
  total: 11.130144 seconds
  average_per_line: 9614 nanoseconds
  lineCount: 1157686
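The reported averages are consistent with total time divided by line count, truncated to whole nanoseconds (for example, 0.504289 s / 1,157,686 lines is about 435.6 ns, reported as 435). The Java sketch below reproduces that arithmetic and derives two figures the output does not print directly: the matcher's per-line time net of the control run, and the ratio of the two total times. Treating the control run as a baseline that can be subtracted is an assumption about what it measures; the class and method names are hypothetical.

```java
/** Reproduces the per-line averages in the benchmark output above (sketch). */
public final class BenchmarkMath {
    private BenchmarkMath() {}

    /** Average nanoseconds per line, truncated the way the reported figures appear to be. */
    static long averageNanosPerLine(double totalSeconds, long lineCount) {
        long totalNanos = Math.round(totalSeconds * 1e9); // seconds -> whole nanoseconds
        return totalNanos / lineCount;                    // integer division truncates
    }

    public static void main(String[] args) {
        long lines = 1_157_686L;
        long control = averageNanosPerLine(0.504289, lines);  // 435
        long matcher = averageNanosPerLine(11.130144, lines); // 9614
        // Assumption: the control run measures per-line overhead shared with the matcher run.
        System.out.println("matcher minus control: " + (matcher - control) + " ns/line"); // 9179
        System.out.printf("matcher/control total-time ratio: %.1f%n", 11.130144 / 0.504289);
    }
}
```

Under that assumption, matching itself accounts for roughly 9.2 µs per line, and the matcher run takes about 22 times as long as the control run overall.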
