tokenizer.SysDicIPASimple() causes out of memory error
ikawaha commented
go 1.13.0, kagome 1.11.0, Debian 9.9 (Chromebook)
https://twitter.com/shibu_jp/status/1178466763995959296?s=20
package main

import (
	"github.com/ikawaha/kagome/tokenizer"
	"testing"
)

// Both values are built at package initialization, before any test runs.
var doc = tokenizer.SysDicIPASimple()
var kagomeTokenizer = tokenizer.NewWithDic(doc)

func TestSample(t *testing.T) {
	t.Log("hello")
}
ikawaha commented
In my environment, the above code did not cause an error.
I'm looking for cases where similar errors occur.
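For anyone trying to reproduce this on a memory-constrained machine, it may help to report how much heap the dictionary load actually adds. The following is only a rough sketch using the standard `runtime` package; `heapGrowth` is a hypothetical helper, and the 8 MiB allocation is a stand-in for the real `tokenizer.SysDicIPASimple()` call so that the example runs without kagome installed.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapGrowth runs load and returns its result together with the number of
// bytes by which the live heap grew. The figure is approximate: other
// goroutines and runtime bookkeeping can shift it slightly.
func heapGrowth(load func() interface{}) (interface{}, uint64) {
	var before, after runtime.MemStats

	runtime.GC() // drop garbage so the baseline reflects live objects only
	runtime.ReadMemStats(&before)

	v := load()

	runtime.GC() // v is still referenced, so it survives this collection
	runtime.ReadMemStats(&after)

	if after.HeapAlloc < before.HeapAlloc {
		return v, 0
	}
	return v, after.HeapAlloc - before.HeapAlloc
}

func main() {
	// Stand-in loader (assumption): replace the body with
	// `return tokenizer.SysDicIPASimple()` to measure the real dictionary.
	dic, grew := heapGrowth(func() interface{} {
		return make([]byte, 8<<20)
	})

	fmt.Printf("heap grew by about %d MiB\n", grew>>20)
	runtime.KeepAlive(dic)
}
```

Posting this number alongside the machine's available memory would make it easier to tell whether the dictionary itself is the cause or something else is.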
KEINOS commented
I can confirm that the code does not cause an error in the environment below.
- go 1.13.0, kagome 1.11.0, Debian 9.9 (Docker over macOS)
I think the error had another cause. +1 to closing this issue
until a reproducible error comes up.
- Log
$ tree
.
├── Dockerfile
├── go.mod
├── go.sum
├── main.go
└── main_test.go
0 directories, 5 files
$ docker build -t test:local .
....
$ docker run --rm test:local
Run main
BOS(0, 0)DUMMY[-1]
私(0, 1)KNOWN[304999]
は(1, 2)KNOWN[57061]
太郎(2, 4)KNOWN[181027]
です(4, 6)KNOWN[47492]
。(6, 7)KNOWN[98]
EOS(7, 7)DUMMY[-1]
Run test
ok kagome/sample 1.415s
$ docker run --rm --entrypoint cat test:local /etc/debian_version
9.9
$ docker --version
Docker version 20.10.7, build f0df350
$ sw_vers
ProductName: Mac OS X
ProductVersion: 10.15.7
BuildVersion: 19H1217
Dockerfile
# Available Images see: https://golang.org/dl/
ARG VER_GO='1.13'
ARG VER_OS='9.9'
FROM debian:${VER_OS}
ARG VER_GO
ENV \
    GO111MODULE=on \
    PATH="${PATH}:/usr/local/go/bin"
# Install Go
RUN \
    apt update && \
    apt install -y wget && \
    name_archive="go${VER_GO}.linux-amd64.tar.gz" && \
    wget "https://golang.org/dl/${name_archive}" && \
    rm -rf /usr/local/go && \
    tar -C /usr/local -xzf "./${name_archive}" && \
    go version && \
    rm -rf "./${name_archive}"
COPY . /workspace
WORKDIR /workspace
RUN \
    go mod download
ENTRYPOINT echo 'Run main' && go run . && echo 'Run test' && go test .
go.mod / go.sum
module kagome/sample
go 1.13
require github.com/ikawaha/kagome v1.11.0
github.com/ikawaha/kagome v1.11.0 h1:mJ3W/SSDaDnmx1W2PaJsdTpab/mCeRgp586jXuYoh3Y=
github.com/ikawaha/kagome v1.11.0/go.mod h1:eEV1yEy8Hm2eJRMz6nU1OlbrafRqXTECbsmm9aUMX2s=
main.go / main_test.go
// main.go
package main

import (
	"fmt"

	"github.com/ikawaha/kagome/tokenizer"
)

var doc = tokenizer.SysDicIPASimple()
var kagomeTokenizer = tokenizer.NewWithDic(doc)

func main() {
	Sample()
}

// Sample tokenizes a fixed sentence and prints each token.
func Sample() {
	text := "私は太郎です。"
	tokens := kagomeTokenizer.Tokenize(text)
	for _, token := range tokens {
		fmt.Printf("%v\n", token)
	}
}

// main_test.go
package main

import (
	"testing"
)

func TestSample(t *testing.T) {
	t.Log("hello")
	Sample()
}
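Both the reported code and the reproduction above build the dictionary in a package-level `var`, so it is loaded during package initialization, even when a test never tokenizes anything. If memory turns out to be tight on some machine, one possible workaround (not something proposed in this thread) is to defer the load until first use with `sync.Once`. The sketch below uses a hypothetical `dict` type and `loadDict` function as stand-ins for the kagome dictionary and `tokenizer.SysDicIPASimple()`, so that it runs without the library.

```go
package main

import (
	"fmt"
	"sync"
)

// dict is a hypothetical stand-in for the dictionary value that
// tokenizer.SysDicIPASimple() returns in the real program.
type dict struct {
	entries []string
}

var (
	dictOnce   sync.Once
	sharedDict *dict
	loadCount  int // counts loads, only to make the behavior visible
)

// loadDict is a stand-in for the expensive dictionary load (assumption).
func loadDict() *dict {
	loadCount++
	return &dict{entries: []string{"私", "は", "太郎", "です", "。"}}
}

// getDict loads the dictionary on the first call and returns the same
// instance on every later call. It is safe for concurrent use.
func getDict() *dict {
	dictOnce.Do(func() {
		sharedDict = loadDict()
	})
	return sharedDict
}

func main() {
	fmt.Println("loads before first use:", loadCount)

	a := getDict()
	b := getDict()

	fmt.Println("same instance:", a == b)
	fmt.Println("loads after two calls:", loadCount)
}
```

With this shape, a test binary that only runs `t.Log("hello")` never pays for the dictionary, while code that does tokenize still shares a single copy.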