stain / jena-docker

Docker image for Apache Jena riot



tdbloader2 requires GNU find and sort

danmichaelo opened this issue

tdbloader2 fails with errors because the BusyBox versions of find and sort lack options it relies on (find -quit and sort --buffer-size):

See full output below.

After I ran apk update && apk add coreutils findutils to install the GNU versions, tdbloader2 ran without errors.
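For anyone hitting the same errors, a minimal preflight sketch follows. It is an assumption, not part of jena-docker or Jena: it probes whether the installed find accepts -quit and whether sort accepts --buffer-size=50% (the two options rejected in the log below), and only falls back to the apk command above when a probe fails. The function names find_supports_quit, sort_supports_buffer_size and ensure_gnu_find_sort are hypothetical.

```shell
#!/bin/sh
# Hypothetical preflight sketch (not part of jena-docker or Jena).
# Probes for the GNU options tdbloader2 used in the failing run and
# installs GNU coreutils/findutils on Alpine only if they are missing.

# Succeeds if find accepts the GNU -quit action (BusyBox 1.24 rejects it).
find_supports_quit() {
    find . -maxdepth 0 -quit >/dev/null 2>&1
}

# Succeeds if sort accepts the GNU long option --buffer-size
# (BusyBox 1.24 reports "unrecognized option: buffer-size=50%").
sort_supports_buffer_size() {
    sort --buffer-size=50% </dev/null >/dev/null 2>&1
}

# Hypothetical helper: no-op when both probes pass, otherwise installs
# the GNU tools with the same apk command used in this report.
ensure_gnu_find_sort() {
    if find_supports_quit && sort_supports_buffer_size; then
        echo "GNU find/sort options available"
        return 0
    fi
    echo "Installing GNU coreutils and findutils" >&2
    apk update && apk add coreutils findutils
}
```

Probing behavior rather than checking whether find or sort is a BusyBox symlink keeps the check valid if a later BusyBox build gains these options. In a Dockerfile, the unconditional apk add from this report is simpler.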

bash-4.3# /jena/bin/tdbloader2 --loc /fuseki/databases/ds /staging/bibbi-aut.ttl
 13:05:13 INFO -- TDB Bulk Loader Start
find: unrecognized: -quit
BusyBox v1.24.2 (2017-01-18 14:13:46 GMT) multi-call binary.

Usage: find [-HL] [PATH]... [OPTIONS] [ACTIONS]

Search for files and perform actions on them.
First failed action stops processing of current file.
Defaults: PATH is current directory, action is '-print'

        -L,-follow      Follow symlinks
        -H              ...on command line only
        -xdev           Don't descend directories on other filesystems
        -maxdepth N     Descend at most N levels. -maxdepth 0 applies
                        actions to command line arguments only
        -mindepth N     Don't act on first N levels
        -depth          Act on directory *after* traversing it

Actions:
        ( ACTIONS )     Group actions for -o / -a
        ! ACT           Invert ACT's success/failure
        ACT1 [-a] ACT2  If ACT1 fails, stop, else do ACT2
        ACT1 -o ACT2    If ACT1 succeeds, stop, else do ACT2
                        Note: -a has higher priority than -o
        -name PATTERN   Match file name (w/o directory name) to PATTERN
        -iname PATTERN  Case insensitive -name
        -path PATTERN   Match path to PATTERN
        -ipath PATTERN  Case insensitive -path
        -regex PATTERN  Match path to regex PATTERN
        -type X         File type is X (one of: f,d,l,b,c,...)
        -perm MASK      At least one mask bit (+MASK), all bits (-MASK),
                        or exactly MASK bits are set in file's mode
        -mtime DAYS     mtime is greater than (+N), less than (-N),
                        or exactly N days in the past
        -mmin MINS      mtime is greater than (+N), less than (-N),
                        or exactly N minutes in the past
        -newer FILE     mtime is more recent than FILE's
        -inum N         File has inode number N
        -user NAME/ID   File is owned by given user
        -group NAME/ID  File is owned by given group
        -size N[bck]    File size is N (c:bytes,k:kbytes,b:512 bytes(def.))
                        +/-N: file size is bigger/smaller than N
        -links N        Number of links is greater than (+N), less than (-N),
                        or exactly N
        -prune          If current file is directory, don't descend into it
If none of the following actions is specified, -print is assumed
        -print          Print file name
        -print0         Print file name, NUL terminated
        -exec CMD ARG ; Run CMD with all instances of {} replaced by
                        file name. Fails if CMD exits with nonzero
        -exec CMD ARG + Run CMD with {} replaced by list of file names
        -delete         Delete current file/directory. Turns on -depth option
 13:05:13 INFO Data Load Phase
 13:05:13 INFO Got 1 data files to load
 13:05:13 INFO Data file 1: /staging/bibbi-aut.ttl
INFO  Load: /staging/bibbi-aut.ttl -- 2019/10/31 13:05:14 GMT
INFO  Add: 50,000 Data (Batch: 40,584 / Avg: 40,584)
INFO  Add: 100,000 Data (Batch: 81,967 / Avg: 54,288)
INFO  Add: 150,000 Data (Batch: 133,689 / Avg: 67,689)
INFO  Add: 200,000 Data (Batch: 76,452 / Avg: 69,686)
INFO  Add: 250,000 Data (Batch: 110,132 / Avg: 75,210)
INFO  Add: 300,000 Data (Batch: 140,449 / Avg: 81,521)
INFO  Total: 334,010 tuples : 4.08 seconds : 81,925.43 tuples/sec [2019/10/31 13:05:18 GMT]
 13:05:18 INFO Data Load Phase Completed
 13:05:18 INFO Index Building Phase
 13:05:18 INFO Creating Index SPO
 13:05:18 INFO Sort SPO
sort: unrecognized option: buffer-size=50%
BusyBox v1.24.2 (2017-01-18 14:13:46 GMT) multi-call binary.

Usage: sort [-nrugMcszbdfimSTokt] [-o FILE] [-k start[.offset][opts][,end[.offset][opts]] [-t CHAR] [FILE]...

Sort lines of text

        -b      Ignore leading blanks
        -c      Check whether input is sorted
        -d      Dictionary order (blank or alphanumeric only)
        -f      Ignore case
        -g      General numerical sort
        -i      Ignore unprintable characters
        -M      Sort month
        -n      Sort numbers
        -o      Output to file
        -t CHAR Field separator
        -k N[,M] Sort by Nth field
        -r      Reverse sort order
        -s      Stable (don't sort ties alphabetically)
        -u      Suppress duplicate lines
        -z      Lines are terminated by NUL, not newline
        -mST    Ignored for GNU compatibility
 13:05:18 ERROR Failed during data phase

Thanks, hopefully fixed by #27 and the latest builds trickling in now.