IBMStreams / streamsx.nlp

Provides operations for text analysis, such as lemmatization and text annotation with UIMA Ruta scripts or existing project-specific UIMA PEAR files.

Home Page: https://ibmstreams.github.io/streamsx.nlp/

Evolve operator "file" parameters and samples for IBM cloud

markheger opened this issue

  • Analyse the different kinds of operator parameters that deal with files or directories.
  • Relative path/file parameters should be resolved from the application directory.
  • Sample applications with sample input files: files should be read from directories other than data, e.g. opt or etc, and the files must be present in the sab file so that the sample is prepared for the Streaming Analytics service in IBM Cloud.

This breaks the former convention that a relative path is rooted at the data directory. The data directory can be moved away from its default location at compile time and at job submission. This feature is lost if all files are rooted at the application directory.

  • Resolution relative to the application directory is valid for configuration files only, not for data files.
  • Operators dealing with input and output files, like FileSource and FileSink (which are used in some composites of this toolkit), still work with files relative to the data directory. Unfortunately, this does not work well in the Streaming Analytics service, for two reasons:
    a) the data directory is not part of the sab file by default
    b) the default data directory is write-protected in the Streaming Analytics service
    Therefore the samples need to be prepared for the Streaming Analytics service.
    The current samples should be copied to the test directory, because the test cases need to compare files in the project's data directory.
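
The convention described above (configuration files rooted at the application directory, data files rooted at the data directory) could be sketched roughly as follows. This is only an illustration in Python, not the toolkit's implementation: the function name `resolve_file_param`, the `kind` labels "config" and "data", and the example directories are all hypothetical.

```python
from pathlib import PurePosixPath


def resolve_file_param(value, kind, application_dir, data_dir):
    """Resolve a file parameter value to a full path.

    kind is a hypothetical label: "config" for configuration files,
    "data" for input/output data files.
    """
    path = PurePosixPath(value)
    # Absolute paths are used unchanged, whatever the kind.
    if path.is_absolute():
        return path
    if kind == "config":
        base = PurePosixPath(application_dir)
    elif kind == "data":
        base = PurePosixPath(data_dir)
    else:
        raise ValueError(f"unknown file kind: {kind!r}")
    # Relative paths are rooted at the directory chosen for this kind.
    return base / path
```

With these assumptions, a dictionary given as `etc/dictionary.txt` would resolve below the application directory, while an output file given as `out.txt` would still resolve below the data directory, so relocating the data directory keeps working for data files.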

Changes for version 1.4.0:

  • UIMA/Ruta operators: relative paths for the pearFile parameter are supported from applicationDir and applicationDir/etc
  • DictionaryFilter: relative paths for the dictionaryFile parameter are supported from applicationDir (the former resolution relative to dataDir is no longer supported)
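
A lookup across the two locations named for pearFile might look like the sketch below. The search order (applicationDir first, then applicationDir/etc) is an assumption, since the issue does not state which location takes precedence, and `find_pear_file` is a hypothetical helper name rather than anything in the toolkit.

```python
from pathlib import Path


def find_pear_file(value, application_dir):
    """Return the first existing candidate for a pearFile value, or None."""
    path = Path(value)
    # Absolute paths are checked as given.
    if path.is_absolute():
        return path if path.is_file() else None
    app = Path(application_dir)
    # Assumed order: applicationDir first, then applicationDir/etc.
    for candidate in (app / path, app / "etc" / path):
        if candidate.is_file():
            return candidate
    return None
```

Returning None instead of raising leaves it to the caller to decide how a missing PEAR file should be reported.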

Changes for version 1.4.0:

  • TfIdfWeight: relative paths for the corpusFile parameter are supported from applicationDir (the former resolution relative to dataDir is no longer supported)