dlwh / epic

**Archived** Epic is a high performance statistical parser written in Scala, along with a framework for building complex structured prediction models.

Home page: http://scalanlp.org/

Documentation and knowledge base are incredibly poor

MarcusSjolin opened this issue.

Can there be some more ways of gaining knowledge please?

At least documented code?

More examples?

The introduction clip on YouTube (https://www.youtube.com/watch?v=rpfVtRqQ4_o) explains that you can create your own features, but not how to use them. For this project to gain traction, there needs to be more information. I would help write guides, etc., but I just can't get started: there's nothing to refer to.

/Marcus

Sorry about that. I agree the documentation is pretty shoddy. What would you like to be able to do?

Did you look at https://github.com/dlwh/epic-demo ?

My biggest problem, I guess, is knowing what I can combine, what goes where, and how things integrate with each other.

I'd like to know how to implement a simple feature to use when going through a text.

I'd like to know how to use multiple custom ones.

I've seen the Epic demos, and they all work.

What do these represent?
preprocess?

  • Do something with the data before running something on it, but what can be achieved here?

slab?

  • A data source that you can do something with?

models?

  • Reference to a set of features that can pick out certain things in a text? (Are the prebuilt ones language feature detectors?)

parser?

  • Something that goes through the text to work out what is necessary?

trees?

  • A representation of what words are, like noun and after that there's a verb etc?

sequences?

  • Segment data to pick up if it is a set of two words or one?

I think it would be much easier to get started if some of these concepts were explained: why they are there and what I can do with them. If I'm looking for a certain feature, where should I look?

Might be a lot to answer, but I do think you've got something useful here, and I'd like to see it developed further!

/Marcus

Thanks. That is helpful.

At the moment, the internals of Epic (making features, etc) are kind of
targeted at people with a good bit of NLP ML expertise. Really some of the
external bits are too. I would like to make it more friendly, but it's a
long way from that, obviously.

On Sun, Dec 21, 2014 at 3:16 PM, Marcus Sjölin notifications@github.com
wrote:

> My biggest problem, I guess, is knowing what I can combine, what goes where,
> and how things integrate with each other.
>
> I'd like to know how to implement a simple feature to use when going
> through a text.

I'm not sure what you mean here.

> I'd like to know how to use multiple custom ones.

Featurizers in Epic can be added together with the "+" operator to create
composite featurizers.
"Featurizers" turn a sentence into a set of features. I think you might
have a misconception about what I mean by features (I mean it in the standard
ML sense): a feature is a property of (part of) an input data point, such as
a sentence, that can be used to predict the appropriate output.
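To make "features" and the "+" composition concrete, here is a minimal sketch in plain Scala. The `ToyFeaturizer` trait and the three featurizers are hypothetical illustrations, not Epic's actual featurizer classes. Each one maps a word position in a sentence to a set of string features, and `+` builds a composite featurizer whose output is the combined features of both sides.

```scala
// Hypothetical toy featurizer: NOT Epic's API, just the idea behind it.
trait ToyFeaturizer {
  // Features describing the word at position `pos` in `words`.
  def features(words: IndexedSeq[String], pos: Int): Seq[String]

  // Composite featurizer: concatenates the features of `this` and `other`.
  def +(other: ToyFeaturizer): ToyFeaturizer = {
    val self = this
    new ToyFeaturizer {
      def features(words: IndexedSeq[String], pos: Int): Seq[String] =
        self.features(words, pos) ++ other.features(words, pos)
    }
  }
}

// The word itself, lowercased.
object WordIdentity extends ToyFeaturizer {
  def features(words: IndexedSeq[String], pos: Int): Seq[String] =
    Seq(s"word=${words(pos).toLowerCase}")
}

// The last three characters of the word.
object Suffix3 extends ToyFeaturizer {
  def features(words: IndexedSeq[String], pos: Int): Seq[String] =
    Seq(s"suffix=${words(pos).takeRight(3)}")
}

// Fires only when the word starts with an uppercase letter.
object IsCapitalized extends ToyFeaturizer {
  def features(words: IndexedSeq[String], pos: Int): Seq[String] =
    if (words(pos).headOption.exists(_.isUpper)) Seq("capitalized") else Seq.empty
}

object FeaturizerDemo {
  // Several custom featurizers combined into one.
  val combined: ToyFeaturizer = WordIdentity + Suffix3 + IsCapitalized

  def main(args: Array[String]): Unit = {
    val sentence = IndexedSeq("Marcus", "reads", "documentation")
    for (i <- sentence.indices)
      println(s"${sentence(i)} -> ${combined.features(sentence, i).mkString(", ")}")
  }
}
```

For "Marcus" the combined featurizer yields `word=marcus`, `suffix=cus`, and `capitalized`. A learning algorithm then assigns a weight to each such feature (paired with a possible output) and uses those weights for prediction.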

> I've seen the Epic demos, and they all work.
>
> What do these represent?
> preprocess?
>
>   • Do something with the data before running something on it, but what
>     can be achieved here?

preprocess can:

  1. Segment text into sentences:
    val segmenter = MLSentenceSegmenter.bundled().get
    segmenter.segment(text)

  2. Tokenize sentences into words and punctuation:
    epic.preprocess.tokenize(sentence)

  3. Do both at once (epic.preprocess.preprocess), as demonstrated in the demo.

  4. Extract content from arbitrary files or URLs using Apache Tika
    (epic.extractText(url))

> slab?
>
>   • A data source that you can do something with?

Slabs hold annotations (parse trees, named entities, etc.) for a text in a
uniform way. We're actually reworking them, so don't put a lot of effort
into learning them.

> models?
>
>   • Reference to a set of features that can pick out certain things in a
>     text? (Are the prebuilt ones language feature detectors?)

Something like that. Models refer to the result of a machine learning
algorithm: a featurizer, some weights, and a dynamic program that can
build structures (such as parse trees) over a text. (I overload terminology
and sometimes use "model" to mean everything except the weights.)
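The "featurizer + weights + dynamic program" description can be illustrated with a toy part-of-speech tagger in plain Scala. Everything here is hypothetical: the two labels, the feature names, and the hand-set weights are made up for illustration (a real model learns its weights from data), and this is not how Epic is implemented. The featurizer produces features for each word, the weights score each (feature, label) pair plus label-to-label transitions, and a Viterbi dynamic program finds the highest-scoring label sequence.

```scala
// Toy "model" = featurizer + weights + dynamic program (Viterbi).
// All names and weights are hypothetical; this is NOT Epic's API.
object ToyModel {
  type Featurizer = (IndexedSeq[String], Int) => Seq[String]

  // 1. Featurizer: features of the word at position i.
  val featurize: Featurizer = (words, i) =>
    Seq(s"word=${words(i).toLowerCase}", s"suffix=${words(i).takeRight(2)}")

  val labels: IndexedSeq[String] = IndexedSeq("N", "V")

  // 2. Weights: hand-set here; a trained model would learn them.
  val emissionWeights: Map[(String, String), Double] = Map(
    ("word=dogs", "N") -> 2.0,
    ("word=bark", "V") -> 1.5,
    ("word=bark", "N") -> 1.0,
    ("suffix=gs", "N") -> 0.5
  )
  val transitionWeights: Map[(String, String), Double] = Map(
    ("N", "V") -> 1.0,
    ("N", "N") -> -0.5
  )

  // Score of giving word i the label `label`: sum of its feature weights.
  def emission(words: IndexedSeq[String], i: Int, label: String): Double =
    featurize(words, i).map(f => emissionWeights.getOrElse((f, label), 0.0)).sum

  def transition(prev: String, cur: String): Double =
    transitionWeights.getOrElse((prev, cur), 0.0)

  // 3. Dynamic program: best(i)(l) is the best score of any labeling of
  // words 0..i that ends with label l; back(i)(l) remembers the previous label.
  def decode(words: IndexedSeq[String]): IndexedSeq[String] = {
    val n = words.length
    if (n == 0) return IndexedSeq.empty
    val best = Array.ofDim[Double](n, labels.length)
    val back = Array.ofDim[Int](n, labels.length)
    for (l <- labels.indices) best(0)(l) = emission(words, 0, labels(l))
    for (i <- 1 until n; l <- labels.indices) {
      val (score, prev) = labels.indices
        .map(p => (best(i - 1)(p) + transition(labels(p), labels(l)), p))
        .maxBy(_._1)
      best(i)(l) = score + emission(words, i, labels(l))
      back(i)(l) = prev
    }
    // Follow the back-pointers from the best final label.
    var cur = labels.indices.maxBy(l => best(n - 1)(l))
    val path = Array.fill(n)(0)
    for (i <- (n - 1) to 0 by -1) {
      path(i) = cur
      if (i > 0) cur = back(i)(cur)
    }
    path.toIndexedSeq.map(labels(_))
  }
}
```

With these made-up weights, `ToyModel.decode(IndexedSeq("Dogs", "bark"))` returns the labels `N`, `V`. The transition weight for N followed by V is what lets "bark" come out as a verb even though it also has some weight as a noun.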

> parser?
>
>   • Something that goes through the text to work out what is necessary?

Parsers produce parse trees, as below.

> trees?
>
>   • A representation of what words are, like noun and after that there's
>     a verb etc?

That and how the words are related to one another: what are the noun
phrases in a sentence, what verb has what object, etc.
http://en.wikipedia.org/wiki/Parse_tree

If you didn't know what these were going in, they will probably not be
useful to you. I'm working in the background on a format that's more
useful to laymen, but it will be some time.
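For readers new to parse trees, here is a minimal sketch of the data structure. A tree is a labeled node with children, and phrases such as noun phrases (NP) are simply subtrees. The `Tree` class is a hypothetical toy, not Epic's tree type. The example tree is hand-written using Penn Treebank-style labels, not produced by a parser.

```scala
// Hypothetical toy parse tree: NOT Epic's tree class.
final case class Tree(label: String, children: List[Tree] = Nil) {
  def isLeaf: Boolean = children.isEmpty

  // The words under this node, left to right (leaves hold the words).
  def words: List[String] =
    if (isLeaf) List(label) else children.flatMap(_.words)

  // The text of every subtree whose label is `target`, in preorder.
  def phrases(target: String): List[String] = {
    val here = if (!isLeaf && label == target) List(words.mkString(" ")) else Nil
    here ++ children.flatMap(_.phrases(target))
  }
}

object TreeDemo {
  // (S (NP (NNP Marcus)) (VP (VBD read) (NP (DT the) (NN documentation))))
  val tree: Tree = Tree("S", List(
    Tree("NP", List(Tree("NNP", List(Tree("Marcus"))))),
    Tree("VP", List(
      Tree("VBD", List(Tree("read"))),
      Tree("NP", List(
        Tree("DT", List(Tree("the"))),
        Tree("NN", List(Tree("documentation")))
      ))
    ))
  ))

  def main(args: Array[String]): Unit = {
    println(tree.phrases("NP")) // the noun phrases in the sentence
    println(tree.phrases("VP")) // the verb phrase: the verb plus its object
  }
}
```

Here `tree.phrases("NP")` gives "Marcus" and "the documentation", and the VP subtree shows which object ("the documentation") belongs to the verb "read". This is the "how the words are related" information that a flat list of word tags does not carry.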

> sequences?
>
>   • Segment data to pick up if it is a set of two words or one?

There are two kinds of predictions we have under sequences: those that
assign a label to every word (e.g., part-of-speech tags like noun, verb,
etc.), and those that assign a label to disjoint contiguous sequences of
words (e.g., which phrases are people, places, or things).
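The two kinds of sequence output can be shown with toy Scala types. `Span` and `SequenceShapes` are hypothetical, not Epic's classes. The conversion function assumes the common BIO encoding, in which `B-X` begins a span of type X, `I-X` continues it, and `O` is outside any span. That encoding is an assumption here and is not something stated in this thread. It illustrates how per-word labels can represent disjoint spans.

```scala
// Hypothetical toy types: NOT Epic's classes.
final case class Span(begin: Int, end: Int, label: String) // `end` is exclusive

object SequenceShapes {
  val words: IndexedSeq[String] = IndexedSeq("Marcus", "lives", "in", "New", "York")

  // Kind 1: one label per word (e.g. part-of-speech tags).
  val posTags: IndexedSeq[String] = IndexedSeq("NNP", "VBZ", "IN", "NNP", "NNP")

  // Kind 2: labels on disjoint contiguous spans (e.g. named entities).
  val entities: IndexedSeq[Span] = IndexedSeq(Span(0, 1, "PERSON"), Span(3, 5, "LOCATION"))

  // Assumed BIO encoding -> spans. An "I-X" that does not continue an open
  // X span is treated leniently as the start of a new span.
  def bioToSpans(tags: IndexedSeq[String]): IndexedSeq[Span] = {
    val out = IndexedSeq.newBuilder[Span]
    var start = -1   // start of the open span, or -1 if none is open
    var current = "" // label of the open span
    def close(end: Int): Unit =
      if (start >= 0) { out += Span(start, end, current); start = -1 }
    for ((tag, i) <- tags.zipWithIndex) {
      val continues = tag.startsWith("I-") && start >= 0 && tag.drop(2) == current
      if (!continues) {
        close(i)
        if (tag != "O") { start = i; current = tag.drop(2) }
      }
    }
    close(tags.length)
    out.result()
  }

  // The words covered by each entity span.
  def spanTexts(spans: IndexedSeq[Span]): IndexedSeq[String] =
    spans.map(s => words.slice(s.begin, s.end).mkString(" "))
}
```

For example, the tags `B-PERSON, O, O, B-LOCATION, I-LOCATION` convert to the same two spans as `entities`, covering "Marcus" and "New York". Two adjacent entities of the same type remain separate only because the second one starts with `B-`.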

Reply to this email directly or view it on GitHub
#18 (comment).

Thanks! That was really helpful, I think these answers were what I needed to grasp how things are connected. I now see more clearly how the process from input to output should be formed and what I can use in between. Thanks a lot!

Good going with the library as well; it seems a lot of work has gone into this.

/Marcus