Documentation and knowledge base are incredibly poor
MarcusSjolin opened this issue · comments
Can there be some more ways of gaining knowledge please?
At least documented code?
More examples?
The introduction clip on YouTube (https://www.youtube.com/watch?v=rpfVtRqQ4_o) explains that you can create your own features, but not how to use them. For this project to gain traction, there needs to be more information. I would help write guides etc., but I just can't get going with it; there's nothing to reference.
/Marcus
Sorry about that. I agree the documentation is pretty shoddy. What would you like to be able to do?
Did you look at https://github.com/dlwh/epic-demo ?
My biggest problem, I guess, is knowing what I can combine, what goes where, and how things integrate with each other.
I'd like to know how to implement a simple feature to use when going through a text.
I'd like to know how to use multiple custom ones.
I've seen the Epic demos, and they all work
What are these representing?
preprocess?
- Do something with the data before running something on it, but what can be achieved here?
slab?
- A data source that you can do something with?
models?
- A reference to a set of features that can pick out certain things in a text? (Are the pre-built ones language feature detectors?)
parser?
- Something that goes through the text to work out what is necessary?
trees?
- A representation of what words are, like a noun and after that a verb, etc.?
sequences?
- Segment data to pick up if it is a set of two words or one?
For some of these concepts, I think it would be much easier to get started if they were explained: why they are there, and what I can do with them. If I'm looking for a certain feature, where should I look?
Might be a lot to answer, but I do think you've got something useful here, and I'd like to see it developed further!
/Marcus
Thanks. That is helpful.
At the moment, the internals of Epic (making features, etc) are kind of
targeted at people with a good bit of NLP ML expertise. Really some of the
external bits are too. I would like to make it more friendly, but it's a
long way from that, obviously.
On Sun, Dec 21, 2014 at 3:16 PM, Marcus Sjölin notifications@github.com
wrote:
> My biggest problem I guess is to know what I can combine, what goes where and how things integrate with each other.
> I'd like to know how I implement a simple feature to use when going through a text?

I'm not sure what you mean here?
> I'd like to know how to use multiple custom ones?
Featurizers in Epic can be added together with the "+" operator to create
composite featurizers.
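To make that composition concrete, here is a self-contained toy sketch of the pattern. This is not Epic's actual Featurizer trait (the names and signatures below are made up for illustration); it only shows how a "+" combinator can merge two feature extractors:

```scala
// Hypothetical sketch of featurizer composition -- NOT Epic's actual
// Featurizer API, just an illustration of the "+" combinator pattern.
trait Featurizer {
  // Extract features for the word at position `pos` of a sentence.
  def featuresFor(words: IndexedSeq[String], pos: Int): Set[String]

  // Composite featurizer: emits the union of both feature sets.
  def +(other: Featurizer): Featurizer = {
    val self = this
    new Featurizer {
      def featuresFor(words: IndexedSeq[String], pos: Int): Set[String] =
        self.featuresFor(words, pos) ++ other.featuresFor(words, pos)
    }
  }
}

// A feature for the word identity itself.
object WordFeaturizer extends Featurizer {
  def featuresFor(words: IndexedSeq[String], pos: Int) =
    Set("word=" + words(pos).toLowerCase)
}

// A feature for the word's 3-character suffix (useful for e.g. "-ing").
object SuffixFeaturizer extends Featurizer {
  def featuresFor(words: IndexedSeq[String], pos: Int) =
    Set("suffix=" + words(pos).takeRight(3))
}
```

With these toy pieces, `(WordFeaturizer + SuffixFeaturizer).featuresFor(Vector("The", "running", "dog"), 1)` yields both a `word=` and a `suffix=` feature for "running".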
"Featurizers" turn a sentence into a set of features. I think you might have a misconception about what I mean by features (which is the standard ML terminology): a feature is a property of (part of) an input data point, like a sentence, that can be used to predict the appropriate output.

> I've seen the Epic demos, and they all work.
> What are these representing?
> preprocess?
> - Do something with the data before running something on it, but what can be achieved here?

preprocess can:
- Segment sentences:
  val segmenter = MLSentenceSegmenter.bundled().get
  segmenter.segment(text)
- Tokenize sentences into words and punctuation: epic.preprocess.tokenize(sentence)
- Do both at once (epic.preprocess.preprocess), as demonstrated in the demo.
- Extract content from arbitrary files or URLs using Apache Tika: epic.extractText(url)
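For a feel of what the segment/tokenize steps produce, here is a deliberately naive, self-contained stand-in. Epic's bundled MLSentenceSegmenter and tokenizer are trained models and far more robust; everything below is a toy approximation showing only the shape of the output:

```scala
// Naive stand-in for a preprocessing pipeline, only to show the shape of
// the output: text -> sentences -> tokens. Epic's segmenter and tokenizer
// are trained models and handle far more cases than these regexes.
object NaivePreprocess {
  // Split on sentence-final punctuation followed by whitespace.
  def segment(text: String): IndexedSeq[String] =
    text.split("""(?<=[.!?])\s+""").toIndexedSeq.filter(_.nonEmpty)

  // Split off punctuation marks as separate tokens.
  def tokenize(sentence: String): IndexedSeq[String] =
    """\w+|[^\w\s]""".r.findAllIn(sentence).toIndexedSeq

  // Both at once, analogous to what a combined preprocess step returns.
  def preprocess(text: String): IndexedSeq[IndexedSeq[String]] =
    segment(text).map(tokenize)
}
```

For example, `NaivePreprocess.preprocess("It works. Really!")` produces two sentences, each a sequence of tokens with the punctuation split off.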
> slab?
> - A data source that you can do something with?

Slabs hold annotations (parse trees, named entities, etc.) for a text in a uniform way. We're actually reworking them, so don't put a lot of effort into learning them.

> models?
> - A reference to a set of features that can pick out certain things in a text? (Are the pre-built ones language feature detectors?)

Something like that. Models refer to the result of a machine learning algorithm: a featurizer, some weights, and a dynamic program which can build structures over a text. (I overload terminology and sometimes use "model" to mean everything except the weights.)

> parser?
> - Something that goes through the text to work out what is necessary?

Parsers produce parse trees, as below.
> trees?
> - A representation of what words are, like noun and after that there's a verb etc?

That, and how the words are related to one another: what are the noun phrases in a sentence, what verb has what object, etc.
http://en.wikipedia.org/wiki/Parse_tree
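As a concrete picture, a constituency parse tree is a labeled tree whose leaves are tagged words and whose internal nodes are phrase labels. Here is a minimal hypothetical sketch of the data structure (not Epic's own Tree class):

```scala
// Minimal sketch of a constituency parse tree (hypothetical; Epic has its
// own richer Tree representation).
sealed trait Tree { def render: String }

// An internal node: a phrase label (S, NP, VP, ...) over child subtrees.
case class Node(label: String, children: List[Tree]) extends Tree {
  def render = s"($label ${children.map(_.render).mkString(" ")})"
}

// A leaf: a part-of-speech tag over a single word.
case class Leaf(tag: String, word: String) extends Tree {
  def render = s"($tag $word)"
}

object TreeDemo {
  // "The dog barks": a noun phrase and a verb phrase under a sentence node.
  val tree = Node("S", List(
    Node("NP", List(Leaf("DT", "The"), Leaf("NN", "dog"))),
    Node("VP", List(Leaf("VBZ", "barks")))))
}
```

Rendering `TreeDemo.tree` gives the familiar bracketed notation, `(S (NP (DT The) (NN dog)) (VP (VBZ barks)))`, which is the kind of structure a parser predicts for a sentence.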
If you didn't know what these were going in, they will probably not be
useful to you---I'm working in the background on a format that's more
useful to laymen, but it will be some time.
> sequences?
> - Segment data to pick up if it is a set of two words or one?

There are two kinds of predictions we have under sequences: those that assign a label to every word (e.g. part-of-speech tags like noun, verb, etc.), and those that assign a label to disjoint contiguous sequences of words (e.g. which phrases are people, places, or things).

> Some of these concepts, I think it would be much easier to get started if they can be explained. Why they are there, and what I can do with them. If I'm looking for a certain feature, where should I look?
> Might be a lot to answer, but I do think you got something useful here and I'd like to see it being developed further!
> /Marcus
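The two shapes of sequence prediction described above (a label per word vs. labels over disjoint contiguous spans) can be illustrated with a toy sketch. The types and rules here are invented for illustration; Epic's actual taggers and segmenters are trained models:

```scala
// Hypothetical sketch of the two sequence-prediction shapes; the rules
// are toys, but the OUTPUT shapes match the two cases described above.
object SequenceShapes {
  // Kind 1: a label for every word (e.g. part-of-speech tagging).
  // Toy rule: capitalized words get "NNP", everything else "XX".
  def tagEveryWord(words: IndexedSeq[String]): IndexedSeq[(String, String)] =
    words.map(w => w -> (if (w.headOption.exists(_.isUpper)) "NNP" else "XX"))

  // Kind 2: labels over disjoint contiguous spans (e.g. named entities).
  // A span is (label, startIndex, endIndexExclusive).
  // Toy rule: maximal runs of capitalized words form one "ENTITY" span.
  def labelSpans(words: IndexedSeq[String]): List[(String, Int, Int)] = {
    val spans = scala.collection.mutable.ListBuffer.empty[(String, Int, Int)]
    var i = 0
    while (i < words.length) {
      if (words(i).headOption.exists(_.isUpper)) {
        var j = i
        while (j < words.length && words(j).headOption.exists(_.isUpper)) j += 1
        spans += (("ENTITY", i, j))
        i = j
      } else i += 1
    }
    spans.toList
  }
}
```

For "visit New York soon", the per-word tagger labels all four words, while the span labeler returns a single span covering "New York" (indices 1 to 3, end exclusive): the two words form one labeled unit rather than two.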
Thanks! That was really helpful, I think these answers were what I needed to grasp how things are connected. I now see more clearly how the process from input to output should be formed and what I can use in between. Thanks a lot!
Good going with the library as well; there seems to be a lot of work put into this.
/Marcus