Invalid Answers

Question

opened this issue 5 years ago · comments

When asking: "Who was the first president of the United States?"

The answer is:

Normally vice presidents hold some power and special responsibilities below that of the president. The amendment also specifies that if any eligible person serves as president or acting president for more than two years of a term for which some other eligible person was elected president, the former can only be elected president once. Mitt Romney for president. Perhaps the best known sub-national presidents are the borough presidents of the Five Boroughs of New York City. The president fulfills various ceremonial duties.

Shirish Kadam · Answer 1 · Thu May 23 2019 02:50:16 GMT+0800 (China Standard Time)

@infosisio I am currently unable to maintain this project. I have identified some issues and shortcomings with it; if you are interested in contributing, I will share them with you.

Ido Roi Engel · Answer 2 · Wed May 29 2019 01:49:49 GMT+0800 (China Standard Time)

@5hirish Hey, I am interested in helping out, please let me know what needs to be done, and I'll try to do something about it :)

Shirish Kadam · Answer 3 · Sun Jun 02 2019 00:15:48 GMT+0800 (China Standard Time)

@idoroiengel That's great to hear. When I started the project, the basic outline I chalked out was a Question Answering system: you ask a question, and it performs basic NLP operations on the question, such as tokenisation, stemming, POS tagging, and dependency extraction. It tries to extract all the relevant keywords from the question, which can then be used to construct a query to search any knowledge source. After searching a knowledge source, it gets the raw data, tries to filter out irrelevant information or summarize it, generates candidate answers, and ranks them.
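To make the outline above concrete, here is a minimal sketch of the keyword-extraction and ranking steps using only the Python standard library. It is not the project's actual code: the stop-word list, the function names, and the cosine-over-term-counts ranking are all assumptions chosen for illustration, and the real system relies on heavier NLP (stemming, POS tagging, dependency extraction) that is left out here.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real system would use a fuller one).
STOP_WORDS = {
    "a", "an", "the", "of", "was", "is", "who", "what", "when", "where",
    "which", "how", "in", "on", "to", "and", "for",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_keywords(question):
    """Keep the tokens that are not stop words, preserving their order."""
    return [tok for tok in tokenize(question) if tok not in STOP_WORDS]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(question, candidates):
    """Return (score, candidate) pairs, best match first."""
    query = Counter(extract_keywords(question))
    scored = [(cosine(query, Counter(extract_keywords(c))), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


question = "Who was the first president of the United States?"
# Hypothetical candidate sentences; two are taken from the bad answer in this issue.
candidates = [
    "The president fulfills various ceremonial duties.",
    "George Washington was the first president of the United States.",
    "Mitt Romney for president.",
]

print(extract_keywords(question))
for score, sentence in rank_candidates(question, candidates):
    print(f"{score:.3f}  {sentence}")
```

With this kind of scoring, a sentence that only shares the word "president" with the question ranks well below one that shares most of the question's keywords, which is the filtering behaviour the outline aims for.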

Shirish Kadam · Answer 4 · Sun Jun 02 2019 00:22:13 GMT+0800 (China Standard Time)

Since then, my understanding of this problem statement and the different ways to solve it has changed a lot. There are many constructs in the system that can currently work against it and produce irrelevant answers such as the one above. To understand the current state of the system, I would point you to the /docs folder of the repo, where there is an architecture diagram and a white paper of the system. I will also note down a couple of issues I am aware of here in this issue. Also, the build on Travis is failing; I will look into that and try to fix it. In the meantime, you can reach out to me at my email address if you need any help with the project, with understanding its codebase, or with setting it up.

Shirish Kadam · Answer 5 · Sun Jun 02 2019 00:34:57 GMT+0800 (China Standard Time)

I compiled this list a long time ago, so I have forgotten the specifics, but it should nonetheless be a good start.

Issues with the keywords being searched on Wikipedia [Selective Search]: Irrelevant keywords are being searched on the knowledge source, which adds noise to the extracted knowledge.
Improve the keyword extraction: Work on a keyword extraction algorithm so that the current rule-based keyword extraction can be deprecated in favor of an unsupervised methodology. We can look into the dependency relations of each token and take into account its other grammatical features to identify the keywords in the question.
Search on the structured info: A lot of tabular and structured information is extracted from Wikipedia. Work on an algorithm to search nested JSON data, identify the relevant keys in it, and get their values.
Question classification: Revisit the question classification model (Support Vector Machine) and tweak it if necessary. Try to include the classified label in the keyword extraction or query construction phase to improve keyword extraction/query construction.
Information retrieval: Revisit the information retrieval phase (Vector Space Model). Can we improve it, maybe with an LSTM?
Can we leverage Elasticsearch more in the project?
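
For the "search on the structured info" item above, one possible starting point is a recursive walk over the nested JSON that collects every key matching a keyword together with the path to it. This is only a sketch under assumptions: the `find_values` helper and the sample `infobox` structure are hypothetical and do not reflect how the project actually stores the data it extracts from Wikipedia.

```python
def find_values(data, keyword, path=()):
    """Yield (path, value) pairs for every dict key that contains the keyword.

    The match is a case-insensitive substring test on the key name; nested
    dicts and lists are searched recursively.
    """
    if isinstance(data, dict):
        for key, value in data.items():
            here = path + (key,)
            if keyword.lower() in str(key).lower():
                yield here, value
            # Keep descending even after a match, since values may nest further.
            yield from find_values(value, keyword, here)
    elif isinstance(data, list):
        for index, item in enumerate(data):
            yield from find_values(item, keyword, path + (index,))


# Hypothetical infobox-like structure, made up for this example.
infobox = {
    "title": "George Washington",
    "offices": [
        {
            "office": "President of the United States",
            "term_start": "1789",
            "term_end": "1797",
        }
    ],
}

for found_path, found_value in find_values(infobox, "term"):
    print(found_path, "->", found_value)
```

Returning the path along with the value keeps enough context to tell which office a `term_start` belongs to. A keyword taken from the question, or a label from the question classifier, could then be matched against key names instead of a hard-coded string.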

Shirish Kadam · Answer 6 · Sun Jun 02 2019 00:36:58 GMT+0800 (China Standard Time)

@idoroiengel Maybe the easiest thing to start with would be upgrading the dependencies, like spacy. I would be glad if we can revive this project, and I will try to take this up more regularly!

Shirish Kadam · Answer 7 · Sun Jun 02 2019 02:03:58 GMT+0800 (China Standard Time)

Fixed build issues with Travis CI

Tharun_sai · Answer 8 · Sun Jun 02 2019 19:28:00 GMT+0800 (China Standard Time)

What is know_corp in Corpus, and how does it affect the model?

Ido Roi Engel · Answer 9 · Sun Jun 02 2019 19:44:53 GMT+0800 (China Standard Time)

@5hirish Sounds good. I have already glanced at some of the docs, and I think I got the basics. I work mostly on Android, but since I'm an MA Linguistics graduate, I want to do some NLP coding. I can take a look at the dependencies this week. I built the project successfully with the current dependencies on my local machine and ran it a few times with several queries to test the system.

Ido Roi Engel · Answer 10 · Sun Jun 02 2019 19:46:57 GMT+0800 (China Standard Time)

@5hirish do you have any specific notes for the branches of the project that I should be aware of? Also, should we continue this discussion in a different conversation?

Shirish Kadam · Answer 11 · Mon Jun 03 2019 00:39:06 GMT+0800 (China Standard Time)

@idoroiengel Currently all the branches are stale and no feature is under development, so master is the stable branch. Yes, let us carry on this conversation over email (mail@5hirish.com), Gitter, or maybe Slack.

Also, in December I was thinking of trying to implement some of the SQuAD 2.0 approaches. I think this would be a good way to kickstart the project again: going through some of the approaches from this competition and trying to implement one that suits our project and the problem we are trying to solve.

Shirish Kadam · Answer 12 · Mon Jun 03 2019 00:43:28 GMT+0800 (China Standard Time)

@TharunAts this would be an intermediate storage file to store the extracted knowledge source from Wikipedia which is later processed and ranked. Not proud of how I approached this problem at the time 😅

Deleted user · Answer 13 · Mon Jun 03 2019 00:47:43 GMT+0800 (China Standard Time)

It would be better if you keep the discussion here and not via mail so that others can view it and participate too.


Shirish Kadam · Answer 14 · Mon Jun 03 2019 00:58:40 GMT+0800 (China Standard Time)

@infosisio @idoroiengel I have created a Gitter chat for the project, which should be much more convenient for project-related discussions, since broad conversations would be quite inconvenient to carry out on a single issue. Feel free to join the Gitter chat.

Also, I created a Kanban project board here on GitHub when I was looking into the SQuAD competition, and I have documented my initial findings there. Kanban Board