hamelsmu / code_search

Code For Medium Article: "How To Create Natural Language Semantic Search for Arbitrary Objects With Deep Learning"

Home Page: https://medium.com/@hamelhusain/semantic-code-search-3cd6d244a39c

In Part 4, vectorize all of the code without docstrings.

zhaoyiCC opened this issue

I am wondering why we need to vectorize all of the code without docstrings. Could we instead use the code that the model has already seen? If we did, what problems would that cause? Thanks a lot.

It was only to demonstrate that you can find code even if the code doesn't contain a docstring.
We wanted to emphasize that you can build a representation of code in the same space as natural language even if the code doesn't contain natural language such as comments or docstrings.
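
To make the "same space" idea concrete, here is a minimal sketch of searching code by a natural-language query once both live in one vector space. The vectors are hand-written toy values, not output from the model in this repo, and the names `cosine`, `search`, `code_vecs`, and `query_vec` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_u * norm_v)


def search(query_vec, code_vecs, k=2):
    """Return the names of the k code vectors closest to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in code_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Toy vectors (assumption): in a real system these would come from a code
# encoder applied to functions that have no docstring at all.
code_vecs = {
    "read_csv_file": [0.9, 0.1, 0.0],
    "send_http_request": [0.1, 0.9, 0.1],
    "sort_list": [0.0, 0.2, 0.9],
}

# Toy vector (assumption): stands in for an encoded query such as
# "load a csv from disk".
query_vec = [0.8, 0.2, 0.1]

results = search(query_vec, code_vecs)
print(results)
```

Nothing in the search step needs the code to contain natural language; it only needs the code vector and the query vector to be comparable.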

In practice, if we were building a real semantic search system, we would definitely find another way (perhaps a separate encoder) to learn a representation of the docstring as well, and take it into account alongside the code representation.
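
One possible way to consider both representations is to blend two similarity scores at query time. This is only a sketch under assumptions: the weighted average, the `doc_weight` parameter, and the names `combined_score` and `index` are hypothetical, and the vectors are toy values rather than encoder output.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_score(query_vec, code_vec, doc_vec=None, doc_weight=0.5):
    """Blend code and docstring similarity.

    Falls back to the code similarity alone when there is no docstring,
    so undocumented code stays searchable.
    """
    code_sim = cosine(query_vec, code_vec)
    if doc_vec is None:
        return code_sim
    doc_sim = cosine(query_vec, doc_vec)
    return (1.0 - doc_weight) * code_sim + doc_weight * doc_sim


# Each entry: (name, code vector, docstring vector or None). Toy values.
index = [
    ("load_settings", [0.7, 0.6], None),         # no docstring
    ("parse_config", [0.5, 0.5], [1.0, 0.0]),    # has a docstring
]
query_vec = [1.0, 0.0]

ranked = sorted(
    index,
    key=lambda entry: combined_score(query_vec, entry[1], entry[2]),
    reverse=True,
)
print([name for name, _, _ in ranked])
```

With these toy values, `load_settings` scores higher on code similarity alone, but `parse_config` ranks first once its docstring vector is blended in.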