In Part 4, vectorize all of the code without docstrings.
zhaoyiCC opened this issue
I am wondering why we need to vectorize all of the code without docstrings. Can we instead use the code that the model has already seen? If we do, what problems would that cause? Thanks a lot.
That step was only meant to demonstrate that you can find code even if the code doesn't contain a docstring.
We wanted to emphasize that you can build a representation of code in the same space as natural language even if the code doesn't contain natural language such as comments or docstrings.
In practice, if we were building a real semantic search system, we would definitely find a way (perhaps using a separate encoder) to learn a representation of the docstring as well, and we would consider it alongside the code representation when ranking results.
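To make the idea above concrete, here is a minimal sketch of how a search could score code that has no docstring and also blend in a docstring representation when one exists. It is not the tutorial's implementation: the function names `cosine` and `search`, the `doc_weight` parameter, the snippet names, and the toy 3-dimensional vectors are all hypothetical, and a real system would get its vectors from trained encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def search(query_vec, index, doc_weight=0.5):
    """Rank snippets by similarity to the query vector.

    index maps a snippet name to (code_vec, doc_vec). doc_vec is None
    when the snippet has no docstring, in which case only the code
    vector is scored. doc_weight is an assumed blending factor.
    """
    scored = []
    for name, (code_vec, doc_vec) in index.items():
        score = cosine(query_vec, code_vec)
        if doc_vec is not None:
            # Weighted average of code similarity and docstring similarity.
            score = (1 - doc_weight) * score + doc_weight * cosine(query_vec, doc_vec)
        scored.append((score, name))
    return [name for score, name in sorted(scored, reverse=True)]

# Toy vectors invented for illustration; all live in the same shared space.
index = {
    "read_csv_file": ([0.9, 0.1, 0.0], None),              # no docstring
    "plot_histogram": ([0.1, 0.9, 0.1], [0.0, 1.0, 0.0]),  # has a docstring
    "send_email": ([0.0, 0.1, 0.9], [0.1, 0.0, 0.9]),      # has a docstring
}
query = [0.8, 0.2, 0.1]  # stands in for an encoded natural-language query
results = search(query, index)
print(results)
```

In this toy data the snippet without a docstring still ranks first, because its code vector alone lies closest to the query. That is the point the Part 4 step was making. The blended score only illustrates one simple way a docstring representation could be taken into account.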