bblfsh / bblfshd

A self-hosted server for source code parsing

Home Page: https://doc.bblf.sh

Large-scale parse 2

dennwc opened this issue

Umbrella issue for tracking failures from the recent large-scale parse.

Supersedes #268

Version information
Build information
  commit: v2.13.0
  date: 2019-05-03T14:07:46+0000
vadim@typos-1:~$ docker exec -it bblfshd bblfshctl driver list
+------------+------------------------------------------+---------+--------+----------+-------------+------------------------------+
|  LANGUAGE  |                  IMAGE                   | VERSION | STATUS | CREATED  |     GO      |            NATIVE            |
+------------+------------------------------------------+---------+--------+----------+-------------+------------------------------+
| python     | docker://bblfsh/python-driver:latest     | v2.9.0  | beta   | 2 months | 1.10-alpine | python:3.6-alpine            |
| cpp        | docker://bblfsh/cpp-driver:latest        | v1.3.0  | beta   | 2 months | 1.10-alpine | openjdk:8-jre-alpine         |
| java       | docker://bblfsh/java-driver:latest       | v2.7.0  | beta   | 2 months | 1.10-alpine | openjdk:8-jre-alpine         |
| javascript | docker://bblfsh/javascript-driver:latest | v2.8.0  | beta   | 2 months | 1.10-alpine | node:8-alpine                |
| typescript | docker://bblfsh/typescript-driver:latest | v0.8.0  | alpha  | 2 months | 1.10-alpine | node:8-alpine                |
| bash       | docker://bblfsh/bash-driver:latest       | v2.6.0  | beta   | 2 months | 1.10-alpine | openjdk:8-jre-alpine         |
| ruby       | docker://bblfsh/ruby-driver:latest       | v2.9.2  | beta   | 2 months | 1.10-alpine | ruby:2.4-alpine3.7           |
| go         | docker://bblfsh/go-driver:latest         | v2.6.0  | beta   | 2 months | 1.10-alpine | alpine:3.7                   |
| csharp     | docker://bblfsh/csharp-driver:latest     | v1.5.0  | beta   | 2 months | 1.10        | microsoft/dotnet:2.1-runtime |
| php        | docker://bblfsh/php-driver:latest        | v2.8.0  | beta   | 2 months | 1.10-alpine | php:7-alpine3.6              |
+------------+------------------------------------------+---------+--------+----------+-------------+------------------------------+
Response time 28.051674ms

Logs: log 1, log 2

Parsed: log 1, log 2 (parser)

Top unique errors: results (tool)

OK, since we now have a list of unique errors, someone should look through them and further de-duplicate related ones.

Error messages might have a similar cause, for example, missing a specific AST field or having an extra one.
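As a rough illustration of that kind of de-duplication, here is a minimal Python sketch that collapses the variable parts of a message (quoted names, numbers) into placeholders before counting. The sample message shapes and the `normalize` / `top_errors` helpers are assumptions made up for this example; they are not the format of the actual logs or part of the linked tool.

```python
import re
from collections import Counter


def normalize(message: str) -> str:
    """Replace the variable parts of an error message with placeholders."""
    # Quoted identifiers (single or double quotes) become <name>.
    message = re.sub(r"'[^']*'|\"[^\"]*\"", "<name>", message)
    # Any remaining run of digits (line numbers, sizes, ...) becomes <n>.
    message = re.sub(r"\d+", "<n>", message)
    # Collapse whitespace so spacing differences do not split a group.
    return re.sub(r"\s+", " ", message).strip()


def top_errors(messages):
    """Count messages by normalized form, most frequent first."""
    return Counter(normalize(m) for m in messages).most_common()


if __name__ == "__main__":
    # Hypothetical messages, only to show the grouping behaviour.
    sample = [
        "unused field(s) on node 'A': x at line 10",
        "unused field(s) on node 'B': x at line 22",
        "timeout after 5s",
    ]
    for text, count in top_errors(sample):
        print(count, text)
```

Messages that differ only in a node name or a position end up in the same bucket, which makes it easier to spot the ones that likely share a cause.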

A new issue should be created for each unique error type in the corresponding driver repo. Please specify the error message (or a few similar ones) and the file names that triggered it (included in the output).

I can assign this to myself, but I am more focused on understanding the drivers right now, so how urgent is this?

Not urgent, but it may uncover more issues.

Currently, both @bzz and I are busy, so we can't take on any of those issues even if they are discovered, but there may be new ones for you guys to work on.

Also note that this batch job was running an older bblfshd version, so some of the issues may be resolved already.

Just to leave a record of how I processed the files.

I scripted a tool to group errors by:

  • Language (using enry and intersecting with the list of drivers we have available -> there is no point in saying an error came from the Hack language, for example).
  • Type of error (just by manually inspecting files and coming up with creative ways of clustering them).

The tool needs the per-language folders to already exist in the same directory. It does no error handling and is kind of spaghetti code. Also, for the C++ part, due to the huge number of errors, it creates a separate file, bulk-errors-c++, containing many errors of type unused field(s) on node ..., already processed and ready to copy-paste into an issue on GitHub.
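For anyone who wants to reproduce the grouping without the original script, the idea can be sketched in Python roughly as follows. This is not the tool itself: a small hand-written extension table stands in for enry's language detection, the "error type" is naively taken as the text before the first colon, and `group_errors` plus the record shape are hypothetical. The set of supported languages is the driver list from the issue description.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: a tiny extension table replaces enry's language detection.
EXT_TO_LANG = {
    ".py": "python",
    ".cpp": "cpp",
    ".java": "java",
    ".go": "go",
    ".rb": "ruby",
    ".hh": "hack",
}

# Languages that have a driver installed (from `bblfshctl driver list`).
SUPPORTED = {
    "python", "cpp", "java", "javascript", "typescript",
    "bash", "ruby", "go", "csharp", "php",
}


def group_errors(records):
    """Group (file_path, error_message) pairs by language, then error type."""
    groups = defaultdict(lambda: defaultdict(list))
    for path, message in records:
        lang = EXT_TO_LANG.get(PurePosixPath(path).suffix.lower())
        if lang not in SUPPORTED:
            # No driver for this language, so the error is not actionable.
            continue
        # Hypothetical clustering rule: text before the first colon.
        kind = message.split(":", 1)[0].strip()
        groups[lang][kind].append(path)
    return groups
```

Calling `group_errors` on a list of `(path, message)` pairs returns a nested mapping of language to error type to file names, which is roughly the information each driver issue needs.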

To manually inspect errors for language lang, I used (inside the per-language folder) the following beautify-output.sh script and the commands:

(head -n2 ./errors-lang | ../beautify-output.sh) && sed -i 1,2d ./errors-lang

@ncordon Thanks for parsing it!

Since we now have separate issues for each unique error, we can safely close this one. Overall progress on these issues can still be tracked here, since GitHub displays the status of all linked issues.