guxd / deep-code-search

DeepCS: Deep Code Search

Problem loading the Data

dudany opened this issue

Hi, I've been trying to run the full PyTorch model, but I ran into an issue with the data.
I downloaded all the data you provided and placed it in the designated path. After running train.py, I ran the commands below and got the following errors. I hope you can help me with them:

!python repr_code.py --model JointEmbeder --reload_from 340000

NumExpr defaulting to 2 threads.
Constructing Model..
loading data...
tcmalloc: large alloc 1116725248 bytes == 0xe93ea000 @  0x7f87e6b8d1e7 0x7f87e46f35e1 0x7f87e475a420 0x7f87e47e7f87 0x50a7f5 0x50c1f4 0x507f24 0x509277 0x594b01 0x54a17f 0x5517c1 0x5a9eec 0x50a783 0x50c1f4 0x507f24 0x509c50 0x50a64d 0x50c1f4 0x507f24 0x50b053 0x634dd2 0x634e87 0x63863f 0x6391e1 0x4b0dc0 0x7f87e678ab97 0x5b26fa
tcmalloc: large alloc 1365450752 bytes == 0x140276000 @  0x7f87e6b8d1e7 0x7f87e46f35e1 0x7f87e475a420 0x7f87e47e7f87 0x50a7f5 0x50c1f4 0x507f24 0x509277 0x594b01 0x54a17f 0x5517c1 0x5a9eec 0x50a783 0x50c1f4 0x507f24 0x509c50 0x50a64d 0x50c1f4 0x507f24 0x50b053 0x634dd2 0x634e87 0x63863f 0x6391e1 0x4b0dc0 0x7f87e678ab97 0x5b26fa
16262602 entries
 12% 199/1627 [03:24<25:10,  1.06s/it]tcmalloc: large alloc 4096000000 bytes == 0x7f83218e0000 @  0x7f87e6b8d1e7 0x7f87e46f35e1 0x7f87e4757c78 0x7f87e4757d93 0x7f87e47f5ea8 0x7f87e47f6704 0x7f87e47f6852 0x567193 0x59fe1e 0x7f87e47434ed 0x50a47f 0x50c1f4 0x507f24 0x509c50 0x50a64d 0x50c1f4 0x507f24 0x588e91 0x59fe1e 0x7f87e47434ed 0x50a47f 0x50c1f4 0x507f24 0x509c50 0x50a64d 0x50c1f4 0x507f24 0x509c50 0x50a64d 0x50c1f4 0x507f24
^C

!python search.py --model JointEmbeder --reload_from 340000

NumExpr defaulting to 2 threads.
Constructing Model..
Loading codebase (chunk size=2000000)..
Traceback (most recent call last):
  File "search.py", line 137, in <module>
    "inconsistent number of chunks, check whether the specified files for codebase and code vectors are correct!"    
AssertionError: inconsistent number of chunks, check whether the specified files for codebase and code vectors are correct!

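The first error seems to be the root cause. Your machine probably has too little memory to hold the temporary code vectors, so repr_code.py did not finish (the run was interrupted with ^C at 12%), and search.py then most likely found fewer code-vector chunks than codebase chunks. You can try reducing the chunk size, for example from 2,000,000 to 200,000.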
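
To get a feel for what a smaller chunk size buys, here is a rough back-of-the-envelope sketch. It is not code from the repository: the helper names `chunk_bytes` and `num_chunks` are made up for illustration, and the vector width of 512 float32 values is an assumption. It was chosen because 2,000,000 × 512 × 4 bytes equals the 4,096,000,000-byte allocation reported by tcmalloc in the log above. The entry count 16,262,602 is taken from the same log.

```python
import math

BYTES_PER_FLOAT32 = 4


def chunk_bytes(chunk_size, dim, bytes_per_value=BYTES_PER_FLOAT32):
    """Approximate memory needed to hold one chunk of dense vectors."""
    return chunk_size * dim * bytes_per_value


def num_chunks(n_entries, chunk_size):
    """Number of chunks needed to cover n_entries (the last one may be partial)."""
    return math.ceil(n_entries / chunk_size)


N_ENTRIES = 16_262_602  # "16262602 entries" from the repr_code.py log
ASSUMED_DIM = 512       # ASSUMPTION: vector width, not confirmed by the thread

for size in (2_000_000, 200_000):
    print(
        f"chunk_size={size:>9,}: "
        f"{chunk_bytes(size, ASSUMED_DIM):>13,} bytes per chunk, "
        f"{num_chunks(N_ENTRIES, size)} chunks"
    )
```

Under these assumptions, dropping the chunk size by a factor of ten cuts the per-chunk buffer by the same factor while producing roughly ten times as many chunks. Since search.py checks that the codebase and the code vectors have the same number of chunks, it is presumably necessary to use the same chunk size for both repr_code.py and search.py and to let repr_code.py run to completion before searching.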