guxd / deep-code-search

DeepCS: Deep Code Search

Pretrained model ETA

saicharishmavalluri opened this issue · comments

Hello @guxd,

When I test the tool using a pre-trained model, it shows me an ETA of 25 hrs.
Is this the right way, or am I missing something?
I am wondering if there is any other way to test it quickly. Can someone help me?
I am using Google Colaboratory.
Thank you in advance for your reply!

[Screenshot: pretrainedmodel]

It seems that the platform restricts large memory allocations.
You can try to reduce the batch size (e.g., from 10,000 to 1,000).
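
As a rough illustration of the suggestion above, here is a minimal sketch of processing a large collection in smaller batches, so that only one batch of intermediate results is in memory at a time. The names `iter_batches`, `embed_in_batches`, `embed_fn`, and `sink` are hypothetical and do not come from the DeepCS code; where the batch size is actually set in the repository is not shown in this thread.

```python
from typing import Callable, Iterator, Sequence


def iter_batches(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(items: Sequence,
                     embed_fn: Callable[[Sequence], Sequence],
                     batch_size: int,
                     sink: Callable[[Sequence], None]) -> int:
    """Apply `embed_fn` to one batch at a time and pass each result to `sink`.

    `embed_fn` stands in for whatever computes the embeddings (hypothetical);
    `sink` could append to a list or write each batch to disk.
    Returns the number of batches processed.
    """
    n_batches = 0
    for batch in iter_batches(items, batch_size):
        sink(embed_fn(batch))
        n_batches += 1
    return n_batches
```

A smaller `batch_size` lowers peak memory per step at the cost of more iterations.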

Hello @guxd

I am trying to run the pre-trained model, but I am not able to search. It gives me the error below.
[Screenshot: Screen Shot 2021-10-19 at 6 34 15 PM]

Hello @saicharishmavalluri, how did you manage to use it in Google Colab? Can you share that?

Thanks

Hi @samvaid

I haven't used the pre-trained model since it is taking me 25 hrs to embed it.
Instead, I trained the normal model in Keras by decreasing the number of epochs to 2.
Also, how did you manage to run it for 44 hours without any hurdles?

@saicharishmavalluri Did you train the Keras notebook on Colab? How did you do that?
I ran it locally but eventually ran into some problems.

@samvaid
I cloned the entire deep-code-search repository into my Google Drive and followed the instructions in the README file of the Keras folder.

@saicharishmavalluri how much time did it take you to do that? And did it work fine?

@samvaid
For 2 epochs it took me around 1 hour to train the model; after that, the code embedding and search didn't take much time.
Initially, I tried replacing the files in the data/github folder with the real dataset and training the model.
Training took around 1 hour for 2 epochs, but during code embedding my Google Colab session crashed because memory was full.
So later I tried again without changing any files in the data/github folder, and it worked fine for me.

@samvaid Keras and PyTorch use different data folders in Google Drive. Make sure that you have downloaded train.methname.h5 from Google Drive.

@guxd I am trying to use the pretrained model without copying the dataset from google drive.

@saicharishmavalluri I am having trouble running the Keras code in Colab. I am using the exact same package versions. Can you share your notebook with me?

@samvaid
code_search_keras-2.ipynb.zip

@saicharishmavalluri
Hi. What python version do you have in colab? When I run your notebook, I seem to get an error. My python version is Python 3.7.12

Did you downgrade your colab to python3.6?

@saicharishmavalluri

2021-10-21 19:22:05,570: models: INFO: compiling models
Traceback (most recent call last):
  File "main.py", line 272, in <module>
    engine.load_model(model, config['training_params']['reload'])
  File "main.py", line 42, in load_model
    assert os.path.exists(model_path + f"epo{epoch}_code.h5"),f"Weights at epoch {epoch} not found"
AssertionError: Weights at epoch 500 not found

I am getting the above error when I am trying to run the below code. I am changing the reload value to 500.

#change configs.py file reload value to 500
!python main.py --mode repr_code

@samvaid
Sorry for the confusion.
The value of the reload parameter should be your last epoch number. In my case, since I trained for 2 epochs (0, 1), my reload value is 1.
Also, please let me know if it works for you; in my case the cell gets terminated because memory is full.
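
One way to find the right `reload` value is to look at which weight files actually exist. The sketch below assumes the file-name pattern `epo{epoch}_code.h5` that appears in the assertion in the traceback; the helper name `latest_saved_epoch` is hypothetical and not part of the repository.

```python
import re
from typing import Iterable, Optional

# File-name pattern taken from the assertion in the traceback: epo{epoch}_code.h5
_CODE_WEIGHTS = re.compile(r"^epo(\d+)_code\.h5$")


def latest_saved_epoch(filenames: Iterable[str]) -> Optional[int]:
    """Return the highest epoch that has a saved code-weights file, or None."""
    epochs = []
    for name in filenames:
        match = _CODE_WEIGHTS.match(name)
        if match:
            epochs.append(int(match.group(1)))
    return max(epochs) if epochs else None


# Usage (model_dir is a placeholder for wherever the weights are written):
#   import os
#   print(latest_saved_epoch(os.listdir(model_dir)))
```

If this returns `None`, no weights were saved at all, and no `reload` value will load.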

@saicharishmavalluri
'batch_size': 128,
'chunk_size':100000,
'nb_epoch': 2,
'validation_split': 0.2,
'optimizer': 'adam',
#'optimizer': Adam(clip_norm=0.1),
'valid_every': 5,
'n_eval': 100,
'evaluate_all_threshold': {
    'mode': 'all',
    'top1': 0.4,
},
'save_every': 10,
'reload':-1,

Is the above your configuration in the config file when you train the model?

And then, when you run !python main.py --mode repr_code, do you just change reload to 1?

@samvaid
Yes

@saicharishmavalluri I am getting the below error

  File "main.py", line 258, in <module>
    model.compile(optimizer=optimizer)
  File "/content/deep-code-search/keras/deep-code-search/keras/models.py", line 202, in compile
    self._code_repr_model.compile(loss='cosine_proximity', optimizer=optimizer, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/tracking/base.py", line 457, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 336, in compile
    self.loss, self.output_names)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_utils.py", line 1351, in prepare_loss_functions
    loss_functions = [get_loss_function(loss) for _ in output_names]
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_utils.py", line 1351, in <listcomp>
    loss_functions = [get_loss_function(loss) for _ in output_names]
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_utils.py", line 1087, in get_loss_function
    loss_fn = losses.get(loss)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/losses.py", line 1183, in get
    return deserialize(identifier)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/losses.py", line 1174, in deserialize
    printable_module_name='loss function')
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/generic_utils.py", line 210, in deserialize_keras_object
    raise ValueError('Unknown ' + printable_module_name + ':' + object_name)
ValueError: Unknown loss function:cosine_proximity

@samvaid
You need to change 'cosine_proximity' to 'cosine_similarity' in
/content/deep-code-search/keras/deep-code-search/keras/models.py, line 202.
Also, change the 'reload' value once you have finished training the model.
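
If you prefer to apply the rename with a script rather than by hand, a minimal sketch could look like the following. The helper name `rename_loss` is hypothetical; it only rewrites `loss='cosine_proximity'` arguments like the one shown in the traceback, and it has not been checked against the rest of models.py.

```python
def rename_loss(source: str,
                old: str = "cosine_proximity",
                new: str = "cosine_similarity") -> str:
    """Return `source` with loss='<old>' (or loss="<old>") rewritten to use `new`."""
    for quote in ("'", '"'):
        source = source.replace(f"loss={quote}{old}{quote}",
                                f"loss={quote}{new}{quote}")
    return source


# Usage (keep a backup copy of the file before overwriting it):
#   from pathlib import Path
#   path = Path("/content/deep-code-search/keras/deep-code-search/keras/models.py")
#   path.write_text(rename_loss(path.read_text()))
```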

@saicharishmavalluri The model is getting trained, but I get an error when I run:

#change configs.py file reload value to 1
!python main.py --mode repr_code

Below is the error:

Traceback (most recent call last):
  File "main.py", line 272, in <module>
    engine.load_model(model, config['training_params']['reload'])
  File "main.py", line 42, in load_model
    assert os.path.exists(model_path + f"epo{epoch}_code.h5"),f"Weights at epoch {epoch} not found"
AssertionError: Weights at epoch 1 not found

@samvaid
For how many epochs did you train your model?

@saicharishmavalluri 2 epochs

[Screenshot: Screen Shot 2021-10-21 at 6 46 08 PM]

@samvaid
I am sorry, I missed another point.
In the configs.py file, you need to change 'save_every' from 10 to 1 before training the model.
By default the weights are saved every 10 epochs, so if you change it to 1, they are saved after every epoch.
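
To see why epoch 1 has no weights with the default setting, here is a small sketch of a checkpoint schedule. It rests on an assumption: that weights are written whenever `epoch % save_every == 0` with 0-based epochs. The exact condition in the repository's training loop is not shown in this thread and may differ; `saved_epochs` is a hypothetical helper.

```python
from typing import List


def saved_epochs(nb_epoch: int, save_every: int) -> List[int]:
    """List the 0-based epochs at which weights would be written.

    ASSUMPTION: a checkpoint is written when epoch % save_every == 0.
    The real training loop may use a different rule.
    """
    if save_every <= 0:
        raise ValueError("save_every must be positive")
    return [epoch for epoch in range(nb_epoch) if epoch % save_every == 0]


print(saved_epochs(nb_epoch=2, save_every=10))  # epoch 1 is not in this list
print(saved_epochs(nb_epoch=2, save_every=1))   # both epochs are saved
```

Under that assumption, `'reload': 1` can only work after training with `'save_every': 1`.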

@saicharishmavalluri

After training with save_every=1 and running !python main.py --mode repr_code, I get this error:

Traceback (most recent call last):
  File "main.py", line 272, in <module>
    engine.load_model(model, config['training_params']['reload'])
  File "main.py", line 44, in load_model
    model.load(model_path + f"epo{epoch}_code.h5", model_path + f"epo{epoch}_desc.h5")
  File "/content/deep-code-search/keras/deep-code-search/keras/models.py", line 230, in load
    self._code_repr_model.load_weights(code_model_file, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 181, in load_weights
    return super(Model, self).load_weights(filepath, by_name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py", line 1177, in load_weights
    saving.load_weights_from_hdf5_group(f, self.layers)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 651, in load_weights_from_hdf5_group
    original_keras_version = f.attrs['keras_version'].decode('utf8')
AttributeError: 'str' object has no attribute 'decode'

I remember getting this error. But I don't remember what I did to make it work. Maybe try training the model again or for another epoch.
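
A possible cause, not confirmed anywhere in this thread: this `'str' object has no attribute 'decode'` error is commonly reported when h5py 3.x is installed alongside an older TensorFlow/Keras release, because h5py 3 returns string attributes as `str` instead of `bytes`. The usual workaround people report is pinning h5py below 3.0 (for example `pip install "h5py<3.0.0"`) and restarting the runtime. The sketch below only checks a version string; `h5py_returns_str_attrs` is a hypothetical helper.

```python
def h5py_returns_str_attrs(version: str) -> bool:
    """Return True if the given h5py version string is 3.x or newer.

    ASSUMPTION: the version string starts with an integer major version,
    e.g. "2.10.0" or "3.1.0".
    """
    major = int(version.split(".")[0])
    return major >= 3


# Usage in the notebook (requires h5py to be installed):
#   import h5py
#   print(h5py.__version__, h5py_returns_str_attrs(h5py.__version__))
```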

@saicharishmavalluri Did you make any other changes as well?

No, these are the only changes I made.