tensorfreitas / Siamese-Networks-for-One-Shot-Learning

Implementation of Siamese Neural Networks for One-shot Image Recognition

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Training loss does not change in the first 700 iterations

DanielNehemiah opened this issue · comments

Hey,

I have started to train the network using the code in this repo.
I see that the training accuracy has not gone above 0.65 and is mostly revolving around 0.45-0.52
in the first 700 iterations. Is this normal? the loss is also changing very minutely revolving around 5.1

Thanks for this code!

image

Hi!

Let it run more iterations.

In the original code, the validation is evaluated every 1000 iterations. Se after 3000 / 5000 iterations if the loss keeps constant or not.

Let me know how it goes.

@DanielNehemiah There's a stopping condition in the training function , so let it run as long as the validation error between 2 sets i.e 1000 iterations does not improve and the training ends on its own

I've tried for several times, and it always stopped after 10000 iterations, which means that the model cannot converge..

@JacksonLaw577 Could you give more details?

When I ran your original code, this error occurred. So I modified self.lr into self.lr0. I don't know if this is the reason.
微信图片_20201124142956

@JacksonLaw577 Are you running the exact same dataset? I developed this on an older version of Keras and Tensorflow. Haven't been able to retry on the newer versions.

/home/user/anaconda3/envs/tensorflow-gpu/bin/python3.6 "/home/user/ShadowCreative/Siamese network/Siamese-Networks-for-One-Shot-Learning/train_siamese_network.py"
2021-02-20 17:56:39.594927: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Using TensorFlow backend.
2021-02-20 17:56:40.525094: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-20 17:56:40.525605: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-02-20 17:56:40.553530: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-20 17:56:40.554140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1650 computeCapability: 7.5
coreClock: 1.71GHz coreCount: 14 deviceMemorySize: 3.82GiB deviceMemoryBandwidth: 178.84GiB/s
2021-02-20 17:56:40.554175: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-02-20 17:56:40.558338: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-02-20 17:56:40.558412: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-02-20 17:56:40.559689: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-02-20 17:56:40.559968: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-02-20 17:56:40.560085: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2021-02-20 17:56:40.560968: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-02-20 17:56:40.561058: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2021-02-20 17:56:40.561073: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-02-20 17:56:40.561376: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-20 17:56:40.562933: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-20 17:56:40.562966: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-02-20 17:56:40.562976: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]
2021-02-20 17:56:40.606270: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 150994944 exceeds 10% of free system memory.
2021-02-20 17:56:40.623839: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 150994944 exceeds 10% of free system memory.
2021-02-20 17:56:40.641658: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 150994944 exceeds 10% of free system memory.
['Alphabet_of_the_Magi', 'Tifinagh', 'Gujarati', 'Syriac_(Estrangelo)', 'Futurama', 'Early_Aramaic', 'Latin', 'Japanese_(hiragana)', 'Grantha', 'Sanskrit', 'Greek', 'Burmese_(Myanmar)', 'Mkhedruli_(Georgian)', 'Asomtavruli_(Georgian)', 'Anglo-Saxon_Futhorc', 'Arcadian', 'Balinese', 'Japanese_(katakana)', 'Blackfoot_(Canadian_Aboriginal_Syllabics)', 'Tagalog', 'Armenian', 'Inuktitut_(Canadian_Aboriginal_Syllabics)', 'Korean', 'Bengali', 'Ojibwe_(Canadian_Aboriginal_Syllabics)', 'Cyrillic', 'Braille', 'N_Ko', 'Hebrew', 'Malay_(Jawi_-_Arabic)']
30
Traceback (most recent call last):
File "/home/user/ShadowCreative/Siamese network/Siamese-Networks-for-One-Shot-Learning/train_siamese_network.py", line 58, in
main()
File "/home/user/ShadowCreative/Siamese network/Siamese-Networks-for-One-Shot-Learning/train_siamese_network.py", line 46, in main
model_name='siamese_net_lr10e-4')
File "/home/user/ShadowCreative/Siamese network/Siamese-Networks-for-One-Shot-Learning/siamese_network.py", line 235, in train_siamese_network
train_loss, train_accuracy = self.model.train_on_batch(images, labels)
File "/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1727, in train_on_batch
logs = self.train_function(iterator)
File "/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 828, in call
result = self._call(*args, **kwds)
File "/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 871, in _call
self._initialize(args, kwds, add_initializers_to=initializers)
File "/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 726, in _initialize
*args, **kwds))
File "/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2969, in _get_concrete_function_internal_garbage_collected
graph_function, _ = self._maybe_define_function(args, kwargs)
File "/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 3361, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 3206, in _create_graph_function
capture_by_value=self._capture_by_value),
File "/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py", line 990, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 634, in wrapped_fn
out = weak_wrapped_fn().wrapped(*args, **kwds)
File "/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py", line 977, in wrapper
raise e.ag_error_metadata.to_exception(e)
TypeError: in user code:

/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:805 train_function  *
    return step_function(self, iterator)
/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:795 step_function  **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py:1259 run
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py:2730 call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py:3417 _call_for_each_replica
    return fn(*args, **kwargs)
/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:788 run_step  **
    outputs = model.train_step(data)
/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:757 train_step
    self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:497 minimize
    loss, var_list=var_list, grad_loss=grad_loss, tape=tape)
/home/user/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:547 _compute_gradients
    with ops.name_scope_v2(self._name + "/gradients"):

TypeError: unsupported operand type(s) for +: 'Modified_SGD' and 'str'

Process finished with exit code 1
#################################################
HI, I met this problems, can you help me look?

Answered in #14