Jimut123 / cellseg

Code for the paper titled "Advancing instance segmentation and WBC classification in peripheral blood smear through domain adaptation: A study on PBC and the novel RV-PBS datasets", published in Elsevier's Expert Systems With Applications (ESWA) journal.

Home Page: https://www.sciencedirect.com/science/article/pii/S0957417424005268


Refactor the code, set up a proper folder structure, and add new code as progress continues

Jimut123 opened this issue

@heraldofsolace Look into this matter please...

Also need to build slides, deadline: 14th March. Need to discuss that soon...

@heraldofsolace Probably suggest a good name for this repo xD

Hey @heraldofsolace I forgot about this task... is this done?
https://colab.research.google.com/drive/16LMsoN0y__pAEVcrqzIb9bJ8y2Fogv7_?usp=sharing

Problem: Classification Model with the following specifications

Task:

  • Use the data generator from Keras for data augmentation with a batch size of 24, probably in Google Colab for now.
  • Make sure to check the data distribution first (using histograms from matplotlib); if there is any class imbalance or similar issues, we have to take the necessary actions and ask for help.
  • Split the data 80-10-10 (80% training, 10% validation and 10% test).
  • Record precision, recall, accuracy, loss and F1 score on each dataset, i.e., the training, validation and test sets. This can be done using scikit-learn.
  • Plot and display all the graphs obtained.
  • Make a confusion matrix and plot it using matplotlib.
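The split and metric-recording steps above could be sketched with scikit-learn roughly as follows; the labels here are random placeholders standing in for the real per-image classes, and the model itself is omitted:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import (precision_score, recall_score, accuracy_score,
                             f1_score, confusion_matrix)

# Placeholder labels standing in for the real per-image class labels.
rng = np.random.default_rng(0)
y = rng.integers(0, 8, size=1000)          # 8 coarse classes, hypothetical
X = np.arange(len(y)).reshape(-1, 1)       # stand-in for image indices

# 80-10-10 split: first carve off 20%, then halve it into validation/test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=42)

# After training, record the metrics per split (dummy predictions here).
y_pred = rng.integers(0, 8, size=len(y_test))
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred, average='macro', zero_division=0))
print("recall   :", recall_score(y_test, y_pred, average='macro', zero_division=0))
print("f1       :", f1_score(y_test, y_pred, average='macro', zero_division=0))
cm = confusion_matrix(y_test, y_pred)      # plot with plt.imshow(cm) if wanted
```

Stratifying both splits keeps the class proportions roughly equal across train/validation/test, which matters once the imbalance discussed below comes into play.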

TODO

  • K-fold cross-validation
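The K-fold TODO could be sketched with scikit-learn's `StratifiedKFold`, which keeps the class proportions in every fold; the labels and the training step here are placeholders:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
y = rng.integers(0, 8, size=1000)          # hypothetical class labels
X = np.arange(len(y)).reshape(-1, 1)       # stand-in for image indices

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_sizes = []
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    # Here a fresh model would be built and trained on X[train_idx],
    # then evaluated on X[val_idx]; metrics averaged across folds.
    fold_sizes.append(len(val_idx))
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} val")
```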

Working on it but I feel dumb lol

Lol, Tamal Mj's laptop's CUDA just messed up for some reason, I need to fix that tomorrow... Let's see. Studying a few papers today.

In the dataset, "ig" has 4 different subtypes, "IG", "MMY", "MY" and "PMY", and Neutrophil has 3 different subtypes, "BNE", "SNE" and "Neutrophil".

I have seen that before; I thought that 8 classes would be good, but eventually we need them too. What would be better? Having a classifier to screen the first 8 classes first, and then applying different screening techniques from there? Or what do you suggest?

This is the cell glossary we will get from the slides

Blast (Bl)
Promyelocyte (PM)
Myelocyte (My)
Metamyelocyte (Me)
Band form  (Band) (Not a cell)
Neutrophil (N)
Eosinophil (Eo)
Basophil (Ba)
Lymphocyte (L)
Monocyte (Mo)
Nucleated RBC (NRBC)

I was thinking, maybe take them all as separate classes, augment each separately until it reaches around 2000 samples, and combine them into one dataset.
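A minimal sketch of that balancing idea, using only NumPy flips and rotations as stand-in augmentations on random placeholder arrays (the real pipeline would use a richer set of transforms):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_once(img, rng):
    """Apply one random, label-preserving transform (flip or 90-degree rotation)."""
    op = rng.integers(0, 3)
    if op == 0:
        return np.flip(img, axis=0)        # vertical flip
    if op == 1:
        return np.flip(img, axis=1)        # horizontal flip
    return np.rot90(img)                   # 90-degree rotation

def balance_class(images, target, rng=rng):
    """Augment one class's images until the class holds `target` samples."""
    out = list(images)
    while len(out) < target:
        base = out[rng.integers(0, len(images))]  # always pick an original
        out.append(augment_once(base, rng))
    return out

# Toy example: a minority class of 50 tiny placeholder "images",
# grown to a scaled-down target of 200 (2000 in the real dataset).
minority = [rng.random((8, 8, 3)) for _ in range(50)]
balanced = balance_class(minority, target=200)
print(len(balanced))
```

Doing this per class and then concatenating the classes gives the combined, balanced dataset.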

So let's do that then. What techniques will you use?

  • color jittering
  • flips
  • zoom
  • blurring (?)
  • maybe noise (?)
  • maybe (small) random crop of some image into a particular image
  • rotation
  • affine (?)
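Most of the techniques listed map directly onto arguments of Keras's `ImageDataGenerator`; blurring and noise would need a custom `preprocessing_function`, and every range below is a guess to be tuned:

```python
import numpy as np
import tensorflow as tf

datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=30,            # rotation
    zoom_range=0.15,              # zoom
    channel_shift_range=30.0,     # a simple form of color jittering
    horizontal_flip=True,         # flips
    vertical_flip=True,
    shear_range=10.0,             # shear, one part of an affine transform
    width_shift_range=0.1,        # translation (affine)
    height_shift_range=0.1,
    fill_mode='nearest',
)

# Smoke test on a random placeholder batch of 360x360 RGB "images".
x = np.random.rand(4, 360, 360, 3).astype('float32')
batch = next(datagen.flow(x, batch_size=4, shuffle=False))
print(batch.shape)
```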

I was even thinking of doing an ensemble with polar transformation and FFT, but that would be too much for now.

Let's first write the baseline code and then we'll add more augmentation as needed.

https://colab.research.google.com/drive/16LMsoN0y__pAEVcrqzIb9bJ8y2Fogv7_?usp=sharing
A very simple skeleton is here, but it needs to be modified a lot

So, the generator doesn't actually balance the classes. Using the generator, we can't control the number of samples per class. So we have two options:

  1. Accept the imbalance and set the model's class weights accordingly, combined with augmentation.
  2. Use a dummy training run with augmentation to generate augmented images and create a larger dataset.

I think the 2nd approach would not be good, as there is a huge imbalance and some of the classes would just be full of slightly different copies of the same images.
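For option 1, scikit-learn can compute balanced class weights to pass to `model.fit(..., class_weight=...)` in Keras; the labels below are deliberately imbalanced placeholders:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Placeholder labels with a deliberate imbalance (class 0 dominates).
y_train = np.array([0] * 900 + [1] * 80 + [2] * 20)

classes = np.unique(y_train)
weights = compute_class_weight('balanced', classes=classes, y=y_train)
class_weight = dict(zip(classes.tolist(), weights.tolist()))
print(class_weight)   # rarer classes get proportionally larger weights
# Then: model.fit(..., class_weight=class_weight)
```

The 'balanced' heuristic is n_samples / (n_classes * count_per_class), so the loss contribution of each class is equalised without duplicating any images.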

Aniket, if you have some scripts to run related to the project (say, for 100 epochs), you can give them to me.

Why? LOL

import tensorflow as tf
from tensorflow.keras.regularizers import l2

N_LABELS = 11  # number of output classes; adjust to the dataset

def Model_V2_Gradcam(H, W, C):
    input_layer = tf.keras.Input(shape=(H, W, C))
    x_1 = tf.keras.layers.Conv2D(16, 3, activation='relu', strides=(1, 1), name="conv_16_1", padding='same', kernel_initializer = 'he_normal', kernel_regularizer=l2(1e-4))(input_layer)
    x_2 = tf.keras.layers.Conv2D(16, 3, activation='relu', strides=(1, 1), name="conv_16_2", padding='same', kernel_initializer = 'he_normal', kernel_regularizer=l2(1e-4))(x_1)
    # x_4 = tf.keras.layers.Conv2D(16, 3, activation='relu', strides=(1, 1), name="conv_64_21", padding='same')(add([x_3,x_1]))
    x_3 = tf.keras.layers.MaxPooling2D((2, 2), name="max_pool3")(x_2)
    x_4 = tf.keras.layers.Conv2D(32, 3, activation='relu', strides=(1, 1), name="conv_32_1", padding='same', kernel_initializer = 'he_normal', kernel_regularizer=l2(1e-4))(x_3)
    x_5 = tf.keras.layers.Conv2D(32, 3, activation='relu', strides=(1, 1), name="conv_32_2", padding='same', kernel_initializer = 'he_normal', kernel_regularizer=l2(1e-4))(x_4)

    x_6 = tf.keras.layers.MaxPooling2D((2, 2), name="max_pool4")(x_5)
    x_7 = tf.keras.layers.Conv2D(64, 3, activation='relu', strides=(1, 1), name="conv_64_1", padding='same', kernel_initializer = 'he_normal', kernel_regularizer=l2(1e-4))(x_6)
    x_8 = tf.keras.layers.Conv2D(64, 3, activation='relu', strides=(1, 1), name="conv_64_2", padding='same', kernel_initializer = 'he_normal', kernel_regularizer=l2(1e-4))(x_7)
    x = tf.keras.layers.MaxPooling2D((2, 2), name="max_pool5")(x_8)
    x = tf.keras.layers.Conv2D(64, 3, activation='relu', strides=(2, 2), name="conv_64_3", kernel_initializer = 'he_normal', kernel_regularizer=l2(1e-4))(x)
    x = tf.keras.layers.MaxPooling2D((2, 2), name="max_pool6")(x)
    x = tf.keras.layers.Flatten(name="flatten")(x)
    x = tf.keras.layers.Dropout(0.15, name="dropout_3")(x)
    x = tf.keras.layers.Dense(256, activation='relu', name="dense_256")(x)
    x = tf.keras.layers.Dense(N_LABELS, activation='softmax', name="output_layer")(x)

    model = tf.keras.models.Model(inputs=input_layer, outputs=x)
    return model

model = Model_V2_Gradcam(H=360, W=360, C=3)

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
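Since both models carry *Gradcam* in their names, a minimal Grad-CAM sketch with `tf.GradientTape` may be useful for later; the tiny model and random input here are stand-ins, and `last_conv` is a hypothetical layer name:

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model with one named conv layer to visualise.
inp = tf.keras.Input(shape=(64, 64, 3))
h = tf.keras.layers.Conv2D(8, 3, activation='relu', name="last_conv")(inp)
h = tf.keras.layers.GlobalAveragePooling2D()(h)
out = tf.keras.layers.Dense(5, activation='softmax')(h)
model = tf.keras.Model(inp, out)

def grad_cam(model, image, conv_layer_name, class_idx=None):
    """Heatmap showing where `conv_layer_name` supports the chosen class."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])
        if class_idx is None:
            class_idx = tf.argmax(preds[0])     # predicted class by default
        score = preds[:, class_idx]
    grads = tape.gradient(score, conv_out)       # d(score)/d(conv features)
    pooled = tf.reduce_mean(grads, axis=(0, 1, 2))   # one weight per channel
    cam = tf.reduce_sum(conv_out[0] * pooled, axis=-1)
    cam = tf.nn.relu(cam)                        # keep only positive evidence
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()

heatmap = grad_cam(model, np.random.rand(64, 64, 3).astype('float32'),
                   "last_conv")
print(heatmap.shape)
```

The heatmap is normalised to [0, 1] and has the conv layer's spatial size; upsampling it to the input resolution and overlaying it on the image gives the usual Grad-CAM visualisation.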

I told you it needs multiple screening models; a single model needs to be too deep to understand the minor variations between the classes, so it ends up doing nothing.

Why? LOL

Well, it's your model

Oh, so it won't obey you 🌚

Looks like the primitive one should work better. Not tested though.

import tensorflow as tf

N_LABELS = 11  # number of output classes; adjust to the dataset

def Model_V1_Gradcam(H, W, C):
    input_layer = tf.keras.Input(shape=(H, W, C))
    x = tf.keras.layers.Conv2D(32, 3, activation='relu', strides=(2, 2), name="conv_32")(input_layer)
    x = tf.keras.layers.MaxPooling2D((2, 2), name="max_pool1")(x)
    x = tf.keras.layers.Conv2D(64, 3, activation='relu', strides=(2, 2), name="conv_64")(x)
    x = tf.keras.layers.MaxPooling2D((2, 2), name="max_pool2")(x)
    x = tf.keras.layers.Conv2D(64, 3, activation='relu', strides=(2, 2), name="conv_64_2")(x)
    x = tf.keras.layers.MaxPooling2D((2, 2), name="max_pool3")(x)
    
    x = tf.keras.layers.Flatten(name="flatten")(x)
    x = tf.keras.layers.Dense(512, activation='relu', name="dense_512")(x)
    x = tf.keras.layers.Dropout(0.5, name="dropout_1")(x)
    x = tf.keras.layers.Dense(512, activation='relu', name="dense_512_2")(x)
    x = tf.keras.layers.Dropout(0.5, name="dropout_2")(x)
    x = tf.keras.layers.Dense(128, activation='relu', name="dense_128")(x)
    x = tf.keras.layers.Dropout(0.5, name="dropout_3")(x)
    
    x = tf.keras.layers.Dense(N_LABELS, activation='softmax', name="output_layer")(x)
    #x = tf.keras.layers.Reshape((1, N_LABELS))(x)
    
    model = tf.keras.models.Model(inputs=input_layer, outputs=x)
    return model

model = Model_V1_Gradcam(H=360, W=360, C=3)

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()