Summary

The following repository includes my scripts, experiments, and notes that document my progress through the IceCube Neutrinos in Deep Ice challenge, which can be found on Kaggle here. This work resulted in 2nd place the final repo can be found here

EXP_NAME	SCORE	DESCRIPTION	SCRIPT	TRN_SET
EXP_00	`1.182`	The initial trial involves using six blocks of Transformer Encoder with a limitation on the number of events to 100	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_HF`	`(1, 600)`
EXP_01	`1.169`	The initial test involves using six blocks of Transformer Encoder with a limitation of 100 events. However, I am implementing pooling that is based on a mask.	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_HF_V1`	`(1, 600)`
EXP_02	`1.144`	Baseline experiment, `6` Blocks of Transformer Encoder, number of events is restricted to `100` , pooling on `mask` with `logLosh` loss	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_HF_V2`	`(1, 600)`
EXP_03	`nan`	Baseline experiment, `6` Blocks of Transformer Encoder, number of events is restricted to `100` , pooling on `mask` with `logLosh` loss, in addition, no `log10` normalization for the charge. `sensor_id` now have there own learnable embeddings with `dim=128`, `x`, `y` and `z` are normalized between `0` and `1`, time also normalized between `1` and `0`, added `weighted` feature based on time (total `event` features are now `14`). ref: https://www.kaggle.com/code/roberthatch/lb-1-183-lightning-fast-baseline-with-polars	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_EMBED_V0`	`(1, 600)`
EXP_04	`1.216` -> `nan`	Baseline experiment, `6` Blocks of Transformer Encoder, number of events is restricted to `100` , pooling on `mask` with `logLosh` loss, in addition, no `log10` normalization for the charge. `sensor_id` now have there own learnable embeddings with `dim=128`, `x`, `y` and `z` are normalized between `0` and `1`, time also normalized between `1` and `0`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_EMBED_V1`	`(1, 600)`
EXP_05	`1.140`	Baseline experiment, `6` Blocks of Transformer Encoder, number of events is restricted to `100` , pooling on `mask` with `logLosh` loss, in addition, `log10` normalization for the charge. `sensor_id` now have there own learnable embeddings with `dim=128`, `x`, `y` and `z` are normalized between `0` and `1`, time also normalized between `1` and `0`. This time i did not add `padding_index == 0` to `nn.Embedding` and also removed `post_norma(embed)` normalization of `embeddings`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_EMBED_V2`	`(1, 600)`
EXP_06	`1.286`	In this experiment, I am trying to use graph `Transformer`, which takes in to account `adjacent matrix` and `distance_matrix`. `adjacent matrix` is calculated by taking `sensor_id` which are `0.015` away from each other (`note`: this might needs to be tuned). `log10` normalization for the charge, `time` is normalized between `1` and `0`, `x`, `y` and `z` are normalized between `0` and `1`. As per usual i restricted to `100` rows per `event`. `6` blocks of `encoders` with `dim=128` and out `2`. note: `Lg=0.5` - weight of adjacent matrix, `Ld=0.5` - weight of distance matrix. Need to optmized, Pooling right now is not done on `mask` but on `x = x.mean(dim=-2)` -> needs to be optmized. Either by testing on `mask`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name MATGRAPH`	`(1, 100)`
EXP_07	`1.275`	same as `EXP_06` but `pooling` is done now on `mask` using `MeanPoolingWithMask`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name MATGRAPHV2`	`(1, 100)`
EXP_08	`~ 1.275`	same as `EXP_06` but `pooling` is done now on `mask` using `MeanPoolingWithMask` , `x`, `y` and `z` are normalized by dividing by `500`, `time` is normalized `(event['time'] - 1.0e04) / 3.0e4` and `charge` `np.log10(event["charge"])/3.0` and `adjacent matrix` is calculated by taking `sensor_id` which are `0.05` away from each other	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name MATGRAPHV3`	`(1, 100)`
EXP_09	`1.177`	same as `EXP_05` but with `(1, 100)` just to have benchmark for small data training	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_EMBED_V3`	`(1, 100)`
EXP_10	`NG`	same as `EXP_09` but with `SigmoidRange`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_EMBED_V4`	`(1, 100)`
EXP_11	`SAME`	same as `EXP_09` but extended `max_events` to `160` from `100`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_EMBED_V5`	`(1, 100)`
EXP_12	`1.170`	same as `EXP_02` , but event are restricted to `128`, they are selected based on `pulse` and `light_speed`, pooling is performed using `mask`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_HF_V3`	`(1, 100)`
EXP_13	`1.143`	This experiment I am using again transformer `encoder` with `6` layers, pooling on `mask`, normalization is performed in following way. For `xyz` we divide by `500` for charge its `log10` and for time its `(event["time"].values - 1.0e04) / 3.0e4`, added additional features; `qe` outer layer of the icecube and added `ice_scattering`. The dataset is filtered using `light speed` travel distance if it exceed more then `128` rows	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_HF_V4`	`(1, 100)`
EXP_14	`1.142`	same as `EXP_13` but with `mean` and `max` masked `pool` concataneted	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_HF_V5`	`(1, 100)`
EXP_16	`1.043`	This experiment I am using again transformer encoder with `6` layers, pooling on mask, normalization is performed in following way. For `xyz` we divide by `500` for charge its `log10` and for time its `(event["time"].values - 1.0e04) / 3.0e4`, added additional features; `qe` outer layer of the icecube and added ice_scattering. The dataset is filtered using light speed travel distance if it exceed more then `128` rows, double pooling (`mean` and `max`) using `mask`, loss function `VonMisesFisher3DLoss`	`!CUDA_VISIBLE_DEVICES=1 python train.py --config_name BASELINE_HF_V7`	`(1, 100)`
EXP_17	`1.017`	same as `EXP_16` but finetuning using weights from `EXP_16` and increased `max_event=148`	`!CUDA_VISIBLE_DEVICES=1 python train.py --config_name BASELINE_HF_V8`	`(1, 100)`
EXP_18	`1.008`/LB: `1.006`	same as `EXP_17` but finetuning using weights from `EXP_17` and with full dataset	`!CUDA_VISIBLE_DEVICES=1 python train.py --config_name BASELINE_HF_V8FT`	`(1, 600)`
EXP_19	`~1.043`	same as `EXP_16` but finetuning using weights from `EXP_16` added `absorption` as feature, `max_events==148`, in total `9` features, models now `pools` on `mean`, `max`, `min`, based on the mask, added `ae` like layer in between the pooling	`!CUDA_VISIBLE_DEVICES=1 python train.py --config_name BASELINE_HF_V9`	`(1, 100)`
EXP_20	`1.024`	same as `EXP_16` but using `VonMisesFisher3DLoss` and `CosineSimilarityLoss`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_HF_V10`	`(1, 100)`
EXP_21	`1.025`	got NaN at some point, but loss was still better same as `EXP_16` but using `VonMisesFisher3DLoss` and `EucLadianDistanceLoss`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_HF_V11`	`(1, 100)`
EXP_22	`ES`	same as `EXP_16` but using `VonMisesFisher3DLoss` and `EucLadianDistanceLoss` and `CosineSimilarityLoss`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_HF_V11`	`(1, 100)`
EXP_23	`1.02`	same as `EXP_16` but using `VonMisesFisher3DLoss` and `GraphNet`, the `KNN` grouping is performed using `xyzt` and max_events are `196`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_graph_V0`	`(1, 100)`
EXP_24	`1.016`	same as `EXP_16` but using `VonMisesFisher3DLoss` and `CosineSimilarityLoss`, transformer `encoder` `8` layers, added `rotatry_emb` and `ff_glu`, and `post_emb_normalazation`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_HF_V13`	`(1, 100)`
EXP_24_CLS	`1.023`	same as `EXP_24` but using `VonMisesFisher3DLoss` and doing pooling `cls` token, the results is slightly worse, but this is expected for small trnasformer	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_HF_V13_CLS`	`cycle`
EXP_24_FT	`1.005`	same as `EXP_24` but using `VonMisesFisher3DLoss` and `CosineSimilarityLoss`, and `FT` at `fp32` due to `NaN`s	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_HF_V14`	`cycle`
EXP_24_FT_2	`1.0014`	same as `EXP_24` but using `VonMisesFisher3DLoss` and `CosineSimilarityLoss`. `EuclidDistance`, and `FT` at `fp32` due the lenth of the sequnce is `148`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_HF_V14_FT`	`cycle`
EXP_25	`1.017`	same as `EXP_16` but using `VonMisesFisher3DLoss` and `CosineSimilarityLoss` and `GraphNet`, the `KNN` grouping is performed using `xyz` and max_events are `196`, the first grouping in dataloder is done using `xyzt`, i am cycling thru training (going thru all batches)	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_graph_V1`	`(1, 100)`
EXP_25_FT	`0.999`	same as `EXP_25` but finetuning	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_graph_V1_FT`	`(1, 100)`
EXP_26	`1.037`	same as `EXP_16` but using `VonMisesFisher3DLoss` and `CosineSimilarity` and `EGNNmodel`, the `KNN` grouping is performed using `xyzt` with `8` neighbors, and max_events are `196`, i am using `5` layer, aggregation type `sum`, embedding `dim = 128`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_graph_V2`	`cycle`
EXP_27	`1.535`	same as `EXP_16` but using `VonMisesFisher3DLoss` and `CosineSimilarity` and `EGNNmodelV1`, the `KNN` grouping is performed using `xyzt` with `8` neighbors, and max_events are `196`, i am using `5` layer, aggregation type `sum`, embedding `dim = 128`, in `EXP_25` i feed coodinates as embedding, now i will only feed `6` features and keep coordinate seperated, score is really bad	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_graph_V3`	`cycle`
EXP_28	`1.050`	same as `EXP_26` but usng `mean`, `max`, `sum`, `min` pooling scheme	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_graph_V4`	`cycle`
EXP_29	`1.019`	same as `EXP_05` (`sensor_id`, has there own embeddings), feature input size is `9`, but using `masked` `mean` and `max` pooliing, transformer encoder `8` layers, with `rotatry_emb` and `ff_glu`, and `post_emb_normalazation`, `attn` dim `256` and loss_func `VonMisesFisher3DLoss` and `VonMisesFisher3DLossEcludeLossCosine` and `fp32`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_HF_V15`	`cycle`
EXP_29_FT	`0.99999`	same as `EXP_29` but with max_len `160` and `fp32`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_HF_V15_FT`	`cycle`
EXP_29_FT_2	`1.002`	same as `EXP_29_FT` but with max_len `196` and `fp32`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_HF_V15_FT_2`	`cycle`
EXP_30	`1.0379`	same as `EXP_26`, added hemophilty as input features	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_graph_V5`	`cycle`
EXP_31		same as `EXP_30`, added first layer of `graphnet` as embeding layer and then standarted `EGNNModel`, after first layer we will have `279` embedding features that will go to ENGG along with positions.	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_graph_V6`	`cycle`
EXP_32	`BAD`	same as `EXP_16` but using `VonMisesFisher3DLoss` and `CosineSimilarity` and `EGNNmodelV7`, the `KNN` grouping is performed using `xyzt` with `9` neighbors, and max_events are `196`, i am using `5` layer, aggregation type `sum`, with `swish` activation function embedding `dim = 128` also i have embeded `sensor_id` with `dim` = `32`, in `EXP_25` i feed coodinates as embedding, now i will only feed `6` features and keep coordinate seperated, i tried something similar in `EXP_27` but without `senosr_id` the score was bad	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_graph_V7`	`cycle`
EXP_33	`1.010`	Here I am using embeding layer from `garpnet`, `graphnet` module calculates first feature based on hemophility and then concat them and passes thru `garph` convolution. This is my embedding layer. After this i just feed to standart transfomer with `6` ecnoder `8` heads. pooling is performed on `mask` with `mean` and `max` concatenated, `ff_glue` , and `rotary_pos_emb` , `max_len = 128`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_graph_V8`	`cycle`
EXP_33_FT	`1.0007`	same as EXP_33 but finetuning, `max_len = 196`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_graph_V8_FT`	`cycle`
EXP_33_FT_2	`NE`	same as EXP_33 but finetuning, `max_len = 196`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_graph_V8_FT_2`	`cycle`
EXP_33_FT_3_KAPPA	`NE`	same as EXP_33 but finetuning, `max_len = 196` and filtering based on `kappa > 0.5`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_graph_V8_FT_3`	`cycle`
EXP_34	`1.025`	same as `EXP_26` but i modified `EGNNModel`, every forward pass thru convolution we will try to use `KNN` to rearange edges based on `xyz` (very similar what `dynnet` does	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_graph_V9`	`cycle`
EXP_34_FT	`1.010`	same as `EXP_34` but FT using `gVonMisesFisher3DLossCosineSimularityLoss`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_graph_V9`	`cycle`
EXP_34_FT_2	`1.005`	same as `EXP_34_FT_2` but FT	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_graph_V9_FT_2`	`cycle`
EXP_35	`same`	same as `EXP_34` but i modified `EGNNModel`, every forward pass thru convolution we will try to use `KNN` to rearange edges based on `pos`, `xyz` (very similar what `dynnet` does added `two` more features	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_graph_V10`	`cycle`
EXP_36	`same`	same us `EXP_33` but added `2` center of gravity feaatures	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_graph_V11`	`cycle`
EXP_37		training transformer based on `iofass` features, added few additiaonl features like a rank, `HuggingFaceDatasetV14`, `Encoder` is slitly bigger, with `dim_out=256`, `attn_depth = 12`, `heads = 12`, `ff_glu = True`,`rotary_pos_emb = True`, `use_rmsnorm = True,`, `layer_dropout = 0.1` ,`attn_dropout = 0.1`, `ff_dropout = 0.1`, added `3` pooling, `max`, `mean` and `cls_token`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name NANO_TRANSFORMER`	`cycle`
EXP_38		training `EGNN` with `10` layers, `GELU` activation function, added features based on `gExtractorV1`, `aux-emb`, `qe - emb`, edge rebuilding on updated `position` based on `7` neighbors, using `xyztc`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_graph_V12`	`cycle`
EXP_39	`1.000009`	training `DynNet` similar to `EXP_29_FT` but with `5` layers, `GELU` activation function using `xyztc` training crashed due to `cpu`	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_graph_V13`	`cycle`
EXP_39_FT	`1.000009`	FT `EXP_39_FT` no improvent loss fluctuates...	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_graph_V13_FT`	`cycle`
EXP_39_FT_2	`1.000009`	FT `EXP_39_FT_2` no improvent loss fluctuates...	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_graph_V13_FT_2`	`cycle`
EXP_40		combining `EXP_25` with Transformer , GNN->Transformer, added residual connection, concat features after GNN and also after Pool, with transformer	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name BASELINE_graph_V13_FT_2`	`cycle`
init5_hb_ft-2.ipynb	`0.9854`	`EncoderWithDirectionReconstructionV8`, uses ralitive possition baise with scaling, `dim_out=256`, `attn_depth` = `8`, `heads` = `12` , `layer_dropout = 0.01`, `attn_dropout = 0.01`, `ff_dropout = 0.01`, epoch `6`	`init5_hb_ft-2`	`full`
EXP_100	`0.995`	combining `DynNet` with Transformer , GNN->transformers, I am taking orignal GNN that has cv of `0.99` (its 4 layers) freezing it and using it as feature extractor and feeding to to transformer. Transformer has `cls_token_pooling`, `6` encoder layers. `12` gradd accumulation	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name EXP_100`	`cycle`
EXP_101	`0.994`	similar to `EXP_100` but I am encoding `charge`, `qe`, `aux`, `ice_properties`, concat with original `xyzt` and feeding to 1 layer graphnet (i am not encoding `xyzt`), because i need them for building edges, after this everything get fed to `10` layer transformer `encoder` , I also modified `max_events` we give priority to `aux` and then sort by `time`, `12` gradd accumulation	`!CUDA_VISIBLE_DEVICES=0 python train.py --config_name EXP_101`	`cycle`
hb_egnn	`0.988`	since `GraphNET` -> `Transforemr` did not show good perforamnce, I decided to replace `Garphnet` -> `EGGNN` (`Equivariant GNN`), I am taking first `2` layers , `edges` are build based on `xyz` and attaching `6` layers of transformer `encoder`	`hb_training_loop/hb_egnn.ipynb`	`cycle`
hb_localattention	`0.983251`	4 local attention, we attend based on `adjacen_matrix` max `8` neighbors, 3 layers + nomal transformer encoder, dropouts only added to local attentions	`hb_training_loop/hb_localattention.ipynb`	`cycle`
hb_localattention_ft	`0.9816`	same as `hb_localattention` just finetuning, with added two augmenations, randomly drop 5% events (p < `0.1`) and add randomly up +/- `5ns` (p < `0.1`)	`hb_training_loop/hb_localattention_ft.ipynb`	`cycle`
hb_mat		returning to the experiments `EXP_06`, using 4 layers for molecular transfomer but we are only considering `adjnacent matrix` and the weights to be combined with global attention to `0.9`, adjacent matrix is build using `12` neighborns and now we concider `xyzt`, based on previos experiment i addinatlly added to `DeepInceMode` droputs `attn_drop=0.1, drop=0.1`	`hb_training_loop/hb_mat.ipynb`	`cycle`
hb_graph_encoder	`0.994`	retrying one last time gnn-> transformer, `xyzt` adjacet matrix on `12` neighbros (xyzt) implementaion might have a bug ...	`hb_training_loop/hb_graph_encoder.ipynb`	`cycle`
hb_localattentionV2.ipynb	`0.9839`	sub_version of local attention more like global attention but with `fixed` learnable latent , `EncoderWithDirectionReconstructionV14`	`hb_training_loop/hb_localattentionV2.ipynb`	`cycle`
hb_localattentionV3.ipynb	`0.9840`	same as `V2` but integrated `cls_token` at the begining and training with two `augs`, `time`, and `event` dropps, `EncoderWithDirectionReconstructionV15`	`hb_training_loop/hb_localattentionV3ipynb`	`cycle`
hb_localattentionV4.ipynb	`0.9816`	3x (`localattention` with `Factorize Attention`) followed by transformer, `EncoderWithDirectionReconstructionV11_V2_LOCAL_GLOBAL`	`hb_training_loop/hb_localattentionV4.ipynb`	`cycle`
hb_localattentionV4FT2.ipynb	`0.9787`	same as `hb_localattentionV4` but finetuning on full datasset	`hb_training_loop/hb_localattentionV4FT2.ipynb`	`FULL`
hb_localattentionV4FT2SWA.ipynb		same as `hb_localattentionV4FT2` but finetuning on full datasset with SWA	`hb_training_loop/hb_localattentionV4FT2SWA.ipynb`	`FULL`
hb_global_graphlocal.ipynb	`not good`	in this experiment i am replacing local attention with EGNN, graph model the `edge_index` is build based on `ds2` with considering `8` neighbors. The model has 2 brancbes globa brunch and then local branch , at the end both output gets combined using another attention layer and then feed to transfomer encoder.	`hb_training_loop/hb_global_graphlocal.ipynb`	`FULL`
V17.ipynb	`0.9806`	in this experiment i am replacing local attention with EGNN, graph model the `edge_index` is build based on `ds2` with considering `8` neighbors. The model has 2 brancbes globa brunch and then local branch, local branch is EGNN with `3` layers, global brunch is factorzid attention with `4` layers.. the output get concataneted and feed to `S` model	`hb_training_loop/V17.ipynb`	`FULL`
V18.ipynb	`0.9790`	in this experiment i am replacing local attention with `GraphNEt`, graph model the `edge_index` is build based on `xyz` with considering `8` neighbors. The model has 2 brancbes globa brunch and then local branch, local branch is `Graphnet` with `4` layers, global brunch is factorzid attention with `4` layers.. the output get concataneted and feed to `S` model, each bruhc get seperate set of features .. global brunch operates on features that were generated using `ExtractorV0` layesr, graph brucnh operates on original `9` features.	`hb_training_loop/V18.ipynb`	`FULL`
V18FT.ipynb	`0.9777`	same as `V18.ipynb` but FT	`hb_training_loop/V18FT.ipynb`	`FULL`
V20.ipynb	`0.978`	in this experiment i am replacing local attention with `GraphNEt`, graph model the `edge_index` is build based on `xyz` with considering `8` neighbors. The model has 2 brancbes globa brunch and then local branch, local branch is `Graphnet` with `4` layers, global brunch is `rel_bias_attention` the output get concataneted and feed to `S` model, each bruhc get seperate set of features .. global brunch operates on features that were generated using `ExtractorV11` (`xyzt and aux`) layesr, graph brucnh operates on original `9` features	`hb_training_loop/V20.ipynb`	`FULL`
V20FT.ipynb	`0.9765`	same as `V20.ipynb` but FT	`hb_training_loop/V20FT.ipynb`	`FULL`
V20FT2.ipynb	`0.9755`	same as `V20FT.ipynb` but FT with `swa`	`hb_training_loop/V20FT2.ipynb`	`FULL`
V20FT3.ipynb	`0.9752`	same as `V20FT2.ipynb` but FT with `swa`	`hb_training_loop/V20FT3.ipynb`	`FULL`
V22.ipynb	`0.9743`	in this experiment i am replacing local attention with `GraphNEt`, graph model the `edge_index` is build based on `xyz` with considering `8` neighbors. For extracting featuers I will use `ExtractorV11Scaled` operating on (`x,y,z, aux, t, charge, L`) , we are added for each of them `scaled` parramaters, this get concattanated with `Graphnet` features and fed to `4` layers for `rel_bias_attention` and then fed to `S` model	`hb_training_loop/V22.ipynb`	`FULL`
V22FT.ipynb	`0.9731`	same as `V22.ipynb` but FT	`hb_training_loop/V22FT.ipynb`	`FULL`
V22FT2.ipynb	`0.9725`	same as `V22FT.ipynb` but FT with `swa`	`hb_training_loop/V22FT2.ipynb`	`FULL`
V22FT3.ipynb	`0.9669`	same as `V22FT2.ipynb` but FT with `swa` and `loss_comb` which is metric loss and vms (only did `4/8`)	`hb_training_loop/V22FT3.ipynb`	`FULL`
V22FT4.ipynb	`0.9662`	same as `V22FT3.ipynb` but FT with `swa` and `loss_comb` which is metric loss and vms	`hb_training_loop/V22FT4.ipynb`	`FULL`
V22FT5.ipynb	`0.9658`	same as `V22FT4.ipynb` but FT with `swa` and `loss_comb` which is metric loss and vms	`hb_training_loop/V22FT5.ipynb`	`FULL`
V22FT6.ipynb	`0.963579`	same as `V22FT5.ipynb` but FT with `swa` and `loss_comb` which is metric loss and vms and FT on `L256`	`hb_training_loop/V22FT6.ipynb`	`FULL`
V23.ipynb	`0.9723`	in this experiment i am replacing local attention with `GraphNEt`, graph model the `edge_index` is build based on `xyzt` with considering `8` neighbors. For extracting featuers I will use `ExtractorV11Scaled` operating on (`x,y,z, aux, t, charge, L`) , we are added for each of them `scaled` parramaters, this get concattanated with `Graphnet` features and fed to `4` layers for `rel_bias_attention` and then fed to `B` model	`hb_training_loop/V23.ipynb`	`FULL`
V23FT.ipynb	`0.9718`	same as `V23.ipynb` but FT (`5`/`8`)	`hb_training_loop/V23FT.ipynb`	`FULL`
V23FT2.ipynb	`0.9660`	same as `V23FT.ipynb` but FT with new loss (`4`/`8`)	`hb_training_loop/V23FT2.ipynb`	`FULL`
V23FT3.ipynb	`0.9658`	same as `V23FT2.ipynb` but FT with new loss (`4`/`8`)	`hb_training_loop/V23FT3.ipynb`	`FULL`
V23FT4.ipynb	`0.9652`	same as `V23FT3.ipynb` but FT with new loss (`4`/`8`)	`hb_training_loop/V23FT4.ipynb`	`FULL`
V23FT5.ipynb	`0.9647`	same as `V23FT4.ipynb` but FT with new loss (`8`/`8`)	`hb_training_loop/V23FT5.ipynb`	`FULL`
VFTV3_4REL.ipynb	`0.9659`	`BE` model with `4` REL	`hb_training_loop/VFTV3_4REL.ipynb`	`FULL`
VFTV3_4REL2.ipynb	`0.9634`	`BE` model with `4` REL	`hb_training_loop/VFTV3_4REL2.ipynb`	`FULL`

DrHB / icecube-journal

Summary

About

Languages