madihowa / EnergyCalibration

Refactored and Efficient Version of Cluster Calibration

Energy Calibration Project for Atlas

Madison Howard

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. High Level Algorithm
  5. To Do
  6. Contact

About The Project

The signals used to form jets in the ATLAS detector are clusters of topologically connected calorimeter cell signals, commonly called topo-clusters. The energy calibration method used in Run 1 of the ATLAS detector was local cell weighting (LCW) calibration. While LCW calibration is generally effective at most energy levels, it is less effective at lower energies due to a combination of noise and limited statistics. The purpose of this project was to explore calibrating the energy using machine learning. It was initially implemented in a combination of C and Python, but this repository is the refactored, fully Python version.

Built With

The project is written primarily in Python (with Jupyter notebooks and shell scripts); it uses boost-histogram for plotting and supports the Slurm Workload Manager for running on HPC systems.

Getting Started

To get a local copy up and running, follow these simple example steps.

Prerequisites

First, install pip!

Installation

  1. Clone the repository
    git clone https://github.com/madihowa/EnergyCalibration
  2. Go to the base directory
    cd EnergyCalibration
  3. Install the required Python libraries and set up the directory structure for the program
    ./setup.sh

Usage

There are two ways that I have set this up to work: you can run it locally or on an HPC system that uses the Slurm Workload Manager. You also have the option of training and testing a network or testing an already trained network. First, though, you need to do the following:

  • Add your data files into the data directory

  • Add an inputs csv file to the inputs directory.

    • This should be formatted as comma-separated values on a single continuous line with NO spaces.
  • Add a cuts json file to the cuts directory.

    • Format of cuts (each key is the cut's name, a string, e.g. "cut1"; function, term, and operation are strings; value is a float or int):
      {
          "cut1": {
              "function": "trimDF",
              "term": "clusterEta",
              "operation": ">",
              "value": 0.8
          }
      }
    • Important Notes on cuts:
      1. The JSON file can contain as many such cuts as required.
      2. An empty JSON file implies no cuts.
      3. Cuts are processed in the order they appear in the JSON file.
      4. The following cuts are available within the program: 'trimDF', 'trimAbsDF', 'summedEnergyCut'. You can define your own custom cuts in the Cuts.py file (see the sketch after this list).
  • Start in the EnergyCalibration directory to begin a run.
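To illustrate, here is a minimal sketch of a custom cut. It assumes cuts are implemented as functions that take a pandas DataFrame plus the term, operation, and value fields from the cuts JSON and return the trimmed DataFrame; the name and signature here are hypothetical and may differ from what Cuts.py actually expects.

    # Hypothetical custom cut, modeled on the style of the built-in cuts
    # ('trimDF', 'trimAbsDF', ...); not the repository's actual implementation.
    import pandas as pd

    OPS = {
        ">":  lambda s, v: s > v,
        "<":  lambda s, v: s < v,
        ">=": lambda s, v: s >= v,
        "<=": lambda s, v: s <= v,
    }

    def myAbsCut(df: pd.DataFrame, term: str, operation: str, value: float) -> pd.DataFrame:
        """Keep rows where |df[term]| satisfies the comparison, e.g. |clusterEta| > 0.8."""
        mask = OPS[operation](df[term].abs(), value)
        return df[mask]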

Usage for training and testing

USAGE ON LOCAL MACHINE:

python Master.py emission_folder path/to/inputs/list/csv path/to/testing/data/csv path/to/training/data/csv path/to/cuts/json
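For example, with hypothetical folder and file names:

    python Master.py my_run inputs/inputs_list.csv data/test_data.csv data/train_data.csv cuts/cuts.json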

USAGE ON HPC SYSTEMS:

To see all of the input parameters

python qjob.py --help

To run a job

sbatch quanah_job.sh emission_folder path/to/inputs/list/csv path/to/testing/data/csv path/to/training/data/csv path/to/cuts/json

Usage for testing a trained network

python testing_trained_NN.py path/to/hdf5_files path/to/test_data.csv path/to/train_data.csv
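For example, with hypothetical paths (here my_run is an emission folder containing the saved hdf5 files):

    python testing_trained_NN.py my_run data/test_data.csv data/train_data.csv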

High Level Algorithm:

  1. Take in two arguments that specify the emission_folder name and the inputs_list.csv file, which lists the columns needed for training. The emission folder is created after training a network. The input list is what we give the network while training; this csv file is copied into the emission folder when training is complete.

  2. Read the test and train data.

    • Record and save all the columns.
  3. Format the datasets:

    • Trim the datasets to the required columns.

    • Apply any cuts, if needed, as specified in the cuts JSON file.

    • Drop the target column from the training data.

    • Normalize the datasets.

  4. Train the model with the required callbacks and learning schemes (see the sketch after this list).

    • Generate validation data from the training data.

    • Store the model using checkpoints.

    • Store the learning history.

  5. Predict using the trained model.

    • Create the predictions figure.

    • Create results.csv, with the new column being the vector of predicted values.

    • Use boost-histogram to create the plots.
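As a concrete illustration of steps 3-5, here is a minimal sketch. It assumes pandas for the data handling and a Keras regression model checkpointed to HDF5 (the hdf5 files above suggest Keras, but this is not the repository's actual code); the target column name, layer sizes, and file names are placeholders.

    # Minimal sketch of steps 3-5; NOT the repository's actual code.
    import pandas as pd
    from tensorflow import keras

    def train_and_predict(train_df, test_df, input_cols, target_col, emission_folder):
        # Step 3: trim to the required columns and separate out the target.
        X = train_df[input_cols].to_numpy(dtype="float32")
        y = train_df[target_col].to_numpy(dtype="float32")
        X_test = test_df[input_cols].to_numpy(dtype="float32")

        # Step 3: normalize both sets using the training-set statistics.
        mean, std = X.mean(axis=0), X.std(axis=0) + 1e-8
        X, X_test = (X - mean) / std, (X_test - mean) / std

        # Step 4: train with a checkpoint callback; validation data is
        # carved out of the training data via validation_split.
        model = keras.Sequential([
            keras.layers.Dense(64, activation="relu", input_shape=(len(input_cols),)),
            keras.layers.Dense(64, activation="relu"),
            keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")
        checkpoint = keras.callbacks.ModelCheckpoint(
            f"{emission_folder}/model.hdf5", save_best_only=True)
        history = model.fit(X, y, validation_split=0.2, epochs=50,
                            callbacks=[checkpoint], verbose=2)

        # Step 4: store the learning history; Step 5: predict and save results.
        pd.DataFrame(history.history).to_csv(f"{emission_folder}/history.csv", index=False)
        out = test_df.copy()
        out["prediction"] = model.predict(X_test).ravel()
        out.to_csv(f"{emission_folder}/results.csv", index=False)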


To Do:

- [ ] Update GraphMean to generate ROOT plots

- [ ] Update IQR to generate ROOT plots

Contact

Please contact me with any questions.

Madison Howard - madison7howard@gmail.com

Project Link: https://github.com/madihowa/EnergyCalibration
