ryancoffee / edgeml_fes

EdgeML for Fusion Energy Science project

edgeml_fes

DOE-FES Support

This project was supported by the US Department of Energy, Office of Science, Fusion Energy Science, under Field Work Proposal FWP-100636.

File location

Finn has placed files into the ML group space:

/sdf/group/ml/datasets
" ============================================================================                                                                                                                                     
" Netrw Directory Listing                                        (netrw v149)
"   /sdf/group/ml/datasets/d3d_data
"   Sorted by      name
"   Sort sequence: [\/]$,\<core\%(\.\d\+\)\=\>,\.h$,\.c$,\.cpp$,\~\=\*$,*,\.o$,\.obj$,\.info$,\.swp$,\.bak$,\~$
"   Quick Help: <F1>:help  -:go up dir  D:delete  R:rename  s:sort-by  x:exec
" ============================================================================
../
./
ecebes_156562.h5
ecebes_156563.h5
ecebes_156564.h5
ecebes_156565.h5
ecebes_156637.h5
ecebes_156638.h5
ecebes_156639.h5


ecebes_176947.h5
ecebes_176963.h5
ecebes_176970.h5
ecebes_176972.h5
ecebes_176977.h5
ecebes_176979.h5
ecebes_176986.h5
ecebes_176988.h5
ecebes_176995.h5
ecebes_176997.h5
ecebes_177002.h5
ecebes_177004.h5
ecebes_177011.h5
ecebes_177013.h5
ecebes_177020.h5
ecebes_177022.h5         

There are over 6000 files; each file is one shot. Each shot has both ece and bes data (coarse and fine). The keys of these files are ['BESFU', 'BESSU', 'ece', 'ecevs'], of which we want BESFU and ecevs.
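As a minimal sketch of working with this layout, the helpers below pull shot numbers out of file names and pick the two wanted keys out of a shot container. The `ecebes_<shot>.h5` pattern is inferred from the listing above, and `shot_number`, `list_shots`, and `select_wanted` are hypothetical names that are not part of this repo.

```python
import re
from pathlib import Path

# Assumed naming convention, inferred from the directory listing: ecebes_<shot>.h5
SHOT_FILE = re.compile(r"^ecebes_(\d+)\.h5$")
# The two keys the notes say we want out of ['BESFU', 'BESSU', 'ece', 'ecevs']
WANTED_KEYS = ("BESFU", "ecevs")


def shot_number(filename):
    """Return the shot number encoded in a file name, or None if it does not match."""
    match = SHOT_FILE.match(Path(filename).name)
    return int(match.group(1)) if match else None


def list_shots(filenames):
    """Return the sorted shot numbers found in an iterable of file names."""
    return sorted(s for s in map(shot_number, filenames) if s is not None)


def select_wanted(container, wanted=WANTED_KEYS):
    """Pick only the wanted keys from any mapping-like shot container."""
    missing = [key for key in wanted if key not in container]
    if missing:
        raise KeyError(f"missing keys: {missing}")
    return {key: container[key] for key in wanted}
```

`select_wanted` only relies on `in` and `[]`, so it should work on a plain dict and, presumably, on an open `h5py.File` as well, since that object supports the same mapping-style access.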

Working notes

Using the sdf branch for updating /src/run_parallel_ecebes.py. Converting scipy.fftpack.dct to the matrix version to check for a performance improvement and also to use lower bit depth; this is still in progress. Times I will convert to np.uint32, represented in microseconds. Later this will become 1/4 microseconds to accommodate eventual higher-sample-rate ADCs.
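The two ideas in this note can be sketched as follows. This is not the code in src/run_parallel_ecebes.py; `dct2_matrix`, `dct2`, and `to_ticks` are hypothetical names. The matrix follows the unnormalized type-II convention that scipy.fftpack.dct uses by default, and the time helper assumes the input times are non-negative and given in seconds, which the notes do not state.

```python
import numpy as np


def dct2_matrix(n, dtype=np.float32):
    """Return the n x n matrix M with M[k, m] = 2*cos(pi*k*(2m+1)/(2n)).

    Multiplying by M reproduces the unnormalized type-II DCT, and the
    dtype argument is where a lower bit depth can be tried.
    """
    k = np.arange(n).reshape(-1, 1)  # output (frequency) index, one per row
    m = np.arange(n).reshape(1, -1)  # input (sample) index, one per column
    return (2.0 * np.cos(np.pi * k * (2 * m + 1) / (2 * n))).astype(dtype)


def dct2(x, matrix):
    """Apply the DCT along the last axis of x with one matrix multiplication."""
    return np.asarray(x, dtype=matrix.dtype) @ matrix.T


def to_ticks(times_s, ticks_per_us=1):
    """Convert times in seconds (assumed unit) to np.uint32 tick counts.

    ticks_per_us=1 stores microseconds; ticks_per_us=4 stores 1/4 microseconds.
    """
    ticks = np.rint(np.asarray(times_s, dtype=np.float64) * 1e6 * ticks_per_us)
    limit = np.iinfo(np.uint32).max
    if ticks.size and (ticks.min() < 0 or ticks.max() > limit):
        raise OverflowError("time does not fit in np.uint32 ticks")
    return ticks.astype(np.uint32)
```

The matrix is built once per window length and reused for every window, which is where any speedup over repeated dct calls would come from. For scale, a np.uint32 count of microseconds covers about 4295 s, and about 1074 s at 1/4 microsecond resolution.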

On ece locations: including this now (soon) in the .h5 conversion of the pickle files.
From Joe Abbate
Joe: hey ryan sorry again for missing your email!
ryan: "Let me guess... you use the magnetics from slow sensors to reconstruct the location from which the cyclotron frequency was emitted corresponding to that ece channel. Since it's a slow variable, you only measure every 50 ms or so. The vector of values and times shows the location drift of the channel throughout the shot. Is that right? We would interpolate in order to assign the spectrogram patches that Alan is working with to a fixed location in the lab frame."
Joe: that's exactly right yup!
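The interpolation described in this exchange could look roughly like the sketch below, which uses linear interpolation via np.interp. The function name `locations_at`, the units, and the example numbers are all made up for illustration; the real arrays would come from the slow location vectors stored with each ece channel.

```python
import numpy as np


def locations_at(patch_times, loc_times, loc_values):
    """Linearly interpolate slowly sampled channel locations onto patch times.

    loc_times and loc_values are the slow (roughly every 50 ms) samples for
    one ece channel; patch_times are the times of the spectrogram patches.
    Times outside the sampled range are clamped to the end values.
    """
    loc_times = np.asarray(loc_times, dtype=float)
    loc_values = np.asarray(loc_values, dtype=float)
    order = np.argsort(loc_times)  # np.interp requires increasing sample times
    return np.interp(patch_times, loc_times[order], loc_values[order])


# Hypothetical example: locations sampled at 0, 50, and 100 ms
print(locations_at([25.0, 75.0], [0.0, 50.0, 100.0], [2.0, 2.1, 2.3]))
```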

On second thought, disabling locations again for the sake of the Finn datasets that don't include this derived quantity.

DOE Program Support

This project is funded by the US Department of Energy, Office of Science, Fusion Energy Science, under Field Work Proposal FWP-100636, Machine Learning for Real-time Fusion Plasma Behavior Prediction and Manipulation.

Running on SLAC SDF

First obtain an account and log into the SLAC SDF cluster login node via ssh. Make sure you have access to the saved pickle files for the ece and bes sample shots in /gpfs/slac/staas/fs1/g/coffee_group/edgeml_fes_data/ecebes/. Please check out the docs for slurm on SLAC SDF at https://github.com/slaclab/sdf-docs/blob/master/batch-compute.md#interactive.

There is a new landing site for the data being pulled by Finn O'Shea: /sdf/group/ml/datasets/d3d_data/ecebes_[176]*.h5. There are many more shots, from ecebes_156562.h5 to ecebes_177022.h5. The easiest way to find them is to open the parent folder in vim (vim /sdf/group/ml/datasets/d3d_data) and look.

There are currently nearly 1000 files there with a total of about 3/4 TB, and it is still growing as the shots are pulled from the DIII-D server. Thank you Finn!
To use this dataset, I will need to refactor the spectrogram code to read in .h5 files rather than the previous pickle implementation. I will preserve this pickle version as an alpha-release.

ssh <uname>@sdf.slac.stanford.edu
git clone https://github.com/ryancoffee/edgeml_fes.git
cd edgeml_fes
git checkout sdf
srun --x11 --partition ml -n 4 --time 0-03:00:00 --mem-per-cpu=200000 --pty /bin/bash
module load slac-ml
ls /gpfs/slac/staas/fs1/g/coffee_group/edgeml_fes_data/d3d_output/
ls /sdf/group/ml/datasets/d3d_data/
python3 ./src/run_parallel_ecebes.py -ipath /sdf/group/ml/datasets/d3d_data -opath /gpfs/slac/staas/fs1/g/coffee_group/edgeml_fes_data/d3d_output/h5files -nthreads 2 -nsamples_bes 1024 -nsamples_ece 512 -shots 157817 157818 157819 157820

... or to run the parallel version

python3 ./src/parallel_ecebes_dct.py -ipath /gpfs/slac/staas/fs1/g/coffee_group/edgeml_fes_data/ecebes -opath /gpfs/slac/staas/fs1/g/coffee_group/edgeml_fes_data/ecebes/h5files_para -nthreads 4 150616 150792 157102 163117

The above code is intended to be a snippet of a possible execution. It presupposes that the files exist and the paths have not changed. One must check with the ls /gpfs... command that the pickle files exist as expected and the shot numbers are there. Furthermore, upon successful completion, the resulting /gpfs/slac/staas/fs1/g/coffee_group/edgeml_fes_data/ecebes/h5files/collection_dct.h5 should be moved to another name. This is because the file is appended to while being produced, so a later run fails if it tries to create a group (e.g. a shot number) that already exists.

NOTE: Each shot adds about 6GB to the collection_dct.h5 file
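One way to do the rename step after a successful run is sketched below. `archive_collection` is a hypothetical helper, not something in this repo, and the timestamp suffix is just one possible naming choice.

```python
from datetime import datetime
from pathlib import Path


def archive_collection(path, tag=None):
    """Rename a finished collection file so the next run starts a fresh one.

    Returns the new path, or None if there was nothing to rename.
    """
    src = Path(path)
    if not src.exists():
        return None
    # Default tag is a timestamp; a run label such as the shot range also works
    tag = tag or datetime.now().strftime("%Y%m%d_%H%M%S")
    dst = src.with_name(f"{src.stem}_{tag}{src.suffix}")
    if dst.exists():
        raise FileExistsError(dst)  # never overwrite an earlier archive
    src.rename(dst)
    return dst
```

Called as `archive_collection(".../h5files/collection_dct.h5", tag="run1")`, this would leave `collection_dct_run1.h5` in the same directory, so the next run creates a new collection_dct.h5 instead of colliding with existing shot groups.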

python/gnuplot

Be sure to enable X11 forwarding with the -X or -Y option to ssh. The forwarding is passed through to the slurm container by the --x11 option to srun.

ssh -Y <uname>@sdf.slac.stanford.edu
srun --x11 --partition ml -n 1 --time 0-03:00:00 --mem-per-cpu=200000 --pty /bin/bash
module load slac-ml

To run only on a specific node, or to exclude a node that, e.g., has misbehaving X11 forwarding:

srun --x11 --nodelist tur015 --partition ml -n 1 --time 0-03:00:00 --pty /bin/bash
srun --x11 --exclude tur015 --partition ml -n 1 --time 0-03:00:00 --pty /bin/bash

Notes to self

Moving to src/run_parallel_ecebes.py

11/17/2021

  • group by detector, then by channel, then by 'method'
  • restrict method to 'directional' and 'max'
  • use mean and stdev for the highest 10% of frequencies to set the scale of a sigmoid multiplier (maybe) or at least a mean subtract and a threshold
  • play the trick of multiplying peaks by their derivatives and looking for the zero crossing.
  • centroid and width of a peak could serve as a 2D embedding for each peak (word)
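The last three notes above can be sketched together for a single spectrum, under several assumptions of mine: the highest 10% of frequency bins are treated as signal-free, the threshold is the mean plus three standard deviations of those bins, and the width is the amplitude-weighted RMS width. The function names are hypothetical and this is not the repo's implementation.

```python
import numpy as np


def noise_floor(spectrum, top_fraction=0.1, nsigma=3.0):
    """Return (mean, threshold) estimated from the highest-frequency bins."""
    spectrum = np.asarray(spectrum, dtype=float)
    ntop = max(1, int(round(top_fraction * spectrum.size)))
    tail = spectrum[-ntop:]  # assumed to contain no signal
    return tail.mean(), tail.mean() + nsigma * tail.std()


def peak_embeddings(freqs, spectrum, nsigma=3.0):
    """Return one (centroid, width) pair per detected peak."""
    freqs = np.asarray(freqs, dtype=float)
    raw = np.asarray(spectrum, dtype=float)
    mean, thresh = noise_floor(raw, nsigma=nsigma)
    # Mean subtract, and zero everything at or below the threshold
    y = np.where(raw > thresh, raw - mean, 0.0)
    # Peak times its derivative: positive on the rising side, negative after the apex
    prod = y * np.gradient(y)
    crossings = np.nonzero((prod[:-1] > 0) & (prod[1:] <= 0))[0]
    embeddings = []
    for i in crossings:
        # Grow the window over the contiguous above-threshold region
        lo = i
        while lo > 0 and y[lo - 1] > 0:
            lo -= 1
        hi = i + 1
        while hi < y.size and y[hi] > 0:
            hi += 1
        w, f = y[lo:hi], freqs[lo:hi]
        centroid = np.sum(f * w) / np.sum(w)
        width = np.sqrt(np.sum(w * (f - centroid) ** 2) / np.sum(w))
        embeddings.append((centroid, width))
    return embeddings
```

A noisy peak can produce more than one sign change, so real data would likely need smoothing or merging of nearby crossings before this is useful.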


License:Other

