dpu genome-sequencing processing-in-memory processing-near-memory rna-seq-quantification upmem

UpPipe

UpPipe is an RNA abundance quantification design for a real processing-near-memory system (UPMEM DPU). The accompanying paper was published at the Design Automation Conference (DAC) 2023.

Citation

Liang-Chi Chen, Chien-Chung Ho, and Yuan-Hao Chang, “UpPipe: A Novel Pipeline Management on In-Memory Processors for RNA-seq Quantification,” ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, July 9-13, 2023.

@inproceedings{chen2023uppipe,
  title={UpPipe: A Novel Pipeline Management on In-Memory Processors for RNA-seq Quantification},
  author={Chen, Liang-Chi and Ho, Chien-Chung and Chang, Yuan-Hao},
  booktitle={2023 60th ACM/IEEE Design Automation Conference (DAC)},
  pages={1--6},
  year={2023},
  organization={IEEE}
}

Materials

Hardware/System Prerequisites

The project must be run on a system equipped with UPMEM DRAM Processing Units (DPUs), and the UPMEM SDK must be installed on that system.

Start

git clone https://github.com/chi-0828/UpPipe.git
cd UpPipe
chmod +x build.sh
./build.sh
make -j4

Usage

Allocate transcriptome to DPU(s)

KMER SIZE should be an odd number from 3 to 31 (3, 5, ..., 31)
We suggest keeping NUMBER OF DPU(s) in a PIPELINE WORKER below 64

./UpPipe build \
            -k KMER SIZE  \
            -i OUTPUT INDEX FILE PATH \
            -d NUMBER OF DPU(s) in a PIPELINE WORKER \
            -f TRANSCRIPTOME FILE PATH
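
The two limits above can be checked before calling ./UpPipe build. The following is a minimal sketch, not part of UpPipe: the function names valid_kmer_size and suggested_dpu_count are hypothetical, and the values 11 and 60 are only example inputs.

```shell
# Hypothetical helpers (not part of UpPipe) that check the documented build limits.

# Succeeds only if $1 is an odd integer between 3 and 31.
valid_kmer_size() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
        *[13579]) ;;               # last digit is odd: go on to the range check
        *) return 1 ;;             # even number
    esac
    [ "$1" -ge 3 ] && [ "$1" -le 31 ]
}

# Succeeds only if $1 is a positive integer below the suggested limit of 64.
suggested_dpu_count() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$1" -ge 1 ] && [ "$1" -lt 64 ]
}

# Example values only; replace them with the ones you plan to pass to -k and -d.
k=11
d=60
if valid_kmer_size "$k" && suggested_dpu_count "$d"; then
    echo "ok: -k $k -d $d"
else
    echo "check -k $k and -d $d against the limits above" >&2
fi
```

The DPU check reflects a suggestion rather than a hard limit, so a failure there is a hint to reconsider the value, not proof that the build will fail.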

Run alignment step for quantification

The k-mer size is already stored in INPUT INDEX FILE and cannot be changed in this step

./UpPipe alignment \
            -i INPUT INDEX FILE PATH \
            -r NUMBER OF PIPELINE WORKER(s) \
            -f INPUT RNA READ FILE PATH

Parameter settings (dpu_app/dpu_def.h)

A KMER SIZE of less than 7 may lead to inaccurate mapping results
NUMBER OF DPU(s) in a PIPELINE WORKER should be less than 64 for optimal performance
The number of transcripts divided by NUMBER OF DPU(s) in a PIPELINE WORKER must be less than 200 (COUNT_LEN in dpu_app/dpu_def.h)
Set READ_LEN to the sequence length of the RNA reads
Set WRAM_READ_LEN to a number that is larger than READ_LEN and divisible by 8
WRAM_PREFETCH_SIZE is the size used for WRAM prefetching; 16 is the optimal size in most situations
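
Two of these rules are simple arithmetic and can be checked before editing dpu_app/dpu_def.h. The sketch below is not part of UpPipe. The helper names are hypothetical, READ_LEN=100 is only an example value, and it assumes that every FASTA header line (a line starting with ">") corresponds to one transcript.

```shell
# Hypothetical helpers (not part of UpPipe) for the parameter rules above.

# Smallest multiple of 8 that is strictly larger than READ_LEN ($1).
wram_read_len() {
    echo $(( ($1 / 8 + 1) * 8 ))
}

# Number of transcripts in a FASTA file, counted as header lines starting with '>'.
count_transcripts() {
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

# Succeeds if transcripts ($1) / DPUs ($2) is below COUNT_LEN ($3, default 200).
# The ratio is tested as transcripts < DPUs * COUNT_LEN to avoid integer rounding.
fits_count_len() {
    [ "$1" -lt $(( $2 * ${3:-200} )) ]
}

read_len=100   # example value only; use the actual length of your reads
echo "READ_LEN=$read_len -> WRAM_READ_LEN=$(wram_read_len "$read_len")"

fasta=test/tran.fa
dpus=60
if [ -f "$fasta" ]; then
    n=$(count_transcripts "$fasta")
    if fits_count_len "$n" "$dpus"; then
        echo "$n transcripts fit on $dpus DPUs"
    else
        echo "$n transcripts exceed COUNT_LEN for $dpus DPUs" >&2
    fi
fi
```

Any larger multiple of 8 also satisfies the WRAM_READ_LEN rule; the helper simply picks the smallest one.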

Test

To build the index file with 11-mers and allocate it to 60 DPUs

./UpPipe build \
            -k 11  \
            -i test/test.idx \
            -d 60 \
            -f test/tran.fa

To run alignment with 40 pipeline workers

./UpPipe alignment \
            -i test/test.idx \
            -r 40 \
            -f test/read.fa
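
To compare several worker counts on the test data, the alignment step can be wrapped in a small loop. This is only a sketch: run_alignment and the UPPIPE variable are hypothetical names, and it assumes bash, whose time keyword reports real, user, and sys times. If the binary has not been built yet, the function prints the command instead of running it.

```shell
# Hypothetical wrapper (not part of UpPipe) around the alignment step.
UPPIPE=${UPPIPE:-./UpPipe}   # path to the built binary; assumed to be ./UpPipe

# Runs (and times) the alignment with $1 pipeline workers on the test data,
# or prints the command when the binary is not available.
run_alignment() {
    if [ -x "$UPPIPE" ]; then
        time "$UPPIPE" alignment -i test/test.idx -r "$1" -f test/read.fa
    else
        echo "would run: $UPPIPE alignment -i test/test.idx -r $1 -f test/read.fa"
    fi
}

for workers in 20 40; do
    run_alignment "$workers"
done
```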

Performance: UpPipe with 40 pipeline workers

real    0m2.747s

Performance: UpPipe with 20 pipeline workers

real    0m3.584s

Performance: kallisto

real    0m4.003s

Note that UpPipe shows its efficiency more clearly on large datasets due to its processing-in-memory features
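
The timings above can be turned into speedup ratios relative to kallisto. The helper below is a small sketch (speedup is a hypothetical name) that divides the kallisto wall-clock time by each UpPipe time, using only the three numbers reported in this section.

```shell
# Hypothetical helper: ratio of a baseline time ($1) to another time ($2),
# both in seconds, printed with two decimals.
speedup() {
    awk -v base="$1" -v t="$2" 'BEGIN { printf "%.2f\n", base / t }'
}

speedup 4.003 2.747   # kallisto vs. UpPipe with 40 pipeline workers
speedup 4.003 3.584   # kallisto vs. UpPipe with 20 pipeline workers
```

On this small test set the ratios come out at roughly 1.46 with 40 pipeline workers and 1.12 with 20.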

About

https://doi.org/10.1109/DAC56929.2023.10247915

