pepkit / geofetch

Builds a PEP from SRA or GEO accessions

Home Page: https://pep.databio.org/geofetch/

Ability to create `.pep.yaml` files

nleroy917 opened this issue

Overview:

To make GEO PEPs easier to integrate with other PEP tools (namely pephub), it would be super helpful to have a geofetch flag that creates a `.pep.yaml` file in the PEP folder for the accession. The file just points to the project configuration file, so tools like pephub can find it easily.

More info/discussion on the .pep.yaml dotfile can be read here: pepkit/pephub#23
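
To illustrate how a consuming tool might use the dotfile described above, here is a minimal sketch in Python. It is not pephub's actual code: the function name `resolve_pep_config` and the line-based parsing are assumptions made for illustration, and a real tool would more likely load the file with a YAML parser.

```python
from pathlib import Path
from typing import Optional

DOTFILE_NAME = ".pep.yaml"


def resolve_pep_config(pep_dir: Path) -> Optional[Path]:
    """Return the path named by `config_file` in the dotfile, or None.

    Hypothetical helper: it only handles a flat `key: value` dotfile,
    which is all the proposal in this issue requires.
    """
    dotfile = pep_dir / DOTFILE_NAME
    if not dotfile.is_file():
        return None
    for line in dotfile.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "config_file":
            # Drop surrounding whitespace and optional quotes.
            return pep_dir / value.strip().strip("'\"")
    return None
```

A caller would pass the PEP folder (for example, the `GSE73874` directory) and fall back to its own discovery logic when the function returns `None`.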

Example:

GSE73874 will be a PEP that is a single folder with the following contents:

GSE73874_samples.yaml:

pep_version: 2.1.0
project_name: GSE73874
sample_table: GSE73874_samples.csv

sample_modifiers:
  append:
    output_file_path: FILES
  derive:
    attributes: [output_file_path]
    sources:
      FILES: /{GSE}/{file}

GSE73874_samples.csv:

GSE,Sample_title,Sample_geo_accession,Sample_status,Sample_submission_date,Sample_last_update_date,Sample_type,Sample_channel_count,Sample_source_name_ch1,Sample_organism_ch1,Sample_taxid_ch1,Sample_treatment_protocol_ch1,Sample_growth_protocol_ch1,Sample_molecule_ch1,Sample_extract_protocol_ch1,Sample_data_processing,Sample_platform_id,Sample_contact_name,Sample_contact_email,Sample_contact_institute,Sample_contact_address,Sample_contact_city,Sample_contact_state,Sample_contact_zip/postal_code,Sample_contact_country,Sample_instrument_model,Sample_library_selection,Sample_library_source,Sample_library_strategy,Sample_series_id,Sample_data_row_count,file,file_url,sample_name,file_size,type,BioSample,SRA,Library strategy,Genome_build,Supplementary_files_format_and_content,cell line,chip antibody,treatment
GSE73874,PC9_ATACseq,GSM1904729,Public on Jul 03 2017,Oct 09 2015,May 15 2019,SRA,1,NSCLC,Homo sapiens,9606,"To generate DTPs, PC9 cells were treated with 1 μM erlotinib for 8 days. Media was replaced with fresh media supplemented with erlotinib every 3 days. Cells that survive the 8 day-treatment were considered DTPs. DTP_TSA samples were generated by treating DTPs on day 8 of erlotinib treatment with 50 nM TSA for 5 hours. ALDH_High and ALDH_Low samples were obtained from PC9 cells treated with DMSO or 1 μM erlotinib erlotinib for 12 hours as indicated prior to aldefluor sorting by FACS.",PC9 cells were cultured at 37 °C with 5% CO2 in RPMI-1640 (Invitrogen) supplemented with 10% FBS (HyClone).,genomic DNA,"Nuclei were isolated before the transposition reactions were carried out as described in Buenrostro et al Nature Methods 2013., Sequencing libraries were prepared using a modified version of the Illumina Nextera DNA Sample prep kit.","Sequencing reads were aligned to human genome build hg19 using bowtie with essentially the same parameters as described previously (Buenrostro et al Nature Methods 2013). The reporting parameter was changed from -m1 to -M1 in order to include a randomly selected single alignment for reads mapping to multiple locations., Genome-wide accessibility data was calculated using a 150 bp sliding window (step size ",GPL16791,"Suchit,,Jhunjhunwala",suchitj@gene.com,Genentech,"1 DNA Way, MS-444A",South San Francisco,CA,94080,USA,Illumina HiSeq 2500,other,genomic,OTHER,"GSE73874, GSE74180",0,GSM1904729_PC9_Parental.trim.multi.sort.nuc.rmdup.bed.WIN150.STEP20.COUNT.bw,ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM1904nnn/GSM1904729/suppl/GSM1904729_PC9_Parental.trim.multi.sort.nuc.rmdup.bed.WIN150.STEP20.COUNT.bw,PC9_ATACseq,136233222,BW,https://www.ncbi.nlm.nih.gov/biosample/SAMN04254244,https://www.ncbi.nlm.nih.gov/sra?term,ATAC-seq,hg19,"bigWig files were generated using the software available from UCSC (Kent et al., 2010).",PC9,none,none
GSE73874,DTP_ATACseq,GSM1904730,Public on Jul 03 2017,Oct 09 2015,May 15 2019,SRA,1,NSCLC,Homo sapiens,9606,"To generate DTPs, PC9 cells were treated with 1 μM erlotinib for 8 days. Media was replaced with fresh media supplemented with erlotinib every 3 days. Cells that survive the 8 day-treatment were considered DTPs. DTP_TSA samples were generated by treating DTPs on day 8 of erlotinib treatment with 50 nM TSA for 5 hours. ALDH_High and ALDH_Low samples were obtained from PC9 cells treated with DMSO or 1 μM erlotinib erlotinib for 12 hours as indicated prior to aldefluor sorting by FACS.",PC9 cells were cultured at 37 °C with 5% CO2 in RPMI-1640 (Invitrogen) supplemented with 10% FBS (HyClone).,genomic DNA,"Nuclei were isolated before the transposition reactions were carried out as described in Buenrostro et al Nature Methods 2013., Sequencing libraries were prepared using a modified version of the Illumina Nextera DNA Sample prep kit.","Sequencing reads were aligned to human genome build hg19 using bowtie with essentially the same parameters as described previously (Buenrostro et al Nature Methods 2013). The reporting parameter was changed from -m1 to -M1 in order to include a randomly selected single alignment for reads mapping to multiple locations., Genome-wide accessibility data was calculated using a 150 bp sliding window (step size ",GPL16791,"Suchit,,Jhunjhunwala",suchitj@gene.com,Genentech,"1 DNA Way, MS-444A",South San Francisco,CA,94080,USA,Illumina HiSeq 2500,other,genomic,OTHER,"GSE73874, GSE74180",0,GSM1904730_PC9_DTP.trim.multi.sort.nuc.rmdup.bed.WIN150.STEP20.COUNT.bw,ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM1904nnn/GSM1904730/suppl/GSM1904730_PC9_DTP.trim.multi.sort.nuc.rmdup.bed.WIN150.STEP20.COUNT.bw,DTP_ATACseq,133054754,BW,https://www.ncbi.nlm.nih.gov/biosample/SAMN04254245,https://www.ncbi.nlm.nih.gov/sra?term,ATAC-seq,hg19,"bigWig files were generated using the software available from UCSC (Kent et al., 2010).",PC9,none,8 day 1 μM erlotinib
GSE73874,DTP_TSA_ATACseq,GSM1904731,Public on Jul 03 2017,Oct 09 2015,May 15 2019,SRA,1,NSCLC,Homo sapiens,9606,"To generate DTPs, PC9 cells were treated with 1 μM erlotinib for 8 days. Media was replaced with fresh media supplemented with erlotinib every 3 days. Cells that survive the 8 day-treatment were considered DTPs. DTP_TSA samples were generated by treating DTPs on day 8 of erlotinib treatment with 50 nM TSA for 5 hours. ALDH_High and ALDH_Low samples were obtained from PC9 cells treated with DMSO or 1 μM erlotinib erlotinib for 12 hours as indicated prior to aldefluor sorting by FACS.",PC9 cells were cultured at 37 °C with 5% CO2 in RPMI-1640 (Invitrogen) supplemented with 10% FBS (HyClone).,genomic DNA,"Nuclei were isolated before the transposition reactions were carried out as described in Buenrostro et al Nature Methods 2013., Sequencing libraries were prepared using a modified version of the Illumina Nextera DNA Sample prep kit.","Sequencing reads were aligned to human genome build hg19 using bowtie with essentially the same parameters as described previously (Buenrostro et al Nature Methods 2013). The reporting parameter was changed from -m1 to -M1 in order to include a randomly selected single alignment for reads mapping to multiple locations., Genome-wide accessibility data was calculated using a 150 bp sliding window (step size ",GPL16791,"Suchit,,Jhunjhunwala",suchitj@gene.com,Genentech,"1 DNA Way, MS-444A",South San Francisco,CA,94080,USA,Illumina HiSeq 2500,other,genomic,OTHER,"GSE73874, GSE74180",0,GSM1904731_PC9_DTP_TSA.trim.multi.sort.nuc.rmdup.bed.WIN150.STEP20.COUNT.bw,ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM1904nnn/GSM1904731/suppl/GSM1904731_PC9_DTP_TSA.trim.multi.sort.nuc.rmdup.bed.WIN150.STEP20.COUNT.bw,DTP_TSA_ATACseq,143681822,BW,https://www.ncbi.nlm.nih.gov/biosample/SAMN04254246,https://www.ncbi.nlm.nih.gov/sra?term,ATAC-seq,hg19,"bigWig files were generated using the software available from UCSC (Kent et al., 2010).",PC9,none,8 day 1 μM erlotinib + 5 hour 50 nM TSA

.pep.yaml:

config_file: GSE73874_samples.yaml
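
For reference, writing such a dotfile could look roughly like the Python sketch below. This is an illustration under stated assumptions, not geofetch's implementation: the function name `write_pep_dotfile` is hypothetical, and the file is written as plain text because it holds a single key.

```python
from pathlib import Path


def write_pep_dotfile(pep_dir: Path, config_name: str) -> Path:
    """Write a `.pep.yaml` in pep_dir that points at config_name.

    Hypothetical helper: config_name is stored relative to the PEP
    folder, matching the example in this issue.
    """
    dotfile = pep_dir / ".pep.yaml"
    dotfile.write_text(f"config_file: {config_name}\n", encoding="utf-8")
    return dotfile
```

Calling `write_pep_dotfile(Path("GSE73874"), "GSE73874_samples.yaml")` on an existing `GSE73874` folder would produce the one-line file shown in the example.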

@nleroy917 I have added this functionality. Run geofetch with the `--add-dotfile` flag and check whether it meets all the requirements.


Tests are failing because I changed how the PEP files are saved.