alexdobin / STAR

RNA-seq aligner

outFileNamePrefix is ignored when running in genomeGenerate runMode

Stikus opened this issue

Hello, thanks for the great tool.

We are trying to integrate STAR into our pipeline and found something strange: in the genomeGenerate runMode, the outFileNamePrefix parameter does not work properly:
[screenshot]

/ref/STAR/GRCh38.d1.vd1_gencode.v22.annotation_index_STAR099/Log.out content:

STAR version=2.7.11b
STAR compilation time,server,dir=2024-01-25T16:12:02-05:00 :/home/dobin/data/STAR/STARcode/STAR.master/source
STAR git: On branch master ; commit a72e5fa27331108f524211d667949dc5ff4072e8 ; diff files: CHANGES.md README.md doc/STARmanual.pdf extras/doc-latex/STARmanual.tex extras/doc-latex/parametersDefault.tex extras/docker/Dockerfile source/VERSION
##### Command Line:
/soft/STAR-2.7.11b/bin/Linux_x86_64/STAR --runThreadN 192 --runMode genomeGenerate --genomeDir /ref/STAR/GRCh38.d1.vd1_gencode.v22.annotation_index_STAR099 --genomeFastaFiles /ref/GRCh38.d1.vd1/GRCh38.d1.vd1.fa --sjdbGTFfile /ref/gtf/gencode.v22.annotation.gtf --sjdbOverhang 99 --outFileNamePrefix /ref/STAR/GRCh38.d1.vd1_gencode.v22.annotation_index_STAR099/Test__
##### Initial USER parameters from Command Line:
outFileNamePrefix                 /ref/STAR/GRCh38.d1.vd1_gencode.v22.annotation_index_STAR099/Test__
###### All USER parameters from Command Line:
runThreadN                    192     ~RE-DEFINED
runMode                       genomeGenerate        ~RE-DEFINED
genomeDir                     /ref/STAR/GRCh38.d1.vd1_gencode.v22.annotation_index_STAR099     ~RE-DEFINED
genomeFastaFiles              /ref/GRCh38.d1.vd1/GRCh38.d1.vd1.fa        ~RE-DEFINED
sjdbGTFfile                   /ref/gtf/gencode.v22.annotation.gtf     ~RE-DEFINED
sjdbOverhang                  99     ~RE-DEFINED
outFileNamePrefix             /ref/STAR/GRCh38.d1.vd1_gencode.v22.annotation_index_STAR099/Test__     ~RE-DEFINED
##### Finished reading parameters from all sources

##### Final user re-defined parameters-----------------:
runMode                           genomeGenerate
runThreadN                        192
genomeDir                         /ref/STAR/GRCh38.d1.vd1_gencode.v22.annotation_index_STAR099
genomeFastaFiles                  /ref/GRCh38.d1.vd1/GRCh38.d1.vd1.fa
outFileNamePrefix                 /ref/STAR/GRCh38.d1.vd1_gencode.v22.annotation_index_STAR099/Test__
sjdbGTFfile                       /ref/gtf/gencode.v22.annotation.gtf
sjdbOverhang                      99

-------------------------------
##### Final effective command line:
/soft/STAR-2.7.11b/bin/Linux_x86_64/STAR   --runMode genomeGenerate      --runThreadN 192   --genomeDir /ref/STAR/GRCh38.d1.vd1_gencode.v22.annotation_index_STAR099   --genomeFastaFiles /ref/GRCh38.d1.vd1/GRCh38.d1.vd1.fa      --outFileNamePrefix /ref/STAR/GRCh38.d1.vd1_gencode.v22.annotation_index_STAR099/Test__
--sjdbGTFfile /ref/gtf/gencode.v22.annotation.gtf   --sjdbOverhang 99
----------------------------------------

Number of fastq files for each mate = 1
ParametersSolo: --soloCellFilterType CellRanger2.2 filtering parameters:  3000 0.99 10
Finished loading and checking parameters
--genomeDir directory exists and will be overwritten: /ref/STAR/GRCh38.d1.vd1_gencode.v22.annotation_index_STAR099/

As you can see, the outFileNamePrefix we pass is /ref/STAR/GRCh38.d1.vd1_gencode.v22.annotation_index_STAR099/Test__ and it is parsed correctly, but the log is named Log.out rather than Test__Log.out, whereas the temporary directory does get the prefix (Test___STARtmp).


If we run the command with a local prefix:

/soft/STAR-2.7.11b/bin/Linux_x86_64/STAR --runThreadN 192 --runMode genomeGenerate --genomeDir /ref/STAR/GRCh38.d1.vd1_gencode.v22.annotation_index_STAR099 --genomeFastaFiles /ref/GRCh38.d1.vd1/GRCh38.d1.vd1.fa --sjdbGTFfile /ref/gtf/gencode.v22.annotation.gtf --sjdbOverhang 99 --outFileNamePrefix Test__

we get a warning message:

        /soft/STAR-2.7.11b/bin/Linux_x86_64/STAR --runThreadN 192 --runMode genomeGenerate --genomeDir /ref/STAR/GRCh38.d1.vd1_gencode.v22.annotation_index_STAR099 --genomeFastaFiles /ref/GRCh38.d1.vd1/GRCh38.d1.vd1.fa --sjdbGTFfile /ref/gtf/gencode.v22.annotation.gtf --sjdbOverhang 99 --outFileNamePrefix Test__
        STAR version: 2.7.11b   compiled: 2024-01-25T16:12:02-05:00 :/home/dobin/data/STAR/STARcode/STAR.master/source
Feb 02 17:57:30 ..... started STAR run
!!!!! WARNING: Could not move Log.out file from Test__Log.out into /ref/STAR/GRCh38.d1.vd1_gencode.v22.annotation_index_STAR099/Log.out. Will keep Test__Log.out

And the log stays in the run directory. In other run modes we don't have this problem.

https://github.com/alexdobin/STAR/blob/master/source/Genome_genomeGenerate.cpp#L101-L111 looks like the part of the code that moves Log.out, and it makes no mention of outFileNamePrefix:

	createDirectory(pGe.gDir, P.runDirPerm, "--genomeDir", P);

	{//move Log.out file into genome directory
		string logfn=pGe.gDir+"Log.out";
		if ( rename( P.outLogFileName.c_str(), logfn.c_str() ) ) {
			warningMessage("Could not move Log.out file from " + P.outLogFileName + " into " + logfn + ". Will keep " + P.outLogFileName +"\n", \
						   std::cerr, P.inOut->logMain, P);
		} else {
			P.outLogFileName=logfn;
		};
	};

Hi @Stikus

--outFileNamePrefix is not used for genome index output; genome generation uses only the path in --genomeDir.

@alexdobin But --outFileNamePrefix is still used for the STARtmp directory, even for genome indexing, as you can see in the first screenshot. Why is it not used for Log.out?

Moreover, as you can see in the local prefix example:

!!!!! WARNING: Could not move Log.out file from Test__Log.out into /ref/STAR/GRCh38.d1.vd1_gencode.v22.annotation_index_STAR099/Log.out. Will keep Test__Log.out

--outFileNamePrefix Test__ is still used somehow, maybe because of this part of the code:
https://github.com/alexdobin/STAR/blob/master/source/Parameters.cpp#L369-L370

    outLogFileName=outFileNamePrefix + "Log.out";
    inOut->logMain.open(outLogFileName.c_str());

What do you suggest? Should we avoid --outFileNamePrefix entirely in the genome index command? Is there any way to set the Log.out name or prefix for genome indexing? Or should we rename the file manually?

For genome generation, the easiest way is to create the genome directory, cd to it, and run STAR from there with --genomeDir ./ and without --outFileNamePrefix.
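The suggestion above could be wrapped in a small shell function along these lines. This is only a sketch: `build_star_index` and the `STAR_BIN` variable are hypothetical names introduced for illustration, and the STAR flags are the ones already used in the command from this issue. Because the function changes into the genome directory, the FASTA and GTF paths are assumed to be absolute.

```shell
# Sketch of the workaround: run genomeGenerate from inside the genome directory.
# build_star_index and STAR_BIN are hypothetical names, not part of STAR.
# The body is a subshell "( ... )", so the caller's working directory and
# variables are left untouched.
build_star_index() (
    genome_dir=$1   # directory that will hold the index
    fasta=$2        # absolute path to the genome FASTA
    gtf=$3          # absolute path to the annotation GTF
    overhang=$4     # value passed to --sjdbOverhang
    threads=${5:-1} # value passed to --runThreadN (defaults to 1)

    mkdir -p "$genome_dir" || exit 1
    cd "$genome_dir" || exit 1

    # No --outFileNamePrefix: the log and temporary files are created in the
    # current directory, which is the genome directory itself.
    "${STAR_BIN:-STAR}" \
        --runMode genomeGenerate \
        --runThreadN "$threads" \
        --genomeDir ./ \
        --genomeFastaFiles "$fasta" \
        --sjdbGTFfile "$gtf" \
        --sjdbOverhang "$overhang"
)
```

With the paths from this issue, a call would look like `build_star_index /ref/STAR/GRCh38.d1.vd1_gencode.v22.annotation_index_STAR099 /ref/GRCh38.d1.vd1/GRCh38.d1.vd1.fa /ref/gtf/gencode.v22.annotation.gtf 99 192`.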

OK, if this is intended, you can close this issue. I was asking because of the inconsistent behavior of the --outFileNamePrefix option.
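If a prefixed log name is still wanted after genome generation, the manual renaming raised earlier in the thread could look roughly like this. `rename_genome_log` is a hypothetical helper, not part of STAR, and the sketch assumes nothing downstream expects the file in the genome directory to be called Log.out.

```shell
# Sketch: give the genome-generation log a prefixed name after STAR has finished.
# rename_genome_log is a hypothetical helper, not part of STAR.
rename_genome_log() (
    genome_dir=$1   # directory passed to --genomeDir
    prefix=$2       # desired file-name prefix, e.g. Test__
    src="$genome_dir/Log.out"
    dst="$genome_dir/${prefix}Log.out"

    # Fail if STAR did not leave a Log.out in the genome directory.
    [ -f "$src" ] || { echo "no Log.out in $genome_dir" >&2; exit 1; }
    # Refuse to overwrite an existing prefixed log.
    [ ! -e "$dst" ] || { echo "$dst already exists" >&2; exit 1; }

    mv -- "$src" "$dst"
)
```

For the directory in this issue, `rename_genome_log /ref/STAR/GRCh38.d1.vd1_gencode.v22.annotation_index_STAR099 Test__` would leave the log as Test__Log.out in that directory.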