lcolladotor / derfinder

Annotation-agnostic differential expression analysis of RNA-seq data via expressed regions-level or single base-level approaches

Home page: http://lcolladotor.github.io/derfinder

filterData not providing feedback

fomightez opened this issue:

UPDATE: The problem only occurred when I had a single column. When I added my other datasets and ran filterData again, I got the expected feedback. END UPDATE

The example here shows that a number should appear between "are" and "rows" in the message, for example "..., now there are 2256 rows." I am not seeing that. I see:

```
filteredCov <- lapply(fullCov, filterData, cutoff = 10)
2017-05-02 18:35:37 filterData: originally there were 230218 rows, now there are rows.
```

filterData does seem to do something when I run it, as shown below. Maybe the feedback mechanism breaks because I ran it with only one column? I first obtained derfinder from Bioconductor yesterday, so it should be a very recent version.
What I see (I removed spaces to get it to format as a code block):

```
filteredCov <- lapply(fullCov, filterData, cutoff = 1)
2017-05-02 18:34:36 filterData: originally there were 230218 rows, now there are rows. Meaning that percent was filtered.
filteredCov <- lapply(fullCov, filterData, cutoff = 0.01)
2017-05-02 18:34:53 filterData: originally there were 230218 rows, now there are rows. Meaning that percent was filtered.
filteredCov <- lapply(fullCov, filterData, cutoff = 10)
2017-05-02 18:35:37 filterData: originally there were 230218 rows, now there are rows. Meaning that percent was filtered.
filteredCov <- lapply(fullCov, filterData, cutoff = 100)
2017-05-02 18:35:41 filterData: originally there were 230218 rows, now there are rows. Meaning that percent was filtered.
filteredCov
$I
$I$coverage
numeric-Rle of length 226407 with 123033 runs
Lengths: 166 57 ... 7
Values : 152.528541903404 305.057083806807 ... 152.528541903404
$I$position
logical-Rle of length 230218 with 101 runs
Lengths: 159 74 8 75 660 ... 7 602 436 7 413
Values : FALSE TRUE FALSE TRUE FALSE ... FALSE TRUE FALSE TRUE FALSE
filteredCov <- lapply(fullCov, filterData, cutoff = 10)
2017-05-02 18:37:37 filterData: originally there were 230218 rows, now there are rows. Meaning that percent was filtered.
filteredCov
$I
$I$coverage
numeric-Rle of length 226407 with 123033 runs
Lengths: 166 57 ... 7
Values : 152.528541903404 305.057083806807 ... 152.528541903404
$I$position
logical-Rle of length 230218 with 101 runs
Lengths: 159 74 8 75 660 ... 7 602 436 7 413
Values : FALSE TRUE FALSE TRUE FALSE ... FALSE TRUE FALSE TRUE FALSE
filteredCov <- lapply(fullCov, filterData, cutoff = 2)
2017-05-02 18:38:06 filterData: originally there were 230218 rows, now there are rows. Meaning that percent was filtered.
filteredCov
$I
$I$coverage
numeric-Rle of length 226407 with 123033 runs
Lengths: 166 57 ... 7
Values : 152.528541903404 305.057083806807 ... 152.528541903404
$I$position
logical-Rle of length 230218 with 101 runs
Lengths: 159 74 8 75 660 ... 7 602 436 7 413
Values : FALSE TRUE FALSE TRUE FALSE ... FALSE TRUE FALSE TRUE FALSE
filteredCov <- lapply(fullCov, filterData, cutoff = 1002)
2017-05-02 18:39:16 filterData: originally there were 230218 rows, now there are rows. Meaning that percent was filtered.
filteredCov
$I
$I$coverage
numeric-Rle of length 212193 with 121518 runs
Lengths: 1 7 ... 19
Values : 1067.69979332383 1220.22833522723 ... 1067.69979332383
$I$position
logical-Rle of length 230218 with 431 runs
Lengths: 1796 218 78 6 204 ... 7 5 98 6 1182
Values : FALSE TRUE FALSE TRUE FALSE ... FALSE TRUE FALSE TRUE FALSE
```

Hi,

If I understand correctly, you are saying that if fullCov has data for just 1 sample, then you don't get the expected verbose message, right?

It could be that nrow(DF) at https://github.com/lcolladotor/derfinder/blob/master/R/filterData.R#L203 is returning a NULL in that case.
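To illustrate the suspected cause, here is a minimal base-R sketch. It assumes that with a single sample the filtered coverage ends up as a vector-like object (the output above prints `$I$coverage` as a numeric-Rle rather than a DataFrame). The objects `one_sample` and `two_samples` are hypothetical stand-ins, and the message only mimics the filterData wording; it is not the derfinder source.

```r
# Hypothetical stand-ins, not derfinder objects: a plain numeric vector plays
# the single-sample coverage, and a data.frame plays the multi-sample DataFrame.
one_sample  <- c(5, 12, 30, 7)
two_samples <- data.frame(s1 = c(5, 12, 30, 7), s2 = c(3, 9, 28, 1))

nrow(one_sample)   # NULL: nrow(x) is dim(x)[1L], and a plain vector has no dim
nrow(two_samples)  # 4

# paste() recycles a zero-length argument to "", so the count silently vanishes
# (hypothetical message that only mimics the filterData wording)
original <- 6
msg_bad <- paste("originally there were", original,
                 "rows, now there are", nrow(one_sample), "rows.")
message(msg_bad)

# NROW() falls back to length() when an object has no dimensions
msg_good <- paste("originally there were", original,
                  "rows, now there are", NROW(one_sample), "rows.")
message(msg_good)
```

`NROW()` is just one base-R way to make such a count robust to missing dimensions; this thread does not say how the fix was actually implemented.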

In any case, it looks like you don't have this problem anymore, right? That is, once you started using more than 1 sample (you called them datasets), it started working well.

Best,
Leo

Yes, adding the other samples eliminated the weirdness with the feedback text. The weirdness was that it wasn't showing any numbers for the resulting rows in the feedback text.

Also, for one sample, it seems from here that I could have used sapply(fullCov, function(x) { sum(x[[1]]) }) just to check that things were working. I wanted to see more than typing fullCov shows, because in the R command-line console I could only see the zeros at the start and end. I was just curious to verify that there was more than a column of zeros.
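A sketch of that sanity check in base R. As far as I understand, `fullCov` is a list with one element per chromosome whose columns are the samples; here plain data.frames named `I` and `II` are hypothetical stand-ins so the code runs without derfinder installed. Besides the total, counting non-zero positions is a quick way to confirm there is more than a column of zeros.

```r
# Hypothetical stand-in for fullCov: one data.frame per chromosome,
# one column per sample (derfinder itself uses DataFrame objects).
full_cov <- list(
  I  = data.frame(sample1 = c(0, 0, 4, 9, 0, 0)),
  II = data.frame(sample1 = c(0, 2, 0, 0, 0, 0))
)

# Total coverage of the first sample on each chromosome
totals <- sapply(full_cov, function(x) sum(x[[1]]))

# Number of positions with non-zero coverage in the first sample
nonzero <- sapply(full_cov, function(x) sum(x[[1]] > 0))

print(totals)
print(nonzero)
```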

In summary:
filterData did seem to work with one sample (maybe?), so if you ever have an overabundance of spare time you could consider making the feedback work in that case too, but it's no big deal. Thanks.

This should be fixed in derfinder 1.10.2 and 1.11.2.