marchion / git.myRfuncs

A miscellaneous collection of useful R functions

# Collapsing data using an index

## collapseData() is a general function to summarize data based on an index

The function collapseData() collapses a vector into one value per group: it splits
the vector by an index/factor and applies a user-defined summary function
(mean() by default) to each group.

```
collapseData <- function(x, ind, func=mean, ...) {
    # Split x by ind and apply func to each group;
    # extra arguments in ... are passed on to func
    tapply(X=x, INDEX=ind, FUN=func, ...)
}
```
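Because collapseData() forwards `...` to tapply(), which in turn passes extra arguments to `func`, options such as `na.rm = TRUE` can be supplied directly. A small sketch with made-up values containing missing entries:

```r
collapseData <- function(x, ind, func = mean, ...) {
  # tapply() hands any extra arguments in `...` on to `func`
  tapply(X = x, INDEX = ind, FUN = func, ...)
}

x   <- c(1, NA, 2, 3, 5, NA)            # made-up values with missing entries
ind <- c("a", "a", "a", "b", "b", "b")

collapseData(x, ind)                    # every group contains an NA, so both means are NA
collapseData(x, ind, na.rm = TRUE)      # NAs dropped before mean(): a = 1.5, b = 4
```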

An example follows.
First, we create a data.frame with repeated IDs in column 1:

```
dat <- data.frame(IDS=rep(letters[1:3], each=3), Values=1:9)
```

We can then collapse the values based on the index.
By default, the summary is computed with the mean() function.

```
collapsedValues <- collapseData(dat$Values, dat$IDS)
```
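The result is a one-dimensional array from tapply(), named by the sorted levels of the index. For the data above, the means can be checked by hand: a = mean(1:3), b = mean(4:6), c = mean(7:9). A self-contained sketch of that check:

```r
collapseData <- function(x, ind, func = mean, ...) {
  tapply(X = x, INDEX = ind, FUN = func, ...)
}

dat <- data.frame(IDS = rep(letters[1:3], each = 3), Values = 1:9)
collapsedValues <- collapseData(dat$Values, dat$IDS)

print(collapsedValues)   # a = 2, b = 5, c = 8
names(collapsedValues)   # group labels come from the index
```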

Any other summary function can be passed through `func`; here the values are summed instead:

```
collapsedValues <- collapseData(dat$Values, dat$IDS, func=sum)
```
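As a cross-check for the sum case, base R's rowsum() computes the same per-group totals, although it returns a one-column matrix rather than a one-dimensional array. A minimal sketch comparing the two:

```r
collapseData <- function(x, ind, func = mean, ...) {
  tapply(X = x, INDEX = ind, FUN = func, ...)
}

dat <- data.frame(IDS = rep(letters[1:3], each = 3), Values = 1:9)

sums <- collapseData(dat$Values, dat$IDS, func = sum)  # a = 6, b = 15, c = 24
ref  <- rowsum(dat$Values, dat$IDS)                    # 3 x 1 matrix with rownames a, b, c

# Once the layout differences are dropped, both give the same totals
all.equal(as.vector(sums), as.vector(ref))
```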

This function can also be applied to each column of a data.frame by combining
it with apply(). Here, the anonymous function returns the first value of each group:

```
collapsedValues <- apply(dat, 2, collapseData, ind=dat$IDS, func=function(x) x[1] )
```
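Note that apply() first coerces the data.frame to a matrix, so with mixed column types the result above is a character matrix and the integer Values column comes back as strings. If column types matter, a sketch that loops over the columns with lapply() keeps them; the as.vector() call is an addition here that drops tapply()'s array attributes so the pieces reassemble cleanly into a data.frame:

```r
collapseData <- function(x, ind, func = mean, ...) {
  tapply(X = x, INDEX = ind, FUN = func, ...)
}

dat <- data.frame(IDS = rep(letters[1:3], each = 3), Values = 1:9,
                  stringsAsFactors = FALSE)

firstValues <- data.frame(
  lapply(dat, function(col) {
    # keep the first value per group, then drop the array/dimnames attributes
    as.vector(collapseData(col, ind = dat$IDS, func = function(x) x[1]))
  }),
  stringsAsFactors = FALSE
)

str(firstValues)   # IDS stays character, Values stays integer
```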

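The collapseSelectOutput() function collapses the annotation data.frames
returned by select() from
[AnnotationDbi](http://www.bioconductor.org/packages/release/bioc/html/AnnotationDbi.html),
where a single key can map to several rows, into one row per key. The unique
values of each column are pasted together, separated by `glue`.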

```
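collapseSelectOutput <- function(dat, keyCol=1, glue="; ", ...) {
    # One row per key; the unique values in each column are pasted together.
    # sapply() loops over the columns directly: apply() would first coerce
    # dat to a matrix, padding numeric columns with spaces via format()
    sapply(dat, collapseData, ind=dat[[keyCol]],
           func=function(x) paste(unique(x), collapse=glue))
}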
```
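To try the collapse without installing Bioconductor packages, here is a sketch on a small hypothetical data.frame shaped like select() output (the GENEID, TXID and TXNAME values are made up). It uses a self-contained copy of collapseSelectOutput() written with sapply(), so the integer TXID column is pasted without the space padding that apply()'s matrix conversion would add:

```r
collapseData <- function(x, ind, func = mean, ...) {
  tapply(X = x, INDEX = ind, FUN = func, ...)
}

# Iterate over the columns with sapply(); apply() would coerce to a
# character matrix and pad numeric columns via format()
collapseSelectOutput <- function(dat, keyCol = 1, glue = "; ") {
  sapply(dat, collapseData, ind = dat[[keyCol]],
         func = function(x) paste(unique(x), collapse = glue))
}

# Hypothetical stand-in for select() output: one row per key/transcript pair
ann <- data.frame(
  GENEID = c("1", "1", "2", "2", "2", "3"),
  TXID   = c(1L, 2L, 10L, 11L, 11L, 100L),
  TXNAME = c("txA", "txB", "txC", "txD", "txD", "txE"),
  stringsAsFactors = FALSE
)

annCollapsed <- collapseSelectOutput(ann)
annCollapsed   # character matrix: one row per GENEID, duplicates removed
```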

An example using the UCSC hg19 knownGene annotation from Bioconductor:
```
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
keytypes(txdb)   # lists the values accepted by keytype=
# explicit namespace avoids clashes with other select() functions, e.g. dplyr's
ann <- AnnotationDbi::select(txdb, keys=c("1", "2", "3"),
                             keytype="GENEID", columns=c("TXID", "TXNAME"))
annCollapsed <- collapseSelectOutput(ann)
```
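If a data.frame is preferred over a matrix, base R's aggregate() can perform the same collapse. This is an alternative technique, not part of the functions above; the sketch uses the data.frame method rather than the formula interface, whose default `na.action = na.omit` would silently drop any row with an NA in any column, which select() output may contain. The input values are hypothetical:

```r
# Hypothetical stand-in for select() output: one row per key/transcript pair
ann <- data.frame(
  GENEID = c("1", "1", "2", "2", "2", "3"),
  TXID   = c(1L, 2L, 10L, 11L, 11L, 100L),
  TXNAME = c("txA", "txB", "txC", "txD", "txD", "txE"),
  stringsAsFactors = FALSE
)

# Group every non-key column by GENEID and paste the unique values together
annAgg <- aggregate(ann[, -1], by = list(GENEID = ann$GENEID),
                    FUN = function(x) paste(unique(x), collapse = "; "))

annAgg   # data.frame with columns GENEID, TXID, TXNAME; one row per key
```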

## ENJOY!

License: GNU General Public License v2.0

