bed_shift and bed_slop failed test "going beyond the start of the chrom"
raysinensis opened this issue · comments
Ray Fu commented
bedtools outputs 0, 1 for start and end in these edge cases
library(valr)
tiny.genome <- tibble::tribble(
~chrom, ~size,
"chr1", 1000
)
b <- tibble::tribble(
~chrom, ~start, ~end, ~name, ~score, ~strand,
"chr1", 50, 60, "a1", 5, "-"
)
bed_shift(b, tiny.genome, size = -100, trim = TRUE)
#> # A tibble: 1 x 6
#> chrom start end name score strand
#> <chr> <dbl> <dbl> <chr> <dbl> <chr>
#> 1 chr1 0 -40 a1 5 -
bed_slop(b, tiny.genome, left = -100, right = 100, strand = TRUE, trim = TRUE)
#> # A tibble: 1 x 6
#> chrom start end name score strand
#> <chr> <dbl> <dbl> <chr> <dbl> <chr>
#> 1 chr1 0 -40 a1 5 -
Created on 2018-01-19 by the reprex package (v0.1.1.9000).
Kent Riemondy commented
Some minor modifications to bound_intervals()
fixes this issue, #323, and #321
Here's a fix that bounds the starts and ends to 0 and the chromosome size. Also added a check for zero length intervals.
bound_intervals <- function(x, genome, trim = FALSE) {
if (!is.tbl_interval(x)) x <- as.tbl_interval(x)
res <- left_join(x, genome, by = "chrom")
if (trim) {
res <- mutate(
res,
start = ifelse(start < 0, 0, min(start, size - 1)),
end = ifelse(end > size, size, max(1, end))
)
res <- select(res, -size)
} else {
res <- filter(res, start >= 0 & start < size & end <= size & end > 0)
res <- select(res, -size)
}
if(any(res$start == res$end)){
warning(paste0(sum(res$start == res$end),
" intervals generated with same start and end were discarded"))
}
res <- res[res$start != res$end, ]
res
}