BigBed interval problem
jtd032 opened this issue
I am creating a list of histograms, one for each file, using the code below:
Imports
import numpy as np
import pyBigWig as bw
import matplotlib.pyplot as plt
import os
For Loop
directory = 'listed file path'
for filename in os.listdir(directory):
    f = os.path.join(directory, filename)
    if os.path.isfile(f) and filename.endswith('.bb'):
        fp = bw.open(f,'r')
        chr = filename.replace('.bb','')
        max = fp.header()['maxVal']
        #print(fp.header())
        a = np.array(fp.entries(chr, 1, max),dtype=np.int64)
        plt.hist(a[:,2], bins='auto') # arguments are passed to np.histogram
        plt.title("Histogram with 'auto' bins")
        #Text(0.5, 1.0, "Histogram with 'auto' bins")
        print(chr)
        plt.show()
The problem I am running into is retrieving maxVal from the header. It works for the first few graphs but raises an error on later files: (int() argument must be a string, a bytes-like object or a number, not 'NoneType'). Am I right in understanding that maxVal is the top end of the range of values for that file?
The maxVal is stored in the bigBed header. Could it be that it simply wasn't set for one of the files?
Can you make the file available to me? I can have a look then.
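In the meantime, one thing that may be worth checking: as far as I know, maxVal in the header summarises data values rather than coordinates, so the chromosome length from chroms() is probably the safer upper bound for entries(). Below is a minimal sketch under that assumption. The helper name entries_for_whole_chrom is hypothetical (it is not part of pyBigWig), and it assumes the chromosome name derived from the file name actually exists in the file. It also guards against entries() returning None, which I believe happens when nothing overlaps and which would produce the same NoneType error once passed to np.array.

```python
def entries_for_whole_chrom(bb, chrom):
    """Return every bigBed entry on `chrom` as a list, or [] if there are none.

    `bb` is assumed to be an open pyBigWig bigBed handle (or anything with
    compatible chroms() and entries() methods).
    """
    # chroms(name) gives the chromosome length, or None for an unknown name.
    length = bb.chroms(chrom)
    if length is None:
        return []
    # Coordinates are 0-based, so 0..length spans the whole chromosome.
    hits = bb.entries(chrom, 0, length)
    # Normalise a None result (no overlapping entries) to an empty list.
    return hits if hits is not None else []
```

With this, the loop above could skip a file when the returned list is empty instead of passing None to np.array.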
Hi,
Currently, I'd like to know how to save all the entries to a file.
Here is my code:
bb=pyBigWig.open('./PBMCs_HistoneMarks_Blueprint/Males_UMCG00025_H3K4me1.peak_calls.bigBed' )
bb.entries('chrX', 16426, 156000962, withString=False)
So how can I write out the result of "bb.entries"? By the way, for a bigBed object, how can I output the intervals for all chromosomes at once? I found that I need to specify start and end positions for each chromosome.
Also, if I use a bigWig file, are the intervals I extract the same as for bigBed? I ask because, based on your description, the start and end positions are not required for bigWig files.
Many thx!
I don't know that I ever put the logic into the .entries() function to have it fill in the chromosome bounds if nothing was supplied. I suppose that could be done, though the Python function is really just a thin wrapper over a C function, and C is less flexible about such things.
For outputting the results of bb.entries(), it's just a list of tuples, so something like the following would work:
# o is an open, writable file handle
for res in bb.entries('chr1', 10000000, 10020000):
    o.write("chr1\t{}\t{}\t{}\n".format(res[0], res[1], res[2]))
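To cover every chromosome in one go, the bounds can be taken from chroms() and the same loop repeated per chromosome. This is only a rough sketch: write_all_entries and its with_string parameter are hypothetical names of my own, and it assumes entries() may return None for a chromosome with no entries.

```python
def write_all_entries(bb, out, with_string=True):
    """Write every entry of an open bigBed handle to `out` as tab-separated lines.

    Returns the number of lines written.
    """
    written = 0
    # chroms() with no argument returns a {name: length} dictionary.
    for chrom, length in bb.chroms().items():
        hits = bb.entries(chrom, 0, length, withString=with_string)
        # Treat a None result (nothing on this chromosome) as empty.
        for hit in hits or []:
            # hit is (start, end, rest) with strings, or (start, end) without.
            fields = [chrom] + [str(x) for x in hit]
            out.write("\t".join(fields) + "\n")
            written += 1
    return written
```

Usage would be along the lines of opening the bigBed with pyBigWig.open(...), opening an output file for writing, and calling write_all_entries(bb, o).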
Thanks for your detailed reply.
Could I try this? I don't need the strings:
for res in bb.entries('chr1', 10000000, 10020000, withString=False):
    o.write("chr1".format(res[0], res[1], res[2]))
Best wishes!
o.write("chr1\t{}\t{}\n".format(res[0], res[1]))
in that case as an example.