danielecook / danielecook.com

Source files for danielecook.com

/generate-a-bedfile-of-masked-ranges-a-fasta-file/

utterances-bot opened this issue · comments

Generate a bedfile of masked ranges a fasta file - Daniel E. Cook

https://www.danielecook.com/generate-a-bedfile-of-masked-ranges-a-fasta-file/

Hey,

thanks for sharing this. Based on your input, I ended up using the following script:

#!/usr/bin/env python

import sys

chrom = ""
pos = -1
start = -1
in_masked_region = False
infile = sys.argv[1]  # path to the FASTA file
with open(infile, "r") as fh:
    for line in fh:
        if line.startswith(">"):
            if in_masked_region:  # last masked region from previous chrom
                print(f"{chrom}\t{start}\t{pos}")
                start = -1  # not needed actually
                in_masked_region = False
            pos = 0
            chrom = line.split(" ")[0].replace(">", "").strip()
        else:
            for c in line.strip():
                if not in_masked_region and c == "N":
                    in_masked_region = True
                    start = pos
                elif in_masked_region and c != "N":
                    in_masked_region = False
                    print(f"{chrom}\t{start}\t{pos}")

                pos += 1

if in_masked_region:  # last masked region in last chrom
    print(f"{chrom}\t{start}\t{pos}")

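The main insight here is that a separate end variable is not actually needed. That's what we have pos for: when a run of N closes, pos is the index of the first non-N base, which is exactly the 0-based, exclusive end coordinate that BED expects.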
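To make the pos-as-end idea easy to check, here is a minimal sketch of the same loop wrapped in a generator and run on a tiny in-memory FASTA. The function name `masked_ranges` and the toy sequences are made up for illustration; they are not part of the script above. Like the script, it only treats uppercase N as masked.

```python
import io


def masked_ranges(lines):
    """Yield (chrom, start, end) for each run of N, 0-based and half-open.

    Hypothetical helper: same logic as the script above, with start=None
    standing in for the in_masked_region flag.
    """
    chrom, pos, start = "", 0, None
    for line in lines:
        if line.startswith(">"):
            if start is not None:  # run still open at the end of the previous record
                yield chrom, start, pos
                start = None
            fields = line[1:].split()
            chrom = fields[0] if fields else ""
            pos = 0
        else:
            for c in line.strip():
                if c == "N":
                    if start is None:
                        start = pos  # first N of a new run
                elif start is not None:
                    # pos is the first non-N base, i.e. the exclusive end
                    yield chrom, start, pos
                    start = None
                pos += 1
    if start is not None:  # run still open at the end of the file
        yield chrom, start, pos


# Toy input: a run inside a line, a run starting a line, and a run ending the file.
fasta = io.StringIO(">chr1 test\nACNNNT\nNNAC\n>chr2\nNNNN\n")
for chrom, start, end in masked_ranges(fasta):
    print(f"{chrom}\t{start}\t{end}")
```

On this toy input the runs come out as chr1 2-5, chr1 6-8 and chr2 0-4, so each end minus start equals the number of N bases in the run.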