arq5x / bedtools2

bedtools - the swiss army knife for genome arithmetic



bedtools merge input too long

yongze-yin opened this issue

Hi,

I think there may be a bug in bedtools merge: when an input sequence name is very long, only part of the name is written to the output, and the merged coordinates are not shown. As with a similar problem reported earlier, bedtools merge may also be affected by this bug. Thank you!

Can you provide a minimal example file and let us know what version you are using?

Thank you for your timely response, Prof. Quinlan.

I am using bedtools v2.30.0. Suppose my sorted input file, sorted.bed, contains the following 5 records:

ref_0:GCF900176165|GCF014646895|GCF007990775|GCF000745915|GCF003574085|GCF000423425|GCF017497985|GCF003574035|GCF000620065|GCF000024425|GCA000482765|GCF003351145|GCF000771745|GCF004307015|GCF000745065|GCF000744175|GCF000744885|GCF003336745|GCF002964845|GCF004684245|GCF000423905|GCF000381045|GCF000236585|GCF000376265|GCF000373145|GCF003426945|GCF001280255|GCF001880325|GCF000794385|GCF014647535|GCF900102145|GCF001535545|GCF000091545|GCA002355995|GCF000421625|GCA011053565|GCA001311585|GCA001311545|GCF003226535|GCF000092125|GCF003574355|GCA015478585|GCF014647075|GCF014201875|GCF014647055|GCF006335125|GCF000701425|GCF009017495|GCF001949125|GCF000701405|GCF001644565|GCF000378445|GCF009982895|GCF007280555|GCF003860465|GCF003173015|GCF004634215|GCF003966215|GCF014653275|GCF000190555|GCF011067105|GCF000008565|GCA018260275|GCF014201885|GCF009377345|GCF002953415|GCF000482805|GCF018863415|GCF002869765|GCF004758605|GCF002897375|GCF000381345|GCF000599865|GCF000745175|GCF000196275|GCF900109185|GCF001424185|GCF000252445|GCF002901445|GCF014201805|GCF014646915|GCF003217515|GCF000519345|GCF000186385|GCF003444775|GCF014647435|GCF014648135|GCF013337115|GCF001507665|GCF001485435|GCF001007995|GCF002198095|GCF002017875|GCF014648115|GCA018260555	1	415
ref_100006:GCF900176165|GCF014646895|GCF007990775|GCF000745915|GCF003574085|GCF000423425|GCF017497985|GCF003574035|GCF000620065|GCF000024425|GCA000482765|GCF003351145|GCF000771745|GCF004307015|GCF000745065|GCF000744175|GCF000744885|GCF003336745|GCF002964845|GCF004684245|GCF000423905|GCF000381045|GCF000236585|GCF000376265|GCF000373145|GCF003426945|GCF001280255|GCF001880325|GCF000794385|GCF014647535|GCF900102145|GCF001535545|GCF000091545|GCA002355995|GCF000421625|GCA011053565|GCA001311585|GCA001311545|GCF003226535|GCF000092125|GCF003574355|GCA015478585|GCF014647075|GCF014201875|GCF014647055|GCF006335125|GCF000701425|GCF009017495|GCF001949125|GCF000701405|GCF001644565|GCF000378445|GCF009982895|GCF007280555|GCF003860465|GCF003173015|GCF004634215|GCF003966215|GCF014653275|GCF000190555|GCF011067105|GCF000008565|GCA018260275|GCF014201885|GCF009377345|GCF002953415|GCF000482805|GCF018863415|GCF002869765|GCF004758605|GCF002897375|GCF000381345|GCF000599865|GCF000745175|GCF000196275|GCF900109185|GCF001424185|GCF000252445|GCF002901445|GCF014201805|GCF014646915|GCF003217515|GCF000519345|GCF000186385|GCF003444775|GCF014647435|GCF014648135|GCF013337115|GCF001507665|GCF001485435|GCF001007995|GCF002198095|GCF002017875|GCF014648115|GCA018260555	1	287
ref_100008:GCF900176165|GCF014646895|GCF007990775|GCF000745915|GCF003574085|GCF000423425|GCF017497985|GCF003574035|GCF000620065|GCF000024425|GCA000482765|GCF003351145|GCF000771745|GCF004307015|GCF000745065|GCF000744175|GCF000744885|GCF003336745|GCF002964845|GCF004684245|GCF000423905|GCF000381045|GCF000236585|GCF000376265|GCF000373145|GCF003426945|GCF001280255|GCF001880325|GCF000794385|GCF014647535|GCF900102145|GCF001535545|GCF000091545|GCA002355995|GCF000421625|GCA011053565|GCA001311585|GCA001311545|GCF003226535|GCF000092125|GCF003574355|GCA015478585|GCF014647075|GCF014201875|GCF014647055|GCF006335125|GCF000701425|GCF009017495|GCF001949125|GCF000701405|GCF001644565|GCF000378445|GCF009982895|GCF007280555|GCF003860465|GCF003173015|GCF004634215|GCF003966215|GCF014653275|GCF000190555|GCF011067105|GCF000008565|GCA018260275|GCF014201885|GCF009377345|GCF002953415|GCF000482805|GCF018863415|GCF002869765|GCF004758605|GCF002897375|GCF000381345|GCF000599865|GCF000745175|GCF000196275|GCF900109185|GCF001424185|GCF000252445|GCF002901445|GCF014201805|GCF014646915|GCF003217515|GCF000519345|GCF000186385|GCF003444775|GCF014647435|GCF014648135|GCF013337115|GCF001507665|GCF001485435|GCF001007995|GCF002198095|GCF002017875|GCF014648115|GCA018260555	3	235
ref_10000:GCF900176165|GCF014646895|GCF007990775|GCF000745915|GCF003574085|GCF000423425|GCF017497985|GCF003574035|GCF000620065|GCF000024425|GCA000482765|GCF003351145|GCF000771745|GCF004307015|GCF000745065|GCF000744175|GCF000744885|GCF003336745|GCF002964845|GCF004684245|GCF000423905|GCF000381045|GCF000236585|GCF000376265|GCF000373145|GCF003426945|GCF001280255|GCF001880325|GCF000794385|GCF014647535|GCF900102145|GCF001535545|GCF000091545|GCA002355995|GCF000421625|GCA011053565|GCA001311585|GCA001311545|GCF003226535|GCF000092125|GCF003574355|GCA015478585|GCF014647075|GCF014201875|GCF014647055|GCF006335125|GCF000701425|GCF009017495|GCF001949125|GCF000701405|GCF001644565|GCF000378445|GCF009982895|GCF007280555|GCF003860465|GCF003173015|GCF004634215|GCF003966215|GCF014653275|GCF000190555|GCF011067105|GCF000008565|GCA018260275|GCF014201885|GCF009377345|GCF002953415|GCF000482805|GCF018863415|GCF002869765|GCF004758605|GCF002897375|GCF000381345|GCF000599865|GCF000745175|GCF000196275|GCF900109185|GCF001424185|GCF000252445|GCF002901445|GCF014201805|GCF014646915|GCF003217515|GCF000519345|GCF000186385|GCF003444775|GCF014647435|GCF014648135|GCF013337115|GCF001507665|GCF001485435|GCF001007995|GCF002198095|GCF002017875|GCF014648115|GCA018260555	4	276
ref_10001:GCF014648095|GCF009755355|GCF003028415|GCF000317835|GCF014647655|GCF000020685	422	1291
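A smaller synthetic file of the same shape can be generated with the sketch below: one interval on a sequence name of about 1955 characters and one on a short name. The `GCF000000000`-style IDs, the helper names `long_name`, `repro_rows` and `write_repro`, and the output file name are hypothetical placeholders; whether this exact file triggers the same truncation on v2.30.0 is an assumption based on the output shown here.

```python
def long_name(prefix: str, n_ids: int) -> str:
    # Build a name such as "ref_0:GCF000000000|GCF000000001|..." with n_ids IDs.
    ids = "|".join(f"GCF{i:09d}" for i in range(n_ids))
    return f"{prefix}:{ids}"


def repro_rows() -> list[str]:
    # One interval on a ~1955-character sequence name, one on a short name.
    # Rows are already in lexicographic name order ("ref_0:" < "ref_1:").
    return [
        f"{long_name('ref_0', 150)}\t1\t415",
        "ref_1:GCF000000001\t422\t1291",
    ]


def write_repro(path: str) -> None:
    # Write the rows as a tab-separated BED file, one record per line.
    with open(path, "w") as fh:
        fh.writelines(row + "\n" for row in repro_rows())
```

For example, `write_repro("repro.bed")` followed by `bedtools merge -i repro.bed` would show whether the long name and its coordinates come back intact.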

When I run the following command:

bedtools merge -i sorted.bed > merged_sort.bed

the output is:

ref_0:GCF900176165|GCF014646895|GCF007990775|GCF000745915|GCF003574085|GCF000423425|GCF017497985|GCF003574035|GCF000620065|GCF000024425|GCA000482765|GCF003351145|GCF000771745|GCF004307015|GCF000745065|GCF000744175|GCF000744885|GCF003336745|GCF002964845|GCF004684245|GCF000423905|GCF000381045|GCF000236585|GCF000376265|GCF000373145|GCF003426945|GCF001280255|GCF001880325|GCF000794385|GCF014647535|GCF900102145|GCF001535545|GCF000091545|GCA002355995|GCF000421625|GCA011053565|GCA001311585|GCA001311545|GCF003226535|GCF000092125|GCF003574355|GCA015478585|GCF014647075|GCF014201875|GCF014647055|GCF006335125|GCF000701425|GCF009017495|GCF001949125|GCF000701405|GCF001644565|GCF000378445|GCF009982895|GCF007280555|GCF003860465|GCF003173015|GCF004634215|GCF003966215|GCF014653275|GCF000190555|GCF011067105|GCF000008565|GCA018260275|GCF014201885|GCF009377345|GCF002953415|GCF000482805|GCF018863415|GCF002869765|GCF004758605|GCF002897375|GCF000381345|GCF000599865|GCF000745175|GCF000196275|GCF900109185|GCF001424185|GCF000252445|GCF
ref_100006:GCF900176165|GCF014646895|GCF007990775|GCF000745915|GCF003574085|GCF000423425|GCF017497985|GCF003574035|GCF000620065|GCF000024425|GCA000482765|GCF003351145|GCF000771745|GCF004307015|GCF000745065|GCF000744175|GCF000744885|GCF003336745|GCF002964845|GCF004684245|GCF000423905|GCF000381045|GCF000236585|GCF000376265|GCF000373145|GCF003426945|GCF001280255|GCF001880325|GCF000794385|GCF014647535|GCF900102145|GCF001535545|GCF000091545|GCA002355995|GCF000421625|GCA011053565|GCA001311585|GCA001311545|GCF003226535|GCF000092125|GCF003574355|GCA015478585|GCF014647075|GCF014201875|GCF014647055|GCF006335125|GCF000701425|GCF009017495|GCF001949125|GCF000701405|GCF001644565|GCF000378445|GCF009982895|GCF007280555|GCF003860465|GCF003173015|GCF004634215|GCF003966215|GCF014653275|GCF000190555|GCF011067105|GCF000008565|GCA018260275|GCF014201885|GCF009377345|GCF002953415|GCF000482805|GCF018863415|GCF002869765|GCF004758605|GCF002897375|GCF000381345|GCF000599865|GCF000745175|GCF000196275|GCF900109185|GCF001424185|GCF00025244
ref_100008:GCF900176165|GCF014646895|GCF007990775|GCF000745915|GCF003574085|GCF000423425|GCF017497985|GCF003574035|GCF000620065|GCF000024425|GCA000482765|GCF003351145|GCF000771745|GCF004307015|GCF000745065|GCF000744175|GCF000744885|GCF003336745|GCF002964845|GCF004684245|GCF000423905|GCF000381045|GCF000236585|GCF000376265|GCF000373145|GCF003426945|GCF001280255|GCF001880325|GCF000794385|GCF014647535|GCF900102145|GCF001535545|GCF000091545|GCA002355995|GCF000421625|GCA011053565|GCA001311585|GCA001311545|GCF003226535|GCF000092125|GCF003574355|GCA015478585|GCF014647075|GCF014201875|GCF014647055|GCF006335125|GCF000701425|GCF009017495|GCF001949125|GCF000701405|GCF001644565|GCF000378445|GCF009982895|GCF007280555|GCF003860465|GCF003173015|GCF004634215|GCF003966215|GCF014653275|GCF000190555|GCF011067105|GCF000008565|GCA018260275|GCF014201885|GCF009377345|GCF002953415|GCF000482805|GCF018863415|GCF002869765|GCF004758605|GCF002897375|GCF000381345|GCF000599865|GCF000745175|GCF000196275|GCF900109185|GCF001424185|GCF00025244
ref_10000:GCF900176165|GCF014646895|GCF007990775|GCF000745915|GCF003574085|GCF000423425|GCF017497985|GCF003574035|GCF000620065|GCF000024425|GCA000482765|GCF003351145|GCF000771745|GCF004307015|GCF000745065|GCF000744175|GCF000744885|GCF003336745|GCF002964845|GCF004684245|GCF000423905|GCF000381045|GCF000236585|GCF000376265|GCF000373145|GCF003426945|GCF001280255|GCF001880325|GCF000794385|GCF014647535|GCF900102145|GCF001535545|GCF000091545|GCA002355995|GCF000421625|GCA011053565|GCA001311585|GCA001311545|GCF003226535|GCF000092125|GCF003574355|GCA015478585|GCF014647075|GCF014201875|GCF014647055|GCF006335125|GCF000701425|GCF009017495|GCF001949125|GCF000701405|GCF001644565|GCF000378445|GCF009982895|GCF007280555|GCF003860465|GCF003173015|GCF004634215|GCF003966215|GCF014653275|GCF000190555|GCF011067105|GCF000008565|GCA018260275|GCF014201885|GCF009377345|GCF002953415|GCF000482805|GCF018863415|GCF002869765|GCF004758605|GCF002897375|GCF000381345|GCF000599865|GCF000745175|GCF000196275|GCF900109185|GCF001424185|GCF000252445
ref_10001:GCF014648095|GCF009755355|GCF003028415|GCF000317835|GCF014647655|GCF000020685	422	1291

The first 4 sequence names are too long and are cut off in the output file, and their merged coordinates are missing entirely. The last sequence name is short and is written correctly.
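One possible workaround is to swap each long name for a short alias before merging and restore the original names afterwards. The sketch below assumes the truncation only affects long lines (as the correctly written 5th record suggests), that the input has no header or track lines, and that `bedtools` is on the PATH; the `seq0000000`-style aliases and the helper names `alias_names`, `restore_names` and `merge_with_aliases` are hypothetical. Aliases are zero-padded and handed out in order of first appearance, so an input already sorted by name stays sorted after aliasing.

```python
import os
import subprocess
import tempfile


def alias_names(lines, width=7):
    # Replace column 1 with short aliases; return (aliased_lines, alias_to_name).
    alias_of = {}
    out = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        name = fields[0]
        if name not in alias_of:
            alias_of[name] = f"seq{len(alias_of):0{width}d}"
        fields[0] = alias_of[name]
        out.append("\t".join(fields))
    return out, {alias: name for name, alias in alias_of.items()}


def restore_names(lines, alias_to_name):
    # Put the original names back into column 1 of the merged output.
    out = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        fields[0] = alias_to_name[fields[0]]
        out.append("\t".join(fields))
    return out


def merge_with_aliases(in_path, out_path):
    # Alias the names, run plain `bedtools merge -i` on a temporary file,
    # then write the merged intervals with the original names restored.
    with open(in_path) as fh:
        aliased, alias_to_name = alias_names(fh)
    with tempfile.NamedTemporaryFile("w", suffix=".bed", delete=False) as tmp:
        tmp.writelines(line + "\n" for line in aliased)
        tmp_path = tmp.name
    try:
        result = subprocess.run(
            ["bedtools", "merge", "-i", tmp_path],
            capture_output=True, text=True, check=True,
        )
    finally:
        os.remove(tmp_path)
    restored = restore_names(result.stdout.splitlines(), alias_to_name)
    with open(out_path, "w") as fh:
        fh.writelines(line + "\n" for line in restored)
```

For example, `merge_with_aliases("sorted.bed", "merged_sort.bed")` stands in for the original command. Only plain `bedtools merge -i` is run, so options such as `-c`/`-o` would have to be appended to the argument list.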

Thank you!