bedtools merge input too long
yongze-yin opened this issue · comments
Hi,
I think there may be a bug in bedtools merge: when the input sequence name is too long, only part of the name is written to the output, and the merged coordinates are not shown at all. This looks like what was reported before, so bedtools merge may have the same bug. Thank you!
Can you provide a minimal example file and let us know what version you are using?
Thank you for your timely response, Prof. Quinlan.
I am using bedtools v2.30.0. Suppose my sorted input file, sorted.bed, contains the following 5 records:
ref_0:GCF900176165|GCF014646895|GCF007990775|GCF000745915|GCF003574085|GCF000423425|GCF017497985|GCF003574035|GCF000620065|GCF000024425|GCA000482765|GCF003351145|GCF000771745|GCF004307015|GCF000745065|GCF000744175|GCF000744885|GCF003336745|GCF002964845|GCF004684245|GCF000423905|GCF000381045|GCF000236585|GCF000376265|GCF000373145|GCF003426945|GCF001280255|GCF001880325|GCF000794385|GCF014647535|GCF900102145|GCF001535545|GCF000091545|GCA002355995|GCF000421625|GCA011053565|GCA001311585|GCA001311545|GCF003226535|GCF000092125|GCF003574355|GCA015478585|GCF014647075|GCF014201875|GCF014647055|GCF006335125|GCF000701425|GCF009017495|GCF001949125|GCF000701405|GCF001644565|GCF000378445|GCF009982895|GCF007280555|GCF003860465|GCF003173015|GCF004634215|GCF003966215|GCF014653275|GCF000190555|GCF011067105|GCF000008565|GCA018260275|GCF014201885|GCF009377345|GCF002953415|GCF000482805|GCF018863415|GCF002869765|GCF004758605|GCF002897375|GCF000381345|GCF000599865|GCF000745175|GCF000196275|GCF900109185|GCF001424185|GCF000252445|GCF002901445|GCF014201805|GCF014646915|GCF003217515|GCF000519345|GCF000186385|GCF003444775|GCF014647435|GCF014648135|GCF013337115|GCF001507665|GCF001485435|GCF001007995|GCF002198095|GCF002017875|GCF014648115|GCA018260555 1 415
ref_100006:GCF900176165|GCF014646895|GCF007990775|GCF000745915|GCF003574085|GCF000423425|GCF017497985|GCF003574035|GCF000620065|GCF000024425|GCA000482765|GCF003351145|GCF000771745|GCF004307015|GCF000745065|GCF000744175|GCF000744885|GCF003336745|GCF002964845|GCF004684245|GCF000423905|GCF000381045|GCF000236585|GCF000376265|GCF000373145|GCF003426945|GCF001280255|GCF001880325|GCF000794385|GCF014647535|GCF900102145|GCF001535545|GCF000091545|GCA002355995|GCF000421625|GCA011053565|GCA001311585|GCA001311545|GCF003226535|GCF000092125|GCF003574355|GCA015478585|GCF014647075|GCF014201875|GCF014647055|GCF006335125|GCF000701425|GCF009017495|GCF001949125|GCF000701405|GCF001644565|GCF000378445|GCF009982895|GCF007280555|GCF003860465|GCF003173015|GCF004634215|GCF003966215|GCF014653275|GCF000190555|GCF011067105|GCF000008565|GCA018260275|GCF014201885|GCF009377345|GCF002953415|GCF000482805|GCF018863415|GCF002869765|GCF004758605|GCF002897375|GCF000381345|GCF000599865|GCF000745175|GCF000196275|GCF900109185|GCF001424185|GCF000252445|GCF002901445|GCF014201805|GCF014646915|GCF003217515|GCF000519345|GCF000186385|GCF003444775|GCF014647435|GCF014648135|GCF013337115|GCF001507665|GCF001485435|GCF001007995|GCF002198095|GCF002017875|GCF014648115|GCA018260555 1 287
ref_100008:GCF900176165|GCF014646895|GCF007990775|GCF000745915|GCF003574085|GCF000423425|GCF017497985|GCF003574035|GCF000620065|GCF000024425|GCA000482765|GCF003351145|GCF000771745|GCF004307015|GCF000745065|GCF000744175|GCF000744885|GCF003336745|GCF002964845|GCF004684245|GCF000423905|GCF000381045|GCF000236585|GCF000376265|GCF000373145|GCF003426945|GCF001280255|GCF001880325|GCF000794385|GCF014647535|GCF900102145|GCF001535545|GCF000091545|GCA002355995|GCF000421625|GCA011053565|GCA001311585|GCA001311545|GCF003226535|GCF000092125|GCF003574355|GCA015478585|GCF014647075|GCF014201875|GCF014647055|GCF006335125|GCF000701425|GCF009017495|GCF001949125|GCF000701405|GCF001644565|GCF000378445|GCF009982895|GCF007280555|GCF003860465|GCF003173015|GCF004634215|GCF003966215|GCF014653275|GCF000190555|GCF011067105|GCF000008565|GCA018260275|GCF014201885|GCF009377345|GCF002953415|GCF000482805|GCF018863415|GCF002869765|GCF004758605|GCF002897375|GCF000381345|GCF000599865|GCF000745175|GCF000196275|GCF900109185|GCF001424185|GCF000252445|GCF002901445|GCF014201805|GCF014646915|GCF003217515|GCF000519345|GCF000186385|GCF003444775|GCF014647435|GCF014648135|GCF013337115|GCF001507665|GCF001485435|GCF001007995|GCF002198095|GCF002017875|GCF014648115|GCA018260555 3 235
ref_10000:GCF900176165|GCF014646895|GCF007990775|GCF000745915|GCF003574085|GCF000423425|GCF017497985|GCF003574035|GCF000620065|GCF000024425|GCA000482765|GCF003351145|GCF000771745|GCF004307015|GCF000745065|GCF000744175|GCF000744885|GCF003336745|GCF002964845|GCF004684245|GCF000423905|GCF000381045|GCF000236585|GCF000376265|GCF000373145|GCF003426945|GCF001280255|GCF001880325|GCF000794385|GCF014647535|GCF900102145|GCF001535545|GCF000091545|GCA002355995|GCF000421625|GCA011053565|GCA001311585|GCA001311545|GCF003226535|GCF000092125|GCF003574355|GCA015478585|GCF014647075|GCF014201875|GCF014647055|GCF006335125|GCF000701425|GCF009017495|GCF001949125|GCF000701405|GCF001644565|GCF000378445|GCF009982895|GCF007280555|GCF003860465|GCF003173015|GCF004634215|GCF003966215|GCF014653275|GCF000190555|GCF011067105|GCF000008565|GCA018260275|GCF014201885|GCF009377345|GCF002953415|GCF000482805|GCF018863415|GCF002869765|GCF004758605|GCF002897375|GCF000381345|GCF000599865|GCF000745175|GCF000196275|GCF900109185|GCF001424185|GCF000252445|GCF002901445|GCF014201805|GCF014646915|GCF003217515|GCF000519345|GCF000186385|GCF003444775|GCF014647435|GCF014648135|GCF013337115|GCF001507665|GCF001485435|GCF001007995|GCF002198095|GCF002017875|GCF014648115|GCA018260555 4 276
ref_10001:GCF014648095|GCF009755355|GCF003028415|GCF000317835|GCF014647655|GCF000020685 422 1291
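To see how long the names in the first column actually are, a small check along these lines may help. This is only a sketch: the function name `name_lengths` is made up for illustration, the sample records are shortened placeholders rather than the real data, and it assumes the columns are separated by tabs or spaces.

```python
def name_lengths(lines):
    """Return (label before ':', length of the full name) for each BED record."""
    result = []
    for line in lines:
        fields = line.split()  # assumption: whitespace-separated columns
        if not fields:
            continue  # skip blank lines
        name = fields[0]
        label = name.split(":", 1)[0]  # e.g. "ref_0" from "ref_0:GCF...|GCF..."
        result.append((label, len(name)))
    return result


# Placeholder records, much shorter than the real names in sorted.bed.
sample = [
    "ref_1:AAA|BBB\t1\t10",
    "ref_2:CCC\t5\t20",
]
for label, length in name_lengths(sample):
    print(label, length)
```

Running the same function over the lines of sorted.bed would show which names are long enough to be affected.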
When I run the following command:
bedtools merge -i sorted.bed > merged_sort.bed
the output is:
ref_0:GCF900176165|GCF014646895|GCF007990775|GCF000745915|GCF003574085|GCF000423425|GCF017497985|GCF003574035|GCF000620065|GCF000024425|GCA000482765|GCF003351145|GCF000771745|GCF004307015|GCF000745065|GCF000744175|GCF000744885|GCF003336745|GCF002964845|GCF004684245|GCF000423905|GCF000381045|GCF000236585|GCF000376265|GCF000373145|GCF003426945|GCF001280255|GCF001880325|GCF000794385|GCF014647535|GCF900102145|GCF001535545|GCF000091545|GCA002355995|GCF000421625|GCA011053565|GCA001311585|GCA001311545|GCF003226535|GCF000092125|GCF003574355|GCA015478585|GCF014647075|GCF014201875|GCF014647055|GCF006335125|GCF000701425|GCF009017495|GCF001949125|GCF000701405|GCF001644565|GCF000378445|GCF009982895|GCF007280555|GCF003860465|GCF003173015|GCF004634215|GCF003966215|GCF014653275|GCF000190555|GCF011067105|GCF000008565|GCA018260275|GCF014201885|GCF009377345|GCF002953415|GCF000482805|GCF018863415|GCF002869765|GCF004758605|GCF002897375|GCF000381345|GCF000599865|GCF000745175|GCF000196275|GCF900109185|GCF001424185|GCF000252445|GCF
ref_100006:GCF900176165|GCF014646895|GCF007990775|GCF000745915|GCF003574085|GCF000423425|GCF017497985|GCF003574035|GCF000620065|GCF000024425|GCA000482765|GCF003351145|GCF000771745|GCF004307015|GCF000745065|GCF000744175|GCF000744885|GCF003336745|GCF002964845|GCF004684245|GCF000423905|GCF000381045|GCF000236585|GCF000376265|GCF000373145|GCF003426945|GCF001280255|GCF001880325|GCF000794385|GCF014647535|GCF900102145|GCF001535545|GCF000091545|GCA002355995|GCF000421625|GCA011053565|GCA001311585|GCA001311545|GCF003226535|GCF000092125|GCF003574355|GCA015478585|GCF014647075|GCF014201875|GCF014647055|GCF006335125|GCF000701425|GCF009017495|GCF001949125|GCF000701405|GCF001644565|GCF000378445|GCF009982895|GCF007280555|GCF003860465|GCF003173015|GCF004634215|GCF003966215|GCF014653275|GCF000190555|GCF011067105|GCF000008565|GCA018260275|GCF014201885|GCF009377345|GCF002953415|GCF000482805|GCF018863415|GCF002869765|GCF004758605|GCF002897375|GCF000381345|GCF000599865|GCF000745175|GCF000196275|GCF900109185|GCF001424185|GCF00025244
ref_100008:GCF900176165|GCF014646895|GCF007990775|GCF000745915|GCF003574085|GCF000423425|GCF017497985|GCF003574035|GCF000620065|GCF000024425|GCA000482765|GCF003351145|GCF000771745|GCF004307015|GCF000745065|GCF000744175|GCF000744885|GCF003336745|GCF002964845|GCF004684245|GCF000423905|GCF000381045|GCF000236585|GCF000376265|GCF000373145|GCF003426945|GCF001280255|GCF001880325|GCF000794385|GCF014647535|GCF900102145|GCF001535545|GCF000091545|GCA002355995|GCF000421625|GCA011053565|GCA001311585|GCA001311545|GCF003226535|GCF000092125|GCF003574355|GCA015478585|GCF014647075|GCF014201875|GCF014647055|GCF006335125|GCF000701425|GCF009017495|GCF001949125|GCF000701405|GCF001644565|GCF000378445|GCF009982895|GCF007280555|GCF003860465|GCF003173015|GCF004634215|GCF003966215|GCF014653275|GCF000190555|GCF011067105|GCF000008565|GCA018260275|GCF014201885|GCF009377345|GCF002953415|GCF000482805|GCF018863415|GCF002869765|GCF004758605|GCF002897375|GCF000381345|GCF000599865|GCF000745175|GCF000196275|GCF900109185|GCF001424185|GCF00025244
ref_10000:GCF900176165|GCF014646895|GCF007990775|GCF000745915|GCF003574085|GCF000423425|GCF017497985|GCF003574035|GCF000620065|GCF000024425|GCA000482765|GCF003351145|GCF000771745|GCF004307015|GCF000745065|GCF000744175|GCF000744885|GCF003336745|GCF002964845|GCF004684245|GCF000423905|GCF000381045|GCF000236585|GCF000376265|GCF000373145|GCF003426945|GCF001280255|GCF001880325|GCF000794385|GCF014647535|GCF900102145|GCF001535545|GCF000091545|GCA002355995|GCF000421625|GCA011053565|GCA001311585|GCA001311545|GCF003226535|GCF000092125|GCF003574355|GCA015478585|GCF014647075|GCF014201875|GCF014647055|GCF006335125|GCF000701425|GCF009017495|GCF001949125|GCF000701405|GCF001644565|GCF000378445|GCF009982895|GCF007280555|GCF003860465|GCF003173015|GCF004634215|GCF003966215|GCF014653275|GCF000190555|GCF011067105|GCF000008565|GCA018260275|GCF014201885|GCF009377345|GCF002953415|GCF000482805|GCF018863415|GCF002869765|GCF004758605|GCF002897375|GCF000381345|GCF000599865|GCF000745175|GCF000196275|GCF900109185|GCF001424185|GCF000252445
ref_10001:GCF014648095|GCF009755355|GCF003028415|GCF000317835|GCF014647655|GCF000020685 422 1291
The names of the first four sequences are too long and are cut off in the output file, and their merged coordinates are missing as well. The last sequence name is short, so that record is written correctly.
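A quick way to flag the affected records in the merged output is to look for lines that lack a valid start and end column. The sketch below assumes whitespace-separated columns; `find_truncated` is a hypothetical helper, not part of bedtools, and the sample lines are shortened placeholders.

```python
def find_truncated(lines):
    """Return 1-based line numbers of records without integer start/end columns."""
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()  # assumption: whitespace-separated columns
        if not fields:
            continue  # ignore blank lines
        has_coords = (
            len(fields) >= 3 and fields[1].isdigit() and fields[2].isdigit()
        )
        if not has_coords:
            bad.append(number)
    return bad


# Placeholder output: the first record is cut off, the second is complete.
sample = [
    "ref_a:GCF000000001|GCF0000",
    "ref_b:GCF000000002\t422\t1291",
]
print(find_truncated(sample))  # [1]
```

Applied to the lines of merged_sort.bed, this would list the records whose coordinates were lost.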
Thank you!