liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data

CDR3 clustering among multiple samples (bulk RNA-seq)

322029 opened this issue

Thank you for your great tool. I have three questions about clustering.

  1. As the title asks, is it possible to cluster CDR3nt across multiple samples? The CDR3nt sequences within a single sample can be clustered with "trust-cluster.py", but the script does not accept multiple trust_report.tsv files as input. I suppose that concatenating the trust_report.tsv files into one file and running trust-cluster.py on it would work. Is that correct?

  2. Alternatively, the trust_report.tsv files provide a "cid" column (assemble~), and the cids seem to be consistent across samples. So, can I use cid in place of the cluster?

  3. I'm wondering whether CDR3nt sequences in the same cluster share similar antigen specificities, or whether they are merely similar in sequence.

I'd appreciate it if you could reply.

  1. Yes, you can put them together and then run the cluster script.
  2. The cluster script does not use the information in that column, but you should still rename the assembled contigs' names in the cid column so you can track which sample each one comes from.
  3. The clustering is based only on sequence similarity. For BCR, it is mainly intended to group clonotypes from the same lineage that have become different through somatic hypermutation (SHM).
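Following answers 1 and 2, the merge step could look like the minimal sketch below. It assumes each trust_report.tsv is tab-separated with a single header line (starting with "#count") that contains a "cid" column; `merge_reports` and the sample labels are hypothetical names for illustration, not part of TRUST4. Each cid is prefixed with its sample label so rows stay traceable after merging.

```python
import csv

# Assumption: the column holding the consensus ID is named "cid",
# as in the TRUST4 trust_report.tsv header. Adjust if your files differ.
CID_COLUMN = "cid"


def merge_reports(reports, out_path):
    """Concatenate several report files into one TSV.

    `reports` maps a sample label (hypothetical, chosen by you) to the path
    of that sample's trust_report.tsv. Every cid is rewritten as
    "<sample>_<cid>" so each row can be traced back to its sample.
    Returns the number of data rows written.
    """
    header = None
    rows = []
    for sample, path in reports.items():
        with open(path, newline="") as fh:
            reader = csv.reader(fh, delimiter="\t")
            file_header = next(reader)
            # All reports must share the same columns to be concatenated.
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"{path}: header differs from the first report")
            cid_idx = header.index(CID_COLUMN)
            for row in reader:
                if not row:
                    continue  # skip blank lines
                row[cid_idx] = f"{sample}_{row[cid_idx]}"
                rows.append(row)
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(header)  # keep a single header line
        writer.writerows(rows)
    return len(rows)
```

The merged file can then be given to trust-cluster.py the same way a single sample's report would be; check the script's own usage message for its exact arguments rather than relying on this sketch.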

Thank you for your prompt response!
Regarding cid, I found that the same cid sometimes appears on several lines, with slightly different CDR3nt on each. So I'm interested in the difference between cid and cluster. Could you explain it?

cid is the consensus ID. Each consensus sequence may encode multiple CDR3 sequences, and those CDR3s will automatically fall into the same cluster.
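To see this in a given report, one can list the cids that appear on more than one line together with their CDR3nt sequences. This is a minimal sketch assuming the TRUST4 report header uses the column names "cid" and "CDR3nt"; `cdr3_by_cid` is a hypothetical helper, not part of TRUST4. Per the answer above, every sequence grouped under one cid should end up in the same cluster.

```python
import csv
from collections import defaultdict


def cdr3_by_cid(report_path, cid_column="cid", cdr3_column="CDR3nt"):
    """Group the CDR3nt sequences of one report by consensus ID.

    Assumption: column names follow the trust_report.tsv header.
    Returns {cid: [CDR3nt, ...]} only for cids that carry more than one CDR3.
    """
    groups = defaultdict(list)
    with open(report_path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        header = next(reader)
        cid_idx = header.index(cid_column)
        cdr3_idx = header.index(cdr3_column)
        for row in reader:
            if row:  # skip blank lines
                groups[row[cid_idx]].append(row[cdr3_idx])
    # Keep only consensus sequences that encode several CDR3s.
    return {cid: seqs for cid, seqs in groups.items() if len(seqs) > 1}
```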

I understand. Thank you so much!