sepro / CoNekT

CoNekT (short for Co-expression Network Toolkit) is a platform to browse co-expression data and enable cross-species comparisons.

Build coexpression clusters with HCCA - maximum recursion depth exceeded

SantosRAC opened this issue · comments

Description of the bug

After uploading my network, I get an error when I try to build coexpression clusters using the admin Build option.

Internal Server Error

The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.

Looking at flask run output:

After iteration 10, the leftover count stops decreasing (in my case it stays at 1545):

Node 5305 out of 5306
Finding non-overlappers...
Clustering completed, handling left overs...
Leftovers : 5306
Leftovers : 1601
Leftovers : 1547
Leftovers : 1545
Leftovers : 1545
Leftovers : 1545
Leftovers : 1545

It stays "forever" printing Leftovers : 1545 until it dies. See error here
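A leftover count that stops shrinking, followed by "maximum recursion depth exceeded", is the pattern one would expect if the leftover-handling step calls itself again even when a pass assigns no nodes. The following is a minimal sketch of that idea, not CoNekT's actual HCCA code: the function name `assign_leftovers`, the dictionary-based graph layout, and the majority-vote rule are all assumptions made for illustration. It uses a loop instead of recursion and stops as soon as a pass makes no progress.

```python
def assign_leftovers(neighbors, clusters, leftovers):
    """Attach leftover nodes to the cluster most common among their neighbors.

    Hypothetical data layout (not taken from CoNekT):
      neighbors: dict mapping node -> set of adjacent nodes
      clusters:  dict mapping node -> cluster id, for nodes clustered so far
      leftovers: iterable of nodes that are not in any cluster yet
    Returns (clusters, unassigned); unassigned holds nodes that could not be placed.
    """
    clusters = dict(clusters)
    remaining = set(leftovers)

    while remaining:
        print("Leftovers :", len(remaining))
        placed = {}
        for node in remaining:
            # Count how many already-clustered neighbors fall in each cluster.
            votes = {}
            for nb in neighbors.get(node, ()):
                if nb in clusters:
                    votes[clusters[nb]] = votes.get(clusters[nb], 0) + 1
            if votes:
                # sorted() makes tie-breaking deterministic.
                placed[node] = max(sorted(votes), key=votes.get)

        if not placed:
            # No node gained a cluster in this pass: further passes cannot
            # change anything, so stop instead of looping or recursing forever.
            break

        clusters.update(placed)
        remaining.difference_update(placed)

    return clusters, remaining
```

In this sketch, nodes whose neighbors are all leftovers themselves (or that have no neighbors at all) end up in `unassigned` rather than keeping the loop alive, which is the situation a constant "Leftovers : 1545" would suggest.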

To Reproduce

I've been following 004_coexpression_network_cluster.

Details about my system:

  • OS: Linux
  • Webserver: Using ssh tunnel (running Flask run remotely and changing Firefox proxy settings).
  • Database: MariaDB/MySQL

Additional information

I was able to build co-expression clusters (the build clusters function returns "Succesfully cuild clusters using HCCA") for some of the species in my database; however, it fails for others. Could that be related to sequence identifiers, dataset size, or something else?

Thanks a lot

Seems you are running into an edge case when clustering your data (one I haven't seen before). Since, as you indicate, it does work for other species, there are a number of places where there could be an issue, either with the data or with the code.

Without data to reproduce the issue on my end there is little I can do/recommend/suggest here.

Hi @sepro , thanks a lot for your super fast response as usual.

Here is an example of dataset where the HCCA pipeline fails:
https://drive.google.com/drive/folders/1LLViwBANqdP22epVa6yM_tIwHNOxaOFA

Dex_cds.fasta - FASTA file for species ("Digitaria exilis" - "Dex")
Dex_cds_description.txt - Sequence descriptions
Dex_interproscan.tsv - InterProScan output (.TSV)
Dex_go_annotation.tsv - GO annotation (generated from InterProScan output)
Dex_tpmmatrix_network_1.txt - TPM matrix
Dex_expprofile_network_1.txt - Association between sample and descriptions
Dex_expprofile_colors_network_1.txt - Color assignments for samples
Dex_network_1.txt - Network (it was generated with pcc.py, from LSTrAP github repository)

I imported the network without any change in parameters (Limit : 30 ; PCC-cutoff : -2.0), and then I started building the co-expression clusters with built-in HCCA function.

Please, let me know if I can provide anything else to help you reproduce my analyses.

Thanks once again!

At first glance that seems to be everything needed to get to that point, thanks !

Replying to a message takes a few minutes at most, finding time to have a proper look at these data will not be quite as fast.

Though here you only have 4 samples; for co-expression analysis this is very, very low. I think the smallest analysis I've ever seen in a publication had 12-18 samples. For the public version of CoNekT I collected hundreds of samples per species. I wouldn't be surprised if this is what is causing the issue.
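To illustrate why so few samples is a concern, here is a rough sketch using random data (not the Dex dataset). It estimates how often two unrelated expression profiles reach a high absolute Pearson correlation purely by chance. The function names, the 0.9 threshold, and the number of simulated pairs are arbitrary choices for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def fraction_strong(n_samples, n_pairs=2000, threshold=0.9, seed=0):
    """Fraction of random, unrelated profile pairs with |PCC| >= threshold."""
    rng = random.Random(seed)
    strong = 0
    for _ in range(n_pairs):
        # Two independent profiles: any correlation between them is chance.
        x = [rng.gauss(0, 1) for _ in range(n_samples)]
        y = [rng.gauss(0, 1) for _ in range(n_samples)]
        if abs(pearson(x, y)) >= threshold:
            strong += 1
    return strong / n_pairs


if __name__ == "__main__":
    for n in (4, 18, 100):
        print(n, "samples:", fraction_strong(n))
```

With 4 samples a noticeable share of unrelated pairs passes the threshold, while with 18 or more it becomes rare, so a network built from 4 samples is likely to contain many edges that do not reflect real co-expression.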

Hi @sepro , this is a great response, and it may well be the main reason! I totally agree.

I say that because the other dataset that is currently not working is the one used in the LSTrAP-Cloud paper, with 5 datasets.

This is probably the problem. Thanks a lot once again!!! I will let you know if I get in trouble with larger datasets. :-)

Actually, the "example dataset".