Num of parameters

Question

xxmen opened this issue 5 years ago

If I set n_genomic_positions to the length of the chromosome and n_25bp_factors to its default of 25, then the number of parameters in this layer will be 25 * len(chr), which is really large. Should I train these parameters on the pilot regions only? If so, how do I adapt them to the whole chromosome, given that the number of parameters differs (25 * len(chr) vs. 25 * len(pilot))? What is the right way to do this?

Thanks.

Jacob Schreiber · Answer 1 · Tue Jun 25 2019 12:56:20 GMT+0800 (China Standard Time)

Good question. There are two ways that you can approach this.

The first way, which we use in the paper, is to train a model on the pilot regions, freeze the neural network and the assay and cell type factors, and then retrain only the genome factors for each chromosome. This approach ensures that the genome factors for all chromosomes lie in a common space.
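A minimal sketch of the freeze-and-retrain idea, under loud assumptions: it does not use Avocado's API, it replaces the neural network with a toy tensor-factorization decoder (the prediction for cell type c, assay a, position g is the sum over k of `celltype_factors[c, k] * assay_factors[a, k] * G[g, k]`), and it fits the genome factors in closed form with least squares rather than by gradient descent. All names (`track_matrix`, `fit_genome_factors`, `chrA`, `chrB`) are hypothetical. The point is that the shared parameters are fit once and then held fixed, so each chromosome only needs its own genome-factor table, whatever its length.

```python
import numpy as np

rng = np.random.default_rng(0)
n_celltypes, n_assays, n_factors = 4, 3, 5

# Shared parameters. In this toy they are random; in practice they would come
# from training on the pilot regions and are held fixed from here on.
celltype_factors = rng.normal(size=(n_celltypes, n_factors))
assay_factors = rng.normal(size=(n_assays, n_factors))


def track_matrix(ct, af):
    """One row per (cell type, assay) pair: elementwise product of their factors."""
    return (ct[:, None, :] * af[None, :, :]).reshape(-1, ct.shape[1])


def fit_genome_factors(signal, ct, af):
    """Fit genome factors for one chromosome while the shared factors stay frozen.

    signal: array of shape (n_positions, n_celltypes * n_assays).
    """
    V = track_matrix(ct, af)  # (n_tracks, n_factors), never updated
    # With V fixed, each position's factors solve an ordinary least-squares problem.
    G_T, *_ = np.linalg.lstsq(V, signal.T, rcond=None)
    return G_T.T  # (n_positions, n_factors)


V = track_matrix(celltype_factors, assay_factors)
truths, genome_factors = {}, {}
for name, n_positions in [("chrA", 100), ("chrB", 60)]:  # hypothetical chromosomes
    truths[name] = rng.normal(size=(n_positions, n_factors))
    signal = truths[name] @ V.T  # noise-free simulated signal
    genome_factors[name] = fit_genome_factors(signal, celltype_factors, assay_factors)
    print(name, genome_factors[name].shape)
```

Because both chromosomes are decoded through the same frozen `V`, rows of the chrA and chrB tables live in the same coordinate system even though the tables have different numbers of rows, which is what resolves the 25 * len(chr) vs. 25 * len(pilot) mismatch.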

The second approach is simply to train a separate model for each chromosome and accept that the resulting genomic latent factors are not comparable across chromosomes.
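Why separately trained factors are not comparable can be seen in a toy linear model. This sketch assumes the genome factors enter the model through a linear map (a plain matrix factorization, prediction = `G @ V.T`); it is not Avocado's architecture, and `G`, `V`, `R` are hypothetical. Any invertible transform `R` applied to the genome factors can be undone by the decoder, so two models can make identical predictions while assigning completely different factor values to the same positions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_positions, n_tracks, n_factors = 50, 12, 5

# "Model 1": genome factors G and shared decoder V (hypothetical toy values).
G = rng.normal(size=(n_positions, n_factors))
V = rng.normal(size=(n_tracks, n_factors))

# "Model 2": transform the factors by an invertible R and undo it in the decoder.
R, _ = np.linalg.qr(rng.normal(size=(n_factors, n_factors)))  # orthogonal, so invertible
G2 = G @ R
V2 = V @ np.linalg.inv(R).T

same_predictions = np.allclose(G @ V.T, G2 @ V2.T)  # G R R^-1 V^T == G V^T
same_factors = np.allclose(G, G2)
print("same predictions:", same_predictions)
print("same genome factors:", same_factors)
```

Independent per-chromosome training has nothing that pins down `R`, so each model can land on its own arbitrary coordinates; the imputations are unaffected, but the factors cannot be compared directly.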

If your goal is simply to produce the best imputations, the second approach is likely your best option. If your goal is to learn a consistent latent representation across the entire genome, you'll need the first approach.

Remember also that n_genomic_positions shouldn't be the length of the genome in base pairs, but that length divided by 25, since each genomic position is a 25 bp bin.
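The arithmetic can be sketched as follows. The helper names are hypothetical, and rounding a partial final bin up to a whole bin is an assumption; Avocado's own handling of the chromosome end may differ. With n_25bp_factors = 25 and 25 bp bins, the genome-factor table has roughly len(chr) entries rather than 25 * len(chr).

```python
def n_genomic_positions(length_bp, bin_size=25):
    """Number of 25 bp bins covering a sequence (assumption: a partial last bin counts as one)."""
    return (length_bp + bin_size - 1) // bin_size  # integer ceiling division


def genome_factor_params(length_bp, n_25bp_factors=25, bin_size=25):
    """Size of the 25 bp genome-factor table: one row of factors per bin."""
    return n_25bp_factors * n_genomic_positions(length_bp, bin_size)


length_bp = 1_000_000  # hypothetical 1 Mb region
print(n_genomic_positions(length_bp))   # 40000 bins
print(genome_factor_params(length_bp))  # 1000000 parameters, not 25 * 1000000
```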