More information about this project can be found in Projekt Zaliczeniowy PDF.pdf.
We received four files in .vcf format containing genetic data of healthy and diseased Holstein-Friesian cows.
The files contained more than 14.3 million records for the diseased individuals and more than 13.8 million for the healthy individuals.
The aim was to find SNP-type mutations that may have a biological basis for the development of the disease, and to determine their relationship with selected parameters for each chromosome.
In this project we used the Python libraries pandas, os, multiprocessing, NumPy, SciPy and seaborn, and the R packages dplyr, microbenchmark and parallel. SNPs were identified with the chi2 contingency test (chi-square test of independence) with Yates' continuity correction. Additionally, we examined the Pearson correlation between our per-chromosome results and chromosome length.
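A minimal sketch of the two tests in Python with SciPy. It assumes that each SNP is summarized as a 2x2 table of reference/alternative allele counts in diseased versus healthy cows, and that the per-chromosome result being correlated with chromosome length is a hypothetical count of significant SNPs. The function names `snp_chi2_test` and `correlate_with_length` are illustrative and not taken from the project.

```python
import numpy as np
from scipy.stats import chi2_contingency, pearsonr


def snp_chi2_test(diseased_ref, diseased_alt, healthy_ref, healthy_alt):
    """Chi-square test of independence for one SNP (hypothetical helper).

    Rows: diseased, healthy. Columns: reference allele, alternative allele.
    correction=True applies Yates' continuity correction; SciPy uses it
    only when the table has one degree of freedom, as a 2x2 table does.
    """
    table = np.array([[diseased_ref, diseased_alt],
                      [healthy_ref, healthy_alt]])
    chi2, p, dof, expected = chi2_contingency(table, correction=True)
    return float(chi2), float(p)


def correlate_with_length(values_per_chrom, chrom_lengths):
    """Pearson correlation between a per-chromosome value (assumed here to be
    the number of significant SNPs) and chromosome length."""
    r, p = pearsonr(values_per_chrom, chrom_lengths)
    return float(r), float(p)
```

For example, `snp_chi2_test(10, 20, 20, 10)` gives a corrected statistic of 5.4, compared with about 6.67 without Yates' correction, so the correction makes the test slightly more conservative for small counts.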
Due to the enormous volume of data, we used parallelization in Python (multiprocessing.Pool()) and in R (makeCluster and clusterApply). As a result, we found that the R version of the project ran faster than the Python version.
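A sketch of per-chromosome parallelization with `multiprocessing.Pool`, assuming the VCF records have already been reduced to allele counts and grouped by chromosome. The task layout, the helpers `scan_chromosome` and `scan_all`, and the 0.05 threshold are hypothetical; the project's actual worker function and significance cut-off are not described here.

```python
from multiprocessing import Pool

import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical significance threshold; the project's real cut-off is not stated.
ALPHA = 0.05


def scan_chromosome(task):
    """Test every SNP on one chromosome and return (chromosome, significant positions).

    `task` is a hypothetical (chrom, snps) pair, where each SNP is
    (position, diseased_ref, diseased_alt, healthy_ref, healthy_alt).
    """
    chrom, snps = task
    hits = []
    for pos, d_ref, d_alt, h_ref, h_alt in snps:
        table = np.array([[d_ref, d_alt], [h_ref, h_alt]])
        # chi2_contingency raises ValueError when an expected count is zero,
        # which happens if any row or column of the table sums to zero.
        if (table.sum(axis=0) == 0).any() or (table.sum(axis=1) == 0).any():
            continue
        _, p, _, _ = chi2_contingency(table, correction=True)  # Yates' correction
        if p < ALPHA:
            hits.append(pos)
    return chrom, hits


def scan_all(tasks, processes=None):
    """Distribute chromosomes over worker processes; returns {chrom: positions}."""
    with Pool(processes=processes) as pool:
        return dict(pool.map(scan_chromosome, tasks))
```

Because worker processes may be started with the spawn method (the default on Windows and macOS), `scan_all` should be called from under an `if __name__ == "__main__":` guard. Splitting the work by chromosome keeps each task independent, although chromosomes of very different lengths can leave some workers idle near the end of the run.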