OrangePomeranian / Analiza_Danych_projekt

Finding mutations in genomic data using the chi2 test and parallel functions in Python and R

Analiza_Danych_projekt

More information about this project can be found in Projekt Zaliczeniowy PDF.pdf.

We received four .vcf files containing genetic data from healthy and diseased Holstein-Friesian cows.

The files held more than 14.3 million records for diseased individuals and more than 13.8 million for healthy individuals.

The aim was to find SNP-type mutations that may have a biological basis for the development of the disease, and to determine their relationship with selected parameters for each chromosome.

This project used the Python libraries pandas, os, multiprocessing, NumPy, SciPy, and seaborn, and the R packages dplyr, microbenchmark, and parallel. SNPs were detected using the chi2 contingency test with Yates' correction. Additionally, we examined the Pearson correlation between our results and the length of the chromosome.
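As an illustration of the test named above, here is a minimal sketch of a Yates-corrected chi2 test on a single 2x2 table. The function name `yates_chi2_2x2` and the allele counts are hypothetical and are not taken from the project. The sketch uses only the standard library so that the formula is visible; to our understanding, `scipy.stats.chi2_contingency(table, correction=True)` performs the equivalent test for a 2x2 table.

```python
import math


def yates_chi2_2x2(table):
    """Return (chi2, p_value) for a 2x2 table with Yates' continuity correction.

    Assumes every row and column total is non-zero.
    """
    (a, b), (c, d) = table
    total = a + b + c + d
    row_sums = (a + b, c + d)
    col_sums = (a + c, b + d)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            diff = abs(observed - expected)
            # Yates' correction: shrink |O - E| by 0.5, but never below zero.
            corrected = diff - min(0.5, diff)
            chi2 += corrected ** 2 / expected

    # A 2x2 table has 1 degree of freedom, so the chi2 survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: rows = diseased / healthy, columns = reference / alternate allele.
table = [[30, 70], [60, 40]]
chi2, p_value = yates_chi2_2x2(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")
```

A small p-value for a given position suggests that the allele distribution differs between the two groups. With millions of positions tested, some form of multiple-testing control would normally be needed as well.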

Because of the size of the data, we parallelized the analysis in Python (multiprocessing.Pool()) and in R (makeCluster and clusterApply). In our comparison, the R implementation turned out to be faster than the Python one.
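The Python side of this approach can be sketched roughly as follows. This is only an illustration of `multiprocessing.Pool`, not the project's actual code: the worker `count_snps`, the wrapper `count_snps_parallel`, and the per-chromosome chunks of (REF, ALT) pairs are all hypothetical.

```python
from multiprocessing import Pool


def count_snps(chunk):
    """Worker: count records where both REF and ALT are a single base."""
    chromosome, records = chunk
    n_snps = sum(1 for ref, alt in records if len(ref) == 1 and len(alt) == 1)
    return chromosome, n_snps


def count_snps_parallel(chunks, processes=2):
    """Distribute the per-chromosome chunks over a pool of worker processes."""
    with Pool(processes=processes) as pool:
        return dict(pool.map(count_snps, chunks))


if __name__ == "__main__":
    # Hypothetical data: one chunk per chromosome, each a list of (REF, ALT) pairs.
    chunks = [
        ("1", [("A", "G"), ("AT", "A"), ("C", "T")]),
        ("2", [("G", "GA"), ("T", "C")]),
    ]
    print(count_snps_parallel(chunks))
```

Splitting by chromosome keeps each task independent, so no worker needs to share state with another. The `if __name__ == "__main__":` guard is required on platforms that start worker processes by re-importing the main module.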


Languages

R 76.4%, Python 23.6%