Eunsol-Lee / Bioinformatics

Bioinformatics A-Z

Bioinformatics

This repository holds my personal notes. I plan to organize what I learned in graduate school.

Week 5: Basics of Bioinformatics

  • Bioinformatics
    • Interdisciplinary field that combines biology, computer science, statistics, etc.
  • Introduction to probability distributions
    • Random variable
      • Discrete random variable
      • Continuous random variable
    • Probability functions
      • 0 ≤ p(x) ≤ 1.0
      • Area under a probability function is always 1
    • Probability mass function (pmf)
      • discrete probability distribution
      • e.g. P(X = 1) = 1/6 (as for a fair six-sided die)
    • Probability density function (pdf)
      • continuous probability distribution
    • Cumulative distribution function (cdf)
      • P(X ≤ 1) = 1/6
    • Expected value and variance
      • Expected value (mean)
        • mean of random variable x
        • E(X) = µ
      • Variance (standard deviation squared)
        • σ² = Var(X) = E[(X − µ)²]
        • expected (or average) squared distance (or deviation) from the mean
        • Var(X) = σ²
        • SD(X) = σ
    • Binomial probability distribution
      • n: number of observations
      • binary outcome
      • constant probability for each observation
      • X ~ Binom(n, p)
        • E(X) = np
        • Var(X) = np(1 − p)
        • SD(X) = sqrt(np(1 − p))
    • Normal distribution
      • N(µ, σ²)
    • Standard normal distribution
      • N(0, 1)
      • denoted Z
    • t-distribution
      • looks like the normal distribution, but with slightly heavier tails
      • arises when the mean and variance of a distribution are estimated from data
      • degrees of freedom depend on the sample size used for the estimate
      • when d.f. is large, t converges to the normal distribution
    • Chi-square distribution
      • Z² follows χ₁² (chi-square distribution with 1 d.f.)
      • Z₁² + Z₂² follows χ₂² (d.f. = 2) for independent standard normal Z₁ and Z₂
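
The binomial formulas and the link between the standard normal and the chi-square distribution with 1 d.f. can be checked numerically. The following is a minimal sketch using only the Python standard library. The helper names (`binom_pmf`, `binom_moments`, `chisq1_cdf`) and the values n = 10, p = 0.3 are illustrative assumptions, not part of the original notes.

```python
import math
from statistics import NormalDist


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binom(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_moments(n: int, p: float) -> tuple:
    """Mean and variance computed directly from the pmf definition."""
    pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    return mean, var


def chisq1_cdf(x: float) -> float:
    """P(Z^2 <= x) for Z ~ N(0, 1), i.e. the chi-square cdf with 1 d.f."""
    if x <= 0:
        return 0.0
    root = math.sqrt(x)
    std_normal = NormalDist(0.0, 1.0)
    # Z^2 <= x is the same event as -sqrt(x) <= Z <= sqrt(x)
    return std_normal.cdf(root) - std_normal.cdf(-root)


n, p = 10, 0.3  # illustrative values (assumption)
mean, var = binom_moments(n, p)
print(mean, n * p)           # both approximately 3.0
print(var, n * p * (1 - p))  # both approximately 2.1
print(chisq1_cdf(3.84))      # close to 0.95
```

Computing the moments from the pmf rather than from the closed forms is deliberate: it shows that E(X) = np and Var(X) = np(1 − p) fall out of the definitions of expected value and variance.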
  • Basic statistics for BI
    • P-value
      • probability of observing the same or a more extreme result under the null hypothesis
    • Null hypothesis (H0)
      • uninteresting situation
    • Alternative hypothesis (H1)
      • interesting situation
    • Properties of an easy-to-use statistic
      • designed to be zero under H0 and non-zero under H1
      • follows a known distribution (normal, t, ...) under H0
      • z-score
        • a statistic that follows N(0, 1) under H0
    • Central limit theorem
      • sample is large --> sample mean is approximately normally distributed
      • sample is small --> the statistic often follows a t-distribution
    • Normal vs chi-square distribution
      • z follows N(0,1) --> z^2 follows chi-square distribution with d.f. 1
      • e.g. in R, pchisq(3.2^2, df=1, lower.tail=F) gives the two-sided P-value for z = 3.2
    • Statistical power
      • chance that data will be significant if H1 is true
      • computed under H1, whereas the P-value is computed under H0
      • function of sample size, effect size
      • Type II error (β error): probability of accepting the null hypothesis even though the alternative hypothesis is true
      • statistical power = 1 - β
    • Permutation test
      • repeatedly shuffle data to impose null hypothesis
      • useful if statistic doesn't have known distribution, or if sample size is too small for CLT to work
    • 2x2 table analysis
      • chi-square test formula
      • Fisher's exact test
    • t-test
    • ANOVA
      • Analysis of Variance
      • tests whether the means of more than two groups are equal
      • test statistic follows an F-distribution under H0
    • Log-rank test
      • for survival analysis
      • Kaplan-Meier curve (Visualization)
    • Linear regression
    • Logistic regression
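
A z-score can be turned into a two-sided P-value directly from the standard normal distribution, which should agree with the chi-square call `pchisq(3.2^2, df=1, lower.tail=F)` in the notes. When the statistic has no known distribution, a permutation test estimates the P-value by shuffling group labels instead. The sketch below uses only the Python standard library. The function names, the toy data, and the choice of a difference in group means as the statistic are assumptions made for illustration.

```python
import random
from statistics import NormalDist, mean


def two_sided_p_from_z(z: float) -> float:
    """P(|Z| >= |z|) for Z ~ N(0, 1); equals P(chi-square with 1 d.f. >= z^2)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation P-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # shuffling group labels imposes H0
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


print(two_sided_p_from_z(3.2))  # roughly 0.0014

# Toy data, made up for illustration only
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4]
group_b = [4.2, 4.0, 4.8, 4.5, 4.1, 4.6]
print(permutation_p_value(group_a, group_b))
```

The permutation estimate depends on `n_perm` and `seed`, so it is an approximation rather than an exact P-value. With very small groups, enumerating every relabeling would give the exact value.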
