
Data Science - MasterNotes



# Comprehension
[num for num in range(100)]

# Comprehension with condition
[num for num in range(100) if num > 5]
[char for char in expression if char in "()"]  # assumes `expression` is a string

def factorial(n):
    prod = 1
    for num in range(1, n + 1):  # start at 1; starting at 0 would zero out the product
        prod *= num
    return prod
matrix = [[2, 1, 5],
          [9, 2, 8],
          [1, 7, 3]]

# Transpose the matrix by zipping its rows together
for row in zip(matrix[0], matrix[1], matrix[2]):
    print(row)

# (2, 9, 1)
# (1, 2, 7)
# (5, 8, 3)

# Equivalently, unpack the rows with *
list(zip(*matrix))  # [(2, 9, 1), (1, 2, 7), (5, 8, 3)]

# To get lists instead of tuples, either of:
[[*tup] for tup in zip(*matrix)]
[list(tup) for tup in zip(*matrix)]
# [[2, 9, 1], [1, 2, 7], [5, 8, 3]]


def mean(lst, trim=0):
    # trim drops the `trim` smallest and largest values before averaging
    lst_ = lst.copy()
    if trim > 0:
        lst_ = sorted(lst_)[trim:-trim]
    return sum(lst_) / len(lst_)

def median(lst):
    lst_sorted = sorted(lst)
    mid = len(lst) // 2
    if len(lst) % 2:  # odd length: the middle element
        return lst_sorted[mid]
    else:             # even length: mean of the two middle elements
        return mean([lst_sorted[mid - 1], lst_sorted[mid]])

def mode(lst):
    # count occurrences of each item
    dict_counter = {}
    for item in lst:
        if item in dict_counter:
            dict_counter[item] += 1
        else:
            dict_counter[item] = 1
    max_freq = max(dict_counter.values())
    modes = [item for item, freq in dict_counter.items() if freq == max_freq]

    # if every item is "a mode", the collection has no meaningful mode
    if len(modes) == len(lst):
        return None
    else:
        return modes
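
A quick usage check of the three helpers above (the values here are illustrative):

data = [2, 3, 3, 5, 7, 9, 9, 9, 11]

print(mean(data))          # 6.444...
print(mean(data, trim=1))  # 6.428..., after dropping the lowest and highest values
print(median(data))        # 7
print(mode(data))          # [9]
print(mode([1, 2, 3]))     # None: every value occurs equally often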

  • A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data.
  • As such, measures of central tendency are sometimes called measures of central location.
  • They are also classed as summary statistics.

The Median is Resistant to Outliers

  • The primary difference between the mean and the median is their level of resistance to outliers.
  • The mean is not very resistant to outliers, especially in a dataset with non-symmetric outliers.

If a collection has extreme outliers, the mean may describe the distribution "center" inaccurately. A classic example of this is when looking at household incomes. Households with far greater incomes skew the mean to the point where it no longer accurately describes the dataset.

Example

Consider the incomes of the following ten households. By calculating both the mean and median, it is possible to make a determination as to which of these two statistics describes the incomes most accurately.

$$ A = [\quad\$30{,}000,\quad\$35{,}000,\quad\$41{,}000,\quad\$45{,}000,\quad\$50{,}000,\quad\$57{,}000,\quad\$57{,}500,\quad\$59{,}000,\quad\$60{,}000,\quad\$457{,}000\quad] $$ $$ \text{mean} = \mu = \$89{,}150 \qquad \text{median} = \tilde x = \$53{,}500 $$

Solution The mean of the household incomes is $89,150 and the median is $53,500. Here the median does a better job of describing a typical household income from the collection. The mean is greatly skewed by a single income that is far greater than the others. The mean implies that a typical household would have over $89,000 of income, despite there being only one household with an income greater than $60,000.
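
Checking this with the mean and median helpers defined above:

incomes = [30000, 35000, 41000, 45000, 50000,
           57000, 57500, 59000, 60000, 457000]

print(mean(incomes))    # 89150.0
print(median(incomes))  # 53500.0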

The Mean is Preferable in Large Datasets with Few Outliers

  • There are some situations where the mean is considered preferable to the median; typically these are situations in which there are a large number of items in the collection and no outliers (or the outliers are symmetric).
  • Also, inferential statistics is largely built upon measurements of the mean, so it is the statistic used most often.

Mode is Preferable When Using Categorical Data

  • In a collection with categorical data that is (generally) not ordinal in nature, the mode is the best measure of center, though the use of the term "center" may be taking a bit of liberty.

  • The mode can also be a useful descriptive statistic when there isn't one single central concentration of values.

A common example of this would be the weights of household pets. If one were to take a sample of house pet weights, there would likely be a concentration of cats weighing between eight and twelve pounds, and a concentration of dogs weighing between twenty and thirty-five pounds. The mean or median may tell us that a typical household pet weighs fifteen pounds, but that doesn't accurately describe the typical weight of either cats or dogs. A distribution such as this is often referred to as bimodal.

| Type of Variable | Best Measure of Central Tendency |
|---|---|
| Nominal | Mode |
| Ordinal | Median |
| Interval/Ratio (not skewed) | Mean |
| Interval/Ratio (skewed) | Median |

Skewness refers to a distortion or asymmetry that deviates from the symmetrical bell curve, or normal distribution, in a set of data. If the curve is shifted to the left or to the right, it is said to be skewed.



Five Number Summary

The five number summary gives a more in-depth description of a numerical collection of values. In addition to identifying a measure of center (the median), it gives us more insight into the way the values are distributed. It consists of the five most important sample percentiles:

  • The minimum
  • The lower (first) quartile: $Q_1$
  • The median
  • The upper (third) quartile $Q_3$
  • The maximum

The values are often expressed in a tuple, as follows

$ (\quad \min,\quad Q_1,\quad \text{median},\quad Q_3,\quad \max\quad) $


def five_number_summary(lst):
    sorted_list = sorted(lst)
    # for odd-length lists, the middle value is included in both halves
    lower_half = sorted_list[0: len(lst) // 2 + (len(lst) % 2)]
    upper_half = sorted_list[len(lst) // 2:]

    q1 = median(lower_half)
    q3 = median(upper_half)

    return min(lst), q1, median(lst), q3, max(lst)

def iqr(lst):
    # interquartile range: the spread of the middle 50% of the data
    _, q1, _, q3, _ = five_number_summary(lst)
    return q3 - q1

def detect_outliers(lst, outlier_coef=1.5):
    # flags values more than outlier_coef * IQR outside [q1, q3]
    outliers = []
    _, q1, _, q3, _ = five_number_summary(lst)
    iqr_ = iqr(lst)

    for num in lst:
        if num < q1 - outlier_coef * iqr_ or num > q3 + outlier_coef * iqr_:
            outliers.append(num)

    return outliers
    
a = [-500,12,32,54,45,87,89,61,31,12549] 

print(detect_outliers(a,1.5)) # [-500, 12549]

def remove_outliers(lst, outlier_coef=1.5):
    outliers = detect_outliers(lst, outlier_coef)
    output = lst.copy()
    
    for num in outliers:
        if num in output:
            output.remove(num)
            
    return output

a =  [590, 615, 575, 608, 350, 1285, 408, 540, 555, 679]
print(remove_outliers(a)) # [590, 615, 575, 608, 540, 555, 679]

  • The purpose of both the variance and the standard deviation is to express an easily interpretable measure of spread in a collection.

  • The variance can be interpreted as the average squared deviation of each number from the mean, and it is calculated as such.

  • The reason we square the deviations is so that we deal only with positive values; if we didn't square them, the deviations above and below the mean would cancel and sum to zero for every distribution.

  • Population Variance

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$$

  • Sample Variance

$$ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \overline x)^2 $$

  • Recall

    • $\mu$ : population mean
    • $\overline x$ : sample mean
  • You can see the two formulas for variance are very similar; the primary difference is that the population variance is averaged by dividing by $n$.
  • In the computation of a sample variance, we divide by $n-1$ instead. This is known as Bessel's correction.
  • This correction is made because it partially corrects the bias in the estimation of the population variance.

Example 1

Find the variance of the following population $A$ assume all measurements are in inches:
$$ A = [\quad 73,\quad 65,\quad72,\quad74,\quad69,\quad70,\quad72,\quad73\quad] $$

Step 1 : Find the mean of $A$

$$ \mu = \frac{73+65+72+74+69+70+72+73}{8} = 71 $$

Step 2 : Find the sum of the squared differences.

$$ \sum_{i=1}^{8} (x_i - \mu)^2 \quad = \quad (73-71)^2 + (65-71)^2 + \dots + (73-71)^2 \quad = \quad 60 $$

Step 3 : Divide the sum above by $n$ (or multiply by $\frac{1}{n}$)

$$ \sigma^2 = \frac{60}{8} = 7.5 $$

Solution We can see here that our population variance is $7.5 \text{ in}^2$. It is important to note that a variance will always be expressed in terms of the original unit squared. This leaves something to be desired in terms of interpretability; we'll discuss that in the second half of this lesson when dealing with standard deviations.


Example 2

Calculate the variance for the same numerical collection above, this time assuming it is a sample, call the sample dataset $B$.

$$ B = [\quad 73,\quad 65,\quad72,\quad74,\quad69,\quad70,\quad72,\quad73\quad] $$

Step 1 : Find the mean of $B$

$$ \bar x = \frac{73+65+72+74+69+70+72+73}{8} = 71 $$

Step 2 : Find the sum of the squared differences.

$$ \sum_{i=1}^{8} (x_i - \bar x)^2 \quad = \quad (73-71)^2 + (65-71)^2 + \dots + (73-71)^2 \quad = \quad 60 $$

Step 3 : Divide the sum above by $n-1$ (or multiply by $\frac{1}{n-1}$)

$$ s^2 = \frac{60}{7} = 8.571 $$

Solution The variance of the sample dataset $B$ is $8.571$, larger than the population's variance.

A note about the application of Bessel's correction:

The difference in the variances between the sample and the population is a byproduct of applying Bessel's correction. In short, when one finds the variance of a population, all possible outliers are sure to be included. In contrast, when sampling from a population there is a chance that few (or no!) outliers end up in the sample, so the sample's variance will likely be smaller than the true variance of the population. Since the objective is to make inferences about a population from a sample, applying Bessel's correction makes the variance computed from a sample more likely to be representative of the population.
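
A small simulation sketch of this effect, using the variance function defined just below (the population, sample size, and trial count here are arbitrary choices):

from random import seed, choices

seed(42)
population = list(range(100))  # population variance of 0..99 is 833.25
n, trials = 10, 10_000

biased_total = corrected_total = 0.0
for _ in range(trials):
    s = choices(population, k=n)               # sample with replacement
    biased_total += variance(s, sample=False)  # divide by n
    corrected_total += variance(s)             # divide by n-1 (Bessel)

print(biased_total / trials)     # systematically undershoots 833.25
print(corrected_total / trials)  # close to 833.25 on average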



def variance(lst, sample=True):
    # sample=True applies Bessel's correction: the boolean is treated as 1,
    # so the denominator is n-1 for samples and n for populations
    mean_ = mean(lst)
    total = 0
    for item in lst:
        total += (item - mean_)**2
    return total / (len(lst) - sample)
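
Checking the worked examples above against this function:

A = [73, 65, 72, 74, 69, 70, 72, 73]

print(variance(A, sample=False))  # 7.5, dividing by n (Example 1)
print(variance(A))                # 8.571..., dividing by n-1 (Example 2)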

Standard Deviation

  • As we mentioned above, the variance does a good job of describing the spread of a population or sample.

  • However, a measure of spread expressed in the original units squared can be difficult to interpret.

  • Because of this, we typically take the square root of the variance; this yields the standard deviation.

  • The standard deviation ends up being in the same units as the original data.

  • A standard deviation can be informally interpreted as: "a typical item from this collection can be expected to have the value of the mean plus or minus the standard deviation."

  • This is formally defined by the empirical rule, or the 68–95–99.7 rule; we won't go into great detail about this rule now, but it will be covered later in the statistics block.


Notations :

$\sigma\quad :$ lowercase sigma is used for the standard deviation of a population

$s\quad :$ lowercase $s$ is typically used to represent the standard deviation of a sample

$sd\quad :$ the lowercase pair $sd$ is also commonly used for either standard deviation



  • Population Standard Deviation:

$$ \sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2} $$

  • Sample Standard Deviation:

$$ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \overline x)^2} $$

from math import sqrt

def stdev(lst, sample=True):
    return sqrt(variance(lst, sample))
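
For the population in Example 1, the standard deviation comes back in the original units (inches) rather than inches squared:

A = [73, 65, 72, 74, 69, 70, 72, 73]

print(stdev(A, sample=False))  # 2.7386... = sqrt(7.5)
print(stdev(A))                # 2.9277... = sqrt(8.571...)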


  • In mathematics, a set is a well-defined collection of objects.
  • A set is also an object in itself.
  • Sets contain only unique objects: NO DUPLICATES.
  • If the outcome of a random experiment is unknown, but all of the possible outcomes are predictable in nature, the set of those outcomes is known as the Sample Space, notated with a capital $S$, as the "Universal Set" $U$, or as $\Omega$ (capital omega).
def dedupe_in_order(lst):
    deduped = []

    for element in lst:
        if element not in deduped:
            deduped.append(element)

    return deduped

  • The union of two sets is a new set that contains all of the elements that are in at least one of the two sets.

  • Common Notation for the union of events A and B:

  • A ∪ B

  • There is a distinct relationship between the set theory definition of union, and the logical operator OR.

# sets are represented as plain lists here so the mechanics stay explicit
def union(set1, set2):
    set_union = set1.copy()
    for item in set2:
        if item not in set_union:
            set_union.append(item)
    return set_union

The union can be extrapolated to more than two events

  • Common Notation multiple events:

  • A ∪ B ∪ C

  • A ∪ B ∪ C ∪ D

    • NOTE: The order of the union operation does not matter
def union_mult_sets(*mult_sets):
    set_union = []
    for lst in mult_sets:
        for item in lst:
            if item not in set_union:
                set_union.append(item)
    return set_union

  • The intersection of two sets is a new set that contains all of the elements that are members of both sets which comprise the intersection
  • Common Notation for the intersection of events A and B:
  • AB or A ∩ B
  • There is a distinct relationship between the set theory definition of intersection, and the logical operator AND.
def intersection(a,b):
    intersected = []
    for item in a:
        if item in b:
            intersected.append(item)
    return intersected
def intersection_mult(*mult_sets):
    set_intersect = []
    if mult_sets:  # with no sets given, the intersection is empty
        for item in mult_sets[0]:
            is_member = True
            for set_ in mult_sets[1:]:
                if item not in set_:
                    is_member = False
                    break
            if is_member:
                set_intersect.append(item)
    return set_intersect

  • Set Difference is anything in one set that isn't in the other.
    • Syntax: A\B, A-B, A.difference(B)

    • Example: A = {1, 2, 3, 4, 5}, B = {5, 6, 7, 8, 9}
      A - B = {1, 2, 3, 4}
      B - A = {6, 7, 8, 9}

def difference(set1, set2):
    set_difference = []
    for item in set1:
        if item not in set2:
            set_difference.append(item)
    return set_difference

  • The complement of a set is the set of all members of the sample space which are not in the event.
  • Common notation for the complement of event A:
  • $A'$, $A^c$, $\bar A$, $\neg A$, or $\sim A$
  • There is a distinct relationship between the complement and the logical operator NOT.
def complement(sample_space, set1):
    return difference(sample_space, set1)
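
The list-based helpers above mirror Python's built-in set type; a quick side-by-side (the values are illustrative):

S = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # sample space
A = [1, 2, 3, 4, 5]
B = [5, 6, 7, 8, 9]

print(union(A, B))         # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(intersection(A, B))  # [5]
print(difference(A, B))    # [1, 2, 3, 4]
print(complement(S, A))    # [6, 7, 8, 9]

# the same operations with Python's built-in sets
print(set(A) | set(B))  # union
print(set(A) & set(B))  # intersection
print(set(A) - set(B))  # difference
print(set(S) - set(A))  # complement relative to S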


Equating Set Algebra Laws with Boolean Logic

  • Consider the statement "an element is a member of set $A$" as playing the role of the Boolean value True.
  • In this sense, each of the laws below applies to both set operations and Boolean operations.

| Set Operator | Python Boolean Operator |
|---|---|
| Union | or |
| Intersection | and |
| Complement | not |

Commutative


  • A ∪ B = B ∪ A
  • AB = BA

Set Logic

set1 = {'a', 'b', 'c'}
set2 = {'c', 'd', 'e'}

print(set1.union(set2) == set2.union(set1)) # --> True
print(set1.intersection(set2) == set2.intersection(set1)) # --> True

Boolean Logic

a = True
b = False

print( (a or b) == (b or a) ) # --> True
print( (a and b) == (b and a) ) # --> True

Associative


  • (A ∪ B) ∪ C = A ∪ (B ∪ C) = A ∪ B ∪ C
  • (AB)C = A(BC) = ABC

Set Logic

set1 = {'a', 'b', 'c'}
set2 = {'c', 'd', 'e'}
set3 = {'a', 'e', 'f'}

print((set1.union(set2)).union(set3) == (set3.union(set2)).union(set1)) # --> True
print((set1.intersection(set2)).intersection(set3) == (set3.intersection(set2)).intersection(set1)) # --> True

Boolean Logic

a = True
b = False
c = True

print( ((a or b) or c) == (a or (b or c)) ) # --> True
print( ((a and b) and c) == (a and (b and c)) ) # --> True

Distributive


  • A ∪ (BC) = (A ∪ B)(A ∪ C)
  • A(B ∪ C) = (AB) ∪ (AC)

Set Logic

set1 = {'a', 'b', 'c'}
set2 = {'c', 'd', 'e'}
set3 = {'a', 'e', 'f'}

print( (set2.intersection(set3)).union(set1) == (set1.union(set2)).intersection((set1.union(set3))) ) # --> True
print( (set2.union(set3)).intersection(set1) == (set1.intersection(set2)).union((set1.intersection(set3))) ) # --> True

Boolean Logic

a = True
b = False
c = True

print( (a or (b and c)) == ((a or b) and (a or c)) ) # --> True
print( (a and (b or c)) == ((a and b) or (a and c)) ) # --> True   

Idempotent Laws


  • when redundant operations achieve the same result
  • A ∪ A = A
  • AA = A

Set Logic

set1 = {'a', 'b', 'c'}

print( set1.union(set1) == set1 ) # --> True
print( set1.intersection(set1) == set1 ) # --> True

Boolean Logic

a = True

print( (a or a) == a ) # --> True
print( (a and a) == a ) # --> True

Domination Laws


  • Recall:
    • U = Universal Set, the set containing all elements under consideration
    • ∅ = Empty Set = { }
  • A ∪ U = U
  • A ∩ ∅ = ∅
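
Following the pattern of the other laws, a quick check (the universal set here is a small illustrative one):

Set Logic

U = {'a', 'b', 'c', 'd', 'e'}
set1 = {'a', 'b', 'c'}

print( set1.union(U) == U ) # --> True
print( set1.intersection(set()) == set() ) # --> True

Boolean Logic

a = True

print( (a or True) == True ) # --> True
print( (a and False) == False ) # --> True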

Absorption Laws


  • A ∪ (AB) = A
  • A(A ∪ B) = A

Set Logic

set1 = {'a', 'b', 'c'}
set2 = {'c', 'd', 'e'}

print( set1.intersection(set2).union(set1) == set1 ) # --> True
print( set1.intersection(set1.union(set2)) == set1) # --> True

Boolean Logic

a = True
b = False

print( (a or (a and b)) == a) # --> True
print( (a and (a or b)) == a) # --> True


Identity Property

  • A ∪ ∅ = A
  • A ∩ U = A

Complement Laws for Universal and Empty Set

  • ~∅ = U
  • ~U = ∅
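
These two laws can be checked with the complement helper from earlier, given an explicit sample space:

U = ['a', 'b', 'c']

print( complement(U, []) == U ) # ~∅ = U --> True
print( complement(U, U) == [] ) # ~U = ∅ --> True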

Involution Law

  • ~( ~A) = A
a = True
print( (not (not a)) == a) # --> True

A helpful, unnamed law

  • AB ∪ A~B = A
a = True
b = False

print( ((a and b) or (a and not b)) == a) # --> True

DeMorgan’s Laws

  • 1st: ~(A ∪ B) = ~A ~B
  • 2nd: ~(AB) = ~A ∪ ~B

These laws are very helpful for logic and circuit reduction, and they are commonly explored in interview questions.

~(A ∪ B) = ~A ~B

a = True
b = False

print( (not (a or b)) == ((not a) and (not b)) ) # --> True

~(AB) = ~A ∪ ~B

a = True
b = False

print( (not (a and b)) == (not a or not b) ) # --> True


Probability Theory

Inferential Statistics is the practice of using mathematical analysis to make inferences about a population from a sample. The mathematics which underlie inferential statistics are largely based on probability theory.

Calculating probability is attempting to figure out the likelihood of a specific event happening, given some number of attempts. The most fundamental and important probability calculation is defined as:

The probability of some event $A$ occurring is the number of possible outcomes in that event, divided by the total number of possible outcomes in the sample space. That is,

$$ \text{Number of Outcomes in } A = |A| = \text{"The Cardinality of } A\text{"} $$ $$ \text{Number of Outcomes in } S = |S| = \text{"The Cardinality of } S\text{"} $$

$$ P(A) = \frac{|A|}{|S|} $$

Example

Given a fair six-sided die, what is the probability of rolling a 5?

$ Event\quad A = \text{Rolling a five}$

$P(A) = \frac{1}{6} \approx 0.1667$

Solution The total number of possible outcomes is six, in other words the cardinality of the sample space is six. There is only one outcome in which our die will show five pips, so the cardinality of our event $A$ is 1. Hence, our probability is $\frac{1}{6}$.
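
The same calculation, expressed directly in terms of cardinalities:

S = [1, 2, 3, 4, 5, 6]  # sample space for one fair die
A = [5]                 # event: rolling a five

print(len(A) / len(S))  # 0.1666...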

| Notation | Meaning |
|---|---|
| $P(A)$ | Probability of A |
| $P(A^c)$ | Probability of A complement |
| $P(AB)$ | Probability of A intersect B |
| $P(A \cup B)$ | Probability of A union B |
| $P(A \mid B)$ | Probability of A given B |


  • A permutation is one of several possible variations in which a set or number of objects can be ordered or arranged.

  • A permutation can be thought of as an arrangement of a number of items

  • $nPk$

    • where $n$ is the number of possible items
    • $k$ is how many of those items to arrange

Note: ORDER MATTERS

Discovery by Counting

$$ nPk = \frac{n!}{(n-k)!} $$

If we consider $n$ to be the base of a counting system, then we can determine all permutations $k$ by a counting/reduction approach.

  1. Count in base $n$ system
    • ex: $n = 3$

$\text{ 000 010 020 100 110 120 200 210 220 001 011 021 101 111 121 201 211 221 002 012 022 102 112 122 202 212 222 }$

  2. Remove counts that contain duplicate digits, leaving

$\text{ 012 021 102 120 201 210 }$

  3. Consider $k$ items
    • ex: $k = 3$
012 021 102 120 201 210
    • ex: $k = 2$
12 21 02 20 01 10
    • ex: $k = 1$
2 1 0
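
As a check, the standard library's itertools enumerates exactly these arrangements (aliased here to avoid clashing with the permutations function defined below):

from itertools import permutations as iperms

items = [0, 1, 2]  # n = 3

print(list(iperms(items, 3)))  # 6 arrangements: 3!/0! = 6
print(list(iperms(items, 2)))  # 6 arrangements: 3!/1! = 6
print(list(iperms(items, 1)))  # 3 arrangements: 3!/2! = 3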


$$ nPk = \frac{n!}{(n-k)!} $$

def permutations(n, k):
    # integer division keeps the result exact for large n
    return factorial(n) // factorial(n - k)

Slightly more optimized:

def permutations(n, k):
    # multiply n * (n-1) * ... * (n-k+1) directly
    perm = 1
    for i in range(n, n - k, -1):
        perm *= i
    return perm

$$ nCk = \frac{n!}{(n-k)!\,k!} $$

def combinations(n, k):
    return factorial(n) // (factorial(n - k) * factorial(k))

# Slightly more optimal:
def combinations(n, k):
    perm = 1
    for i in range(n, n - k, -1):
        perm *= i
    return perm // factorial(k)  # perm is always divisible by k!
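
Python 3.8+ also ships math.perm and math.comb, which make handy sanity checks for the hand-rolled versions:

from math import comb, perm

print(permutations(5, 2), perm(5, 2))  # 20 20
print(combinations(5, 2), comb(5, 2))  # 10 10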

bernoulli()

from random import random

def bernoulli(p_success=0.5):
    # one Bernoulli trial: True with probability p_success
    draw = random()  # uniform value in [0.0, 1.0)
    return draw < p_success
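
A quick frequency check of the helper (the trial count here is an arbitrary choice):

trials = 100_000
successes = sum(bernoulli(0.3) for _ in range(trials))

print(successes / trials)  # lands near 0.3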

  • 3 parameters
  • $n$ = number of bernoulli trials
  • $p$ = probability of success on any given bernoulli trial
  • $k$ = specific number of successes for which to find the probability

binomial_pmf(n, k, p=0.5)

$$ P(X=k) = {n \choose k} p^k(1-p)^{n-k} $$

def binomial_pmf(n, k, p=0.5):
    return combinations(n, k) * (p**k) * (1-p)**(n-k)

binomial_pmf_dict()

This should take 4 parameters:

  • n the number of trials
  • k_low the low value of $k$ in the dictionary
  • k_high the high value of $k$ in the dictionary
  • p=0.5 the probability of success of a given bernoulli trial
def binomial_pmf_dict(n, k_low, k_high, p=0.5):
    d = dict()

    for k in range(k_low, k_high+1):
        d[k] = binomial_pmf(n, k, p)

    return d

d = binomial_pmf_dict(8, 0, 8, p=0.25)

for k, v in d.items():
    print(f'{k}: {v}')

poisson_pmf()

  • $e \approx 2.71828$
  • Note, both the constant e and the factorial() function are available from the math module.

$$ P(\lambda, k \text{ events}) = \frac{e^{-\lambda}\lambda^k}{k!} $$

from math import e, factorial

# print(e) # 2.718281828459045

def poisson_pmf(lmbda, k):
    return lmbda**k * e**(-lmbda) / factorial(k)

poisson_pmf_dict()

  • your parameters will be
    • lmbda
    • low_k
    • high_k

Holding lmbda constant, write a function that returns a dictionary showing the probabilities for the number of events from low_k to high_k (inclusive)

def poisson_pmf_dict(lmbda, low_k, high_k):
    d = dict()

    for k in range(low_k, high_k+1):
        d[k] = poisson_pmf(lmbda, k)

    return d

d = poisson_pmf_dict(10, 0, 30)

for k, v in d.items():
    print(f'{k}: {round(v, 6)}')

geometric_pmf()

  • p : probability of success
  • k : number of trials up to and including the first success (inclusive), or the number of failures before the first success (exclusive)
  • inclusive=True : whether to use the inclusive or exclusive pmf

PMF calculating the number of trials up to and including the first success

$$ P(X=k) = p (1-p)^{k-1} $$

PMF calculating the number of trials before the first success

$$ P(X=k) = p (1-p)^{k} $$

def geometric_pmf(p, k, inclusive=True):
    # bool arithmetic: inclusive=True subtracts 1 from the exponent,
    # giving p*(1-p)**(k-1); inclusive=False gives p*(1-p)**k
    return p * (1 - p)**(k - inclusive)
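
For example, with a fair coin ($p = 0.5$):

print(geometric_pmf(0.5, 3))                   # first success on exactly the 3rd trial: 0.125
print(geometric_pmf(0.5, 3, inclusive=False))  # exactly 3 failures before the first success: 0.0625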

poisson_cdf()

  • your parameters will be
    • lmbda
    • k_high
def poisson_cdf(lmbda, k_high):
    cdf = 0.0

    for k in range(k_high+1):
        cdf += poisson_pmf(lmbda, k)

    return cdf

binomial_cdf(n, k_high, p=0.5)

$$ P(X \le k) = \sum_{i=0}^k {n \choose i}p^i(1-p)^{n-i} $$

def binomial_cdf(n, k_high, p=0.5):
    cumulative = 0.0

    for k in range(0, k_high+1):
        cumulative += binomial_pmf(n, k, p)

    return cumulative
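
Two quick sanity checks: summing the PMF over every possible $k$ must give 1, and for ten fair coin flips $P(X \le 4)$ comes out just under one half:

print(binomial_cdf(10, 10))  # 1.0 (up to floating-point error)
print(binomial_cdf(10, 4))   # 0.3769...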










$$\sum_{n=1}^\infty \frac{1}{n^2} \to \textstyle \sum_{n=1}^\infty \frac{1}{n^2} \to \displaystyle \sum_{n=1}^\infty \frac{1}{n^2}$$

$\displaystyle \lim_{t \to 0} \int_t^1 f(t)\,dt$ versus $\lim_{t \to 0} \int_t^1 f(t)\,dt$.

$\Biggl(\biggl(\Bigl(\bigl((: x : )\bigr)\Bigr)\biggr)\Biggr)$



  • bash profile location on OSX : ~/.bash_profile

function gitadder(){
    git pull
    git add .
    git commit -m "Auto Updated: $(date '+%a %M:%H %h %d %Y')"
    git push
}





