ujwaltrivedi / coursera-ml-class

Coursera Machine Learning - Andrew Ng


Linear Regression with One Variable

  • Univariate (single-variable) regression model
  • Model: h(x) = theta0 + theta1 * x (see the sketch below)
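
A minimal Octave sketch of evaluating the hypothesis, vectorized over a column of inputs (the numbers are illustrative):

    theta = [1; 2];               % [theta0; theta1], example values
    x = [0; 1; 2];                % example inputs
    h = theta(1) + theta(2) * x   % h(x) = theta0 + theta1*x  ->  [1; 3; 5]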

Feature Scaling (Gradient Descent)

  • Scale every feature into roughly the range -1 <= Xj <= +1

    • Mean normalization: Xj = (Value - AvgVal) / (max - min)
    • Xj = (#Bedrooms - Avg # Of Bedrooms) / (Max Bedrooms - Min Bedrooms)
     	% Octave: here the spread is measured with the standard deviation
     	% instead of (max - min); either choice works for scaling.
     	mu = mean(X);                              % per-column means
     	X_norm = bsxfun(@minus, X, mu);            % subtract each column's mean
     	sigma = std(X_norm);                       % per-column standard deviations
     	X_norm = bsxfun(@rdivide, X_norm, sigma);  % divide each column by its spread


Learning Rate (Gradient Descent)

  • Choose a learning rate that is small, but not too small
    • If the learning rate is too small, convergence takes many iterations and finding the minimum is slow.
    • If the learning rate is too large, each iteration may overshoot the minimum and convergence may never happen.
    • Try values like 0.001, 0.01, 0.1, 1, increasing step by step until J(theta) stops decreasing reliably.
    • A plot of J(theta) against iteration count should go down on every iteration, not up (see the sketch below).
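
A sketch of batch gradient descent that records the cost each iteration so J(theta) can be plotted, in the shape of ex1's gradientDescent (the body is a standard vectorized implementation, not copied from these notes):

    function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
      % X includes the column of ones for theta0; y is the target column
      m = length(y);
      J_history = zeros(num_iters, 1);
      for iter = 1:num_iters
        theta = theta - (alpha/m) * (X' * (X*theta - y));    % simultaneous update
        J_history(iter) = (1/(2*m)) * sum((X*theta - y).^2); % squared-error cost
      end
    end

    % plot(1:numel(J_history), J_history)  % should slope steadily downward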

Feature Selection

  • Create your own features if necessary

    • For predicting house prices, if you have the features length and breadth, you can create your own feature area = length * breadth
  • Try different polynomial functions of existing features

    • If you have size as a feature, add a new feature size^2 (see the sketch after this list)
    • When using polynomial features, feature scaling becomes very important: if size ranges over [1, 1000], then size^2 ranges over [1, 10^6]
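
A short sketch of adding a squared term and scaling it, assuming a single column of hypothetical house sizes:

    house_size = [2104; 1600; 2400];     % hypothetical sizes in square feet
    X = [house_size, house_size.^2];     % original feature plus its square
    mu = mean(X);  sigma = std(X);
    X_norm = (X - mu) ./ sigma;          % broadcast: normalize each column
    X = [ones(size(X, 1), 1), X_norm];   % prepend the intercept column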

Gradient Descent vs Normal Equation

  • Use gradient descent if you have many features (n >= 1000)
  • The normal equation is a poor choice for large n (n > 1000): it inverts the n x n matrix X'X, which costs roughly O(n^3), so it slows down badly as n grows.

Linear Regression (Normal Equation)

theta = pinv(X'*X)*X'*y   % closed-form fit: no iterations, no learning rate, no feature scaling needed
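
A tiny end-to-end sketch with hypothetical data:

    X = [2104; 1600; 2400];           % hypothetical feature: house size
    y = [400; 330; 369];              % hypothetical target: price
    X = [ones(size(X, 1), 1), X];     % add the intercept column
    theta = pinv(X'*X)*X'*y           % solves for theta in one step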

Classification

Logistic Regression

Sigmoid (Logistic) function

h(x) = g(θ' * x)
z = θ' * x
g(z) = 1 / (1 + e^(-z))
  • h(x) = P(y = 1 | x; θ), the estimated probability that y = 1 for input x
  • P(y = 1 | x; θ) + P(y = 0 | x; θ) = 1
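
A vectorized implementation in the shape of ex2's sigmoid.m (the body is the standard one-liner):

    function g = sigmoid(z)
      % works elementwise on scalars, vectors, and matrices
      g = 1 ./ (1 + exp(-z));
    end

    % sigmoid(0) -> 0.5;  sigmoid(10) -> ~1;  sigmoid(-10) -> ~0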

Cost function
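
Logistic regression uses the cross-entropy cost (squared error would be non-convex through the sigmoid); this is the standard course formula:

    J(θ) = -(1/m) * Σ_i [ y(i) * log(h(x(i))) + (1 - y(i)) * log(1 - h(x(i))) ]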

Gradient
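
The partial derivatives have the same form as in linear regression, only with the sigmoid hypothesis:

    ∂J/∂θj = (1/m) * Σ_i (h(x(i)) - y(i)) * xj(i)

A vectorized Octave sketch in the shape of ex2's costFunction (the name matches the exercise skeleton; the body is a standard implementation, not copied from these notes):

    function [J, grad] = costFunction(theta, X, y)
      m = length(y);
      h = sigmoid(X * theta);                               % predictions in (0, 1)
      J = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h));   % cross-entropy cost
      grad = (1/m) * (X' * (h - y));                        % gradient w.r.t. theta
    end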

Regularization (To avoid overfitting)

Keep all the features, but reduce the magnitude/values of the parameters θj.

  • Works well when we have a lot of features, each of which contributes a bit to predicting y.

What if λ is set to an extremely large value (perhaps too large for our problem, say λ = 10^10)?

  • Algorithm works fine; setting λ to be very large can't hurt it.
  • Algorithm fails to eliminate overfitting.
  • Algorithm results in underfitting (fails to fit even the training data well). (Correct answer: a huge λ drives θ1..θn toward zero, leaving roughly h(x) = θ0.)
  • Gradient descent will fail to converge.

Regularized Cost function
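
The regularized cost adds a penalty on every parameter except θ0 (standard course formula):

    J(θ) = -(1/m) * Σ_i [ y(i) * log(h(x(i))) + (1 - y(i)) * log(1 - h(x(i))) ] + (λ/(2m)) * Σ_{j=1..n} θj^2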

Regularized Gradient
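
Because θ0 is not regularized, the gradient splits into two cases:

    ∂J/∂θ0 = (1/m) * Σ_i (h(x(i)) - y(i)) * x0(i)
    ∂J/∂θj = (1/m) * Σ_i (h(x(i)) - y(i)) * xj(i) + (λ/m) * θj     (j >= 1)

An Octave sketch in the shape of ex2's costFunctionReg; note the theta(2:end) indexing, the same trick listed under Octave Functions below:

    function [J, grad] = costFunctionReg(theta, X, y, lambda)
      m = length(y);
      h = sigmoid(X * theta);
      reg = (lambda/(2*m)) * sum(theta(2:end).^2);             % skip theta(1), i.e. theta0
      J = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h)) + reg;
      grad = (1/m) * (X' * (h - y));
      grad(2:end) = grad(2:end) + (lambda/m) * theta(2:end);   % no penalty on theta0
    end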

Octave Functions

  • Matrix inverse: pinv(X) - pseudo-inverse function
  • Matrix transpose: X'
  • Normal equation: pinv(X'*X)*X'*y
  • ones(3,3) - create a 3x3 matrix of all 1s
  • 2*ones(3,3) - create a 3x3 matrix of all 2s
  • a = rand(3,3) - create a matrix of random numbers
  • hist(a) - plot a histogram
  • eye(4,4) - create a 4x4 identity matrix
  • size(A) - get the dimensions of a matrix
  • [m, n] = size(A) - get the dimensions of a matrix into m and n
  • load(filename) - load variables from a file
  • clear varname - delete the variable
  • who - list all variables
  • whos - list all variables with details
  • theta(2:end,1) - rows 2 through end of column 1 of theta (handy for skipping theta0 when regularizing)
