vaitybharati

Vaitybharati's repositories

Assignment-04-Simple-Linear-Regression-2

Assignment-04-Simple-Linear-Regression-2. Q2) Salary_hike -> Build a prediction model for Salary_hike. Build a simple linear regression model by performing EDA, applying the necessary transformations, and selecting the best model using R or Python. Steps: EDA and data visualization, correlation analysis, model building, model testing, and model predictions.

Language:Jupyter Notebook9 10

Assignment-05-Multiple-Linear-Regression-2

Assignment-05-Multiple-Linear-Regression-2. Prepare a prediction model for the profit of the 50_startups data. Apply transformations to get better predictions of profit, and make a table containing the R^2 value for each prepared model. Columns: R&D Spend -- research and development spend in the past few years; Administration -- spend on administration in the past few years; Marketing Spend -- spend on marketing in the past few years; State -- states from which the data is collected; Profit -- profit of each state in the past few years.

Language:Jupyter Notebook4 10

Assignment-04-Simple-Linear-Regression-1

Assignment-04-Simple-Linear-Regression-1. Q1) Delivery_time -> Predict delivery time using sorting time. Build a simple linear regression model by performing EDA and do necessary transformations and select the best model using R or Python. EDA and Data Visualization, Feature Engineering, Correlation Analysis, Model Building, Model Testing and Model Predictions using simple linear regression.
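
As a rough illustration of the model-building step, here is a minimal least-squares sketch using only the standard library. The `sorting_time` and `delivery_time` values are made-up numbers chosen to lie exactly on a line (not the assignment's dataset), and `fit_line` is a hypothetical helper name.

```python
from statistics import mean

def fit_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical data: delivery_time = 5 + 2 * sorting_time, with no noise
sorting_time = [2, 4, 6, 8, 10]
delivery_time = [9, 13, 17, 21, 25]

slope, intercept = fit_line(sorting_time, delivery_time)
predicted = intercept + slope * 7  # predicted delivery time for sorting_time = 7
```

On real data the points will not sit exactly on a line, which is where transformations (for example a log of either variable) and a comparison of fit quality come in.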

Language:Jupyter Notebook3 10

Assignment-05-Multiple-Linear-Regression-1

Multiple-Linear-Regression-1. Consider only the below columns and prepare a prediction model for predicting Price of Toyota Corolla.

Language:Jupyter Notebook3 10

P23.-EDA-1

EDA (Exploratory Data Analysis) - 1: Loading the datasets, data type conversions, removing duplicate entries, dropping a column, renaming a column, outlier detection, missing values and imputation (numerical and categorical), scatter plot and correlation analysis, transformations, automatic EDA methods (Pandas Profiling and Sweetviz).

Language:Jupyter Notebook3 10

P24.-Supervised-ML---Simple-Linear-Regression---Newspaper-data

Supervised-ML---Simple-Linear-Regression---Newspaper-data. EDA and Visualization, Correlation Analysis, Model Building, Model Testing, Model predictions.

Language:Jupyter Notebook3 10

Assignment-03-Q1-Hypothesis-Testing-

A F&B manager wants to determine whether there is any significant difference in the diameter of the cutlet between two units. A randomly selected sample of cutlets was collected from both units and measured. Analyze the data and draw inferences at the 5% significance level. Please state the assumptions and the tests that you carried out to check the validity of the assumptions. Cutlets.csv

Language:Jupyter Notebook2 20

Assignment-03-Q3-Hypothesis-Testing-

Chi2 contingency independence test. Assume the null hypothesis Ho: independence of categorical variables (male-female buyer ratios are similar across regions, i.e. they do not vary and are not related). Thus the alternate hypothesis is Ha: dependence of categorical variables (male-female buyer ratios are NOT similar across regions, i.e. they vary and are somewhat/significantly related).

Language:Jupyter Notebook2 10

Assignment-2-Set2-Q5-Basic-Statistic-Level-2-

Consider a company that has two different divisions. The annual profits from the two divisions are independent and have distributions Profit1 ~ N(5, 3^2) and Profit2 ~ N(7, 4^2) respectively. Both the profits are in $ Million. Answer the following questions about the total profit of the company in Rupees. Assume that $1 = Rs. 45 A. Specify a Rupee range (centered on the mean) such that it contains 95% probability for the annual profit of the company. B. Specify the 5th percentile of profit (in Rupees) for the company C. Which of the two divisions has a larger probability of making a loss in a given year?
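
One way to work through parts A to C is sketched below with the standard library's `statistics.NormalDist`. It relies on the fact that the sum of independent normal variables is normal, with means and variances adding; the variable names are illustrative only.

```python
from math import hypot
from statistics import NormalDist

RS_PER_USD = 45  # exchange rate stated in the problem

# Sum of independent normals: means add, variances add,
# so the standard deviation is sqrt(3^2 + 4^2) = 5.
total_musd = NormalDist(mu=5 + 7, sigma=hypot(3, 4))   # total profit in $ million

# Scaling by a constant scales both the mean and the standard deviation.
total_rs = NormalDist(total_musd.mean * RS_PER_USD,
                      total_musd.stdev * RS_PER_USD)   # total profit in Rs. million

# A. Range centered on the mean containing 95% probability
low, high = total_rs.inv_cdf(0.025), total_rs.inv_cdf(0.975)

# B. 5th percentile of total profit
p5 = total_rs.inv_cdf(0.05)

# C. Probability of a loss (profit below zero) for each division
loss1 = NormalDist(5, 3).cdf(0)
loss2 = NormalDist(7, 4).cdf(0)
```

Because 0 is 5/3 standard deviations below the mean for division 1 but 7/4 standard deviations below it for division 2, `loss1` comes out larger than `loss2`.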

Language:Jupyter Notebook2 10

P25.-Supervised-ML---Simple-Linear-Regression---Waist-Circumference-Adipose-Tissue-Data

Supervised-ML---Simple-Linear-Regression---Waist-Circumference-Adipose-Tissue-Data. EDA and data visualization, Correlation Analysis, Model Building, Model Testing, Model Prediction.

Language:Jupyter Notebook2 10

Assignment-03-Q2-Hypothesis-Testing-

ANOVA F-test statistics. A hospital wants to determine whether there is any difference in the average Turn Around Time (TAT) of reports of the laboratories on their preferred list. They collected a random sample and recorded TAT for reports of 4 laboratories. TAT is defined as the time from sample collection to report dispatch. Analyze the data and determine whether there is any difference in average TAT among the different laboratories at the 5% significance level.

Language:Jupyter Notebook1 10

Assignment-03-Q4-Hypothesis-Testing-

Chi2 contingency independence test. Q4. TeleCall uses 4 centers around the globe to process customer order forms. They audit a certain % of the customer order forms. Any error in an order form renders it defective, and it has to be reworked before processing. The manager wants to check whether the defective % varies by centre. Please analyze the data at the 5% significance level and help the manager draw appropriate inferences.

Language:Jupyter Notebook1 10

Assignment-03-Q5-Hypothesis-Testing-

Chi2 contingency independence test. Fantaloons sales managers commented that the % of males versus females walking into the store differs based on the day of the week. Analyze the data and determine whether there is evidence at the 5% significance level to support this hypothesis.

Language:Jupyter Notebook1 10

P07.-Chebyshev-s-practice

Chebyshev's Theorem: at least 3/4th (75%) of observations lie within 2 standard deviations of the mean, i.e. between mean-2SD and mean+2SD.
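
The 75% figure is the k = 2 case of the general bound 1 - 1/k^2. A small sketch that checks it on a made-up sample follows; the data and the helper names `chebyshev_bound` and `fraction_within` are illustrative assumptions, not taken from the notebook.

```python
from statistics import fmean, pstdev

def chebyshev_bound(k):
    """Minimum fraction of observations within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k**2

def fraction_within(data, k):
    """Observed fraction of data lying within k population standard deviations of the mean."""
    mu, sd = fmean(data), pstdev(data)
    return sum(abs(x - mu) <= k * sd for x in data) / len(data)

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]  # hypothetical skewed sample

bound = chebyshev_bound(2)           # guaranteed minimum for k = 2
observed = fraction_within(data, 2)  # what this particular sample shows
```

The bound holds for any distribution, so `observed` can never fall below `bound`; for well-behaved data it is usually much higher.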

Language:Jupyter Notebook1 10

P08.-Box-Plot-Practice

Box plot using a pandas DataFrame. Inserting minor and major gridlines. Deriving LQ, UQ, IQR, and upper and lower whisker lengths.
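
The derived quantities can be sketched without pandas as below. The sample is made up, and the 1.5 * IQR whisker rule is the common Tukey convention, assumed here rather than confirmed from the notebook; quartile conventions also differ slightly between tools.

```python
from statistics import quantiles

data = [2, 4, 5, 7, 8, 9, 10, 12, 40]  # hypothetical sample with one extreme value

# Quartiles by linear interpolation between data points
lq, median, uq = quantiles(data, n=4, method="inclusive")
iqr = uq - lq

# Fences at 1.5 * IQR beyond the quartiles (assumed convention)
lower_fence = lq - 1.5 * iqr
upper_fence = uq + 1.5 * iqr

# Whiskers end at the most extreme observations still inside the fences
lower_whisker = min(x for x in data if x >= lower_fence)
upper_whisker = max(x for x in data if x <= upper_fence)
outliers = [x for x in data if x < lower_fence or x > upper_fence]
```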

Language:Jupyter Notebook1 10

P09.-Probability-Calc-1

A normally distributed random variable X has a mean of 60 and a standard deviation of 10. Find the probability that X is less than 70.
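
A minimal sketch of this calculation with the standard library (the notebook itself may well use a different package):

```python
from statistics import NormalDist

x = NormalDist(mu=60, sigma=10)

# 70 is exactly one standard deviation above the mean: z = (70 - 60) / 10 = 1
p_below_70 = x.cdf(70)
```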

Language:Jupyter Notebook1 10

P10.-Probability-Calc-2

Suppose GMAT scores can be reasonably modeled using a normal distribution with mean = 711 and SD = 29. What is P(X <= 680)? What is P(697 <= X <= 740)?
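
An interval probability is the difference of two cumulative probabilities. A short sketch, assuming only the mean and SD given above:

```python
from statistics import NormalDist

gmat = NormalDist(mu=711, sigma=29)

# Left-tail probability: area under the curve up to 680
p_le_680 = gmat.cdf(680)

# Interval probability: area up to 740 minus area up to 697
p_697_to_740 = gmat.cdf(740) - gmat.cdf(697)
```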

Language:Jupyter Notebook1 10

P11.-Normal-Distribution-of-Stocks

To understand Normal Distribution and its application. Daily returns of stocks traded in BSE (Bombay Stock Exchange). To understand risk and returns associated with various stocks before investing in them. BEML and GLAXO Stocks study.

Language:Jupyter Notebook1 10

P12.-C.I.E-using-z-values-Confidence-Interval-Estimate-

Credit card launch example. Sample mean: 1990, sample SD: 2833, pop SD: 2500, pop mean: ?, n = 140. Q: Construct a 95% confidence interval for the mean card balance and interpret it.
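
With the population SD known, the interval is sample mean ± z * (pop SD / sqrt(n)). A minimal sketch using the figures above; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sample_mean, pop_sd, n = 1990, 2500, 140
confidence = 0.95

# Two-sided critical value: leaves (1 - confidence) / 2 in each tail
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z * pop_sd / sqrt(n)  # z times the standard error of the mean
ci = (sample_mean - margin, sample_mean + margin)
```

The interval is read as a statement about the procedure: intervals built this way from repeated samples would cover the true mean card balance about 95% of the time.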

Language:Jupyter Notebook1 10

P13.-C.I.E-using-t-values-Confidence-Interval-Estimate-

Credit card launch example. Sample mean: 1990, sample SD: 2833, pop mean: ?, n = 140. (Where the pop SD is not known, use t-values; in practically all problems, prefer t over z.) Q: Construct a 95% confidence interval for the mean card balance and interpret it.

Language:Jupyter Notebook1 10

P14.-Confidence-Interval-for-Stocks

Find confidence intervals for Beml and Glaxo stocks. Confidence Interval Estimate

Language:Jupyter Notebook1 10

P15.-Hypothesis-Testing-1S1T---Super-Market-Loyality-Program

Hypothesis-Testing 1S1T-Super-Market-Loyality-Program. Population Parameters: Mean=120 Sample Parameters: n=80, Mean=130, SD=40, df=80-1=79
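
The test statistic for these figures can be computed by hand as sketched below. Turning it into a p-value needs the t distribution, which the standard library does not provide (a function such as `scipy.stats.t.sf` is one option).

```python
from math import sqrt

pop_mean = 120                         # hypothesized population mean
n, sample_mean, sample_sd = 80, 130, 40

standard_error = sample_sd / sqrt(n)   # estimated SD of the sample mean
t_stat = (sample_mean - pop_mean) / standard_error
df = n - 1                             # degrees of freedom
```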

Language:Jupyter Notebook1 10

P16.-Hypothesis-Testing-1S2T---Call-Center-Process

Hypothesis Testing 1S2T - Call Center Process. Sample parameters: n = 50, df = 50-1 = 49, Mean1 = 4, SD1 = 3. 1-sample 2-tail t-test. Assume the null hypothesis Ho: Mean1 = 4. Thus the alternate hypothesis is Ha: Mean1 ≠ 4.

Language:Jupyter Notebook1 10

P17.-Hypothesis-Testing-1-Sample-1-Tail-Test-Salmonella-Outbreak-

Hypothesis-Testing-1-Sample-1-Tail-Test-Salmonella-Outbreak. 1-sample 1-tail t-test. Assume the null hypothesis Ho: mean Salmonella <= 0.3. Thus the alternate hypothesis is Ha: mean Salmonella > 0.3. Since no direct code is available for a 1-sample 1-tail t-test with unknown SD and arrays of means, we find the p-value using a 1-sample 2-tail t-test and divide it by 2 to get the 1-tail result.

Language:Jupyter Notebook1 10

P18.-Hypothesis-Testing-2-Sample-2-Tail-Test-Drugs-and-Placebos-

Hypothesis-Testing-2-Sample-2-Tail-Test-Drugs-and-Placebos. Note: this Python code covers both the 2-sample 1-tail and 2-sample 2-tail tests. The treatment group mean is Mu1 and the control group mean is Mu2. 2-sample 2-tail t-test: assume the null hypothesis Ho: Mu1 = Mu2. Thus the alternate hypothesis is Ha: Mu1 ≠ Mu2.

Language:Jupyter Notebook1 10

P19.-Hypothesis-Testing-2-Proportion-T-test-Students-Jobs-in-2-States-

Hypothesis-Testing-2-Proportion-T-test-Students-Jobs-in-2-States. Assume the null hypothesis Ho: p1 - p2 = 0, i.e. p1 = p2. Thus the alternate hypothesis is Ha: p1 ≠ p2. Explanation of a Bernoulli RV: np.random.binomial(n=1, p=p, size=size). Suppose you perform an experiment with two possible outcomes: success or failure. Success happens with probability p, failure with probability 1-p. A random variable that takes the value 1 in case of success and 0 in case of failure is called a Bernoulli random variable. Here n = 1, because each success/failure check (placement or non-placement) is a single trial; p = probability of success; size = number of times you check this (e.g. once for each of 247 students, so size = 247). Explanation of a Binomial RV: np.random.binomial(n, p, size), where n = number of trials when the RV is not Bernoulli. For example, to count how many sixes you get when you roll a die 10 times, n = 10 and p = 1/6; size is the number of repetitions of the experiment 'die rolled 10 times', so if it is repeated 18 times, size = 18. As (p_value = 0.7255) > (α = 0.05), we fail to reject the null hypothesis, i.e. p1 = p2: there is no significant difference between the population proportions of state1 and state2 students who report that they were placed immediately after education.
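
The roles of n, p and size can be illustrated with a standard-library stand-in for `np.random.binomial`. The helper `binomial_draws` is hypothetical, and p = 0.5 for the placement example is a made-up value, not the notebook's figure.

```python
import random

def binomial_draws(n, p, size, rng):
    """Return `size` draws, each counting successes in n independent trials with success probability p."""
    return [sum(rng.random() < p for _ in range(n)) for _ in range(size)]

rng = random.Random(0)  # seeded so repeated runs give the same draws

# Bernoulli case: n = 1, so each draw is 0 or 1 (one placement check per student)
placed = binomial_draws(n=1, p=0.5, size=247, rng=rng)

# Binomial case: number of sixes in 10 die rolls, experiment repeated 18 times
sixes = binomial_draws(n=10, p=1 / 6, size=18, rng=rng)

placement_rate = sum(placed) / len(placed)  # sample proportion of successes
```

Each entry of `placed` is a single 0/1 outcome, while each entry of `sixes` is a count between 0 and 10.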

Language:Jupyter Notebook1 10

P20.-Hypothesis-Testing-Anova-Test---Iris-Flower-dataset

Hypothesis Testing Anova Test - Iris Flower dataset. ANOVA F-test statistics: analysis of variance between more than 2 samples or columns. Assume the null hypothesis Ho: no variance, i.e. all sample population means are the same. Thus the alternate hypothesis is Ha: there is variance, i.e. at least one population mean is different. As (p_value = 0) < (α = 0.05), reject the null hypothesis: at least one population mean is different. Thus there is variance among the samples.

Language:Jupyter Notebook1 10

P21.-Hypothesis-Testing-Chi2-Test-Athletes-and-Smokers-

Hypothesis-Testing-Chi2-Test-Athletes-and-Smokers. Assume the null hypothesis Ho: independence of categorical variables (Athlete and Smoking are not related). Thus the alternate hypothesis is Ha: dependence of categorical variables (Athlete and Smoking are somewhat/significantly related). As (p_value = 0.00038) < (α = 0.05), reject the null hypothesis, i.e. there is dependence among the categorical variables. Thus Athlete and Smoking are somewhat/significantly related.
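
The mechanics of a chi-square independence test can be sketched by hand on a 2x2 table. The counts below are invented for illustration and are not the repository's data, so the resulting p-value will not match the one quoted above. No continuity correction is applied, whereas `scipy.stats.chi2_contingency` applies Yates' correction to 2x2 tables by default.

```python
from math import erfc, sqrt

# Hypothetical counts, NOT the repository's data.
# Rows: athlete / non-athlete. Columns: smoker / non-smoker.
observed = [[10, 40],
            [30, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df == 1 only, the chi-square upper tail equals erfc(sqrt(chi2 / 2))
p_value = erfc(sqrt(chi2 / 2))
reject_h0 = p_value < 0.05
```

For larger tables the degrees of freedom exceed 1 and the `erfc` shortcut no longer applies; a chi-square distribution function from a statistics package is needed instead.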

Language:Jupyter Notebook1 10

P22.-Hypothesis-Testing-Chi2-Test-Human-Gender-and-Choice-of-Pets-

Hypothesis-Testing-Chi2-Test-Human-Gender-and-Choice-of-Pets. Assume the null hypothesis Ho: human gender and choice of pets are independent and not related. Thus the alternate hypothesis is Ha: human gender and choice of pets are dependent and related. As (p_value = 0.1031) > (α = 0.05), we fail to reject the null hypothesis, i.e. independence among the categorical variables. Thus, there is no significant evidence of a relation between human gender and choice of pets.

Language:Jupyter Notebook1 10

scikit-learn-tips

:robot::zap: scikit-learn tips

Language:Jupyter Notebook100