Vaitybharati's repositories

Assignment-04-Simple-Linear-Regression-2

Assignment-04-Simple-Linear-Regression-2. Q2) Salary_hike: build a prediction model for Salary_hike. Build a simple linear regression model by performing EDA, applying the necessary transformations, and selecting the best model using R or Python. Steps: EDA and data visualization, correlation analysis, model building, model testing, model predictions.

Language: Jupyter Notebook | Stargazers: 9 | Issues: 1 | Issues: 0
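
As a rough illustration of the workflow described above (fit, transform, compare), here is a minimal sketch in plain Python. The data values and the names `fit_line`, `years`, and `salary` are hypothetical and are not taken from the assignment's dataset; the notebook itself may use a different library and different transformations.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot

# Hypothetical data, NOT the assignment's dataset
years = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
salary = [40.0, 43.0, 50.0, 56.0, 61.0, 68.0]

# Fit one model on the raw predictor and one on a log-transformed predictor
models = {
    "raw x": fit_line(years, salary),
    "log x": fit_line([math.log(v) for v in years], salary),
}

# Keep the model with the highest R^2
best = max(models, key=lambda name: models[name][2])
print(best, models[best])
```

Choosing by R^2 alone is a simplification; in practice one would also inspect the residuals before settling on a model.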

Assignment-05-Multiple-Linear-Regression-2

Assignment-05-Multiple-Linear-Regression-2. Prepare a prediction model for the profit in the 50_startups data. Apply transformations to get better predictions of profit, and make a table containing the R^2 value for each prepared model. Columns: R&D Spend (research and development spend in the past few years); Administration (spend on administration in the past few years); Marketing Spend (spend on marketing in the past few years); State (states from which data is collected); Profit (profit of each state in the past few years).

Language: Jupyter Notebook | Stargazers: 4 | Issues: 1 | Issues: 0

Assignment-04-Simple-Linear-Regression-1

Assignment-04-Simple-Linear-Regression-1. Q1) Delivery_time: predict delivery time using sorting time. Build a simple linear regression model by performing EDA, applying the necessary transformations, and selecting the best model using R or Python. Steps: EDA and data visualization, feature engineering, correlation analysis, model building, model testing, and model predictions using simple linear regression.

Language: Jupyter Notebook | Stargazers: 3 | Issues: 1 | Issues: 0

Assignment-05-Multiple-Linear-Regression-1

Multiple-Linear-Regression-1. Consider only the below columns and prepare a prediction model for predicting Price of Toyota Corolla.

Language: Jupyter Notebook | Stargazers: 3 | Issues: 1 | Issues: 0

P23.-EDA-1

EDA (Exploratory Data Analysis) - 1: Loading the Datasets, Data type conversions, Removing duplicate entries, Dropping the column, Renaming the column, Outlier Detection, Missing Values and Imputation (Numerical and Categorical), Scatter plot and Correlation analysis, Transformations, Automatic EDA Methods (Pandas Profiling and Sweetviz).

Language: Jupyter Notebook | Stargazers: 3 | Issues: 1 | Issues: 0

P24.-Supervised-ML---Simple-Linear-Regression---Newspaper-data

Supervised-ML---Simple-Linear-Regression---Newspaper-data. EDA and Visualization, Correlation Analysis, Model Building, Model Testing, Model predictions.

Language: Jupyter Notebook | Stargazers: 3 | Issues: 1 | Issues: 0

Assignment-03-Q1-Hypothesis-Testing-

An F&B manager wants to determine whether there is any significant difference in the diameter of the cutlet between two units. A randomly selected sample of cutlets was collected from both units and measured. Analyze the data and draw inferences at the 5% significance level. Please state the assumptions and tests that you carried out to check the validity of the assumptions. Data file: Cutlets.csv

Language: Jupyter Notebook | Stargazers: 2 | Issues: 2 | Issues: 0

Assignment-03-Q3-Hypothesis-Testing-

Chi2 contingency independence test. Assume Null Hypothesis as Ho: independence of categorical variables (male-female buyer ratios are similar across regions; they do not vary and are not related). Thus Alternate Hypothesis as Ha: dependence of categorical variables (male-female buyer ratios are NOT similar across regions; they vary and are somewhat/significantly related).

Language: Jupyter Notebook | Stargazers: 2 | Issues: 1 | Issues: 0
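
The statistic behind a chi-square independence test can be sketched in plain Python as below. The table values are made up for illustration and are not the assignment's data, and `chi_square_statistic` is a hypothetical helper name. Turning the statistic into a p-value requires the chi-square distribution with the returned degrees of freedom, which this sketch leaves out.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    `table` is a list of rows; each row is a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Hypothetical counts: rows = male / female buyers, columns = four regions
observed_counts = [
    [30, 45, 25, 40],
    [70, 55, 75, 60],
]
stat, dof = chi_square_statistic(observed_counts)
print(f"chi2 = {stat:.3f}, dof = {dof}")
```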

Assignment-2-Set2-Q5-Basic-Statistic-Level-2-

Consider a company that has two different divisions. The annual profits from the two divisions are independent and have distributions Profit1 ~ N(5, 3^2) and Profit2 ~ N(7, 4^2) respectively. Both profits are in $ million. Answer the following questions about the total profit of the company in rupees. Assume that $1 = Rs. 45. A. Specify a rupee range (centered on the mean) that contains 95% probability for the annual profit of the company. B. Specify the 5th percentile of profit (in rupees) for the company. C. Which of the two divisions has a larger probability of making a loss in a given year?

Language: Jupyter Notebook | Stargazers: 2 | Issues: 1 | Issues: 0
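
One way to work through parts A to C is with `statistics.NormalDist` from the Python standard library, relying on the fact that a sum of independent normal variables is again normal. This is a sketch under the problem's stated assumptions; the variable names are illustrative, and the notebook may use a different library.

```python
from statistics import NormalDist

RUPEES_PER_DOLLAR = 45  # exchange rate given in the problem

profit1 = NormalDist(mu=5, sigma=3)  # division 1, $ million
profit2 = NormalDist(mu=7, sigma=4)  # division 2, $ million

# Adding two NormalDist objects treats them as independent;
# multiplying by a constant rescales both mean and standard deviation.
total_rupees = (profit1 + profit2) * RUPEES_PER_DOLLAR  # Rs. million

# A. Central 95% range for the total profit
low = total_rupees.inv_cdf(0.025)
high = total_rupees.inv_cdf(0.975)

# B. 5th percentile of the total profit
fifth_percentile = total_rupees.inv_cdf(0.05)

# C. Probability that each division makes a loss (profit below zero)
loss1 = profit1.cdf(0)
loss2 = profit2.cdf(0)

print(f"A. 95% range: Rs. {low:.1f} to {high:.1f} million")
print(f"B. 5th percentile: Rs. {fifth_percentile:.1f} million")
print(f"C. P(loss): division 1 = {loss1:.4f}, division 2 = {loss2:.4f}")
```

The total profit has mean 12 and standard deviation 5 in $ million (since 3^2 + 4^2 = 5^2), which becomes mean 540 and standard deviation 225 in Rs. million.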

P25.-Supervised-ML---Simple-Linear-Regression---Waist-Circumference-Adipose-Tissue-Data

Supervised-ML---Simple-Linear-Regression---Waist-Circumference-Adipose-Tissue-Data. EDA and data visualization, Correlation Analysis, Model Building, Model Testing, Model Prediction.

Language: Jupyter Notebook | Stargazers: 2 | Issues: 1 | Issues: 0

Assignment-03-Q2-Hypothesis-Testing-

ANOVA F-test statistics. A hospital wants to determine whether there is any difference in the average Turn Around Time (TAT) of reports of the laboratories on their preferred list. They collected a random sample and recorded TAT for reports of 4 laboratories. TAT is defined as the time from sample collection to report dispatch. Analyze the data and determine whether there is any difference in average TAT among the different laboratories at the 5% significance level.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 1 | Issues: 0
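
The F statistic for a one-way ANOVA compares the variation between group means with the variation inside the groups. The sketch below computes it in plain Python; the TAT values are invented for illustration and `one_way_f_statistic` is a hypothetical helper name. The p-value would come from the F distribution with the two returned degrees of freedom, which is not computed here.

```python
def one_way_f_statistic(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2
        for group, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean
    ss_within = sum(
        (v - m) ** 2
        for group, m in zip(groups, group_means)
        for v in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical TAT samples (minutes) for four laboratories
labs = [
    [185, 170, 176, 173],
    [165, 160, 159, 176],
    [176, 198, 191, 170],
    [166, 160, 163, 145],
]
f_stat, df_b, df_w = one_way_f_statistic(labs)
print(f"F = {f_stat:.3f} with ({df_b}, {df_w}) degrees of freedom")
```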

Assignment-03-Q4-Hypothesis-Testing-

Chi2 contingency independence test. Q4. TeleCall uses 4 centers around the globe to process customer order forms. They audit a certain % of the customer order forms. Any error in an order form renders it defective, and it has to be reworked before processing. The manager wants to check whether the defective % varies by centre. Please analyze the data at the 5% significance level and help the manager draw appropriate inferences.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 1 | Issues: 0

Assignment-03-Q5-Hypothesis-Testing-

Chi2 contingency independence test. Fantaloons Sales managers commented that the % of males versus females walking into the store differs based on the day of the week. Analyze the data and determine whether there is evidence at the 5% significance level to support this hypothesis.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 1 | Issues: 0

P07.-Chebyshev-s-practice

Chebyshev's Theorem: at least 3/4 (75%) of observations lie within 2 standard deviations of the mean, i.e. between mean-2SD and mean+2SD.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 1 | Issues: 0
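
The 75% figure is the k = 2 case of the general bound 1 - 1/k^2. A small sketch, using made-up observations and the hypothetical helper names `chebyshev_lower_bound` and `fraction_within`, compares the bound with what a sample actually shows:

```python
from statistics import mean, pstdev

def chebyshev_lower_bound(k):
    """Minimum fraction of observations within k standard deviations."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

def fraction_within(values, k):
    """Observed fraction of values within k population SDs of the mean."""
    centre = mean(values)
    spread = pstdev(values)
    inside = [v for v in values if abs(v - centre) <= k * spread]
    return len(inside) / len(values)

# Hypothetical observations with one extreme value
observations = [2, 4, 4, 4, 5, 5, 7, 9, 30]

bound = chebyshev_lower_bound(2)
observed = fraction_within(observations, 2)
print(f"bound = {bound:.2f}, observed = {observed:.3f}")
```

The bound holds for any distribution, so the observed fraction can be well above 75%, as it is for this sample.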

P08.-Box-Plot-Practice

Box plot using a pandas DataFrame: inserting minor and major gridlines; deriving LQ, UQ, IQR, and the upper and lower whisker lengths.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 1 | Issues: 0
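
The quantities named above can be derived without plotting. The sketch below uses `statistics.quantiles` with `method="inclusive"`, which should match the linear interpolation that pandas uses by default. The data are invented, `box_plot_summary` is a hypothetical helper name, and the 1.5 x IQR fences are a common convention that may differ from how the notebook defines whisker length.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Quartiles, IQR, 1.5*IQR fences, and points outside the fences."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "LQ": q1,
        "median": median,
        "UQ": q3,
        "IQR": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }

# Hypothetical data with one extreme value
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
summary = box_plot_summary(data)
for name, value in summary.items():
    print(name, value)
```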

P09.-Probability-Calc-1

A normally distributed random variable has a mean of 60 and a standard deviation of 10. Find the probability that x is less than 70.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 1 | Issues: 0
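
One way to compute this by hand is to standardize to a z-score and evaluate the standard normal CDF through the error function. This is a sketch using only the `math` module; `standard_normal_cdf` is a hypothetical helper name, and the notebook may use a statistics library instead.

```python
import math

def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal variable, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean, sd, x = 60, 10, 70

z = (x - mean) / sd               # standardize: z = (70 - 60) / 10 = 1
probability = standard_normal_cdf(z)
print(f"z = {z}, P(X < {x}) = {probability:.4f}")
```

With z = 1 the probability is about 0.8413.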

P10.-Probability-Calc-2

Suppose GMAT scores can be reasonably modeled using a normal distribution with mean = 711 and SD = 29. What is P(X<=680)? What is P(697<=X<=740)?

Language: Jupyter Notebook | Stargazers: 1 | Issues: 1 | Issues: 0
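
Both probabilities follow from the normal CDF: a left-tail probability is a single CDF value, and an interval probability is the difference of two CDF values. A minimal sketch with the standard library (the variable names are illustrative):

```python
from statistics import NormalDist

gmat = NormalDist(mu=711, sigma=29)

# Left tail: P(X <= 680)
p_at_most_680 = gmat.cdf(680)

# Interval: P(697 <= X <= 740) = F(740) - F(697)
p_697_to_740 = gmat.cdf(740) - gmat.cdf(697)

print(f"P(X <= 680)        = {p_at_most_680:.4f}")
print(f"P(697 <= X <= 740) = {p_697_to_740:.4f}")
```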

P11.-Normal-Distribution-of-Stocks

To understand Normal Distribution and its application. Daily returns of stocks traded in BSE (Bombay Stock Exchange). To understand risk and returns associated with various stocks before investing in them. BEML and GLAXO Stocks study.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 1 | Issues: 0

P12.-C.I.E-using-z-values-Confidence-Interval-Estimate-

Credit card launch example. Sample mean: 1990; sample SD: 2833; population SD: 2500; population mean: ?; n = 140. Q: Construct a 95% confidence interval for the mean card balance and interpret it.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 1 | Issues: 0
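
With the population SD known, the interval is sample mean ± z * (population SD / sqrt(n)). The sketch below plugs in the numbers from the description using only the standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

sample_mean = 1990
population_sd = 2500
n = 140
confidence = 0.95

# Two-sided critical value: leaves (1 - confidence) / 2 in each tail
z_critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

standard_error = population_sd / math.sqrt(n)
margin = z_critical * standard_error
interval = (sample_mean - margin, sample_mean + margin)

print(f"z = {z_critical:.3f}, margin = {margin:.2f}")
print(f"95% CI: ({interval[0]:.2f}, {interval[1]:.2f})")
```

The interval comes out at roughly 1576 to 2404, read as a range of plausible values for the population mean card balance at the 95% confidence level.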

P13.-C.I.E-using-t-values-Confidence-Interval-Estimate-

Credit card launch example. Sample mean: 1990; sample SD: 2833; population mean: ?; n = 140. (Where the population SD is not known, use t-values; in practically all problems, prefer t over z.) Q: Construct a 95% confidence interval for the mean card balance and interpret it.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 1 | Issues: 0

P14.-Confidence-Interval-for-Stocks

Find confidence intervals for BEML and GLAXO stocks. Confidence Interval Estimate.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 1 | Issues: 0

P15.-Hypothesis-Testing-1S1T---Super-Market-Loyality-Program

Hypothesis-Testing 1S1T-Super-Market-Loyality-Program. Population parameters: Mean=120. Sample parameters: n=80, Mean=130, SD=40, df=80-1=79.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 1 | Issues: 0
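
From these summary figures the one-sample t statistic can be computed directly. The sketch below stops at the statistic and degrees of freedom; the p-value would come from a t distribution with 79 degrees of freedom, which the standard library does not provide. The variable names are illustrative.

```python
import math

population_mean = 120   # hypothesized mean
n = 80
sample_mean = 130
sample_sd = 40

standard_error = sample_sd / math.sqrt(n)
t_statistic = (sample_mean - population_mean) / standard_error
degrees_of_freedom = n - 1

print(f"t = {t_statistic:.4f}, df = {degrees_of_freedom}")
```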

P16.-Hypothesis-Testing-1S2T---Call-Center-Process

Hypothesis Testing 1S2T - Call Center Process. Sample parameters: n=50, df=50-1=49, Mean1=4, SD1=3. 1-sample 2-tail t-test. Assume Null Hypothesis Ho as Mean1 = 4. Thus, Alternate Hypothesis Ha as Mean1 ≠ 4.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 1 | Issues: 0

P17.-Hypothesis-Testing-1-Sample-1-Tail-Test-Salmonella-Outbreak-

Hypothesis-Testing-1-Sample-1-Tail-Test-Salmonella-Outbreak. 1-sample 1-tail t-test. Assume Null Hypothesis Ho as Mean Salmonella <= 0.3. Thus Alternate Hypothesis Ha as Mean Salmonella > 0.3. No direct code was available for a 1-sample 1-tail t-test with unknown SD and arrays of means. Hence we find the probability using a 1-sample 2-tail t-test and divide it by 2 to get the 1-tail result.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 1 | Issues: 0
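
Halving the two-tailed p-value is only valid when the test statistic points in the direction of the alternate hypothesis; otherwise the one-tailed p-value is 1 minus that half. The helper below is a hypothetical sketch of that rule (the name `one_tailed_p` and its arguments are not from the notebook). Recent SciPy releases also accept an `alternative` argument in `scipy.stats.ttest_1samp`, which gives the one-tailed p-value directly.

```python
def one_tailed_p(t_statistic, two_tailed_p, alternative="greater"):
    """Convert a two-tailed p-value from a symmetric test to one-tailed.

    alternative="greater" tests Ha: mean > hypothesized value;
    alternative="less" tests Ha: mean < hypothesized value.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    half = two_tailed_p / 2
    in_direction_of_ha = (
        t_statistic > 0 if alternative == "greater" else t_statistic < 0
    )
    return half if in_direction_of_ha else 1 - half

# Illustrative values only, not results from the Salmonella data
print(one_tailed_p(2.0, 0.05))    # statistic supports Ha: mean > value
print(one_tailed_p(-2.0, 0.05))   # statistic points away from Ha
```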

P18.-Hypothesis-Testing-2-Sample-2-Tail-Test-Drugs-and-Placebos-

Hypothesis-Testing-2-Sample-2-Tail-Test-Drugs-and-Placebos. Note: This Python code covers both the 2-sample 1-tail and 2-sample 2-tail tests. Treatment group mean is Mu1. Control group mean is Mu2. 2-sample 2-tail t-test: Assume Null Hypothesis Ho as Mu1 = Mu2. Thus Alternate Hypothesis Ha as Mu1 ≠ Mu2.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 1 | Issues: 0

P19.-Hypothesis-Testing-2-Proportion-T-test-Students-Jobs-in-2-States-

Hypothesis-Testing-2-Proportion-T-test-Students-Jobs-in-2-States. Assume Null Hypothesis Ho as p1 - p2 = 0, i.e. p1 = p2. Thus Alternate Hypothesis Ha as p1 ≠ p2. Explanation of a Bernoulli RV: np.random.binomial(n=1, p=p, size=size). Suppose you perform an experiment with two possible outcomes: either success or failure. Success happens with probability p, while failure happens with probability 1-p. A random variable that takes value 1 in case of success and 0 in case of failure is called a Bernoulli random variable. Here n = 1, because you check whether the outcome is a success or a failure once (placement or no placement, 1 trial); p = probability of success; size = number of times you check this (e.g. for 247 students, each checked once, size = 247). Explanation of a Binomial RV: np.random.binomial(n, p, size) (when it is not a Bernoulli RV, n = number of trials). For example, to check how many times you get a six when you roll a die 10 times, n = 10 and p = 1/6; size is the number of repetitions of the experiment 'die rolled 10 times', so if it is repeated 18 times, size = 18. As (p_value = 0.7255) > (α = 0.05), we fail to reject the Null Hypothesis, i.e. p1 = p2: there is no significant difference between the population proportions of state1 and state2 who report that they have been placed immediately after education.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 1 | Issues: 0
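
A two-proportion comparison of this kind is commonly run as a pooled z-test. The sketch below shows that version with the standard library only; the counts are hypothetical (the description does not give them), `two_proportion_z_test` is an illustrative name, and the notebook may use a different routine, so the p-value here is not expected to match 0.7255.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(successes1, n1, successes2, n2):
    """Two-sided pooled z-test for Ho: p1 = p2. Returns (z, p_value)."""
    p1 = successes1 / n1
    p2 = successes2 / n2
    # Under Ho both groups share one proportion, estimated by pooling
    pooled = (successes1 + successes2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts of students placed immediately after education
z, p_value = two_proportion_z_test(120, 247, 130, 270)
alpha = 0.05
decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
print(f"z = {z:.3f}, p = {p_value:.4f}: {decision}")
```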

P20.-Hypothesis-Testing-Anova-Test---Iris-Flower-dataset

Hypothesis Testing Anova Test - Iris Flower dataset. ANOVA F-test statistics: analysis of variance between more than 2 samples or columns. Assume Null Hypothesis Ho as no variance: all sample population means are the same. Thus Alternate Hypothesis Ha as variance present: at least one population mean is different. As (p_value = 0) < (α = 0.05), reject the Null Hypothesis, i.e. at least one population mean is different. Thus the means are not all equal across the samples.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 1 | Issues: 0

P21.-Hypothesis-Testing-Chi2-Test-Athletes-and-Smokers-

Hypothesis-Testing-Chi2-Test-Athletes-and-Smokers. Assume Null Hypothesis as Ho: independence of categorical variables (Athlete and Smoking are not related). Thus Alternate Hypothesis as Ha: dependence of categorical variables (Athlete and Smoking are somewhat/significantly related). As (p_value = 0.00038) < (α = 0.05), reject the Null Hypothesis, i.e. the categorical variables are dependent. Thus Athlete and Smoking are somewhat/significantly related.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 1 | Issues: 0

P22.-Hypothesis-Testing-Chi2-Test-Human-Gender-and-Choice-of-Pets-

Hypothesis-Testing-Chi2-Test-Human-Gender-and-Choice-of-Pets. Assume Null Hypothesis as Ho: Human Gender and choice of pets are independent and not related. Thus Alternate Hypothesis as Ha: Human Gender and choice of pets are dependent and related. As (p_value = 0.1031) > (α = 0.05), fail to reject the Null Hypothesis, i.e. the data are consistent with independence of the categorical variables. Thus there is no evidence of a relation between Human Gender and Choice of Pets.

Language: Jupyter Notebook | Stargazers: 1 | Issues: 1 | Issues: 0

scikit-learn-tips

scikit-learn tips

Language: Jupyter Notebook | Stargazers: 1 | Issues: 0 | Issues: 0