vkushwaha / Data-Science-Portfolio

A Portfolio of my Data Science Projects

Data Science Portfolio

This is a repository of projects I have worked on or am currently working on. It is updated regularly. The projects are written in either R (R Markdown) or Python (Jupyter Notebook). The goal of each project is to use data science and statistical modelling techniques to find something interesting in the data. A typical project consists of finding and cleaning data, analysis, visualization, and a conclusion. Click on a project to see the full analysis and code.

Projects:

Bitcoin Price Analysis

  • Cross-correlation analysis between the Bitcoin price and the S&P 500 price over time.
  • Granger causality test between Bitcoin and stock prices.
  • Fitted an ARIMA model on Bitcoin prices to forecast Bitcoin's range of movement.
  • Keywords(R, Time Series, Causality, Quandl API)
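
As a rough illustration of the cross-correlation step above, the following Python sketch computes the Pearson correlation between two series at several lags. The project itself is written in R; the function names `pearson` and `lagged_correlation` and the toy data are assumptions made for illustration, not the project's code.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag].

    A positive lag pairs x with later values of y, so a peak at a
    positive lag suggests that x leads y.
    """
    if lag >= 0:
        xs, ys = x[: len(x) - lag], y[lag:]
    else:
        xs, ys = x[-lag:], y[: len(y) + lag]
    return pearson(xs, ys)


# Toy data (hypothetical): y is x delayed by two steps.
x = [0.1, -0.3, 0.5, 0.2, -0.4, 0.6, -0.1, 0.3, -0.2, 0.4]
y = [0.0, 0.0] + x[:-2]

correlations = {lag: lagged_correlation(x, y, lag) for lag in range(-3, 4)}
best_lag = max(correlations, key=correlations.get)
```

In this toy example the correlation peaks at a lag of two steps, matching the delay built into `y`. With price data one would typically apply this to returns rather than raw prices.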


Exchange Rate Analysis During US Election - Under Construction

  • Predicted US (2016) election victories as the voting results of each region became available.
  • Regressed states with results against polling data and predicted results for the remaining states.
  • Used Monte Carlo simulation to simulate the winner of the election.
  • Compared simulated results with exchange rate fluctuations to see if the market is efficient.
  • Keywords(Python, Linear Regression, Monte Carlo Simulation)
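
The simulation step above can be sketched roughly as follows. This is a minimal illustration, not the project's code: it assumes that each undecided state already has a win probability (for example, derived from the regression on polling data) and that states are independent, which is a simplification. The function name `simulate_win_probability` and all numbers are hypothetical.

```python
import random


def simulate_win_probability(called_votes, pending, threshold, n_sims=10_000, seed=0):
    """Estimate the probability that a candidate reaches `threshold` votes.

    called_votes: electoral votes already secured by the candidate.
    pending: list of (electoral_votes, win_probability) for undecided states.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        total = called_votes
        # Draw each undecided state independently (a simplifying assumption).
        for votes, p_win in pending:
            if rng.random() < p_win:
                total += votes
        if total >= threshold:
            wins += 1
    return wins / n_sims


# Hypothetical numbers, not real election data.
pending_states = [(29, 0.55), (20, 0.40), (18, 0.60), (16, 0.35)]
p = simulate_win_probability(called_votes=230, pending=pending_states, threshold=270)
```

Rerunning the simulation each time a new state is called gives a win probability over time, which can then be lined up against the exchange rate series.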


Power-law or Log-normal? Baby Name and Twitter Analysis

  • Fitted power-law and log-normal distributions to US baby names data since 1960.
  • Used bootstrapping techniques to find a distribution of the power-law parameters.
  • Crawled Twitter to find 20,000 random users and fitted a power-law distribution to users' friends count and followers count.
  • Keywords(R, Power-law, Bootstrapping, Log-normal)
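
To make the fitting and bootstrapping steps concrete, here is a minimal Python sketch (the project itself uses R). It assumes the standard maximum-likelihood estimator for a continuous power law with a known lower cutoff `xmin`; count data such as name frequencies are discrete, so this is only an approximation. The function names and the synthetic sample are assumptions for illustration.

```python
import math
import random


def fit_alpha(data, xmin):
    """Maximum-likelihood exponent of a continuous power law for x >= xmin."""
    tail = [x for x in data if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def bootstrap_alpha(data, xmin, n_boot=200, seed=0):
    """Refit alpha on resamples drawn with replacement from the data."""
    rng = random.Random(seed)
    return [fit_alpha(rng.choices(data, k=len(data)), xmin) for _ in range(n_boot)]


# Synthetic sample (an assumption): inverse-transform draws with alpha = 2.5.
rng = random.Random(42)
true_alpha, xmin = 2.5, 1.0
sample = [xmin * (1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0)) for _ in range(2000)]

alpha_hat = fit_alpha(sample, xmin)
estimates = sorted(bootstrap_alpha(sample, xmin))
# Percentile interval from the bootstrap distribution of the exponent.
ci_low = estimates[int(0.025 * len(estimates))]
ci_high = estimates[int(0.975 * len(estimates))]
```

The spread of `estimates` indicates how uncertain the fitted exponent is. Deciding between a power law and a log-normal needs a separate comparison, which this sketch does not attempt.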


Comparing Ridge and Lasso Regularization with Cross Validation

  • Plotted a scatter-plot matrix to visualize the data.
  • Fitted a polynomial regression of wine quality on the wine's chemical properties.
  • Used ridge and lasso regularization to tackle overfitting and compared the results.
  • Used cross validation to select the optimal regularization strength.
  • Keywords(Python, Linear Regression, Ridge and Lasso Regularization, Cross Validation)
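
The idea of choosing the regularization strength by cross validation can be shown with a dependency-free sketch. It is deliberately reduced to ridge regression with a single predictor and no intercept, where the estimate has the closed form sum(x*y) / (sum(x*x) + lambda); the project itself fits polynomial models on several features and also uses lasso, which is not covered here. The function names, the lambda grid, and the synthetic data are assumptions.

```python
import random


def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for one predictor without intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def cv_error(xs, ys, lam, k=5):
    """Mean squared validation error of ridge over k contiguous folds."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        held_out = set(range(fold * n // k, (fold + 1) * n // k))
        train = [i for i in range(n) if i not in held_out]
        beta = ridge_slope([xs[i] for i in train], [ys[i] for i in train], lam)
        total += sum((ys[i] - beta * xs[i]) ** 2 for i in held_out)
    return total / n


# Synthetic data (an assumption): y = 2x + noise.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(50)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]
errors = {lam: cv_error(xs, ys, lam) for lam in lambdas}
best_lambda = min(errors, key=errors.get)
```

Larger values of lambda shrink the slope towards zero. The value with the lowest validation error is the one that cross validation selects.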


Twitter Sentiment Daily and Weekly Fluctuations

  • Parsed a few GB of Tweets to select all English-language Tweets from the UK.
  • Used the 'qdap' package to analyze the emotion of the Tweets.
  • Plotted the emotions over the day and over the week and analysed the resulting patterns.
  • Keywords(R, Twitter API, Time Series, Sentiment Analysis, ggplot)
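
The daily and weekly aggregation described above amounts to grouping scored Tweets by hour of day and by day of week. The Python sketch below illustrates this under the assumption that each Tweet has already been given a numeric sentiment score; the project does the scoring in R with 'qdap'. The helper `mean_by` and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_by(records, key):
    """Average the scores of (timestamp, score) records grouped by key(timestamp)."""
    groups = defaultdict(list)
    for timestamp, score in records:
        groups[key(timestamp)].append(score)
    return {k: mean(v) for k, v in sorted(groups.items())}


# Hypothetical (timestamp, sentiment score) pairs.
records = [
    (datetime(2017, 1, 2, 8, 15), 0.2),   # Monday morning
    (datetime(2017, 1, 2, 8, 45), 0.4),
    (datetime(2017, 1, 2, 22, 5), -0.1),  # Monday night
    (datetime(2017, 1, 7, 8, 30), 0.6),   # Saturday morning
]

hourly = mean_by(records, lambda t: t.hour)         # keys are hours 0..23
weekly = mean_by(records, lambda t: t.weekday())    # keys are 0 (Monday) .. 6 (Sunday)
```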

GDP and Future Orientation

  • Downloaded economic indicator data using the World Bank API and cleaned the data.
  • Downloaded Google search data for queries on the next year and the previous year for each country.
  • Fitted a linear regression between GDP and future orientation.
  • Keywords(R, World Bank API, Google API, Data Cleaning, Linear regression)
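
A minimal sketch of the last two steps, written in Python although the project uses R. It assumes that future orientation is measured as the ratio of search volume for the next year to search volume for the previous year; this definition is not stated in this README. The function names and the three made-up countries are purely illustrative.

```python
from statistics import mean


def future_orientation(next_year_volume, last_year_volume):
    """Ratio of searches for the next year to searches for the previous year."""
    return next_year_volume / last_year_volume


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


# Hypothetical data: (next-year searches, last-year searches, GDP indicator).
countries = {"A": (120, 100, 45.0), "B": (80, 100, 12.0), "C": (100, 100, 28.5)}

index = [future_orientation(n, l) for n, l, _ in countries.values()]
gdp = [g for _, _, g in countries.values()]
slope, intercept = fit_line(index, gdp)
```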

Exchange Rate Analysis During UK Election - Under Construction

  • Predicted UK (2017) election victories as the voting results came in.
  • Retrieved results from Tweets of result announcements and extracted the time of announcement for each region.
  • Regressed regions with results against polling data and predicted results for the remaining regions.
  • Used Monte Carlo simulation to simulate the winner of the election.
  • Keywords(Python, Twitter API, Merging Data)
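
Extracting announcement times from Tweets could look roughly like the sketch below. It assumes that each Tweet has already been matched to a region and carries a `created_at` string in the timestamp layout of the Twitter v1.1 REST API. The record structure, the region names, and the function `announcement_order` are hypothetical.

```python
from datetime import datetime

# Timestamp layout of the `created_at` field in the Twitter v1.1 REST API.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def announcement_order(tweets):
    """Return (region, datetime) pairs sorted by announcement time."""
    parsed = [
        (t["region"], datetime.strptime(t["created_at"], TWITTER_TIME_FORMAT))
        for t in tweets
    ]
    return sorted(parsed, key=lambda pair: pair[1])


# Hypothetical records; region names and times are placeholders.
tweets = [
    {"region": "Region B", "created_at": "Fri Jun 09 00:12:30 +0000 2017"},
    {"region": "Region A", "created_at": "Thu Jun 08 22:01:05 +0000 2017"},
]
ordered = announcement_order(tweets)
```

The ordered list gives the sequence in which results became known, which is what the regression and simulation steps need in order to replay election night.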

Languages

Jupyter Notebook 73.0%, HTML 26.1%, Python 0.9%