Utkarsh Pratap Singh (Moozzart)



Company: Indian Institute of Technology (IIT), Kharagpur


Utkarsh Pratap Singh's repositories

Meyer-Packard-Genetic-Algorithm-for-Prediction-of-Stock-Prices-and-Performances

Stock market prediction is one of the most difficult problems to solve, and during the looming days of a recession it becomes next to impossible. Stock prices exhibit numerous patterns throughout the day, and every deviation from the normal trend can mean something new: the stock universe keeps expanding, so new patterns appear on almost every trading day, and keeping up with that change is a lofty task, especially for an individual maintaining a large (or even moderately large) portfolio over time. Stocks and bonds are immensely important for a country's economy to boom, and their collapse means the collapse of that economy. Because these markets are linked with nearly every sector that contributes to the economy, mostly organised sectors, a collapse would be felt in every linked sector through what economists call the "ripple effect". The converse also holds: if a particular firm in a sector performs poorly, that weakness is reflected in the other firms of that sector.

Bitcoin-Price-Prediction-Using-Twitter-Sentiments-And-Currency-Fundamentals-LSTM-

Predicted the price of the cryptocurrency (Bitcoin) using past time-series data, Twitter sentiment (polarity and sensitivity), the currency's fundamentals, and technical indicators such as RSI and SMA, fed into an LSTM. The notebook contains the exploratory data analysis (with important links) and the astounding result at the end.
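
A minimal sketch of the indicator-plus-LSTM pipeline described above, on synthetic data; the column names, window size and network shape are assumptions, not the notebook's exact choices:

```python
import numpy as np
import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(period).mean()
    loss = (-delta.clip(upper=0)).rolling(period).mean()
    return 100 - 100 / (1 + gain / loss)

rng = np.random.default_rng(0)
df = pd.DataFrame({                                 # stand-in for the real data
    "close": 30000 + rng.standard_normal(500).cumsum() * 100,
    "polarity": rng.uniform(-1, 1, 500),            # Twitter sentiment feature
})
df["sma"] = df["close"].rolling(14).mean()          # simple moving average
df["rsi"] = rsi(df["close"])                        # relative strength index
df = df.dropna()

features = df[["close", "sma", "rsi", "polarity"]].to_numpy()
window = 30                                         # look-back length (assumed)
X = np.stack([features[i:i + window] for i in range(len(features) - window)])
y = df["close"].to_numpy()[window:]                 # next-step price target

model = Sequential([
    LSTM(64, input_shape=(window, X.shape[2])),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=20, batch_size=32, validation_split=0.2)
```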

Language: Jupyter Notebook · Stargazers: 9 · Issues: 1

Fast-Ai-with-HuggingFace-Transformer

The chosen task is multi-class text classification on movie reviews. For each movie review, the model has to predict a sentiment label; the model's outputs are evaluated on classification accuracy. The sentiment labels are: 0 → negative, 1 → somewhat negative, 2 → neutral, 3 → somewhat positive, 4 → positive. This movie review dataset (provided by Kaggle) is more complex than the generic movie review dataset.
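
A minimal sketch of the 5-class setup using the transformers library; the bert-base-uncased checkpoint is an assumption (the repo pairs fastai with a HuggingFace transformer), and the classification head shown here is untrained:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["negative", "somewhat negative", "neutral",
          "somewhat positive", "positive"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS)
)

review = "The plot meanders, but the performances are quietly wonderful."
inputs = tokenizer(review, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits        # shape: (1, 5)
print(LABELS[int(logits.argmax(dim=-1))])  # head is untrained, so this is random
```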

Language: Jupyter Notebook · Stargazers: 2 · Issues: 1

Handwritten-Digit-Classification

Assignment:
1) Download the MNIST handwritten-digit dataset. It contains 28×28 images. Flatten them into 784-dimensional binary vectors. Keep aside 20% of the data for testing and another 20% for validation. [1 mark]
2) Draw a random subset of 10 dimensions (out of 784). Based on these 10 dimensions only, build a decision tree (using a library function) with maximum depth 5. Calculate the accuracy of the tree on the validation set. [2 marks]
3) Repeat this process for 50 such random subsets, each of dimension 10. For each of them, build a decision tree of maximum depth 5 and calculate its accuracy on the validation set. [2 marks]
4) Carry out weighted classification of the test set using these 50 decision trees, with their validation accuracies as weights. Report the accuracy. [1 mark]
5) Starting with this ensemble as the initial classifier, implement the AdaBoost algorithm. At each stage, build a decision tree using entropy based on weighted examples as the heterogeneity measure of each node. Each tree has a maximum depth of 5; run at most 20 iterations of AdaBoost. [3 marks]
6) Using this ensemble, carry out classification on the test set and report the accuracy. [1 mark]
A sketch of steps 1–4 follows below.
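
A hedged scikit-learn sketch of steps 1–4 (random 10-dimension subspaces, depth-5 trees, validation-accuracy-weighted voting); loading MNIST via fetch_openml and the exact splits are assumptions, and step 5's AdaBoost is left to a library implementation:

```python
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = (X > 127).astype(np.uint8)                       # 784-d binary vectors

# 60% train, 20% validation, 20% test
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

rng = np.random.default_rng(0)
trees, dims, weights = [], [], []
for _ in range(50):
    d = rng.choice(784, size=10, replace=False)      # random 10-dim subset
    t = DecisionTreeClassifier(max_depth=5).fit(X_train[:, d], y_train)
    trees.append(t)
    dims.append(d)
    weights.append(t.score(X_val[:, d], y_val))      # validation accuracy as weight

# Weighted vote over the 50 trees on the test set
votes = np.zeros((len(X_test), 10))
for t, d, w in zip(trees, dims, weights):
    votes[np.arange(len(X_test)), t.predict(X_test[:, d]).astype(int)] += w
print("ensemble accuracy:", (votes.argmax(1).astype(str) == y_test).mean())
```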

Language: Jupyter Notebook · Stargazers: 1 · Issues: 1

Neural-Network-Modeling-of-Flow-behaviour-of-Titanium-Alloys

This project was undertaken by Utkarsh Pratap Singh, Aniket Niranjan Mishra and Saurabh Singh, under Prof. Sumantra Mandal, Metallurgical and Materials Engineering, IIT Kharagpur. Data on the flow behaviour of titanium alloys was scraped from related research papers and sources, and a neural-network framework was developed to model the flow behaviour of those alloys. The data comprise the composition of the different alloys, the temperature at which the flow stress is measured, the strain, and the strain rate, used together to predict the flow stress. Training was done on the data points scraped from papers; since there was insufficient data for testing, we tested the model by calculating the flow stress at points between the training (flow stress, strain) pairs. The results were good and followed the pattern of the curve, as shown in Results__.jpeg.
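
The repository's model is written in MATLAB; this scikit-learn sketch shows an equivalent setup on toy data, a small MLP mapping (composition, temperature, strain, strain rate) to flow stress. All column meanings and values are illustrative, not the project's actual data:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.uniform(0, 6, n),        # e.g. Al wt% (illustrative)
    rng.uniform(0, 4, n),        # e.g. V wt% (illustrative)
    rng.uniform(800, 1300, n),   # temperature (K)
    rng.uniform(0.05, 0.9, n),   # strain
    rng.uniform(1e-3, 1e1, n),   # strain rate (1/s)
])
y = 500 - 0.3 * X[:, 2] + 200 * X[:, 3] + rng.normal(0, 5, n)  # toy flow stress (MPa)

model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0),
)
model.fit(X, y)
# In the project's spirit, testing means predicting at points between
# training (strain, flow-stress) pairs and checking the curve shape.
print(model.predict(X[:3]))
```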

Language: MATLAB · Stargazers: 1 · Issues: 0

Smart-Budget-Analysis-of-Uttar-Pradesh-And-Telangana-

Budget analyses of the pre-COVID and COVID timelines were done to suggest to the U.P. and Telangana governments their next line of action in the areas where a more well-defined plan is needed.

Stargazers: 1 · Issues: 0

Trading-Strategy-II-F-Score

The F-score trading strategy is based on the premise of taking into account several accounting parameters such as ROA, accruals and CFO. In total there are nine points to score (the Piotroski F-score).
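
A minimal sketch of scoring a few of the nine signals, assuming a DataFrame of per-firm fundamentals with the (hypothetical) column names below; the repo's actual inputs may differ:

```python
import pandas as pd

df = pd.DataFrame({
    "roa": [0.06, -0.02], "roa_prev": [0.04, 0.01],
    "cfo": [0.08, 0.01], "accruals": [-0.02, 0.03],   # accruals = ROA - CFO
})

score = (
    (df["roa"] > 0).astype(int)                 # positive return on assets
    + (df["cfo"] > 0).astype(int)               # positive operating cash flow
    + (df["roa"] > df["roa_prev"]).astype(int)  # improving ROA year over year
    + (df["cfo"] > df["roa"]).astype(int)       # CFO exceeds ROA (low accruals)
)
print(score)  # higher score => stronger fundamentals; go long high-F firms
```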

Language: Jupyter Notebook · Stargazers: 1 · Issues: 1

Beating-Analysts-Predictions-Using-Machine-Learning-Algorithms-to-predict-EPS-of-US-Firms

Task: to develop a model that predicts the earnings per share (EPS) of firms using macroeconomic fundamentals, and to test its predictive power against analysts' predictions.
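
A hedged sketch of that evaluation on synthetic data: fit a regressor on macro features and compare its error with analyst forecasts. The features, data and model choice are illustrative, not the repo's:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 4))                        # e.g. GDP growth, CPI, rates, FX
eps = 2 + X @ np.array([0.5, -0.3, 0.2, 0.1]) + rng.normal(0, 0.2, n)
analyst = eps + rng.normal(0, 0.4, n)              # noisy stand-in for consensus

X_tr, X_te, y_tr, y_te, a_tr, a_te = train_test_split(X, eps, analyst, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
print("model MAE:  ", mean_absolute_error(y_te, model.predict(X_te)))
print("analyst MAE:", mean_absolute_error(y_te, a_te))  # the comparison of interest
```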

Language: Python · Stargazers: 0 · Issues: 1

Group-Loans-Lending-Using-Machine-Learning

Diversification is known to be important in lending, but little is known about the relative performance of different group compositions in repaying loans and whether they default, especially when these considerations conflict. Using the group-loan structure in India, we find that in an economic setting dominated by information asymmetry, heterogeneity in groups leads to better loan-repayment performance than homogeneity. Using machine-learning techniques such as XGBoost and Random Forest, the paper shows that social ties increase the risk of default.
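
A minimal sketch of the default-prediction setup with XGBoost; the features (a group-heterogeneity index and a social-ties flag) and the synthetic data are illustrative stand-ins for the paper's variables:

```python
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 1000
heterogeneity = rng.uniform(0, 1, n)   # dispersion of member attributes
social_ties = rng.integers(0, 2, n)    # 1 = close ties within the group
# Toy generating process echoing the finding: heterogeneity lowers default
# risk, social ties raise it.
default = (rng.uniform(size=n) < 0.3 - 0.15 * heterogeneity
           + 0.1 * social_ties).astype(int)

X = np.column_stack([heterogeneity, social_ties])
X_tr, X_te, y_tr, y_te = train_test_split(X, default, random_state=0)
clf = XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
clf.fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```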

Language: Python · Stargazers: 0 · Issues: 1

Modelling-of-Impact-of-COVID-19-on-Economical-sectors-at-the-granular-level-in-India-

Analysis of the impact of COVID-19 on the economy at the granular ward level of each district in India.

Language: Python · Stargazers: 0 · Issues: 0

Ai-Sudoku-Solver

Developed a CNN model that solves a Sudoku, be it a softcopy or a handmade puzzle on a sheet.
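
Once the CNN has read the 81 digits off the image, solving is a standard backtracking search; this sketch covers only that solving step (the recognition model is not shown):

```python
def valid(board, r, c, v):
    # Value v may not repeat in row r, column c, or the enclosing 3x3 box.
    if v in board[r] or v in (board[i][c] for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)   # top-left corner of the 3x3 box
    return all(board[br + i][bc + j] != v for i in range(3) for j in range(3))

def solve(board):
    # board: 9x9 list of lists with 0 marking an empty cell; solved in place.
    for r in range(9):
        for c in range(9):
            if board[r][c] == 0:
                for v in range(1, 10):
                    if valid(board, r, c, v):
                        board[r][c] = v
                        if solve(board):
                            return True
                        board[r][c] = 0   # undo and try the next value
                return False              # no value fits: backtrack
    return True                           # no empty cell left: solved
```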

Language: Jupyter Notebook · Stargazers: 0 · Issues: 0

Creating-Mean-and-Median-Denoiser-of-Pixels-of-an-Image

Assignment: image/video denoising. Some images are provided as input.
Part 1: Corrupt the images by randomly choosing some pixels and replacing their values with random/junk values. [3 marks]
Part 2: Display and save the noisy images. [1 mark]
Part 3: Read in the saved noisy images and identify the noisy pixels by comparing them with neighbouring pixels. [2 marks]
Part 4: Replace the "noisy pixels" identified in Part 3 with the mean and the median of the neighbouring pixels, considering different "neighbourhoods". Display the "denoised" images. A sketch of Parts 1 and 4 follows below.
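
A sketch of Parts 1 and 4 on a stand-in image: inject random noise, then repair the corrupted pixels with the median of a 3×3 neighbourhood. Scipy's median filter stands in for a hand-rolled loop, and the noise mask is known here only because we injected it ourselves (Part 3 would estimate it):

```python
import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)  # stand-in image

noisy = img.copy()
mask = rng.uniform(size=img.shape) < 0.05           # corrupt ~5% of pixels
noisy[mask] = rng.integers(0, 256, size=mask.sum()) # junk values (Part 1)

med = median_filter(noisy, size=3)                  # 3x3 neighbourhood median
denoised = np.where(mask, med, noisy)               # replace only noisy pixels
print("mean abs error:", np.abs(denoised.astype(int) - img.astype(int)).mean())
```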

Language: Jupyter Notebook · Stargazers: 0 · Issues: 1

Crypto-Movement-Prediction-Using-RNN

Using the cryptocurrency's fundamentals, tried to predict its price movement and hence the trader's stance likely to follow from that situation.
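
A minimal sketch of a direction (up/down) classifier with a simple RNN on synthetic returns; the features, window length and architecture are assumptions, not the notebook's:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

rng = np.random.default_rng(0)
returns = rng.normal(0, 0.02, 600)                  # stand-in daily returns
window = 20
X = np.stack([returns[i:i + window] for i in range(len(returns) - window - 1)])
y = (returns[window + 1:] > 0).astype(int)          # 1 = next-day move is up

model = Sequential([
    SimpleRNN(32, input_shape=(window, 1)),
    Dense(1, activation="sigmoid"),                 # probability of an up move
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X[..., None], y, epochs=5, batch_size=32, validation_split=0.2)
```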

Language: Jupyter Notebook · Stargazers: 0 · Issues: 0

Daily-Rainfall-Analysis

Two attached files contain daily rainfall data over India for 2010 and 2011. Both contain a 357×122 matrix (XR1 and XR) and a binary vector (ZR1 and ZR). The matrices contain rainfall amounts at 357 locations over India on each day of the monsoon seasons of 2010 and 2011 (122 days, 1 June to 30 September). ZR1 and ZR are binary vectors classifying every day as "rainy" (1) or "non-rainy" (0) based on the rainfall across the landmass.
1) Read the .mat files in Python and access the variables.
2) Use a linear regression model to predict the rainfall XR(s,t) at any location s on day t, using as predictors the rainfall at all other locations on the same day, and the rainfall at the same location on the previous 2 days [XR(1,t), ..., XR(s-1,t), XR(s+1,t), ..., XR(357,t), XR(s,t-1), XR(s,t-2)]. Use 2010 data for training. Build such a model for s=42 (Mumbai), s=158 (Delhi) and s=299 (Kharagpur). [3 marks]
3) Use the same model to predict the rainfall at these 3 locations on each day of 2011, using values in XR as predictors. Compare the results with the true values and compute the error for the 3 locations separately. [1 mark]
4) Repeat the process using LASSO linear regression. Using the coefficients, identify the top 5 predictors for each of the 3 locations. [2 marks]
5) Use a decision tree on 2010 data to classify each day as 1 or 0 (as given in ZR1). For each day, use the 357-dimensional rainfall vector as the feature vector. Report the 10 most discriminative features (i.e. locations). [3 marks]
6) Use this decision tree to classify each day of 2011 as 1 or 0. Report the accuracy by comparing with ZR. A sketch of steps 1–2 follows below.
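
A sketch of steps 1–2 for one location, assuming the .mat file name and that it exposes XR1 as the 357×122 array described; the lag features follow the bracketed predictor list above:

```python
import numpy as np
from scipy.io import loadmat
from sklearn.linear_model import LinearRegression

XR1 = loadmat("rainfall_2010.mat")["XR1"]   # file/variable names assumed
s = 41                                      # Mumbai: s=42 in the text, zero-indexed here

rows, y = [], []
for t in range(2, XR1.shape[1]):            # skip first 2 days (need 2 lags)
    others = np.delete(XR1[:, t], s)        # all other locations, same day
    lags = [XR1[s, t - 1], XR1[s, t - 2]]   # same location, previous 2 days
    rows.append(np.concatenate([others, lags]))
    y.append(XR1[s, t])

Xm, ym = np.array(rows), np.array(y)
model = LinearRegression().fit(Xm, ym)
print("in-sample R^2:", model.score(Xm, ym))
```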

Language: Jupyter Notebook · Stargazers: 0 · Issues: 1

Does-high-risk-really-mean-high-returns-

High risk does not imply higher returns: a risk-aversion phenomenon starts to dominate the risk-return behaviour of individuals, which leads to a downward-sloping risk-return curve once the general risk appetite of individuals passes a certain threshold.

Language: Jupyter Notebook · Stargazers: 0 · Issues: 0

Hand-gesture-Classification-using-SVM

There is a dataset of sign-language hand gestures, with 10 classes and 3000 RGB images per class. Perform the following steps:
1. Convert the images into binary images. From each class in the dataset, use 70% for training and 30% for testing.
2. Extract features from the images and store them in a CSV file (the features are your choice, e.g. binary pixel vectors, total number of white pixels, local binary patterns). Represent each image using such features. [2 marks]
3. Use the features for classification with an SVM (default settings). Print the classification report. [3 marks]
4. Apply grid search for hyper-parameter tuning (e.g. kernel, C, gamma). [3 marks]
5. Report the model with the best accuracy. [2 marks]
If a memory error occurs, rescale the images as required. A sketch of steps 1–4 follows below.
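
A sketch of steps 1–4 on stand-in data: binarize, extract a simple white-pixel-count feature, then SVM with a grid search. The synthetic images and feature choice are illustrative, so the printed report is meaningless until real data is substituted:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
imgs = rng.integers(0, 256, size=(300, 32, 32))   # stand-in gesture images
labels = rng.integers(0, 10, size=300)            # 10 gesture classes

binary = (imgs > 128).astype(np.uint8)            # step 1: binarize
# step 2: white-pixel counts per row and per column as a cheap feature
feats = np.concatenate([binary.sum(axis=1), binary.sum(axis=2)], axis=1)

X_tr, X_te, y_tr, y_te = train_test_split(feats, labels, test_size=0.3, random_state=0)
grid = GridSearchCV(SVC(),                        # step 4: hyper-parameter search
                    {"kernel": ["rbf", "linear"], "C": [0.1, 1, 10],
                     "gamma": ["scale", 0.01]}, cv=3)
grid.fit(X_tr, y_tr)
print(grid.best_params_)
print(classification_report(y_te, grid.predict(X_te)))   # steps 3 and 5
```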

Language: Jupyter Notebook · Stargazers: 0 · Issues: 1

Linear-and-Sparse-Regression-

Assignment: Linear and Sparse Regression. Consider the attached dataset about advertising and sales. The attributes denote the investments in advertising on TV, radio etc., and the target variable is the total sales; the aim is to predict sales from the advertising investments.
1) Randomly divide the dataset into training (75%) and testing (25%) subsets. [1 mark]
2) Using linear regression, fit a model to predict sales from investments using your own formula. Compare the coefficients with those found by the Python library function. [3 marks]
3) Compute the mean squared error on the testing set. [1 mark]
4) Using ridge regression with different values of lambda (0.5, 1, 5, 10, 50, 100), plot the coefficients against each other, and compare the test-set mean squared errors. [3 marks]
5) Use the library function for LASSO regression to find which of the 3 features is most important, i.e. whose coefficient is furthest from 0. A sketch of step 2 follows below.
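
A sketch of step 2 on stand-in advertising data: the normal-equation fit compared against scikit-learn's coefficients, which should match closely:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 3))                  # TV, radio, newspaper spend
y = 3 + X @ np.array([0.05, 0.2, 0.01]) + rng.normal(0, 1, 200)

Xb = np.column_stack([np.ones(len(X)), X])              # add intercept column
beta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)             # (X'X)^-1 X'y, "own formula"

lib = LinearRegression().fit(X, y)
print("own formula:", beta)
print("sklearn:    ", np.r_[lib.intercept_, lib.coef_])
```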

Language: Jupyter Notebook · Stargazers: 0 · Issues: 1

Market-Basket-Analysis-Problem-but-with-Two-suggestions

In this project I had to develop a model that suggests two appropriate items to buy alongside a particular purchased item. I used the Apriori algorithm for this task.
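
A minimal sketch of mining rules with two-item consequents using mlxtend; the library choice, baskets and thresholds are assumptions, not necessarily the repo's:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

baskets = [["bread", "butter", "jam"], ["bread", "butter"], ["bread", "jam"],
           ["milk", "bread", "butter", "jam"], ["milk", "butter"]]

te = TransactionEncoder()
df = pd.DataFrame(te.fit(baskets).transform(baskets), columns=te.columns_)
itemsets = apriori(df, min_support=0.3, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.6)

# Keep rules that suggest TWO items to buy alongside ONE purchased item
two_item = rules[(rules["antecedents"].apply(len) == 1)
                 & (rules["consequents"].apply(len) == 2)]
print(two_item[["antecedents", "consequents", "confidence"]])
```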

Language: Jupyter Notebook · Stargazers: 0 · Issues: 0

Prediction-of-Forest-Fires-Smoke-Start-of-Fire-and-No-Fire-using-Transfer-Learned-Inception-V-3

There is presently a huge need to predict forest fires so they can be curbed as soon as possible, before they wreak the havoc they have caused several times between 2019 and 2021 (and this pattern doesn't seem to be ending any time soon). With this in mind, I built a classifier using the Inception V3 model to predict the possibility of a forest fire or its start (smoke). Starting from a dataset of 800 images of forest fires, smoke and no fire, I augmented it into a dataset of 5,794 images, then trained and validated the model on it. The observed accuracy was 88.85 percent in training and 81.63 percent in validation. Pretty neat!
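
A minimal transfer-learning sketch matching the description: a frozen InceptionV3 base with a new 3-class head (fire / smoke / no fire). The head architecture and input pipeline are assumptions:

```python
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras import layers, models

base = InceptionV3(weights="imagenet", include_top=False, input_shape=(299, 299, 3))
base.trainable = False                        # keep the ImageNet features frozen

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(3, activation="softmax"),    # fire / smoke / no fire
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # datasets assumed
```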

Language: Jupyter Notebook · Stargazers: 0 · Issues: 0

Text-Analysis-with-Naive-Bayes

1) Sentence classification: Consider the files traindata.csv and testdata.csv. In these files, each row contains a sentence belonging to one of 4 categories (science, sports, business, covid crisis). Learn a Naive Bayes classifier to predict the category of each sentence based on the words in it (neglecting stop words). Use the training set to estimate the prior distribution over the class labels and the class-conditional probabilities, i.e. the probability of each word occurring in a sentence with a particular class label. For each test sentence, the output should be the posterior distribution over the labels. [Trick: never set p(w|Y=k)=0 for any word w and label k, even if word w never appears in any sentence with label k. Assign a small probability like 0.01, and adjust the probabilities of the other words so that you get a proper conditional distribution.]
i) Construct the vocabulary without stop words. [2 marks]
ii) Calculate the prior distribution of the labels. [1 mark]
iii) Calculate the class-conditional probabilities of each word in the vocabulary, for each topic. [4 marks]
iv) For each test sentence, create the posterior distribution over the labels. [3 marks]
2) Sentence completion: Consider the datasets "40.csv" and "10.csv". In each sentence of "10.csv", the last word is not provided; the task is to predict it from the remaining words in the sentence. Build the vocabulary from "40.csv" (excluding stop words) and assume the missing words are part of this vocabulary. Treat this as a classification problem in which each word in the vocabulary is a class label, and use the Naive Bayes classifier to make probabilistic estimates of the missing words.
i) Create the vocabulary without stop words. [2 marks]
ii) Estimate the prior probabilities of all "labels", i.e. words in the vocabulary. [3 marks]
iii) Estimate the class-conditional probabilities of all words. [3 marks]
iv) For each test sentence, calculate the most likely word in the missing position, along with its probability. A sketch of the part-1 classifier follows below.
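
A hedged sketch of the part-1 classifier, including the floor-probability trick described above (0.01 for unseen word/label pairs, then renormalising); the tiny corpus is illustrative:

```python
from collections import Counter, defaultdict
import math

def train(sentences, labels, stopwords=frozenset()):
    vocab = {w for s in sentences for w in s.lower().split() if w not in stopwords}
    prior = Counter(labels)
    counts = defaultdict(Counter)
    for s, y in zip(sentences, labels):
        counts[y].update(w for w in s.lower().split() if w in vocab)
    cond = {}
    for y in prior:
        total = sum(counts[y].values())
        p = {w: max(counts[y][w] / total, 0.01) for w in vocab}  # floor at 0.01
        z = sum(p.values())
        cond[y] = {w: v / z for w, v in p.items()}               # renormalise
    n = len(labels)
    return vocab, {y: c / n for y, c in prior.items()}, cond

def posterior(sentence, vocab, prior, cond):
    # Log-space Naive Bayes score per label, then normalised to a posterior.
    scores = {y: math.log(prior[y]) + sum(math.log(cond[y][w])
              for w in sentence.lower().split() if w in vocab) for y in prior}
    m = max(scores.values())
    exp = {y: math.exp(v - m) for y, v in scores.items()}
    z = sum(exp.values())
    return {y: v / z for y, v in exp.items()}

sents = ["the match ended in a draw", "markets rallied on earnings"]
labs = ["sports", "business"]
vocab, prior, cond = train(sents, labs, stopwords=frozenset({"the", "in", "on", "a"}))
print(posterior("earnings beat the markets", vocab, prior, cond))
```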

Language: Jupyter Notebook · Stargazers: 0 · Issues: 1

Trading-Algorithms-I-Accruals

The accruals trading strategy is based on the simple premise that firms whose reported earnings are driven by accruals rather than cash flows tend to underperform, so the strategy favours firms with low accruals.
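
A minimal sketch of a Sloan-style accruals screen on toy data: compute the accruals component of earnings, scale by assets, and favour the low-accruals names. The column names and the portfolio rule are illustrative, not the repo's exact method:

```python
import pandas as pd

df = pd.DataFrame({
    "ticker": ["A", "B", "C", "D"],
    "net_income": [10.0, 8.0, 12.0, 5.0],
    "cfo": [12.0, 4.0, 11.0, 6.0],        # cash flow from operations
    "total_assets": [100.0, 80.0, 150.0, 60.0],
})

# Accruals = earnings not backed by cash, scaled by assets
df["accruals"] = (df["net_income"] - df["cfo"]) / df["total_assets"]
longs = df.nsmallest(2, "accruals")       # long the lowest-accruals names
print(longs[["ticker", "accruals"]])
```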

Language: Jupyter Notebook · Stargazers: 0 · Issues: 0