princepeak

Company: Drexel University

Location: Philadelphia

princepeak's repositories

covid-policy-tracker

Systematic dataset of Covid-19 policy, from Oxford University

License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

COVID19_Tweets_Dataset

COVID-19 Tweets Dataset

Language: HTML | Stargazers: 0 | Issues: 0 | Issues: 0

covid19_twitter

Covid-19 Twitter dataset for non-commercial research use and pre-processing scripts - under active development

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0

DSCI591-Fall21-RecommendationSystem

Workspace for the DSCI591 capstone project I

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0

lockdown-paper

The ongoing pandemic of coronavirus disease 2019 (COVID-19) is caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). This pathogenic virus is able to spread asymptomatically during its incubation stage through a vulnerable population. Given the state of healthcare, policymakers were urged to contain the spread of infection, minimize stress on health systems, and ensure public safety. The most effective tool at their disposal was to close non-essential businesses and issue stay-at-home orders.

In this paper we consider techniques to measure the effectiveness of stringency measures adopted by governments across the world. Analyzing the effectiveness of control measures such as lockdowns allows us to understand whether the decisions made were optimal and resulted in a reduced burden on the healthcare system. Specifically, we use a synthetic control to construct alternative scenarios and estimate what the effect on public health would have been if less stringent measures had been adopted.

We present analyses for the State of New York (United States), Italy, and the Indian capital city of Delhi, and show how lockdown measures have helped and what the counterfactual scenarios would have been in comparison to the current state of affairs. We show that by 26 June 2020 the number of deaths could have been 6 times higher in the State of New York, and 3 times higher in Italy.

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Jupyter Notebook | Stargazers: 0 | Issues: 1 | Issues: 0
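The synthetic-control idea described in the abstract can be sketched in a few lines: find a convex combination of untreated "donor" regions that tracks the treated region before the intervention, then use that same combination of donor trajectories as the counterfactual afterwards. The numbers below are purely illustrative toy data, not the paper's actual case counts; the paper's real donor pool and fitting procedure may differ.

```python
import numpy as np

# Hypothetical pre-intervention daily case counts for two donor regions
# (illustrative numbers only, not data from the paper).
donor_a = np.array([10.0, 20.0, 35.0, 55.0, 80.0])
donor_b = np.array([5.0, 12.0, 22.0, 40.0, 60.0])
# Treated region constructed here to exactly track a 0.3/0.7 donor blend.
treated = 0.3 * donor_a + 0.7 * donor_b

# With two donors the synthetic-control weight is a single scalar w in [0, 1];
# grid-search it to minimize pre-intervention mean squared error.
grid = np.linspace(0.0, 1.0, 101)
errors = [np.mean((treated - (w * donor_a + (1 - w) * donor_b)) ** 2)
          for w in grid]
w_star = float(grid[int(np.argmin(errors))])

# Post-intervention: the same donor blend serves as the counterfactual
# "what if less stringent measures had been adopted" trajectory.
donor_a_post = np.array([110.0, 150.0, 200.0])
donor_b_post = np.array([85.0, 120.0, 160.0])
counterfactual = w_star * donor_a_post + (1 - w_star) * donor_b_post
```

The gap between `counterfactual` and the treated region's observed post-intervention series is the estimated effect of the measures; with more donors the scalar grid search is replaced by a constrained optimization over a weight vector on the simplex.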

Twitter-Sentiment-Analysis---Analytics-Vidhya

Problem Statement

The objective of this task is to detect hate speech in tweets. For the sake of simplicity, we say a tweet contains hate speech if it has a racist or sexist sentiment associated with it. So the task is to classify racist or sexist tweets apart from other tweets. Formally, given a training sample of tweets and labels, where label '1' denotes that the tweet is racist/sexist and label '0' denotes that it is not, your objective is to predict the labels on the test dataset.

Motivation

Hate speech is an unfortunately common occurrence on the Internet. Social media sites like Facebook and Twitter often face the problem of identifying and censoring problematic posts while weighing the right to freedom of speech. The importance of detecting and moderating hate speech is evident from the strong connection between hate speech and actual hate crimes. Early identification of users promoting hate speech could enable outreach programs that attempt to prevent an escalation from speech to action. Sites such as Twitter and Facebook have been seeking to actively combat hate speech. Despite this, NLP research on hate speech has been very limited, primarily due to the lack of a general definition of hate speech, an analysis of its demographic influences, and an investigation of the most effective features.

Data

Our overall collection of tweets was split in the ratio of 65:35 into training and testing data. Of the testing data, 30% is public and the rest is private.

Data Files

train.csv - For training the models, we provide a labelled dataset of 31,962 tweets. The dataset is provided as a csv file, with each line storing a tweet id, its label, and the tweet. There is one (public) test file: test_tweets.csv - it contains only tweet ids and the tweet text, with each tweet on a new line.

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0
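A minimal baseline for the classification task described above is bag-of-words features plus logistic regression. The sketch below trains on a handful of made-up stand-in tweets (the real input would be the labelled train.csv described above, and the repository's actual model may be entirely different); it uses plain NumPy gradient descent rather than any particular library.

```python
import numpy as np

# Toy stand-ins for train.csv rows: (tweet text, label), 1 = racist/sexist.
# These sentences are invented placeholders, not real dataset content.
train = [
    ("i love sunny days and good coffee", 0),
    ("what a great community event today", 0),
    ("happy to see friends at the park", 0),
    ("hateful slur against group x", 1),
    ("group x people are inferior trash", 1),
    ("go back where you came from group x", 1),
]

# Bag-of-words vocabulary built from the training tweets.
vocab = sorted({w for text, _ in train for w in text.split()})
index = {w: i for i, w in enumerate(vocab)}

def vectorize(text):
    """Count occurrences of each known vocabulary word."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1.0
    return v

X = np.array([vectorize(t) for t, _ in train])
y = np.array([label for _, label in train], dtype=float)

# Logistic regression trained with batch gradient descent.
w = np.zeros(X.shape[1])
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
    grad = p - y                              # gradient of log loss w.r.t. logits
    w -= 0.5 * (X.T @ grad) / len(y)
    b -= 0.5 * grad.mean()

def predict(text):
    """Return 1 if the tweet is classified as racist/sexist, else 0."""
    p = 1.0 / (1.0 + np.exp(-(vectorize(text) @ w + b)))
    return int(p >= 0.5)
```

On the real 31,962-tweet training file one would typically swap in TF-IDF weighting and a regularized solver, and evaluate on a held-out split before predicting the test labels.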

YouTube-APIs-use

Retrieving data via YouTube APIs

Stargazers: 0 | Issues: 1 | Issues: 0
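Retrieval via the public YouTube Data API v3 boils down to building an authenticated GET request against an endpoint such as `videos`. The sketch below only constructs the request URL with the standard library; the video id and API key are placeholders, and the exact endpoints and parts this repository queries are not stated in the description.

```python
from urllib.parse import urlencode

# Public YouTube Data API v3 endpoint for video resources.
BASE = "https://www.googleapis.com/youtube/v3/videos"

params = {
    "part": "snippet,statistics",  # which resource parts to return
    "id": "VIDEO_ID",              # placeholder video id
    "key": "YOUR_API_KEY",         # placeholder API key
}
url = BASE + "?" + urlencode(params)
```

Sending `url` with any HTTP client (e.g. `requests.get(url).json()`) returns a JSON body whose `items` list holds the requested video resources.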

youtube-transcript-api

This is a Python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, and it does not require a headless browser like other Selenium-based solutions do!

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0