Lina (Lina-Mo)







Location: San Jose, CA


Lina's repositories


A/B tests are very commonly performed by data analysts and data scientists. It is important that you get some practice working with the difficulties of these tests. For this project, you will be working to understand the results of an A/B test run by an e-commerce website. Your goal is to work through this notebook to help the company understand if they should implement the new page, keep the old page, or perhaps run the experiment longer to make their decision.
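One common way to compare the conversion rates of an old and a new page is a two-proportion z-test. The description above does not say which method the notebook uses, so the following is only a minimal sketch under that assumption. The function name `two_proportion_z_test` and the counts in the usage lines are hypothetical, not taken from the project data.

```python
import math


def two_proportion_z_test(conv_old, n_old, conv_new, n_new):
    """Return (z, one-sided p-value) for H1: the new page converts better.

    conv_* are conversion counts and n_* are visitor counts (illustrative names).
    """
    p_old = conv_old / n_old
    p_new = conv_new / n_new
    # Pooled conversion rate under the null hypothesis of no difference.
    p_pool = (conv_old + conv_new) / (n_old + n_new)
    # Standard error of the difference in proportions under the null.
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_old + 1 / n_new))
    z = (p_new - p_old) / se
    # Upper-tail probability of the standard normal: P(Z > z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical counts, for illustration only.
z_stat, p_val = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z_stat:.2f}, one-sided p = {p_val:.4f}")
```

A small p-value would favor the new page, a large one would favor keeping the old page, and a borderline one might suggest running the experiment longer.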



In this project I will analyze real-life data from the New York Stock Exchange. The analysis draws on a subset of a large dataset provided by Kaggle that contains historical financial data for S&P 500 companies. The project includes calculating summary statistics, drawing inferences from those statistics, calculating business metrics, and using models to forecast future growth prospects for the companies. The goal is to perform an analysis and to create visual tools that communicate the results in an informative way. In particular, I will analyze the spread of gross profit for Industrial sector companies in the second year.



In this project we will clean the clinical trial data for the Auralin and Novodra insulin products. The goal is to assess the data, detect issues, and clean them. In the process we will discover both data quality and tidiness issues.



This template notebook is for data wrangling, with sections marked off for each step of the process.



In this project, you will analyze local and global temperature data and compare the temperature trends where you live to overall global temperature trends.



In this analysis, my first goal was to identify when most trips are taken in terms of hour of the day, weekday, or month of the year. Second, I wanted to know who the Ford GoBike users are by age, gender, and user type. Finally, I looked at what the bike trips usually look like in terms of trip duration, distance, and speed.



With your knowledge of HTML file structure, we're going to use Beautiful Soup to extract our desired Audience Score metric and number of audience ratings, along with the movie title (so we have something to merge the datasets on later), for each HTML file, then save them in a pandas DataFrame.



In this project we will analyze a data set from a filmmaker's perspective, trying to answer the main questions about how to create a successful movie. We will focus not on popularity or voting but on profitability. The adjusted columns 'budget_adj' and 'revenue_adj' will be used to calculate a new column, 'profit_adj', and to analyze the most profitable genres, production companies, directors, budgets, and release months. Because some columns, such as 'genres', 'director', and 'production_companies', contain multiple values separated by pipe (|) characters, we will split them and use the first value as the main one. For example, for the column 'director' we will take the first director on the list and analyze just the 'first_director' category. We will do the same with the columns 'production_companies' and 'genres': we will keep only the first company and genre in each list and analyze them as the 'main_production_company' and 'main_genre' of each movie.
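The column handling described above can be sketched in plain Python as follows. This is only an illustration, not the notebook's actual code: it assumes profit is revenue minus budget, that the pipe-separated fields are non-empty strings, and it uses a made-up sample row. The helper names `first_value` and `add_derived_columns` are hypothetical.

```python
def first_value(field, sep="|"):
    """Return the first entry of a pipe-separated string (hypothetical helper)."""
    return field.split(sep)[0].strip()


def add_derived_columns(movie):
    """Return a copy of one movie row with the derived columns added."""
    row = dict(movie)
    # Assumption: adjusted profit is adjusted revenue minus adjusted budget.
    row["profit_adj"] = row["revenue_adj"] - row["budget_adj"]
    # Keep only the first listed value as the "main" one.
    row["main_genre"] = first_value(row["genres"])
    row["first_director"] = first_value(row["director"])
    row["main_production_company"] = first_value(row["production_companies"])
    return row


# Made-up sample row, for illustration only.
sample = {
    "budget_adj": 100.0,
    "revenue_adj": 250.0,
    "genres": "Action|Adventure",
    "director": "Director A|Director B",
    "production_companies": "Studio X|Studio Y",
}
result = add_derived_columns(sample)
print(result["profit_adj"], result["main_genre"], result["first_director"])
```

With pandas, the same idea is usually expressed with the `.str.split("|")` accessor followed by `.str[0]` on each column.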



In this project we will perform the three core steps of the data wrangling process: gathering, assessing, and cleaning. We will use a dataset of 19,000 online job posts from 2004 to 2015 that were posted through an Armenian human resources portal. The dataset is hosted on Kaggle. It is dirty and messy, which makes it a good candidate for practicing all three steps.



In this project, I will query the Chinook Database. The Chinook Database holds information about a music store. For this project, I will be assisting the Chinook team with understanding the media in their store, their customers and employees, and their invoice information.



In this project I wrangled and analyzed the tweet archive of Twitter user @dog_rates, along with an image prediction file and additional data gathered via the Twitter API. The goal was to wrangle Twitter data to create interesting and trustworthy analyses and visualizations.



Wine Quality Data Set from UCI Machine Learning Lab. There are two datasets that provide information on samples of red and white variants of the Portuguese "Vinho Verde" wine. Each sample of wine was rated for quality by wine experts and examined with physicochemical tests. Due to privacy and logistic issues, only data on these physicochemical properties and quality ratings are available.

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0