911150 / Programmer_Portfolio


Hi 👋, I'm Noah Sebastian

An aspiring data enthusiast

  • 🌱 I’ve just graduated from the University of Melbourne with a Bachelor of Science, majoring in Data Science

  • 💬 Ask me about sports and Twitter

  • 📫 How to reach me: noahs@student.unimelb.edu.au

Connect with me:

noah-sebastian

Languages and Tools:

bash c csharp git java linux pandas photoshop python scikit_learn seaborn

Listen with me:

spotify-github-profile

Machine Learning Projects


An application and analysis of fare forecasting for New York City Yellow Cabs, comparing Gradient Boosted Trees against Random Forests

Research Goal: Forecast fare amounts for New York City Yellow Taxi Cabs using only the subset of features that are available before a journey is undertaken.

Excerpt from report:

...An aspect of the New York City (NYC) taxi service, and one that differs from the more modern on-demand vehicles, is its inability to present end-to-end fare prices and inform individuals before travel about the costs they will incur. This is an area where for-hire vehicles (FHVs) have a significant competitive advantage over the taxi industry.

As such, this report will assume the perspective of a research and development company working on behalf of the New York City Taxi & Limousine Commission (NYCTLC) to provide insights and prospective integrations for the current NYCTLC system and, more specifically, will attempt to offer a method for forecasting trip fare amounts for varying circumstances around New York City.

For this task, a Gradient-Boosted-Tree Regressor is suggested, since it is capable of handling imperfect data sets, can generalize well and offers high flexibility. It is also currently one of the more mainstream algorithms adopted by the data science community. It will be contrasted against a Random Forest Regressor, with the intention of exploring their differences and effectiveness...
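The comparison described in the excerpt could be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the code from the report: the data is synthetic, and the feature names (`distance`, `hour`, `passengers`) and the fare rule are assumptions made up to stand in for pre-trip features.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for pre-trip features (hypothetical, not the TLC data):
# estimated distance in miles, pickup hour, passenger count.
rng = np.random.default_rng(0)
n = 2000
distance = rng.uniform(0.5, 20.0, n)
hour = rng.integers(0, 24, n)
passengers = rng.integers(1, 5, n)

# Made-up fare rule with noise, used only to give the models something to learn.
peak = (hour >= 16) & (hour < 20)
fare = 3.0 + 2.5 * distance + 1.0 * peak + rng.normal(0.0, 1.5, n)

X = np.column_stack([distance, hour, passengers])
X_train, X_test, y_train, y_test = train_test_split(
    X, fare, test_size=0.25, random_state=0
)

models = {
    "gradient_boosted_trees": GradientBoostingRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model on the same split and record its test RMSE.
rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

for name, score in rmse.items():
    print(f"{name}: RMSE = {score:.2f}")
```

Both models are left at near-default settings here; a real comparison would tune hyperparameters for each and evaluate on held-out trip data.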


Tweet Sentiment Classification on Imbalanced Datasets using Ensemble & Stacking Methods

Research Goal: To critically assess the effectiveness of various Machine Learning classification algorithms and present empirical evidence and discussion regarding the efficacy of ensemble methods in this domain and how they were applied in this setting.

Excerpt from report:

...Twitter is a popular social media service in which users post to their respective audiences (followers) with what are called “tweets” (Nádia F.F. da Silva, 2014). These tweets can be anything the user deems ‘post-worthy’, with the only significant constraint being their limitation to 140 characters (Nádia F.F. da Silva, 2014). Within these tweets, users can express sentiment respective to their content. It is this sentiment and its prediction that will be the focus of this report.

The aim of this report is to critically assess the effectiveness of various Machine Learning classification algorithms and present empirical evidence and discussion regarding the effectiveness of ensemble methods in this domain and how they were applied in this setting...
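As a rough illustration of stacking on an imbalanced classification problem, the sketch below uses scikit-learn's `StackingClassifier`. It is not the report's pipeline: the data comes from `make_classification` as a stand-in for vectorised tweets, and the choice of base learners, class weighting, and macro-averaged F1 as the metric are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic, imbalanced two-class data (about 90% / 10%), standing in for
# tweet feature vectors. Not real Twitter data.
X, y = make_classification(
    n_samples=2000,
    n_features=20,
    n_informative=8,
    weights=[0.9, 0.1],
    random_state=0,
)
# Stratify so the minority class keeps the same share in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners; class_weight="balanced" reweights the minority class.
base_learners = [
    ("logreg", LogisticRegression(class_weight="balanced", max_iter=1000)),
    ("forest", RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0)),
    ("nb", GaussianNB()),
]

# The final estimator is trained on cross-validated base-learner predictions.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(class_weight="balanced", max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

# Macro-averaged F1 weights both classes equally, unlike plain accuracy.
macro_f1 = f1_score(y_test, stack.predict(X_test), average="macro")
print(f"macro F1 = {macro_f1:.3f}")
```

Accuracy alone can look strong on imbalanced data simply by predicting the majority class, which is why a class-balanced metric is used in this sketch.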


Generic Buy Now, Pay Later Project

Abstract:

In a nutshell, the Buy Now, Pay Later firm offers its services to its partnered merchants, which allows their customers to pay for items in five installments instead of all upfront. The firm itself gets a small commission for every transaction the customer makes with the buy now, pay later feature. However, due to limited resources, the firm is only able to onboard a handful of merchants to partner with each year.

Hence, the objective of this project is to rank those merchants based on how well they perform and how consistent their sales are, such that the profits for the firm are optimized. In addition, a summary notebook is included which summarises the overall approach taken, the issues and obstacles that the team ran into, the limitations and assumptions that were made, as well as recommendations to the client and stakeholders.
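One simple way to combine performance and consistency into a single ranking is sketched below. This is not the project's actual scoring method: the merchant names and revenue figures are made up, and the score (average revenue discounted by the coefficient of variation) is just one assumed heuristic.

```python
from statistics import mean, pstdev

# Hypothetical monthly revenue per merchant (made-up numbers for illustration).
monthly_revenue = {
    "merchant_a": [100.0, 110.0, 90.0, 100.0],
    "merchant_b": [300.0, 20.0, 250.0, 30.0],
    "merchant_c": [200.0, 210.0, 190.0, 200.0],
}


def score(revenue):
    """Average revenue, discounted by how volatile the revenue is."""
    avg = mean(revenue)
    # Coefficient of variation: standard deviation relative to the mean.
    cv = pstdev(revenue) / avg if avg > 0 else float("inf")
    return avg / (1.0 + cv)


# Highest score first.
ranking = sorted(
    monthly_revenue,
    key=lambda merchant: score(monthly_revenue[merchant]),
    reverse=True,
)
print(ranking)  # ['merchant_c', 'merchant_a', 'merchant_b']
```

In this toy data, `merchant_b` has a higher average revenue than `merchant_a` but ranks below it because its sales swing widely from month to month.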

About

License: MIT License


Languages

  • Jupyter Notebook 97.2%

  • Python 1.5%

  • Java 1.3%

  • Shell 0.0%