junweiluo / SQL

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Unit 7 Homework Assignment: Looking for Suspicious Transactions

Credit card fraudster

Credit Card Fraudster by Richard Patterson | Creative Commons Licensed

Background

Fraud is prevalent these days, whether you are a small taco shop or a large international business. While there are emerging technologies that employ machine learning and artificial intelligence to detect fraud, many instances of fraud detection still require strong data analytics to find abnormal charges.

In this homework assignment, you will apply your new SQL skills to analyze historical credit card transactions and consumption patterns in order to identify possible fraudulent transactions.

You are asked to accomplish three main tasks:

  1. Data Modeling: Define a database model to store the credit card transactions data and create a new PostgreSQL database using your model.

  2. Data Engineering: Create a database schema on PostgreSQL and populate your database from the CSV files provided.

  3. Data Analysis: Analyze the data to identify possible fraudulent transactions.


Files

Instructions

Data Modeling

Create an entity relationship diagram (ERD) by inspecting the provided CSV files.

Part of the challenge here is to figure out how many tables you should create, as well as what kind of relationships you need to define among the tables.

Feel free to discuss your database model design ideas with your classmates. You can use a tool like Quick Database Diagrams to create your model.

Data Engineering

Using your database model as a blueprint, create a database schema for each of your tables and relationships. Remember to specify data types, primary keys, foreign keys, and any other constraints you defined.

After creating the database schema, import the data from the corresponding CSV files.

Data Analysis

It's time to identify fraudulent transactions. In this part of the homework assignment, you will analyze the data and then create a report to present your findings. You can use a Jupyter Notebook, a markdown file, or a word processor. Your report should answer the following questions:

  • How can you isolate (or group) the transactions of each cardholder?

  • Consider the time period 7:00 a.m. to 9:00 a.m.

    • What are the top 100 highest transactions during this time period?

    • Do you see any fraudulent or anomalous transactions?

    • If you answered yes to the previous question, explain why you think there might be fraudulent transactions during this time frame.

  • Some fraudsters hack a credit card by making several small payments (generally less than $2.00), which are typically ignored by cardholders. Count the transactions that are less than $2.00 per cardholder. Is there any evidence to suggest that a credit card has been hacked? Explain your rationale.

  • What are the top 5 merchants prone to being hacked using small transactions?

  • Once you have a query that can be reused, create a view for each of the previous queries.

Create a report for fraudulent transactions of some top customers of the firm. To achieve this task, perform a visual data analysis of fraudulent transactions using Pandas, Plotly Express, hvPlot, and SQLAlchemy to create the visualizations.

  • Verify if there are any fraudulent transactions in the history of two of the most important customers of the firm. For privacy reasons, you only know that their cardholders' IDs are 18 and 2.

    • Using hvPlot, create a line plot representing the time series of transactions over the course of the year for each cardholder. In order to compare the patterns of both cardholders, create a line plot containing both lines.

    • What difference do you observe between the consumption patterns? Does the difference suggest a fraudulent transaction? Explain your rationale.

  • The CEO of the biggest customer of the firm suspects that someone has used her corporate credit card without authorization in the first quarter of 2018 to pay quite expensive restaurant bills. You are asked to find any anomalous transactions during that period.

    • Using Plotly Express, create a series of six box plots, one for each month, in order to identify how many outliers per month for cardholder ID 25.

    • Do you notice any anomalies? Describe your observations and conclusions.

Challenge

Another approach to identify fraudulent transactions is to look for outliers in the data. Standard deviation or quartiles are often used to detect outliers.

Read the following articles on outliers detection, and then code a function using Python to identify anomalies for any cardholder.

Submission

  • Create an image file of your ERD.

  • Create a .sql file of your table schemata.

  • Create a .sql file of your queries.

  • Create a Jupyter Notebook for the visual data analysis and the challenge.

  • Create and upload a repository with the above files to GitHub and post a link in BootCamp Spot.

Hint

For comparing time and dates, take a look at the date/time functions and operators in the PostgreSQL documentation.


© 2019 Trilogy Education Services

About


Languages

Language:Jupyter Notebook 99.9%Language:TSQL 0.1%