minsapay / transaction-data-2019

Transaction Data of MinsaPay 2019

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

MinsaPay Anonymized Transaction Data 2019

MinsaPay is an electronic payment system designed for the KMLA Summer Festival (Minjok Festival). The MinsaPay team has decided to open the entire transaction data to help students studying data analysis and organizing future Minjok Festivals. Because financial data that can specify individual users are susceptible to data abusing, the MinsaPay team has anonymized the transaction data. Every other aspect of the data has not been modified.

Statistics

Content Data
Users 400
Transactions 2,900
Total payment amount Processed payment worth $4,000
Total transaction amount Transacted worth $14,400

The total payment amount only includes the payment made to purchase any given products. The total transaction amount consists of all kinds of transactions made through MinsaPay: subsidies, deposits, payments, and withdrawals.

Index and explanations

Index Explanation
transaction Indicates transaction number
user Indicates the anonymized user number
booth Indicates the anonymized booth number
timestamp Indicates the time of the transaction
type Indicates the transaction type
amount Indicates the amount of transaction
balance Indicates the user's balance after the transaction

There are 4 types of transactions

Type Explanation
open Opening the payment account. Opening the payment account. We provide ₩7,000 (about $7) of subsidy for senior students and ₩10,000 (about $10) of subsidy for teachers. Please check additional info.
charge Charging the payment account. The school council member will process the request and deposit the cash.
pay Paying the bills.
withdraw Withdrawing the prepaid money (usually after the festival). The school council member will process the request and deliver the cash.

Booths can calculate the sales statement afterward. Every profits other than the production cost will be used for student council operation fees.

Additional info

  • Accounts were opened and charged at the student council booth, which is booth#000.
  • Festival organizers have agreed not to charge our teachers. Therefore, teachers were charged more than ₩1,000,000 (about $1,000)
  • There were three students with charge errors.
    • user#298 charged ₩5,464,526 (about $4,550) at transaction 561. This was a human error made by the student council member entering random values in the input field. This was fixed 12 seconds later by the same member, at transaction 565.
    • user#295 charged ₩2,147,483,647 (about $1,787,440) at transaction 557. This was a human error by the student council member tagging the Student RFID in the charge amount field. Since the RFID value was greater than the max value that int can represent, it was converted to the max value that an int can represent. The student council has later recognized this problem and fixed at transaction 2624 by manually calculating the student’s actual balance.
    • user#284 charged ₩2,147,483,647 (about $1,787,440) at transaction 2844. This was the same type of human error like the one of user#295 and was fixed 11 seconds later at transaction 2845.
  • At the day of the festival, the MinsaPay server went down for 13 minutes 15 seconds from 10:17:55 AM to 10:31:10 AM. This can be seen by the gap between transaction 1546 and 1547. The problem was due to the sudden surge of user request exceeding the free tier limit of the database. The problem was fixed soon, and the problem did not reoccur after purchasing a paid plan.
  • During the momentary server failure, each booths created their own billing spreadsheet. After the festival, these spreadsheets were collected and merged. Students payed their unpaid bills through the credit payment booth, which is booth#017.
  • A $0 payment is repetitively made in booth#001. It is because of the free pass given to any students designated by Booth#001. Any students with this free pass can get unlimited access to any product sold in booth#001.

anonymize.ipynb

import pandas as pd
Dataframe = pd.read_csv('raw.csv')
def anonymize(df, targetColumn):
    anon = {}
    id = 0
    for x in range(len(df)):
        user = df.loc[x, targetColumn]
        if user in anon:
            df.loc[x, targetColumn] = anon[user]
        else:
            if id < 10:
                unknown = "#00"+ str(id)
            elif id < 100:
                unknown = "#0" + str(id)
            else:
                unknown = "#"    + str(id)
            anon[user] = targetColumn + str(unknown)
            id += 1
            df.loc[x, targetColumn] = anon[user]
anonymize(Dataframe, 'user')
anonymize(Dataframe, 'booth')
Dataframe.to_csv("anonymized.csv", mode='w')

About

Transaction Data of MinsaPay 2019

License:MIT License


Languages

Language:Jupyter Notebook 100.0%