hkpeaks / peaks-consolidation

The Peaks Consolidation is equipped with state-of-the-art algorithms and data structures that support high-performance databending exercises. It specializes in management accounting and consolidation, with some special topics in machine learning and bioinformatics.

Home Page: https://www.linkedin.com/in/max01/recent-activity/all/

New Crossplatform App

Instant File Preview and Validation for Giant CSV File

Because a CSV file does not necessarily use a comma as its delimiter, this app detects the delimiter automatically, on the assumption that every row contains the same number of delimiters. The app divides the file into 100 partitions and validates the first row of each partition, so it writes 100 sample rows to disk and displays the first 20 rows on screen. A comment in the source code explains how to change the number of partitions from 100 to 1000.
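The detection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in Peaks.py: the candidate list and the function names detect_delimiter and partition_offsets are hypothetical, and the sketch ignores delimiters that appear inside quoted fields.

```python
# Hypothetical candidate set; the real app may consider other characters.
CANDIDATES = [",", ";", "\t", "|", "~"]


def detect_delimiter(sample_rows, candidates=CANDIDATES):
    """Return the candidate that occurs the same non-zero number of
    times in every sample row, or None if no candidate qualifies."""
    best, best_count = None, 0
    for delim in candidates:
        counts = {row.count(delim) for row in sample_rows}
        if len(counts) == 1:          # identical count in every row
            count = counts.pop()
            if count > best_count:    # prefer the most frequent candidate
                best, best_count = delim, count
    return best


def partition_offsets(file_size, partitions=100):
    """Byte offsets at which each of the equal-sized partitions starts."""
    return [file_size * i // partitions for i in range(partitions)]
```

For example, detect_delimiter(["a;b;c", "1;2;3", "4,5;6;7"]) picks ";" because the comma count differs between rows. A reader positioned at one of the offsets would normally skip forward to the next line break before taking the sample row.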

Download URL: https://github.com/hkpeaks/peaks-consolidation/tree/main/Documents/PreviewFile

  • main.go is a Golang version
  • main.rs is a Rust version
  • Peaks.py is a Python version

Demo video: https://lnkd.in/gCTFR9rh

New Query Statement for File, In-memory Table and Network Stream

Note: Prefixing a command with "." to mark it as a member of your defined function is optional. The first line defines the data extraction and the data load. There are 3 possible scenarios:

UserDefineFunctionName = from Extraction to Load

Or

UserDefineFunctionName = from Extraction, Extraction, Extraction to Load

Or

UserDefineFunctionName = from Extraction to Load, Load, Load

From the second line onward, you define the query and data transformation commands.
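To make the shape of a statement concrete, here is a minimal Python sketch that splits a definition line and a command line into their parts. It is not the Peaks parser; parse_definition and parse_command are hypothetical names, and the sketch assumes that file names contain neither commas nor the word "to" surrounded by spaces.

```python
import re


def parse_definition(line):
    """Split 'Name = from A, B to C, D' into (name, sources, targets)."""
    match = re.fullmatch(r"\s*(\w+)\s*=\s*from\s+(.+?)\s+to\s+(.+?)\s*", line)
    if match is None:
        raise ValueError(f"not a definition line: {line!r}")
    name, sources, targets = match.groups()

    def split(text):
        return [item.strip() for item in text.split(",")]

    return name, split(sources), split(targets)


def parse_command(line):
    """Split '.Command: arguments' into (command, arguments).
    The leading '.' is optional, as the note above states."""
    text = line.strip().lstrip(".")
    name, _, arguments = text.partition(":")
    return name.strip(), arguments.strip()
```

With this sketch, parse_definition("F = from A.csv, B.csv to C.csv") yields the name, a list of two sources and a list of one target, and parse_command(".View") yields a command with empty arguments.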

Examples:

ExpandFile = from Fact.csv to 1BillionRows.csv

.ExpandFactor: 123

JoinTable = from 1BillionRows.csv to Test1Results.csv

.Filter: Saleman(Mary,Peter,John)

.JoinTable: Product, Category => InnerJoin(Master.csv)

.AddColumn: Quantity, Unit_Price => Multiply(Amount)

.Filter: Amount(Float20000..29999)

.GroupBy: Saleman, Shop, Product => Count() Sum(Quantity) Sum(Amount)

.OrderBy: Saleman(A) Product(A) Date(D)

SplitFile = from Test1Results.csv to FolderLake

.CreateFolderLake: Shop

FilterFolder = from Outbox/FolderLake/S15/*.csv to Result-FilterFolderLake.csv

.Filter: Product(222..888) Style(=F)

ReadSample2View = from Outbox/Result-FilterFolderLake.csv to SampleTable

.ReadSample: StartPosition%(0) ByteLength(100000)

.View
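For readers who want to check their understanding of the JoinTable example, the sketch below reproduces three of its commands (Filter, AddColumn, GroupBy) on a tiny in-memory table in plain Python. It reflects one reading of what the commands mean, not how Peaks executes them; SAMPLE_ROWS and run_pipeline are hypothetical, and the sample data is made up for illustration.

```python
from collections import defaultdict

# Made-up rows that reuse the column names from the example above.
SAMPLE_ROWS = [
    {"Saleman": "Mary", "Shop": "S15", "Product": "222", "Quantity": 10, "Unit_Price": 2.5},
    {"Saleman": "Mary", "Shop": "S15", "Product": "222", "Quantity": 4, "Unit_Price": 2.5},
    {"Saleman": "Anna", "Shop": "S15", "Product": "222", "Quantity": 7, "Unit_Price": 1.0},
    {"Saleman": "John", "Shop": "S16", "Product": "333", "Quantity": 2, "Unit_Price": 8.0},
]


def run_pipeline(rows):
    # .Filter: Saleman(Mary,Peter,John)
    # Assumed meaning: keep rows whose Saleman is one of the listed values.
    wanted = {"Mary", "Peter", "John"}
    kept = [row for row in rows if row["Saleman"] in wanted]

    # .AddColumn: Quantity, Unit_Price => Multiply(Amount)
    with_amount = [
        {**row, "Amount": row["Quantity"] * row["Unit_Price"]} for row in kept
    ]

    # .GroupBy: Saleman, Shop, Product => Count() Sum(Quantity) Sum(Amount)
    groups = defaultdict(
        lambda: {"Count": 0, "Sum(Quantity)": 0, "Sum(Amount)": 0.0}
    )
    for row in with_amount:
        key = (row["Saleman"], row["Shop"], row["Product"])
        groups[key]["Count"] += 1
        groups[key]["Sum(Quantity)"] += row["Quantity"]
        groups[key]["Sum(Amount)"] += row["Amount"]
    return dict(groups)
```

Calling run_pipeline(SAMPLE_ROWS) drops the Anna row and returns one aggregate per (Saleman, Shop, Product) key.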

Command List

AddColumn{Column, Column => Math(NewColName)}

    where Math includes Add, Subtract, Multiply & Divide

BuildKeyValue{Column, Column ~ KeyValueTableName}

CurrentSetting{StreamMB(Number) Thread(Number)}

Distinct{Column, Column}

Filter{Column(CompareOperator Value) Column(CompareOperator Value)}

FilterUnmatch{Column(CompareOperator Value) Column(CompareOperator Value)}

    where CompareOperator includes >, <, >=, <=, =, != and Range, e.g. 100..200
          Values compare as integer or float, e.g. Float > Number, Float100..200
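The condition written inside Column(...) can be read as a small predicate. The Python sketch below builds one from the condition text. It is an illustration only: build_predicate is a hypothetical name, and the sketch assumes that a Float prefix forces float comparison, that other values compare as integers when they parse as integers and as text otherwise, and that a comma-separated list (as in the Saleman example) means membership.

```python
import operator

# Two-character symbols come first so ">=" is not mistaken for ">".
OPERATORS = {
    ">=": operator.ge, "<=": operator.le, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt, "=": operator.eq,
}


def _coerce(text, as_float):
    """Convert text to float (Float prefix), else int if possible, else str."""
    if as_float:
        return float(text)
    try:
        return int(text)
    except ValueError:
        return text


def build_predicate(condition):
    """Turn the text inside Column(...) into a function value -> bool."""
    as_float = condition.startswith("Float")
    body = (condition[len("Float"):] if as_float else condition).strip()

    if ".." in body:                       # Range, e.g. 100..200
        low, high = (_coerce(p.strip(), as_float) for p in body.split("..", 1))
        return lambda value: low <= _coerce(value, as_float) <= high

    for symbol, compare in OPERATORS.items():
        if body.startswith(symbol):        # e.g. >=10 or =F
            target = _coerce(body[len(symbol):].strip(), as_float)
            return lambda value: compare(_coerce(value, as_float), target)

    # Assumed fallback: a comma-separated list of accepted text values.
    accepted = {item.strip() for item in body.split(",")}
    return lambda value: value in accepted
```

For instance, build_predicate("Float20000..29999") accepts "25000.5" and rejects "30000", while build_predicate("=F") accepts only "F".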

GroupBy{Column, Column => Count() Sum(Column) Max(Column) Min(Column)}

JoinKeyValue{Column, Column => JoinType(KeyValueTableName)}

    where JoinType includes AllMatch, Filter & FilterUnmatch

JoinTable{Column, Column => JoinType(KeyValueTableName)}

    where JoinType includes AllMatch & InnerJoin

OrderBy{PrimaryCol(Sorting Order) SecondaryCol(Sorting Order)}

OrderBy{SecondaryCol(Sorting Order) => CreateFolderLake(PrimaryCol) ~ FolderName or FileName.csv}

    where Sorting Order is A (ascending) or D (descending); to sort real numbers, use FloatA or FloatD
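The difference between A/D and FloatA/FloatD matters because text and numbers sort differently ("9" sorts after "10" as text). The Python sketch below shows a multi-column sort with these four orders. It is a sketch under assumptions, not the Peaks implementation: order_by is a hypothetical name, and plain A and D are assumed to compare values as text.

```python
def order_by(rows, spec):
    """Sort rows by several columns.

    spec is a list of (column, order) pairs, most significant first,
    where order is one of "A", "D", "FloatA" or "FloatD".
    """
    ordered = list(rows)
    # Sort by the least significant column first. Python's sort is
    # stable, so earlier passes survive as tie-breakers.
    for column, order in reversed(spec):
        descending = order.endswith("D")
        if order.startswith("Float"):
            key = lambda row, c=column: float(row[c])   # numeric comparison
        else:
            key = lambda row, c=column: str(row[c])     # text comparison (assumed)
        ordered.sort(key=key, reverse=descending)
    return ordered
```

With rows holding Amount values "9" and "10", the spec [("Saleman", "A"), ("Amount", "FloatD")] places 10 before 9 within each Saleman, whereas ("Amount", "D") would place "9" first.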

Read{FileName.csv ~ TableName}

ReadSample{StartPosition%(Number) ByteLength(Number)}

ReadSample{Repeat(Number) ByteLength(Number)}
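One plausible reading of ReadSample{StartPosition%(Number) ByteLength(Number)} is "seek to a given percentage of the file size, then read a window of bytes". The Python sketch below implements that reading. It is an assumption about the semantics rather than the documented behavior, and read_sample is a hypothetical name; the sketch also discards partial rows at both ends of the window.

```python
import os


def read_sample(path, start_percent, byte_length):
    """Return the complete rows found in a byte window of the file.

    The window starts at start_percent of the file size and is at
    most byte_length bytes long.
    """
    size = os.path.getsize(path)
    offset = size * start_percent // 100
    with open(path, "rb") as handle:
        handle.seek(offset)
        if offset > 0:
            handle.readline()            # skip the row we landed inside
        chunk = handle.read(byte_length)
    if len(chunk) == byte_length:        # window ended before end of file
        last_newline = chunk.rfind(b"\n")
        if last_newline != -1:
            chunk = chunk[: last_newline + 1]   # drop the trailing partial row
    return chunk.decode("utf-8", errors="replace").splitlines()
```

Under this reading, the example .ReadSample: StartPosition%(0) ByteLength(100000) corresponds to read_sample(path, 0, 100000), which returns the header and the rows that fit in the first 100000 bytes.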

Select{Column, Column}

SelectUnmatch{Column, Column}

SplitFile{FileName.csv ~ NumberOfSplit}

CreateFolderLake{Column, Column ~ SplitFolderName}

View{TableName}

Write{TableName ~ FileName.csv or %ExpandBy100Time.csv}

License: MIT License


Languages

Go 94.8%, Python 5.2%