ronald-kimeli / generate_SplittedCsv_S3BUCKET

S3 Bucket JSON Lines Split to CSV

This is an automated script that downloads a Zstandard-compressed JSON Lines file from an S3 bucket, decompresses it, splits it into smaller files, and uses those files to generate CSV files.

Simple steps for quick installation!

  • Clone this repository to your computer.

    git clone https://github.com/KimelirR/generate_SplittedCsv_S3BUCKET.git
    
  • Create a .env file from the example:

    cp .env.example .env
    
  • Provide your S3 bucket credentials in the .env file:

    KEY=?
    SECRET=?
    REGION=?
    BUCKET=?
    
  • Install the required dependencies with Composer:

    composer install
    

Note!

  1. Make sure the S3 bucket credentials in your .env file are correct.
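
A quick way to catch a missing or placeholder credential before running the script is sketched below. This is only an illustration, not code from the repository: the helper names parseEnv and missingCredentials are hypothetical, and the sketch assumes the .env file uses plain NAME=value lines with "?" as the unfilled placeholder, as in the example above.

```php
<?php
// Hypothetical helper (not part of this repository): parse simple NAME=value lines.
function parseEnv(string $text): array
{
    $vars = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        // Skip blank lines, comments, and lines without an "=" sign.
        if ($line === '' || $line[0] === '#' || strpos($line, '=') === false) {
            continue;
        }
        [$name, $value] = explode('=', $line, 2);
        $vars[trim($name)] = trim($value);
    }
    return $vars;
}

// Hypothetical helper: list required settings that are empty or still set to "?".
function missingCredentials(array $env, array $required = ['KEY', 'SECRET', 'REGION', 'BUCKET']): array
{
    $missing = [];
    foreach ($required as $name) {
        $value = trim((string) ($env[$name] ?? ''));
        if ($value === '' || $value === '?') {
            $missing[] = $name;
        }
    }
    return $missing;
}
```

Calling missingCredentials(parseEnv(file_get_contents('.env'))) returns an empty array when all four settings are filled in. It cannot tell whether the values are actually accepted by AWS.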

Lastly! Generate the CSV

  • All the functions and classes are inside the src folder. Run the script with:

    php index.php
    • To process a file manually, download the latest JSON Lines file and pass its path to delineEachLineFromFile, for example:

        $json_lines = (new JsonLines())->delineEachLineFromFile('jobs_2022_11_30.jsonl');

    • Otherwise, in a Linux environment everything runs automatically and the path is passed in for you:

        $json_lines = (new JsonLines())->delineEachLineFromFile($path);

      $path comes from the AWS autoload.

    • index.php saves the file into the current folder.
    • generate_csv.php outputs a downloadable file with the contents written to it.
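
For readers who want to see what the JSON Lines to CSV step involves, here is a minimal sketch. It is not the repository's JsonLines class: the function name jsonLinesToCsv is hypothetical, and the sketch assumes every line is a flat JSON object and that the keys of the first record define the CSV columns.

```php
<?php
// Hypothetical helper (not the repository's JsonLines class).
// Converts a JSON Lines file to CSV and returns the number of data rows written.
function jsonLinesToCsv(string $jsonlPath, string $csvPath): int
{
    $in = fopen($jsonlPath, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot read $jsonlPath");
    }
    $out = fopen($csvPath, 'w');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Cannot write $csvPath");
    }

    $header = null;
    $rows = 0;
    while (($line = fgets($in)) !== false) {
        $line = trim($line);
        if ($line === '') {
            continue; // ignore blank lines
        }
        $record = json_decode($line, true);
        if (!is_array($record)) {
            continue; // skip lines that are not valid JSON objects
        }
        if ($header === null) {
            // Assumption: the first record's keys are the column names.
            $header = array_keys($record);
            fputcsv($out, $header, ',', '"', '\\');
        }
        $row = [];
        foreach ($header as $column) {
            $value = $record[$column] ?? '';
            // Nested arrays are stored as JSON text so they fit in one cell.
            $row[] = is_scalar($value) ? $value : json_encode($value);
        }
        fputcsv($out, $row, ',', '"', '\\');
        $rows++;
    }

    fclose($in);
    fclose($out);
    return $rows;
}
```

Reading line by line with fgets keeps memory use flat, which matters when the input is a large export.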

By default, the Split class writes 5000 lines to each file. You can change this in split.class.php.
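
The splitting idea can be sketched as follows. This is an illustration under stated assumptions, not the contents of split.class.php: the function name splitFileByLines and the part_0001.jsonl naming scheme are hypothetical, and the chunk size is taken to be a line count.

```php
<?php
// Hypothetical helper (not the repository's Split class).
// Splits a text file into parts of at most $linesPerFile lines and returns the part paths.
function splitFileByLines(string $sourcePath, string $outputDir, int $linesPerFile = 5000): array
{
    if ($linesPerFile < 1) {
        throw new InvalidArgumentException('linesPerFile must be at least 1');
    }
    $in = fopen($sourcePath, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot read $sourcePath");
    }

    $parts = [];
    $out = null;
    $count = 0;
    while (($line = fgets($in)) !== false) {
        // Start a new part for the first line and whenever the current part is full.
        if ($out === null || $count === $linesPerFile) {
            if ($out !== null) {
                fclose($out);
            }
            $partPath = sprintf('%s/part_%04d.jsonl', rtrim($outputDir, '/'), count($parts) + 1);
            $out = fopen($partPath, 'w');
            if ($out === false) {
                fclose($in);
                throw new RuntimeException("Cannot write $partPath");
            }
            $parts[] = $partPath;
            $count = 0;
        }
        fwrite($out, $line);
        $count++;
    }

    if ($out !== null) {
        fclose($out);
    }
    fclose($in);
    return $parts;
}
```

With the default of 5000, a 12,000-line file would produce three parts holding 5000, 5000, and 2000 lines.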
