koddr / json2csv

🚴 The parser can read a given folder of JSON files, filter and qualify the input data using intent and stop-word dictionaries, and save the results to CSV files split by a given chunk size.

Home Page: https://github.com/koddr/json2csv/wiki

Use case

gedw99 opened this issue

I am struggling to understand the description at https://github.com/koddr/json2csv#-solving-case

Could you add a real-world use case, with an example in a folder?

I am intrigued because I do lots of work with CSV and JSON.

I don't get the reasons for the intent or word filters, or why you add the JSON ID to the first column of the CSV, etc.

So I definitely need a real use-case example!

Hope that's OK.

Hi,

Thanks for your interest! OK, let me try to describe a more real-world use case for json2csv. I'll describe exactly the case that led me to write this parser 😊

I had about 800k JSON files as input (~2.2 GB as a zip archive), each containing structured content in the following format:

[
   {
      "chat_uuid": "***",
      "message_uuid": "***",
      "assigned_team": null,
      "who": "user",
      "created_at": "2022-09-08T08:30:46.109596+00:00",
      "type": "botrequest",
      "content": "/start"
   },
   {
      "chat_uuid": "***",
      "message_uuid": "***",
      "assigned_team": null,
      "who": "user",
      "created_at": "2022-09-08T08:50:10.110780+00:00",
      "type": "botrequest",
      "content": "Hello! I need course."
   },
   {
      "chat_uuid": "***",
      "message_uuid": "***",
      "assigned_team": "Forced Sessions",
      "who": "operator",
      "created_at": "2022-09-08T11:04:12.682817+00:00",
      "type": "botstate",
      "content": "Good afternoon, my name is Daniel. You were interested in the courses of educational platform. We are ready to tell you more about the courses.\nWhat direction is most interesting?"
   },

   ...
]
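To make the input format concrete, here is a minimal Python sketch (not json2csv's own code) that decodes files shaped like the sample above and walks a folder of them. The `Message` class, `FIELDS`, `load_file` and `iter_folder` names are hypothetical; it assumes every file holds a single JSON array of objects and keeps `created_at` as a plain string rather than parsing it.

```python
import json
from dataclasses import dataclass, fields
from pathlib import Path
from typing import Iterator, List, Optional


@dataclass
class Message:
    """One chat message; field names mirror the keys in the sample above."""
    chat_uuid: str
    message_uuid: str
    assigned_team: Optional[str]  # JSON null is decoded as None
    who: str                      # "user" or "operator" in the sample
    created_at: str               # kept as a string, not parsed
    type: str
    content: str


# Names of the keys we keep; any other keys in an object are ignored.
FIELDS = tuple(f.name for f in fields(Message))


def load_file(path: Path) -> List[Message]:
    """Decode one file that holds a JSON array of message objects."""
    with path.open(encoding="utf-8") as f:
        data = json.load(f)
    # Missing keys become None instead of raising a TypeError.
    return [Message(**{key: obj.get(key) for key in FIELDS}) for obj in data]


def iter_folder(folder: str) -> Iterator[Message]:
    """Yield messages from every *.json file in folder, sorted by file name."""
    for path in sorted(Path(folder).glob("*.json")):
        yield from load_file(path)
```

Because `iter_folder` is a generator, it decodes one file at a time instead of loading all ~800k files up front.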

And I needed to:

  1. Select from these files only the objects that have "who": "user";
  2. Exclude some of the selected objects, e.g., those where "content": "/start";
  3. Perform a quick search for string occurrences by certain parameters;
  4. Save the resulting objects in CSV format;
  5. Split these CSVs into smaller parts, e.g., 1000 lines per file.
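The five steps above can be sketched in Python roughly as follows. This is a hypothetical illustration, not json2csv's implementation: `qualify` and `write_chunks` are made-up names, it works on plain dicts as produced by `json.load`, it treats a stop word as an exact (case-insensitive) match on the whole `content`, and it assumes step 3 means "keep messages containing any intent word". The `id`, `content` column layout is also an assumption.

```python
import csv
import os
from typing import Iterable, List, Mapping, Sequence


def qualify(messages: Iterable[Mapping], stop_words: Sequence[str],
            intents: Sequence[str]) -> List[List[str]]:
    """Steps 1-3: keep user messages, drop stop words, keep intent matches."""
    stop = {w.lower() for w in stop_words}
    wanted = [w.lower() for w in intents]
    rows = []
    for msg in messages:
        # Step 1: only objects with "who": "user".
        if msg.get("who") != "user":
            continue
        text = (msg.get("content") or "").strip()
        lowered = text.lower()
        # Step 2: drop messages whose whole content is a stop word, e.g. "/start".
        if lowered in stop:
            continue
        # Step 3: if intents are given, keep only messages containing one of them.
        if wanted and not any(word in lowered for word in wanted):
            continue
        # Hypothetical row layout: message_uuid first, then the text.
        rows.append([msg.get("message_uuid") or "", text])
    return rows


def write_chunks(rows: Sequence[Sequence[str]], out_dir: str,
                 chunk_size: int = 1000,
                 header: Sequence[str] = ("id", "content")) -> List[str]:
    """Steps 4-5: write rows to numbered CSV files, chunk_size data rows each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for start in range(0, len(rows), chunk_size):
        number = start // chunk_size + 1
        path = os.path.join(out_dir, f"chunk_{number:04d}.csv")
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)  # every chunk gets its own header row
            writer.writerows(rows[start:start + chunk_size])
        paths.append(path)
    return paths
```

Here `chunk_size` counts data rows, so each file has one extra header line; an empty `intents` list skips step 3 and keeps every non-stop-word user message.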

The result was then sent to ML specialists to fine-tune an AI model.

In other words, I had a lot of raw data that I needed to prepare for later use in a specific format. That is exactly the task json2csv solves.

I hope it is now more or less clear how to apply it.