rohitg1594 / Civic-Data-Lab-Analysis

About CivicDataLab

CivicDataLab aims to use data, technology, design and social science to strengthen civic engagement. We work to harness the potential of the open-source movement to enable citizens to engage better with public reforms. We aim to grow the data and tech literacy of governments, nonprofits, think tanks, media houses, universities and similar organisations to enable data-driven decision-making at scale. The team has been instrumental in starting initiatives such as DataKind Bangalore and Open Budgets India. We believe in becoming thought partners in change through our collaborations.

About StoryWeaver

StoryWeaver, an initiative from Pratham Books, is designed to provide children with reading resources. StoryWeaver is a digital gateway to thousands of richly illustrated, open-licensed children's stories in mother tongue languages.

Problem Statement

Using language processing techniques, analyse how the complexity of language changes across the reading levels of the stories.
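
As a starting point, the sketch below computes a few surface-level complexity measures (words per sentence, average word length, type-token ratio) and averages them per reading level. It is only one possible approach, not the method used in this repository. The function names, the sample stories and the level values `"1"` and `"4"` are invented for illustration. The regex tokenizer is only suitable for English; Hindi and Telugu text would need a script-aware tokenizer.

```python
import re
from collections import defaultdict
from statistics import mean


def complexity_metrics(text):
    """Return simple surface-level complexity measures for one English story."""
    # Split on runs of sentence-ending punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercase word tokens; adequate for English only (assumption).
    words = re.findall(r"[a-z']+", text.lower())
    if not sentences or not words:
        return {"words_per_sentence": 0.0, "avg_word_length": 0.0, "type_token_ratio": 0.0}
    return {
        "words_per_sentence": len(words) / len(sentences),
        "avg_word_length": mean(len(w) for w in words),
        "type_token_ratio": len(set(words)) / len(words),
    }


def metrics_by_level(stories):
    """Average each metric over all stories that share a reading level.

    `stories` is an iterable of (reading_level, content) pairs.
    """
    grouped = defaultdict(list)
    for level, content in stories:
        grouped[level].append(complexity_metrics(content))
    return {
        level: {key: mean(m[key] for m in metrics) for key in metrics[0]}
        for level, metrics in grouped.items()
    }


# Invented sample data, not taken from the StoryWeaver dump.
sample = [
    ("1", "The cat sat. The cat ran."),
    ("4", "Although the monsoon arrived late, the villagers harvested abundantly."),
]
result = metrics_by_level(sample)
for level in sorted(result):
    print(level, result[level])
```

With the real data, the pairs would come from the `reading_level_updated` and `content` columns, filtered to one language at a time so that the measures stay comparable.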

DataSet

The dataset, provided under the data folder, contains a data dump from StoryWeaver.

It contains original and translated stories in English, Telugu and Hindi languages.

Column Description

| Column | Description |
| --- | --- |
| story_id | Identifier of the story |
| title | Title of the story |
| english_title | Title of the story in English |
| reading_level_updated | Reading complexity of the book |
| story_langugage | Language of the story content |
| synopsis | Short description of the story |
| content | Story content |
| category_name | Comma-separated values of the story categories |
| tag_name | Comma-separated values of the story tags (if any) |
| story_original_title | Title of the original story if the story is a translation; same as the title for an original story |
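
Because `category_name` and `tag_name` hold comma-separated values, they need to be split before they can be counted or grouped. The sketch below shows one way to do this with the standard library and counts categories per reading level. The column names come from the table above; the helper names and the inline sample rows are invented for illustration, and the real file name and format of the dump are not assumed here.

```python
import csv
import io
from collections import Counter, defaultdict


def split_values(cell):
    """Split a comma-separated cell into a list of trimmed, non-empty values."""
    return [part.strip() for part in (cell or "").split(",") if part.strip()]


def categories_by_level(rows):
    """Count how often each category appears at each reading level."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["reading_level_updated"]].update(split_values(row["category_name"]))
    return counts


# Invented sample rows; only the column names match the dataset description.
SAMPLE_CSV = '''story_id,reading_level_updated,category_name,tag_name
1,1,"Animals, Humour",
2,1,Animals,"cats, play"
3,3,"Science, Nature",rain
'''

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
counts = categories_by_level(rows)
for level in sorted(counts):
    print(level, dict(counts[level]))
```

The same `split_values` helper works for `tag_name`, where an empty cell yields an empty list.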

About

License:MIT License


Languages

Language:Jupyter Notebook 100.0%