Create a data mining plan for March/April, 2020
Apoorv Anand commented
Requirements: Fetch data from eCourts for cases registered under Section 66A of the IT Act
Geographies
- Uttar Pradesh
- Telangana
- Rajasthan
- Maharashtra
- Assam
- Andhra Pradesh
- Jharkhand
Time Period: 01/01/2008 - 01/03/2020
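A minimal sketch of how the time period could be applied as a filter. It assumes the dates above are in DD/MM/YYYY form (so the window ends on 1 March 2020), that both endpoints are inclusive, and that registration dates arrive as strings in the same format. The names `WINDOW_START`, `WINDOW_END`, `parse_ddmmyyyy` and `in_window` are hypothetical and not part of any eCourts interface.

```python
from datetime import date, datetime

# Assumption: 01/01/2008 - 01/03/2020 read as DD/MM/YYYY, endpoints inclusive.
WINDOW_START = date(2008, 1, 1)
WINDOW_END = date(2020, 3, 1)


def parse_ddmmyyyy(text: str) -> date:
    """Parse a DD/MM/YYYY string into a date (raises ValueError if malformed)."""
    return datetime.strptime(text.strip(), "%d/%m/%Y").date()


def in_window(registration_date: str) -> bool:
    """Return True if the registration date falls inside the collection window."""
    return WINDOW_START <= parse_ddmmyyyy(registration_date) <= WINDOW_END
```

If the scraped dates turn out to use a different format, only `parse_ddmmyyyy` would need to change.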
Process
- Identify a state for developing a methodology for case data collection
- This will help us finalise the processes, including mining, verification and data validation, and then scale them to other geographies
- This will also help us come up with better time estimates for the whole data collection exercise
- We have selected Jharkhand as the pilot state for this purpose
- Fetch all cases registered under The Information Technology Act, 2000.
- This will ensure that we don't lose 66A cases that are incorrectly tagged
- Collect the patterns by which the IT Act is recorded across all the district court establishments of a state
- Collect only the metadata of cases available on eCourts (without any PDFs of orders or judgements)
- IFF raised a concern that we might miss a few 66A cases that are registered under other acts and sections rather than directly under the IT Act. IFF will do a preliminary analysis of this before we include other acts when fetching case records
- Extract the 66A cases from this set of cases
- This is mostly a regular expression matching exercise, where we look for the patterns in which 66A is mentioned in the case record
- Set up a data validation and verification pipeline to check whether the cases are correctly tagged as 66A
- Model data as per the research requirements
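The act-matching and 66A-filtering steps above could be sketched roughly as follows. This is only an illustration: the act-name variants in `IT_ACT_RE` are guesses that should be replaced by the patterns actually collected from the district court establishments, and the record fields `act` and `sections` are hypothetical names for whatever the scraped metadata provides.

```python
import re

# Assumed spellings of the act name ("IT Act", "I.T. Act",
# "Information Technology (Amendment) Act", ...). The real list should come
# from the per-district pattern survey.
IT_ACT_RE = re.compile(
    r"\b(?:information\s+technology|i\.?\s*t\.?)\s*(?:\(amendment\)\s*)?act\b",
    re.IGNORECASE,
)

# Matches "66A", "66-A", "66 (A)" and "66 a", but not "66", "66B" or "166A".
SECTION_66A_RE = re.compile(
    r"(?<![0-9A-Za-z])66\s*[-(]?\s*A\)?(?![0-9A-Za-z])",
    re.IGNORECASE,
)


def is_it_act(act_name: str) -> bool:
    """Return True if the act name looks like the Information Technology Act."""
    return bool(IT_ACT_RE.search(act_name))


def mentions_66a(section_text: str) -> bool:
    """Return True if the section text contains a reference to section 66A."""
    return bool(SECTION_66A_RE.search(section_text))


def filter_66a(cases):
    """Keep only records tagged under the IT Act whose sections mention 66A."""
    return [
        case
        for case in cases
        if is_it_act(case["act"]) and mentions_66a(case["sections"])
    ]
```

A pattern this loose will produce some false positives, which is one reason the verification step remains necessary after the regular expression pass.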