ZnS-20 / Intermine-Schema-Validator

POC for Intermine schema validation under GSOC 2019

Intermine Schema Validator

This is a proof of concept (POC) for Intermine's Schema Validator project.

The idea is to develop a jar file that parses a biological data file and checks whether its contents are valid.

The file types to be validated in this project are GFF, FASTA, and CSV or TAB (tab-delimited) files.

My Thinking

There are two ways in which a file can be checked:

  1. Validating the syntax of the file (without checking every line): This identifies the type of the file and the order in which its data is stored, which helps with the next step, i.e. checking the content of the file.

  2. Checking every line of the file: In this step the file is verified thoroughly, i.e. every line is checked to confirm that the stored data is correct.
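
The two steps above can be sketched for a FASTA file as follows. This is only an illustration, not the project's code: the class and method names are hypothetical, and the set of characters accepted in a sequence line (letters, `*` and `-`) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two-step idea, applied to FASTA.
class TwoPassFastaCheck {

    // Step 1: cheap syntax check that looks only at the first non-blank line.
    static boolean looksLikeFasta(List<String> lines) {
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                return line.startsWith(">");
            }
        }
        return false; // empty input
    }

    // Step 2: check every line and collect error messages.
    // Assumes step 1 has already passed, so the first record starts with a header.
    static List<String> validateEveryLine(List<String> lines) {
        List<String> errors = new ArrayList<>();
        boolean expectSequence = false; // true after a header until a sequence line is seen
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            int lineNo = i + 1;
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                if (line.length() == 1) {
                    errors.add("line " + lineNo + ": header has no identifier");
                }
                if (expectSequence) {
                    errors.add("line " + lineNo + ": previous record has no sequence");
                }
                expectSequence = true;
            } else {
                // Assumed alphabet: letters plus '*' and '-'.
                if (!line.matches("[A-Za-z*-]+")) {
                    errors.add("line " + lineNo + ": unexpected character in sequence");
                }
                expectSequence = false;
            }
        }
        if (expectSequence) {
            errors.add("end of file: last record has no sequence");
        }
        return errors;
    }
}
```

Running the cheap check first means the line-by-line pass is only attempted on files that already look like the expected type.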

I have implemented the code to check the extension of the file.
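
An extension check of this kind might look like the sketch below. It is not the code in this repository: the names `ExtensionCheck`, `FileType` and `detect` are hypothetical, and the accepted extensions (`.gff`, `.gff3`, `.fa`, `.fasta`, `.csv`, `.tab`, `.tsv`) are an assumed mapping.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch: map a file name to one of the supported file types.
class ExtensionCheck {

    enum FileType { GFF, FASTA, CSV, TAB }

    // Returns the detected type, or Optional.empty() if the extension is
    // missing or not in the (assumed) list of supported extensions.
    static Optional<FileType> detect(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        // Compare case-insensitively so "GENES.GFF3" is also recognised.
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "gff":
            case "gff3":
                return Optional.of(FileType.GFF);
            case "fa":
            case "fasta":
                return Optional.of(FileType.FASTA);
            case "csv":
                return Optional.of(FileType.CSV);
            case "tab":
            case "tsv":
                return Optional.of(FileType.TAB);
            default:
                return Optional.empty();
        }
    }
}
```

For example, `ExtensionCheck.detect("genes.gff3")` yields `FileType.GFF`, while a name without an extension yields an empty result.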

Next, I am planning to check the syntax of .gff3 at both levels, i.e. permissive and strict, followed by .fasta and then csv or tab.
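
One possible reading of the two levels is sketched below for a single GFF3 feature line. The split between the levels is an assumption, since it is not defined here: "permissive" is taken to mean nine tab-separated columns with integer start and end, and "strict" additionally checks the coordinate order, score, strand and phase columns. The names `Gff3LineCheck`, `Level` and `isValidFeatureLine` are hypothetical.

```java
// Hypothetical sketch: validate one GFF3 feature line at two levels.
class Gff3LineCheck {

    enum Level { PERMISSIVE, STRICT }

    static boolean isValidFeatureLine(String line, Level level) {
        // GFF3 feature lines have nine tab-separated columns:
        // seqid, source, type, start, end, score, strand, phase, attributes.
        String[] cols = line.split("\t", -1);
        if (cols.length != 9) {
            return false;
        }
        long start;
        long end;
        try {
            start = Long.parseLong(cols[3]);
            end = Long.parseLong(cols[4]);
        } catch (NumberFormatException e) {
            return false;
        }
        if (level == Level.PERMISSIVE) {
            return true; // assumed permissive level stops here
        }
        // Assumed strict level: coordinates are 1-based and start <= end.
        if (start < 1 || start > end) {
            return false;
        }
        // Score must be "." or parse as a Java double.
        if (!cols[5].equals(".")) {
            try {
                Double.parseDouble(cols[5]);
            } catch (NumberFormatException e) {
                return false;
            }
        }
        // Strand is one of + - . ? and phase is one of 0 1 2 .
        if (!cols[6].matches("[+\\-.?]")) {
            return false;
        }
        return cols[7].matches("[012.]");
    }
}
```

With this split, a line whose start is greater than its end passes the permissive level but fails the strict one.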

Languages

Language:Java 100.0%