This project is realized by Zakaria Boulkhir and me as a database course project in the Master 1 of data science at the university of Lille.
The goal of this project is to implement an algorithm to import an XML document into an SQL relational database, and translating the largest possible fragment of XPath to equivalent SQL queries using Python.
The main steps of this project are:
-
Proposing a relational encoding scheme for a given XML document.
-
Implementation of the functionality of importing an XML document
imdb.xml
into a relational database using SAX API. -
Developing of a scheme of translating largest possible fragment of XPath to equivalent SQL queries evaluated over the encoded instance.
-
Proposing an automated testing approach to verify that the system is working correctly.
-
Developing a protocol for experimental comparison of the querying (time) performance of the system and an existing XML query system.
The data for this project is composed of two databases, the first being employe XML database (xml-sax/rh.xml
), and the second one is movie XML database (DBS20/imdb-small.xml
).
To try our implementation in your own machine you need to install the following requirements:
pip install -r requirements.txt