This repository contains the scripts written to extract, explore, and cluster 24 digitized documents from the Google Books Library Project. The goal is to understand the data and uncover the underlying relationships between the documents. The main tools used are Python and MongoDB.