andreamorgar / poesIA

A poetry generator built from a scraped corpus of Spanish poetry. EDA and general NLP tasks are included.


poesIA: a poetry generator in Spanish 📚📚💻

A poetry generator built from a scraped corpus of Spanish poetry. EDA and general NLP tasks are included.

Poem generator 📚🤯

Here is a visualization of the generator running in Streamlit. Given a few words, it generates a beautiful poem in Spanish ✨✨✨✨

wordcloud

Exploratory Data Analysis 🔎🔎

We generated an overview of the whole dataset. We analyzed the scope and size of the vocabulary involved and produced some nice visualizations ☁️☁️☁️

wordcloud

We also computed word counts and looked for relations between authors and poems across the whole dataset 📈

Author count
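As a rough illustration of the counting step, here is a minimal sketch using only the Python standard library. The `poems` list, the author labels, and the `tokenize` helper are hypothetical stand-ins for the scraped dataset and are not taken from the notebooks.

```python
from collections import Counter
import re

# Hypothetical toy data: (author, poem text) pairs standing in for the scraped corpus.
poems = [
    ("Autor A", "la luna mira la mar"),
    ("Autor A", "la mar canta a la luna"),
    ("Autor B", "el viento calla"),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens (accented letters included)."""
    return re.findall(r"\w+", text.lower())


# Number of poems per author.
poems_per_author = Counter(author for author, _ in poems)

# Word frequencies over the whole corpus.
word_counts = Counter(tok for _, text in poems for tok in tokenize(text))

print("Poems per author:", poems_per_author.most_common())
print("Most frequent words:", word_counts.most_common(3))
print("Vocabulary size:", len(word_counts))
```

The same `Counter` objects can feed a bar chart or a word cloud; filtering stop words first would probably make the most frequent words more informative.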

We also looked at specific authors and compared them. We detected relations in the text such as antithesis and polysemy. Awesome, isn't it? 🤩

graph2

An embedding model was built to detect polysemy, similar words, and common word collocations in poetry. So many word relations in poems!

wordcloud.
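The README does not say which embedding method was used, so the sketch below swaps in something much simpler: count-based co-occurrence vectors compared with cosine similarity. It only illustrates the idea that words appearing in similar contexts end up with similar vectors. The `corpus` lines and all function names are hypothetical.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy corpus; a real run would use the tokenized poems.
corpus = [
    "la luna mira la mar",
    "la luna mira el cielo",
    "el sol mira la mar",
    "el sol mira el cielo",
]


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(word, vectors, topn=3):
    """Rank every other word by cosine similarity to `word`."""
    target = vectors[word]
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


vectors = cooccurrence_vectors(corpus)
print(most_similar("luna", vectors))
```

A trained embedding model would replace `cooccurrence_vectors` with dense learned vectors, but the nearest-neighbour query would look much the same.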

Voronoi graphs were also made 📈📈📈📈

wordcloud.

Relevant code

notebooks

  • [EDA of the poetry dataset](notebooks/data exploration.ipynb): Exploratory Data Analysis of the dataset, including a complete basic NLP task!

  • [Poem generator code](notebooks/poetry generator.ipynb): code to generate synthetic poems with an RNN.
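To give a feel for how an RNN generates text from a few seed words, here is a minimal character-level sketch in pure Python. It is not the notebook's model: the `TinyCharRNN` class, its sizes, and its vocabulary are assumptions, and the weights are random and untrained, so the output is gibberish. It only shows the recurrence and the sampling loop.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class TinyCharRNN:
    """Untrained Elman-style RNN over a fixed character vocabulary (illustration only)."""

    def __init__(self, vocab, hidden_size=8, seed=0):
        rng = random.Random(seed)
        self.vocab = list(vocab)
        self.index = {ch: i for i, ch in enumerate(self.vocab)}
        self.hidden_size = hidden_size
        n = len(self.vocab)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_xh = rand_matrix(hidden_size, n)            # input -> hidden
        self.w_hh = rand_matrix(hidden_size, hidden_size)  # hidden -> hidden
        self.w_hy = rand_matrix(n, hidden_size)            # hidden -> output scores

    def step(self, char, hidden):
        """One recurrence: h' = tanh(W_xh x + W_hh h), logits = W_hy h'."""
        x = [0.0] * len(self.vocab)
        x[self.index[char]] = 1.0  # one-hot encoding of the input character
        pre = [a + b for a, b in zip(matvec(self.w_xh, x), matvec(self.w_hh, hidden))]
        new_hidden = [math.tanh(p) for p in pre]
        return matvec(self.w_hy, new_hidden), new_hidden

    def generate(self, prompt, length=20, temperature=1.0, seed=0):
        """Feed the prompt, then sample `length` characters one at a time."""
        if not prompt:
            raise ValueError("prompt must contain at least one character")
        rng = random.Random(seed)
        hidden = [0.0] * self.hidden_size
        for ch in prompt:  # warm up the hidden state on the seed words
            logits, hidden = self.step(ch, hidden)
        out = list(prompt)
        for _ in range(length):
            probs = softmax(logits, temperature)
            ch = rng.choices(self.vocab, weights=probs, k=1)[0]
            out.append(ch)
            logits, hidden = self.step(ch, hidden)
        return "".join(out)


model = TinyCharRNN(vocab="abcdefghijklmnopqrstuvwxyzáéíóúñ ")
sample = model.generate("luna y mar ", length=30)
print(sample)
```

In a trained model the three weight matrices would be learned from the poetry corpus, typically with a deep learning framework, and the same loop would then continue the seed words with plausible Spanish text.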

Talks

This project was presented as a talk at PyConEs 2020 (Pandemic Edition). You can find the slides in this repo and the video on YouTube.


License: GNU General Public License v3.0


Languages

Jupyter Notebook 99.7%, Python 0.3%