PyUNML DataSet
This repository contains all the files used and generated while building a computational model that automatically generates class diagrams and use case diagrams from user stories.
Original datasets
The Dalpiaz2018 dataset contains 28 text files in English. Each file contains more than 50 user stories and describes the expected functionality of a particular application. In total, this dataset provides information on 22 different applications. It was compiled by Fabiano Dalpiaz and can be downloaded from Mendeley Data.
The Dalpiaz2020 dataset contains 15 sets of files composed of detailed user stories and/or use cases, grouped into 3 categories (SIM, HOS, IFA) according to the application they describe. Each set describes one of the following three applications:
- Hospital management system (HOS)
- Urban traffic simulator (SIM)
- International football association portal (IFA)
This dataset was collected by Fabiano Dalpiaz and can be downloaded from Zenodo.
Translated dataset
From these datasets, 33 text files with different sets of user stories were extracted (22 files from the Dalpiaz2018 dataset and 11 files from the Dalpiaz2020 dataset). Since the originals are in English, they were translated into Spanish with Google Translate, an automatic translator that generally produces accurate translations into the target language.
Each translation is stored in a new text file named with an identifier assigned to it, which makes the file easier to locate and analyze. The following table lists the text files taken from each dataset and the corresponding translated text file. All translated files are located in the Translated Dataset folder.
Original dataset | Original file | Translated file |
---|---|---|
Dalpiaz2018 | g02-federalspending | US1 |
Dalpiaz2018 | g03-loudoun | US2 |
Dalpiaz2018 | g04-recycling | US3 |
Dalpiaz2018 | g05-openspending | US4 |
Dalpiaz2018 | g08-frictionless | US5 |
Dalpiaz2018 | g10-scrumalliance | US6 |
Dalpiaz2018 | g11-nsf | US7 |
Dalpiaz2018 | g12-camperplus | US8 |
Dalpiaz2018 | g13-planningpoker | US9 |
Dalpiaz2018 | g14-datahub | US10 |
Dalpiaz2018 | g16-mis | US11 |
Dalpiaz2018 | g17-cask | US12 |
Dalpiaz2018 | g18-neurohub | US13 |
Dalpiaz2018 | g19-alfred | US14 |
Dalpiaz2018 | g21-badcamp | US15 |
Dalpiaz2018 | g22-rdadmp | US16 |
Dalpiaz2018 | g23-archivesspace | US17 |
Dalpiaz2018 | g24-unibath | US18 |
Dalpiaz2018 | g25-duraspace | US19 |
Dalpiaz2018 | g26-racdam | US20 |
Dalpiaz2018 | g27-culrepo | US21 |
Dalpiaz2018 | g28-zooniverse | US22 |
Dalpiaz2020 | g1 | US23 |
Dalpiaz2020 | g2 | US24 |
Dalpiaz2020 | g4 | US25 |
Dalpiaz2020 | g5 | US26 |
Dalpiaz2020 | g6 | US27 |
Dalpiaz2020 | g8 | US28 |
Dalpiaz2020 | g9 | US29 |
Dalpiaz2020 | g10 | US30 |
Dalpiaz2020 | g11 | US31 |
Dalpiaz2020 | g12 | US32 |
Dalpiaz2020 | g14 | US33 |
Manually generated UML diagrams
For each of the translated files, there is a folder named with the same identifier as that file, which contains:
- Class diagram
- Use case diagrams
- List of entities present
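As a quick consistency check, one could verify that a diagram folder exists for every translated file. The sketch below assumes the folders are named `US1` through `US33` and sit directly under some root directory; that root location and the function names are hypothetical and not confirmed by the repository.

```python
from pathlib import Path


def expected_identifiers(count=33):
    """Identifiers US1..US<count>, matching the translated-file names."""
    return [f"US{i}" for i in range(1, count + 1)]


def missing_folders(root, identifiers):
    """Return the identifiers that have no folder directly under root."""
    root = Path(root)
    return [name for name in identifiers if not (root / name).is_dir()]
```

Calling `missing_folders(".", expected_identifiers())` from the directory that holds the diagram folders would then list any identifier without a matching folder.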
Papers
- Tovar Onofre, M.Á., Camargo, J.E. (2023). Automatic Class Extraction from Spanish Text of User Stories Using Natural Language Processing. In: Narváez, F.R., Urgilés, F., Bastos-Filho, T.F., Salgado-Guerrero, J.P. (eds) Smart Technologies, Systems and Applications. SmartTech-IC 2022. Communications in Computer and Information Science, vol 1705. Springer, Cham. https://doi.org/10.1007/978-3-031-32213-6_3