nlp pattern-recognition uml uml-diagrams user-stories

PyUNML DataSet

This repository contains all the files used and generated for the construction of a computational model for the automatic generation of class diagrams and use cases from user stories.

Original datasets

The Dalpiaz2018 dataset contains 22 text files in English. Each file contains more than 50 user stories and describes the expected functionality of a particular application, so the dataset covers 22 different applications in total. It was compiled by Fabiano Dalpiaz and can be downloaded from Mendeley Data.

The Dalpiaz2020 dataset contains 15 sets of files composed of detailed user stories and/or use cases, grouped into 3 categories (SIM, HOS, IFA) according to the application they describe. Each set describes one of the following three applications:

Hospital management system (HOS)
Urban traffic simulator (SIM)
International football association portal (IFA)

This dataset was collected by Fabiano Dalpiaz and can be downloaded from Zenodo.

Translated dataset

A total of 33 text files with different sets of user stories were extracted from these datasets (22 files from Dalpiaz2018 and 11 files from Dalpiaz2020). Since the files are in English, they were translated into Spanish with the automatic translator Google Translate, which can translate into the target language with high precision.

The translation of each file is stored in a new text file named with an identifier that facilitates its identification and analysis. The following table lists the text files taken from each dataset and the corresponding translated file. All translated files are located in the Translated Dataset folder.

Original dataset	Original file	Translated file
Dalpiaz2018	g02-federalspending	US1
Dalpiaz2018	g03-loudoun	US2
Dalpiaz2018	g04-recycling	US3
Dalpiaz2018	g05-openspending	US4
Dalpiaz2018	g08-frictionless	US5
Dalpiaz2018	g10-scrumalliance	US6
Dalpiaz2018	g11-nsf	US7
Dalpiaz2018	g12-camperplus	US8
Dalpiaz2018	g13-planningpoker	US9
Dalpiaz2018	g14-datahub	US10
Dalpiaz2018	g16-mis	US11
Dalpiaz2018	g17-cask	US12
Dalpiaz2018	g18-neurohub	US13
Dalpiaz2018	g19-alfred	US14
Dalpiaz2018	g21-badcamp	US15
Dalpiaz2018	g22-rdadmp	US16
Dalpiaz2018	g23-archivesspace	US17
Dalpiaz2018	g24-unibath	US18
Dalpiaz2018	g25-duraspace	US19
Dalpiaz2018	g26-racdam	US20
Dalpiaz2018	g27-culrepo	US21
Dalpiaz2018	g28-zooniverse	US22
Dalpiaz2020	g1	US23
Dalpiaz2020	g2	US24
Dalpiaz2020	g4	US25
Dalpiaz2020	g5	US26
Dalpiaz2020	g6	US27
Dalpiaz2020	g8	US28
Dalpiaz2020	g9	US29
Dalpiaz2020	g10	US30
Dalpiaz2020	g11	US31
Dalpiaz2020	g12	US32
Dalpiaz2020	g14	US33
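
The table above can be turned into a lookup from translated identifier to original file. The following is a minimal sketch, not part of the repository: it assumes the rows are available as tab-separated text, and the names `SAMPLE_ROWS` and `parse_mapping` are hypothetical. Only a few rows are copied here for illustration.

```python
import csv
import io

# A few rows copied from the table above, tab-separated (not the full table).
SAMPLE_ROWS = (
    "Dalpiaz2018\tg03-loudoun\tUS2\n"
    "Dalpiaz2018\tg28-zooniverse\tUS22\n"
    "Dalpiaz2020\tg1\tUS23\n"
    "Dalpiaz2020\tg14\tUS33\n"
)


def parse_mapping(tsv_text):
    """Map each translated identifier to its (dataset, original file) pair."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    mapping = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        dataset, original, us_id = row
        mapping[us_id] = (dataset, original)
    return mapping


mapping = parse_mapping(SAMPLE_ROWS)
print(mapping["US23"])
```

Keying the lookup on the translated identifier matches how the rest of the repository is organized, since the translated files and the diagram folders are both named by that identifier.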

Manually generated UML diagrams

Each translated file has a folder named with the same identifier, which contains:

Class diagram
Use case diagrams
List of entities present
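
The folder layout described above can be checked with a short script. This is a sketch under stated assumptions: it assumes the per-identifier folders (US1 to US33) sit directly under a root directory that you pass in, which this README does not confirm, and the helper names `expected_identifiers` and `missing_folders` are hypothetical.

```python
from pathlib import Path


def expected_identifiers(count=33):
    """Return the identifiers US1..US<count> used for the translated files."""
    return [f"US{i}" for i in range(1, count + 1)]


def missing_folders(root, identifiers):
    """List identifiers that have no folder directly under root."""
    root = Path(root)
    return [ident for ident in identifiers if not (root / ident).is_dir()]


if __name__ == "__main__":
    # "." is a placeholder for wherever the repository was cloned.
    print(missing_folders(".", expected_identifiers()))
```

An empty list means every identifier has a folder under the chosen root. The script does not inspect the folder contents, because the file names used for the diagrams and entity lists are not stated here.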

Papers

Tovar Onofre, M.Á., Camargo, J.E. (2023). Automatic Class Extraction from Spanish Text of User Stories Using Natural Language Processing. In: Narváez, F.R., Urgilés, F., Bastos-Filho, T.F., Salgado-Guerrero, J.P. (eds) Smart Technologies, Systems and Applications. SmartTech-IC 2022. Communications in Computer and Information Science, vol 1705. Springer, Cham. https://doi.org/10.1007/978-3-031-32213-6_3
