matovaro / PyUNML-DataSet

This repository contains all the files used and generated for the construction of a computational model for the automatic generation of class diagrams and use cases from user stories.

PyUNML DataSet

Original datasets

The Dalpiaz2018 dataset consists of 28 English-language text files. Each file contains more than 50 user stories that describe the expected functionality of a particular application. In total, the dataset covers 22 different applications. It was compiled by Fabiano Dalpiaz and can be downloaded from Mendeley Data.

The Dalpiaz2020 dataset consists of 15 sets of files containing detailed user stories and/or use cases. The sets are grouped into three categories (SIM, HOS, IFA), and each set describes one of the following three applications:

  • Hospital management system (HOS)
  • Urban traffic simulator (SIM)
  • International football association portal (IFA)

This dataset was collected by Fabiano Dalpiaz and can be downloaded from Zenodo.

Translated dataset

From these datasets, 33 text files containing different sets of user stories were extracted (22 files from Dalpiaz2018 and 11 files from Dalpiaz2020). Because the files are in English, they were translated into Spanish with Google Translate, an automatic translator that can translate into the target language with high accuracy.

Each translation is stored in a new text file named with an identifier assigned to it, which makes the file easier to identify and analyze. The following table lists the text files taken from each dataset and the corresponding translated file. All translated files are located in the Translated Dataset folder.

| Original dataset | Original file        | Translated file |
|------------------|----------------------|-----------------|
| Dalpiaz2018      | g002-federalspending | US1             |
| Dalpiaz2018      | g03-loudoun          | US2             |
| Dalpiaz2018      | g04-recycling        | US3             |
| Dalpiaz2018      | g05-openspending     | US4             |
| Dalpiaz2018      | g08-frictionless     | US5             |
| Dalpiaz2018      | g10-scrumalliance    | US6             |
| Dalpiaz2018      | g11-nsf              | US7             |
| Dalpiaz2018      | g12-camperplus       | US8             |
| Dalpiaz2018      | g13-planningpoker    | US9             |
| Dalpiaz2018      | g14-datahub          | US10            |
| Dalpiaz2018      | g16-mis              | US11            |
| Dalpiaz2018      | g17-cask             | US12            |
| Dalpiaz2018      | g18-neurohub         | US13            |
| Dalpiaz2018      | g19-alfred           | US14            |
| Dalpiaz2018      | g21-badcamp          | US15            |
| Dalpiaz2018      | g22-rdadmp           | US16            |
| Dalpiaz2018      | g23-archivesspace    | US17            |
| Dalpiaz2018      | g24-unibath          | US18            |
| Dalpiaz2018      | g25-duraspace        | US19            |
| Dalpiaz2018      | g26-racdam           | US20            |
| Dalpiaz2018      | g27-culrepo          | US21            |
| Dalpiaz2018      | g28-zooniverse       | US22            |
| Dalpiaz2020      | g1                   | US23            |
| Dalpiaz2020      | g2                   | US24            |
| Dalpiaz2020      | g4                   | US25            |
| Dalpiaz2020      | g5                   | US26            |
| Dalpiaz2020      | g6                   | US27            |
| Dalpiaz2020      | g8                   | US28            |
| Dalpiaz2020      | g9                   | US29            |
| Dalpiaz2020      | g10                  | US30            |
| Dalpiaz2020      | g11                  | US31            |
| Dalpiaz2020      | g12                  | US32            |
| Dalpiaz2020      | g14                  | US33            |
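
The mapping in the table above can also be expressed in code, which may be convenient when tracing a translated file back to its source. The following is a minimal sketch, not part of the repository: the helper names `build_mapping` and `translated_path` are hypothetical, and the `.txt` extension and the `US<n>.txt` naming inside the Translated Dataset folder are assumptions that this README does not confirm.

```python
from pathlib import Path

# Original file names, in the order they appear in the table.
DALPIAZ2018 = [
    "g002-federalspending", "g03-loudoun", "g04-recycling",
    "g05-openspending", "g08-frictionless", "g10-scrumalliance",
    "g11-nsf", "g12-camperplus", "g13-planningpoker", "g14-datahub",
    "g16-mis", "g17-cask", "g18-neurohub", "g19-alfred", "g21-badcamp",
    "g22-rdadmp", "g23-archivesspace", "g24-unibath", "g25-duraspace",
    "g26-racdam", "g27-culrepo", "g28-zooniverse",
]
DALPIAZ2020 = ["g1", "g2", "g4", "g5", "g6", "g8", "g9",
               "g10", "g11", "g12", "g14"]


def build_mapping():
    """Return {identifier: (dataset, original_file)} for US1..US33."""
    rows = [("Dalpiaz2018", name) for name in DALPIAZ2018]
    rows += [("Dalpiaz2020", name) for name in DALPIAZ2020]
    # Identifiers are assigned sequentially, starting at US1.
    return {f"US{i}": row for i, row in enumerate(rows, start=1)}


def translated_path(identifier, root="Translated Dataset", ext=".txt"):
    """Build the path of a translated file.

    ASSUMPTION: files are named '<identifier><ext>' directly under `root`;
    adjust `root` and `ext` to match the actual repository layout.
    """
    return Path(root) / f"{identifier}{ext}"
```

For example, `build_mapping()["US23"]` gives the dataset and original file name behind the translated file US23, and `translated_path("US23")` gives the location where that file would be expected under the stated assumptions.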

Manually generated UML diagrams

For each of the translated files, there is a folder named with the same identifier as that file, which contains:

  • Class diagram
  • Use case diagrams
  • List of entities present
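
To check which identifier folders are present in a local copy and what each one holds, a short script along the following lines may help. It is only a sketch: the function name `list_diagram_folders` is hypothetical, and it assumes that the folders are named US1 to US33 and sit directly under the directory passed as `root`, which this README does not state explicitly.

```python
from pathlib import Path


def list_diagram_folders(root, count=33):
    """Map each identifier folder found under `root` to its sorted file names.

    ASSUMPTION: folders are named 'US1' .. 'US<count>' and are direct
    children of `root`. Folders that do not exist are skipped.
    """
    root = Path(root)
    found = {}
    for i in range(1, count + 1):
        folder = root / f"US{i}"
        if folder.is_dir():
            found[folder.name] = sorted(p.name for p in folder.iterdir())
    return found


if __name__ == "__main__":
    # Print the contents of every identifier folder in the current directory.
    for identifier, files in list_diagram_folders(".").items():
        print(identifier, files)
```

Comparing the printed file names against the three items listed above (class diagram, use case diagrams, list of entities) gives a quick completeness check for each folder.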

Papers

  • Tovar Onofre, M.Á., Camargo, J.E. (2023). Automatic Class Extraction from Spanish Text of User Stories Using Natural Language Processing. In: Narváez, F.R., Urgilés, F., Bastos-Filho, T.F., Salgado-Guerrero, J.P. (eds) Smart Technologies, Systems and Applications. SmartTech-IC 2022. Communications in Computer and Information Science, vol 1705. Springer, Cham. https://doi.org/10.1007/978-3-031-32213-6_3
