This is the first lexical simplification dataset for Swedish. It was developed as part of a Bachelor's thesis in Cognitive Science at Linköping University. It contains 150 quadruples, each consisting of a complex word sourced from the Swedish Kelly list, its corpus frequency in the "BloggMix Odat" corpus, replacements for the complex word sourced from SynLex together with their word frequencies in the same BloggMix corpus, and an example sentence from SALDO in which the complex word occurs. The dataset also includes a human assessment of each quadruple with regard to quality, coverage, and complexity.
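To make the shape of a quadruple concrete, here is a minimal sketch in Python. The class names, field names, placeholder words, and frequency values are all hypothetical and chosen for illustration only; they are not the dataset's actual column names or contents. The helper that ranks replacements by frequency is likewise an assumption about how the frequencies might be used, not necessarily the method used in the thesis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replacement:
    word: str       # replacement candidate (sourced from SynLex in the dataset)
    frequency: int  # corpus frequency in BloggMix (made-up value below)


@dataclass
class SimplificationEntry:
    # Hypothetical field names; the real files may be laid out differently.
    complex_word: str                # word from the Swedish Kelly list
    complex_frequency: int           # its frequency in BloggMix
    replacements: List[Replacement]  # candidates with their frequencies
    example_sentence: str            # SALDO sentence containing the word


def more_frequent_replacements(entry: SimplificationEntry) -> List[Replacement]:
    """Return candidates more frequent than the complex word, most frequent first."""
    candidates = [
        r for r in entry.replacements if r.frequency > entry.complex_frequency
    ]
    return sorted(candidates, key=lambda r: r.frequency, reverse=True)


# Placeholder entry with invented words and counts, not real dataset content.
entry = SimplificationEntry(
    complex_word="WORD_A",
    complex_frequency=120,
    replacements=[
        Replacement("WORD_B", 4500),
        Replacement("WORD_C", 80),
        Replacement("WORD_D", 900),
    ],
    example_sentence="... WORD_A ...",
)

ranked = more_frequent_replacements(entry)
print([r.word for r in ranked])  # ['WORD_B', 'WORD_D']
```

Using higher corpus frequency as a rough proxy for a simpler word is a common heuristic in lexical simplification, which is why the sketch filters on it. The human assessments of quality, coverage, and complexity are left out here because their storage format is not described above.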
For a more detailed description of the work, please follow this link: http://liu.diva-portal.org/smash/get/diva2:1767273/FULLTEXT01.pdf
Other repositories and resources related to this thesis:
Lexical Simplification System for Swedish: https://github.com/emilgraichen/SwedishLexicalSimplifier
Complex Word Identification Dataset: https://github.com/emilgraichen/SwedishCWI
BloggMix Odat: https://spraakbanken.gu.se/resurser/bloggmix
Kelly Swedish: https://spraakbanken.gu.se/resurser/kelly