henry232323 / LanguageGen

A first test at language generation

Language Gen

I'd like this to be a tool for generating random language stuff, given a sound inventory and maybe some other random parameters. For now, it only has a sound change applier. The sound changes can either be configured directly, or chosen at random (up to a given number of changes) and applied to your corpus.

The sound changes are scraped from index-fixed.txt. That file is a pre-processed version of index-diachronica.txt, which was converted to text from index-diachronica.pdf, available on the Index Diachronica website. I realize the Index Diachronica has some questionable entries, like sound changes supposedly from the controversial and generally rejected Altaic language family. They are included anyway, since this tool is meant more for interesting changes that could plausibly happen. It's a bit buggy sometimes, but it usually works pretty well.

simulator.py contains the runchanges function. It takes a corpus, a number of rules to apply, and optionally a set of rules, which defaults to changes.default_rules (generated in changes.py). Each rule is a regex target paired with a replacement function. Rules are generated by changes.parse; see test.py for an example of how to use it.
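Here is a rough illustration of the applier described above, as a minimal Python sketch. The names run_changes, Rule and example_rules are hypothetical and do not reflect the repository's actual runchanges signature. The sketch assumes the corpus is a list of words and that each rule is a (compiled regex, replacement function) pair. Rules are chosen at random and applied one after another.

```python
import random
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical rule shape: a compiled regex target paired with a function that
# builds the replacement text from each match.
Rule = Tuple[re.Pattern, Callable[[re.Match], str]]


def run_changes(corpus: List[str], n: int, rules: List[Rule],
                rng: Optional[random.Random] = None) -> List[str]:
    """Apply up to n randomly chosen rules, in sequence, to every word."""
    rng = rng or random.Random()
    # Never ask for more rules than exist; sample() picks without repeats.
    chosen = rng.sample(rules, min(n, len(rules)))
    words = list(corpus)
    for pattern, replace in chosen:
        words = [pattern.sub(replace, word) for word in words]
    return words


# Two hand-written example rules: intervocalic t -> d, word-final n -> ŋ.
example_rules: List[Rule] = [
    (re.compile(r"(?<=[aeiou])t(?=[aeiou])"), lambda m: "d"),
    (re.compile(r"n$"), lambda m: "ŋ"),
]

if __name__ == "__main__":
    print(run_changes(["pata", "tan", "kotin"], 2, example_rules, random.Random(0)))
```

Because random.sample draws without replacement, no rule fires twice in one run. Passing a seeded random.Random makes a run reproducible.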

changes.parse understands the following notation:

  • A left side and a right side, separated by →. Multiple items are treated as paired lists, e.g. "t tʃ k kw → d dʒ ɡ gw" maps 't' to 'd', 'tʃ' to 'dʒ', and so on.
  • A number of sound groups can be matched. These are defined in changes.codes, which maps a letter to a group of sounds:
    • A affricates
    • C consonants
    • B back vowels
    • D voiced plosives
    • E front vowels
    • F fricatives
    • H pharyngeal and glottal consonants
    • J approximants
    • K velars
    • Ḱ palatovelars
    • N nasals
    • P bilabials
    • Q uvulars
    • R resonants / sonorants
    • S plosives
    • T voiceless plosives
    • V vowels
    • W glides
  • You can mark devoicing in changes, e.g. C[+voice] → C[-voice]
  • You can mark multiple alternatives with braces: ɒ → {ɛ,e} and {ɛ,e} → ɒ will each create two rules.
  • You can mark where a sequence must occur by following the rule with / #_, / _#, or / _#_
  • Surround a sequence with parentheses to make it optional: o(ː) → u will target [o] or [oː].
  • Spaces mark all separations between items, so don't put spaces in willy-nilly.
  • You can target combinations of things: o(ː)n → õ / _# will target [on] or [oːn] and replace it with nasalized [õ], word-finally.
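The notation above could be turned into regex rules roughly as follows. This is a hypothetical sketch, not the actual changes.parse. CODES is an illustrative two-class subset standing in for changes.codes. Only the #_ and _# environments are handled, and they are assumed to mean word-initial and word-final with each word processed on its own. [±voice] features are not supported. The output is a list of (compiled regex, replacement function) pairs, the same rule shape described for runchanges.

```python
import itertools
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical subset of class codes; the real changes.codes has its own,
# fuller inventories for each letter.
CODES: Dict[str, List[str]] = {
    "V": ["a", "e", "i", "o", "u"],
    "N": ["m", "n", "ŋ"],
}

Rule = Tuple[re.Pattern, Callable[[re.Match], str]]


def _item_to_regex(item: str) -> str:
    """Class letters become alternations; (x) becomes an optional group."""
    out = []
    for ch in item:
        if ch in CODES:
            out.append("(?:" + "|".join(map(re.escape, CODES[ch])) + ")")
        elif ch == "(":
            out.append("(?:")
        elif ch == ")":
            out.append(")?")
        else:
            out.append(re.escape(ch))
    return "".join(out)


def _expand_braces(side: str) -> List[str]:
    """'x{a,b}' -> ['xa', 'xb']: each brace alternative yields its own string."""
    parts = re.split(r"(\{[^}]*\})", side)
    options = [p[1:-1].split(",") if p.startswith("{") else [p] for p in parts]
    return ["".join(combo) for combo in itertools.product(*options)]


def parse_rule(rule: str) -> List[Rule]:
    """Parse 'LEFT → RIGHT [/ ENV]' into (regex, replacement function) rules."""
    change, _, env = rule.partition("/")
    left, right = (side.strip() for side in change.split("→"))
    env = env.strip()
    prefix = "^" if env == "#_" else ""  # word-initial
    suffix = "$" if env == "_#" else ""  # word-final; other environments ignored
    rules: List[Rule] = []
    for lhs, rhs in itertools.product(_expand_braces(left), _expand_braces(right)):
        # Space-separated items pair up by position; longer items are tried
        # first so that 'tʃ' is matched before 't'.
        pairs = sorted(zip(lhs.split(), rhs.split()), key=lambda p: -len(p[0]))
        alternation = "|".join("(" + _item_to_regex(src) + ")" for src, _ in pairs)
        pattern = re.compile(prefix + "(?:" + alternation + ")" + suffix)
        targets = [dst for _, dst in pairs]
        # m.lastindex is the number of the item group that matched.
        rules.append((pattern, lambda m, t=targets: t[m.lastindex - 1]))
    return rules


def apply_rules(rules: List[Rule], word: str) -> str:
    """Apply each rule in turn to a single word."""
    for pattern, replace in rules:
        word = pattern.sub(replace, word)
    return word


if __name__ == "__main__":
    print(apply_rules(parse_rule("t tʃ k → d dʒ ɡ"), "tʃata"))
    print(apply_rules(parse_rule("o(ː)n → õ / _#"), "boːn"))
    print(len(parse_rule("ɒ → {ɛ,e}")))
```

Paired items are merged into one alternation so they apply simultaneously. Applied one by one, t → d would turn tʃ into dʃ before tʃ → dʒ had a chance to run.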
