nwilliams770 / markov-of-middle-earth

πŸ§™β€β™‚οΈβ›“πŸ’ Text generator using a Markov Chain, model trained on the Lord of The Rings Trilogy. Python 3.6

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Lord of the Markov

Markov of Middle-Earth

A Markov chain implementation, trained on J. R. R. Tolkien's Lord of the Rings trilogy.

Usage

(python markov.py) sentence <int>  | sentence of <int> length
(python markov.py) sentences <int> | <int> # of sentences

Features & Technologies

  • Trains a reusable model on any .txt corpus
  • Ability to produce an n-length sentence or n sentences
  • Python 3.6

Implementation

  • Build Phase:
    • In order to generate a model with a more structured sense of syntax, the corpus is broken into sentence arrays and sandwiched with "START" and "END" markers. Including these markers in our training data ultimately limits what words and characters will be used to start and terminate sentences when we want to generate sentences.

    • Our model is a simple dict() and as sentences are parsed word by word, each unique word is assigned as a key in the model where the value is a probability distribution represented as a FrequencyGram

    • For a given word, the FrequencyGram counts the frequencies of all words that directly follow it. From this, we can derive the current state and possible outcomes for any given word

    • Once we've iterated over the entire corpus, our model will be a dict where each key is a unique word, marker, or punctuation mark and each value is it's corresponding FrequencyGram

      FrequencyGram for "twilight"
      "twilight": {
          "the": 1,
          ",": 2,
          "in": 1,
          "came": 1,
          "as": 1,
          "under": 1,
          ".": 1 
      }
      
    • FrequencyGram has some added methods that generate a probability distribution and returns a weighted random word based on it. This is how we'll traverse from state to state.

  • Generative Phase:
    • Generating a sentence:
      • We'll first generate a starter word by requesting a random word from our "START" FrequencyGram: sentence_starter = markov_model["START"].return_weighted_rand_word()
      • From here, we just follow the chain. We can track our current word and update it accordingly as we request a random word from the corresponding FrequencyGram
      • We'll know that our sentence is complete once our current word is the "END" marker.

Examples

  • Below are some of the best examples generated so far. The model tends to struggle with generating clear dialogue blocks but some of the smaller sentences were impressive! Lord of the Markov Examples

About

πŸ§™β€β™‚οΈβ›“πŸ’ Text generator using a Markov Chain, model trained on the Lord of The Rings Trilogy. Python 3.6


Languages

Language:Python 100.0%