merely-useful / py-rse

Research Software Engineering with Python course material

Home Page: http://third-bit.com/py-rse/

Random word generator library is tricky

k8hertweck opened this issue

The library we use to create a list of random words for testing seems to be a bit unstable, so we might want to include a contingency plan.

That's a shame, because there are only two random word generation libraries that I could find:

The latter will be more stable because it doesn't actually source real words from anywhere; it just mashes together a random selection of letters to form "words".
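For reference, "mashing together a random selection of letters" can be sketched with the standard library alone, which could also serve as a contingency plan if neither package is available. This is a minimal sketch, not RandomWordGenerator's actual implementation: the function names `make_random_word` and `make_random_words`, the length bounds, and the uniqueness guarantee are all assumptions made for illustration.

```python
import random
import string


def make_random_word(rng, min_len=3, max_len=10):
    """Return a lowercase "word" built from randomly chosen letters."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def make_random_words(num_of_words, seed=None):
    """Return a list of unique random "words" (hypothetical helper)."""
    rng = random.Random(seed)  # a seed makes the test data reproducible
    words = []
    seen = set()
    while len(words) < num_of_words:
        word = make_random_word(rng)
        if word not in seen:  # skip duplicates so each rank gets its own word
            seen.add(word)
            words.append(word)
    return words
```

Calling `make_random_words(600, seed=42)` would give a list that could stand in for the library-generated word list, at the cost of the words not being real.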

Despite the downside of not having a list of real words, we should probably use RandomWordGenerator given the issues with the other package. The relevant part of the testing chapter would read as follows:

Fortunately, a Python library called RandomWordGenerator exists to do just that. We can install it using pip, the Python Package Installer:

$ pip install Random-Word-Generator

Borrowing from the word count distribution we created for test_alpha, we can then create a text file full of random words with a frequency distribution that corresponds to an α of approximately 1.0:

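import numpy as np
from RandomWordGenerator import RandomWord

# Word counts that fall off as 1/rank, which gives an α of approximately 1.0
max_freq = 600
word_counts = np.floor(max_freq / np.arange(1, max_freq + 1))

# Generate one random "word" per rank
rw = RandomWord()
random_words = rw.getList(num_of_words=max_freq)

# Write one line per word, repeating the word according to its count;
# the with statement closes the file even if an error occurs
with open('test_data/random_words.txt', 'w') as writer:
    for index in range(max_freq):
        count = int(word_counts[index])
        word_sequence = f"{random_words[index]} " * count
        writer.write(word_sequence + '\n')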
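Before relying on the generated file, it may be worth checking that counting the words back reproduces the intended frequencies. The sketch below is one possible sanity check, built on assumptions: it uses placeholder words (`word1`, `word2`, ...) and an in-memory string instead of the real generator and file, and it uses integer division, which matches `np.floor` for these positive values.

```python
from collections import Counter

max_freq = 600

# Placeholder words stand in for the generated ones (assumption for illustration)
words = [f"word{rank}" for rank in range(1, max_freq + 1)]

# Same counts as np.floor(max_freq / rank), computed with integer division
expected = {word: max_freq // rank for rank, word in enumerate(words, start=1)}

# Build the text the same way the script does: one repeated word per line
lines = [f"{word} " * count for word, count in expected.items()]
text = "\n".join(lines) + "\n"

# Count the words back and compare with what we intended to write
observed = Counter(text.split())
assert dict(observed) == expected
assert observed.most_common(1)[0] == ("word1", max_freq)
```

One caveat worth keeping in mind: if a generator ever returns the same word twice, the two counts merge and the check fails, so duplicates in the word list are worth ruling out first.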

Included in #560

resolved by d78fd0e

I have released a new version on PyPI (https://pypi.org/project/Random-Word/1.0.6/), which will fix the recent issues. Next weekend, I will refactor the whole repository to support multiple sources such as Oxford.

PS: I didn't expect this small project to blow up; I only made it because I wanted to use it in one of my projects at university.
Cc @k8hertweck / @DamienIrving