The purpose of this project is to demonstrate my coding and writing skills with TensorFlow, Keras RNNs, Python, matplotlib, and NumPy by creating an endless supply of machine-generated positive and negative movie reviews, written as natural English paragraphs.
The data used to train the recurrent neural network comes from anonymous English movie reviews in Stanford's "Large Movie Review Dataset", which are written in traditional English paragraph form.
Download the aclImdb tar file (about 80MB) from: http://ai.stanford.edu/~amaas/data/sentiment
Extract the aclImdb tar file locally, which produces an aclImdb directory.
The subset of the data I want is in ~/aclImdb/train/pos and ~/aclImdb/train/neg.
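The extraction step can also be sketched in Python with the standard-library tarfile module. This is a minimal sketch, not part of the project: the function name extract_reviews and the archive path passed to it are assumptions for illustration, and it only checks that the two directories named above exist after extraction.

```python
import tarfile
from pathlib import Path


def extract_reviews(archive_path, dest_dir):
    """Extract the aclImdb archive into dest_dir and return (train/pos, train/neg).

    Hypothetical helper: the name and signature are illustrative only.
    """
    archive_path = Path(archive_path).expanduser()
    dest = Path(dest_dir).expanduser()
    dest.mkdir(parents=True, exist_ok=True)

    # tarfile.open detects gzip compression automatically in the default mode.
    # Only extract archives you trust: tar members can carry arbitrary paths.
    with tarfile.open(archive_path) as archive:
        archive.extractall(dest)

    pos_dir = dest / "aclImdb" / "train" / "pos"
    neg_dir = dest / "aclImdb" / "train" / "neg"
    for directory in (pos_dir, neg_dir):
        if not directory.is_dir():
            raise FileNotFoundError(f"expected directory missing: {directory}")
    return pos_dir, neg_dir
```

For example, extract_reviews("~/Downloads/aclImdb_v1.tar.gz", "~") would produce ~/aclImdb, assuming the archive was saved under that (assumed) file name.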
Each positive and negative review is in its own text file, so some data preparation is required.
A short script can join all the separate review files into a single file that has a different positive review on every line, and do the same for the negative reviews.
Keeping things as simple as possible, a couple of bash loops can do this quickly. Note that >> appends, so delete any existing output file before rerunning:
cd ~/aclImdb/train/pos
for f in *.txt; do (cat "${f}"; echo) >> positive_movie_reviews.txt; done
And for the negative reviews:
cd ~/aclImdb/train/neg
for f in *.txt; do (cat "${f}"; echo) >> negative_movie_reviews.txt; done
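The same joining step can be sketched in Python for anyone who prefers to stay out of the shell. This is a rough equivalent under stated assumptions, not the script the project uses: join_reviews is a hypothetical helper name, reviews are read as UTF-8, and, unlike the bash loop, it overwrites the output file and skips it as an input, so it is safe to rerun.

```python
from pathlib import Path


def join_reviews(review_dir, output_path):
    """Write every *.txt review in review_dir to output_path, one per line.

    Hypothetical helper: returns the number of reviews written.
    """
    review_dir = Path(review_dir).expanduser()
    output_path = Path(output_path).expanduser()

    # Build the input list first and exclude the output file itself,
    # in case it lives in the same directory as the reviews.
    review_files = sorted(
        p for p in review_dir.glob("*.txt")
        if p.resolve() != output_path.resolve()
    )

    with output_path.open("w", encoding="utf-8") as out:
        for review_file in review_files:
            text = review_file.read_text(encoding="utf-8")
            # Replace internal line breaks so each review stays on one line.
            out.write(" ".join(text.splitlines()).strip() + "\n")
    return len(review_files)
```

For example, join_reviews("~/aclImdb/train/pos", "~/aclImdb/train/pos/positive_movie_reviews.txt") mirrors the first loop, and the same call with the neg directory mirrors the second.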
$ head -n 1 positive_movie_reviews.txt
"Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life, such as "Teachers". My 35 years in the teaching profession lead me to believe that Bromwell High's satire is much closer to reality than is "Teachers". The scramble to survive financially, the insightful students who can see right through their pathetic teachers' pomp, the pettiness of the whole situation, all remind me of the schools I knew and their students. When I saw the episode in which a student repeatedly tried to burn down the school, I immediately recalled ......... at .......... High. A classic line: INSPECTOR: I'm here to sack one of your teachers. STUDENT: Welcome to Bromwell High. I expect that many adults of my age think that Bromwell High is far fetched. What a pity that it isn't!"
$ head -n 1 negative_movie_reviews.txt
"Story of a man who has unnatural feelings for a pig. Starts out with a opening scene that is a terrific example of absurd comedy. A formal orchestra audience is turned into an insane, violent mob by the crazy chantings of it's singers. Unfortunately it stays absurd the WHOLE time with no general narrative eventually making it just too off putting. Even those from the era should be turned off. The cryptic dialogue would make Shakespeare seem easy to a third grader. On a technical level it's better than you might think with some good cinematography by future great Vilmos Zsigmond. Future stars Sally Kirkland and Frederic Forrest can be seen briefly."
I'm using macOS; instructions vary between operating systems. See buildrun.sh for notes on the commands I used to install libraries and tools.
Either run directly:
python3 main.py
Or use buildrun.sh:
./buildrun.sh
Optimization Iteration: 0, Training Loss: [4.889333724975586]
Text generated: we(’£z,p| c<,¤3è°9³(]róe=a|��"= käg0₤·'?¿~(�t5qiaêf#�¨n_�96o`
Time elapsed: 0:00:04
Optimization Iteration: 50, Training Loss: [3.0646657943725586]
Text generated: weòw e efaob 'dt wyhd aheti ceelcaob eeomahtnt i intc tair na
Time elapsed: 0:00:39
Optimization Iteration: 1000, Training Loss: [2.894775390625]
Text generated: weiis norer salerylt eatca t.rfs pir inni sh.as n roukiasw cas
Time elapsed: 0:12:45
Optimization Iteration: 1200, Training Loss: [2.7938120365142822]
Text generated: wei p onsl lr is l r uwos sr es loxite ou . ennbiiplem ntl
Time elapsed: 0:15:17
Optimization Iteration: 1350, Training Loss: [2.4512953758239746]
Text generated: weeliti- s noin sha isipisdong gong as otle.ahesiolmedel ieet,
Time elapsed: 0:17:04
Optimization Iteration: 1450, Training Loss: [2.3751416206359863]
Text generated: wee re, tulle bleakceeesdind sintti, hors at the leciz an ind
Time elapsed: 0:18:23
Optimization Iteration: 1500, Training Loss: [2.2491776943206787]
Text generated: weind ing halahes thit taua ar the that's 4 owce bughuononi h
Time elapsed: 0:19:12
Optimization Iteration: 1600, Training Loss: [2.164762496948242]
Text generated: wehinwnr seals. o cle muyher... be folr komk the feoxing aba
Time elapsed: 0:20:43
Optimization Iteration: 1800, Training Loss: [2.018953323364258]
Text generated: weale pay a baridery wowit the loles one ou kousroo, a male oe
Time elapsed: 0:23:15
Optimization Iteration: 1850, Training Loss: [1.9569777250289917]
Text generated: weot all if mixker, thut this nowpl one the tongered with houk
Time elapsed: 0:23:53
Optimization Iteration: 1900, Training Loss: [1.94167160987854]
Text generated: weing. becofilipaillyched the't he hel ersion jist. that misio
Time elapsed: 0:24:34
Optimization Iteration: 1950, Training Loss: [1.9175297021865845]
Text generated: weam and in tho art a tagh theys ricaler.. by agaid who clued
Time elapsed: 0:25:09
Optimization Iteration: 2050, Training Loss: [1.8174198865890503]
Text generated: we. det no many end is al very. carvy trach peting, dough than
Time elapsed: 0:26:27
Optimization Iteration: 2100, Training Loss: [1.7918057441711426]
Text generated: we enestmand gigunagubolesion. This movie. an this is flung a
Time elapsed: 0:26:59
Optimization Iteration: 2300, Training Loss: [1.719719409942627]
Text generated: weheres; hiss alwo and as the feave mis, that ear from his aca
Time elapsed: 0:29:16
Optimization Iteration: 2350, Training Loss: [1.6416845321655273]
Text generated: we heng relating fallia got as gtoral clanti and that contros-
Time elapsed: 0:29:49
Optimization Iteration: 2400, Training Loss: [1.6665116548538208]
Text generated: we of the for known deal pabent) to work sone to foun canytimu
Time elapsed: 0:30:23
Optimization Iteration: 2500, Training Loss: [1.6678515672683716]
Text generated: weire, frente preft a phentia resire. this touse for young yin
Time elapsed: 0:31:34
Optimization Iteration: 2700, Training Loss: [1.5771021842956543]
Text generated: weaturely wearion, i comtragent, after ughard actors.<br /><br
Time elapsed: 0:33:57
Optimization Iteration: 2750, Training Loss: [1.5773215293884277]
Text generated: we up to family are this is need in seem who seemphy seems the
Time elapsed: 0:34:33
Optimization Iteration: 2800, Training Loss: [1.5664803981781006]
Text generated: we give songly interplay good could you mention an every class
Time elapsed: 0:35:10
Optimization Iteration: 2850, Training Loss: [1.6122658252716064]
Text generated: weherven angwor to mecheared & home we out it's hall and his s
Time elapsed: 0:35:59
Optimization Iteration: 2900, Training Loss: [1.5541921854019165]
Text generated: we deatt one of the same lific viewing carderdeea (and in dead
Time elapsed: 0:36:45
Optimization Iteration: 2950, Training Loss: [1.5128093957901]
Text generated: weilwewnattarta neur bisi, i have a yead in aught. the two epi
Time elapsed: 0:37:20
Optimization Iteration: 3000, Training Loss: [1.5613665580749512]
Text generated: week who here, but the cultroly"'s...5atary of the endure for
Time elapsed: 0:37:53
Optimization Iteration: 3050, Training Loss: [1.5188218355178833]
Text generated: weoner" in deasing in great possible. the faked with the dymb
Time elapsed: 0:38:38
Optimization Iteration: 3100, Training Loss: [1.5796984434127808]
Text generated: weose to yessing to the middre to real stunning it see rather
Time elapsed: 0:39:19
Optimization Iteration: 3150, Training Loss: [1.5575224161148071]
Text generated: weatle. body can robbing myservolessly, this i though 1d (segs
Time elapsed: 0:39:51
Optimization Iteration: 3200, Training Loss: [1.5163943767547607]
Text generated: weir and a seems is the partyey go (fey to charge of the most
Time elapsed: 0:40:31
Optimization Iteration: 3250, Training Loss: [1.5409008264541626]
Text generated: weeks of laugring serie (have free studying upout us good up o
Time elapsed: 0:41:12
Optimization Iteration: 3300, Training Loss: [1.5678426027297974]
Text generated: weother than the truch of its biff life for o like film, the s
Time elapsed: 0:41:53
Optimization Iteration: 3350, Training Loss: [1.508866310119629]
Text generated: we hala mattie's is this relative. part story in the bins of t
Time elapsed: 0:42:32
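The RNN converges at a training loss of around 1.50, which it reaches after roughly 3,000 iterations (just under 40 minutes in the run above). The autogenerated sentences are not yet fluent English, but reading them certainly gives a movie review vibe.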
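To check the convergence claim against a saved log rather than by eye, the log lines can be parsed with a regular expression. This is a minimal sketch that assumes the log keeps the exact "Optimization Iteration: N, Training Loss: [x]" format shown above; parse_training_log and mean_recent_loss are hypothetical helper names, not functions from main.py.

```python
import re

# Matches lines such as:
#   Optimization Iteration: 50, Training Loss: [3.0646657943725586]
LOG_PATTERN = re.compile(
    r"Optimization Iteration:\s*(\d+),\s*Training Loss:\s*\[([0-9.eE+-]+)\]"
)


def parse_training_log(log_text):
    """Return a list of (iteration, loss) pairs found in the log text."""
    return [(int(i), float(loss)) for i, loss in LOG_PATTERN.findall(log_text)]


def mean_recent_loss(history, window=5):
    """Average the loss over the last `window` logged entries."""
    recent = [loss for _, loss in history[-window:]]
    if not recent:
        raise ValueError("no loss values found in the log")
    return sum(recent) / len(recent)


# Three lines copied from the run above.
sample_log = """\
Optimization Iteration: 0, Training Loss: [4.889333724975586]
Text generated: (omitted)
Optimization Iteration: 50, Training Loss: [3.0646657943725586]
Optimization Iteration: 3350, Training Loss: [1.508866310119629]
"""
history = parse_training_log(sample_log)
print(history)
print(mean_recent_loss(history, window=1))
```

The resulting pairs could then be passed to matplotlib's plt.plot to draw the loss curve for a full run.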