sentientmachine / generating_movie_reviews_using_recurrent_neural_nets

Create an endless supply of machine-generated positive and negative natural-English paragraph movie reviews, using recurrent neural networks trained on Stanford's "Large movie review" dataset.

Generating movie reviews using recurrent neural nets:

The purpose of this project is to demonstrate my coding and writing skills with TensorFlow, Keras RNNs, Python, matplotlib, and NumPy by creating an endless supply of machine-generated positive and negative natural-English paragraph movie reviews.

Training Dataset Download source:

The data to train the recurrent neural network comes from anonymous English movie reviews from Stanford's "Large movie review" dataset in traditional written English paragraph form.

Download the ~80 MB aclImdb tar file from: http://ai.stanford.edu/~amaas/data/sentiment

Parse the dataset to be useful for our purposes:

Extract the aclImdb tar file locally; extraction produces an aclImdb directory.

The subset of the data I want is in: ~/aclImdb/train/pos and ~/aclImdb/train/neg.

Each positive and negative review is in its own text file, so data preparation and cleaning are required.

Data conversion, transformation, cleaning, preparation:

A short script can join all the separate review files into a single file with a different positive review on every line, and do the same for all negative reviews.

Keeping things as simple as possible, bash commands can do this quickly:

cd ~/aclImdb/train/pos
for f in *.txt; do (cat "${f}"; echo) >> positive_movie_reviews.txt; done

And

cd ~/aclImdb/train/neg
for f in *.txt; do (cat "${f}"; echo) >> negative_movie_reviews.txt; done
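The same join can be done portably in Python. A minimal sketch, assuming one review per file (the function name and paths here are my own, not from the repo):

```python
from pathlib import Path

def join_reviews(src_dir, out_path):
    """Write every .txt review found in src_dir to out_path, one review per line."""
    with open(out_path, "w", encoding="utf-8") as out:
        for review_file in sorted(Path(src_dir).glob("*.txt")):
            # Reviews are single paragraphs; flatten any stray newlines just in case.
            text = review_file.read_text(encoding="utf-8").replace("\n", " ").strip()
            out.write(text + "\n")
```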

See an example of a positive movie review:

$ head -n 1 positive_movie_reviews.txt

"Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life, such as "Teachers". My 35 years in the teaching profession lead me to believe that Bromwell High's satire is much closer to reality than is "Teachers". The scramble to survive financially, the insightful students who can see right through their pathetic teachers' pomp, the pettiness of the whole situation, all remind me of the schools I knew and their students. When I saw the episode in which a student repeatedly tried to burn down the school, I immediately recalled ......... at .......... High. A classic line: INSPECTOR: I'm here to sack one of your teachers. STUDENT: Welcome to Bromwell High. I expect that many adults of my age think that Bromwell High is far fetched. What a pity that it isn't!"

See an example of a negative movie review:

$ head -n 1 negative_movie_reviews.txt

"Story of a man who has unnatural feelings for a pig. Starts out with a opening scene that is a terrific example of absurd comedy. A formal orchestra audience is turned into an insane, violent mob by the crazy chantings of it's singers. Unfortunately it stays absurd the WHOLE time with no general narrative eventually making it just too off putting. Even those from the era should be turned off. The cryptic dialogue would make Shakespeare seem easy to a third grader. On a technical level it's better than you might think with some good cinematography by future great Vilmos Zsigmond. Future stars Sally Kirkland and Frederic Forrest can be seen briefly."
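With one review per line, a character-level RNN needs each character mapped to an integer id before training and mapped back after generation. A minimal encoding sketch (helper names are illustrative, not taken from main.py):

```python
def build_vocab(text):
    """Map each distinct character to an integer id and back."""
    chars = sorted(set(text))
    char_to_id = {c: i for i, c in enumerate(chars)}
    id_to_char = {i: c for c, i in char_to_id.items()}
    return char_to_id, id_to_char

def encode(text, char_to_id):
    """Turn a string into the list of integer ids the network trains on."""
    return [char_to_id[c] for c in text]

def decode(ids, id_to_char):
    """Turn generated ids back into readable text."""
    return "".join(id_to_char[i] for i in ids)
```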

Install requirements and libraries:

I'm using macOS; instructions vary between operating systems. See buildrun.sh for notes on the commands I used to install libraries and tools.

Run Instructions

Either run directly:

python3 main.py

Or use the buildrun.sh script:

./buildrun.sh

Results as the model trains toward convergence.

Positive movie review generated by RNN:

Note how the text gets better as training progresses:

Optimization Iteration: 0, Training Loss: [4.889333724975586]
Text generated: we(’£z,p| c<,¤3è°9³(]róe=a|��"=	käg0₤­·'?¿~(�t5qiaêf#�¨n_�96o`
Time elapsed: 0:00:04

Optimization Iteration: 50, Training Loss: [3.0646657943725586]
Text generated: weòw e efaob  'dt wyhd aheti ceelcaob eeomahtnt i intc tair na
Time elapsed: 0:00:39

Optimization Iteration: 1000, Training Loss: [2.894775390625]
Text generated: weiis norer salerylt eatca t.rfs pir inni sh.as n roukiasw cas
Time elapsed: 0:12:45


Optimization Iteration: 1200, Training Loss: [2.7938120365142822]
Text generated: wei p onsl lr is  l  r uwos sr es loxite ou . ennbiiplem  ntl
Time elapsed: 0:15:17

Optimization Iteration: 1350, Training Loss: [2.4512953758239746]
Text generated: weeliti- s noin sha isipisdong gong as otle.ahesiolmedel ieet,
Time elapsed: 0:17:04

Optimization Iteration: 1450, Training Loss: [2.3751416206359863]
Text generated: wee re, tulle bleakceeesdind sintti, hors at the leciz an ind
Time elapsed: 0:18:23

Optimization Iteration: 1500, Training Loss: [2.2491776943206787]
Text generated: weind ing halahes thit taua ar the that's 4 owce bughuononi h
Time elapsed: 0:19:12

Optimization Iteration: 1600, Training Loss: [2.164762496948242]
Text generated: wehinwnr seals.  o cle muyher... be folr komk the feoxing aba
Time elapsed: 0:20:43

Optimization Iteration: 1800, Training Loss: [2.018953323364258]
Text generated: weale pay a baridery wowit the loles one ou kousroo, a male oe
Time elapsed: 0:23:15

Optimization Iteration: 1850, Training Loss: [1.9569777250289917]
Text generated: weot all if mixker, thut this nowpl one the tongered with houk
Time elapsed: 0:23:53

Optimization Iteration: 1900, Training Loss: [1.94167160987854]
Text generated: weing. becofilipaillyched the't he hel ersion jist. that misio
Time elapsed: 0:24:34

Optimization Iteration: 1950, Training Loss: [1.9175297021865845]
Text generated: weam and in tho art a tagh theys ricaler.. by agaid who clued
Time elapsed: 0:25:09

Optimization Iteration: 2050, Training Loss: [1.8174198865890503]
Text generated: we. det no many end is al very. carvy trach peting, dough than
Time elapsed: 0:26:27

Optimization Iteration: 2100, Training Loss: [1.7918057441711426]
Text generated: we enestmand gigunagubolesion.  This movie. an this is flung a
Time elapsed: 0:26:59

Optimization Iteration: 2300, Training Loss: [1.719719409942627]
Text generated: weheres; hiss alwo and as the feave mis, that ear from his aca
Time elapsed: 0:29:16

Optimization Iteration: 2350, Training Loss: [1.6416845321655273]
Text generated: we heng relating fallia got as gtoral clanti and that contros-
Time elapsed: 0:29:49

Optimization Iteration: 2400, Training Loss: [1.6665116548538208]
Text generated: we of the for known deal pabent) to work sone to foun canytimu
Time elapsed: 0:30:23


Optimization Iteration: 2500, Training Loss: [1.6678515672683716]
Text generated: weire, frente preft a phentia resire. this touse for young yin
Time elapsed: 0:31:34

Optimization Iteration: 2700, Training Loss: [1.5771021842956543]
Text generated: weaturely wearion, i comtragent, after ughard actors.<br /><br
Time elapsed: 0:33:57

Optimization Iteration: 2750, Training Loss: [1.5773215293884277]
Text generated: we up to family are this is need in seem who seemphy seems the
Time elapsed: 0:34:33

Optimization Iteration: 2800, Training Loss: [1.5664803981781006]
Text generated: we give songly interplay good could you mention an every class
Time elapsed: 0:35:10

Optimization Iteration: 2850, Training Loss: [1.6122658252716064]
Text generated: weherven angwor to mecheared & home we out it's hall and his s
Time elapsed: 0:35:59

Optimization Iteration: 2900, Training Loss: [1.5541921854019165]
Text generated: we deatt one of the same lific viewing carderdeea (and in dead
Time elapsed: 0:36:45

Optimization Iteration: 2950, Training Loss: [1.5128093957901]
Text generated: weilwewnattarta neur bisi, i have a yead in aught. the two epi
Time elapsed: 0:37:20

Optimization Iteration: 3000, Training Loss: [1.5613665580749512]
Text generated: week who here, but the cultroly"'s...5atary of the endure for
Time elapsed: 0:37:53

Optimization Iteration: 3050, Training Loss: [1.5188218355178833]
Text generated: weoner" in deasing in great possible. the faked with the dymb
Time elapsed: 0:38:38

Optimization Iteration: 3100, Training Loss: [1.5796984434127808]
Text generated: weose to yessing to the middre to real stunning it see rather
Time elapsed: 0:39:19

Optimization Iteration: 3150, Training Loss: [1.5575224161148071]
Text generated: weatle. body can robbing myservolessly, this i though 1d (segs
Time elapsed: 0:39:51

Optimization Iteration: 3200, Training Loss: [1.5163943767547607]
Text generated: weir and a seems is the partyey go (fey to charge of the most
Time elapsed: 0:40:31

Optimization Iteration: 3250, Training Loss: [1.5409008264541626]
Text generated: weeks of laugring serie (have free studying upout us good up o
Time elapsed: 0:41:12

Optimization Iteration: 3300, Training Loss: [1.5678426027297974]
Text generated: weother than the truch of its biff life for o like film, the s
Time elapsed: 0:41:53

Optimization Iteration: 3350, Training Loss: [1.508866310119629]
Text generated: we hala mattie's is this relative. part story in the bins of t
Time elapsed: 0:42:32

Conclusion:

The RNN converges around a training loss of 1.50. Reading the autogenerated sentences certainly gives a movie-review vibe.
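Each generated snippet above comes from repeatedly sampling the next character from the network's output distribution. A generic temperature-sampling helper, sketching the idea rather than the exact code in main.py:

```python
import numpy as np

def sample_next_char_id(logits, temperature=1.0, rng=None):
    """Sample one character id from unnormalized scores; lower temperature
    makes the output more conservative, higher makes it more adventurous."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())   # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```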
