carlosngo / StoryX

A system for screenwriters to generate a first draft of a screenplay adaptation from a short story.


Software Technology Department, Undergraduate Thesis Program

Converting Short Stories Into Screenplays Through Abstract Story Representation

Thesis Proponents

  • Garay, Kathleen Nicole (Undergraduate Student)
  • Kang, Jude Evander (Undergraduate Student)
  • Ngo, Carlos Miguel (Undergraduate Student)
  • Villaroman, Ma. Patricia (Undergraduate Student)
  • Fernandez, Ryan Austin (Thesis Adviser)


System Overview

  • A system for screenwriters to generate a first draft of a screenplay adaptation from a short story
  • Extracts story elements from a short story text file using Natural Language Processing
  • Represents the story elements as abstract data structures
  • Generates a screenplay from the abstract story representation, downloadable in PDF and TeX formats
  • Built using Django, SQLite, spaCy, and TeX Live

System Features

Convert Story to Screenplay

Convert a story to a screenplay in two simple steps:

  1. Provide the title, author, and the story .txt file.

Picture of Upload Page

  2. View and download the screenplay.

Picture of Screenplay Page

Annotate Story Elements

For evaluation purposes, the system can be used to annotate story elements.

Picture of Annotation Page

View Story Extraction and Representation Results

The story representation results can be viewed. Metrics on the left require the story to be annotated first.

Picture of Results Page

How to run the system

System Prerequisites

  1. Python 3.7.9
  2. virtualenv
  3. TeX distribution software, preferably TeX Live
  4. screenplay package for your chosen TeX distribution software

Easy setup

  1. If it is your first time running the project, run install.bat from the root directory. Initial setup may take a while.
  2. Run run.bat from the root directory.
  3. The project webpage will be shown after a few seconds.

Unexpected behavior

  1. If the webpage is unresponsive, refresh after 30 seconds.
  2. If the webpage is still unresponsive, try the manual setup.

Manual setup

Set up the coreference resolution server

  1. Go to the coref directory using the command prompt
  2. Create and activate a Python 3.7.9 virtual environment
py -m venv env
.\env\Scripts\activate
  3. Install dependencies
py -m pip install -r requirements.txt
  4. Download spaCy pre-trained models
py -m spacy download en_core_web_sm
  5. Run py manage.py runserver

Set up the main server

  1. Go to the main directory using the command prompt
  2. Create and activate a Python 3.7.9 virtual environment
py -m venv env
.\env\Scripts\activate
  3. Install dependencies
py -m pip install -r requirements.txt
  4. Download spaCy pre-trained models
py -m spacy download en_core_web_sm
  5. Set up database
py manage.py makemigrations
py manage.py migrate
  6. Run py manage.py runserver
  7. Open your browser and enter the URL localhost:8000

Unexpected behavior

  1. If an error occurs during the setup, please take a screenshot and contact the developers. Thank you.

Research Overview

Abstract

A story is a series of events that can be represented in many ways. Stories are diverse and follow no strict format. The screenplay is a medium for telling stories clearly and straightforwardly: it focuses on vital story elements, ordered so that the story's meaning is retained. Converting stories into screenplays is currently expensive in time and human resources because of the research and creativity needed to make a faithful adaptation. However, some strategies for converting stories to screenplays are repeatable for screenwriters. Thus, we created a system that automatically translates short stories into screenplays. Story elements were extracted and classified simultaneously and mapped to screenplay elements through abstract story representation. Of the story elements extracted, the system performed best on dialogue content and action lines, with precision, recall, and F1 scores above 60%. Readers were able to understand the screenplays across both corpora: across all story elements, their responses showed above 60% similarity to those of story readers, as measured by the Simple Matching Coefficient.

Research Document

The research document can be found here. Please request access if prompted.

Coref API Documentation

A lightweight REST API for coreference resolution in text. It uses neuralcoref, Hugging Face's coreference resolution extension for spaCy.

Get coreference clusters

GET /api/coref-clusters

Parameters

text : string
The text to resolve coreferences from.

Returns

coref: JSON
A JSON object containing the coreference clusters found in the text. coref[entity_start][entity_end] returns a list of mentions, where each mention is a 2-item list [mention_start, mention_end]. The *_start and *_end values are integers representing the indices of the spaCy Token objects at the start and end of the noun phrase, respectively.
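A minimal client sketch for this endpoint, using only the Python standard library. It assumes the response body is the coref object itself and that the coref server is reachable at the hypothetical COREF_BASE_URL below (the README does not state which port the coref server uses). Because JSON object keys are always strings, parse_clusters converts the token indices back to integers. fetch_coref_json and parse_clusters are illustrative names, not part of the project.

```python
import json
from typing import Dict, List, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: adjust to wherever the coref server's runserver is listening.
COREF_BASE_URL = "http://localhost:8000"

Span = Tuple[int, int]


def fetch_coref_json(text: str, base_url: str = COREF_BASE_URL) -> dict:
    """Call GET /api/coref-clusters with the text passed as a query parameter."""
    query = urlencode({"text": text})
    with urlopen(f"{base_url}/api/coref-clusters?{query}") as response:
        return json.loads(response.read().decode("utf-8"))


def parse_clusters(coref: dict) -> Dict[Span, List[Span]]:
    """Flatten coref[entity_start][entity_end] into {(start, end): [(m_start, m_end), ...]}.

    JSON keys arrive as strings, so token indices are converted back to int.
    """
    clusters: Dict[Span, List[Span]] = {}
    for entity_start, by_end in coref.items():
        for entity_end, mentions in by_end.items():
            clusters[(int(entity_start), int(entity_end))] = [
                (int(m_start), int(m_end)) for m_start, m_end in mentions
            ]
    return clusters
```

For example, parse_clusters({"0": {"2": [[5, 5], [9, 10]]}}) yields {(0, 2): [(5, 5), (9, 10)]}.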

Coref Class Documentation

CorefResolver

resolve_coreferences(text)

Parameters

text : string
The text to resolve coreferences from.

Returns

coref: dict
A dictionary containing the coreference clusters found in the text. coref[entity_start][entity_end] returns a list of mentions, where each mention is a 2-item list [mention_start, mention_end]. The *_start and *_end values are integers representing the indices of the spaCy Token objects at the start and end of the noun phrase, respectively.

Main API Documentation

View the landing page

GET /converter/stories

Parameters

No parameters.

Returns

An HTML response of the landing page for the application.

Generate the screenplay of a story

POST /converter/stories

Parameters

title : string
The title of the story

author : string
The author of the story

text_file : file
The text file of the story

Returns

An HTML response of the screenplay page for the story.

View the results of the element extraction module

GET /converter/stories/extraction-results

Parameters

No parameters.

Returns

An HTML response of the page for the element extraction results.

View the results of the story understanding module

GET /converter/stories/understanding-results

Parameters

No parameters.

Returns

An HTML response of the page for the story understanding results.

View the text file of a story

GET /converter/stories/:id/txt

Parameters

id : string
Unique identifier for the story.

Returns

A plaintext HTTP response of the text file of the specified story.

View the annotation page for a story

GET /converter/stories/:id/annotate

Parameters

id : string
Unique identifier for the story.

Returns

An HTML response of the annotation page for the specific story.

View the extraction results page for a story

GET /converter/stories/:id/evaluate

Parameters

id : string
Unique identifier for the story.

Returns

An HTML response of the extraction results page for the specific story.

View the generated screenplay for a story

GET /converter/stories/:id/screenplay

Parameters

id : string
Unique identifier for the story.

Returns

An HTML response of the screenplay page for the specific story.

Download the generated screenplay for a story as a PDF file

GET /converter/stories/:id/screenplay/pdf

Parameters

id : string
Unique identifier for the story.

Returns

A downloadable .pdf file of the generated screenplay.

Download the generated screenplay for a story as a TeX file

GET /converter/stories/:id/screenplay/tex

Parameters

id : string
Unique identifier for the story.

Returns

A downloadable .tex file of the generated screenplay.
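A sketch of scripting the two download endpoints with the standard library. It assumes the main server runs at localhost:8000, as in the manual setup; screenplay_url and download_screenplay are hypothetical helper names.

```python
from urllib.parse import quote
from urllib.request import urlretrieve

# Assumption: the main server address from the manual setup.
MAIN_BASE_URL = "http://localhost:8000"

# Formats exposed under /converter/stories/:id/screenplay/<format>
SCREENPLAY_FORMATS = ("pdf", "tex")


def screenplay_url(story_id: str, fmt: str, base_url: str = MAIN_BASE_URL) -> str:
    """Build the download URL for a generated screenplay."""
    if fmt not in SCREENPLAY_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{base_url}/converter/stories/{quote(story_id)}/screenplay/{fmt}"


def download_screenplay(story_id: str, fmt: str, dest: str) -> str:
    """Save the screenplay to dest and return the local path."""
    path, _headers = urlretrieve(screenplay_url(story_id, fmt), dest)
    return path
```

For example, download_screenplay("12", "pdf", "screenplay.pdf") would save the PDF for a story whose id is 12, assuming such a story exists.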

Main Class Documentation

AnnotationHelper

process(text)

Splits the text into tokens and sentences for the annotation page.

Parameters

text : string
The text to process.

Returns

No return values.

ConceptNet

checkIfProp(possibleCharacter, verb)

Checks if a noun is a prop or a character using ConceptNet.

Parameters

possibleCharacter : string
The noun to check if it's a prop or not.

verb : string
The verb to check if the noun can perform this action.

Returns

flag : boolean
If flag == True, then possibleCharacter is a prop. Otherwise, possibleCharacter is a character.
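One way to ask ConceptNet whether a noun can perform a verb is to look for CapableOf edges between the two concepts through ConceptNet's public /query endpoint. The sketch below assumes that approach and treats a noun with no such edge as a prop; it is not necessarily how checkIfProp decides, and to_concept, capable_of_edges, and is_prop are hypothetical names. Lookups likely work best with the verb's base form (for example "bark" rather than "barked").

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CONCEPTNET_QUERY_URL = "https://api.conceptnet.io/query"


def to_concept(word: str) -> str:
    """Turn a word or phrase into an English ConceptNet URI, e.g. /c/en/old_man."""
    return "/c/en/" + word.strip().lower().replace(" ", "_")


def capable_of_edges(noun: str, verb: str) -> list:
    """Fetch CapableOf edges from noun to verb (requires network access)."""
    params = urlencode({
        "start": to_concept(noun),
        "rel": "/r/CapableOf",
        "end": to_concept(verb),
    })
    with urlopen(f"{CONCEPTNET_QUERY_URL}?{params}") as response:
        return json.load(response).get("edges", [])


def is_prop(edges: list) -> bool:
    """Heuristic: a noun with no CapableOf edge to the verb is treated as a prop."""
    return len(edges) == 0
```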

checkIfNamedLocation(pobj)

Checks if a noun is a named location or not using ConceptNet.

Parameters

pobj : string
The noun to check if it's a named location or not.

Returns

flag : boolean
If flag == True, then pobj is a named location. Otherwise, pobj is not a named location.

checkForVerb(adp, verb)

Checks for an adpositional phrase or verb to determine a location change.

Parameters

adp : string
The adpositional phrase to check if there's a location change.

verb : string
The verb to check if there's a location change.

Returns

flag : boolean
If flag == True, then a location change might have happened. Otherwise, there was no location change.

CorefResolver

resolve_coreferences(doc, data)

Builds a dictionary of coreferences from the JSON response from the Coref API.

Parameters

doc : spaCy.Doc
The story represented by spaCy's Doc object.

data: dict
The dictionary built from the JSON response from the Coref API.

Returns

No return values.
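A sketch of the inversion that a mention-to-entity dictionary implies: every mention span is mapped back to the span of the entity it refers to. It assumes data has the shape documented for the Coref API and uses plain (start, end) tuples instead of spaCy objects. build_mention_entity_dict is a hypothetical name; the documented method stores its result rather than returning it.

```python
from typing import Dict, Tuple

Span = Tuple[int, int]


def build_mention_entity_dict(data: dict) -> Dict[Span, Span]:
    """Invert coref[entity_start][entity_end] -> mentions into mention -> entity.

    Keys of `data` are strings because they come from JSON.
    """
    mention_entity: Dict[Span, Span] = {}
    for entity_start, by_end in data.items():
        for entity_end, mentions in by_end.items():
            entity = (int(entity_start), int(entity_end))
            for m_start, m_end in mentions:
                mention_entity[(int(m_start), int(m_end))] = entity
    return mention_entity
```

A pronoun at token 8 can then be resolved with mention_entity.get((8, 8)), which returns the entity's span or None.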

verify_resolution()

Prints the dictionary of coreferences.

Parameters

No parameters.

Returns

No return values.

DialogueExtractor

extract_dialogue(doc, story)

Extracts the dialogue content, and then extracts the dialogue speakers.

Parameters

doc : spaCy.Doc
The story represented by spaCy's Doc object.

story : Story
The story represented by the Story object.

Returns

dialogues : List<Dialogue>
The list of dialogues extracted from the story.

print_dialogue(dialogue)

Prints the speaker and the content of a dialogue.

Parameters

dialogue : Dialogue
The dialogue to be printed.

Returns

No return values.

get_speaker(start, end)

Gets the Entity object that starts at start and ends at end.

Parameters

start : integer
The token index of the start of the noun phrase.

end : integer
The token index of the end of the noun phrase.

Returns

speaker : Entity
The Entity object that starts at start and ends at end. speaker == None if no Entity is found.

extract_content()

Extracts dialogue content using spaCy's Matcher class. Text enclosed in double quotes is treated as dialogue content.

Parameters

No parameters.

Returns

No return values.
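The documented implementation uses spaCy's Matcher over tokens. As a swapped-in, simpler technique, the same rule (text between double quotes is dialogue) can be sketched with a regular expression over raw text. Note that it yields character offsets rather than token indices; extract_quoted and QUOTED are hypothetical names.

```python
import re
from typing import List, Tuple

# Straight or curly double quotes around a run of non-quote characters.
QUOTED = re.compile(r'["\u201c]([^"\u201c\u201d]+)["\u201d]')


def extract_quoted(text: str) -> List[Tuple[int, int, str]]:
    """Return (start_char, end_char, content) for every double-quoted span."""
    return [(m.start(1), m.end(1), m.group(1)) for m in QUOTED.finditer(text)]
```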

extract_speakers()

Extracts the speakers of the extracted dialogue contents. Three scenarios are considered:

  1. Speaker said, "Hi."
  2. "Hi," said Speaker.
  3. "Hi."

Parameters

No parameters.

Returns

No return values.
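The three scenarios can be approximated with two regular expressions: one looks for a capitalized name followed by a speech verb just before the opening quote, the other for a speech verb followed by a name just after the closing quote. Anything else falls into scenario 3, with no speaker. This is a simplified stand-in for the system's spaCy-based approach; SPEECH_VERBS, find_speaker, and the capitalized-name rule are assumptions.

```python
import re
from typing import Optional

# Hypothetical list of speech verbs.
SPEECH_VERBS = r"(?:said|asked|replied|shouted|whispered)"
NAME = r"([A-Z]\w*(?: [A-Z]\w*)*)"

# Scenario 1: Speaker said, "Hi."  (speaker precedes the quote)
BEFORE = re.compile(NAME + r" " + SPEECH_VERBS + r",?\s*$")
# Scenario 2: "Hi," said Speaker.  (speaker follows the quote)
AFTER = re.compile(r"^\s*" + SPEECH_VERBS + r" " + NAME)


def find_speaker(before: str, after: str) -> Optional[str]:
    """Return the speaker, or None for scenario 3 (a quote with no speech tag)."""
    match = BEFORE.search(before)
    if match:
        return match.group(1)
    match = AFTER.match(after)
    if match:
        return match.group(1)
    return None
```

Here before is the text preceding the opening quote and after is the text following the closing quote.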

resolve_speakers(mention_entity_dict)

Resolves coreferences in the extracted dialogues using the dictionary from the CorefResolver class.

Parameters

mention_entity_dict : dict
The dictionary from the CorefResolver class.

Returns

No return values.

verify_dialogues()

Prints all the dialogues.

Parameters

No parameters.

Returns

No return values.

EntityExtractor

extract_entities(doc, story, speakers)

Extracts the entities from the story. Uses spaCy's DependencyMatcher class to extract noun subject and action verb pairs, and classifies the noun subject as a character or prop.

Parameters

doc : spaCy.Doc
The story represented by spaCy's Doc object.

story : Story
The story represented by the Story object.

speakers : List<Entity>
The list of speakers extracted from the DialogueExtractor.

Returns

No return values.

get_distinct_entities(entities, doc)

Removes duplicate entities so that no two entities share the same string representation.

Parameters

entities : List<Entity>
The total list of entities extracted from the story.

doc : spaCy.Doc
The story represented by spaCy's Doc object.

Returns

distinct_entities : List<Entity>
The list of entities where no two entities have the same string representation.
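Keeping the first entity seen for each distinct string is a straightforward way to satisfy the documented return value. The sketch below uses str(entity) as the key and preserves the original order. The real method also receives doc, presumably to read each entity's text; get_distinct is a hypothetical name.

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def get_distinct(entities: Iterable[T]) -> List[T]:
    """Keep the first entity for each distinct string representation, in order."""
    seen = set()
    distinct: List[T] = []
    for entity in entities:
        key = str(entity)
        if key not in seen:
            seen.add(key)
            distinct.append(entity)
    return distinct
```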

verify_characters()

Prints the extracted characters from the story.

Parameters

No parameters.

Returns

No return values.

verify_props()

Prints the extracted props from the story.

Parameters

No parameters.

Returns

No return values.

print_entity(entity)

Prints an entity.

Parameters

entity : Entity
The entity to be printed.

Returns

No return values.

get_character(start, end)

Gets the Character object that starts at start and ends at end.

Parameters

start : integer
The token index of the start of the noun phrase.

end : integer
The token index of the end of the noun phrase.

Returns

character : Character
The Character object that starts at start and ends at end. character == None if no Character is found.

get_prop(start, end)

Gets the Prop object that starts at start and ends at end.

Parameters

start : integer
The token index of the start of the noun phrase.

end : integer
The token index of the end of the noun phrase.

Returns

prop : Prop
The Prop object that starts at start and ends at end. prop == None if no Prop is found.

resolve_characters(mention_entity_dict)

Resolves coreferences in the extracted characters using the dictionary from the CorefResolver class.

Parameters

mention_entity_dict : dict
The dictionary from the CorefResolver class.

Returns

No return values.

resolve_props(mention_entity_dict)

Resolves coreferences in the extracted props using the dictionary from the CorefResolver class.

Parameters

mention_entity_dict : dict
The dictionary from the CorefResolver class.

Returns

No return values.

ActionExtractor

check_event_type(sentence, sent_characters, sent_props)

Determines whether a sentence is a scene transition or an action event.

Parameters

sentence : spaCy.Span
The sentence to determine the event type of.

sent_characters : List<Character>
The characters found in the sentence.

sent_props : List<Prop>
The props found in the sentence.

Returns

The event type of the sentence, either a scene transition or an action event.

parse_transition_sentence(sentence, idx, sent_characters, sent_props)

Instantiates and returns an ActionEvent with a scene transition classification.

Parameters

sentence : spaCy.Span
The sentence to parse.

idx : integer
The index of the sentence relative to all sentences in the spaCy Doc.

sent_characters : List<Character>
The characters found in the sentence.

sent_props : List<Prop>
The props found in the sentence.

Returns

A complete ActionEvent object that's classified as a scene transition and contains the characters and props found in the sentence.

parse_action_sentence(sentence, idx, sent_characters, sent_props)

Instantiates and returns an ActionEvent.

Parameters

sentence : spaCy.Span
The sentence to parse.

idx : integer
The index of the sentence relative to all sentences in the spaCy Doc.

sent_characters : List<Character>
The characters found in the sentence.

sent_props : List<Prop>
The props found in the sentence.

Returns

A complete ActionEvent object that's not classified as a scene transition and contains the characters and props found in the sentence.

extract_events(doc, story, dialogue_events, character_list, prop_list)

Iterates through all of the sentences in doc and instantiates Scene and ActionEvent objects based on the classification of each sentence.

Parameters

doc : spaCy.Doc
The story represented by spaCy's Doc object.

story : Story
The story represented by the Story object.

dialogue_events : List<Dialogue>
The dialogue events extracted by the DialogueExtractor class.

character_list : List<Character>
The characters extracted by the EntityExtractor class.

prop_list : List<Prop>
The props extracted by the EntityExtractor class.

Returns

No return values.
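extract_events is described as creating Scene and ActionEvent objects from per-sentence classifications. The sketch below shows one plausible grouping rule, assuming each scene-transition sentence opens a new scene and every other sentence joins the current one. Scene here is a simplified, hypothetical stand-in, and is_transition stands in for check_event_type.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Scene:
    """Hypothetical stand-in for the system's Scene object."""
    sentences: List[str] = field(default_factory=list)


def group_into_scenes(sentences: List[str],
                      is_transition: Callable[[str], bool]) -> List[Scene]:
    """Start a new scene at each transition sentence; others join the current scene."""
    scenes: List[Scene] = [Scene()]
    for sentence in sentences:
        # Avoid an empty first scene when the story opens with a transition.
        if is_transition(sentence) and scenes[-1].sentences:
            scenes.append(Scene())
        scenes[-1].sentences.append(sentence)
    return scenes
```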

verify_events()

Prints the extracted Scene and Event objects.

Parameters

No parameters.

Returns

No return values.

ScreenplayGenerator

generate_screenplay()

Generates a .tex file from the abstract story representation, and then generates a .pdf file from the .tex file.

Parameters

No parameters.

Returns

No return values.

generate_tex()

Generates a .tex file from the abstract story representation.

Parameters

No parameters.

Returns

No return values.

generate_pdf()

Generates a .pdf file from the generated .tex file.

Parameters

No parameters.

Returns

No return values.

generate_tex_meta()

Generates the string representation of the title page for the screenplay.

Parameters

No parameters.

Returns

No return values.

generate_tex_body()

Generates the string representation of the main body for the screenplay.

Parameters

No parameters.

Returns

No return values.

generate_tex_transition(transition_event)

Generates the string representation of a scene transition.

Parameters

transition_event : TransitionEvent
The transition event to be generated.

Returns

No return values.

generate_tex_action(action_event)

Generates the string representation of an action event.

Parameters

action_event : ActionEvent
The action event to be generated.

Returns

No return values.

generate_tex_dialogue(dialogue_event)

Generates the string representation of a dialogue.

Parameters

dialogue_event : DialogueEvent
The dialogue event to be generated.

Returns

No return values.
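Text written into a .tex file has to escape LaTeX's special characters, or a stray & or % in a story can break PDF generation. The sketch below escapes one character at a time, so replacements are never escaped twice, and wraps the result in a dialogue environment, assuming the screenplay class's \begin{dialogue}{NAME} ... \end{dialogue} syntax. tex_dialogue and escape_tex are hypothetical names, and upper-casing the speaker follows screenplay convention rather than anything stated here.

```python
# LaTeX special characters mapped to safe replacements.
TEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_tex(text: str) -> str:
    """Escape LaTeX special characters one character at a time."""
    return "".join(TEX_ESCAPES.get(ch, ch) for ch in text)


def tex_dialogue(speaker: str, content: str) -> str:
    """Render one dialogue block for the screenplay body."""
    return (
        f"\\begin{{dialogue}}{{{escape_tex(speaker.upper())}}}\n"
        f"{escape_tex(content)}\n"
        f"\\end{{dialogue}}\n"
    )
```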

SpacyUtil

get_previous_token(token)

Parameters

token : spaCy.Token
The token in question.

Returns

previous_token : spaCy.Token
The first non-whitespace and non-newline token before token.

get_next_token(token)

Parameters

token : spaCy.Token
The token in question.

Returns

next_token : spaCy.Token
The first non-whitespace and non-newline token after token.

get_previous_word(token)

Parameters

token : spaCy.Token
The token in question.

Returns

previous_word : spaCy.Token
The first word before token.

get_next_word(token)

Parameters

token : spaCy.Token
The token in question.

Returns

next_word : spaCy.Token
The first word after token.

get_anchor(token)

Parameters

token : spaCy.Token
The token in question.

Returns

anchor : spaCy.Token
The syntactic anchor of token.

get_subject(anchor)

Parameters

anchor : spaCy.Token
The syntactic anchor of a sentence.

Returns

subject : spaCy.Token
The noun subject of anchor.

get_object(anchor)

Parameters

anchor : spaCy.Token
The syntactic anchor of a sentence.

Returns

direct_object : spaCy.Token
The direct object of anchor.

get_noun_chunk(noun)

Parameters

noun : spaCy.Token
The noun in question.

Returns

noun_chunk : spaCy.Span
The Span noun chunk that contains the Token noun.

get_sentence_index(sent)

Parameters

sent : spaCy.Span
The sentence in question.

Returns

idx : integer
The index of the sentence with respect to the story Doc.

StoryPresenter

process()

Transforms the abstract story representation into a list of sentences and tokens for presentation.

Parameters

No parameters.

Returns

No return values.

ExtractionEvaluator

evaluate_extraction()

Evaluates the precision, recall, and f1-score of each story element.

Parameters

No parameters.

Returns

No return values.

evaluate_dialogue_speaker(file)

Evaluates the precision, recall, and f1-score of extracted dialogue speakers.

Parameters

file : File
The annotation .txt file that provides the ground truth.

Returns

score : tuple
The evaluation score of the extraction for dialogue speakers, formatted as a tuple (precision, recall, f1-score).

evaluate_dialogue_content(file)

Evaluates the precision, recall, and f1-score of extracted dialogue content.

Parameters

file : File
The annotation .txt file that provides the ground truth.

Returns

score : tuple
The evaluation score of the extraction for dialogue content, formatted as a tuple (precision, recall, f1-score).

evaluate_characters(file)

Evaluates the precision, recall, and f1-score of extracted characters.

Parameters

file : File
The annotation .txt file that provides the ground truth.

Returns

score : tuple
The evaluation score of the extraction for characters, formatted as a tuple (precision, recall, f1-score).

evaluate_props(file)

Evaluates the precision, recall, and f1-score of extracted props.

Parameters

file : File
The annotation .txt file that provides the ground truth.

Returns

score : tuple
The evaluation score of the extraction for props, formatted as a tuple (precision, recall, f1-score).

evaluate_actions(file)

Evaluates the precision, recall, and f1-score of extracted action lines.

Parameters

file : File
The annotation .txt file that provides the ground truth.

Returns

score : tuple
The evaluation score of the extraction for action lines, formatted as a tuple (precision, recall, f1-score).

evaluate_transitions(file)

Evaluates the precision, recall, and f1-score of extracted scene transitions.

Parameters

file : File
The annotation .txt file that provides the ground truth.

Returns

score : tuple
The evaluation score of the extraction for scene transitions formatted as a tuple (precision, recall, f1-score).

count(prediction, annotation)

Counts the number of true positives, false positives, and false negatives in the prediction.

Parameters

prediction : List<integer>
The predicted results of the system.

annotation : List<integer>
The annotated results.

Returns

score : tuple
The count score of the prediction, formatted as a tuple (true positives, false positives, false negatives).
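A sketch of count under the assumption that prediction and annotation are lists of item indices (for example, token or sentence indices) marked as belonging to a story element, so that the counts reduce to set operations. If the lists are instead per-position binary labels, the counts would have to be computed position by position.

```python
from typing import Iterable, Tuple


def count(prediction: Iterable[int], annotation: Iterable[int]) -> Tuple[int, int, int]:
    """Return (tp, fp, fn), treating both inputs as sets of item indices."""
    predicted, expected = set(prediction), set(annotation)
    true_positives = len(predicted & expected)    # predicted and annotated
    false_positives = len(predicted - expected)   # predicted but not annotated
    false_negatives = len(expected - predicted)   # annotated but missed
    return true_positives, false_positives, false_negatives
```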

count_bianca(prediction, annotation)

Implements Bianca's algorithm to count the number of perfect, missing, lacking, excess, missing, and wrong predictions.

Parameters

prediction : List<integer>
The predicted results of the system.

annotation : List<integer>
The annotated results.

Returns

score : tuple
The count score of the prediction formatted as a tuple (perfect, missing, lacking, excess, missing, wrong)

evaluate(tp, fp, fn)

Calculates and returns the precision, recall, and f1-score given the count score.

Parameters

tp : integer
The number of true positives.

fp : integer
The number of false positives.

fn : integer
The number of false negatives.

Returns

score : tuple
The evaluation score formatted as a tuple (precision, recall, f1-score).
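The standard formulas are precision = tp / (tp + fp), recall = tp / (tp + fn), and F1 = 2 * precision * recall / (precision + recall). A direct translation follows; returning 0.0 when a denominator is zero is an assumed convention, not something this documentation specifies.

```python
from typing import Tuple


def evaluate(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute (precision, recall, f1-score), using 0.0 where a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For counts (2, 2, 1) this gives precision 0.5, recall 2/3, and F1 4/7.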

evaluate(perfect, missing, lacking, excess, missing, wrong)

Implements Bianca's algorithm to calculate and return the precision, recall, and f1-score given the count score.

Parameters

perfect : integer
The number of perfect predictions.

missing : integer
The number of missing predictions.

lacking : integer
The number of lacking predictions.

excess : integer
The number of excess predictions.

missing : integer
The number of missing predictions.

wrong : integer
The number of wrong predictions.

Returns

score : tuple
The evaluation score formatted as a tuple (precision, recall, f1-score).

UnderstandingEvaluator

evaluate_understanding()

Calculates the simple matching coefficient, Jaccard's coefficient, and cosine similarity between the story questionnaire responses and the screenplay questionnaire responses.

Parameters

No parameters.

Returns

story_understanding : dict
The evaluation results of the story understanding module.

calculate_smc(x, y)

Calculates the simple matching coefficient of sets x and y.

Parameters

x : List<integer>
The first set.

y : List<integer>
The second set.

Returns

smc : double
The simple matching coefficient of the two sets.
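The simple matching coefficient counts the positions where two binary vectors agree (both 1 or both 0) and divides by the vector length. A minimal sketch, assuming x and y are equal-length lists of 0/1 answers; returning 0.0 for empty input is an assumed convention.

```python
from typing import Sequence


def calculate_smc(x: Sequence[int], y: Sequence[int]) -> float:
    """Share of positions where two binary vectors agree."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not x:
        return 0.0
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)
```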

calculate_jc(x, y)

Calculates the Jaccard's coefficient of sets x and y.

Parameters

x : List<integer>
The first set.

y : List<integer>
The second set.

Returns

jc : double
The Jaccard's coefficient of the two sets.
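Jaccard's coefficient differs from the simple matching coefficient by ignoring positions where both vectors are 0, so it is M11 / (M01 + M10 + M11). A minimal sketch, assuming equal-length 0/1 vectors and returning 0.0 when both are all zeros.

```python
from typing import Sequence


def calculate_jc(x: Sequence[int], y: Sequence[int]) -> float:
    """Jaccard's coefficient on binary vectors: M11 / (M01 + M10 + M11)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)   # M11
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)  # M01 + M10 + M11
    return both / either if either else 0.0
```

With x = [1, 0, 0, 1] and y = [1, 0, 0, 0], the simple matching coefficient is 0.75 because the two shared zeros count as agreement, while Jaccard's coefficient is 0.5.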

calculate_cs(x, y)

Calculates the cosine similarity of vectors x and y.

Parameters

x : List<integer>
The first vector.

y : List<integer>
The second vector.

Returns

cs : double
The cosine similarity of the two vectors.
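Cosine similarity is the dot product divided by the product of the two vector lengths, which suggests how calculate_dot_product and calculate_length combine. The sketch inlines both steps; treating an all-zero vector as similarity 0.0 is an assumption.

```python
import math
from typing import Sequence


def calculate_cs(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine similarity: dot(x, y) / (|x| * |y|); 0.0 if either vector is all zeros."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    dot_product = sum(a * b for a, b in zip(x, y))
    length = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot_product / length if length else 0.0
```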

calculate_dot_product(x, y)

Calculates the dot product of vectors x and y.

Parameters

x : List<integer>
The first vector.

y : List<integer>
The second vector.

Returns

dot_product : double
The dot product of the two vectors.

calculate_length(x, y)

Calculates the length of vectors x and y.

Parameters

x : List<integer>
The first vector.

y : List<integer>
The second vector.

Returns

length : double
The length of the two vectors.

read_csv_file(csv_file)

Reads a csv file and returns its 2D array representation.

Parameters

csv_file : File
The csv file to be read.

Returns

result : List<List<string>>
The 2D array representation of the csv file.
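A sketch with the standard csv module. Because Django's uploaded files return bytes from read(), the hypothetical read_csv_bytes helper decodes them first (UTF-8 is assumed) before handing the text to csv.reader.

```python
import csv
import io
from typing import List, TextIO


def read_csv_file(csv_file: TextIO) -> List[List[str]]:
    """Read every row of an open text-mode CSV file into a list of string lists."""
    return [row for row in csv.reader(csv_file)]


def read_csv_bytes(data: bytes, encoding: str = "utf-8") -> List[List[str]]:
    """Convenience wrapper for uploaded files, whose read() returns bytes."""
    return read_csv_file(io.StringIO(data.decode(encoding), newline=""))
```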

get_binary_representation(responses)

Transforms string responses into binary responses.

Parameters

responses : dict
The aggregated responses of the story and screenplay questionnaires.

Returns

result : dict
The responses, with each string response converted to a binary value.
