Extracts, compares, transforms, and bucket-sorts phrases.
Install the library
$ pip install semantic-compare
The library requires a spaCy model for natural language processing. To use English, run this command:
$ python -m spacy download en_core_web_lg
Simple Usage
from semantic_compare import SemanticComparator as sc
comparator = sc(sentencizer=True)
phrases = comparator.extract_phrases("Create, promote and develop a business.")
Output:
['Create a business', 'promote a business', 'develop a business']
sentencizer
- splits text into sentences on punctuation (period, question mark, exclamation mark).
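To make the idea of punctuation-based splitting concrete, here is a minimal standalone sketch in plain Python. It is not the library's sentencizer; the function name split_sentences and the regular expression are assumptions chosen for illustration.

```python
import re


def split_sentences(text):
    """Split text after '.', '?' or '!' when whitespace follows (illustrative only)."""
    # The lookbehind keeps each punctuation mark attached to the sentence it ends.
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [part for part in parts if part]


sentences = split_sentences("Create a plan. Is it ready? Ship it!")
print(sentences)
```

A real sentencizer works on tokens rather than raw characters, so it can handle abbreviations and ellipses that this regular expression would split incorrectly.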
Advanced Usage
from semantic_compare import SemanticComparator as sc
# Sentence splitter
def our_sentencizer(doc):
    """
    Sentence splitter that splits a document into sentences
    on punctuation marks, bullets and newlines.
    """
    for i, token in enumerate(doc[:-2]):
        if "•" in token.text:
            # A bullet token starts a new sentence.
            doc[i].is_sent_start = True
        elif token.text in (".", "...", "?", "!", "\n") and doc[i + 1].is_title:
            # End punctuation or a newline followed by a title-cased token:
            # the next token starts a new sentence.
            doc[i + 1].is_sent_start = True
        else:
            doc[i + 1].is_sent_start = False
    return doc
# Load the small English spaCy model (any spaCy model can be used)
comparator = sc(spacy_model='en_core_web_sm')
# Add a custom pipe for text preprocessing
comparator.add_custom_pipe(our_sentencizer, before='parser')
phrases = comparator.extract_phrases('''
Must Have:
* Experience shaping the BI strategy from C-Level to Technical developers.
* Extensive delivery of platform within a Business Intelligence and Analytics function.
* Communication with stakeholders on all levels.
''')
print('\n'.join(phrases))
With add_custom_pipe
you can add your own pipe to the spaCy text-processing pipeline.
Example 1: Get the similarity between two phrases.
phrase1 = 'Understand customer needs'
phrase2 = 'Capture business requirements'
similarity = comparator.compare_phrases(phrase1, phrase2)
print(similarity)
Output:
0.38569751
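This README does not say how the score is computed. A common choice for comparing vector representations of text is cosine similarity, which ranges from -1 to 1. The sketch below shows that measure on toy vectors under that assumption; cosine_similarity is a hypothetical helper, not part of semantic_compare.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors standing in for real phrase embeddings.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # close to 1
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
opposite = cosine_similarity([1.0, 0.0, 0.0], [-1.0, 0.0, 0.0])
print(same_direction, orthogonal, opposite)
```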
Example 2: Get a two-dimensional matrix of pairwise similarities between two lists of phrases.
phrases_1 = [
    'Communication with stakeholders',
    'Understand customer needs',
    'Experience shaping the BI strategy',
    'shaping the BI strategy',
    'Delivery of platform Analytics function',
]
phrases_2 = [
    'Extensive delivery of platform within a Business Intelligence and Analytics function',
    'shaping the BI strategy',
    'Experience shaping the BI strategy from C-Level to Technical developers',
    'Communication with stakeholders on all levels',
    'Capture business requirements',
    'Play computer games',
]
similarity = comparator.build_similarity_matrix(phrases_1, phrases_2)
print(similarity)
Output (one row per phrase in phrases_2, one column per phrase in phrases_1):
[[-0.03689054 0.0372301 0.17840812 0.09079809 0.65748763]
[ 0.18079719 0.12055688 0.77624094 1. 0.22749564]
[ 0.08472343 0.11505745 0.7030021 0.48876476 0.13252231]
[ 0.7132235 0.07449755 0.178031 0.15712512 0.0676512 ]
[ 0.11637229 0.38569745 0.23005028 0.25646406 0.26493344]
[ 0.17955953 0.15243992 0.11233422 0.16087453 0.03144675]]
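One way to read such a matrix is to look up the best-scoring counterpart for each phrase. The sketch below copies the values printed above into a plain list of lists and assumes that rows follow phrases_2 and columns follow phrases_1, which the 1. entry for the identical phrase 'shaping the BI strategy' suggests. The helper best_matches is hypothetical and not part of the library.

```python
phrases_1 = [
    'Communication with stakeholders',
    'Understand customer needs',
    'Experience shaping the BI strategy',
    'shaping the BI strategy',
    'Delivery of platform Analytics function',
]
phrases_2 = [
    'Extensive delivery of platform within a Business Intelligence and Analytics function',
    'shaping the BI strategy',
    'Experience shaping the BI strategy from C-Level to Technical developers',
    'Communication with stakeholders on all levels',
    'Capture business requirements',
    'Play computer games',
]
# Values copied from the output above (assumed: rows = phrases_2, columns = phrases_1).
matrix = [
    [-0.03689054, 0.0372301, 0.17840812, 0.09079809, 0.65748763],
    [0.18079719, 0.12055688, 0.77624094, 1.0, 0.22749564],
    [0.08472343, 0.11505745, 0.7030021, 0.48876476, 0.13252231],
    [0.7132235, 0.07449755, 0.178031, 0.15712512, 0.0676512],
    [0.11637229, 0.38569745, 0.23005028, 0.25646406, 0.26493344],
    [0.17955953, 0.15243992, 0.11233422, 0.16087453, 0.03144675],
]


def best_matches(matrix, row_labels, col_labels):
    """Map each row label to its highest-scoring column label and that score."""
    result = {}
    for label, row in zip(row_labels, matrix):
        best_col = max(range(len(row)), key=row.__getitem__)
        result[label] = (col_labels[best_col], row[best_col])
    return result


matches = best_matches(matrix, phrases_2, phrases_1)
for phrase, (match, score) in matches.items():
    print(f"{phrase!r} -> {match!r} ({score:.2f})")
```

Note that every phrase gets a best match, even 'Play computer games', whose highest score is below 0.2. A cut-off is needed to decide whether a best match is actually a match.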
When you compare two documents, you can see which phrases are present in both and which appear in only one of them.
phrases_1 = [
    'Communication with stakeholders',
    'Understand customer needs',
    'Experience shaping the BI strategy',
    'shaping the BI strategy',
    'Delivery of platform Analytics function',
]
phrases_2 = [
    'Extensive delivery of platform within a Business Intelligence and Analytics function',
    'shaping the BI strategy',
    'Experience shaping the BI strategy from C-Level to Technical developers',
    'Communication with stakeholders on all levels',
    'Capture business requirements',
    'Play computer games',
]
# cut_off: similarity threshold; two phrases are considered similar
# when their similarity score is greater than it (default=0.3)
in_both, in_doc1, in_doc2 = comparator.bucket_sorting(
    phrases_1, phrases_2, similarity, cut_off=0.5)
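The bucketing idea can be sketched in plain Python: a pair whose score exceeds the cut-off goes into the "both" bucket, and any phrase without such a pair stays in its own document's bucket. This is only an illustration of the concept. The helper bucket_phrases, its argument order and its return format are assumptions, not the library's implementation, and the small matrix reuses six values from the similarity output above under the assumption that rows follow phrases_2 and columns follow phrases_1.

```python
def bucket_phrases(matrix, doc2_phrases, doc1_phrases, cut_off=0.3):
    """Split phrases into matched pairs and phrases unique to each document.

    matrix[i][j] is assumed to hold the similarity of doc2_phrases[i]
    and doc1_phrases[j].
    """
    matched_1, matched_2, in_both = set(), set(), []
    for i, row in enumerate(matrix):
        for j, score in enumerate(row):
            if score > cut_off:
                in_both.append((doc1_phrases[j], doc2_phrases[i]))
                matched_1.add(j)
                matched_2.add(i)
    only_doc1 = [p for j, p in enumerate(doc1_phrases) if j not in matched_1]
    only_doc2 = [p for i, p in enumerate(doc2_phrases) if i not in matched_2]
    return in_both, only_doc1, only_doc2


doc1 = ['Communication with stakeholders', 'Understand customer needs']
doc2 = [
    'Communication with stakeholders on all levels',
    'Capture business requirements',
    'Play computer games',
]
# Subset of the similarity values shown earlier (rows = doc2, columns = doc1).
scores = [
    [0.7132235, 0.07449755],
    [0.11637229, 0.38569745],
    [0.17955953, 0.15243992],
]
both, only_1, only_2 = bucket_phrases(scores, doc2, doc1, cut_off=0.5)
print(both)
print(only_1)
print(only_2)
```

With cut_off=0.5 only the two "Communication with stakeholders" phrases pair up. Lowering the cut-off to the default of 0.3 would also pair 'Understand customer needs' with 'Capture business requirements' (score 0.38569745).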
Get every intermediate step in the transformation of one phrase into another. Example:
print(comparator.transform_phrase(
    'Understand customer needs',
    'Capture business requirements',
))
Output:
["Understand customer needs", "Capture customer needs", "Capture business requirements"]
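The output above looks like a chunk-by-chunk substitution: first the verb changes, then the object. The sketch below reproduces that pattern on phrases that have already been split into aligned chunks. How the library actually segments and aligns phrases is not documented here, so the manual chunking and the helper transform_steps are assumptions made purely for illustration.

```python
def transform_steps(source_chunks, target_chunks):
    """Replace chunks one position at a time, recording each intermediate phrase."""
    if len(source_chunks) != len(target_chunks):
        raise ValueError("this sketch only handles equally long chunk lists")
    current = list(source_chunks)
    steps = [" ".join(current)]
    for i, target in enumerate(target_chunks):
        if current[i] != target:
            current[i] = target
            steps.append(" ".join(current))
    return steps


# Chunks are split by hand here: one verb chunk and one object chunk.
steps = transform_steps(
    ["Understand", "customer needs"],
    ["Capture", "business requirements"],
)
print(steps)
```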