A Python module to convert natural language numerics into ints and floats. This is a port of the Ruby gem numerizer.
The numerizer library can be installed from PyPI (pip install numerizer) or from source.
>>> from numerizer import numerize
>>> numerize('forty two')
'42'
>>> numerize('forty-two')
'42'
>>> numerize('four hundred and sixty two')
'462'
>>> numerize('one fifty')
'150'
>>> numerize('twelve hundred')
'1200'
>>> numerize('twenty one thousand four hundred and seventy three')
'21473'
>>> numerize('one million two hundred and fifty thousand and seven')
'1250007'
>>> numerize('one billion and one')
'1000000001'
>>> numerize('nine and three quarters')
'9.75'
>>> numerize('platform nine and three quarters')
'platform 9.75'
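The examples above return strings rather than numeric objects. When a numeric value is needed downstream, the result can be converted explicitly. The following is a minimal sketch and not part of numerizer itself: `to_number` and `extract_numbers` are hypothetical helper names, and the sketch assumes the output contains only plain decimal digits with an optional fractional part, as in the examples above.

```python
import re

# Matches plain digit strings with an optional fractional part, e.g. '42' or '9.75'.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def to_number(text):
    """Convert one numeric string to an int, or to a float if it has a decimal point."""
    return float(text) if "." in text else int(text)


def extract_numbers(text):
    """Return every number found in a mixed string, in order of appearance."""
    return [to_number(match) for match in _NUMBER.findall(text)]


# The inputs below are the outputs shown in the examples above.
print(to_number("42"))                   # 42
print(to_number("9.75"))                 # 9.75
print(extract_numbers("platform 9.75"))  # [9.75]
```

Keeping the conversion separate from numerization leaves the surrounding text intact, so the caller decides when a number is actually needed.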
Since version 0.2, numerizer has also been available as a spaCy extension.
Any named entities of a quantitative nature within a spaCy document can be numerized through this extension.
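A minimal sketch of document-level usage is given here. It assumes that importing numerizer registers a `numerize` extension reachable as `doc._.numerize()`, and that the `en_core_web_sm` model is installed; any spaCy pipeline with a named-entity recognizer should serve equally well. The wrapper name `numerize_entities` is hypothetical.

```python
def numerize_entities(text, model="en_core_web_sm"):
    """Numerize the quantitative named entities that spaCy finds in `text`."""
    # Imported lazily so the sketch can be loaded without spaCy installed.
    import spacy
    import numerizer  # noqa: F401  (imported only to register the extension)

    nlp = spacy.load(model)  # assumption: this model has been downloaded
    doc = nlp(text)
    # Assumed result: a mapping from each numerized entity to its numerized string.
    return doc._.numerize()
```

Calling `numerize_entities('The Hogwarts Express is at platform nine and three quarters')` would be expected to pair the entity covering "nine and three quarters" with '9.75', provided the model tags that phrase as an entity.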
To restrict numerization to particular entity types, pass the labels argument to the extension function.
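The sketch below illustrates label filtering. It assumes that the extension is called as `doc._.numerize(labels=[...])` and that the labels are spaCy entity labels such as 'MONEY' or 'DATE'; the wrapper name `numerize_labelled` and the default model are assumptions made for illustration.

```python
def numerize_labelled(text, labels, model="en_core_web_sm"):
    """Numerize only the entities in `text` whose label appears in `labels`."""
    # Imported lazily so the sketch can be loaded without spaCy installed.
    import spacy
    import numerizer  # noqa: F401  (imported only to register the extension)

    nlp = spacy.load(model)  # assumption: this model has been downloaded
    doc = nlp(text)
    # `labels` is the argument named in the text; entities with other labels are skipped.
    return doc._.numerize(labels=list(labels))
```

With the text 'Their revenue has been a billion dollars, as of six months ago.' and `labels=['MONEY']`, only the monetary amount would be expected in the result, while the date expression is left alone.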
The extension is available for tokens and spans as well.
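Token-level and span-level usage can be sketched in the same way. The code assumes the same `._.numerize()` extension is registered on spans and tokens, which should be verified against the installed version; the helper names `numerize_span` and `numerize_token` are hypothetical.

```python
def _parse(text, model):
    """Load a spaCy pipeline and parse `text` with the numerizer extension registered."""
    # Imported lazily so the sketch can be loaded without spaCy installed.
    import spacy
    import numerizer  # noqa: F401  (imported only to register the extension)

    return spacy.load(model)(text)  # assumption: this model has been downloaded


def numerize_span(text, start, end, model="en_core_web_sm"):
    """Numerize the tokens in the half-open token range [start, end)."""
    return _parse(text, model)[start:end]._.numerize()


def numerize_token(text, index, model="en_core_web_sm"):
    """Numerize the single token at position `index`."""
    return _parse(text, model)[index]._.numerize()
```

Indices here count spaCy tokens rather than characters, so punctuation and hyphens may shift positions compared with a simple whitespace split.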
For R users, a wrapper library has been developed by @amrrs. Try it out here.