This is just a quick test of trying to put an awk
script together to help out another research team.
Usage (on OS X/Linux):
awk -f script.awk [location-file] [input-file]
An example command, using the example files described here:
awk -f script.awk location.txt sample.txt
location.txt should be a list of the locations that you're looking to mark up, such as:
toronto
powell river
on
bc
sample.txt is a list of data, such as:
new haven co-op toronto on $1245
joe schmo co-op powell river bc $4444
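
To make the two-file usage concrete, here is a rough sketch of how a script of this kind could work. It is not the actual script.awk: the file name sketch.awk, the square-bracket markup, and the space-padding trick for whole-word matching are all assumptions made for illustration. It reads the location list first, then wraps the first whole-word occurrence of each location on every data line.

```shell
# Run this in a scratch directory: it overwrites location.txt and sample.txt.
printf '%s\n' 'toronto' 'powell river' 'on' 'bc' > location.txt
printf '%s\n' 'new haven co-op toronto on $1245' \
              'joe schmo co-op powell river bc $4444' > sample.txt

# Hypothetical stand-in for script.awk (the markup style is an assumption).
cat > sketch.awk <<'EOF'
# First file (location.txt): remember each non-empty line as a location.
NR == FNR { if ($0 != "") locs[++n] = $0; next }

# Second file (sample.txt): mark the first occurrence of each location.
{
    line = " " $0 " "    # pad so every word has a space on both sides
    for (i = 1; i <= n; i++) {
        p = index(line, " " locs[i] " ")    # fixed-string, whole-word search
        if (p > 0)
            line = substr(line, 1, p) "[" locs[i] "]" \
                   substr(line, p + length(locs[i]) + 1)
    }
    print substr(line, 2, length(line) - 2)    # drop the padding again
}
EOF

awk -f sketch.awk location.txt sample.txt
```

With the example files above, this should print:

```
new haven co-op [toronto] [on] $1245
joe schmo co-op [powell river] [bc] $4444
```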
Results are OK. The big limitations are:
- Needs a user-generated list of locations.
- Only finds the first instance. If it is the Slave Lake Community Area, it will only find the first Slave Lake. This is how we get around finding strings like "on" or "bc" in the middle of other strings.
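
The two behaviours in the last point can be seen in isolation with a couple of one-liners. These are only a sketch: they assume a fixed-string search with awk's index(), and the space-padding and square-bracket markup are illustrative choices rather than something taken from the actual script.

```shell
# A plain substring search finds the "on" inside "toronto";
# padding both the line and the search term with spaces does not.
echo 'toronto on' | awk '{
    print index($0, "on")              # 4: the "on" inside "toronto"
    print index(" " $0 " ", " on ")    # 9: the stand-alone "on" (padded line)
}'

# index() returns only the first match, so a repeated name is marked once.
echo 'slave lake community area slave lake' | awk -v loc='slave lake' '{
    line = " " $0 " "
    p = index(line, " " loc " ")
    if (p > 0)
        line = substr(line, 1, p) "[" loc "]" substr(line, p + length(loc) + 1)
    print substr(line, 2, length(line) - 2)
}'
# prints: [slave lake] community area slave lake
```

Marking every occurrence would need a loop around the search, which is a different design from the first-instance behaviour described above.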
Requires manual input for location.txt, which is a bummer. Also see above.
Could experiment with Stanford NER, perhaps following Bill Turkel's script here, but I suspect that'd create too much noise.