Nealcly / templateNER

Source code for template-based NER

CSV input files

laiviet opened this issue:

Can you share the format of the input CSV files?
Thank you,
Viet

I wrote the following script for my experiments. It may help you convert BIO format to the BART template format.

CorpusBIO.txt contains one whitespace-separated token/label pair per line, with a blank line marking the end of each sentence.
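Reading a file in this layout comes down to grouping lines into sentences at blank lines. Here is a minimal sketch, assuming exactly two whitespace-separated fields per non-blank line. `read_bio_sentences` is a hypothetical helper name, not part of templateNER. Unlike a loop that only emits a sentence when it sees a blank line, this one also flushes the final sentence when the file has no trailing blank line.

```python
from typing import Iterable, Iterator, List, Tuple


def read_bio_sentences(lines: Iterable[str]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (tokens, labels) for each sentence; blank lines separate sentences."""
    tokens: List[str] = []
    labels: List[str] = []
    for line in lines:
        parts = line.split()
        if parts:
            # Assumes exactly two fields per line: the token and its BIO label
            token, label = parts
            tokens.append(token)
            labels.append(label)
        elif tokens:
            # A blank line ends the current sentence (repeated blanks are skipped)
            yield tokens, labels
            tokens, labels = [], []
    # Flush the last sentence even if the file has no trailing blank line
    if tokens:
        yield tokens, labels


if __name__ == "__main__":
    sample = ["IBM B-ORG\n", "is O\n", "\n", "Paris B-LOC\n"]
    for toks, labs in read_bio_sentences(sample):
        print(toks, labs)
    # prints:
    # ['IBM', 'is'] ['B-ORG', 'O']
    # ['Paris'] ['B-LOC']
```

With a real corpus you would pass the open file object directly, e.g. `with open("../CorpusBIO.txt") as f: for tokens, labels in read_bio_sentences(f): ...`.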

Example input:

```text
IBM B-ORG
is O
a O
...
```
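```python
# Convert a BIO-tagged corpus into '"sentence";<span> is a <TYPE> entity.' lines.
# ';' is the column delimiter, so it is stripped from the input; quotes are
# stripped because the sentence column is wrapped in double quotes.


def emit(tokens, labels):
    """Print one template line per entity span in a single sentence."""
    first = " ".join(tokens)
    buffer_token = ""
    buffer_label = ""
    for l, t in zip(labels, tokens):
        tag = l.split("-")[0]
        # Any tag other than I- closes the entity collected so far
        if tag != "I" and buffer_token != "":
            print('"%s";%s is a %s entity.' % (first, buffer_token, buffer_label))
            buffer_token = ""
            buffer_label = ""
        if tag == "B":
            buffer_token = t
            buffer_label = l.split("-", 1)[1]
        if tag == "I":
            buffer_token += " " + t
    # Flush an entity that runs to the end of the sentence
    if buffer_token != "":
        print('"%s";%s is a %s entity.' % (first, buffer_token, buffer_label))


tokens = []
labels = []
with open("../CorpusBIO.txt") as f:
    for line in f:
        line = line.replace(";", "")
        if line.strip():
            token, label = line.split()
            token = token.replace('"', "").replace("'", "")
            tokens.append(token)
            labels.append(label)
        else:
            # A blank line ends the current sentence
            emit(tokens, labels)
            tokens = []
            labels = []

# Also handle a last sentence that is not followed by a blank line
if tokens:
    emit(tokens, labels)
```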
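If you would rather not strip quotes and semicolons from the text, Python's standard `csv` module can do the quoting instead. Below is a minimal sketch, assuming the same two-column `sentence;template` layout the script prints and keeping its `<span> is a <TYPE> entity.` wording. `bio_to_templates` and `write_templates` are hypothetical helper names, and whether templateNER's data loader accepts this exact layout is an assumption, not something confirmed in this thread.

```python
import csv
import io
from typing import List, Tuple


def bio_to_templates(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Return (sentence, "<span> is a <TYPE> entity.") pairs for one sentence."""
    sentence = " ".join(tokens)
    pairs: List[Tuple[str, str]] = []
    span: List[str] = []
    span_type = ""
    for token, label in zip(tokens, labels):
        prefix, _, etype = label.partition("-")
        # Any tag other than I- closes the span collected so far
        if prefix != "I" and span:
            pairs.append((sentence, "%s is a %s entity." % (" ".join(span), span_type)))
            span, span_type = [], ""
        if prefix == "B":
            span, span_type = [token], etype
        elif prefix == "I":
            if not span:
                # Assumption: an I- tag with no open span starts a new span
                span_type = etype
            span.append(token)
    if span:
        pairs.append((sentence, "%s is a %s entity." % (" ".join(span), span_type)))
    return pairs


def write_templates(rows: List[Tuple[str, str]], out) -> None:
    """Write rows as ';'-delimited CSV; fields containing ';' or '"' get quoted."""
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerows(rows)


if __name__ == "__main__":
    buf = io.StringIO()
    rows = bio_to_templates(["IBM", "is", "a", "company"], ["B-ORG", "O", "O", "O"])
    write_templates(rows, buf)
    print(buf.getvalue(), end="")
    # prints: IBM is a company;IBM is a ORG entity.
```

With the default `csv.QUOTE_MINIMAL`, only fields that need it are quoted, whereas the script above always quotes the sentence column. Pass `quoting=csv.QUOTE_ALL` to `csv.writer` if you want every field quoted consistently.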


Thanks!