malllabiisc / RESIDE

EMNLP 2018: RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information

Urgent help needed with segregation of the code

devinaarvind opened this issue

For my project, I need to segregate the code into functions as described below. The main method should take a txt file as input (the bag of sentences) and write a txt file as output containing the entities, the correct relations, and the predicted relations. I tried printing the actual and predicted relations in the 'predict' function, but they appear to be one-hot representations. Can you please help me refactor the RESIDE code to follow the structure given below?

import abc


class RelationExtraction(abc.ABC):

    def __init__(self):
        pass

    def read_dataset(self, input_file, *args, **kwargs):
        """
        Reads a dataset to be used for training.
        Note: Each member's child class overrides this function to read the
        dataset according to their data format.
        Args:
            input_file: Filepath with list of files to be read
        Returns:
            (Optional) Data from file
        """

    def data_preprocess(self, input_data, *args, **kwargs):
        """
        (Optional) Members who do not need preprocessing (example: .pkl files) may skip this.
        A common function for a set of data cleaning techniques such as
        lemmatization, count vectorizer and so forth.
        Args:
            input_data: Raw data to preprocess
        Returns:
            Formatted data for further use.
        """

    def tokenize(self, input_data, ngram_size=None, *args, **kwargs):
        """
        Tokenizes dataset using Stanford Core NLP (Server/API).
        Args:
            input_data: str or [str], data to tokenize
            ngram_size: size of the token combinations, defaults to None
        Returns:
            Tokenized version of data
        """

    def train(self, train_data, *args, **kwargs):
        """
        Trains a model on the given training data.
        Note: Each member's child class overrides this function to train
        according to their algorithm.
        Args:
            train_data: post-processed data to train on
        Returns:
            (Optional) Trained model in applicable formats.
            None if the model is stored internally.
        """

    def predict(self, test_data, entity_1=None, entity_2=None, trained_model=None, *args, **kwargs):
        """
        Predicts on the trained model using test data.
        Args:
            entity_1, entity_2: for some models, given the entities, return the most suitable relation
            test_data: data on which to test the model and predict the result
            trained_model: the trained model returned by train();
                None if the trained model is stored internally
        Returns:
            probabilities: which relation is more probable given entity_1, entity_2
            relation: [tuple], list of tuples (e.g. Entity 1, Relation, Entity 2) or another format
        """

    def evaluate(self, input_data, trained_model=None, *args, **kwargs):
        """
        Evaluates the result on the benchmark dataset using the evaluation
        metrics [Precision, Recall, F1, or others...].
        Args:
            input_data: benchmark dataset/evaluation data
            trained_model: trained model, or None if stored internally
        Returns:
            performance metrics: tuple with (p, r, f1) or similar
        """

    def save_model(self, file):
        """
        Optional function.
        :param file: Where to save the model
        """

    def load_model(self, file):
        """
        Optional function.
        :param file: From where to load the model
        """
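
For the "txt file in, txt file out" main method described above, a minimal sketch of the driver could look like the following. Everything in it is an assumption made for illustration: the tab-separated input format, the trimmed two-method copy of the interface, and the `DummyExtractor` class, which always predicts "NA" and stands in for a subclass that would actually call the RESIDE model.

```python
import abc


class RelationExtraction(abc.ABC):
    """Trimmed copy of the interface above (only the methods this sketch uses)."""

    def read_dataset(self, input_file, *args, **kwargs):
        raise NotImplementedError

    def predict(self, test_data, *args, **kwargs):
        raise NotImplementedError


class DummyExtractor(RelationExtraction):
    """Hypothetical stand-in; a real subclass would wrap the RESIDE model."""

    def read_dataset(self, input_file, *args, **kwargs):
        # Assumed line format (not RESIDE's own):
        # entity_1 <TAB> entity_2 <TAB> gold_relation <TAB> sentence
        rows = []
        with open(input_file, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if line:  # skip blank lines
                    rows.append(line.split("\t", 3))
        return rows

    def predict(self, test_data, *args, **kwargs):
        # Placeholder prediction: always "NA".
        # Replace with a call into the trained model.
        return [(e1, "NA", e2) for e1, e2, _gold, _sent in test_data]


def main(input_file, output_file, extractor):
    """Read sentences, predict relations, write one result line per sentence."""
    data = extractor.read_dataset(input_file)
    predictions = extractor.predict(data)
    with open(output_file, "w", encoding="utf-8") as out:
        for (e1, e2, gold, _sent), (_, pred, _) in zip(data, predictions):
            # Output columns: entity_1, entity_2, correct relation, predicted relation
            out.write(f"{e1}\t{e2}\t{gold}\t{pred}\n")
    return output_file
```

Calling `main("input.txt", "output.txt", DummyExtractor())` would then write one tab-separated line per input sentence. Keeping file handling in `read_dataset` and `main` means only `predict` has to change when the placeholder is swapped for the real model.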

Please refer to the online directory; it does the same thing you are looking for.
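
On the one-hot output mentioned in the question: if the printed labels and predictions are rows with one entry per relation, taking the index of the largest entry and looking it up in an id-to-name table gives readable relation names. The sketch below assumes that layout. The `id2rel` mapping and the sample rows are made up for illustration, and the real mapping would come from the relation vocabulary built during preprocessing.

```python
# Hypothetical id-to-name mapping, for illustration only.
id2rel = {
    0: "NA",
    1: "/people/person/place_of_birth",
    2: "/business/company/founders",
}


def decode_relations(rows, id2rel):
    """Map each row (one-hot label or score vector) to its arg-max relation name."""
    names = []
    for row in rows:
        # Index of the largest entry; on ties, the first such index wins.
        best = max(range(len(row)), key=lambda i: row[i])
        names.append(id2rel[best])
    return names


y_true = [[0, 1, 0], [1, 0, 0]]              # one-hot gold labels (made up)
y_prob = [[0.1, 0.7, 0.2], [0.2, 0.3, 0.5]]  # predicted scores (made up)

actual = decode_relations(y_true, id2rel)
predicted = decode_relations(y_prob, id2rel)
```

Note that a bag can carry more than one correct relation in distantly supervised data, in which case the label row is multi-hot and a single arg-max keeps only the first of them; collecting every index whose entry is 1 would be the safer choice there.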