malllabiisc / RESIDE

EMNLP 2018: RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information

Urgent help needed with segregation of the code

devinaarvind opened this issue

For my project, I need to restructure the code into the functions described below. The main method should take a .txt file as input (the bag of sentences) and write a .txt file as output containing the entities, the correct relations, and the predicted relations. I tried printing the actual and predicted relations in the `predict` function, but they appear to be one-hot representations. Could you please help me refactor the RESIDE code into the structure given below?
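On the one-hot output: a minimal sketch, assuming the model returns one score vector (or one-hot gold vector) per bag and that a mapping from relation names to ids is available. The `rel2id` dict, its three relation names, and `decode_relations` are hypothetical placeholders, not RESIDE's actual variables. Taking the argmax of each vector and inverting the mapping turns the vectors back into relation names:

```python
from typing import Dict, List, Sequence


def decode_relations(vectors: Sequence[Sequence[float]],
                     rel2id: Dict[str, int]) -> List[str]:
    """Map each one-hot or score vector to the name of its highest-scoring relation."""
    # Invert the (hypothetical) relation-name -> id mapping.
    id2rel = {idx: name for name, idx in rel2id.items()}
    labels = []
    for vec in vectors:
        # argmax: index of the largest entry (the first one wins on ties).
        best = max(range(len(vec)), key=lambda i: vec[i])
        labels.append(id2rel[best])
    return labels


# Hypothetical three-relation label set, for illustration only.
rel2id = {"NA": 0, "/people/person/place_of_birth": 1, "/location/location/contains": 2}
gold = [[0, 1, 0], [1, 0, 0]]              # one-hot gold labels
pred = [[0.1, 0.7, 0.2], [0.2, 0.3, 0.5]]  # model scores
print(decode_relations(gold, rel2id))
print(decode_relations(pred, rel2id))
```

If a bag can carry several relations at once, thresholding each score (for example keeping every relation scored above 0.5) would replace the single argmax.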

import abc


class RelationExtraction(abc.ABC):

    def __init__(self):
        pass

    @abc.abstractmethod
    def read_dataset(self, input_file, *args, **kwargs):
        """
        Reads a dataset to be used for training.

        Note: Each member's child class overrides this function to read the
        dataset according to their data format.

        Args:
            input_file: Filepath with the list of files to be read.
        Returns:
            (Optional) Data read from the file.
        """
        pass

    @abc.abstractmethod
    def data_preprocess(self, input_data, *args, **kwargs):
        """
        (Optional: members who do not need preprocessing, e.g. when the data
        is already in .pkl files, can leave this as a no-op.)
        A common function for a set of data cleaning techniques such as
        lemmatization, count vectorization, and so forth.

        Args:
            input_data: Raw data to preprocess.
        Returns:
            Formatted data for further use.
        """
        pass

    @abc.abstractmethod
    def tokenize(self, input_data, ngram_size=None, *args, **kwargs):
        """
        Tokenizes the dataset using Stanford CoreNLP (server/API).

        Args:
            input_data: str or [str]: data to tokenize.
            ngram_size: size of the token combinations (n-grams); defaults to None.
        Returns:
            Tokenized version of the data.
        """
        pass

    @abc.abstractmethod
    def train(self, train_data, *args, **kwargs):
        """
        Trains a model on the given training data.

        Note: Each member's child class overrides this function to train
        according to their algorithm.

        Args:
            train_data: Processed data to train on.
        Returns:
            (Optional) The trained model in an applicable format, or
            None if the model is stored internally.
        """
        pass

    @abc.abstractmethod
    def predict(self, test_data, entity_1=None, entity_2=None, trained_model=None, *args, **kwargs):
        """
        Predicts with the trained model on the test data.

        Args:
            test_data: Data to run the model on.
            entity_1, entity_2: For some models, the entity pair for which
                the most suitable relation should be predicted.
            trained_model: The trained model returned by train(), or
                None if the trained model is stored internally.
        Returns:
            probabilities: how probable each relation is given entity_1 and entity_2,
            or
            relations: [tuple], a list of tuples, e.g. (entity_1, relation, entity_2),
            or another format.
        """
        pass

    @abc.abstractmethod
    def evaluate(self, input_data, trained_model=None, *args, **kwargs):
        """
        Evaluates the results on the benchmark dataset using evaluation
        metrics [precision, recall, F1, or others].

        Args:
            input_data: Benchmark dataset/evaluation data.
            trained_model: The trained model, or None if stored internally.
        Returns:
            Performance metrics: a tuple (p, r, f1) or similar.
        """
        pass

    @abc.abstractmethod
    def save_model(self, file):
        """
        Optional function.

        :param file: Where to save the model.
        :return:
        """
        pass

    @abc.abstractmethod
    def load_model(self, file):
        """
        Optional function.

        :param file: Where to load the model from.
        :return:
        """
        pass
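A minimal sketch of the txt-in/txt-out main flow the question describes, written as plain functions rather than methods of the class above so that it stays self-contained. The tab-separated input format, the names `read_bags`, `run`, `predict_bag` and `precision_recall_f1`, and the `"NA"` label are all assumptions, not part of RESIDE; `predict_bag` stands in for whatever call reaches the trained model and must return a relation name. The metric is a simple micro-averaged, single-label score, not the precision-recall-curve evaluation commonly used for this task.

```python
from typing import Callable, Dict, List, Tuple

# (entity_1, entity_2) identifies a bag in distant supervision.
EntityPair = Tuple[str, str]


def read_bags(path: str) -> Dict[EntityPair, dict]:
    """Group sentences into bags keyed by entity pair.

    Assumed (hypothetical) input format, one sentence per line:
        entity_1 <TAB> entity_2 <TAB> gold_relation <TAB> sentence
    The gold relation of a bag is taken from its first line.
    """
    bags: Dict[EntityPair, dict] = {}  # dicts keep insertion order (Python 3.7+)
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            e1, e2, gold, sentence = line.split("\t", 3)
            bag = bags.setdefault((e1, e2), {"gold": gold, "sentences": []})
            bag["sentences"].append(sentence)
    return bags


def run(input_path: str, output_path: str,
        predict_bag: Callable[[str, str, List[str]], str]) -> List[Tuple[str, str]]:
    """Predict one relation per bag; write entities, gold and predicted relations.

    predict_bag is a hypothetical hook into the trained model. Returns the
    (gold, predicted) pairs so they can be scored afterwards.
    """
    bags = read_bags(input_path)  # read fully before opening the output file
    pairs = []
    with open(output_path, "w", encoding="utf-8") as out:
        out.write("entity_1\tentity_2\tgold_relation\tpredicted_relation\n")
        for (e1, e2), bag in bags.items():
            predicted = predict_bag(e1, e2, bag["sentences"])
            out.write(f"{e1}\t{e2}\t{bag['gold']}\t{predicted}\n")
            pairs.append((bag["gold"], predicted))
    return pairs


def precision_recall_f1(pairs: List[Tuple[str, str]],
                        na_label: str = "NA") -> Tuple[float, float, float]:
    """Micro-averaged P/R/F1 over (gold, predicted) pairs, ignoring the NA class."""
    tp = sum(1 for g, p in pairs if p != na_label and p == g)
    n_pred = sum(1 for _, p in pairs if p != na_label)
    n_gold = sum(1 for g, _ in pairs if g != na_label)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Usage would look like `pairs = run("bags.txt", "results.txt", my_predictor)` followed by `precision_recall_f1(pairs)`, where `my_predictor` is a hypothetical wrapper that calls the subclass's `predict` and decodes its output to a relation name.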

Please refer to the online directory; it does what you are looking for.