XSystem is a method to learn and represent syntactic patterns in datasets as data structures called XStructures. Once XSystem learns a collection of patterns, it can be used to perform several tasks like: automatic label assignment, where data items are assigned a class by comparing them to a library of known classes (written as Regexes or XStructures); finding syntactically similar content, where learned XStructures are compared to see if they are similar, and outlier detection, where a learned XStructure for a single item is compared to other XStructures to check that its structure is different.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
- Java
- (Optional) Gradle.
Enter the following command in your console to clone the repository.
$> git clone https://github.com/UCHI-DB/xsystem-ng.git
Enter the following command in your console to add Gradle Wrapper JAR
$> gradle wrapper
Since XSystem is a gradle project and you have already included a gradle wrapper, you do not need to install gradle to build and test the project. Instead, you can just:
$> ./gradlew build
And now to run the unit tests, run the following command!
$> ./gradlew test
The XSystem Inteface has been implemented by XSystemImplementation
class. To use the methods, first create an instance of the XSystemImplementation
class.
XSystemImplementation impl = new XSystemImplementation();
The methods available are as follows -
build
method
Given lines
(an ArrayList of String), the build method learns the line into a XStructure, and returns the XStructure.
XStructure xstruct = impl.build(lines);
generate
method
Given a XStructure pattern
, and an integer n
, the generate method returns an ArrayList of n random Strings generated from the pattern.
ArrayList<String> randomString = impl.generate(pattern, n);
similarity
method
Given two XStructures, x1
and x2
or given a XStructure pattern
and a String str
, this method returns the similarity score between the two.
double similarity = impl.similarity(x1, x2);
OR
double similarity = impl.similarity(pattern, str);
match
method
This method returns a boolean indicating whether XStructure pattern
and regular expression Pattern regex
match with each other or not.
boolean match = impl.match(pattern, regex);
computeOutlierScore
method
Computes the outlier score for a given string str
, in the given XStructure pattern
double outlierScore = impl.computeOutlierScore(pattern, str);
mergetwoXStructs
method
Merges two XStructures x1
and x2
.
XStructure merged = impl.mergetwoXStructs(x1, x2);
mergeMultipleXStructs
method
Merges a list (ArrayList) of XStructures xstructList
XStructure merged = impl.mergetwoXStructs(xstructList);
learnXStructs
method
Given an input CSV file/folder with path - inputPath
, this method learns the XStructs for each column and stores in a specified JSON file path, outFile
.
impl.learnXStructs(inputPath, outFile);
readXStructswthType
method
Given a JSON file/folder path inputJSONfolder
, this method returns the list of pair XStructures and their corresponding datatype contained in it.
ArrayList<Pair<XStructure, String>> list = impl.readXStructswthType(inputJSONfolder);
labelAssignwthRegex
method
Given a regex-type CSV file/folder path sampleRegexFolderPath
, assigns labels to a JSON file/folder of learned XStructs (path learnedXStructsJSONfolderpath
) and outputs a CSV file whose path can be specified by the user - outFile
impl.labelAssignwthRegex(sampleRegexFolderPath, learnedXStructsJSONfolderpath, outFile);
labelAssignmentwthXStruct
method
Given a ArrayList of String inputs - lines
, returns the computed label from a given JSON file/folder (path referenceFilePath
) of learned XStructs
String label = impl.labelAssignmentwthXStruct(lines, referenceFilePath);
- Ipsita Mohanty
- Raul Castro Fernandez
This XSystem implementation is based on this paper.