ggu-talend / data-quality

Data Quality Libraries

This repository contains the source files of Talend Data Quality libraries.

Content structure

Project                      Description
dataquality-common           Abstractions for data analysis, and low-level utilities such as East Asian text pattern recognition
dataquality-email            Email validation library
dataquality-libraries        Parent pom aggregating the other library projects, plus DevOps tools
dataquality-record-linkage   Record matching algorithms, blocking key calculation and T-Swoosh
dataquality-sampling         Reservoir sampling, data masking, data duplication
dataquality-semantic-model   Definitions of semantic category related objects
dataquality-semantic         API for semantic category analysis
dataquality-standardization  Standardization library based on Apache Lucene
dataquality-statistics       API for data analysis and statistics (requires JDK 1.8)
dataquality-wordnet          Content validation API based on the WordNet dictionary

Product Download

Talend Open Studio for Data Quality can be downloaded from the Talend website.

Build

  • All projects are Maven-based.
  • The parent pom builds all the libraries.
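
A minimal sketch of a full build from a fresh checkout. It assumes Maven and JDK 1.8 are installed, and that the parent pom sits in the dataquality-libraries directory (an assumption based on the project table above); adjust the path if the layout differs.

```shell
# Assumption: the parent pom lives in the dataquality-libraries directory.
cd dataquality-libraries

# Build every library aggregated by the parent pom and install the
# resulting artifacts into the local Maven repository.
mvn clean install
```

Running the same command inside a single library directory should build only that library, provided its dependencies are already available in the local repository.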

License

Copyright (c) 2006-2016 Talend

Licensed under the Apache License, Version 2.0.
