hpcc-systems / DataPatterns

HPCC Systems ECL bundle that provides some basic data profiling and research tools to an ECL programmer



Request: Easy method for analyzing different profiling results

dancamper opened this issue.

Satisfy this scenario: Profiling is used to analyze new data that will be ingested. Profiling results are saved as a logical file. Then, a new batch of data arrives and is profiled. The new method should compare the new profiling results with the old and output a summary of any differences.

The end goal is to highlight significant differences between the two profiles, which could indicate a significant or unexpected change in the incoming data stream.

A JOIN of the two profiles, with ROWDIFF called inside the TRANSFORM, would be a start :)
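A minimal sketch of that JOIN + ROWDIFF idea, written in Python for illustration rather than ECL. It assumes each saved profile is a list of rows keyed by an `attribute` name. The field names `fill_rate` and `best_attribute_type` are illustrative only, and `row_diff` and `compare_profiles` are hypothetical helpers, not part of DataPatterns. `row_diff` imitates what ROWDIFF reports: the names of the fields whose values differ. `compare_profiles` performs a full outer join on `attribute` and keeps only the rows that changed.

```python
from typing import Any, Dict, List

# Hypothetical profile row layout: one dict per profiled attribute, keyed by
# "attribute". Real Profile() output has more fields; these are illustrative.
ProfileRow = Dict[str, Any]


def row_diff(old: ProfileRow, new: ProfileRow) -> List[str]:
    """Return the sorted names of fields whose values differ (ROWDIFF-like)."""
    fields = sorted(set(old) | set(new))
    return [f for f in fields if old.get(f) != new.get(f)]


def compare_profiles(old_rows: List[ProfileRow],
                     new_rows: List[ProfileRow]) -> List[Dict[str, Any]]:
    """Full outer join of two profile result sets on "attribute".

    Only attributes that were added, removed, or changed are reported.
    """
    old_by_attr = {r["attribute"]: r for r in old_rows}
    new_by_attr = {r["attribute"]: r for r in new_rows}
    summary: List[Dict[str, Any]] = []
    for attr in sorted(set(old_by_attr) | set(new_by_attr)):
        old = old_by_attr.get(attr)
        new = new_by_attr.get(attr)
        if old is None:
            # Attribute appears only in the new batch.
            summary.append({"attribute": attr, "status": "added", "changed_fields": []})
        elif new is None:
            # Attribute has disappeared from the new batch.
            summary.append({"attribute": attr, "status": "removed", "changed_fields": []})
        else:
            changed = row_diff(old, new)
            if changed:
                summary.append({"attribute": attr, "status": "changed",
                                "changed_fields": changed})
    return summary


if __name__ == "__main__":
    old = [
        {"attribute": "age", "fill_rate": 100.0, "best_attribute_type": "unsigned1"},
        {"attribute": "name", "fill_rate": 98.5, "best_attribute_type": "string"},
    ]
    new = [
        {"attribute": "age", "fill_rate": 87.2, "best_attribute_type": "unsigned1"},
        {"attribute": "zip", "fill_rate": 100.0, "best_attribute_type": "string5"},
    ]
    for row in compare_profiles(old, new):
        print(row)
```

With the sample data, `age` is reported as changed in `fill_rate`, `name` as removed, and `zip` as added. Unmatched rows are labeled "added" or "removed" instead of being diffed against an empty row. Diffing against an empty row would flag every field, which is roughly what ROWDIFF would report for the default-filled side of an outer join.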

This is probably more appropriate as a stand-alone function rather than incorporated into Profile(), as it is file-based rather than field-based.