juqkai / esProc

All the code is implemented in Java

esProc

esProc is the package name of esProc SPL. esProc SPL is an open-source programming language for data processing that can perform computations independently, without relying on a database. For the latest package and release notes, see Download esProc Community Edition Package.

SPL targets mainstream Java application architectures, where it runs embedded in the application. An SPL script is the counterpart of a stored procedure in a relational database: a Java program invokes the script through the JDBC interface to execute it and obtain the result of the structured computation.
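The JDBC invocation described above can be sketched roughly as follows. This is a minimal sketch, not the project's reference code: the driver class name and URL are assumptions that should be checked against the esProc JDBC documentation, and the script name `countOrders` and its single parameter are hypothetical.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;

class SplJdbcSketch {
    // Assumed values; verify both against the esProc JDBC documentation.
    static final String DRIVER = "com.esproc.jdbc.InternalDriver";
    static final String URL = "jdbc:esproc:local://";

    /** Builds a stored-procedure-style call string such as "call name(?,?)". */
    static String buildCall(String scriptName, int paramCount) {
        StringBuilder sb = new StringBuilder("call ").append(scriptName).append('(');
        for (int i = 0; i < paramCount; i++) {
            if (i > 0) sb.append(',');
            sb.append('?');
        }
        return sb.append(')').toString();
    }

    /** Runs the hypothetical script "countOrders" and prints the first result column. */
    static void run(int year) throws SQLException, ClassNotFoundException {
        Class.forName(DRIVER); // requires the esProc JDBC jar on the classpath
        try (Connection con = DriverManager.getConnection(URL);
             CallableStatement st = con.prepareCall(buildCall("countOrders", 1))) {
            st.setObject(1, year);
            try (ResultSet rs = st.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getObject(1));
                }
            }
        }
    }
}
```

The calling code uses only standard `java.sql` types, which is what makes the script look like a stored procedure from the application's point of view.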

SPL - Structured Programming Language

SPL application scenarios

  • Got SQL

    SQL has considerable computing power, but it is unavailable in many scenarios, which forces you to hard-code the logic in Java. SPL provides lightweight computing power that is independent of any database and can process data in a wide range of scenarios:

    • Structured text (txt/csv) calculation      Ref. [1] [2] [3] [4] [5]

    • Excel calculation      Ref. [1] [2]

    • Perform SQL on files      Ref. [1] [2]

    • Multi-layer json calculation      Ref. [1] [2]

    • Multi-layer xml calculation      Ref. [1] [2]

    • Java computing class library that surpasses Stream/Kotlin/Scala      Ref. [1] [2]

    • Replace ORM to implement business logic      Ref. [1] [2]

    • SQL-like calculation on MongoDB, including association calculation      Ref. [1] [2] [3]

    • Post-calculation on WebService/RESTful results      Ref. [1] [2]

    • Post-calculation on Salesforce and SAP data      Ref. [1] [2]

    • Post-calculation on various data sources: HBase, Cassandra, Redis, ElasticSearch, Kafka, …      Ref. [1]
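For contrast, the following shows the kind of hand-written Java that the paragraph above refers to when no SQL engine is available. It is a plain-Java sketch of a group-and-sum over CSV text, not SPL syntax; the column layout (`region,amount`) is hypothetical, and quoted fields are not handled.

```java
import java.util.Map;
import java.util.TreeMap;

class CsvGroupSum {
    /** Sums the second column per value of the first column; the first line is a header. */
    static Map<String, Double> sumByRegion(String csv) {
        Map<String, Double> totals = new TreeMap<>();
        String[] lines = csv.split("\\R");
        for (int i = 1; i < lines.length; i++) { // start at 1 to skip the header row
            if (lines[i].trim().isEmpty()) continue;
            String[] fields = lines[i].split(",");
            totals.merge(fields[0].trim(), Double.parseDouble(fields[1].trim()), Double::sum);
        }
        return totals;
    }
}
```

Even this small task needs manual parsing, type conversion and grouping, which is the overhead a dedicated computing library is meant to remove.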

  • Beyond SQL

    SQL struggles with complex set operations and ordered operations, so the data is often read out and computed in Java. SPL has complete set capabilities and, in particular, supports ordered and step-by-step calculation, which simplifies these operations:

    • Ordered set      Ref. [1] [2] [3]

    • Position reference      Ref. [1] [2] [3]

    • Grouping subsets      Ref. [1]

    • Non-equivalence grouping      Ref. [1]

    • Multi-level association operation      Ref. [1] [2] [3] [4]

    • Static and dynamic pivot      Ref. [1] [2] [3]

    • Recursion and iteration      Ref. [1]

    • Step-by-step and loop operation      Ref. [1]

    • Text and date time operation      Ref. [1] [2]
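To illustrate what an ordered, position-based calculation looks like, here is a plain-Java sketch that finds the longest run of consecutive increases in a price series. Each step refers to the element at the previous position, which is awkward to express over unordered sets. The example and its names are illustrative assumptions and do not show SPL syntax.

```java
class RisingRun {
    /** Returns the largest number of consecutive steps where a value exceeds its predecessor. */
    static int longestRisingRun(double[] prices) {
        int best = 0;
        int current = 0;
        for (int i = 1; i < prices.length; i++) {
            // Position reference: compare element i with element i - 1.
            current = prices[i] > prices[i - 1] ? current + 1 : 0;
            best = Math.max(best, current);
        }
        return best;
    }
}
```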

  • Cooperate DB

    A database's computing power is closed: it cannot process data outside the database. Data often has to be imported into a single database through ETL before it can be processed.

    SPL provides open, simple computing power: it can read multiple databases directly, perform mixed calculations across them, and help the database compute more effectively.

    • Fetch data in parallel to accelerate JDBC      Ref. [1]

    • SQL migration among different types of databases      Ref. [1]

    • Cross database operations      Ref. [1]

    • T+0 statistics and query      Ref. [1]

    • Replace stored procedures to improve code portability and reduce coupling      Ref. [1]

    • Avoid turning ETL into ELT or even LET

    • Mixed calculation of multiple data sources      Ref. [1] [2]

    • Reduce intermediate tables in the database

    • Report data source development, with support for hot switching and multiple data sources, and improved development efficiency      Ref. [1] [2] [3]

    • Implement microservices that occupy fewer resources and support hot switching      Ref. [1] [2]
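The idea behind fetching data in parallel can be sketched in plain Java: split a key range into contiguous segments and fetch each segment on its own thread. This is a generic sketch under stated assumptions, not how SPL implements it. The `fetchRange` callback is hypothetical and stands in for a per-segment query such as a `WHERE id BETWEEN ? AND ?` statement on a separate connection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;

class ParallelFetch {
    /** Splits the inclusive range [lo, hi] into at most `parts` contiguous segments. */
    static List<long[]> segments(long lo, long hi, int parts) {
        List<long[]> out = new ArrayList<>();
        long total = hi - lo + 1;
        long start = lo;
        for (int i = 0; i < parts && start <= hi; i++) {
            long size = total / parts + (i < total % parts ? 1 : 0);
            if (size == 0) continue;
            out.add(new long[] {start, start + size - 1});
            start += size;
        }
        return out;
    }

    /** Fetches every segment concurrently and concatenates the results in segment order. */
    static <T> List<T> fetchAll(long lo, long hi, int parts,
                                BiFunction<Long, Long, List<T>> fetchRange) {
        List<long[]> segs = segments(lo, hi, parts);
        List<T> all = new ArrayList<>();
        if (segs.isEmpty()) return all;
        ExecutorService pool = Executors.newFixedThreadPool(segs.size());
        try {
            List<Future<List<T>>> futures = new ArrayList<>();
            for (long[] s : segs) {
                Callable<List<T>> task = () -> fetchRange.apply(s[0], s[1]);
                futures.add(pool.submit(task));
            }
            for (Future<List<T>> f : futures) {
                try {
                    all.addAll(f.get()); // waits for each segment in order
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while fetching", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("segment fetch failed", e.getCause());
                }
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether this helps in practice depends on the database and network; the sketch only shows the segmentation and the fan-out.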

  • Surpass DB

    It is difficult to implement high-performance algorithms in SQL. The performance of big data operations depends entirely on the database's optimization engine, which is often unreliable in complex situations.

    SPL provides a large number of basic high-performance algorithms (many of them industry firsts) and efficient storage formats. On the same hardware, it can achieve much better computing performance than the database and can comprehensively replace the big data platform and data warehouse.

    • In-memory search: binary search, sequence number positioning, position index, hash index, multi-layer sequence number positioning      Ref. [1]

    • Dataset in external storage: parallel computing of text file, binary storage, double increment segmentation, columnar storage composite table, ordered storage and update

    • Search in external storage: binary search, hash index, sorting index, row-based storage and valued index, index preloading, batch search and set search, multi index merging, full-text searching      Ref. [1]

    • Traversing technique: post filter of cursor, multi-purpose traversal, parallel traversing and multi cursors, aggregation extension, ordered traversing, program cursor, partially ordered grouping and sorting, sequence number grouping and controllable segmentation      Ref. [1]

    • Association technique: foreign key addressing, foreign key serialization, index reuse, alignment sequence, large dimension table search, unilateral splitting, orderly merging, association positioning, schedule      Ref. [1]

    • Multidimensional analysis: pre summary and time period pre summary, alignment sequence, tag bit dimension      Ref. [1]

    • Distributed: free computing and data distribution, cluster multi-zone composite table, cluster dimension table, redundant fault tolerance, spare tire fault tolerance, Fork-Reduce, multi job load balancing
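As an illustration of why ordered storage matters for association, the general idea behind an ordered merge can be sketched in plain Java. When both sides are already sorted by the association key, matching keys can be found in a single pass without building a hash table. This is a textbook merge join written under that assumption, not SPL's implementation.

```java
import java.util.ArrayList;
import java.util.List;

class OrderedMerge {
    /** Returns keys present in both inputs; each input must be sorted ascending with unique keys. */
    static List<Integer> mergeJoin(int[] left, int[] right) {
        List<Integer> matched = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.length && j < right.length) {
            if (left[i] < right[j]) {
                i++;            // left key is too small to match anything remaining
            } else if (left[i] > right[j]) {
                j++;            // right key is too small to match anything remaining
            } else {
                matched.add(left[i]);
                i++;
                j++;
            }
        }
        return matched;
    }
}
```

The pass touches each element at most once, so the cost grows linearly with the combined input size.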

  • For Excel

    Combining SPL with Excel enhances Excel's calculation ability and makes calculations easier to implement.    Ref. [1]

    Through SPL's Excel plug-in, you can use SPL functions in Excel, and you can also call SPL scripts in VBA.    Ref. [1]

    SPL provides Excel-oriented set operations:

    • Cell value and summary value calculation      Ref. [1]

    • Set operations and membership tests      Ref. [1]

    • Duplicate detection, counting and deduplication      Ref. [1]

    • Sorting and ranking      Ref. [1]

    • Special grouping and aggregate methods      Ref. [1]

    • Association and comparison      Ref. [1]

    • Row-column transpose      Ref. [1]

    • Expansion and supplement      Ref. [1]

  • For Industry

    Industrial scenarios involve large amounts of time series data, and databases often provide only SQL. SQL's ordered calculation capability is very weak, so it can be used only for data retrieval and cannot assist in calculation.

    Industrial scenarios often involve many basic mathematical operations. SQL lacks these functions, so the data has to be read out and processed elsewhere.

    SPL supports ordered calculation well and provides rich mathematical functions, such as matrix operations and fitting, so it can meet the calculation requirements of industrial scenarios more conveniently.

    • Time series cursor: aggregation by granularity, translation, adjacent reference, association and merging

    • Historical data compression and solidification, transparent reference

    • Vector and matrix operations

    • Various linear fitting methods: least squares, partial least squares, Lasso, ridge …
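To make the simplest of the fitting methods listed above concrete, here is a plain-Java sketch of ordinary least squares for a single variable, fitting y = slope * x + intercept with the closed-form solution. It is a generic illustration with hypothetical names and does not represent SPL's fitting functions.

```java
class LeastSquares {
    /** Fits y = slope * x + intercept by ordinary least squares; returns {slope, intercept}. */
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs of equal length");
        }
        int n = x.length;
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0; // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0) {
            throw new IllegalArgumentException("all x values are identical");
        }
        double slope = sxy / sxx;
        return new double[] {slope, meanY - slope * meanX};
    }
}
```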

    Industrial algorithms often require repeated experiments. SPL's development efficiency is very high, so you can try more approaches in the same amount of time:

    • Instrument anomaly discovery algorithm

    • Abnormal measurement sample locating

    • Curve lifting and oscillation pattern recognition

    • Constrained linear fitting

    • Pipeline transmission scheduling algorithm

License

esProc is under the Apache 2.0 license. See the LICENSE file for details.


Languages

Java 100.0%, Shell 0.0%