artika4biz / sling-utils

Utilities for Apache Sling and Adobe AEM applications

Sling Utils

This is an OSGi module for Apache Sling and Adobe AEM that provides some utilities. All the servlets require an authenticated session; otherwise the client receives an HTTP 403 error.

JsonQuery

A simple servlet that returns the node paths of a JCR-SQL2 query result in JSON format, with pagination. This is useful when traversing the JCR repository is not enough. I use this servlet to retrieve a flat list of nodes (hundreds of thousands, up to a few million) to be processed by ML algorithms. With this servlet you can request something like http://localhost:8080/v1/jcr-query?offset=0&limit=40&sql=select * from [oak:Unstructured] as n where isdescendantnode(n,'/data/cassazione') and id is not null to obtain only 40 nodes (limit=40), starting from the first returned node (offset=0) or from the n-th returned node (offset=n):

image
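The example URL above contains spaces, brackets and quotes, so a client normally has to percent-encode the query before sending it. The following is a minimal sketch of how such a URL could be built in Python. The endpoint and the parameter names (offset, limit, sql) are taken from the example above; the function name build_query_url is hypothetical and not part of this project.

```python
from urllib.parse import urlencode

# Endpoint taken from the example request above; adjust host and port as needed.
BASE_URL = "http://localhost:8080/v1/jcr-query"


def build_query_url(sql, offset=0, limit=40, base_url=BASE_URL):
    """Return a percent-encoded JsonQuery URL for one page of results.

    `build_query_url` is a hypothetical helper, not part of sling-utils.
    """
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    # urlencode escapes spaces, brackets, quotes and slashes in the SQL text.
    params = {"offset": offset, "limit": limit, "sql": sql}
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    query = (
        "select * from [oak:Unstructured] as n "
        "where isdescendantnode(n,'/data/cassazione') and id is not null"
    )
    print(build_query_url(query, offset=0, limit=40))
```

Because the servlets require an authenticated session, the actual HTTP request also has to carry valid credentials; how they are supplied depends on the Sling or AEM setup.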

NodeCount

A simple servlet that returns the result count of a JCR-SQL2 query. JCR does not provide a "count(*)" function, so this servlet picks the fastest available strategy:

  • Using the NodeIterator.getSize() method; in the Apache Jackrabbit Oak implementation, this method returns a valid result only if the Fast return size option is enabled (OSGi configuration with PID org.apache.jackrabbit.oak.query.QueryEngineSettingsService).
  • Otherwise, counting each returned node (very slow, but it works when needed!). ACLs are applied.

Remember that, in the Jackrabbit Oak implementation, the NodeIterator.getSize() method counts all the matching nodes exactly, but ACLs are not applied to the result, as per the official documentation.
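The two strategies above can be modelled in a few lines. The sketch below is written in Python for illustration only and is not the servlet's Java code. It relies on the JCR contract that getSize() returns -1 when the size is unknown; using that value to decide when to fall back is an assumption about how the choice could be made. FakeNodeIterator and count_results are hypothetical names.

```python
class FakeNodeIterator:
    """Stand-in for javax.jcr.NodeIterator, for illustration only."""

    def __init__(self, paths, size_known):
        self._paths = list(paths)
        self._size_known = size_known

    def get_size(self):
        # Mirrors the JCR contract: -1 means the size is not available.
        return len(self._paths) if self._size_known else -1

    def __iter__(self):
        return iter(self._paths)


def count_results(nodes):
    """Return (count, strategy) using the fast path when it is available."""
    size = nodes.get_size()
    if size >= 0:
        # Fast path: the iterator already knows its size.
        return size, "getSize"
    # Slow path: walk every returned node and count it.
    return sum(1 for _ in nodes), "iteration"


if __name__ == "__main__":
    paths = ["/data/a", "/data/b", "/data/c"]
    print(count_results(FakeNodeIterator(paths, size_known=True)))
    print(count_results(FakeNodeIterator(paths, size_known=False)))
```

In a real repository the two paths can disagree, because the fast path does not apply ACLs while iterating only sees nodes the session is allowed to read.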

How to enable the Fast return size option

image

Example of a count request when the Fast return size option is enabled (execution time: 25 ms):

image

Example of a count request when the Fast return size option is disabled (execution time: more than 5 seconds, but it works!):

image

When these servlets are useful

With just these two servlets I can now analyze millions of documents stored in an Apache Jackrabbit Oak repository via Apache Sling or Adobe AEM, executing fewer than 20 lines of Python code, like these:

image
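As a rough illustration of such a client, here is a minimal sketch of a paging loop. It is not the author's code. It assumes a caller-supplied fetch_page(offset, limit) function that performs the authenticated request to the JsonQuery servlet and returns the node paths of one page as a list; the JSON layout of the response is not described here, so parsing is left to that function. The names iter_node_paths and fetch_page are hypothetical.

```python
from typing import Callable, Iterator, List


def iter_node_paths(
    fetch_page: Callable[[int, int], List[str]], limit: int = 1000
) -> Iterator[str]:
    """Yield every node path by requesting pages of `limit` results.

    `fetch_page(offset, limit)` is a hypothetical callable supplied by the
    caller; it is assumed to return the paths of one page as a list.
    """
    if limit <= 0:
        raise ValueError("limit must be > 0")
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page
        # A short (or empty) page means the result set is exhausted.
        if len(page) < limit:
            break
        offset += limit


if __name__ == "__main__":
    # In-memory stand-in for the servlet, so the loop can be tried offline.
    fake_repo = [f"/data/doc{i}" for i in range(7)]
    for path in iter_node_paths(lambda o, l: fake_repo[o:o + l], limit=3):
        print(path)
```

Passing the fetch function in keeps the loop independent of the HTTP library and of the authentication scheme in use.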

Build and Installation

Building the project is simple:

mvn clean install

To install the OSGi bundle, use the autoInstallBundle profile:

mvn clean install -P autoInstallBundle
