mclements / mercury-aggregates

Bag and set aggregates for Mercury

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

aggregates: bag and set aggregates for Mercury

Introduction

This is a module to provide bag and set aggregates for Mercury.

The motivation for this module is to provide aggregates that are comparable to aggregates provided by Prolog (e.g. XSB and SWI-Prolog) and Datalog (e.g. Soufflé). This suggests working with nondeterministic predicates rather than other data structures.

This is an exploratory exercise, so that the API is likely to change. In particular, the current API and implementations for the bag_row_number and bag_cum_sum window functions are experimental.

The aggregates module includes submodules for floats and ints.

Generic aggregates predicates and functions

Bag aggregatesSet aggregates
bag_aggr(Predicate, Aggregator, Initial, Result)See: solutions.aggregate
bag_count(Predicate, Result)set_count(Predicate, Result)
bag_max(Predicate, Initial, Result)set_max(Predicate, Initial, Result)
bag_min(Predicate, Initial, Result)set_min(Predicate, Initial, Result)
Bag window functionsSet window functions
bag_row_number(ByPredicate, Result(By,RowNumber))
bag_row_number(ByPredicate) = Result(By,RowNumber)

aggregates.floats predicates and functions

Bag aggregatesSet aggregates
bag_max(Predicate, Result)set_max(Predicate, Result)
bag_min(Predicate, Result)set_min(Predicate, Result)
bag_sum(Predicate, Result)set_sum(Predicate, Result)
bag_avg(Predicate, Result)set_avg(Predicate, Result)
bag_mean(Predicate, Result)set_mean(Predicate, Result)
bag_mean(Predicate, Result)set_mean(Predicate, Result)
bag_geom_mean(Predicate, Result)set_geom_mean(Predicate, Result)
bag_var(Predicate, Result)set_var(Predicate, Result)
bag_median(Predicate, Result)set_median(Predicate, Result)
bag_quantile(Predicate, Quantile, Result)set_quantile(Predicate, Quantile, Result)
Bag window functionsSet window functions
bag_cum_sum(Predicate(By,X), Result(By,CumSum))
bag_cum_sum(Predicate(By,X) = Result(By,CumSum)

aggregates.ints predicates

Bag aggregatesSet aggregates
bag_max(Predicate, Result)set_max(Predicate, Result)
bag_min(Predicate, Result)set_min(Predicate, Result)
bag_sum(Predicate, Result)set_sum(Predicate, Result)

Example

For the following example code:

:- module test.

:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module list, int, float, string, maybe, solutions.
:- import_module aggregates, aggregates.floats.

:- type maybe_date ---> date(year::int, month::int, day::int); no.
:- pred bag_max_date(pred(maybe_date)::(pred(out) is nondet), maybe_date::out) is det.
bag_max_date(Predicate, MaxDate) :-
    bag_max(Predicate, date(min_int,0,0), MaxDate1),
    (if MaxDate1 = date(min_int,0,0) then MaxDate = no else MaxDate = MaxDate1).

:- func nan = float.
nan = det_to_float("NaN").

:- pred patient(int::out, string::out) is multi.
patient(1001, "Hopper").
patient(4004, "Wirth").
patient(3003, "Kemeny").
patient(2002, "Gosling").
patient(5005, "Kurtz").

:- pred visit(int::out, maybe_date::out, float::out) is multi.
visit(2002, date(2020,09,10), 6.8).
visit(1001, date(2020,09,17), 5.5).
visit(4004, date(2020,09,24), 8.4).
visit(2002, date(2020,10,08), nan).
visit(1001, no, 6.6).
visit(3003, date(2020,11,12), nan).
visit(4004, date(2020,11,05), 7.0).
visit(1001, date(2020,11,19), 5.3).

main(!IO) :-
    print_line("{Id, Lastname, SumScores, AvgScores, MaxDate}", !IO),
    aggregate((pred({Id,Lastname,Sum,Avg,MaxDate}::out) is nondet :-
	           patient(Id,Lastname),
	           Scores = (pred(Score::out) is nondet :- visit(Id,_,Score), \+is_nan(Score)),
	  	   bag_sum(Scores, Sum),
	           bag_avg(Scores, Avg),
	           Dates = (pred(Date::out) is nondet :- visit(Id,Date,_), Date\=no),
	           bag_max_date(Dates, MaxDate)),
	      print_line,
	      !IO),
    print_line("{Id, RowNumber, Date, Score, CumScore}", !IO),
    aggregate((pred({Id,RowNumber,Datei,Scorei,CumSumi}::out) is nondet :-
	           patient(Id,_), 
	           Combined = (pred(Date::out,Score::out) is nondet :- visit(Id,Date,Score)), 
		   Combined(Datei,Scorei),
	  	   bag_cum_sum(Combined)(Datei,CumSumi),
		   Dates = (pred(Date::out) is nondet :- Combined(Date,_)),
		   bag_row_number(Dates)(Datei,RowNumber)),
	      print_line,
	      !IO).

Note that bag_sum, bag_avg and bag_max_date are aggregates that return deterministic values, while bag_cum_sum and bag_row_number are window functions that return non-deterministic predicates with two variables.

We can run the code to get the subsequent output:

mmc --make test.m && ./test
{Id, Lastname, SumScores, AvgScores, MaxDate}
{1001, "Hopper", 17.4, 5.8, date(2020, 11, 19)}
{2002, "Gosling", 6.8, 6.8, date(2020, 10, 8)}
{3003, "Kemeny", 0.0, nan, date(2020, 11, 12)}
{4004, "Wirth", 15.4, 7.7, date(2020, 11, 5)}
{5005, "Kurtz", 0.0, nan, no}
{Id, RowNumber, Date, Score, CumScore}
{1001, 1, date(2020, 9, 17), 5.5, 5.5}
{1001, 2, date(2020, 11, 19), 5.3, 10.8}
{1001, 3, no, 6.6, 17.4}
{2002, 1, date(2020, 9, 10), 6.8, 6.8}
{2002, 2, date(2020, 10, 8), nan, nan}
{3003, 1, date(2020, 11, 12), nan, nan}
{4004, 1, date(2020, 9, 24), 8.4, 8.4}
{4004, 2, date(2020, 11, 5), 7.0, 15.4}

Detailed documentation

%--------------------------------------------------%
% Copyright (C) 2022 Mark Clements.
% This file is distributed under the terms specified in COPYING.
%--------------------------------------------------%
%
% File: aggregates.m.
% Authors: mclements
% Stability: low.
%
% This module defines bag and set aggregates with nested submodules for
% floats and ints.
%
%--------------------------------------------------%
%--------------------------------------------------%

:- module aggregates.

:- interface.

:- import_module list.

    % bag_aggr(Predicate, Aggregator, Initial, Result) returns a bag aggregate
    % based on a predicate using an aggregator with an initial value. 
    % This uses bag semantics and assumes that the aggregator does not depend
    % on the order of the (unsorted) aggregate.
    %
:- pred bag_aggr(pred(T)::(pred(out) is nondet),
		 pred(T,U,U)::pred(in,in,out) is det, U::in, U::out) is det.

    % bag_count(Predicate, Result) returns a bag count.
    % This has the same semantics as bag_aggr.
    %
:- pred bag_count(pred(T)::(pred(out) is nondet), int::out) is det.

    % bag_max(Predicate, Initial, Result) returns a max aggregate.
    %
:- pred bag_max(pred(T)::(pred(out) is nondet), T::in, T::out) is det.

    % bag_min(Predicate, Initial, Result) returns a min aggregate.
    %
:- pred bag_min(pred(T)::(pred(out) is nondet), T::in, T::out) is det.

    % bag_solutions(Predicate, List) returns an unsorted bag of solutions
    % as a list.
:- pred bag_solutions(pred(T)::(pred(out) is nondet), list(T)::out) is det.

    % set_max(Predicate, Initial, Result) returns a max aggregate.
    %
:- pred set_max(pred(T)::(pred(out) is nondet), T::in, T::out) is det.

    % set_min(Predicate, Initial, Result) returns a min aggregate.
    %
:- pred set_min(pred(T)::(pred(out) is nondet), T::in, T::out) is det.

    % bag_solutions(Predicate, List) returns an unsorted bag of solutions
    % as a list.

    % set_count(Predicate, Result) returns a set count.
    %
:- pred set_count(pred(T)::(pred(out) is nondet), int::out) is det.

    % bag_row_number(Predicate(By)::in, Predicate(By,RowNumber)::out) takes a predicate
    % with a By value and outputs a predicate with the By value and a bag 
    % cumulative sum for the X value sorted by the By value.
    %
:- pred bag_row_number(pred(T)::in(pred(out) is nondet),
		   pred(T,int)::out(pred(out,out) is nondet)) is det.
    % bag_row_number(Predicate(By)) = Predicate(By,RowNumber) takes a predicate
    % with a By value and returns a predicate with the By value and a bag 
    % row number sorted by the By value.
    %
:- func bag_row_number(pred(T)::in(pred(out) is nondet)) = 
   (pred(T,int)::out(pred(out,out) is nondet)) is det.

:- module aggregates.floats.
:- interface.
:- import_module float.

    % bag_max(Predicate, Result) returns a max aggregate.
    % Returns min if the count is less than 1.
    %
:- pred bag_max(pred(float)::(pred(out) is nondet), float::out) is det.

    % bag_min(Predicate, Result) returns a min aggregate.
    % Returns max if the count is less than 1.
    %
:- pred bag_min(pred(float)::(pred(out) is nondet), float::out) is det.

    % bag_sum(Predicate, Result) returns a bag sum aggregate.
    % This has the same semantics as bag_aggr.
    %
:- pred bag_sum(pred(float)::(pred(out) is nondet), float::out) is det.

    % bag_avg(Predicate, Result) returns a bag average aggregate.
    % This has the same semantics as bag_aggr.
    % Returns nan if the count is less than 1.
    %
:- pred bag_avg(pred(float)::(pred(out) is nondet), float::out) is det.

    % bag_mean(Predicate, Result) returns a bag average aggregate.
    % This has the same semantics as bag_aggr.
    % Returns nan if the count is less than 1.
    % This is an alias for bag_avg.
    %
:- pred bag_mean(pred(float)::(pred(out) is nondet), float::out) is det.

    % bag_geom_mean(Predicate, Result) returns a bag geometric average aggregate.
    % This has the same semantics as bag_aggr.
    % Returns nan if the count is less than 1.
    %
:- pred bag_geom_mean(pred(float)::(pred(out) is nondet), float::out) is det.

    % bag_var(Predicate, Result) returns a bag sample variance aggregate.
    % Returns nan if the count is less than 2.
    % This has the same semantics as bag_aggr.
    %
:- pred bag_var(pred(float)::(pred(out) is nondet), float::out) is det.

    % bag_median(Predicate, Result) returns a bag median aggregate.
    % Returns nan if the count is less than 1.
    %
:- pred bag_median(pred(float)::(pred(out) is nondet), float::out) is det.

    % bag_quantile(Predicate, Quantile, Result) returns a bag quantile aggregate.
    % Returns nan if the count is less than 1.
    %
:- pred bag_quantile(pred(float)::(pred(out) is nondet), float::in, float::out)
   is det.

    % bag_summary(Predicate, Quantile, Result) returns a bag summary
    % that includes {Minimum, FirstQuartile, Median, Mean, ThirdQuartile, Maximum}.
    % This has the same format as R's summary() for a numeric vector.
    %
:- pred bag_summary(pred(float)::(pred(out) is nondet),
		    {float,float,float,float,float,float}::out) is det.

    % set_max(Predicate, Result) returns a max aggregate.
    % Returns min if the count is less than 1.
    %
:- pred set_max(pred(float)::(pred(out) is nondet), float::out) is det.

    % set_min(Predicate, Result) returns a min aggregate.
    % Returns max if the count is less than 1.
    %
:- pred set_min(pred(float)::(pred(out) is nondet), float::out) is det.

    % set_sum(Predicate, Result) returns a set sum aggregate.
    %
:- pred set_sum(pred(float)::(pred(out) is nondet), float::out) is det.

    % set_avg(Predicate, Result) returns a set average aggregate.
    % Returns nan if the count is less than 1.
    %
:- pred set_avg(pred(float)::(pred(out) is nondet), float::out) is det.

    % set_mean(Predicate, Result) returns a set average aggregate.
    % Returns nan if the count is less than 1.
    % This is an alias for set_avg.
    %
:- pred set_mean(pred(float)::(pred(out) is nondet), float::out) is det.

    % set_geom_mean(Predicate, Result) returns a set geometric average aggregate.
    % Returns nan if the count is less than 1.
    %
:- pred set_geom_mean(pred(float)::(pred(out) is nondet), float::out) is det.

    % set_var(Predicate, Result) returns a set sample variance aggregate.
    % Returns nan if the count is less than 2.
    %
:- pred set_var(pred(float)::(pred(out) is nondet), float::out) is det.

    % set_median(Predicate, Result) returns a set median aggregate.
    % Returns nan if the count is less than 1.
    %
:- pred set_median(pred(float)::(pred(out) is nondet), float::out) is det.

    % set_quantile(Predicate, Quantile, Result) returns a bag quantile aggregate.
    % Returns nan if the count is less than 1.
    %
:- pred set_quantile(pred(float)::(pred(out) is nondet), float::in, float::out)
   is det.

    % set_summary(Predicate, Quantile, Result) returns a set summary
    % that includes {Minimum, FirstQuartile, Median, Mean, ThirdQuartile, Maximum}.
    % This has the same format as R's summary() for a distinct numeric vector.
    %
:- pred set_summary(pred(float)::(pred(out) is nondet),
		    {float,float,float,float,float,float}::out) is det.

    % bag_cum_sum(Predicate(By,X)::in, Predicate(By,CumSum)::out) takes a predicate
    % with a By value and an X value and outputs a predicate with the By value and a bag 
    % cumulative sum for the X values sorted by the By value.
    %
:- pred bag_cum_sum(pred(T,float)::in(pred(out,out) is nondet),
		    pred(T,float)::out(pred(out,out) is nondet)) is det.

    % bag_cum_sum(Predicate(By,X)) = Predicate(By,CumSum) takes a predicate
    % with a By value and an X value and returns a predicate with the By value and a bag 
    % cumulative sum for the X value sorted by the By value.
    %
:- func bag_cum_sum(pred(T,float)::in(pred(out,out) is nondet)) =
   (pred(T,float)::out(pred(out,out) is nondet)) is det.

:- end_module aggregates.floats.

:- module aggregates.ints.
:- interface.
:- import_module int.

    % bag_max(Predicate, Result) returns a max aggregate.
    % Returns min_int if the count is less than 1.
    %
:- pred bag_max(pred(int)::(pred(out) is nondet), int::out) is det.

    % bag_min(Predicate, Result) returns a min aggregate.
    % Returns max_int if the count is less than 1.
    %
:- pred bag_min(pred(int)::(pred(out) is nondet), int::out) is det.

    % bag_sum(Predicate, Result) returns a bag sum aggregate.
    % This has the same semantics as bag_aggr.
    % Returns 0 if the count is less than 1.
    %
:- pred bag_sum(pred(int)::(pred(out) is nondet), int::out) is det.

    % set_max(Predicate, Result) returns a max aggregate.
    % Returns min_int if the count is less than 1.
    %
:- pred set_max(pred(int)::(pred(out) is nondet), int::out) is det.

    % set_min(Predicate, Result) returns a min aggregate.
    % Returns max_int if the count is less than 1.
    %
:- pred set_min(pred(int)::(pred(out) is nondet), int::out) is det.

    % set_sum(Predicate, Result) returns a set sum aggregate.
    % Returns 0 if the count is less than 1.
    %
:- pred set_sum(pred(int)::(pred(out) is nondet), int::out) is det.
:- end_module aggregates.ints.

About

Bag and set aggregates for Mercury

License:BSD 2-Clause "Simplified" License


Languages

Language:Mercury 100.0%