
Aggregation

  • The COLLECT clause
  • Aggregate versions of functions
  • The optimizer's resource utilization

[Numeric functions documentation](https://www.arangodb.com/docs/stable/aql/functions-numeric.html)

Task

  • Find the longest flight distance
  • Find the shortest flight distance

MIN

MIN(aggregate)

  • Compares one element at a time against the current minimum
  • Stores the smallest element seen so far as the min value

MAX

MAX(aggregate)

  • Compares one element at a time against the current maximum
  • Stores the largest element seen so far as the max value

FOR flight IN flights
  COLLECT AGGREGATE
    minDistance = MIN(flight.Distance),
    maxDistance = MAX(flight.Distance)
  RETURN { "shortest flight": minDistance, "longest flight": maxDistance }

Query explanation

Query String (207 chars, cacheable: true):
FOR flight IN flights
COLLECT AGGREGATE
minDistance = MIN(flight.Distance),
maxDistance = MAX(flight.Distance)
RETURN { "shortest flight": minDistance, "longest flight": maxDistance }

Execution plan:
Id  NodeType                   Est.  Comment
 1  SingletonNode                 1  * ROOT
 2  EnumerateCollectionNode  286463  - FOR flight IN flights /* full collection scan, projections: `Distance` */
 3  CalculationNode          286463  - LET #3 = flight.`Distance` /* attribute expression */ /* collections used: flight : flights */
 5  CollectNode                   1  - COLLECT AGGREGATE minDistance = MIN(#3), maxDistance = MAX(#3) /* sorted */
 6  CalculationNode               1  - LET #7 = { "shortest flight" : minDistance, "longest flight" : maxDistance } /* simple expression */
 7  ReturnNode                    1  - RETURN #7

Indexes used:
none

Optimization rules applied:
Id  RuleName
 1  move-calculations-up
 2  remove-redundant-calculations
 3  remove-unnecessary-calculations
 4  reduce-extraction-to-projection

Optimization rules with highest execution times:
RuleName                          Duration [s]
remove-redundant-calculations          0.00010
reduce-extraction-to-projection        0.00010
move-calculations-up                   0.00008
remove-unnecessary-calculations        0.00007
replace-function-with-index            0.00005

42 rule(s) executed, 1 plan(s) created

Explanation

  • All aggregate functions are optimized this way, so the query never needs to gather a very large dataset before computing the result.
  • This offers a very performant way to iterate through a large dataset.
  • If you wanted to do it in memory instead, you would first need to gather all x elements. Here x is 286463.
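
The difference between the two approaches can be sketched outside the database. The following Python is only an illustration of the idea, not ArangoDB's actual implementation; the function names `streaming_min_max` and `gathered_min_max` and the sample distances are made up for this example. The first function keeps just two running values while it iterates, and the second gathers every element before computing anything.

```python
from typing import Iterable, Optional, Tuple

Result = Tuple[Optional[int], Optional[int]]


def streaming_min_max(values: Iterable[int]) -> Result:
    """Single pass: keep only the current min and max (constant extra memory)."""
    lo: Optional[int] = None
    hi: Optional[int] = None
    for v in values:
        if lo is None or v < lo:
            lo = v  # new smallest element seen so far
        if hi is None or v > hi:
            hi = v  # new largest element seen so far
    return lo, hi


def gathered_min_max(values: Iterable[int]) -> Result:
    """Gather everything first: memory grows with the number of elements."""
    gathered = list(values)  # holds all x elements at once
    if not gathered:
        return None, None
    return min(gathered), max(gathered)


if __name__ == "__main__":
    # Hypothetical sample distances, not taken from the flights collection.
    sample = [740, 120, 2475]
    print(streaming_min_max(sample))  # (120, 2475)
    print(gathered_min_max(sample))   # (120, 2475)
```

Both functions return the same result. The streaming version holds two values no matter how many elements are scanned, while the gathered version would hold all 286463 distances at once for this collection.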