Skip to main content

Group By Function

The Group By function aggregates your data by grouping rows with common values and calculating summary statistics. It's essential for creating reports, summaries, and analytics from detailed data.

How It Works

Group By processes your data by:

  1. Identifying rows with matching values in grouping fields
  2. Combining those rows into a single output row
  3. Calculating aggregate functions across grouped rows
  4. Outputting one row per unique group

Example:

Input: 10 orders across 3 customers
Output: 3 rows (one per customer) with order totals

Adding a Group By

  1. Drag Group By from the Functions section of the Element Panel
  2. Connect it to your data source
  3. Click the Group By node to configure grouping and aggregations

Configuration Panel

Grouping Fields

Specify which fields define the groups:

Single field grouping:

  • Group by: Region
  • Output: One row per unique Region

Multiple field grouping:

  • Group by: Region, Year
  • Output: One row per unique Region-Year combination

Adding Grouping Fields

  1. Click Add Grouping Field
  2. Select a field from the dropdown
  3. Add additional fields as needed

The order of grouping fields affects output ordering but not the grouping logic.

Aggregate Functions

Define calculations to perform on grouped rows:

COUNT

Counts the number of rows in each group.

Configuration:

FunctionFieldOutput Name
COUNT*OrderCount

Variations:

  • COUNT(*) - Count all rows
  • COUNT(field) - Count non-null values in field

SUM

Adds up numeric values across grouped rows.

Configuration:

FunctionFieldOutput Name
SUMAmountTotalAmount

Use for:

  • Total sales
  • Total quantity
  • Sum of any numeric measure

AVG (Average)

Calculates the arithmetic mean of numeric values.

Configuration:

FunctionFieldOutput Name
AVGPriceAveragePrice

Note: Null values are excluded from average calculation.

MIN

Returns the minimum value in the group.

Configuration:

FunctionFieldOutput Name
MINOrderDateFirstOrderDate

Works with:

  • Numbers (smallest value)
  • Dates (earliest date)
  • Text (alphabetically first)

MAX

Returns the maximum value in the group.

Configuration:

FunctionFieldOutput Name
MAXPriceHighestPrice

Works with:

  • Numbers (largest value)
  • Dates (latest date)
  • Text (alphabetically last)

FIRST

Returns the first value encountered in the group.

Configuration:

FunctionFieldOutput Name
FIRSTDescriptionSampleDescription

Use when:

  • You need any representative value
  • The field is same for all rows in group
  • Order matters (with sorting)

LAST

Returns the last value encountered in the group.

Configuration:

FunctionFieldOutput Name
LASTStatusCurrentStatus

Use when:

  • You want the most recent value
  • Data is pre-sorted chronologically

LIST

Collects all values into an array.

Configuration:

FunctionFieldOutput Name
LISTProductNameProductList

Output: Array of all product names in the group

Use when:

  • You need to preserve all values
  • Creating comma-separated lists
  • Nested data structures

DISTINCT_COUNT (Count Distinct)

Counts unique values in the group.

Configuration:

FunctionFieldOutput Name
DISTINCT_COUNTCustomerIDUniqueCustomers

Use for:

  • Counting unique customers
  • Counting distinct products
  • Any unique count scenario

Multiple Aggregations

Add multiple aggregations for comprehensive summaries:

Example configuration:

FunctionFieldOutput Name
COUNT*OrderCount
SUMAmountTotalAmount
AVGAmountAverageAmount
MINOrderDateFirstOrder
MAXOrderDateLastOrder
DISTINCT_COUNTProductIDUniqueProducts

Common Use Cases

Sales Summary by Region

Input: Order details Grouping: Region

Aggregations:

FunctionFieldOutput
COUNT*OrderCount
SUMRevenueTotalRevenue
AVGRevenueAverageOrderValue
DISTINCT_COUNTCustomerIDUniqueCustomers

Output:

RegionOrderCountTotalRevenueAverageOrderValueUniqueCustomers
East1504500030085
West1205200043372

Monthly Trend Analysis

Input: Daily transactions Grouping: Year, Month

Aggregations:

FunctionFieldOutput
SUMAmountMonthlyTotal
COUNT*TransactionCount
AVGAmountDailyAverage

Output: Monthly summaries for trend analysis

Product Performance

Input: Sales line items Grouping: ProductID, ProductName

Aggregations:

FunctionFieldOutput
SUMQuantityTotalUnitsSold
SUMRevenueTotalRevenue
COUNTOrderIDOrdersContaining
AVGUnitPriceAverageSellingPrice

Customer Lifetime Value

Input: All customer orders Grouping: CustomerID

Aggregations:

FunctionFieldOutput
MINOrderDateFirstPurchase
MAXOrderDateLastPurchase
COUNTOrderIDTotalOrders
SUMAmountLifetimeValue
AVGAmountAverageOrderValue

Inventory Status

Input: Stock movements Grouping: WarehouseID, ProductID

Aggregations:

FunctionFieldOutput
SUMQuantityInTotalReceived
SUMQuantityOutTotalShipped
LASTRunningBalanceCurrentStock

Grouping Strategies

Single Field Grouping

Simple category totals:

Group by: Category
Result: One row per category with totals

Multiple Field Grouping

Cross-tabulation:

Group by: Category, Region
Result: One row per category-region combination

Date-Based Grouping

To group by date parts, first transform dates:

Daily grouping:

Transform: DateOnly = FORMAT_DATE(${DateTime}, "YYYY-MM-DD")
Group by: DateOnly

Monthly grouping:

Transform: YearMonth = FORMAT_DATE(${Date}, "YYYY-MM")
Group by: YearMonth

Weekly grouping:

Transform: WeekStart = DATEADD(${Date}, -DATEPART(${Date}, "dayofweek"), "day")
Group by: WeekStart

Hierarchical Grouping

Create subtotals and grand totals:

Multi-level flow:

[Data] → [GroupBy: Region, Store] → [Subtotals by Store]
→ [GroupBy: Region] → [Subtotals by Region]
→ [GroupBy: (none)] → [Grand Total]
[Append all levels for rollup report]

Working with Aggregates

Filtering Grouped Data

Apply filters after Group By for aggregate-based filtering:

Example: High-value customers only

[Orders] → [GroupBy: CustomerID] → [Filter: TotalAmount > 10000]

This is equivalent to SQL's HAVING clause.

Calculating Percentages

After aggregation, calculate derived metrics:

Transform after Group By:

RevenueShare = ${GroupRevenue} / ${parameter:TotalRevenue} * 100

Ranking Groups

Add rankings after aggregation:

Transform:

// Would require sort + row number or use Lookup for ranks

Ratios and Rates

Calculate ratios from aggregates:

Transform after Group By:

ConversionRate = ${Conversions} / ${Visits} * 100
AverageOrderValue = ${TotalRevenue} / ${OrderCount}
RevenuePerCustomer = ${TotalRevenue} / ${UniqueCustomers}

NULL Handling

Grouping with NULLs

Null values in grouping fields:

  • All nulls are grouped together
  • Creates one group for "no value"

To separate null handling:

Transform before GroupBy:
GroupField = COALESCE(${Field}, "Unknown")

Aggregates with NULLs

FunctionNULL Behavior
COUNT(*)Counts all rows including nulls
COUNT(field)Excludes null values
SUMIgnores nulls (0 if all null)
AVGIgnores nulls in calculation
MIN/MAXIgnores nulls
FIRST/LASTMay return null if encountered

Handling Empty Groups

If a group has no data (after filtering):

  • It won't appear in output
  • To show zeros, use Lookup from master list

Performance Considerations

Filter Before Grouping

Reduce data volume before aggregation:

Less efficient:

[All Orders] → [GroupBy: Region] → [Filter: Year = 2024]

More efficient:

[Orders WHERE Year = 2024] → [GroupBy: Region]

Limit Grouping Fields

More grouping fields = more groups = more output rows:

  • Group by what's needed only
  • Consider hierarchical grouping for drill-down

Selective Aggregations

Calculate only needed aggregations:

  • Each aggregation has computational cost
  • Remove unused aggregations

Large Dataset Considerations

For very large datasets:

  • Filter aggressively before grouping
  • Consider pre-aggregated sources
  • Use incremental processing

Troubleshooting

Wrong Aggregation Results

Count too high:

  • Duplicates in source data
  • Use DISTINCT_COUNT for unique counts
  • Deduplicate before grouping

Sum seems wrong:

  • Check for null values
  • Verify the aggregated field
  • Preview source data

Average unexpected:

  • Nulls excluded from average
  • Check for outliers
  • Verify data types (text vs number)

Too Many Groups

Expected: 100 regions Got: 10,000 rows

Causes:

  • Data quality issues (variations in values)
  • Wrong grouping field
  • Unexpected granularity

Solutions:

  • Standardize values before grouping
  • Check for leading/trailing spaces
  • Verify correct field selected

Missing Groups

Expected groups not appearing:

  • No data for that group (correct behavior)
  • Upstream filtering removed data
  • Null values grouped separately

To show all groups:

  1. Create master list of expected groups
  2. Left Join master with aggregated data
  3. COALESCE nulls to zeros

Memory Issues

Symptoms: Slow or failing on large data

Solutions:

  • Filter source data first
  • Reduce number of aggregations
  • Consider sampling for analysis
  • Process in chunks

Examples

Sales Dashboard Metrics

Goal: Key metrics for sales dashboard

Grouping: (None - grand totals)

Aggregations:

TotalRevenue = SUM(Amount)
TotalOrders = COUNT(*)
UniqueCustomers = DISTINCT_COUNT(CustomerID)
AverageOrderValue = AVG(Amount)
LargestOrder = MAX(Amount)

Department Budget Summary

Goal: Budget utilization by department

Grouping: Department

Aggregations:

TotalBudget = SUM(BudgetAmount)
TotalSpent = SUM(ActualAmount)
TransactionCount = COUNT(*)

Transform after:

Remaining = ${TotalBudget} - ${TotalSpent}
UtilizationPct = ${TotalSpent} / ${TotalBudget} * 100

Time-Based Analysis

Goal: Hourly traffic patterns

Preparation Transform:

HourOfDay = DATEPART(${Timestamp}, "hour")

Grouping: HourOfDay

Aggregations:

Visits = COUNT(*)
UniqueVisitors = DISTINCT_COUNT(SessionID)
PageViews = SUM(PageCount)

Customer Segmentation

Goal: Segment customers by behavior

Grouping: CustomerID

Aggregations:

OrderCount = COUNT(*)
TotalSpent = SUM(Amount)
FirstOrder = MIN(OrderDate)
LastOrder = MAX(OrderDate)

Transform after:

Segment = CASE(
${TotalSpent} >= 10000 && ${OrderCount} >= 10, "VIP",
${TotalSpent} >= 5000, "Premium",
${TotalSpent} >= 1000, "Regular",
"New"
)

Product Mix Analysis

Goal: Analyze product category performance

Grouping: Category, SubCategory

Aggregations:

SKUCount = DISTINCT_COUNT(ProductID)
UnitsSold = SUM(Quantity)
Revenue = SUM(LineTotal)
OrdersContaining = DISTINCT_COUNT(OrderID)

Sort: Revenue DESC

Next Steps