Skip to main content

Merge Function

The Merge function combines data from two sources based on matching keys, similar to a SQL JOIN. It's essential for bringing together related data from different entities or creating unified datasets.

How It Works

Merge connects two data sources and:

  • Matches rows based on specified key fields
  • Combines matched rows into a single output row
  • Handles unmatched rows based on the join type
  • Includes fields from both sources in the output

The join logic determines which rows appear in the output.

Adding a Merge

  1. Drag Merge from the Functions section of the Element Panel
  2. Connect two upstream sources to the Merge node
  3. Click the Merge node to configure join settings

Important: Merge requires exactly two inputs:

  • Left source (primary data)
  • Right source (data to join)

Configuration Panel

Join Type

Select how to handle matching and non-matching rows:

Inner Join

Only rows with matches in both sources appear in output.

Visual:

Left:  1, 2, 3, 4
Right: 2, 3, 5, 6

Output: 2, 3

Use when:

  • You only want records that exist in both datasets
  • Filtering to common records is the goal
  • Missing data should exclude the row

Left Join

All rows from left source appear. Right source data is included when matched.

Visual:

Left:  1, 2, 3, 4
Right: 2, 3, 5, 6

Output:1, 2, 3, 4 (with right data for 2, 3; null for 1, 4)

Use when:

  • Left source is your primary data
  • You want all primary records regardless of matches
  • Missing related data should be null

Right Join

All rows from right source appear. Left source data is included when matched.

Visual:

Left:  1, 2, 3, 4
Right: 2, 3, 5, 6

Output: 2, 3, 5, 6 (with left data for 2, 3; null for 5, 6)

Use when:

  • Right source is your primary data
  • You want all secondary records regardless of matches
  • Similar to left join with sources swapped

Full Outer Join

All rows from both sources appear. Fields are null where there's no match.

Visual:

Left:  1, 2, 3, 4
Right: 2, 3, 5, 6

Output:1, 2, 3, 4, 5, 6 (nulls for unmatched fields)

Use when:

  • You need a complete picture from both sources
  • Analyzing which records exist in one source but not the other
  • Creating a master list from two partial lists

Join Keys

Specify which fields to match between sources.

Single Key

Match on one field from each source:

Configuration:

Left KeyRight Key
CustomerIDCustID

Rows match when CustomerID equals CustID.

Multiple Keys (Composite Key)

Match on multiple fields (all must match):

Configuration:

Left KeyRight Key
OrderDateDate
ProductIDItemID
RegionTerritory

Rows match when ALL key pairs are equal.

Adding Keys

  1. Click Add Join Key
  2. Select the left source field
  3. Select the corresponding right source field
  4. Add additional key pairs as needed

Field Selection

Control which fields appear in output:

Include All Fields:

  • All fields from both sources included
  • Conflicting names get prefixes (left., right.)

Select Specific Fields:

  • Choose which fields to include from each source
  • Specify output names to avoid conflicts

Field Naming Conflicts

When both sources have the same field name:

Automatic prefixing:

Left has: ID, Name, Status
Right has: ID, Name, Category

Output: left.ID, left.Name, left.Status, right.ID, right.Name, right.Category

Manual resolution:

  • Rename fields in source using Map
  • Select only needed fields
  • Use Map after Merge to clean up names

Join Type Comparison

Join TypeLeft MatchesRight MatchesLeft UnmatchedRight Unmatched
InnerExcludedExcluded
LeftIncluded (nulls)Excluded
RightExcludedIncluded (nulls)
Full OuterIncluded (nulls)Included (nulls)

Common Use Cases

Orders with Customer Data

Scenario: Combine orders with customer details.

Sources:

  • Left: Orders (OrderID, CustomerID, OrderDate, Total)
  • Right: Customers (CustomerID, CustomerName, Email, Region)

Configuration:

  • Join Type: Left Join (keep all orders)
  • Left Key: CustomerID
  • Right Key: CustomerID

Output: OrderID, CustomerID, OrderDate, Total, CustomerName, Email, Region

Products with Categories

Scenario: Add category names to products.

Sources:

  • Left: Products (ProductID, ProductName, CategoryID, Price)
  • Right: Categories (CategoryID, CategoryName, Description)

Configuration:

  • Join Type: Left Join
  • Left Key: CategoryID
  • Right Key: CategoryID

Output: Products with category information

Employees with Departments and Managers

Scenario: Build employee directory with hierarchy.

First Merge:

  • Left: Employees (EmpID, Name, DeptID, ManagerID)
  • Right: Departments (DeptID, DeptName)
  • Join on: DeptID

Second Merge:

  • Left: Result of first merge
  • Right: Employees as Managers (EmpID, Name as ManagerName)
  • Join on: ManagerID = EmpID

Finding Differences

Scenario: Find records in one source but not the other.

For records in Left but not in Right:

  1. Left Join
  2. Filter where right key IS NULL

For records in Right but not in Left:

  1. Right Join
  2. Filter where left key IS NULL

For records in either but not both:

  1. Full Outer Join
  2. Filter where either key IS NULL

Invoice Reconciliation

Scenario: Match invoices between two systems.

Sources:

  • Left: System A invoices (InvoiceNo, Amount, Date)
  • Right: System B invoices (InvNumber, Total, InvDate)

Configuration:

  • Join Type: Full Outer
  • Keys: InvoiceNo = InvNumber, Date = InvDate

Analysis:

  • Both match: Invoice in both systems
  • Left only (right null): Only in System A
  • Right only (left null): Only in System B

Multiple Merges

For more than two sources, chain Merge nodes:

Three-way merge:

[Source A] ─┐
├─[Merge 1]─┐
[Source B] ─┘ ├─[Merge 2]─→ [Output]
[Source C] ─────────────┘

Example: Orders with Customers and Products:

  1. Merge Orders with Customers (on CustomerID)
  2. Merge result with Products (on ProductID)

Performance Optimization

Filter Before Merge

Always filter data before merging when possible:

Less efficient:

[All Orders] → [Merge with Customers] → [Filter: This Year]

More efficient:

[Orders filtered to This Year] → [Merge with Customers]

Smaller Source as Right

When one source is much smaller:

  • Use the smaller source as right
  • Use Left Join
  • This can improve performance

Index-Friendly Keys

Join keys that match database indexes perform better:

  • Primary keys
  • Foreign keys
  • Indexed columns

Limit Merged Fields

Don't include all fields if you only need a few:

  • Select only required fields in the Merge configuration
  • Or use Map after Merge to reduce fields

Join Key Considerations

Data Types Must Match

Keys must have compatible types:

// This works
Left: CustomerID (Number) = Right: CustID (Number)

// This may not work
Left: CustomerID (Text) = Right: CustID (Number)

If types differ, use Transform to convert before merging.

Case Sensitivity

Text key comparisons may be case-sensitive:

// May not match
"ABC-001" vs "abc-001"

Normalize case using Transform before merge:

UPPER(${KeyField}) or LOWER(${KeyField})

Null Keys

Null values typically don't match:

  • Null on left doesn't match null on right
  • This is standard SQL behavior

Handle nulls explicitly if matching is needed:

COALESCE(${KeyField}, "UNKNOWN")

Whitespace Issues

Trailing spaces can prevent matches:

"Customer A  " vs "Customer A"

Use TRIM in Transform before merge:

TRIM(${KeyField})

One-to-Many Joins

When right source has multiple matches:

Scenario: One customer has many orders

Left: Customers (1 row per customer) Right: Orders (many rows per customer)

Result: Customer appears multiple times (once per order)

CustomerIDCustomerNameOrderIDAmount
C001Acme CorpO1001500
C001Acme CorpO1002750
C001Acme CorpO1003200
C002Beta IncO10041000

To get one row per customer with aggregated order data:

  1. First use GroupBy on Orders to aggregate
  2. Then Merge with Customers

Many-to-Many Joins

When both sources have multiple matches:

Left: Products (can appear in multiple orders) Right: Orders (can have multiple products)

Result: Cartesian product for each matching key

ProductIDOrderID
P001O001
P001O002
P002O001
P002O002
P002O003

Be cautious of row explosion with many-to-many.

Troubleshooting

No Matches Found (Empty Output)

Possible causes:

  1. Key values don't actually match

    • Check sample data from both sources
    • Look for case, whitespace, or format differences
  2. Wrong key fields selected

    • Verify you selected the correct join keys
    • Check field names and meanings
  3. Data types incompatible

    • Convert types before merge
    • Ensure numbers match numbers, text matches text

Debugging steps:

  1. Preview each source separately
  2. Look at actual key values
  3. Try matching a specific known value

Too Many Rows (Row Explosion)

Possible causes:

  1. Many-to-many relationship

    • One key matches multiple rows on both sides
    • Consider aggregating first
  2. Duplicate keys in one source

    • Deduplicate before merge
    • Use Distinct or GroupBy
  3. Wrong join keys

    • Keys may not uniquely identify relationships

Debugging steps:

  1. Check row counts before and after
  2. Preview for duplicate key values
  3. Verify join key uniqueness

Missing Expected Fields

Possible causes:

  1. Fields excluded from selection

    • Check field selection in Merge config
    • Add missing fields
  2. Field naming conflict

    • Look for prefixed names (left., right.)
    • Use Map to rename

Null Values in Joined Data

For Left/Right joins this is expected:

  • Unmatched rows have nulls from the other source
  • Use COALESCE in Transform if defaults needed

If unexpected:

  • Check that matches should exist
  • Verify key values actually match

Examples

Customer Order Summary

Sources:

  • Customers: CustomerID, Name, Region
  • Orders: OrderID, CustomerID, Date, Amount

Goal: All customers with their total orders and amount

Flow:

  1. GroupBy Orders on CustomerID, calculate SUM(Amount), COUNT(OrderID)
  2. Merge Customers with aggregated Orders (Left Join on CustomerID)

Result: Each customer with OrderCount and TotalAmount (nulls for customers with no orders)

Product Inventory Check

Sources:

  • Products: SKU, Name, Category
  • Inventory: SKU, Warehouse, Quantity

Goal: Products with inventory levels per warehouse

Configuration:

  • Join Type: Left Join
  • Key: SKU = SKU

Result: All products with inventory levels (null quantity for products not in inventory)

Invoice Discrepancy Report

Sources:

  • AR Invoices: InvoiceNo, Amount, Date
  • AP Invoices: InvNo, Total, InvDate

Goal: Find mismatches and missing invoices

Configuration:

  • Join Type: Full Outer
  • Keys: InvoiceNo = InvNo, Date = InvDate

Add Transform:

MatchStatus = CASE(
ISNULL(${InvoiceNo}), "Missing in AR",
ISNULL(${InvNo}), "Missing in AP",
${Amount} == ${Total}, "Matched",
"Amount Mismatch"
)

Filter: MatchStatus != "Matched"

Hierarchical Data

Sources:

  • Employees: EmpID, Name, ManagerID, DeptID
  • Departments: DeptID, DeptName
  • Employees (again, for managers): EmpID as MgrID, Name as MgrName

Flow:

  1. Merge Employees with Departments (Left Join on DeptID)
  2. Merge result with Employees-as-Managers (Left Join on ManagerID = MgrID)

Result: Employees with department names and manager names

Next Steps