
Split Function

The Split function expands a single row into multiple rows by splitting a field that contains multiple values. It's essential for normalizing denormalized data where multiple values are stored in a single field.

How It Works

Split takes a field containing delimited values and:

  1. Parses the delimited string into individual values
  2. Creates one output row for each value
  3. Copies all other fields to each new row
  4. Replaces the original field with single values

Example:

Input: 1 row with Tags = "red,blue,green"
Output: 3 rows with Tag = "red", "blue", "green"
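In code terms, the expansion behaves like this sketch (Python; the row dictionaries and field names are illustrative):

```python
# One input row whose Tags field holds three delimited values.
row = {"ID": 1, "Tags": "red,blue,green"}

# One output row per value; the other fields are copied to each new row,
# and the delimited field is replaced by a single-value field.
output = [{"ID": row["ID"], "Tag": tag} for tag in row["Tags"].split(",")]
# output == [{"ID": 1, "Tag": "red"}, {"ID": 1, "Tag": "blue"}, {"ID": 1, "Tag": "green"}]
```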

Adding a Split

  1. Drag Split from the Functions section of the Element Panel
  2. Connect it to your data source
  3. Click the Split node to configure split settings

Configuration Panel

Field to Split

Select the field containing delimited values:

Dropdown: Shows all available text fields from upstream data.

Field requirements:

  • Must be text/string type
  • Should contain delimited values
  • Empty or null fields handled gracefully

Delimiter

Specify the character(s) separating values:

Common delimiters:

Delimiter  Character  Example Data
Comma      ,          "red,blue,green"
Semicolon  ;          "A;B;C"
Pipe       |          "val1|val2|val3"
Tab        \t         "col1\tcol2\tcol3"
Newline    \n         "line1\nline2\nline3"
Space      (space)    "word1 word2 word3"

Custom delimiter: Enter any string:

" - " for "A - B - C"
"::" for "part1::part2::part3"
" and " for "apples and oranges and bananas"

Output Field Name

Name for the field containing individual values:

Default: Same as source field name

Custom name:

  • Rename to indicate singular form
  • Example: "Tags" → "Tag"
  • Example: "Categories" → "Category"

Trim Values

Remove whitespace from split values:

Enabled (recommended):

"red, blue, green" → "red", "blue", "green"

Disabled:

"red, blue, green" → "red", " blue", " green"

Handle Empty Values

What to do when split produces empty values:

Skip empty (default):

"A,,B" → "A", "B" (2 rows)

Keep empty:

"A,,B" → "A", "", "B" (3 rows)

Keep Original

Optionally retain the original unsplit field:

Keep original: Yes

Output has: Tag (split value) AND Tags (original "red,blue,green")

Keep original: No (default)

Output only has: Tag (split value)

Row Expansion

Understanding Row Multiplication

Split multiplies rows:

Input:

ID  Name    Tags
1   Item A  red,blue
2   Item B  green
3   Item C  red,blue,yellow

Output (6 rows):

ID  Name    Tag
1   Item A  red
1   Item A  blue
2   Item B  green
3   Item C  red
3   Item C  blue
3   Item C  yellow
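The table expansion above can be sketched as a nested loop (Python; representing rows as dictionaries is an assumption for illustration):

```python
def explode(rows, field, delimiter, out_field):
    """Expand each row into one output row per delimited value in `field`."""
    out = []
    for row in rows:
        for value in row[field].split(delimiter):
            new_row = {k: v for k, v in row.items() if k != field}  # copy other fields
            new_row[out_field] = value
            out.append(new_row)
    return out

table = [
    {"ID": 1, "Name": "Item A", "Tags": "red,blue"},
    {"ID": 2, "Name": "Item B", "Tags": "green"},
    {"ID": 3, "Name": "Item C", "Tags": "red,blue,yellow"},
]
result = explode(table, "Tags", ",", "Tag")  # 6 rows, matching the table above
```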

Row Count Impact

New row count = total number of split values across all input rows

Before: 100 rows, averaging 3 values each
After: ~300 rows

Plan downstream processing for increased volume.

Preserving Row Identity

All non-split fields are duplicated:

  • Primary keys appear multiple times
  • Other fields copied to each new row
  • Consider adding a sequence number if needed

Common Use Cases

Normalizing Tags/Categories

Scenario: Products with comma-separated tags

Input:

ProductID  ProductName  Tags
P001       Widget       electronics,gadget,sale

After Split:

ProductID  ProductName  Tag
P001       Widget       electronics
P001       Widget       gadget
P001       Widget       sale

Expanding Multi-Select Fields

Scenario: Form submissions with multiple selections

Input:

ResponseID  Question   Answers
R001        Interests  Sports;Music;Travel

After Split:

ResponseID  Question   Answer
R001        Interests  Sports
R001        Interests  Music
R001        Interests  Travel

Processing Recipients

Scenario: Emails with multiple recipients

Input:

EmailID  Subject  Recipients
E001     Meeting  a@co.com;b@co.com;c@co.com

After Split:

EmailID  Subject  Recipient
E001     Meeting  a@co.com
E001     Meeting  b@co.com
E001     Meeting  c@co.com

Breaking Apart Composite Keys

Scenario: Compound identifiers

Input:

CompositeKey  Value
US-2024-001   100

Configuration: Split on "-"

After Split:

CompositeKey  Value  KeyPart
US-2024-001   100    US
US-2024-001   100    2024
US-2024-001   100    001

Note: Often better to use Transform with SUBSTRING for structured keys.
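For structured keys, a positional parse keeps one row per key instead of multiplying rows; a Python sketch of the idea (field names are illustrative):

```python
key = "US-2024-001"
# Fixed-position parts assigned to named fields; still one row per key.
country, year, seq = key.split("-")
parsed = {"Country": country, "Year": year, "Seq": seq}
# parsed == {"Country": "US", "Year": "2024", "Seq": "001"}
```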

Hierarchical Path Expansion

Scenario: File paths or hierarchies

Input:

FilePath
/home/user/docs/file.txt

Configuration: Split on "/"

After Split:

FilePath                  PathPart
/home/user/docs/file.txt  home
/home/user/docs/file.txt  user
/home/user/docs/file.txt  docs
/home/user/docs/file.txt  file.txt

Working with Arrays

JSON Arrays

If a field contains a JSON array, Split can expand it:

Input:

ID  Items
1   ["A","B","C"]

Configuration:

  • Parse JSON first if needed
  • Split on array elements

Nested Arrays

For complex nested data:

  1. Extract the array field
  2. Apply Split
  3. Further process as needed

Combining with Other Functions

Split + Group By

Normalize then aggregate:

[Products with Tags] → [Split: Tags] → [GroupBy: Tag, COUNT ProductID]

Result: Count of products per tag
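This normalize-then-aggregate pattern is equivalent to counting split values, as in this sketch (Python; the sample rows are illustrative):

```python
from collections import Counter

products = [
    {"ProductID": "P001", "Tags": "electronics,gadget,sale"},
    {"ProductID": "P002", "Tags": "gadget,sale"},
]

# Split every Tags field, then count products per tag.
tag_counts = Counter(tag for p in products for tag in p["Tags"].split(","))
# tag_counts["sale"] == 2, tag_counts["electronics"] == 1
```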

Split + Distinct

Get unique values from all rows:

[Data with Categories] → [Split: Categories] → [Distinct: Category]

Result: Unique list of all categories used

Split + Filter

Filter after splitting:

[Data] → [Split: Tags] → [Filter: Tag = "featured"]

Result: Only rows where "featured" was one of the tags

Split + Merge

Enrich split values:

[Orders with ProductCodes] → [Split: ProductCodes] → [Merge: Products on Code]

Result: Separate row for each product with full product details

Handling Edge Cases

Empty Field

When split field is empty or null:

  • Default: Row is kept with null/empty value
  • Skip empty: Row might be excluded

Single Value

When field contains no delimiter:

  • Row passes through unchanged
  • Single value in output field

Delimiter at Start/End

Input: ",A,B,C,"
Skip empty: "A", "B", "C"
Keep empty: "", "A", "B", "C", ""

Consecutive Delimiters

Input: "A,,B"
Skip empty: "A", "B"
Keep empty: "A", "", "B"

Delimiter in Value

If your data might contain the delimiter:

Problem:

Address: "123 Main St, Apt 4, City, State"
Split on comma: Wrong results

Solutions:

  • Use different delimiter in source data
  • Use more specific delimiter (e.g., " | ")
  • Pre-process to escape or replace problematic instances
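A further option, when the source format supports quoting, is a quoting-aware parser rather than a plain split; Python's csv module illustrates the idea:

```python
import csv

# A quoted field may contain the delimiter without being split apart.
line = '"123 Main St, Apt 4",City,State'
fields = next(csv.reader([line]))
# fields == ["123 Main St, Apt 4", "City", "State"]
```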

Performance Considerations

Row Explosion

Split can dramatically increase row count:

  • 1,000 rows × 10 values = 10,000 rows
  • Plan for expanded data volume
  • Filter before split when possible

Memory Usage

Large splits consume memory:

  • Each new row requires memory
  • Consider batch processing for very large datasets

Downstream Impact

More rows means:

  • Longer processing time
  • More data to merge/join
  • Larger output files

Optimization:

[Filter to relevant rows] → [Split] → [Continue]

vs.

[Split all data] → [Filter] (processes more rows)

Troubleshooting

No New Rows Created

Symptoms: Output has same row count as input

Causes:

  1. Wrong delimiter specified
  2. Field doesn't contain the delimiter
  3. Wrong field selected

Solutions:

  • Preview source data
  • Check actual delimiter in data
  • Verify field selection

Too Many Rows

Symptoms: Unexpected row explosion

Causes:

  1. Delimiter appears more than expected
  2. Delimiter found in regular content

Solutions:

  • Use more specific delimiter
  • Pre-process data to standardize
  • Filter before split

Whitespace Issues

Symptoms: Values have leading/trailing spaces

Causes:

  1. Trim option disabled
  2. Source data has inconsistent spacing

Solutions:

  • Enable trim option
  • Add Transform after split: TRIM({Field})

Missing Values

Symptoms: Some values not appearing

Causes:

  1. Empty values being skipped
  2. Delimiter issues

Solutions:

  • Check "Handle empty values" setting
  • Preview source data

Examples

Product Attribute Expansion

Scenario: Normalize product attributes for filtering

Input:

SKU     Attributes
SKU001  Color:Red|Size:Large|Material:Cotton

Flow:

[Products] → [Split: Attributes on |] → [Further processing]

Output:

SKU     Attribute
SKU001  Color:Red
SKU001  Size:Large
SKU001  Material:Cotton

Follow-up Transform:

AttributeName = SUBSTRING({Attribute}, 0, INDEXOF({Attribute}, ":"))
AttributeValue = SUBSTRING({Attribute}, INDEXOF({Attribute}, ":") + 1)
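The same name/value extraction can be sketched in Python with a single split on the first colon:

```python
attribute = "Color:Red"
# partition splits on the first ":" only, so values containing ":" stay intact.
name, _, value = attribute.partition(":")
# name == "Color", value == "Red"
```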

Order Line Items

Scenario: Expand order with product list

Input:

OrderID  Products  Quantities
O001     P1,P2,P3  1,2,1

Note: This requires parallel splits or a different approach.
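One way to realize the parallel split is to pair the i-th product with the i-th quantity, as in this sketch (Python; assumes both fields always contain the same number of values):

```python
order = {"OrderID": "O001", "Products": "P1,P2,P3", "Quantities": "1,2,1"}

# zip pairs values by position across the two split lists.
line_items = [
    {"OrderID": order["OrderID"], "Product": p, "Quantity": int(q)}
    for p, q in zip(order["Products"].split(","), order["Quantities"].split(","))
]
```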

Better data model:

OrderID  Product  Quantity
O001     P1       1
O001     P2       2
O001     P3       1

Email Distribution List

Scenario: Send individual emails from distribution list

Input:

CampaignID  Recipients                              Subject
C001        user1@co.com;user2@co.com;user3@co.com  Newsletter

After Split:

CampaignID  Recipient     Subject
C001        user1@co.com  Newsletter
C001        user2@co.com  Newsletter
C001        user3@co.com  Newsletter

Then: Each row triggers individual email send

Best Practices

Choose Appropriate Delimiters

When designing data:

  • Use delimiters unlikely to appear in data
  • Consider pipe (|) or double-pipe (||)
  • Document the delimiter used

Validate Before Splitting

Ensure data quality:

  • Check for unexpected delimiters
  • Verify field contains expected format
  • Handle edge cases

Consider Normalization

If frequently splitting:

  • Consider normalizing source data
  • One row per value from the start
  • Reduces processing complexity

Add Row Identifiers

After splitting, track originals:

  • Keep original ID
  • Add sequence number within group
  • Helps with debugging and tracing

Next Steps