Transform data with regular expressions and split strings

Learn to transform your product data in Productsup using rule boxes that support regular expressions and split strings.

Introduction

To transform your product data and extract, display, split, replace, or remove parts of your attribute values, you can use rule boxes that support regular expressions and split strings.

Regular expressions

A regular expression (regex) is a sequence of characters that uses specific syntax rules to define a search pattern. With a regex, you can search your data for text that matches a pattern instead of text that matches an exact string.

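Example of a regex: /([A-Z])\w+/. If you search a text with this regular expression, it matches every run of characters that starts with an upper-case letter from A to Z and continues with one or more word characters (letters, digits, or underscores), for example, capitalized words.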
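As a rough illustration of what this pattern matches, the following Python sketch runs the same expression over a made-up sentence. Python's re module is only a stand-in for the platform's regex engine: it takes the pattern without the surrounding slashes, and the sample text is hypothetical.

```python
import re

# The example pattern, written without the surrounding slashes
# because Python's re module does not use delimiters.
pattern = re.compile(r"([A-Z])\w+")

# Hypothetical sample text, not taken from the platform.
text = "This Food processor has Six settings and a USB port."

# Collect the full text of every match.
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)  # ['This', 'Food', 'Six', 'USB']
```

Lower-case words such as "processor" are skipped because the pattern requires an upper-case first character.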

The platform lets you use regular expressions in the following rule boxes:

  • Preg Replace

  • Preg Match

  • Preg Match All

  • Set Value if Match (RegEx)

Preg Replace

You can use the Preg Replace rule box to search data with a regex and replace all the matches the platform finds with an entry of your choice:

  1. Go to Data View from the site's main menu.

  2. Choose the needed export channel or the intermediate stage in the drop-down menu on the left.

  3. Select Edit in the column of the attribute where you want to apply the rule box.

  4. Select the Add Box drop-down menu.

  5. Search for and select the Preg Replace rule box.

    Use the Preg Replace rule box to find and replace all matches with a regex
  6. Enter a valid regex in Search. See RegExr to verify your regular expressions.

    Note

    Make sure to add a slash (/) at the start and end of your regular expression.

  7. In Replace, enter the term that should replace your regex matches.

The following examples use the Preg Replace rule box.

| Value | Rule box | Search field input | Replace field input | Output |
| --- | --- | --- | --- | --- |
| This food processor has 6 settings. | Preg Replace | /[0-7]/ | under 8 | This food processor has under 8 settings. |
| This food processor has 9 settings. | Preg Replace | /[0-7]/ | under 8 | This food processor has 9 settings. |
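The two examples above can be reproduced with a minimal Python sketch. It approximates the rule box and is not the platform's implementation: the function name preg_replace is hypothetical, and Python's re module expects the pattern without the surrounding slashes.

```python
import re


def preg_replace(value: str, search: str, replace: str) -> str:
    """Replace every match of the pattern `search` in `value` with `replace`."""
    return re.sub(search, replace, value)


first = preg_replace("This food processor has 6 settings.", r"[0-7]", "under 8")
second = preg_replace("This food processor has 9 settings.", r"[0-7]", "under 8")

print(first)   # This food processor has under 8 settings.
print(second)  # This food processor has 9 settings.
```

In this sketch, [0-7] matches one digit at a time, so a value such as 16 would have each digit replaced separately. A stricter pattern is needed if multi-digit numbers can occur in your data.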

Preg Match

If you have long-string values in an attribute and want the platform to display only specific parts of those values, you can use the Preg Match rule box with a regex.

The Preg Match rule box reviews the values in the chosen attribute looking to match the pattern you specified with a regex. When the platform finds a match in a value's string, it preserves the matching part and removes the rest of the string.

Note

The Preg Match rule box stops scanning a value's string as soon as it finds a first match and ignores the rest of the string.

  1. Go to Data View from the site's main menu.

  2. Choose the needed export channel or the intermediate stage in the drop-down menu on the left.

  3. Select Edit in the column of the attribute where you want to apply the rule box.

  4. Select the Add Box drop-down menu.

  5. Search for and select the Preg Match rule box.

    Use the Preg Match rule box to find all regex matches and delete all non-matching results
  6. Enter a valid regex in RegEx. See RegExr to verify your regular expressions.

    Note

    Make sure to add a slash (/) at the start and end of your regular expression.

  7. In Assign, specify the number of the capturing group within your regex that you want the platform to extract.

    A capturing group is a sequence of characters within a regex enclosed in parentheses. For example, this regex /(color\: [a-z]+).*(size\: [A-Z])/ has two capturing groups: (color\: [a-z]+) and (size\: [A-Z]).

    Note

    Providing input in the Assign field isn't always necessary. It applies primarily to complex regular expressions with multiple capturing groups.

    For Productsup to extract and display only those parts of the values you specified in a relevant capturing group, you must provide input in Assign.

    1. To extract and display the data matching the entire regular expression, enter 0 or leave the field empty.

    2. To extract and display only the data matching the first capturing group, enter 1.

    3. To extract and display only the data matching the second, third, or fourth capturing groups, enter 2, 3, or 4, respectively.

The following examples use the Preg Match rule box.

| Value | Rule box | RegEx field input | Assign field input | Output |
| --- | --- | --- | --- | --- |
| These T-shirts are available in color: red, size: M. | Preg Match | /color\: [a-z]+/ | 0 | color: red |
| These T-shirts are available in color: red, size: M. | Preg Match | /(color\: [a-z]+).*(size\: [A-Z])/ | 0 | color: red, size: M |
| These T-shirts are available in color: red, size: M. | Preg Match | /(color\: [a-z]+).*(size\: [A-Z])/ | 2 | size: M |
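The way the Assign field selects a capturing group can be sketched in Python as follows. This is an approximation under stated assumptions: preg_match is a hypothetical helper, the pattern is passed without slashes, and the return value for a string with no match is a placeholder because the behavior in that case isn't described here.

```python
import re
from typing import Optional


def preg_match(value: str, regex: str, assign: int = 0) -> Optional[str]:
    """Return the first match of `regex` in `value`.

    assign=0 returns the whole match; assign=N returns capturing group N.
    """
    match = re.search(regex, value)  # stops at the first match
    if match is None:
        return None  # placeholder: no-match behavior is an assumption
    return match.group(assign)


value = "These T-shirts are available in color: red, size: M."
two_groups = r"(color\: [a-z]+).*(size\: [A-Z])"

print(preg_match(value, r"color\: [a-z]+"))  # color: red
print(preg_match(value, two_groups, 0))      # color: red, size: M
print(preg_match(value, two_groups, 2))      # size: M
```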

Preg Match All

The Preg Match All rule box has the same functionality as the Preg Match rule box; it helps you extract and display specific parts of long-string values in an attribute. The difference is that the Preg Match All rule box makes the platform review the entire string of each value to find all matches within a value, while the Preg Match rule box stops scanning a value after the first match.

  1. Go to Data View from the site's main menu.

  2. Choose the needed export channel or the intermediate stage in the drop-down menu on the left.

  3. Select Edit in the column of the attribute where you want to apply the rule box.

  4. Select the Add Box drop-down menu.

  5. Search for and select the Preg Match All rule box.

    Use the Preg Match All rule box to extract and keep the matches of the entire regex
  6. Enter a valid regex in RegEx. See RegExr to verify your regular expressions.

    Note

    Make sure to add a slash (/) at the start and end of your regular expression.

  7. In Assign, specify the number of the capturing group within your regex that you want the platform to extract.

    A capturing group is a sequence of characters within a regex enclosed in parentheses. For example, this regex /(color\: [a-z]+).*(size\: [A-Z])/ has two capturing groups: (color\: [a-z]+) and (size\: [A-Z]).

    Note

    Providing input in the Assign field isn't always necessary. It applies primarily to complex regular expressions with multiple capturing groups.

    For Productsup to extract and display only those parts of the values you specified in a relevant capturing group, you must provide input in Assign.

    1. To extract and display the data matching the entire regular expression, enter 0 or leave the field empty.

    2. To extract and display only the data matching the first capturing group, enter 1.

    3. To extract and display only the data matching the second, third, or fourth capturing groups, enter 2, 3, or 4, respectively.

  8. In Delimiter, define the character that should separate your matches in the output value after removing all unneeded info.

    The comma (,) is the default delimiter the platform uses if the field is empty.

Compare the output of the Preg Match All and Preg Match rule boxes when you extract the phone number from the following value: country code: +44, county code: 1844, individual dialing part: 123456.

| Rule box | RegEx field input | Assign field input | Delimiter field input | Output |
| --- | --- | --- | --- | --- |
| Preg Match All | /(\+[0-9])*([0-9])+/ | 0 | (space character) | +44 1844 123456 |
| Preg Match | /(\+[0-9])*([0-9])+/ | 0 | N/A | +44 |

The Preg Match All rule box returns a longer string because it looks for all matches within a value, while the Preg Match rule box stops after finding the first match.
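The difference between the two rule boxes can be sketched in Python. The helper names preg_match_all and preg_match_first are hypothetical, the pattern is written without slashes, and the handling of groups that don't take part in a match is an assumption of this sketch.

```python
import re


def preg_match_all(value: str, regex: str, assign: int = 0, delimiter: str = ",") -> str:
    """Join every match (or the chosen capturing group of every match)."""
    found = (m.group(assign) for m in re.finditer(regex, value))
    # Assumption: groups that did not participate in a match are skipped.
    return delimiter.join(part for part in found if part is not None)


def preg_match_first(value: str, regex: str) -> str:
    """Return only the first match, or an empty string if there is none."""
    match = re.search(regex, value)
    return match.group(0) if match else ""


value = "country code: +44, county code: 1844, individual dialing part: 123456"
regex = r"(\+[0-9])*([0-9])+"

all_matches = preg_match_all(value, regex, assign=0, delimiter=" ")
first_match = preg_match_first(value, regex)

print(all_matches)  # +44 1844 123456
print(first_match)  # +44
```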

Set Value if Match (RegEx)

You can use the Set Value if Match (RegEx) rule box to assign particular values to one attribute of a product based on the contents of another attribute.

For example, use the Set Value if Match (RegEx) rule box to get the price range attribute to contain information on whether a product is cheap or expensive based on the values of the price attribute. The Set Value if Match (RegEx) rule box lets you edit the attribute price range and pair it with the searched attribute price. The regex /\b1?\d\.\d\d$/ lets the platform search the price attribute for products that cost less than 20.00$. If the platform finds a product that costs less than 20.00$, it assigns the value cheap to this product in the price range attribute. If the platform discovers products that don't match the regex and cost 20.00$ or more, such products get the value expensive in their price range attribute.

Together with the input provided in the fields of the Set Value if Match (RegEx) rule box, this use case looks similar to the following:

Rule box settings (the same for all products):

| Edited attribute | Rule box | Searched attribute | RegEx field input | Assign field input | Choice in handle no match | change to field input |
| --- | --- | --- | --- | --- | --- | --- |
| price range | Set Value if Match (RegEx) | price | /\b1?\d\.\d\d$/ | cheap | assign | expensive |

Values per product:

| Initial value in the edited attribute | Value in the searched attribute | Output value in the edited attribute |
| --- | --- | --- |
| 10-15$ | 10.99$ | cheap |
| 16-19$ | 19.99$ | cheap |
| 20-25$ | 20.00$ | expensive |
| 30-35$ | 32.49$ | expensive |

To add the rule box to an attribute:

  1. Go to Data View from the site's main menu.

  2. Choose the needed export channel or the intermediate stage in the drop-down menu on the left.

  3. Select Edit in the column of the attribute that you want to edit.

  4. Select the Add Box drop-down menu.

  5. Search for and select the Set Value if Match (RegEx) rule box.

    Search for regex matches in a different column and add these values to the column where you are applying the Set Value if Match (RegEx) rule box
  6. In Column, choose the attribute you want to search for regex matches.

  7. Enter a valid regex in RegEx. See RegExr to verify your regular expressions.

    Note

    Make sure to add a slash (/) at the start and end of your regular expression.

  8. In Assign, specify the value that the edited attribute should display if the platform finds a regex match in the searched attribute.

  9. In the handle no match drop-down menu, choose how the platform should treat the values of the edited attribute if there is a product with no regex match in the searched attribute:

    1. The leave unchanged option makes sure the values of the edited attribute stay the same.

    2. The assign option makes sure the platform changes the values of the edited attribute. Enter what value the platform should assign to products with no matches in the change to input field.
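The decision logic of this rule box can be sketched in Python. This is an approximation, not the platform's code: set_value_if_match and its parameter names are hypothetical, and change_to=None stands for the leave unchanged option. Because $ in the regex anchors the match to the end of the string, the sketch assumes that the price attribute stores a bare number such as 10.99 with no trailing currency symbol.

```python
import re
from typing import Optional


def set_value_if_match(
    edited_value: str,
    searched_value: str,
    regex: str,
    assign: str,
    change_to: Optional[str] = None,
) -> str:
    """Return the new value of the edited attribute for one product.

    change_to=None models "leave unchanged"; a string models "assign".
    """
    if re.search(regex, searched_value):
        return assign
    return edited_value if change_to is None else change_to


PRICE_REGEX = r"\b1?\d\.\d\d$"  # prices below 20.00, pattern without slashes

# (price range, price) pairs; prices are assumed to have no currency symbol.
products = [("10-15$", "10.99"), ("16-19$", "19.99"), ("20-25$", "20.00"), ("30-35$", "32.49")]

results = [
    set_value_if_match(price_range, price, PRICE_REGEX, "cheap", "expensive")
    for price_range, price in products
]
print(results)  # ['cheap', 'cheap', 'expensive', 'expensive']

# With "leave unchanged", a non-matching product keeps its current value.
unchanged = set_value_if_match("20-25$", "20.00", PRICE_REGEX, "cheap")
print(unchanged)  # 20-25$
```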

Split strings

Your attributes may contain category paths, lists, or other data types that involve multiple data points. You can use the platform's split string functionality to shorten the values of such attributes and preserve only the needed info.

Productsup lets you split strings with the following rule boxes:

  • Split String

  • Split String for PLA

  • Split String & Filter

  • Split String and Count Items

All Split String rule boxes follow a specific method of numbering data points within strings. This method supports both positive and negative numbers. For example, when applying any Split String rule box to the category path Animals & Pet Supplies > Pet Supplies > Dog Supplies > Dog Beds, the platform assigns the following numbers to each step of the category path:

| Data point | Positive number | Negative number |
| --- | --- | --- |
| Animals & Pet Supplies | 0 | -4 |
| Pet Supplies | 1 | -3 |
| Dog Supplies | 2 | -2 |
| Dog Beds | 3 | -1 |

Using these numbers and the greater-than sign (>) in a Split String rule box, you can tell the platform which data points you want to retrieve and what character splits the string into data points.

Tip

You can use both positive and negative numbers of data points in a Split String rule box. However, if you need to retrieve data points from the end of a string, and the strings in your attribute don't always have the same number of data points, negative numbers are the recommended option.
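The numbering scheme resembles list indexing in Python, which the following sketch uses as an analogy. The second category path is only sample data, and trimming the spaces around each data point is an assumption of the sketch.

```python
path = "Animals & Pet Supplies > Pet Supplies > Dog Supplies > Dog Beds"
points = [p.strip() for p in path.split(">")]

# Print the positive and the negative number of every data point.
for positive, point in enumerate(points):
    negative = positive - len(points)
    print(positive, negative, point)
# 0 -4 Animals & Pet Supplies
# 1 -3 Pet Supplies
# 2 -2 Dog Supplies
# 3 -1 Dog Beds

# A shorter path: -1 still addresses the last data point,
# while the positive number of the last data point changes from 3 to 2.
short_path = "Apparel & Accessories > Clothing > Activewear"
short_points = [p.strip() for p in short_path.split(">")]

last_of_long = points[-1]
last_of_short = short_points[-1]
print(last_of_long, "|", last_of_short)  # Dog Beds | Activewear
```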

Split String

The Split String rule box analyzes the values of an attribute to extract and save only those parts of the values you need.

For example, you can use the Split String rule box to shorten a category path in the following scenario: out of the entire category path Animals & Pet Supplies > Pet Supplies > Dog Supplies > Dog Beds, you want only the first two data points Animals & Pet Supplies > Pet Supplies to remain in the value.

The Split String rule box can also let you extract information from a long text. For example, you can use two Split String rule boxes to extract product colors from this product description value: The manufacturer uses only natural and hypoallergenic materials to produce these dog beds. Colors available: black, grey, white, green, yellow, and lilac. Size range: XS, S, M, L, XL, and XXL. Together with the input provided in the fields of the Split String rule boxes, this use case looks similar to the following:

| Step | Rule box | Splitter field input | From field input | Items field input | Output value |
| --- | --- | --- | --- | --- | --- |
| 1. | Split String | . Size range: | 0 | 1 | The manufacturer uses only natural and hypoallergenic materials to produce these dog beds. Colors available: black, grey, white, green, yellow, and lilac |
| 2. | Split String | Colors available: | 1 | 1 | black, grey, white, green, yellow, and lilac |

To add and set up the Split String rule box, follow these steps:

  1. Go to Data View from the site's main menu.

  2. Choose the needed export channel or the intermediate stage in the drop-down menu on the left.

  3. Select Edit in the column of the attribute where you want to apply the rule box.

  4. Select the Add Box drop-down menu.

  5. Search for and select the Split String rule box.

    Use the Split String rule box to shorten strings in your values and delete unnecessary info
  6. Enter the character that splits your string into multiple data points in Splitter.

    Note

    The platform never displays the splitter in the output if you extract only one (1) data point from the value. To see the splitter in the output value, you need to extract at least two (2) data points.

  7. In From, enter the number of the data point from which the platform should start extracting information. The expected input format for this field is a digit.

    See the previously shown numbering example to identify the number of your desired data point.

  8. Enter how many data points you want the platform to extract from your strings in Items.

    The expected input format for this field is a digit. Only positive numbers are acceptable.

    Note

    If you leave Items empty, the platform extracts all data points of the string, starting with the data point specified in From.
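The interplay of Splitter, From, and Items can be sketched in Python. This is a simplified model, not the platform's implementation: split_string is a hypothetical helper, and trimming whitespace around data points and rejoining them with a space on each side of the splitter are assumptions. Returning a value unchanged when it contains no splitter follows the comparison table in the Split String for PLA section.

```python
from typing import Optional


def split_string(value: str, splitter: str, start: int = 0, items: Optional[int] = None) -> str:
    """Keep `items` data points of `value`, beginning at data point `start`."""
    if splitter not in value:
        return value  # no splitter found: the value stays as it is
    points = [p.strip() for p in value.split(splitter)]
    selected = points[start:]          # start may be negative
    if items is not None:              # empty Items: keep everything from start
        selected = selected[:items]
    return f" {splitter} ".join(selected)


path = "Animals & Pet Supplies > Pet Supplies > Dog Supplies > Dog Beds"
print(split_string(path, ">", 0, 2))  # Animals & Pet Supplies > Pet Supplies
print(split_string(path, ">", -2))    # Dog Supplies > Dog Beds

description = (
    "The manufacturer uses only natural and hypoallergenic materials to produce "
    "these dog beds. Colors available: black, grey, white, green, yellow, and lilac. "
    "Size range: XS, S, M, L, XL, and XXL."
)
# Two rule boxes applied one after the other, as in the example above.
step_one = split_string(description, ". Size range:", 0, 1)
step_two = split_string(step_one, "Colors available:", 1, 1)
print(step_two)  # black, grey, white, green, yellow, and lilac
```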

Split String for PLA

The Split String for PLA rule box analyzes the values of an attribute to extract and save only those parts of the values that you need. The difference from the Split String rule box is that the Split String for PLA rule box empties the values of those products where the platform finds no splitter character to prevent you from sending incorrect data to your export channels.

Compare the output values of the Split String and Split String for PLA rule boxes applied to the same attribute:

| Attribute with the rule box | Rule box | Value in the attribute | Splitter field input | From field input | Items field input | Output value in the attribute |
| --- | --- | --- | --- | --- | --- | --- |
| Category path | Split String | Animals & Pet Supplies > Pet Supplies > Dog Supplies > Dog Beds | > | 1 | 2 | Pet Supplies > Dog Supplies |
| Category path | Split String | Apparel & Accessories > Clothing > Activewear | > | 1 | 2 | Clothing > Activewear |
| Category path | Split String | Arts & Entertainment --- Event Tickets | > | 1 | 2 | Arts & Entertainment --- Event Tickets |
| Category path | Split String for PLA | Animals & Pet Supplies > Pet Supplies > Dog Supplies > Dog Beds | > | 1 | 2 | Pet Supplies > Dog Supplies |
| Category path | Split String for PLA | Apparel & Accessories > Clothing > Activewear | > | 1 | 2 | Clothing > Activewear |
| Category path | Split String for PLA | Arts & Entertainment --- Event Tickets | > | 1 | 2 | (empty) |

  1. Go to Data View from the site's main menu.

  2. Choose the needed export channel or the intermediate stage in the drop-down menu on the left.

  3. Select Edit in the column of the attribute where you want to apply the rule box.

  4. Select the Add Box drop-down menu.

  5. Search for and select the Split String for PLA rule box.

    Use the Split String for PLA rule box to split strings and empty the values of such products where the platform found no splitter character
  6. Enter the character that splits your string into multiple data points in Splitter.

    Note

    The platform never displays the splitter in the output if you extract only one (1) data point from the value. To see the splitter in the output value, you need to extract at least two (2) data points.

  7. In From, enter the number of the data point from which the platform should start extracting information. The expected input format for this field is a digit.

    See the previously shown numbering example to identify the number of your desired data point.

    Note

    If you enter 0 in From, the platform starts extracting information from your first data point and, after finding no splitter character in the string, returns the entire string as if it consisted of one data point only.

    To have the Split String for PLA rule box empty the values that contain no splitter characters, enter a positive number in From. Don't use 0 or negative numbers for this purpose.

  8. Enter how many data points you want the platform to extract from your strings in Items.

    The expected input format for this field is a digit. Only positive numbers are acceptable.

    Note

    If you leave Items empty, the platform extracts all data points of the string, starting with the data point specified in From.
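The behavior described for values without a splitter can be sketched in Python. This is a simplified model under assumptions: split_string_for_pla is a hypothetical helper, and whitespace trimming and the spacing around the rejoined splitter are guesses. A value with no splitter is treated as a single data point with the number 0, so any positive From selects nothing.

```python
from typing import Optional


def split_string_for_pla(value: str, splitter: str, start: int = 0, items: Optional[int] = None) -> str:
    """Like a plain split, but values without the splitter become empty
    when `start` is a positive number."""
    points = [p.strip() for p in value.split(splitter)]
    selected = points[start:]          # a single-point list has only number 0
    if items is not None:
        selected = selected[:items]
    return f" {splitter} ".join(selected)


with_splitter = "Apparel & Accessories > Clothing > Activewear"
without_splitter = "Arts & Entertainment --- Event Tickets"

print(split_string_for_pla(with_splitter, ">", 1, 2))           # Clothing > Activewear
print(repr(split_string_for_pla(without_splitter, ">", 1, 2)))  # ''
print(split_string_for_pla(without_splitter, ">", 0, 2))        # Arts & Entertainment --- Event Tickets
```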

Split String & Filter

The Split String & Filter rule box analyzes the values of an attribute to extract and save only those parts of the values that you need. The difference from the Split String rule box is that the Split String & Filter rule box lets you limit the length of your resulting string.

For example, to split the string Animals & Pet Supplies > Pet Supplies > Dog Supplies > Dog Beds and extract the first three (3) data points, you can use both the Split String and Split String & Filter rule boxes. However, if your resulting string should be under 50 characters, there is a difference in applying these rule boxes:

  1. If you apply the Split String rule box, the resulting string is Animals & Pet Supplies > Pet Supplies > Dog Supplies, which is 52 characters long.

  2. If you apply the Split String & Filter rule box and set 50 characters as the maximum length of the resulting string, you get the output Animals & Pet Supplies > Pet Supplies, which is 37 characters long.

Note

The Split String & Filter rule box doesn't leave chunks of data points in the string when the resulting string reaches its maximum length. If there is a data point that doesn't fit into the character limit entirely, the platform removes it from the resulting string.
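The length limit can be sketched in Python by dropping whole data points from the end until the result fits. This is an approximation under assumptions: split_string_and_filter is a hypothetical helper, the spacing around the rejoined splitter is a guess, and the sketch treats a result of exactly max_length characters as acceptable, which isn't confirmed here.

```python
from typing import Optional


def split_string_and_filter(
    value: str,
    splitter: str,
    start: int = 0,
    items: Optional[int] = None,
    max_length: Optional[int] = None,
) -> str:
    """Select data points, then drop trailing ones until the result fits."""
    points = [p.strip() for p in value.split(splitter)]
    selected = points[start:]
    if items is not None:
        selected = selected[:items]
    joiner = f" {splitter} "
    result = joiner.join(selected)
    # Remove whole data points from the end; never cut a data point in half.
    while max_length is not None and selected and len(result) > max_length:
        selected = selected[:-1]
        result = joiner.join(selected)
    return result


path = "Animals & Pet Supplies > Pet Supplies > Dog Supplies > Dog Beds"

unlimited = split_string_and_filter(path, ">", 0, 3)
limited = split_string_and_filter(path, ">", 0, 3, max_length=50)

print(unlimited, len(unlimited))  # Animals & Pet Supplies > Pet Supplies > Dog Supplies 52
print(limited, len(limited))      # Animals & Pet Supplies > Pet Supplies 37
```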

To add the rule box to an attribute:

  1. Go to Data View from the site's main menu.

  2. Choose the needed export channel or the intermediate stage in the drop-down menu on the left.

  3. Select Edit in the column of the attribute where you want to apply the rule box.

  4. Select the Add Box drop-down menu.

  5. Search for and select the Split String & Filter rule box.

    Use the Split String & Filter rule box to split strings and limit the length of the output
  6. Enter the character that splits your string into multiple data points in Splitter.

    Note

    The platform never displays the splitter in the output if you extract only one (1) data point from the value. To see the splitter in the output value, you need to extract at least two (2) data points.

  7. In From, enter the number of the data point from which the platform should start extracting information. The expected input format for this field is a digit.

    See the previously shown numbering example to identify the number of your desired data point.

  8. Enter how many data points you want the platform to extract from your strings in Items.

    The expected input format for this field is a digit. Only positive numbers are acceptable.

    Note

    If you leave Items empty, the platform extracts all data points of the string, starting with the data point specified in From.

  9. In max Length, specify a character limit for your resulting string.

Split String and Count Items

The Split String and Count Items rule box analyzes the values of an attribute to split them into separate data points and count how many data points each value contains. The rule box uses a delimiter to count the number of data points within a string and overwrites the string with the resulting total.

For example, if you apply the Split String and Count Items rule box to this value: Animals & Pet Supplies > Pet Supplies > Dog Supplies > Dog Beds, the platform uses the greater-than sign (>) to count the number of data points in this string and overwrites this value with 4.
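In Python terms, the count corresponds to the length of the list produced by splitting the value. The helper name below is hypothetical, and the sketch makes no claim about how the platform counts empty values or empty data points.

```python
def split_string_and_count_items(value: str, separator: str) -> int:
    """Count the data points that `separator` divides `value` into."""
    return len(value.split(separator))


path = "Animals & Pet Supplies > Pet Supplies > Dog Supplies > Dog Beds"
count = split_string_and_count_items(path, ">")
print(count)  # 4
```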

  1. Go to Data View from the site's main menu.

  2. Choose the needed export channel or the intermediate stage in the drop-down menu on the left.

  3. Select Edit in the column of the attribute where you want to apply the rule box.

  4. Select the Add Box drop-down menu.

  5. Search for and select the Split String and Count Items rule box.

    Use the Split String and Count Items rule box to let the platform count the number of data points (items) in your strings and output the result in the attribute
  6. In Please select Separator!, enter the character that splits your string into data points.