Extract data using regular expressions and split strings

Extract data using regular expressions and split strings in Productsup.

Your data may contain complex, dynamic patterns that you need to match in order to extract or replace information.

Regular expressions (regex) and split strings are two ways to match data that differs across products.

Regex

A regular expression is a sequence of characters that defines a search pattern. You can learn about how to create regex patterns from the linked website.

You can use regex in a number of boxes on the Platform.

Searching and replacing using regex

You can search using regex and replace all matches by using the preg replace box:

  1. Add the preg replace box

  2. Enter a valid regex in the search box

    • make sure to add a / on either side of your expression

  3. Enter the replacement term to use for each match

In the below example, we match all numbers between 0 and 7 and replace them with the phrase: under 8.
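
As a rough illustration outside the platform, the same kind of search-and-replace can be sketched with Python's re module. The pattern and the sample value are assumptions chosen for illustration and are not taken from the box configuration. Python patterns are written without the surrounding / delimiters.

```python
import re

# Hypothetical pattern: any standalone digit from 0 to 7.
pattern = re.compile(r"\b[0-7]\b")

def replace_small_numbers(value: str) -> str:
    """Replace every match of the pattern with the phrase 'under 8'."""
    return pattern.sub("under 8", value)

result = replace_small_numbers("sizes 3 and 9")
print(result)  # sizes under 8 and 9
```

Here re.sub replaces all matches, not just the first, which mirrors the behavior described for the preg replace box.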

replace.png

Searching and keeping only matched values using regex

You can also define a regex to search for values, and only keep the parts of your data that match your search expression. To achieve this, you can use the preg match box.

The preg match box finds the first match in your data according to the expression you give it. You can return either the entire match or an individual capturing group.

  1. Add the preg match box

  2. Enter a valid regex

    • make sure to add a / on either side of your expression

  3. If you want to return the entire match, assign 0

    • otherwise, you can return an individual capturing group by giving the number of the capturing group

For the below example, we would extract the value colour: white from the following text: This is the description of one of our products, details: colour: white, size: XS, brand: Nike.

match.png

If we use the assign function to grab specific capturing groups, an example could look like this:

Input: This is the description of one of our products, details: color: white, size: XS, brand: Nike.

Regex:

/(color\: [a-z]+).*(size\: [A-Z]+)/

If we assign 0, we would output: color: white, size: XS.

If we assign 1, we would output: color: white.

If we assign 2, we would output: size: XS.
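
The effect of the assign value can be checked with a minimal sketch in Python's re module, used here only as a stand-in for the box. The pattern is written without / delimiters and uses [A-Z]+ so that the whole size value is captured. The variable names are illustrative.

```python
import re

text = ("This is the description of one of our products, "
        "details: color: white, size: XS, brand: Nike.")

# Group 1 captures the color, group 2 captures the size.
pattern = re.compile(r"(color: [a-z]+).*(size: [A-Z]+)")

match = pattern.search(text)  # stops at the first match
entire = match.group(0)       # assign 0: the entire match
color = match.group(1)        # assign 1: first capturing group
size = match.group(2)         # assign 2: second capturing group

print(entire)  # color: white, size: XS
print(color)   # color: white
print(size)    # size: XS
```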

Preg match all

The preg match all box works in the same way as the preg match box, but finds all matches and doesn’t just stop at the first one.

  1. Add the preg match all box

  2. Enter a valid regex

    • make sure to add a / on either side of your expression

  3. If you want to return the entire match, assign 0

    • otherwise, you can return individual capturing groups by giving the number of the capturing group

  4. Define the delimiter that separates your matches when they are returned

For example, we could use the preg match all box to return all numbers in the following input: +44 country code: 1844 individual dialing part: 123456

Regex:

/([0-9]+)/

Output: 441844123456

If we were to use the preg match box, we would only output 44, as the box stops after the first match.
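
The difference between the two boxes can be sketched in Python, again as an approximation and not the platform's implementation. The empty delimiter is an assumption inferred from the output shown above.

```python
import re

text = "+44 country code: 1844 individual dialing part: 123456"
pattern = re.compile(r"[0-9]+")

# First match only, comparable to the preg match box.
first = pattern.search(text).group(0)

# Every match, comparable to the preg match all box.
all_matches = pattern.findall(text)

delimiter = ""  # assumed: matches are joined with an empty delimiter
joined = delimiter.join(all_matches)

print(first)   # 44
print(joined)  # 441844123456
```

Setting delimiter to a value such as "-" would separate the matches in the returned string.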

Overwrite data if there is a match

You can search data using regex and if there is a match, overwrite the entirety of the current input with another value.

  1. Add the set value if match box

  2. Define the column you wish to search

  3. Enter a valid regex to search for

    • make sure to add a / on either side of your expression

  4. Enter the value to overwrite if a match is found

  5. Define what should happen if no match occurs

set_value_if_match.png
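
The logic of these steps can be sketched as a small Python function. The function name, its parameters, and the two no-match behaviors (keep the input or use a fallback value) are assumptions for illustration. The options the box offers when no match occurs may differ.

```python
import re
from typing import Optional

def set_value_if_match(value: str, pattern: str, new_value: str,
                       no_match_value: Optional[str] = None) -> str:
    """Return new_value if the pattern is found anywhere in value.

    If nothing matches, keep the input when no_match_value is None,
    otherwise return no_match_value (hypothetical no-match options).
    """
    if re.search(pattern, value):
        return new_value
    return value if no_match_value is None else no_match_value

print(set_value_if_match("Red cotton shirt", r"shirt", "Apparel"))  # Apparel
print(set_value_if_match("Leather wallet", r"shirt", "Apparel"))    # Leather wallet
```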

Split strings

It may be beneficial to split a string to extract certain parts of it. This technique is most commonly used with lists or category paths.

Consider the following scenario: you have a category path and want to extract its first part to use as your main category information. Here, a split string could extract the main category part for each product:

Fender Bass > Dart > Dart Compressor

When using any split string box, you will need to define a splitter, an item from which you wish to start splitting, and how many items you want to retrieve.

Let’s use our above example to grab the first category value. Our category path is split by a >, and we wish to grab from the start (0) of the entire category path, returning 1 item only.

splitter    from    items
>           0       1

You can also use negative values in the from field to undertake the split in the opposite direction. This could be useful if you wish to grab the last item, but the number of items is not always constant. To grab the last two items from our example above, we would set the from value to -2 and the number of items to 2:

splitter    from    items
>           -2      2

Splitting a string

To use the split string method:

  1. Add the split string box

  2. Define your splitter

  3. Define the item from which you want to start

  4. Define how many items you wish to return

    • alternatively, you can leave this blank to return all items after the From item
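
The splitter, from, and items settings can be approximated with Python list slicing. This is a minimal sketch and not the platform's implementation. Trimming whitespace around each item and returning the items as a list are assumptions made for readability.

```python
from typing import List, Optional

def split_string(value: str, splitter: str, start: int,
                 items: Optional[int] = None) -> List[str]:
    """Split value on splitter and return `items` parts beginning at `start`.

    A negative start counts from the end. items=None stands for a blank
    items field and returns everything from the start item onward.
    """
    parts = [part.strip() for part in value.split(splitter)]  # assumed trimming
    selected = parts[start:]
    return selected if items is None else selected[:items]

path = "Fender Bass > Dart > Dart Compressor"
print(split_string(path, ">", 0, 1))   # ['Fender Bass']
print(split_string(path, ">", -2, 2))  # ['Dart', 'Dart Compressor']
print(split_string(path, ">", 1))      # ['Dart', 'Dart Compressor']
```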

split_string.png

Splitting a string and deleting the input data if there is no splitter

The split string for PLA box functions similarly to the split string box but will empty the entire attribute for that product if no splitter is found. This is designed so that you do not send incorrect data.

  1. Add the split string for PLA box

  2. Define your splitter

  3. Define the item from which you want to start

  4. Define how many items you wish to return

    • alternatively, you can leave this blank to return all items after the From item
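
The difference from the plain split can be sketched as follows. The function name is hypothetical, and returning an empty list to represent an emptied attribute is an assumption.

```python
from typing import List, Optional

def split_string_for_pla(value: str, splitter: str, start: int,
                         items: Optional[int] = None) -> List[str]:
    """Like a plain split, but return nothing when the splitter is absent."""
    if splitter not in value:
        return []  # no splitter found: the attribute is emptied
    parts = [part.strip() for part in value.split(splitter)]  # assumed trimming
    selected = parts[start:]
    return selected if items is None else selected[:items]

print(split_string_for_pla("Fender Bass > Dart", ">", 0, 1))  # ['Fender Bass']
print(split_string_for_pla("Fender Bass", ">", 0, 1))         # []
```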

Splitting and filtering a string

The split string & filter box works similarly to the split string box, with an added filter function that detects failures. These failures are skipped without being counted towards the split operations.

For example, we may have the following category path: Category1<Category2<BJKHBUB<Category3<Category4. The term BJKHBUB will be ignored because it is identified as an error term.

  1. Add the split string & filter box

  2. Define your splitter

  3. Define the item from which you want to start

  4. Define how many items you wish to return
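
How the box identifies error terms is not described here, so the following sketch takes the check as a caller-supplied function. The is_valid predicate and the Category-plus-digits rule are purely hypothetical. The sketch only shows that filtered items do not count toward the from and items positions.

```python
import re
from typing import Callable, List, Optional

def split_and_filter(value: str, splitter: str, start: int,
                     items: Optional[int],
                     is_valid: Callable[[str], bool]) -> List[str]:
    """Split value, drop items rejected by is_valid, then select by position."""
    parts = [part.strip() for part in value.split(splitter)]
    kept = [part for part in parts if is_valid(part)]  # failures are skipped
    selected = kept[start:]
    return selected if items is None else selected[:items]

def looks_like_category(item: str) -> bool:
    """Hypothetical validity rule: 'Category' followed by digits."""
    return re.fullmatch(r"Category\d+", item) is not None

path = "Category1<Category2<BJKHBUB<Category3<Category4"
print(split_and_filter(path, "<", 2, 2, looks_like_category))
# ['Category3', 'Category4']
```

Without the filter, the same from and items values would return BJKHBUB and Category3.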

Splitting a string and counting the number of items

You can use the split string and count items box to count the total number of items in your product’s current data.

For example, with the following category you could use the box to inform you that there are 4 items in total: Category1>Category2>Category3>Category4.

  1. Add the split string and count items box

  2. Define your splitter/separator
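
The count can be reproduced with a short Python sketch. Treating an empty value as zero items is an assumption.

```python
def count_items(value: str, splitter: str) -> int:
    """Count the items obtained by splitting value on splitter."""
    if not value:
        return 0  # assumed: an empty value contains no items
    return len(value.split(splitter))

print(count_items("Category1>Category2>Category3>Category4", ">"))  # 4
```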