Skip to main content

Clean and format your data

Clean and format your data feed using rule boxes in Dataflow or Data View in Productsup.

Introduction

Before optimizing your data feed for a specific export, you can use rule boxes in Data View or Dataflow to clean and format your product data.

In the following sections, you can find the rule boxes you can use to clean and format your data.

Prerequisites

Each rule box requires an individual setup process, but to find the needed rule box in Data View do as follows:

  1. Go to Data View from the site's main menu.

  2. Choose the Intermediate stage or the desired export from the drop-down menu on the left side of the upper ribbon.

  3. Select Edit in the needed attribute name.

  4. Select the Select an Option drop-down menu in the section on the right.

  5. Proceed with the individual rule box setup steps in the sections below.

Add or remove whitespaces

If your attribute values have more or fewer whitespaces than necessary, you can use the following rule boxes to clean the text.

Remove consecutive whitespaces

If you have more than one consecutive whitespace between words, you can remove the excessive whitespace with the Remove Consecutive Whitespace rule box. The words should remain separated with only one whitespace.

  1. Take the steps from Prerequisites.

  2. Search for and select the Remove Consecutive Whitespace rule box.

    Remove consecutive whitespace
  3. Select Save.

Note

You can also remove or replace whitespaces using the Text Replace rule box. See Replace static text with the Text Replace rule box.

Remove whitespaces at the beginning or end of your values

To remove extra whitespaces at the beginning or end of your attribute's values, use the Remove Spaces At Beginning and End (Trim) rule box.

  1. Take the steps from Prerequisites.

  2. Search for and select the Remove Spaces At Beginning and End (Trim) rule box.

    Remove spaces at the beginning and end
  3. Select Save.

Separate words in camel case

If the text in your values uses the camel case, where the second word starts with a capital letter without a space to separate it from the first word, for example, brownShoes, you can split them using the Separate Words rule box.

  1. Take the steps from Prerequisites.

  2. Search for and select the Separate Words rule box.

    Separate words
  3. Select Save.

Change capitalization

You can change the capitalization of words in your attribute values as follows:

  • Capitalize all or only the first words in the string

  • Convert the uppercase to sentence case

  • Convert uppercase to lowercase

  • Convert lowercase to uppercase

Capitalize all or first words

To capitalize all or only the first word in the string:

  1. Take the steps from Prerequisites.

  2. Search for and select the Capitalize Words rule box.

    Capitalize words. Makes Every First Letter Uppercase: LEAVE UPPERCASE
  3. Select the capitalization method in Makes Every First Letter Uppercase:

    • Convert Uppercase capitalizes each word in the string and converts uppercase letters within a word into lowercase. For example, it converts only theFirst letter of each word into Only Thefirst Letter Of Each Word.

    • Only first letter capitalizes the first letter of the string and converts all the other words into lowercase. For example, it converts not all words. Only the first letter IN THE STRING into Not all words. only the first letter in the string.

    • LEAVE UPPERCASE makes all words capitalized. For example, it converts All WordsIn the string into All WordsIn The String.

  4. Select Save.

Convert uppercase to sentence case

The sentence case is a capitalization style in which only the first letter of the first word of a sentence is in uppercase, and all other words are in lowercase.

To convert the uppercase text, such as ALL WORDS SHOULD BE IN THE SENTENCE CASE, into the sentence case, such as All words should be in the sentence case, use the Uppercase to human rule box.

Note

The Uppercase to human rule box converts only words longer than three (3) characters.

  1. Take the steps from Prerequisites.

  2. Search for and select the Uppercase to human rule box.

    Uppercase to human
  3. Select Save.

Convert uppercase to lowercase

To convert all uppercase words, such as THESE WORDS, into all lowercase words, such as these words, use the Lowercase rule box.

  1. Take the steps from Prerequisites.

  2. Search for and select the Lowercase rule box.

    Lowercase
  3. Select Save.

Convert lowercase to uppercase

To convert all lowercase or capitalized words, such as these words or These words, into all uppercase words, such as THESE WORDS, use the Uppercase rule box.

  1. Take the steps from Prerequisites.

  2. Search for and select the Uppercase rule box.

    Uppercase rule box
  3. Select Save.

Remove invalid GTINs

Most exports require each product to have a valid product identifier, such as GTIN. Global Trade Item Number (GTIN) is a unique and internationally recognized product identifier. To detect and delete invalid GTINs, use the Filter GTINs rule box.

  1. Take the steps from Prerequisites.

  2. Search for and select the Filter GTINs rule box.

    Filter GTINs
  3. Select one (1) of the options in the Advanced Behavior drop-down menu to fix your values and make them valid GTINs:

    • Fill up to 14 digits adds one or more zeros at the beginning of the value until it reaches 14 digits.

    • Fill up to 13 digits adds one or more zeros at the beginning of the value until it reaches 13 digits.

    • Trim leading zeroes removes all zeros from the beginning of the value.

    • Leave unchanged keeps the original value.

  4. Select Save.

Decode special characters and remove HTML tags

If you have pieces of code or HTML tags left in your data feed, you can convert them into readable text.

Fix and decode HTML

If you have broken or encoded HTML, such as &amp, you can convert them into text, such as &, using the Fix and Decode HTML rule box.

  1. Take the steps from Prerequisites.

  2. Search for and select the Fix and Decode HTML rule box.

    Fix and Decode HTML
  3. Select Save.

Fix and decode base64

If you have broken or encoded base64 tags, you can convert them into readable text.

For example, using the Decode base64 content rule box, you can convert VGV4dCBFeGFtcGxlCg== into Text Example.

Note

To use the Decode base64 content rule box, first, you should prepend the values in your data feed attribute with base64:. See Append or prepend a value.

To apply the rule box, take to following steps:

  1. Take the steps from Prerequisites.

  2. Search for and select the Decode base64 content rule box.

    Decode base64 content
  3. Select Save.

Remove HTML tags

You can remove any leftover HTML tags, such as <h3> or <\br>, from your data feed:

  1. Take the steps from Prerequisites.

  2. Search for and select the Remove HTML Tags rule box.

    Remove HTML tags
  3. Select Save.

Remove UTF-8 control characters

To remove UTF-8 control characters, such as U+0080, from your data feed:

  1. Take the steps from Prerequisites.

  2. Search for and select the Remove UTF-8 Control Characters rule box.

    Remove UTF-8 Control Characters
  3. In the Replace with, select if you want to replace the value with whitespace or no space.

  4. Select Save.

Note

If some parts of UFT-8 still stay unchanged, use the Preg Replace rule box with the following regular expression in the Search field:

/[\x{0080}-\x{009F}]+/u

Detect and extract information from an attribute

To extract the product information, such as brand or color, from one attribute into another, use the Detect Brand, Detect Color, and Detect and Extract Values rule boxes.

Detect and extract brands

To detect brands in one attribute and extract them into a separate attribute, use the Detect Brand rule box.

Note

The platform uses our database of brands.

  1. Take the steps from Prerequisites.

  2. Search for and select the Detect Brand rule box.

    Detect Brand
  3. Select the attribute that contains the brand information in the Column drop-down.

  4. In Output, select whether you want to append, prepend, remove, or assign the brand you have detected.

    Note

    The assign output means replacing the existing value with the brand name.

  5. In On No Match, select what should occur if there is no match.

  6. Select Save.

For example, you have the following values in your feed:

description

title

Nike PRE Montreal Racer Trainers grey upper material: leather and textile, internal material: textile, sole: high-quality synthetics, cover sole: textile Size: 30-36

Sportswear PRE Montreal Racer Trainers

To extract Nike from the description attribute and append it to the content of the title attribute, apply the Detect Brand rule box to the title attribute with the following settings:

Column: description

Output: append

On No Match: leave old value

As a result, your title value should look like this: Sportswear PRE Montreal Racer Trainers Nike.

Detect colors

To detect colors in one attribute and extract them to another attribute, use the Detect Color rule box.

Note

The platform uses our database of color names.

  1. Take the steps from Prerequisites.

  2. Search for and select the Detect Color rule box.

    Detect Color
  3. Select the attribute that contains the color information in the Column drop-down menu.

  4. In Output, select whether you want to append, prepend, remove, or assign the color you have detected.

    Note

    The assign output means replacing the existing value with the brand name.

  5. In Language, select in which language your colors are.

  6. In On No Match, select what should occur if there is no match.

  7. Select Save.

Detect and extract values based on a custom list

You can detect not only brand or color but also custom values and extract them from one attribute to another. First, in Lists, you must create a custom list of values you wish to extract and then add the Detect and Extract Value rule box in Data View or Dataflow.

  1. Go to Lists from your site's main menu and create a Standard Inclusion/Exclusion list with your custom values to extract. See Exclude products with the Standard Inclusion/Exclusion list.

  2. Take the steps from Prerequisites.

  3. Search for and select the Detect and Extract Value rule box.

    Detect and Extract Values
  4. In Input Column, select the attribute with the necessary information.

  5. In Detection List, select the Standard Inclusion/Exclusion list you created for your custom values.

  6. In Mode, select whether you want to assign, append, or prepend the detected values.

  7. In Match Type, select how the values from your list should match:

    • term can occur anywhere - The rule box takes the detected value that can occur anywhere in the selected attribute's column.

    • just entire word - The rule box takes the first entire word that matches the selected attribute's column and the list.

    • content starts with - The rule box takes the first word in the string, starting with the defined value.

    • content starts with entire word - The rule box takes the first word in the string if it is an entire word that matches the defined value.

  8. In Case, select whether the matches should be case-sensitive.

  9. In On No Match, select what should occur if there is no match.

  10. Select Save.