Clean and format your data
Clean and format your data feed using rule boxes in Dataflow or Data View in Productsup.
Introduction
Before optimizing your data feed for a specific export, you can use rule boxes in Data View or Dataflow to clean and format your product data.
In the following sections, you can find the rule boxes you can use to clean and format your data.
Prerequisites
Each rule box requires an individual setup process, but to find the needed rule box in Data View do as follows:
Go to Data View from the site's main menu.
Choose the Intermediate stage or the desired export from the drop-down menu on the left side of the upper ribbon.
Select Edit in the needed attribute name.
Select the Select an Option drop-down menu in the section on the right.
Proceed with the individual rule box setup steps in the sections below.
Add or remove whitespaces
If your attribute values have more or fewer whitespaces than necessary, you can use the following rule boxes to clean the text.
Remove consecutive whitespaces
If you have more than one consecutive whitespace between words, you can remove the excessive whitespace with the Remove Consecutive Whitespace rule box. The words should remain separated with only one whitespace.
Take the steps from Prerequisites.
Search for and select the Remove Consecutive Whitespace rule box.
Select Save.
Note
You can also remove or replace whitespaces using the Text Replace rule box. See Replace static text with the Text Replace rule box.
Remove whitespaces at the beginning or end of your values
To remove extra whitespaces at the beginning or end of your attribute's values, use the Remove Spaces At Beginning and End (Trim) rule box.
Take the steps from Prerequisites.
Search for and select the Remove Spaces At Beginning and End (Trim) rule box.
Select Save.
Separate words in camel case
If the text in your values uses the camel case, where the second word starts with a capital letter without a space to separate it from the first word, for example, brownShoes
, you can split them using the Separate Words rule box.
Take the steps from Prerequisites.
Search for and select the Separate Words rule box.
Select Save.
Change capitalization
You can change the capitalization of words in your attribute values as follows:
Capitalize all or only the first words in the string
Convert the uppercase to sentence case
Convert uppercase to lowercase
Convert lowercase to uppercase
Capitalize all or first words
To capitalize all or only the first word in the string:
Take the steps from Prerequisites.
Search for and select the Capitalize Words rule box.
Select the capitalization method in Makes Every First Letter Uppercase:
Convert Uppercase capitalizes each word in the string and converts uppercase letters within a word into lowercase. For example, it converts
only theFirst letter of each word
intoOnly Thefirst Letter Of Each Word
.Only first letter capitalizes the first letter of the string and converts all the other words into lowercase. For example, it converts
not all words. Only the first letter IN THE STRING
intoNot all words. only the first letter in the string
.LEAVE UPPERCASE makes all words capitalized. For example, it converts
All WordsIn the string
intoAll WordsIn The String
.
Select Save.
Convert uppercase to sentence case
The sentence case is a capitalization style in which only the first letter of the first word of a sentence is in uppercase, and all other words are in lowercase.
To convert the uppercase text, such as ALL WORDS SHOULD BE IN THE SENTENCE CASE
, into the sentence case, such as All words should be in the sentence case
, use the Uppercase to human rule box.
Note
The Uppercase to human rule box converts only words longer than three (3) characters.
Take the steps from Prerequisites.
Search for and select the Uppercase to human rule box.
Select Save.
Convert uppercase to lowercase
To convert all uppercase words, such as THESE WORDS
, into all lowercase words, such as these words
, use the Lowercase rule box.
Take the steps from Prerequisites.
Search for and select the Lowercase rule box.
Select Save.
Convert lowercase to uppercase
To convert all lowercase or capitalized words, such as these words
or These words
, into all uppercase words, such as THESE WORDS
, use the Uppercase rule box.
Take the steps from Prerequisites.
Search for and select the Uppercase rule box.
Select Save.
Remove invalid GTINs
Most exports require each product to have a valid product identifier, such as GTIN. Global Trade Item Number (GTIN) is a unique and internationally recognized product identifier. To detect and delete invalid GTINs, use the Filter GTINs rule box.
Take the steps from Prerequisites.
Search for and select the Filter GTINs rule box.
Select one (1) of the options in the Advanced Behavior drop-down menu to fix your values and make them valid GTINs:
Fill up to 14 digits adds one or more zeros at the beginning of the value until it reaches 14 digits.
Fill up to 13 digits adds one or more zeros at the beginning of the value until it reaches 13 digits.
Trim leading zeroes removes all zeros from the beginning of the value.
Leave unchanged keeps the original value.
Select Save.
Decode special characters and remove HTML tags
If you have pieces of code or HTML tags left in your data feed, you can convert them into readable text.
Fix and decode HTML
If you have broken or encoded HTML, such as &
, you can convert them into text, such as &
, using the Fix and Decode HTML rule box.
Take the steps from Prerequisites.
Search for and select the Fix and Decode HTML rule box.
Select Save.
Fix and decode base64
If you have broken or encoded base64 tags, you can convert them into readable text.
For example, using the Decode base64 content rule box, you can convert VGV4dCBFeGFtcGxlCg==
into Text Example
.
Note
To use the Decode base64 content rule box, first, you should prepend the values in your data feed attribute with base64:
. See Append or prepend a value.
To apply the rule box, take to following steps:
Take the steps from Prerequisites.
Search for and select the Decode base64 content rule box.
Select Save.
Remove HTML tags
You can remove any leftover HTML tags, such as <h3>
or <\br>
, from your data feed:
Take the steps from Prerequisites.
Search for and select the Remove HTML Tags rule box.
Select Save.
Remove UTF-8 control characters
To remove UTF-8 control characters, such as U+0080
, from your data feed:
Take the steps from Prerequisites.
Search for and select the Remove UTF-8 Control Characters rule box.
In the Replace with, select if you want to replace the value with whitespace or no space.
Select Save.
Note
If some parts of UFT-8 still stay unchanged, use the Preg Replace rule box with the following regular expression in the Search field:
/[\x{0080}-\x{009F}]+/u
Detect and extract information from an attribute
To extract the product information, such as brand or color, from one attribute into another, use the Detect Brand, Detect Color, and Detect and Extract Values rule boxes.
Detect and extract brands
To detect brands in one attribute and extract them into a separate attribute, use the Detect Brand rule box.
Note
The platform uses our database of brands.
Take the steps from Prerequisites.
Search for and select the Detect Brand rule box.
Select the attribute that contains the brand information in the Column drop-down.
In Output, select whether you want to append, prepend, remove, or assign the brand you have detected.
Note
The assign output means replacing the existing value with the brand name.
In On No Match, select what should occur if there is no match.
Select Save.
For example, you have the following values in your feed:
description | title |
---|---|
Nike PRE Montreal Racer Trainers grey upper material: leather and textile, internal material: textile, sole: high-quality synthetics, cover sole: textile Size: 30-36 | Sportswear PRE Montreal Racer Trainers |
To extract Nike
from the description
attribute and append it to the content of the title
attribute, apply the Detect Brand rule box to the title
attribute with the following settings:
Column: description
Output: append
On No Match: leave old value
As a result, your title
value should look like this: Sportswear PRE Montreal Racer Trainers Nike
.
Detect colors
To detect colors in one attribute and extract them to another attribute, use the Detect Color rule box.
Note
The platform uses our database of color names.
Take the steps from Prerequisites.
Search for and select the Detect Color rule box.
Select the attribute that contains the color information in the Column drop-down menu.
In Output, select whether you want to append, prepend, remove, or assign the color you have detected.
Note
The assign output means replacing the existing value with the brand name.
In Language, select in which language your colors are.
In On No Match, select what should occur if there is no match.
Select Save.
Detect and extract values based on a custom list
You can detect not only brand or color but also custom values and extract them from one attribute to another. First, in Lists, you must create a custom list of values you wish to extract and then add the Detect and Extract Value rule box in Data View or Dataflow.
Go to Lists from your site's main menu and create a Standard Inclusion/Exclusion list with your custom values to extract. See Exclude products with the Standard Inclusion/Exclusion list.
Take the steps from Prerequisites.
Search for and select the Detect and Extract Value rule box.
In Input Column, select the attribute with the necessary information.
In Detection List, select the Standard Inclusion/Exclusion list you created for your custom values.
In Mode, select whether you want to assign, append, or prepend the detected values.
In Match Type, select how the values from your list should match:
term can occur anywhere - The rule box takes the detected value that can occur anywhere in the selected attribute's column.
just entire word - The rule box takes the first entire word that matches the selected attribute's column and the list.
content starts with - The rule box takes the first word in the string, starting with the defined value.
content starts with entire word - The rule box takes the first word in the string if it is an entire word that matches the defined value.
In Case, select whether the matches should be case-sensitive.
In On No Match, select what should occur if there is no match.
Select Save.