Clean and format your data
Clean and format your data in Productsup.
Before moving on to optimizing your data, the best practice would be to clean up and format your initial data. This could be to remove HTML tags, sanitize your text, or remove block capitals.
Dealing with whitespaces in your data
Your data may have too many - or indeed too few - whitespaces in between your words which you will wish to sort out.
Removing consecutive whitespaces
If you have more than one whitespace in a row, you can remove them using the remove consecutive whitespace box. One whitespace will always be left so that the words remain separated.
Add the remove consecutive whitespace box
This will turn this input:
Taupe Sandals - silver - leather - Size 39
Into the following output:
Taupe Sandals - silver - leather - Size 39
Note
You could also remove or replace set numbers of whitespaces using the text replace box
Removing trailing or leading whitespaces
If you have extra whitespaces at the beginning or end of your content, you can remove these using the remove spaces at the beginning and end (trim) box.
Add the remove consecutive whitespace box
This will turn this input:
Buy this great "Tailenders of the world unite" t-shirt!
Into the following output:
Buy this great "Tailenders of the world unite" t-shirt!
Separating words in camelCase
If you have words in camelCase (where the second part of the word starts with a capital letter) then you can split these words using the separate words box.
Add the separate words box
This will turn this input: brownShoes
into this output: brown Shoes
You can consider normalizing the capitalization using capitalization boxes.
Dealing with capitalization
You may wish to change the capitalization on certain data, such as by making every word start with a capitalized letter or every word in lowercase.
Removing block capitals
You can convert block capitals using the capitalize words box.
Add the capitalize words box
Select whether you would like to amend to
Convert uppercase: This Capitalizes The First Letter Of Every Word
Only first letter: This capitalizes the first letter of each sentence

Converting capitals to ‘sentence case’
To convert block capitals into ‘sentence case’, where only the first word in the sentence is capitalized, consider using the uppercase to human box.
Add the uppercase to human box
This will turn this input: THESE ARE LOUD NOISES. VERY LOUD NOISES.
into this output: These are loud noises. Very loud noises.
Converting to lowercase
To convert block capitals into all lowercase, consider using the lowercase box.
Add the lowercase box
This will turn this input: THESE ARE LOUD NOISES. VERY LOUD NOISES.
into this output: these are loud noises. very loud noises.
Spellcheck
You can use the spellcheck box to check for misspelled words and provide suggestions for how to correct the spelling.
Add the spellcheck box
Select the column where your input values should come from
Select the language in which your values are in
You can optionally choose to not spellcheck words shorter than a certain character length

Dealing with invalid GTINs
Most export channels require each product to have a valid identifier, and this is often a GTIN. You can use the filter GTINs box to detect and delete values that are not valid GTINs.
Add the filter GTINs box
Select one of the advanced behavior options - these will try and manipulate your data into valid GTINs before performing a final check if they are valid
Fill up to 14 digits: add one or more leading zeroes until the value reaches 14 digits
Fill up to 13 digits: add one or more leading zeroes until the value reaches 13 digits
Trim leading zeroes: remove all zeroes from the beginning of the value
Leave unchanged: the original value is kept
Decoding HTML and removing HTML tags
On occasion, you may have HTML tags or encoded information in your feed that you wish to decode and display in human-readable text.
Fixing and decoding HTML
If you have broken or encoded HTML, you can convert this to ‘normal’ characters using the fix and decode HTML box.
Add the fix and decode the HTML box.
When using this box, for example, you can turn
Input:
Fuß, Apple &, Banana
into
Output:
Fuß, Apple & Banana
Fix and decode base64
If you have broken or encoded base64 tags, you can convert them to ‘normal’ characters using the fix and decode base64 box.
Add the fix and decode base64 box.
Removing HTML tags
You can remove any leftover HTML tags from your data using the remove HTML tags box.
Add the remove HTML tags box.
When using this box, for example, you can turn
Input:
<h4 class="text">Power Smoothie - start the day with a smoothie.</h4><br/>
into
Output:
Power Smoothie - start the day with a smoothie.
Removing UTF-8 control characters
If your data contains UTF-8 control characters that you wish to remove, consider using the remove UTF-8 control characters box.
Add the remove UTF-8 control characters box.
Note
Should not all broken control characters be removed using the above box, you could also use the following regular expression in the preg replace box to remove them:
/[\x{0080}-\x{009F}]+/u
Detecting and extracting information from a column
It may be the case that information such as the brand or color of a product is only available within another column, such as the description of your products. If you wish to isolate this information and use it in another column, you can do so using the following boxes.
Detecting brands
To detect brands and extract them to another column, you can use the detect brand box.
Add the detect brand box.
Input the column in which the brand information is contained.
Choose the output method: whether you want to append, prepend, replace, or assign the brand you have detected.
Select what should occur if there is no match.
As an example, you may wish to extract the brand from the description column and append it to the title column.
Description | title |
---|---|
Nike PRE Montreal Racer Trainers grey upper material: leather and textile, internal material: textile, sole: high-quality synthetics, cover sole: textile Size: 30-36 | Sportswear PRE Montreal Racer Trainers |
After applying the box to the title column according to the screenshot below, your title would look like this:
Sportswear PRE Montreal Racer Trainers Nike

Detecting colors
To detect colors and extract them to another column, you can use the detect color box.
Add the detect color box.
Input the column in which the brand information is contained.
Choose the output method: whether you want to append, prepend, replace, or assign the color you have detected.
Select which language your colors are in.
Select what should occur if there is no match.

Detecting and extracting values based on a custom list
You can also detect and extract custom values and not necessarily brand or color. To do this, you first have to upload a custom list of values that you wish to extract.
Add a Standard Inclusion/Exclusion list with your custom values to extract.
Add the detect and extract value box.
Define the input column in which your information is contained.
Select the detection list which you previously uploaded.
Choose the mode, whether you want to append, prepend, replace, or assign the color you have detected.
Select the match type which defines where and how the values from your list should be matched.
Select which case you want to match the values in your list: sensitive or insensitive.
Select what should occur if there is no match.
