Skip to main content

Rule box category Clean up values

Clean your product data using the Clean up values rule box category in Productsup.

Introduction

The category Clean up values contains all rule boxes that can help you detect and remove irregularities in your data. For example, you can filter GTINs, separate words, decode HTML and Base64, translate colors, remove consecutive whitespace, or remove whitespace at the beginning and end of the value. Besides that, you can detect and extract values using the rule boxes in this category.

clean_up_values.png

This category can be divided into several subgroups. In the following sections, you can find how to use the rule boxes of the Clean up values category:

Detect and extract information from an attribute

The rule box category Clean up values contains the rule boxes Detect and Extract Value, Detect Brand, and Detect Color. Using this rule box category, you can extract product information, such as brand or color, from one attribute into another. You can also detect and extract other desired values using lists.

Detect and Extract Value

You can find desired values in an attribute and extract them into another attribute. Using a Standard Inclusion/Exclusion list, the Detect and Extract Value rule box finds a match in a product's value and adds it to the attribute where you apply the rule box.

The platform goes through the terms on your list in the order you added them to the list. First, the platform looks for a match of the first term. If it finds a match for this term in a product's value, it extracts the match to the attribute where you apply the Detect and Extract Value rule box. Then, the platform moves on to the next product. Once it scans all products for a match of the first term, it moves on to the second term on your list to see if the products with no match of the first term contain a match of the second term. This process finishes when the platform has scanned all terms on your list. It may finish earlier if the platform finds matches for all your products before reaching the end of the list.

Note

Before adding and setting up this rule box, you must go to Lists from your site's main menu and create a Standard Inclusion/Exclusion list with the values you want to extract. See Exclude products with the Standard Inclusion/Exclusion list.

The order of the values in your list determines which values the platform adds to the attribute where you apply the rule box.

  1. Take the steps from Add a rule box to add the Detect and Extract Value rule box to the attribute where you want to add the extracted values.

    Detect and Extract Values
  2. In Input Column, select the attribute where you want to search for matches. This drop-down list shows attributes available at the previous stage.

  3. In Detection List, select the Standard Inclusion/Exclusion list you created for the values you want to detect and extract.

  4. In Mode, select how you want to add the extracted values in your current attribute:

    1. assign completely overwrites the current value with the extracted value.

    2. append adds the extracted value at the end of the current value.

    3. prepend adds the extracted value at the beginning of the current value.

  5. In Match Type, select how the platform should extract the values from your list:

    • term can occur anywhere takes the detected match from any place within the selected attribute's values, even if it is part of a word.

    • just entire word takes the detected match only if it matches an entire word within the selected attribute's values.

    • content starts with takes the detected match if it is at the start of the selected attribute's values, even if it is part of a word.

    • content starts with entire word takes the detected match only if it matches an entire word at the start of the selected attribute's values.

  6. In Case, select whether the matches should be case-sensitive:

    • insensitive ignores the case of text values when matching them to the values on the list.

    • sensitive checks if the case of the text values in the current attribute matches the case of the values on the list and extracts values only if the case matches.

  7. In On No Match, select what should occur if there is no match:

    • set Empty empties the value.

    • set Original Value doesn't change the current value.

  8. Select Save.

For example, you have a Standard Inclusion/Exclusion list with the following values:

  • purses

  • shoes and socks

  • sock

If a term on the list matches an entire word in the short description attribute values in the import stage, you want to extract it into the description_keywords attribute in the intermediate stage.

You add the Detect and Extract Value rule box to the description_keywords attribute and set it up as follows:

Input Column: short description, Detection List: the name of the Standard Inclusion/Exclusion list, Mode: assign, Match Type: just entire words, Case: insensitive, On No Match: set Empty

Here is the data in your attributes after applying the rule box:

short description (no changes)

description_keywords (before)

description_keywords (after)

socks, shoes

shoes and socks here

shoes and socks

it is a set of socks, pants, and shoes

The values socks, shoes and it is a set of socks, pants, and shoes match the value sock from the list, but the rule box doesn't extract them into the attribute description_keywords because they match a part of a word instead of an entire word.

Detect Brand

The Detect Brand rule box detects brands in one attribute and extracts them into another attribute. This rule box scans an attribute's values for a match of brand names stored in our database and adds the matches into the attribute where you apply the rule box.

  1. Take the steps from Add a rule box to add the Detect Brand rule box.

    Detect Brand
  2. Select the attribute that contains the brand information in the Column drop-down list.

  3. In Output, select what you want to do with the extracted brand values in your current attribute:

    1. assign completely overwrites the current value with the detected brand value.

    2. append adds the detected brand value at the end of the current value.

    3. prepend adds the detected brand value at the beginning of the current value.

    4. remove deletes the detected brand value from the current value.

  4. In On No Match, select what should occur if there is no brand match in your values:

    • leave old value doesn't change the current value.

    • set empty empties the value.

  5. Select Save.

For example, you have the following values in your title attribute and want to extract the brands mentioned in the titles into the brand attribute. You can achieve this with the following setup of the Detect Brand rule box:

Column: title, Output: assign, On No Match: leave old value

title (no changes)

brand (before)

brand (after)

Nike sneakers

Nike

acer laptop

Asus

acer

Gucci suit

Prada

Gucci

Detect Color

The Detect Color rule box detects colors in one attribute and extracts them into another attribute. This rule box scans an attribute's values for a match of color names stored in our database and adds the matches into the attribute where you apply the rule box.

  1. Take the steps from Add a rule box to add the Detect Color rule box.

    Detect Color
  2. Select the attribute that contains the color information in the Column drop-down menu.

  3. In Output, select what you want to do with the extracted color values in your current attribute:

    1. append adds the detected color value at the end of the current value.

    2. prepend adds the detected color value at the beginning of the current value.

    3. assign completely overwrites the current value with the detected color value.

    4. remove deletes the detected color value from the current value.

  4. In Language, select which language your colors are in. The options are German, English, Spanish, French, Italian, Dutch, and Thai. If you select all, the rule box detects color names matching colors in any of these languages.

  5. In On No Match, select what should occur if there is no color match in your values:

    • leave old value doesn't change the current value.

    • set empty empties the value.

  6. Select Save.

For example, you have the following values in your title attribute and want to extract the colors mentioned in the titles into the color attribute. You can achieve this with the following setup of the Detect Color rule box:

Column: title, Output: assign, Language: english, On No Match: leave old value

title (no changes)

color (before)

color (after)

Nike sneakers blue

blue

acer laptop silver

gray

silver

Gucci suit black

dark

black

Filter GTINs

Most exports require each product to have a valid product identifier, such as Global Trade Item Number (GTIN). GTIN is a unique and internationally recognized product identifier. The Filter GTINs rule box detects invalid GTINs via a checksum digit and deletes them.

  1. Take the steps from Add a rule box to add the Filter GTINs rule box.

    Filter GTINs
  2. Select one of the options in the Advanced Behavior drop-down menu to fix your values and make them valid GTINs:

    • Fill up to 14 digits adds one or more zeros at the beginning of the value until the value length reaches 14 digits.

    • Fill up to 13 digits adds one or more zeros at the beginning of the value until the value length reaches 13 digits.

    • Trim leading zeroes removes all zeros from the beginning of the value.

    • Leave unchanged keeps the original value.

  3. Select Save.

For example, you have the following values in your GTIN attribute. You want to remove all invalid GTINs and ensure all valid GTINs are 14 digits long. You can achieve this with the Filter GTINs rule box by selecting Fill up to 14 digits as the advanced behavior:

GTIN (before)

GTIN (after)

811571013579

00811571013579

198002236142

00198002236142

194743857492

The rule box checks if the GTINs are valid and adds zeros at the beginning of a value if the length of a valid GTIN is less than 14 characters. If a GTIN is invalid, the rule box empties the value.

Decode special characters and remove HTML tags

The rule box category Clean up values contains the rule boxes Fix and Decode HTML, Decode base64 content, and Remove HTML Tags. If you have pieces of code or HTML tags left in your data, you can convert them into readable text using these rule boxes.

Fix and Decode HTML

If your values have broken or encoded HTML tags, such as & or ß instead of & or ß, the Fix and Decode HTML rule box converts them into regular text characters.

  1. Take the steps from Add a rule box to add the Fix and Decode HTML rule box.

    Fix and Decode HTML
  2. Select Save.

For example, you have the following values in your category attribute and want to fix the broken characters. You can achieve this with the Fix and Decode HTML rule box:

category (before)

category (after)

Home & Garden > Linens & Bedding > Bedding > Blankets

Home & Garden > Linens & Bedding > Bedding > Blankets

Apparel & Accessories > Clothing > Activewear

Apparel & Accessories > Clothing > Activewear

Apparel & Accessories > Clothing > Underwear & Socks

Apparel & Accessories > Clothing > Underwear & Socks

Decode base64 content

If your values have broken or encoded base64 tags, the Decode base64 content rule box converts them into readable text. For example, you can convert VGV4dCBFeGFtcGxl into Text Example.

Note

To use the Decode base64 content rule box, you first should prepend the values in your attribute with base64:. See Append/Prepend Value.

  1. Take the steps from Add a rule box to add the Decode base64 content rule box.

    Decode base64 content
  2. Select Save.

For example, you have the following values in your category attribute and want to decode them from base64 into regular text. You can achieve this with the Decode base64 content rule box:

category (before)

category (after)

base64:SG9tZSAmIEdhcmRlbiA+IExpbmVucyAmIEJlZGRpbmcgPiBCZWRkaW5nID4gQmxhbmtldHM=

Home & Garden > Linens & Bedding > Bedding > Blankets

base64:QXBwYXJlbCAmIEFjY2Vzc29yaWVzID4gQ2xvdGhpbmcgPiBBY3RpdmV3ZWFy4oCL

Apparel & Accessories > Clothing > Activewear

QXBwYXJlbCAmIEFjY2Vzc29yaWVzID4gQ2xvdGhpbmcgPiBVbmRlcndlYXIgJiBTb2Nrcw==

QXBwYXJlbCAmIEFjY2Vzc29yaWVzID4gQ2xvdGhpbmcgPiBVbmRlcndlYXIgJiBTb2Nrcw==

Remove HTML Tags

The Remove HTML Tags rule box removes any leftover HTML tags, such as <h3> or <\br>, from your attributes.

  1. Take the steps from Add a rule box to add the Remove HTML Tags rule box.

    Remove HTML tags
  2. Select Save.

For example, you have the following values in the description attribute and want to extract text from these HTML bodies. You can achieve this with the Remove HTML Tags rule box:

description (before)

description (after)

<div id="productName" class="align-top product-name-container">
    <h1 class="product-name title"> Flat leather sandals with a bow</h1>
</div>

Flat leather sandals with a bow

<div id="productName" class="align-top product-name-container">
    <h1 class="product-name title"> High-heel sandals with a buckle</h1>
</div>

High-heel sandals with a buckle

<h4 class="text">Power Smoothie - Start the day with an energy boost.
</h4>
<br/>
5 reasons to buy it:<ul class="text">
<li>Gluten free,</li>
<li>Vitality,</li>
<li>Iron and Calcium,</li>
<li>100% organic,</li>
<li>Brazilian fruits: Acai (29%), Grape (25%), Mango (18%), Banana (17%), Pineapple (11%).</li>
</ul>
<br/>

Power Smoothie - Start the day with an energy boost. 5 reasons to buy it: Gluten free, Vitality, Iron and Calcium, 100% organic, Brazilian fruits: Acai (29%), Grape (25%), Mango (18%), Banana (17%), Pineapple (11%).

Add or remove spaces

The rule box category Clean up values contains the rule boxes Remove Consecutive Whitespace, Remove Spaces At Beginning and End (Trim), and Separate Words. If your attribute values have more or less space than necessary, you can use these rule boxes to clean up the text.

Remove Consecutive Whitespace

If you have more than one consecutive space character between words, the Remove Consecutive Whitespace rule box removes excessive space. The words should remain separated with only one space.

Tip

You can also remove or replace space using the Text Replace rule box. See Text Replace.

  1. Take the steps from Add a rule box to add the Remove Consecutive Whitespace rule box.

    Remove consecutive whitespace
  2. Select Save.

For example, you have the following values in your category attribute and want to remove the extra spaces. You can achieve this with the Remove Consecutive Whitespace rule box:

category (before)

category (after)

Home & Garden > Linens & Bedding > Bedding > Blankets

Home & Garden > Linens & Bedding > Bedding > Blankets

Apparel & Accessories > Clothing > Activewear

Apparel & Accessories > Clothing > Activewear

Apparel & Accessories > Clothing > Underwear & Socks

Apparel & Accessories > Clothing > Underwear & Socks

Remove Spaces At Beginning And End (Trim)

The Remove Spaces At Beginning and End (Trim) rule box removes unnecessary space characters at the beginning or end of your attribute's values.

  1. Take the steps from Add a rule box to add the Remove Spaces At Beginning And End (Trim) rule box.

    Remove spaces at the beginning and end
  2. Select Save.

For example, you have the following values in your description attribute and want to remove the extra spaces. You can achieve this with the Remove Spaces At Beginning and End (Trim) rule box:

description (before)

description (after)

The manufacturer uses only natural and hypoallergenic materials to produce these dog beds. Colors available: black, gray, white, green, yellow, and lilac. Size range: XS, S, M, L, XL, and XXL

The manufacturer uses only natural and hypoallergenic materials to produce these dog beds. Colors available: black, gray, white, green, yellow, and lilac. Size range: XS, S, M, L, XL, and XXL

The manufacturer uses only natural and hypoallergenic materials to produce these dog blankets. Colors available: brown, gray, green, and blue. Size range: XS, S, L, and XXL

Note

There is additional whitespace at the beginning and end of this value.

The manufacturer uses only natural and hypoallergenic materials to produce these dog blankets. Colors available: brown, gray, green, and blue. Size range: XS, S, L, and XXL

Separate Words

If the text in your values uses the camel case, where the second word starts with a capital letter without a space to separate it from the first word, such as brownShoes, the Separate Words rule box removes the camel case.

  1. Take the steps from Add a rule box to add the Separate Words rule box.

    Separate words
  2. Select Save.

For example, you have the following values in your description attribute and want to separate the words in the camel case. You can achieve this with the Separate Words rule box:

description (before)

description (after)

TheManufacturerUsesOnlyNaturalAndHypoallergenicMaterialsToProduceTheseDogBeds.

The Manufacturer Uses Only Natural And Hypoallergenic Materials To Produce These Dog Beds.

ColorsAvailable: black, grey, white, green, yellow, and lilac.

Colors Available: black, grey, white, green, yellow, and lilac.

Remove parentheses and translate colors

The rule box category Clean up values contains the rule boxes Remove Terms in Brackets and Translate Colors to let you remove all terms in brackets from your texts and translate your color values into different languages.

Remove Terms in Brackets

The Remove Terms in Brackets rule box deletes text in parentheses from your values. It removes the text in brackets and the brackets themselves.

  1. Take the steps from Add a rule box to add the Remove Terms in Brackets rule box.

    replace_between_brackets.png
  2. Select Save.

For example, you have the following values in your description attribute and want to remove all text in brackets. You can achieve this with the Remove Terms in Brackets rule box:

description (before)

description (after)

This is a great T-shirt, the very best (endorsed by world-famous jazz saxophonist Joshua Redman) one there is.

This is a great T-shirt, the very best one there is.

All reviewers note the high quality of the material ().

All reviewers note the high quality of the material .

Tip

If unneeded spaces remain in the value after applying this rule box, you can remove them with the Text Replace rule box by, for example, searching for . or , preceded with a space character and replacing them with . or ,. See Text Replace.

Translate Colors

The Translate Colors rule box detects colors in your values and translates them into English, German, French, Italian, Spanish, or Dutch.

Tip

If you need to translate more than just color values, you can use a dedicated data service. See Translate your feed with the Google Translate and Google Translate (Advanced) data services.

  1. Take the steps from Add a rule box to add the Translate Colors rule box.

    translate_colors.png
  2. In Source Language, select the language of your current values:

    • English

    • German

    • French

    • Italian

    • Spanish

    • Dutch

  3. In Target Language, select the language you want to translate your colors into.

    • English

    • German

    • French

    • Italian

    • Spanish

    • Dutch

  4. Select Save.

For example, you have the following values in your title attribute and want to translate the colors mentioned in the values into German. You can achieve this with the Translate Colors rule box by selecting relevant source and target languages:

title (before)

title (after)

Sandals - green - size 39

Sandals -grün- size 39

Sneakers - white - size 40

Sneakers -weiß- size 40

Clogs - purple - size 44

Clogs - lila - size 44