Import XML files - advanced settings

Import XML files using some advanced settings in Productsup.

What is an XML file?

XML is a versatile file format made up of nodes arranged in a tree structure at different depths. A node is identified by a name enclosed in angle brackets (< and >), for example:

<items>

Here is an example of a typical XML file:

<items>
    <product>
        <title>Yellow Shirt</title>
        <size>Medium</size>
        <price>20 EUR</price>
        <old_price>30 EUR</old_price>
    </product>
    <product>
        <title>Blue Shirt</title>
        <size>Large</size>
        <price>25 EUR</price>
    </product>
</items>

You can import your raw XML files via URL or via a local upload.

Because XML is so versatile, the platform must first parse the file to understand its structure. To make sure the data is parsed into the platform correctly, you can specify XML settings in the data source setup.

Root nodes in an XML file

A root node in this context is the desired starting point for importing your products. Productsup will always scan for a root node automatically. If the parser doesn’t detect the right node, you need to insert it manually.

To add these settings, navigate to content options and XML settings under the advanced settings of your data source. You can then enter the root node. For more information, see Import a file from an HTTP/FTP link.


An example of a ‘standard use case’ root node

In the standard use case, all child data belonging to the product comes under the root node.

In the example below, the root node is items. Everything under the items node is imported, creating one product for each product node.

<items>
    <product>
        <title>Yellow Shirt</title>
        <size>Medium</size>
        <price>20 EUR</price>
        <old_price>30 EUR</old_price>
    </product>
    <product>
        <title>Blue Shirt</title>
        <size>Large</size>
        <price>25 EUR</price>
    </product>
</items>
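
The standard use case can be illustrated with a minimal Python sketch. It is not the Productsup parser; the function name import_rows and the row format (one dictionary per product) are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET

FEED = """<items>
    <product>
        <title>Yellow Shirt</title>
        <size>Medium</size>
        <price>20 EUR</price>
        <old_price>30 EUR</old_price>
    </product>
    <product>
        <title>Blue Shirt</title>
        <size>Large</size>
        <price>25 EUR</price>
    </product>
</items>"""


def import_rows(xml_text, root_node):
    """Return one dict per child element of the first node named root_node."""
    document = ET.fromstring(xml_text)
    # The document element itself may be the root node; otherwise search below it.
    if document.tag == root_node:
        root = document
    else:
        root = document.find(f".//{root_node}")
    if root is None:
        return []
    # Each child of the root node becomes one row; its children become columns.
    return [
        {field.tag: (field.text or "").strip() for field in child}
        for child in root
    ]


rows = import_rows(FEED, "items")
for row in rows:
    print(row)
```

In this sketch, the second row simply has no old_price key, because the Blue Shirt product does not contain that node.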

An example of a ‘non-standard use case’ root node

Your XML file may contain nodes that do not correspond to products and should be skipped.

<items>
    <name>Example name</name>
    <product>
        <title>Yellow Shirt</title>
        <size>Medium</size>
        <price>20 EUR</price>
        <old_price>30 EUR</old_price>
    </product>
    <product>
        <title>Blue Shirt</title>
        <size>Large</size>
        <price>25 EUR</price>
    </product>
</items>

If you set the root node to items here, you would also import the name node, which is undesired. To import only the product nodes, set the root node to product! instead.

The exclamation point tells the parser to import only the nodes that exactly match the root node you entered.
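
The difference between the two settings can be sketched in Python. This is only an approximation of the behavior described above, not the platform's implementation; parse_root_node and import_rows are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

FEED = """<items>
    <name>Example name</name>
    <product>
        <title>Yellow Shirt</title>
        <size>Medium</size>
    </product>
    <product>
        <title>Blue Shirt</title>
        <size>Large</size>
    </product>
</items>"""


def parse_root_node(setting):
    """Split a root node setting such as 'product!' into (name, exact)."""
    exact = setting.endswith("!")
    return setting.rstrip("!"), exact


def import_rows(xml_text, setting):
    name, exact = parse_root_node(setting)
    document = ET.fromstring(xml_text)
    if exact:
        # Exact mode: every element with this name is one product.
        records = list(document.iter(name))
    else:
        # Standard mode: every child of the named node is one record.
        if document.tag == name:
            container = document
        else:
            container = document.find(f".//{name}")
        records = list(container) if container is not None else []
    return [
        {field.tag: (field.text or "").strip() for field in record}
        for record in records
    ]


standard_rows = import_rows(FEED, "items")
exact_rows = import_rows(FEED, "product!")
print(len(standard_rows), len(exact_rows))
```

In this sketch the name node shows up as an extra, empty record when the root node is items, while product! yields only the two products.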

An example of an ‘inheritance use case’ root node

Sometimes you may have an XML feed that has variations of the same product in it. In such a case, you may wish to assign common values to each variant.

<items>
    <product_list>
        <name>products UK</name>
        <title>Red Shirt</title>
        <price>20 GBP</price>
        <products lang="en">
            <product>
                <size>Medium</size>
            </product>
            <product>
                <size>Extra Large</size>
            </product>
        </products>
    </product_list>

    <product_list>
        <name>products US</name>
        <title>Blue Shirt</title>
        <price>30 USD</price>
        <products lang="en">
            <product>
                <size>Small</size>
            </product>
        </products>
        <products lang="es">
            <product>
                <size>Extra Small</size>
            </product>
        </products>
    </product_list>

</items>

For the above example, you want to import two products (Red Shirt and Blue Shirt) in four different sizes.

Here you should enter the root node as items>product_list>products. The > characters describe the path through the tree structure, going from least to most granular.

The resulting import data would look similar to this:

| size | product_@attributes_lang | product_list_name | product_list_title | product_list_price |
| --- | --- | --- | --- | --- |
| Medium | en | products UK | Red Shirt | 20 GBP |
| Extra Large | en | products UK | Red Shirt | 20 GBP |
| Small | en | products US | Blue Shirt | 30 USD |
| Extra Small | es | products US | Blue Shirt | 30 USD |
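
To make the inheritance idea concrete, here is a rough Python sketch of how values from parent nodes could be copied onto each variant. It is an illustration under assumptions, not the platform's parser: the function import_with_inheritance is hypothetical, and the column names merely imitate the ones shown for the import described above.

```python
import xml.etree.ElementTree as ET

FEED = """<items>
    <product_list>
        <name>products UK</name>
        <title>Red Shirt</title>
        <price>20 GBP</price>
        <products lang="en">
            <product><size>Medium</size></product>
            <product><size>Extra Large</size></product>
        </products>
    </product_list>
    <product_list>
        <name>products US</name>
        <title>Blue Shirt</title>
        <price>30 USD</price>
        <products lang="en">
            <product><size>Small</size></product>
        </products>
        <products lang="es">
            <product><size>Extra Small</size></product>
        </products>
    </product_list>
</items>"""


def import_with_inheritance(xml_text, root_path):
    """Flatten each record under root_path, copying leaf values from ancestors."""
    steps = root_path.split(">")
    document = ET.fromstring(xml_text)
    if document.tag != steps[0]:
        return []
    rows = []

    def walk(node, remaining, inherited):
        if not remaining:
            # node is the container at the end of the path (here: <products>).
            for record in node:
                row = {f.tag: (f.text or "").strip() for f in record}
                for key, value in node.attrib.items():
                    row[f"{record.tag}_@attributes_{key}"] = value
                row.update(inherited)
                rows.append(row)
            return
        # Leaf children of an ancestor are shared by every record below it.
        shared = dict(inherited)
        for child in node:
            if len(child) == 0:
                shared[f"{node.tag}_{child.tag}"] = (child.text or "").strip()
        for child in node:
            if child.tag == remaining[0]:
                walk(child, remaining[1:], shared)

    walk(document, steps[1:], {})
    return rows


rows = import_with_inheritance(FEED, "items>product_list>products")
for row in rows:
    print(row)
```

The sketch walks the path from least to most granular and collects childless siblings (name, title, price) along the way, so each of the four sizes carries the values of its product_list.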

Use tags in the root node

Nodes can sometimes carry a value inside the opening tag itself. These values are referred to as “tag attributes”.

<products lang="en">

The tag attribute in the above example is lang, and its value is en.

Tag attributes are imported with an at-sign (@) in the column name, for example product_@attributes_lang.

For example, if you want to import only the Spanish product from the inheritance example XML, set the root node as products lang=es.

The parser will only import items that have this attribute inside the tag.

The resulting import data would look similar to the following:

| size | product_@attributes_lang | product_list_name | product_list_title | product_list_price |
| --- | --- | --- | --- | --- |
| Extra Small | es | products US | Blue Shirt | 30 USD |
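
Filtering by a tag attribute can be sketched in Python as follows. The helpers parse_setting and matching_nodes are hypothetical, and the way the setting string is split is an assumption for illustration rather than the platform's actual syntax handling.

```python
import xml.etree.ElementTree as ET

FEED = """<items>
    <product_list>
        <name>products US</name>
        <products lang="en">
            <product><size>Small</size></product>
        </products>
        <products lang="es">
            <product><size>Extra Small</size></product>
        </products>
    </product_list>
</items>"""


def parse_setting(setting):
    """Split 'products lang=es' into ('products', {'lang': 'es'})."""
    name, *conditions = setting.split()
    wanted = dict(condition.split("=", 1) for condition in conditions)
    return name, wanted


def matching_nodes(xml_text, setting):
    """Return the nodes with the given name whose attributes all match."""
    name, wanted = parse_setting(setting)
    document = ET.fromstring(xml_text)
    return [
        node
        for node in document.iter(name)
        if all(node.get(key) == value for key, value in wanted.items())
    ]


spanish = matching_nodes(FEED, "products lang=es")
sizes = [product.findtext("size") for node in spanish for product in node]
print(sizes)
```

Only the products node carrying lang="es" passes the filter, so the English variant is left out.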

Use a sequence in the root node

You can import a specific occurrence of a node by using a sequence.

For example, if you want to import only the US products from the inheritance example XML, set the root node as products #2. The parser will search for the second occurrence of the products node and import solely what it finds there.

The resulting import data would look similar to the following:

| size | product_@attributes_lang | product_list_name | product_list_title | product_list_price |
| --- | --- | --- | --- | --- |
| Small | en | products US | Blue Shirt | 30 USD |
| Extra Small | es | products US | Blue Shirt | 30 USD |

Max depth to be scanned for a root node

When searching for the root node, you can define the depth to which the parser searches. Everything deeper than the maximum depth will not be scanned. Setting a maximum depth can reduce your processing time.

To add in these settings, navigate to content options and XML settings under the advanced settings of your data source. You can then input the max depth.

Note

An XML file always starts at a depth of 0.


In the example below, the root node is the <products> node. It is found at a depth of 2, and the <product> nodes it contains are at a depth of 3.

<items>                                     <!-- depth 0 -->
   <product_list>                          <!-- depth 1 -->
       <name>products UK</name>            <!-- depth 2 -->
       <title>Red Shirt</title>
       <price>20 GBP</price>
       <products lang="en">
           <product>                       <!-- depth 3 -->
               <size>Medium</size>         <!-- depth 4 -->
           </product>
           <product>
               <size>Extra Large</size>
           </product>
       </products>
   </product_list>

   ...
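
A depth-limited search can be sketched in Python like this. The function find_node is a hypothetical helper, not the platform's scanner; it only illustrates why nodes below the maximum depth are never reached.

```python
import xml.etree.ElementTree as ET

FEED = """<items>
    <product_list>
        <name>products UK</name>
        <products lang="en">
            <product>
                <size>Medium</size>
            </product>
        </products>
    </product_list>
</items>"""


def find_node(element, name, max_depth, depth=0):
    """Return (node, depth) for the first match within max_depth, else None."""
    if depth > max_depth:
        # Anything deeper than the limit is not scanned at all.
        return None
    if element.tag == name:
        return element, depth
    for child in element:
        found = find_node(child, name, max_depth, depth + 1)
        if found is not None:
            return found
    return None


document = ET.fromstring(FEED)  # the document element is at depth 0
shallow = find_node(document, "product", max_depth=2)
deep = find_node(document, "product", max_depth=3)
print(shallow, deep[1])
```

With a maximum depth of 2 the product node is not found, because it sits at depth 3; raising the limit to 3 finds it, at the cost of scanning one more level.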

Bundle repeating nodes into columns

If a node appears multiple times under a product, the Platform will import them as separate columns. If this is not desired, you can bundle repeating nodes into one column, based on how many times they occur.

To add these settings, navigate to content options and XML settings under the advanced settings of your data source. You can then enter the threshold under bundle repeating nodes. You can also choose the delimiter to be used under bundle delimiter (the default is a comma).


In this example XML, we have multiple color variations per size.

<items>
    <product>
        <title>T-Shirt</title>
        <size>Medium</size>
        <color>Yellow</color>
        <color>Red</color>
        <color>Blue</color>
        <color>Green</color>
    </product>
</items>

By default, the platform will import the feed like this:

| title | size | color_1 | color_2 | color_3 | color_4 |
| --- | --- | --- | --- | --- | --- |
| T-Shirt | Medium | Yellow | Red | Blue | Green |

If you enter 4 into Bundle repeating nodes and set : as the bundle delimiter, then every attribute appearing at least four times will be bundled and separated by a colon. It will look something like this:

| title | size | color |
| --- | --- | --- |
| T-Shirt | Medium | Yellow:Red:Blue:Green |
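
The bundling rule can be sketched in Python. The function flatten_product and its threshold and delimiter parameters are hypothetical names chosen to mirror the settings described above; the real import logic may differ in its details.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

FEED = """<items>
    <product>
        <title>T-Shirt</title>
        <size>Medium</size>
        <color>Yellow</color>
        <color>Red</color>
        <color>Blue</color>
        <color>Green</color>
    </product>
</items>"""


def flatten_product(product, threshold=None, delimiter=","):
    """Turn one product element into a row, optionally bundling repeats."""
    values = defaultdict(list)
    for field in product:
        values[field.tag].append((field.text or "").strip())
    row = {}
    for tag, items in values.items():
        if len(items) == 1:
            row[tag] = items[0]
        elif threshold is not None and len(items) >= threshold:
            # Repeats at or above the threshold share a single column.
            row[tag] = delimiter.join(items)
        else:
            # Otherwise each repeat gets its own numbered column.
            for index, value in enumerate(items, start=1):
                row[f"{tag}_{index}"] = value
    return row


product = ET.fromstring(FEED).find("product")
separate = flatten_product(product)
bundled = flatten_product(product, threshold=4, delimiter=":")
print(separate)
print(bundled)
```

In this sketch, a threshold of 5 would leave the four color nodes in separate columns, because they occur fewer than five times.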

XML declaration

The XML declaration is a processing instruction that identifies the document as XML. All XML documents should begin with an XML declaration, such as:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>

Adding in missing declarations

If your declaration is missing, you can enter it in the pre-pend header row field.

To do so, navigate to content options and XML settings under the advanced settings of your data source.

Replacing incorrect declarations

If your declaration is incorrect, you can replace it in the replace XML declaration field.

To do so, navigate to content options and XML settings under the advanced settings of your data source.
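
As a rough illustration of what adding or replacing a declaration means at the text level, here is a small Python sketch. The function ensure_declaration is hypothetical and works on the raw file contents; it is not how the platform fields are implemented.

```python
import re

# Matches an XML declaration at the very start of the text.
DECLARATION = re.compile(r"^\s*<\?xml\s.*?\?>", re.DOTALL)

DEFAULT = '<?xml version="1.0" encoding="UTF-8"?>'


def ensure_declaration(xml_text, declaration=DEFAULT):
    """Replace an existing declaration, or prepend one if it is missing."""
    if DECLARATION.match(xml_text):
        return DECLARATION.sub(lambda _: declaration, xml_text, count=1)
    return declaration + "\n" + xml_text


missing = "<items></items>"
wrong = '<?xml version="1.0" encoding="ASCII"?>\n<items></items>'

print(ensure_declaration(missing))
print(ensure_declaration(wrong))
```

Both calls produce a document that starts with the UTF-8 declaration; the first one prepends it, the second one swaps out the incorrect one.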

Repair broken data in your XML

You can fix some issues with broken data directly in the Productsup Platform.

To do so, navigate to content options and XML settings under the advanced settings of your data source.


Repair control characters

You may have broken UTF-8 control characters in your data source. If this is the case, you can tick repair control characters to ensure the parser does not break because of them.
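
To show why control characters break parsing, the following Python sketch strips the characters that XML 1.0 does not allow (everything below U+0020 except tab, line feed, and carriage return). The helper strip_control_characters is hypothetical; what the platform's repair option does internally is not documented here.

```python
import re
import xml.etree.ElementTree as ET

# Control characters that are not allowed in XML 1.0 documents.
INVALID = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_control_characters(xml_text):
    """Remove control characters that an XML 1.0 parser rejects."""
    return INVALID.sub("", xml_text)


broken = "<items><product><title>Yellow\x0b Shirt</title></product></items>"

try:
    ET.fromstring(broken)
    parsed_before = True
except ET.ParseError:
    parsed_before = False

repaired = strip_control_characters(broken)
title = ET.fromstring(repaired).findtext("product/title")
print(parsed_before, title)
```

The standard library parser refuses the broken input, while the repaired text parses and yields the expected title.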

Remove Document Type Declaration (DTD)

You may have a Document Type Declaration in your feed. This is a line that normally comes directly after the XML declaration.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note SYSTEM "Note.dtd">

If this is the case, you can tick remove DTD to ensure the parser does not break because of it.
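
At the text level, removing a DTD amounts to deleting the DOCTYPE line before parsing. The Python sketch below shows one simple way to do that; remove_dtd is a hypothetical helper, and the regular expression only covers straightforward declarations like the one above, plus a basic internal subset in square brackets.

```python
import re

# A DOCTYPE declaration, optionally with an internal subset in [ ... ].
DOCTYPE = re.compile(r"<!DOCTYPE[^\[>]*(\[.*?\])?\s*>", re.DOTALL)


def remove_dtd(xml_text):
    """Delete the first Document Type Declaration from the text."""
    return DOCTYPE.sub("", xml_text, count=1)


feed = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE note SYSTEM "Note.dtd">\n'
    "<note></note>"
)

cleaned = remove_dtd(feed)
print(cleaned)
```

The XML declaration and the document content are left untouched; only the DOCTYPE line is dropped.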

Repair parent node

You may have a parent node that is not closed properly or is incomplete. If this is the case, you can tick repair parent node to ensure the parser does not break because of it.

Allow empty node

You may have nodes that are empty, but you still want their columns to be imported into the Platform. If this is the case, you can tick allow empty nodes. The node is then imported as a column name, and the value is left blank for the products.

<items>
    <product>
        <title>T-Shirt</title>
        <size>Medium</size>
        <color>Yellow</color>
        <placeholder></placeholder>
    </product>
</items>

Once you enable allow empty nodes, the imported data for the above example would look like this:

| title | size | color | placeholder |
| --- | --- | --- | --- |
| T-Shirt | Medium | Yellow | |
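
The effect of this option can be sketched in Python. The function product_row and its allow_empty flag are hypothetical and only mirror the behavior described above.

```python
import xml.etree.ElementTree as ET

FEED = """<items>
    <product>
        <title>T-Shirt</title>
        <size>Medium</size>
        <color>Yellow</color>
        <placeholder></placeholder>
    </product>
</items>"""


def product_row(product, allow_empty=False):
    """Build a row from a product, keeping empty nodes only if allowed."""
    row = {}
    for field in product:
        value = (field.text or "").strip()
        if value or allow_empty:
            row[field.tag] = value
    return row


product = ET.fromstring(FEED).find("product")
without_empty = product_row(product)
with_empty = product_row(product, allow_empty=True)
print(without_empty)
print(with_empty)
```

With the flag set, the placeholder column exists with a blank value; without it, the column is dropped entirely.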

XSL Transformations

Sometimes you may need an XSLT (Extensible Stylesheet Language Transformations) stylesheet to restructure the product data from your XML file into the form you want.

Get in touch with support to enquire about having an XSLT created for you.

For data sources imported via the Feed URL or local upload data source, you can also create an XSLT and add it in yourself:

  1. Navigate to your site

  2. Navigate to data sources

  3. Click on the settings wheel of the data source you want to add an XSLT to

  4. Click on the advanced settings tab

  5. Click on I/O Settings

  6. Click add on the transform XML with XSLT option under the available I/O settings section

  7. Add in your XSLT under the XSL Template field

  8. Click Save