Extract product information from the Data Crawler

Extract product information in HTML from the Data Crawler in Productsup.

Once you have set up the Data Crawler, you will most likely want to extract information from the HTML code you crawled.

You can read the following article to learn how to set up the Data Crawler.

Once you have mapped the ___service_datacrawler_data column containing the source code to another column, you can proceed to extract the relevant data from it.

Tip

Extracting from the Data Crawler is a technically advanced topic, so you may wish to ask a developer or Productsup support to help you out.

Extract data from HTML code

To extract data from HTML code, you first need to find the relevant data on your website.

  • Open up an example product page

  • Highlight the element you want to extract (e.g. a title)

  • Right-click and select Inspect (or Inspect element, depending on your browser)

In the elements view of your browser's developer tools, you will see the source code for the element you highlighted. This will help you find the matching code in the HTML code extracted by the crawler.

For example, the Data integration & standardization title on our website has the source code:

<div class="jss7 jss923">Data integration &amp; standardization</div>
data_int_and_stand_inspect.png
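
If you want to check locally what text such a snippet carries, a minimal Python sketch like the following can help. It is not part of Productsup and only illustrates one point: the source code stores the ampersand as the entity &amp;, so the crawled code does not match the visible text character for character. The variable names are made up for this illustration.

```python
import re
from html import unescape

# Snippet copied from the browser's inspector (taken from the example above).
snippet = '<div class="jss7 jss923">Data integration &amp; standardization</div>'

# Capture the text between the opening and closing div tags.
match = re.search(r'<div class="jss7 jss923">(.*?)</div>', snippet, re.DOTALL)

# unescape() turns HTML entities such as &amp; back into plain characters.
title = unescape(match.group(1)) if match else ""
print(title)  # Data integration & standardization
```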

Once you have found the correct source code, there are several ways to extract elements.

HTML Get element by ID

If you have an element defined by an id, then you can extract it using the HTML Get element by ID box.

  1. Open up an example product page

  2. Inspect the desired element

  3. Make a note of the id of the element (in the example below, it is content)

    id_element.png
  4. Add the HTML Get element by ID box to the connection between your crawler data and output column

    • for more information about adding boxes, click the (link)

  5. Enter the id you found into this box

  6. Click save

    element_id_box.png

Tip

After extracting the element using the above method, you may want to narrow down the data further. You can use the methods described under Extract text elements to do so.
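
To get a feel for what selecting an element by its id means, here is a minimal Python sketch using only the standard library. It is an illustration under assumptions, not the implementation of the HTML Get element by ID box: the class TextById, the helper text_by_id, and the sample page are hypothetical, and the sketch returns the text inside the element, which may differ from what the box returns.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collects the text inside the first element whose id matches."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.depth = 0      # greater than 0 while inside the wanted element
        self.done = False   # True once the wanted element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.wanted_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def text_by_id(html, wanted_id):
    """Return the whitespace-normalized text of the element with this id."""
    parser = TextById(wanted_id)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


# Hypothetical page source, as it might appear in the crawler data column.
page = (
    '<html><body><div id="header">Shop</div>'
    '<div id="content"><h1>Blue shirt</h1>'
    '<p>100% cotton<br>Machine washable</p></div></body></html>'
)
print(text_by_id(page, "content"))  # Blue shirt 100% cotton Machine washable
```

The result still contains several pieces of information in one string, which is why a further text extraction step is often needed afterwards.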

HTML Get element by Tag name

If you have an element defined by a name, then you can extract it using the HTML Get element by Name box.

  1. Open up an example product page

  2. Inspect the desired element

  3. Make a note of the name of the element (in the example below, it is generator)

    name_element.png
  4. Add the HTML Get element by Name box to the connection between your crawler data and output column

  5. Enter the name you found into this box

  6. You can specify which occurrence of the name you want to grab, in case it occurs more than once

  7. Click save

    element_name_box.png
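
The idea of picking one occurrence out of several matches can be sketched in Python as follows. This is an illustration only, not the box's implementation: the class ElementsByName, the helper attributes_by_name, and the sample page are hypothetical, the sketch matches on the name attribute, and it assumes occurrences are counted from 1, which the box may handle differently.

```python
from html.parser import HTMLParser


class ElementsByName(HTMLParser):
    """Records the attributes of every element with a matching name attribute."""

    def __init__(self, wanted_name):
        super().__init__()
        self.wanted_name = wanted_name
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("name") == self.wanted_name:
            self.matches.append(attributes)


def attributes_by_name(html, wanted_name, occurrence=1):
    """Return the attributes of the n-th match (counted from 1), or None."""
    parser = ElementsByName(wanted_name)
    parser.feed(html)
    parser.close()
    if 1 <= occurrence <= len(parser.matches):
        return parser.matches[occurrence - 1]
    return None


# Hypothetical page head in which the name "generator" occurs twice.
page = (
    '<head>'
    '<meta name="generator" content="ExampleCMS 1.0">'
    '<meta name="description" content="A blue shirt">'
    '<meta name="generator" content="ExamplePlugin 2.3">'
    '</head>'
)
first = attributes_by_name(page, "generator")
second = attributes_by_name(page, "generator", occurrence=2)
print(first["content"])   # ExampleCMS 1.0
print(second["content"])  # ExamplePlugin 2.3
```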

HTML Get element by Xpath

If you want to extract an element via its Xpath, you can do so using the HTML Get element by Xpath box.

  1. Open up an example product page

  2. Inspect the desired element

  3. Find the Xpath of the element

    • highlight the element and find the source code using the inspect tool

    • right-click and click copy XPath

    xpath_element.png
  4. Add the HTML Get element by Xpath box to the connection between your crawler data and output column

  5. Enter the Xpath you found into this box

  6. You can specify which occurrence of the match you want to grab, in case the Xpath matches more than once

  7. Click save

    element_xpath_box.png
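
Before entering an Xpath into the box, it can be useful to test it against a small sample. The following Python sketch does this with the standard library's xml.etree.ElementTree. It comes with assumptions: ElementTree supports only a subset of XPath, needs well-formed markup, and expects a relative path, so an expression copied from the browser such as //*[@id="content"]/h1 needs a leading dot here. The helper text_by_xpath and the sample page are hypothetical, and counting occurrences from 1 is an assumption about how the box behaves.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page source with two elements of the same class.
page = (
    "<html><body>"
    "<div id='content'>"
    "<h1>Blue shirt</h1>"
    "<span class='price'>19.99</span>"
    "<span class='price'>14.99</span>"
    "</div>"
    "</body></html>"
)


def text_by_xpath(markup, path, occurrence=1):
    """Return the text of the n-th element (counted from 1) matching path."""
    root = ET.fromstring(markup)
    matches = root.findall(path)  # matches come back in document order
    if 1 <= occurrence <= len(matches):
        return "".join(matches[occurrence - 1].itertext()).strip()
    return None


print(text_by_xpath(page, ".//*[@id='content']/h1"))                 # Blue shirt
print(text_by_xpath(page, ".//span[@class='price']", occurrence=2))  # 14.99
```

Real product pages are often not well-formed XML, so a local test on a full page would typically need a more tolerant HTML parser than the one used in this sketch.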

Extract text elements

There are many other ways to extract elements, should the boxes above not fulfill your needs.

For example, you could use the following boxes: