Boxes - Crawler

1. General instructions

The Data Crawler and the Crawler boxes are intended for advanced users only. Feel free to contact our support team if you have questions or need assistance.

Before working with the Crawler boxes, you need to manually look up the relevant elements in your webpage's source code and copy their identifiers. The following example shows what has to be done.

Example: We want to use the element "Productsup in numbers" from our webpage.

  • Highlight it and then right-click on it.

  • Click "Inspect element" (in a German-language browser: "Element untersuchen").

63bebfd0d211c6db0f2b5937b4d7359e.png

The following information is shown on your screen.

b6fa9b7eb92d0802509912f1a8c4dd7f.png

The line highlighted in blue contains the source code of the area of the webpage you selected.

It also works the other way around: if you hover over a line in the source code with your mouse, the corresponding element is highlighted in blue on the live webpage:

b126d9b36f106eb8061c05ca50747bbc.png

2. HTML Get Element by ID

If you want to use the HTML getElementById box, search the source code for the line that contains "id=...".

The "id" attribute appears many times in the source code. Choose the one that belongs to the element on your webpage you want to extract (identified as explained above).

be3e66737b36e6e5acd3973bf5508b0f.png

Once you have found it, copy the value inside the quotes after "id=".

In this example it is "content".

0aeb8f794bc5af3db81a560336b206e2.png

Create a column to hold the output. In this example, the column is named "Output".

In Data Edit, add the HTML getElementById box to that column and paste the copied value into the box.

0ee46f72bb8977a89d7c92e1a593934a.png

In Data View the result is shown as follows.

59a5908ead9dfa943fc6af3d0a1ba317.png
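Conceptually, the box takes the page's HTML and returns the markup of the element whose id matches the value you pasted, which is why the result still contains HTML tags. The Python sketch below illustrates that idea with the standard library's html.parser. It is not Productsup's implementation: the helper name get_element_by_id and the sample page are hypothetical, and the sketch assumes every non-void element is explicitly closed, so pages that rely on optional end tags (such as an unclosed <p>) would need a more forgiving parser.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not change the nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class ElementByIdExtractor(HTMLParser):
    """Collects the raw markup of the first element whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # > 0 while we are inside the matched element
        self.done = False    # set once the matched element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0 and dict(attrs).get("id") != self.target_id:
            return  # still outside the element we are looking for
        self.parts.append(self.get_starttag_text())
        if tag in VOID_ELEMENTS:
            self.done = self.depth == 0  # a matched void element is complete at once
        else:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> open and close in one step.
        if self.done:
            return
        if self.depth > 0:
            self.parts.append(self.get_starttag_text())
        elif dict(attrs).get("id") == self.target_id:
            self.parts.append(self.get_starttag_text())
            self.done = True

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_ELEMENTS:
            self.parts.append(f"</{tag}>")
            self.depth -= 1
            self.done = self.depth == 0

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def get_element_by_id(html, target_id):
    """Hypothetical helper: return the element's markup, or None if no id matches."""
    parser = ElementByIdExtractor(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) or None


page = """<html><body>
  <div id="header">Menu</div>
  <div id="content">
    <h2>Productsup in numbers</h2>
    <p>Lorem ipsum<br/>dolor</p>
  </div>
</body></html>"""

print(get_element_by_id(page, "content"))
```

The output keeps the surrounding whitespace and nested tags, which is the kind of raw result the following boxes are used to clean up.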

To reduce the returned data to the specific information you need, use additional boxes. The following example shows how to work with them.

Example: We want "Productsup in numbers" to appear as the value in that column.

To achieve this, the extra whitespace, the HTML tags, and some words need to be removed. Use the Remove Consecutive Whitespace box to remove consecutive whitespace, and the Remove HTML Tags box to remove the HTML tags.
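As a rough illustration of what these two boxes do, the sketch below applies both steps with regular expressions to a made-up snippet. The function names are hypothetical, and the exact rules the boxes follow (for example, whether leading and trailing spaces are trimmed) are assumptions. Removing the tags first and collapsing whitespace second leaves no stray spaces at the ends.

```python
import re


def remove_html_tags(text):
    # Delete every "<...>" sequence. This simple pattern does not understand
    # comments, scripts, or ">" characters inside attribute values.
    return re.sub(r"<[^>]+>", "", text)


def remove_consecutive_whitespace(text):
    # Collapse runs of spaces, tabs, and newlines into one space, then trim the ends.
    return re.sub(r"\s+", " ", text).strip()


raw = '<div id="content">\n    <h2>Productsup in numbers</h2>\n    <p>Lorem ipsum dolor</p>\n  </div>'
cleaned = remove_consecutive_whitespace(remove_html_tags(raw))
print(cleaned)  # Productsup in numbers Lorem ipsum dolor
```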

To extract a specific piece of text, use the Split String box. In this example, configure the box as follows to get only the text "Productsup in numbers" as the result in the column:

1cc1e84ae433102f961500cc2528ad98.png

Final Output:

74f0b020378f20a4b366b8e908150877.png
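The Split String step can be pictured as "split the text at a delimiter and keep one of the parts". The sketch below is a hypothetical stand-in: the function name, its parameters, and the 0-based index are assumptions rather than the box's documented settings, and the text after the heading is placeholder text.

```python
def split_string(value, delimiter, index):
    """Hypothetical stand-in for the Split String box.

    Splits value at every occurrence of delimiter and returns the part at
    position index (0-based here, which is an assumption), or "" if there is none.
    """
    parts = value.split(delimiter)
    if 0 <= index < len(parts):
        return parts[index].strip()
    return ""


# Text as it might look after the tag and whitespace clean-up.
cleaned = "Productsup in numbers Lorem ipsum dolor"
print(split_string(cleaned, " Lorem", 0))  # Productsup in numbers
```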

This example may seem confusing at first, because the same output could also have been achieved with the Static Value box. The approach pays off in more complex scenarios, however. For example, your webpage may show the price per kg while your feed does not, and you want to add it to the feed. The element has the same structure for every product, but each cell contains a different value, which you may have to extract from surrounding text.
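For the price-per-kg scenario, the extraction step amounts to finding the same pattern in every cell and keeping only the part that varies. The sketch below assumes, purely for illustration, that each product page shows the unit price as "Price per kg: <number> ..." once tags and whitespace have been removed; the pattern, the function name, and the sample cells are hypothetical.

```python
import re

# Hypothetical pattern: the label text is an assumption about how the page phrases it.
PRICE_PER_KG = re.compile(r"Price per kg:\s*(\d+(?:[.,]\d+)?)")


def extract_price_per_kg(cell):
    # Return only the number that follows the label, or "" if the label is absent.
    match = PRICE_PER_KG.search(cell)
    return match.group(1) if match else ""


cells = [
    "Organic oats 500 g Price per kg: 4.99 EUR (incl. VAT)",
    "Coffee beans 1 kg Price per kg: 12.50 EUR (incl. VAT)",
    "Gift card",
]
print([extract_price_per_kg(cell) for cell in cells])  # ['4.99', '12.50', '']
```

Because every cell shares the same structure, one pattern handles all products, while a static value could not.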

3. HTML Get Element by Tag Name

When using the HTML getElementByTagName box, first search the source code for the line that contains "name=...", as described above. The "name" attribute appears several times in the source code, so choose the one that belongs to the element on your webpage you want to extract (identified as explained above).

Once you have found it, copy the value inside the quotes after "name=".

d81b002857a9b32df469ed0d702166d8.png
  • Add the HTML getElementByTagName box to the column that should hold the data.

  • Paste the copied value into the box in Data Edit. In this example, the value is "generator".

This box works similarly to the HTML getElementById box, except that it looks up the element by the value after "name=" instead of by its ID.

c29499f42d1d3c50cd5799b0b08e9620.png
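The steps above look up the element by the value after "name=". The sketch below assumes that is how the box matches and simply records every tag whose name attribute equals the copied value, together with its attributes; for a meta tag such as name="generator", the useful value is usually in the content attribute. The class and function names and the sample page are hypothetical, and this is not the box's actual implementation.

```python
from html.parser import HTMLParser


class NameLookup(HTMLParser):
    """Records the attributes of every start tag whose name attribute matches."""

    def __init__(self, target_name):
        super().__init__()
        self.target_name = target_name
        self.matches = []  # list of (tag, attribute dict) pairs, in page order

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("name") == self.target_name:
            self.matches.append((tag, attributes))


def find_by_name(html, target_name):
    # Hypothetical helper: parse the page and return all matching tags.
    parser = NameLookup(target_name)
    parser.feed(html)
    parser.close()
    return parser.matches


page = """<html><head>
<meta charset="utf-8">
<meta name="generator" content="Example CMS 1.0">
<meta name="description" content="Demo page">
</head><body></body></html>"""

for tag, attributes in find_by_name(page, "generator"):
    print(tag, attributes.get("content"))  # meta Example CMS 1.0
```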

4. HTML Get Element by XPath

If you want to use the HTML getElementByXPath box, look for the line in the source code that contains the chosen element. It should already be highlighted in blue, because you selected it on the webpage before. Right-click the line and select "Copy XPath":

55cd8e198bc83d7cb866167ddb18495f.png
  • Create a column to hold the output. In this example, the column is named "Output".

  • In Data Edit, add the HTML getElementByXPath box to that column.

  • Paste the copied XPath into the box.

c0ff289c37c623930512e753f981c467.png

After using the box:

74f0b020378f20a4b366b8e908150877.png
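An XPath copied from the browser often looks like //*[@id="content"]/div[2]/h2. The sketch below evaluates such a path with Python's xml.etree.ElementTree, which supports only a subset of XPath (tag names, *, //, attribute predicates, and positions) and requires well-formed markup, so real-world pages usually need an HTML-aware parser instead. The helper name get_element_by_xpath and the sample page are hypothetical, and returning the element's text rather than its markup is an assumption made for this sketch.

```python
import xml.etree.ElementTree as ET


def get_element_by_xpath(markup, xpath):
    """Hypothetical helper: return the text of the first element matching xpath."""
    root = ET.fromstring(markup)
    # Browser-copied paths start with "//", which ElementTree only accepts
    # relative to an element, so prefix "." to search below the root.
    # Absolute paths such as "/html/body/..." are not handled by this sketch.
    if xpath.startswith("//"):
        xpath = "." + xpath
    element = root.find(xpath)
    if element is None:
        return ""
    # Join all nested text so inline tags do not cut the value apart.
    return "".join(element.itertext()).strip()


page = """<html><body>
  <div id="content">
    <div><p>Intro</p></div>
    <div><h2>Productsup in numbers</h2></div>
  </div>
</body></html>"""

print(get_element_by_xpath(page, '//*[@id="content"]/div[2]/h2'))  # Productsup in numbers
```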