Detect image metadata with the Image Properties Crawler

Detect image metadata with the Image Properties Crawler in Productsup.

Note

In order to use the Image Crawler, you must:

You should also:

  • Inform your website admin about the crawler

  • Align with your website admin to make sure the configuration is in line with the capability of the website

  • Be aware that the Image Properties Crawler could slow down the performance of your site

  • Be aware that once the Image Properties Crawler has started to run, clicking the cancel job button will not stop it

The Image Properties Crawler detects metadata of images, returning information such as the HTTP status code of the link (which shows whether the image is reachable), the image type, and the image’s size.

A particularly popular use case is to check the availability of your image links using the HTTP status code returned for each link. From there, you can skip any unreachable image links.
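The availability check described above can be illustrated with a minimal Python sketch. It is not how the platform implements the check. The column name `http_code`, the `image_link` field, and the rule that only 2xx codes count as reachable are assumptions made for this example.

```python
from typing import Iterable, Iterator, Mapping


def is_reachable(http_code: int) -> bool:
    """Treat any 2xx status code as reachable (an assumption of this sketch)."""
    return 200 <= http_code < 300


def skip_unreachable(
    rows: Iterable[Mapping[str, str]],
    code_column: str = "http_code",  # hypothetical column name
) -> Iterator[Mapping[str, str]]:
    """Yield only the rows whose crawled status code indicates a reachable image."""
    for row in rows:
        raw = row.get(code_column, "").strip()
        # Missing or non-numeric codes are treated as unreachable.
        if raw.isdigit() and is_reachable(int(raw)):
            yield row


# Made-up example rows; the URLs and codes are illustrative only.
products = [
    {"id": "1", "image_link": "https://example.com/a.jpg", "http_code": "200"},
    {"id": "2", "image_link": "https://example.com/b.jpg", "http_code": "404"},
    {"id": "3", "image_link": "https://example.com/c.jpg", "http_code": ""},
]
kept = [row["id"] for row in skip_unreachable(products)]
```

In this example only the product with status code 200 is kept. Whether redirects (3xx) should also count as reachable depends on your setup.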

Adding the Image Properties Crawler

In order to add the Image Properties Crawler, you should:

  1. Navigate to your site

  2. Navigate to Data Services on the left-hand tab

  3. Click add service

  4. Add the Image Properties Crawler service

  5. Give the service a name (if desired)

  6. Define a custom column prefix (if desired)

  7. Select whether to use the service on the import or intermediate level

    • the level refers to where your image links are found

      • if these are in your import file, you can select import

      • if you first need to optimize/create your links before running the crawler, select intermediate

  8. Click add

  9. Select the column where your image links are, under the Image URL column

  10. You can modify the user agent if desired

    • this is the name the crawler uses to identify itself when it accesses your image links

    • you may wish to whitelist this user agent for access to your website

  11. Under concurrent crawlers, select the number of crawlers that access your links at the same time

    • a lower number generally means less effect on your website’s performance

    • a higher number means that the crawling is generally completed more quickly

  12. Under request timeout (seconds), you can set the response waiting time for the crawler

  13. Set the interval at which a product should be recrawled under expires after (days)

    • the crawler will always crawl new or changed links

    • it will only recrawl links for new information at the interval you set

    • you can set this interval to be -1 if you want to recrawl products on every run

  14. Select trigger during a refresh of the Data View if you want the crawler to run in this case

  15. Click save
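The recrawl rules from the expires after (days) step can be sketched in Python as follows. This is only an illustration of the behavior described above, not the platform's implementation. The function name, its parameters, and the choice to recrawl exactly when the interval has fully elapsed are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_crawl(
    url: str,
    last_url: Optional[str],
    last_crawled_at: Optional[datetime],
    now: datetime,
    expires_after_days: int,
) -> bool:
    """Decide whether a link is crawled on this run (hypothetical helper)."""
    # New link: it has never been crawled before.
    if last_url is None or last_crawled_at is None:
        return True
    # Changed link: always crawled again.
    if url != last_url:
        return True
    # An interval of -1 means the link is recrawled on every run.
    if expires_after_days == -1:
        return True
    # Otherwise recrawl only once the interval has elapsed
    # (the >= boundary is an assumption of this sketch).
    return now - last_crawled_at >= timedelta(days=expires_after_days)


now = datetime(2024, 1, 10)
three_days_ago = now - timedelta(days=3)
# Unchanged link crawled 3 days ago with a 7-day interval: not recrawled.
recrawl_unchanged = should_crawl("https://example.com/a.jpg",
                                 "https://example.com/a.jpg",
                                 three_days_ago, now, 7)
# Same link with the interval set to -1: recrawled on every run.
recrawl_always = should_crawl("https://example.com/a.jpg",
                              "https://example.com/a.jpg",
                              three_days_ago, now, -1)
```

A longer interval reduces the load on your website, while -1 keeps the metadata as fresh as possible at the cost of crawling every link on every run.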

Once you’ve successfully crawled your image links by triggering a run of your site, you will receive new columns in the Platform. These columns will start with three underscores, followed by the column prefix you set:

  • service_imagecrawler_image_url: the image URL on the Productsup CDN

  • service_imagecrawler_http_code: HTTP status code response

  • service_imagecrawler_width: original image width

  • service_imagecrawler_height: original image height

  • service_imagecrawler_mime: original image mime type

  • service_imagecrawler_content_type: HTTP response content-type

  • service_imagecrawler_size_download: original image size in bytes

  • service_imagecrawler_total_time: total time fetching the image

  • service_imagecrawler_md5_url: MD5 checksum of the original image URL

  • service_imagecrawler_md5_image: MD5 checksum of the original image file
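One possible use of these columns is finding identical image files served under different links by comparing the MD5 checksum of the image file. The Python sketch below works on exported rows and is only an illustration. The `image_link` column is a hypothetical name for your source column, the checksum values are placeholders, and the actual crawler column names in your site depend on the prefix you set.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def find_duplicate_images(
    rows: Iterable[Mapping[str, str]],
    url_column: str = "image_link",  # hypothetical source column
    checksum_column: str = "service_imagecrawler_md5_image",
) -> Dict[str, List[str]]:
    """Map each image checksum to its links, keeping checksums seen under 2+ distinct links."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        checksum = row.get(checksum_column, "")
        link = row.get(url_column, "")
        # Rows without a checksum (for example, unreachable images) are ignored.
        if checksum and link and link not in groups[checksum]:
            groups[checksum].append(link)
    return {c: links for c, links in groups.items() if len(links) > 1}


# Placeholder checksums, not real MD5 values.
rows = [
    {"image_link": "https://example.com/a.jpg", "service_imagecrawler_md5_image": "abc"},
    {"image_link": "https://example.com/a-copy.jpg", "service_imagecrawler_md5_image": "abc"},
    {"image_link": "https://example.com/b.jpg", "service_imagecrawler_md5_image": "def"},
    {"image_link": "https://example.com/c.jpg", "service_imagecrawler_md5_image": ""},
]
duplicates = find_duplicate_images(rows)
```

Here the two links that share the checksum "abc" are reported as duplicates, while the other rows are left out.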

Edit an existing Image Properties Crawler

In order to edit settings for your Image Properties Crawler, you should:

  1. Navigate to your site

  2. Navigate to Data Services on the left-hand tab

  3. Click on the settings wheel

Delete an existing Image Properties Crawler

In order to delete your Image Properties Crawler, you should:

  1. Navigate to your site

  2. Navigate to Data Services on the left-hand tab

  3. Click on the settings wheel

  4. Scroll to the bottom of the page and click remove this service