Detect image metadata with the Image Properties Crawler
Detect image metadata with the Image Properties Crawler in Productsup.
Note
To use the Image Properties Crawler, you must:
Define a product identifier
Be the owner of the domain you are crawling
You should also:
Inform your website admin about the crawler
Agree with your website admin on a configuration that your website can handle
Be aware that the Image Properties Crawler may slow down your site's performance
Be aware that once the Image Properties Crawler has started to run, clicking the cancel job button will not stop it
The Image Properties Crawler detects image metadata, returning information such as the HTTP status code of the link (which shows whether the image is reachable), the image type, and the image's size.
A popular use case is checking whether your image links are available, based on the HTTP status code each link returns. You can then skip any unreachable image links.
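As a rough illustration of the skip step, the sketch below filters a list of products down to those whose image link returned a successful status code. It is not how the platform implements this; it only shows the logic. The row shape (hypothetical keys `image_url` and `http_code`) is an assumption, and it treats only 2xx codes as reachable, which you may want to adjust (for example, if you also want to accept redirects).

```python
from typing import Iterable, List, Optional, TypedDict


class ProductRow(TypedDict):
    # Hypothetical row shape: an image link plus the status code recorded for it.
    image_url: str
    http_code: Optional[int]


def is_reachable(http_code: Optional[int]) -> bool:
    """Treat only 2xx responses as reachable (an assumption; adjust as needed)."""
    return http_code is not None and 200 <= http_code < 300


def skip_unreachable(rows: Iterable[ProductRow]) -> List[ProductRow]:
    """Keep only rows whose image link answered with a 2xx status code."""
    return [row for row in rows if is_reachable(row["http_code"])]


products: List[ProductRow] = [
    {"image_url": "https://example.com/a.jpg", "http_code": 200},
    {"image_url": "https://example.com/b.jpg", "http_code": 404},
    {"image_url": "https://example.com/c.jpg", "http_code": None},  # no response recorded
]

reachable = skip_unreachable(products)
print([row["image_url"] for row in reachable])  # ['https://example.com/a.jpg']
```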
Add the Image Properties Crawler
To add the Image Properties Crawler, you should:
Navigate to your site
Navigate to Data Services on the left-hand tab
Click add service
Add the Image Properties Crawler service
Give the service a name (if desired)
Define a custom column prefix (if desired)
Select whether to use the service on the import or intermediate level
the level refers to where your image links are found
if these are in your import file, you can select import
if you need to create or optimize your links before running the crawler, select intermediate
Click add
Under Image URL column, select the column that contains your image links
You can modify the user agent if desired
this is the name the crawler uses to identify itself when it accesses your image links
you may want to whitelist this user agent so that it can access your website
Under concurrent crawlers, select how many crawlers can access your links at the same time
a lower number generally has less effect on your website's performance
a higher number generally completes the crawl more quickly
Under request timeout (seconds), set how long the crawler waits for a response
Under expires after (days), set the interval after which a product's image link should be recrawled
the crawler will always crawl new or changed links
unchanged links are only recrawled for new information at the interval you set
you can set this interval to -1 if you want to recrawl products on every run
Select trigger during a refresh of the Data View if you also want the crawler to run when the Data View is refreshed
Click save
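The recrawl rules above can be modeled roughly as follows. This is a hypothetical sketch of the documented behavior, not Productsup's implementation: the function name `should_crawl` and its parameters are made up for illustration, and a "changed" link is assumed to mean that the image URL differs from the one seen on the previous run.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_crawl(
    current_url: str,
    previous_url: Optional[str],
    last_crawled: Optional[datetime],
    now: datetime,
    expires_after_days: int,
) -> bool:
    """Decide whether an image link should be (re)crawled on this run.

    Hypothetical model of the documented behavior:
    - new links (never crawled) are always crawled;
    - changed links (URL differs from the last run) are always crawled;
    - expires_after_days == -1 means recrawl on every run;
    - otherwise, unchanged links are recrawled once the interval has passed.
    """
    if previous_url is None or last_crawled is None:
        return True  # new link
    if current_url != previous_url:
        return True  # changed link
    if expires_after_days == -1:
        return True  # recrawl on every run
    return now - last_crawled >= timedelta(days=expires_after_days)


now = datetime(2024, 1, 10)
url = "https://example.com/a.jpg"
print(should_crawl(url, url, datetime(2024, 1, 8), now, expires_after_days=7))   # False: crawled 2 days ago
print(should_crawl(url, url, datetime(2024, 1, 8), now, expires_after_days=-1))  # True: recrawl every run
```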
Once you have triggered a run of your site and the crawler has processed your image links, new columns appear in the platform. These columns start with three underscores, followed by the column prefix you set:
service_imagecrawler_image_url: the image URL on the Productsup CDN
service_imagecrawler_http_code: HTTP response status code
service_imagecrawler_width: original image width
service_imagecrawler_height: original image height
service_imagecrawler_mime: original image MIME type
service_imagecrawler_content_type: Content-Type header of the HTTP response
service_imagecrawler_size_download: original image size in bytes
service_imagecrawler_total_time: total time taken to fetch the image
service_imagecrawler_md5_url: MD5 checksum of the original image URL
service_imagecrawler_md5_image: MD5 checksum of the original image file
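To make some of these columns more concrete, the sketch below derives comparable values from an image URL and the downloaded bytes. It is only an illustration under stated assumptions, not the crawler's actual algorithm: the function names are hypothetical, the URL is assumed to be hashed as its UTF-8 string, MIME detection uses magic bytes for PNG, GIF, and JPEG only, and width and height are read from PNG and GIF headers only.

```python
import hashlib
import struct
from typing import Dict, Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def sniff_mime(data: bytes) -> Optional[str]:
    """Guess the MIME type from the file's magic bytes (PNG, GIF, and JPEG only)."""
    if data.startswith(PNG_SIGNATURE):
        return "image/png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    return None


def image_dimensions(data: bytes) -> Optional[Tuple[int, int]]:
    """Read width and height from the header of a PNG or GIF file."""
    if data.startswith(PNG_SIGNATURE) and len(data) >= 24 and data[12:16] == b"IHDR":
        # PNG: IHDR is the first chunk; width and height are big-endian uint32.
        width, height = struct.unpack(">II", data[16:24])
        return width, height
    if data[:6] in (b"GIF87a", b"GIF89a") and len(data) >= 10:
        # GIF: logical screen width and height are little-endian uint16.
        width, height = struct.unpack("<HH", data[6:10])
        return width, height
    return None  # other formats (e.g. JPEG) need a proper image library


def describe_image(url: str, data: bytes) -> Dict[str, object]:
    """Build hypothetical equivalents of some crawler columns."""
    dims = image_dimensions(data)
    return {
        "md5_url": hashlib.md5(url.encode("utf-8")).hexdigest(),
        "md5_image": hashlib.md5(data).hexdigest(),
        "size_download": len(data),
        "mime": sniff_mime(data),
        "width": dims[0] if dims else None,
        "height": dims[1] if dims else None,
    }


# Only the first 29 bytes of a PNG file: enough to read its header, not a full image.
header = (
    PNG_SIGNATURE
    + struct.pack(">I", 13)
    + b"IHDR"
    + struct.pack(">II", 640, 480)
    + b"\x08\x02\x00\x00\x00"
)
info = describe_image("https://example.com/a.png", header)
print(info["mime"], info["width"], info["height"], info["size_download"])  # image/png 640 480 29
```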

Edit an existing Image Properties Crawler
To edit the settings of your Image Properties Crawler, you should:
Navigate to your site
Navigate to Data Services on the left-hand tab
Click the settings wheel of the Image Properties Crawler service

Delete an existing Image Properties Crawler
To delete your Image Properties Crawler, you should:
Navigate to your site
Navigate to Data Services on the left-hand tab
Click the settings wheel of the Image Properties Crawler service
Scroll to the bottom of the page and click remove this service