ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products. It identifies itself with the following user agent string:
Mozilla/5.0 (compatible; ImagesiftBot; +imagesift.com)
If you have any questions about ImageSiftBot or would like to opt out of being crawled, please email us at support@imagesift.com.
Does ImageSiftBot follow robots.txt rules?
Standard directives in robots.txt that target ImagesiftBot are respected. For example, the following will allow ImagesiftBot to crawl all pages, except those under /private/:
User-Agent: ImagesiftBot
Allow: /
Disallow: /private/
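The way these two rules combine can be sketched in a few lines of Python. This is only an illustration, not ImagesiftBot's actual code: it assumes the longest-match precedence described in RFC 9309 (the most specific matching prefix wins, and Allow wins a tie), and the function name is_allowed is made up for this example. Wildcards are not handled.

```python
def is_allowed(rules, path):
    """Decide whether `path` may be crawled.

    `rules` is a list of (directive, prefix) pairs such as ("allow", "/").
    Assumption: longest matching prefix wins, as in RFC 9309.
    """
    best_len = -1
    allowed = True  # a path that matches no rule may be crawled
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            longer = len(prefix) > best_len
            tie_for_allow = len(prefix) == best_len and directive == "allow"
            if longer or tie_for_allow:
                best_len = len(prefix)
                allowed = directive == "allow"
    return allowed


# The rules from the example above.
rules = [("allow", "/"), ("disallow", "/private/")]
print(is_allowed(rules, "/gallery/cat.jpg"))  # True
print(is_allowed(rules, "/private/cat.jpg"))  # False
```

Note that a parser applying rules in file order (first match wins) would let "Allow: /" shadow the Disallow line, so the precedence assumption matters for this example.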
ImagesiftBot also supports the crawl-delay directive in robots.txt files. It interprets the value as the minimum duration, in seconds, between the start of consecutive requests. For example, assume you have specified the following in your robots.txt file:
User-Agent: ImagesiftBot
Crawl-delay: 5
With this setting, ImagesiftBot will split each day into 5-second intervals and issue at most one request to your domain within each interval.
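With a 5-second delay, a day of 86,400 seconds contains 17,280 intervals, so at most 17,280 requests reach your domain per day. The interval rule can be sketched as follows. The helper names interval_index and may_request are hypothetical, and timestamps are taken as seconds since midnight purely for illustration.

```python
SECONDS_PER_DAY = 86_400


def interval_index(seconds_since_midnight, crawl_delay):
    """Index of the crawl-delay interval that contains the given time."""
    return int(seconds_since_midnight // crawl_delay)


def may_request(now, last_request, crawl_delay):
    """True if no request has been issued yet in the interval containing `now`.

    `last_request` is the time of the previous request to the same domain,
    or None if there has been none today.
    """
    if last_request is None:
        return True
    return interval_index(now, crawl_delay) > interval_index(last_request, crawl_delay)


print(SECONDS_PER_DAY // 5)         # 17280 intervals per day
print(may_request(12.0, 11.0, 5))   # False: both fall in interval 2 (10s to 15s)
print(may_request(15.0, 11.0, 5))   # True: 15.0 starts interval 3
```

This sketch implements only the interval rule described above; it does not model any additional spacing between requests that the crawler may enforce.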
If there is no rule targeting ImagesiftBot, but there is a rule targeting Googlebot, then ImagesiftBot will follow the Googlebot directives. For example, with the following robots.txt, ImagesiftBot will fetch all pages except those under /private/:
User-Agent: *
Disallow: /
User-Agent: Googlebot
Allow: /
Disallow: /private/
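The fallback order can be illustrated with a small parser. This is a sketch under stated assumptions, not the crawler's real implementation: the function names parse_groups and select_group are invented here, user agent names are compared case-insensitively, and the final fallback to the "*" group is an assumption based on common robots.txt practice rather than something stated above.

```python
ROBOTS_TXT = """\
User-Agent: *
Disallow: /
User-Agent: Googlebot
Allow: /
Disallow: /private/
"""


def parse_groups(robots_txt):
    """Map each lowercased user agent to its list of (directive, path) rules."""
    groups = {}
    agents = []       # user agents of the group currently being read
    in_rules = False  # True once the current group has at least one rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-Agent line after rules starts a new group
                agents = []
                in_rules = False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        else:
            in_rules = True
            if field in ("allow", "disallow"):
                for agent in agents:
                    groups[agent].append((field, value))
    return groups


def select_group(groups):
    """Pick the group to obey: ImagesiftBot, then Googlebot, then '*' (assumed)."""
    for agent in ("imagesiftbot", "googlebot", "*"):
        if agent in groups:
            return agent, groups[agent]
    return None, []


agent, rules = select_group(parse_groups(ROBOTS_TXT))
print(agent)  # googlebot
print(rules)  # [('allow', '/'), ('disallow', '/private/')]
```

Because the file has no ImagesiftBot group, the Googlebot rules are selected and the blanket "Disallow: /" under "*" is never consulted.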
What information does ImageSiftBot save?
Along with images, ImageSiftBot saves the following information:
URL of the host page and the text on that page
Alt text associated with each image
How do we use this information?
Once images and text are downloaded from a webpage, ImageSift analyzes the data and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images.