Crawling in login areas, checkout processes and other restricted pages

By default, the crawler only visits URLs on which the CMP-Code is present. This can cause problems when the crawler cannot reach the same content that a regular user would see (e.g. because the user logged into a login area, added products to their shopping cart, or performed other steps that change the output of your website).

Using Basic Auth

Basic Auth is the most common authentication method for web servers (also known as ".htaccess authentication" or ".htaccess/.htpasswd login"). To allow the crawler to access password-protected pages, you can set up the authentication via CMPs > Edit > Crawler Settings > HTTP-Authentication.
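
For orientation, the sketch below shows what a Basic Auth request header generally looks like, which can help when checking that your server accepts the credentials you configured. It is not taken from the crawler itself: the function names and sample credentials are hypothetical, and the code assumes Node.js (for `Buffer`).

```javascript
// Minimal sketch (assumes Node.js): build and parse an HTTP Basic Auth header.
// Function names and credentials are illustrative, not part of the CMP product.

function buildBasicAuthHeader(user, password) {
  // Basic Auth sends "user:password", Base64-encoded, after the word "Basic".
  const token = Buffer.from(`${user}:${password}`, "utf8").toString("base64");
  return `Basic ${token}`;
}

function parseBasicAuthHeader(headerValue) {
  // Returns { user, password }, or null if the value is not valid Basic Auth.
  const match = /^Basic\s+([A-Za-z0-9+/=]+)$/i.exec(headerValue || "");
  if (!match) return null;
  const decoded = Buffer.from(match[1], "base64").toString("utf8");
  const separator = decoded.indexOf(":");
  if (separator === -1) return null;
  return {
    user: decoded.slice(0, separator),
    // The password may itself contain colons, so split on the first one only.
    password: decoded.slice(separator + 1),
  };
}

const exampleHeader = buildBasicAuthHeader("crawler", "s3cret:pw");
console.log(exampleHeader);
console.log(parseBasicAuthHeader(exampleHeader));
```

Because Base64 is an encoding and not encryption, Basic Auth credentials are only protected in transit when the site is served over HTTPS.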

Using custom cookies

In the CMP settings you can set up cookie authentication (CMPs > Edit > Show Crawler settings > Cookie-Authentication). This mechanism tells the crawler to send specific cookies to the server, even if these cookies are never set anywhere else. When the crawler visits the website, the cookies are present and the website can read them. The website can then respond differently than it would without the cookies (e.g. by granting the crawler access to a restricted area, or by showing content that would otherwise require other steps to be performed before the visit).
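
How the website reacts to such a cookie is up to your own server code. As a rough illustration, the following sketch checks a raw `Cookie` request header for an agreed name and value. The function names are hypothetical and the cookie name and value are sample values only; adapt the idea to whatever framework your site uses.

```javascript
// Minimal sketch: decide whether a request carries the agreed crawler cookie.
// Function names are illustrative; "token" / "bfe926da3fc1" are sample values.

function parseCookieHeader(cookieHeader) {
  // Turns "a=1; b=2" into a Map { "a" => "1", "b" => "2" }.
  const cookies = new Map();
  for (const part of (cookieHeader || "").split(";")) {
    const index = part.indexOf("=");
    if (index === -1) continue; // skip malformed fragments
    const name = part.slice(0, index).trim();
    const value = part.slice(index + 1).trim();
    if (name) cookies.set(name, value);
  }
  return cookies;
}

function isCrawlerRequest(cookieHeader, expectedName, expectedValue) {
  return parseCookieHeader(cookieHeader).get(expectedName) === expectedValue;
}

console.log(isCrawlerRequest("session=abc; token=bfe926da3fc1", "token", "bfe926da3fc1")); // true
console.log(isCrawlerRequest("session=abc", "token", "bfe926da3fc1")); // false
```

Anyone who knows the cookie name and value can send the same cookie, so it is sensible to treat the value like a password and choose one that is hard to guess.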

To set up cookie authentication, enter one item per line, where each item has the form domain:cookiename:cookievalue. Example:

mywebsite.com:mycookie:123
myotherwebsite.com:othercookie:let_me_in
a-third-website.com:authentication:crawler
a-third-website.com:token:bfe926da3fc1
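
Before pasting a longer list into the settings, it can help to check that every line has all three parts. The sketch below is one way to do that locally. It is not the CMP's own parser: the function name is hypothetical, and it assumes that only the first two colons separate the fields (so a cookie value may contain colons), which the source does not confirm.

```javascript
// Minimal sketch: validate lines of the form domain:cookiename:cookievalue.
// Assumption (not confirmed by the source): only the first two colons separate
// the fields, so a cookie value may itself contain colons.

function parseCookieEntries(text) {
  const entries = [];
  const errors = [];
  text.split(/\r?\n/).forEach((rawLine, i) => {
    const line = rawLine.trim();
    if (line === "") return; // this checker ignores blank lines
    const first = line.indexOf(":");
    const second = first === -1 ? -1 : line.indexOf(":", first + 1);
    if (second === -1) {
      errors.push(`line ${i + 1}: expected domain:cookiename:cookievalue`);
      return;
    }
    const domain = line.slice(0, first);
    const name = line.slice(first + 1, second);
    const value = line.slice(second + 1);
    if (!domain || !name) {
      errors.push(`line ${i + 1}: domain and cookie name must not be empty`);
      return;
    }
    entries.push({ domain, name, value });
  });
  return { entries, errors };
}

const sample = [
  "mywebsite.com:mycookie:123",
  "a-third-website.com:token:bfe926da3fc1",
  "broken-line-without-value",
].join("\n");

console.log(parseCookieEntries(sample));
```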

Sending URLs to the crawler

The script below is not necessary if the normal CMP-Code is already installed on your website!

To send URLs to the crawler (e.g. during a testing phase), you can add the following script to your website:

<script>
(new Image()).src = "https://delivery.consentmanager.net/delivery/addurl.php?id=XX&h="+encodeURIComponent(location.href);
</script>

Replace XX with the ID of your CMP. The script automatically collects the URLs of the pages on which it is installed and sends them to the backend for crawling.
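
To see which request the script produces for a given page, the URL construction can be reproduced outside the browser. The sketch below only rebuilds the URL and sends nothing. The function name, the CMP ID "12345" and the page URL are placeholders chosen for illustration.

```javascript
// Minimal sketch: build the same request URL as the script above, without
// sending it. "12345" and the page URL are placeholder values.

function buildAddUrl(cmpId, pageUrl) {
  const base = "https://delivery.consentmanager.net/delivery/addurl.php";
  // encodeURIComponent escapes ":", "/", "?", "&" and "=" so the page URL
  // survives as a single query parameter value.
  return `${base}?id=${encodeURIComponent(cmpId)}&h=${encodeURIComponent(pageUrl)}`;
}

console.log(buildAddUrl("12345", "https://example.com/checkout?step=2"));
```

The encoding step matters for checkout and login pages in particular, since their URLs often carry their own query strings that would otherwise be mixed up with the `id` and `h` parameters.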
