robots.txt

Prutser

New Member
Hi
How will Bing interpret robots.txt
when the request for it returns an HTTP status code 5xx?
And when it returns a 404, or a 4xx in general?
I can't find a document that tells me.

I know that Google will not crawl the site any further when robots.txt returns a 500 status.
And Google will interpret a 4xx as permission to crawl everything.

How does Bing handle this?
Is there any official documentation?
Thanks
 

mneylon

Administrator
Staff member
They document it fully:
and also provide tools for testing it

I've no idea what you mean about HTTP status codes - they've got nothing to do with robots.txt
 

Prutser

New Member
hi again
Let me try to explain again:

When you access a file / URL on the internet, the server returns an HTTP response status code, correct?
See List of HTTP status codes - Wikipedia

When a search bot crawls a URL or site, it first reads the robots.txt file to determine whether crawling is allowed.
If robots.txt doesn't exist (status code 404), Google interprets that as permission to crawl everything on that site.
When the server returns a status 500 (internal server error) for robots.txt, Google stops crawling the site.

How does Bing interpret a 404 and a 5xx status?
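
To make the question concrete, here is a minimal sketch of the decision described above for Google. The function name and the return labels are made up for illustration, and the mapping is a simplification of what this post says about Googlebot. Whether Bing applies the same mapping is exactly the open question.

```python
def robots_policy(status: int) -> str:
    """Map the HTTP status of a robots.txt fetch to a crawl decision.

    Hypothetical helper: it encodes only the behaviour described in this
    thread for Google, not documented Bing behaviour.
    """
    if 200 <= status < 300:
        return "parse-rules"    # file exists: obey its Allow/Disallow lines
    if 400 <= status < 500:
        return "allow-all"      # e.g. 404: treated as no restrictions
    if 500 <= status < 600:
        return "disallow-all"   # e.g. 500: stop crawling the site
    return "unspecified"        # other codes are not discussed here


for code in (200, 404, 500):
    print(code, robots_policy(code))
```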
 

mneylon

Administrator
Staff member
> hi again
> Let me try to explain again:
>
> When you access a file / URL on the internet, the server returns an HTTP response status code, correct?
> See List of HTTP status codes - Wikipedia
>
> When a search bot crawls a URL or site, it first reads the robots.txt file to determine whether crawling is allowed.
> If robots.txt doesn't exist (status code 404), Google interprets that as permission to crawl everything on that site.
> When the server returns a status 500 (internal server error) for robots.txt, Google stops crawling the site.
>
> How does Bing interpret a 404 and a 5xx status?

There's some information on how it handles this here: Bing Webmaster Tools

It's not explicit, but it sounds like it'll handle it the same way that Googlebot does
 

Prutser

New Member
I saw that page before, but it is about pages that are crawled, not about robots.txt.

I also suspect it might be the same, but I'd like to have a document / proof.
I tried to find it in their help pages but did not find any written documentation from Google
 

mneylon

Administrator
Staff member
Well if it's a "compliant" bot then it should follow the protocols.
I'd set up sites in Bing Webmaster Tools and see what happens.
However I'll admit that I've never really spent a lot of time on Bing stuff
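
For what it's worth, a rough sketch of what such a test site could look like, using only Python's standard `http.server`. The names `make_handler` and `serve` and the port are made up for illustration. A real test would also need a publicly reachable host registered with Bing, which this does not cover.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(robots_status: int):
    """Build a handler that answers /robots.txt with a fixed status code."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            agent = self.headers.get("User-Agent", "-")
            if self.path == "/robots.txt":
                status = robots_status
                # Only send rules when simulating a healthy robots.txt.
                ok = 200 <= status < 300
                body = b"User-agent: *\nDisallow:\n" if ok else b""
            else:
                status, body = 200, b"test page\n"
            # Record who asked for what, so crawler visits can be compared.
            print(f"{agent} GET {self.path} -> {status}")
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # silence the default stderr log; do_GET prints its own line

    return Handler


def serve(robots_status: int, port: int = 8080) -> None:
    """Serve until interrupted, e.g. serve(500) for a failing robots.txt."""
    HTTPServer(("", port), make_handler(robots_status)).serve_forever()
```

Calling `serve(500)` on one host and `serve(404)` on another, then watching which pages the bot requests after its robots.txt fetch, would show how it reacts to each status.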
 

Prutser

New Member
Good idea to set up test sites.

> Well if it's a "compliant" bot then it should follow the protocols.
Do you know if those protocols are documented somewhere?
 

Prutser

New Member
I don't think that site holds the information I am looking for.
Thanks anyway. I will try to test it all out and check other resources to see whether written documentation exists somewhere.
 