Irish SEO, Marketing & Webmaster Discussion

 

Google not accepting robots.txt rules

#1
10-09-2008, 02:34 AM
Cormac
Cormac Moylan
Join Date: Jan 2006
Location: Baile Ath Cliath / Corcaigh
Posts: 1,247

One of my sites has had 141 pages indexed in Google in a little over a fortnight. The site uses a shopping cart application which hooks up to Amazon and displays Amazon listings.

The shopping cart is powered by associate-o-matic, which brings down a LOT of content from Amazon. I was concerned about duplicate content, so I set up some mod_rewrite rules and used robots.txt to restrict indexing of all URLs which contain a query string.

I tested this robots.txt file against a number of pages from my site via the Google Webmaster Console. Each and every time, the robots.txt analyzer said that the page is restricted.
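For anyone wanting to run the same kind of check locally, here is a minimal sketch. It assumes a rule of the form `Disallow: /*?` (the real robots.txt is not shown in this thread, so that rule, the sample paths, and the helper names `parse_disallow`, `pattern_to_regex`, and `is_blocked` are all hypothetical). It hand-rolls the `*` and `$` pattern matching rather than reproducing Google's own analyzer, and it only reads the `User-agent: *` group.

```python
import re

# Hypothetical robots.txt: the thread never shows the real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*?
"""


def parse_disallow(robots_txt):
    """Collect Disallow patterns from the 'User-agent: *' group only."""
    patterns = []
    in_star_group = False
    for raw in robots_txt.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            in_star_group = (value == "*")
        elif field == "disallow" and in_star_group and value:
            patterns.append(value)
    return patterns


def pattern_to_regex(pattern):
    """'*' matches any run of characters; a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, patterns):
    """True if any Disallow pattern matches from the start of the path."""
    return any(pattern_to_regex(p).match(path) for p in patterns)


patterns = parse_disallow(ROBOTS_TXT)
for path in ["/index.php?item=123", "/books/"]:
    print(path, "blocked" if is_blocked(path, patterns) else "allowed")
```

Under that assumed rule, any path containing a `?` is reported as blocked and a clean rewritten path is reported as allowed. Allow lines and per-crawler groups are left out to keep the sketch short.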

I permitted the inclusion of 14 entry pages via robots.txt and via a sitemap.xml file. These 14 entry pages are the only ones indexed in Yahoo.com. Yahoo has prevented indexing of the duplicated content (as it should do, well done Yahoo).
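As a rough illustration of the sitemap side, the sketch below builds a small sitemap.xml with Python's standard library. The two URLs are placeholders on example.com (the real site and its 14 entry pages are not disclosed here), and `build_sitemap` is a hypothetical helper name, not part of any shopping cart software.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap.xml document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder entry pages; the real list would hold the 14 entry URLs.
entry_pages = [
    "http://www.example.com/",
    "http://www.example.com/books/",
]

sitemap_xml = build_sitemap(entry_pages)
print(sitemap_xml)
```

Only the required `loc` element is written for each URL; optional fields such as `lastmod` are left out.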

Google, on the other hand, has completely ignored the robots.txt file and has indexed over 100 pages of duplicate content which I told it not to index.

In the Google Webmaster Console I have an alert stating that approx 250 URLs are restricted by robots.txt. But a lot of those 250 URLs are appearing in Google's index.

I can't understand why Google is doing this. Yahoo is playing ball and correctly following my rules, but Google is lining me up for possible duplicate content issues further down the line.

Has anybody encountered similar issues? I can't disclose the URL at this time as the site is a work in progress.
#2
10-09-2008, 09:52 AM
glengara
Wannabe Geek
Join Date: May 2006
Posts: 429

You might find something on it here -

robots.txt:
#3
10-09-2008, 12:59 PM
paul
ninja SEO
Join Date: Dec 2006
Location: .de
Posts: 1,118

Did you always have a robots.txt present?
#4
10-09-2008, 02:30 PM
Cormac
Cormac Moylan
Join Date: Jan 2006
Location: Baile Ath Cliath / Corcaigh
Posts: 1,247

Yeah, I had one from the start.
I'm hoping to see the pages drop out of the index in the next fortnight. If not, then I'm going to have to get onto Google about this.
All times are GMT +1. The time now is 08:09 PM.

