
Thread: .htaccess authentication blocking robots.txt

  1. #1
    MOH
    Wannabe Geek, Join Date: Jul 2009, Posts: 361

    I'm working on a test site at the moment that was accidentally indexed a while back, thanks to some fool removing the robots.txt.
    I replaced it, successfully got the site deindexed, and, to stop anyone else accessing it, restricted access in the .htaccess to my IP only.
    Yesterday I noticed that Google Webmaster Tools (GWT) is now showing a string of 403 errors for the site, because Googlebot is also getting a 403 for the robots.txt.
    I've just changed the .htaccess to use simple password authentication instead of IP blocking, and GWT is now giving me a 401 for the robots.txt (and presumably will for the rest of the site once it gets around to crawling it again).

    The job is done, in that no one can access the site, but I'm sure a history of crawl errors won't help it in the long run once it goes live if I leave it like this.

    I'm sure there's a simple way of adding an exception in the .htaccess to allow access to robots.txt while keeping the authentication for everything else, but being half asleep on Monday morning I can't find it. Any suggestions?

  2. #2
    MOH

    As usual, got it after posting. (Keep an eye out for my forthcoming site MOH.com, which will consist entirely of me asking silly questions and answering them shortly afterwards. Riveting stuff.)
    For future reference, took a few wrong turns, but this seems to do the trick. Within the .htaccess:
    Code:
    AuthType Basic
    AuthName "Demo site - internal testing only"
    AuthUserFile <password file location here>
    Require valid-user
    
    # Exempt robots.txt from the password prompt
    <Files "robots.txt">
    AuthType None
    </Files>
    The odd thing about this is, it shouldn't work as far as I can see.

    "None" is listed as a valid option for AuthType for Apache 2.3, but not for 2.2.
    I'm on an Ubuntu server with Apache 2.2.11 - maybe I've got an updated version of mod_authn_file that includes it.

    Seems to work for me anyway.
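For comparison, on Apache 2.4 (the stable release line that grew out of the 2.3 development series, where AuthType None first appeared), the documented way to carve a public exception out of a password-protected area pairs AuthType None with an explicit Require all granted. A minimal .htaccess sketch, assuming Apache 2.4 with AllowOverride AuthConfig enabled for the directory; the password-file path is a hypothetical placeholder:

```apache
# Apache 2.4 syntax: password-protect everything in this directory
AuthType Basic
AuthName "Demo site - internal testing only"
# Hypothetical path: point this at your real htpasswd file
AuthUserFile /path/to/.htpasswd
Require valid-user

# robots.txt: no authentication, open to everyone (including crawlers)
<Files "robots.txt">
    AuthType None
    Require all granted
</Files>
```

With the default AuthMerging Off, the Require inside the Files section replaces the inherited Require valid-user for that one file rather than being combined with it.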

  3. #3
    MOH

    The above didn't actually work. It only seemed to because I had already authenticated on the site, so when I clicked through from GWT my browser was still logged in and I could see the robots.txt. Google, however, failed again with a 401 a few minutes ago.

    For Apache 2.2 or earlier, this should (hopefully) do it. Satisfy Any means that passing the host-based Allow from all check is enough on its own, so no password is needed:

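    Code:
    # Let anyone (including Googlebot) fetch robots.txt without a password
    <Files "robots.txt">
    Allow from all
    # Any: passing either the host check or the password check is enough
    Satisfy Any
    </Files>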
    Last edited by MOH; 27-04-2010 at 01:41 PM. Reason: typo in Files line
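Putting both halves of the thread together, a complete Apache 2.2 .htaccess might look something like the sketch below. It assumes the server grants AllowOverride AuthConfig Limit for the directory (AuthConfig covers the Auth* and Satisfy directives, Limit covers Order and Allow), and the password-file path is a hypothetical placeholder:

```apache
# Apache 2.2: password-protect the whole test site
AuthType Basic
AuthName "Demo site - internal testing only"
# Hypothetical path: point this at your real htpasswd file
AuthUserFile /path/to/.htpasswd
Require valid-user

# ...but let anyone (including Googlebot) fetch robots.txt
<Files "robots.txt">
    Order Allow,Deny
    Allow from all
    # Any: passing the host check above is enough, so no password is asked for
    Satisfy Any
</Files>
```

When checking it, use a client that has never logged in to the site, such as a private browsing window or curl -I against the robots.txt URL: a browser that has already authenticated keeps sending its credentials, which is what made the AuthType None attempt look as if it worked. robots.txt should come back 200 while any other page still returns 401.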

Similar Threads

  1. Blocking by IP in Jobberbase?
    By blacknight in forum CMS and Content Management
    Replies: 0
    Last Post: 30-01-2010, 06:44 PM
  2. Mozilla is blocking microsoft's buggy firefox plugin..
    By stephen186 in forum General Chat
    Replies: 1
    Last Post: 22-10-2009, 10:14 AM
  3. Robots.txt
    By distressed in forum Search Engine Optimisation
    Replies: 16
    Last Post: 15-07-2008, 12:32 PM
  4. Spamhaus errors blocking my email?
    By ButtermilkJack in forum Server / Technical Administration Tips and Queries
    Replies: 3
    Last Post: 07-03-2007, 03:42 PM
  5. Bad Robots
    By Cormac in forum Webmaster Discussion
    Replies: 1
    Last Post: 17-10-2006, 03:31 PM
