Warning about robots.txt file

I came across a thread in a search engine forum that said some search engine spiders won't index a web site if they don't find a "robots.txt" file. I didn't have one on my site, so I decided to add one.

While I was adding the robots.txt file to the root of my domain, I included some "Disallow" entries to prevent indexing of some non-public subdirectories. These directories most likely would not have been indexed anyway, because I don't link to them from anywhere on the site.
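For reference, here is roughly how a well-behaved crawler reads Disallow entries. This is a minimal sketch using Python's standard `urllib.robotparser` module; the robots.txt content, the `www.yoursite.com` URLs and the `ExampleBot` user-agent name are placeholders. Note that the check happens entirely on the crawler's side: a client that never runs it is not stopped by anything.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt content; substitute your own file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /logs/
Disallow: /members/
"""


def is_allowed(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if a crawler that honors ROBOTS_TXT may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://www.yoursite.com/",
                "http://www.yoursite.com/logs/access.log",
                "http://www.yoursite.com/members/"):
        print(url, "->", "allowed" if is_allowed(url) else "disallowed")
```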

HOWEVER, that's when I realized that putting these Disallow entries in your robots.txt file can actually create security vulnerabilities for your site. Anyone with a browser can enter:

http://www.yoursite.com/robots.txt

and view the entries in your robots.txt file, INCLUDING the list of directories you have marked as disallowed. You may not link to those directories, but someone reading the robots.txt file could type them into their browser and get access to your hidden areas.

For instance, if your robots.txt file included:

User-agent: *
Disallow: /logs/
Disallow: /members/

snoopers could enter:
http://www.yoursite.com/logs/
or http://www.yoursite.com/members/

to potentially view your server logs or your members-only areas.
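To see how little effort this takes, here is a minimal Python sketch that pulls the Disallow paths out of a robots.txt file and turns them into full URLs. It parses text you pass in rather than fetching it; in practice that text could come from any browser or from `urllib.request.urlopen`. The function name `exposed_urls` and the sample site are illustrative only.

```python
from __future__ import annotations

from urllib.parse import urljoin


def exposed_urls(robots_txt: str, base_url: str) -> list[str]:
    """List the full URLs named by Disallow lines in robots_txt."""
    urls = []
    for raw_line in robots_txt.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if not sep or field.strip().lower() != "disallow":
            continue
        path = value.strip()
        # An empty Disallow value means "nothing is disallowed".
        if path:
            urls.append(urljoin(base_url, path))
    return urls


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /logs/\nDisallow: /members/\n"
    for url in exposed_urls(sample, "http://www.yoursite.com/"):
        print(url)
```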

So how do you protect yourself? Simple. Make sure every one of your disallowed directories contains a default file (also called a directory index). Without one, many servers will display a browsable list of every file in that directory. On most servers the default file is index.html, default.html, index.htm or default.htm. You should know which one your server uses; it's the file that loads when someone enters only your root URL:

http://www.yoursite.com

The default file in each disallowed directory can be a redirect to your home page, a blank page, or a warning; it's up to you.
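If you keep a local copy of your site, a short script can check for missing default files before you upload. This is a minimal Python sketch, assuming your site's files live under a local folder (the `docroot` argument) that mirrors the server layout, and that your server uses one of the four default file names listed above; adjust `DEFAULT_FILES` if yours differs. The function names and the `./public_html` path are hypothetical, and the placeholder page is a simple meta-refresh redirect to the home page.

```python
from __future__ import annotations

from pathlib import Path

# Default file names listed above; change these to match your server.
DEFAULT_FILES = ("index.html", "default.html", "index.htm", "default.htm")

# Placeholder page that immediately sends visitors to the home page.
PLACEHOLDER = (
    '<html><head><meta http-equiv="refresh" content="0; url=/">'
    "</head><body></body></html>\n"
)


def missing_defaults(docroot: str, disallowed: list[str]) -> list[str]:
    """Return the disallowed directories that exist but lack a default file."""
    missing = []
    for path in disallowed:
        directory = Path(docroot) / path.strip("/")
        if directory.is_dir() and not any(
            (directory / name).is_file() for name in DEFAULT_FILES
        ):
            missing.append(path)
    return missing


def add_placeholders(docroot: str, disallowed: list[str]) -> None:
    """Write a redirecting index.html into each directory that needs one."""
    for path in missing_defaults(docroot, disallowed):
        target = Path(docroot) / path.strip("/") / "index.html"
        target.write_text(PLACEHOLDER, encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical local copy of the site and its Disallow paths.
    print(missing_defaults("./public_html", ["/logs/", "/members/"]))
```

Reporting first and writing second lets you choose a blank page or a warning instead of the redirect for particular directories.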

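Keep in mind that a default file only hides the directory listing. Anyone who knows or guesses a file name inside (say, /logs/access.log) can still request it directly, so areas like server logs or members-only pages also need real access control, such as password protection.

Just make sure you protect those directories!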
