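What is a robots.txt file and why do I need one?
A robots.txt file is a plain-text file placed in the root folder of your site on the server (for example, https://www.example.com/robots.txt). It tells search engine spiders which pages or sections of your site they may crawl and index, and which ones you do not want indexed and displayed in search results. Spiders follow these rules voluntarily, so treat the file as a request and not as a way to protect private pages.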
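Because spiders only look for the file at the root of the site, its address can be worked out from any page URL. Here is a minimal Python sketch of that idea; the helper name robots_url and the www.example.com address are placeholders for illustration, not part of any real tool.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the file always sits at the root path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.example.com/shop/item?id=7"))
```

If your file is uploaded anywhere other than the address this prints for your own site, spiders will not find it.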
List of some robot names
(You will need these if you want to specify individual rules for different spiders.)
Google - Googlebot
Google Images - Googlebot-Image
Lycos - T-Rex
Alta Vista - Scooter
MSN - MSNBot
Yandex - YandexBot
Exalead - Exabot
A basic robots.txt file
Open Notepad (or any plain-text editor), copy and paste the example below, and save it as robots.txt. Leaving Disallow: empty blocks nothing, so this will allow all robots to crawl and index all of your site.
User-agent: *
Disallow:
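If you would like to check a file before uploading it, Python's standard urllib.robotparser module can read the rules and answer "may this robot fetch this path?". The sketch below is one way to do that, assuming a helper called is_allowed and a made-up robot name, ExampleBot. Python's parser only approximates how real spiders behave, so treat the answer as a sanity check.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Parse robots.txt text and report whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# "ExampleBot" is a made-up robot name used only for this check.
print(is_allowed(ALLOW_ALL, "ExampleBot", "/"))
print(is_allowed(ALLOW_ALL, "ExampleBot", "/any/page.html"))
```

Both checks should report True, because an empty Disallow: line blocks nothing.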
A robots.txt file that stops all spiders crawling your site
If you are still designing your site and don't want any of the pages indexed until you have completed it, simply add a slash after Disallow: to stop all spiders crawling your site. Remember to remove the slash and upload the file to the server again once your site is live, otherwise your site will not be indexed.
User-agent: *
Disallow: /
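A leftover Disallow: / is easy to forget, so it can help to test for it before launch. This is a rough sketch of such a check in Python; the function name blocks_whole_site and the robot name ExampleBot are invented for the example, and it only looks at whether the home page is blocked.

```python
from urllib.robotparser import RobotFileParser


def blocks_whole_site(robots_txt: str, agent: str = "ExampleBot") -> bool:
    """Return True if the rules stop the given agent fetching the home page."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/")


under_construction = "User-agent: *\nDisallow: /\n"
live_site = "User-agent: *\nDisallow:\n"

print(blocks_whole_site(under_construction))
print(blocks_whole_site(live_site))
```

The first file should be reported as blocking the site and the second as open, which is the state you want once the site is live.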
Stop spiders crawling and indexing your cgi-bin
Use the following to stop all spiders from indexing your cgi-bin. Note that there is a forward slash at the beginning and end of the directory name; this means any files in that directory will not be indexed.
User-agent: *
Disallow: /cgi-bin/
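To see what the directory rule covers, you can run a few sample paths through Python's urllib.robotparser. The paths and the robot name ExampleBot below are made up for illustration, and the parser is only an approximation of how real spiders read the file.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Paths chosen for illustration; "ExampleBot" is a made-up robot name.
for path in ["/cgi-bin/form.pl", "/cgi-bin/tools/mail.cgi", "/index.html"]:
    verdict = "allowed" if parser.can_fetch("ExampleBot", path) else "blocked"
    print(f"{path}: {verdict}")
```

Everything under /cgi-bin/, including sub-folders, should come back as blocked, while pages elsewhere on the site stay allowed.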
Stop spiders crawling and indexing a page
If you are working on a section and you don't want it indexed yet, for example a new page called contact that is served at /contact/, disallow its path:
User-agent: *
Disallow: /contact/
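This will stop all robots from indexing your contact page, along with anything else whose address starts with /contact/. If the page is a single file such as /contact.html, put that path on the Disallow line instead.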
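One detail worth checking is that /contact/ with a trailing slash matches a folder-style address, not a file such as /contact.html. The short Python sketch below shows the difference; both addresses and the robot name ExampleBot are hypothetical.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /contact/"])

# Both URL shapes are hypothetical; use whichever one your site serves.
print(parser.can_fetch("ExampleBot", "/contact/"))
print(parser.can_fetch("ExampleBot", "/contact.html"))
```

The folder-style address should be blocked and the .html file should not, so make sure the Disallow path matches how your page is actually addressed.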
Stop Google indexing your images
To stop Google's image spider from indexing the pictures in a folder called images:
User-agent: Googlebot-Image
Disallow: /images/
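A rule written for one named robot should leave the others alone. As a quick check, this Python sketch asks the same question for three robots from the list above; the file /images/logo.png is a made-up example, and Python's matching of robot names is simpler than what the search engines themselves do.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: Googlebot-Image", "Disallow: /images/"])

# "/images/logo.png" is a hypothetical file used only for the check.
for agent in ["Googlebot-Image", "Googlebot", "MSNBot"]:
    print(agent, parser.can_fetch(agent, "/images/logo.png"))
```

Only Googlebot-Image should be refused; the other two robots have no rule that applies to them.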
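Stop spiders indexing cgi-bin and Google indexing your images
User-agent: *
Disallow: /cgi-bin/

User-agent: Googlebot-Image
Disallow: /images/
Note that a spider obeys only the group that matches it most closely, so Googlebot-Image follows its own group here and ignores the cgi-bin rule under User-agent: *. To keep it out of both folders, add Disallow: /cgi-bin/ to the Googlebot-Image group as well.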
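Once a file has several groups, you may prefer to generate it instead of typing it by hand. The following is only a sketch of one way to do that in Python; build_robots_txt is an invented helper that takes a dictionary of robot names and the paths each one should stay out of.

```python
def build_robots_txt(rules: dict[str, list[str]]) -> str:
    """Render {user_agent: [disallowed paths]} as robots.txt text."""
    groups = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        # An empty list becomes a bare "Disallow:", which allows everything.
        lines += [f"Disallow: {path}" for path in paths] or ["Disallow:"]
        groups.append("\n".join(lines))
    # A blank line separates one group from the next.
    return "\n\n".join(groups) + "\n"


print(build_robots_txt({"*": ["/cgi-bin/"], "Googlebot-Image": ["/images/"]}))
```

Save the printed text as robots.txt and upload it to the root folder of your site in the usual way.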
So now you have the basics of how to use a robots.txt file to allow or deny spiders crawling all or just part of your site.