
Robots.txt Optimization tips for bloggers


The robots.txt file controls crawler activity on a website or blog. It lets you keep some directories out of crawling while allowing others. For example, if you have two folders, Articles and Javascripts, and you want to exclude Javascripts from crawling by robots, you can say so in the robots.txt file.

A few basics about the robots.txt file:
- It lives in the root folder, e.g. www.google.com/robots.txt
- It’s a plain text file and can be edited
- It tells robots what to crawl and what not to crawl
- It helps crawlers locate the sitemap on your site
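
Since the file always sits in the root folder, its address can be worked out from any page on the same site. Here is a minimal Python sketch of that idea; the function name `robots_txt_url` and the sample page path are made up for illustration.

```python
from urllib.parse import urljoin


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    # The leading slash makes urljoin resolve against the site root,
    # discarding whatever path the page itself has.
    return urljoin(page_url, "/robots.txt")


# Hypothetical page path on the site used as the example above.
print(robots_txt_url("http://www.google.com/some/page.html"))
# prints: http://www.google.com/robots.txt
```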

If you are on the Blogger platform, you can’t upload a robots.txt file. Don’t panic: there is another option, which I’ll discuss towards the end of this article. First, let’s go through a normal robots.txt implementation on a hosted site.

Implementing the robots.txt file on a web-hosted site (WordPress)

Prerequisites: I assume that you have a hosted WordPress site with cPanel/FTP access.

- Find the file in your public_html folder. If it isn’t there, create a blank text document named robots.txt.

Excluding a folder from crawling by search engine bots
Suppose you don’t want Google to crawl one of your folders.
In the robots.txt file, you have to specify two things: which crawler agent (Google, Yahoo, MSN) you want to keep out, and which folder or folders you want to exclude.

The general syntax to be written in the robots.txt file is this:

User-agent: *
Disallow: /yourfolder/

Here, User-agent: * means all search agents (Google, MSN, Yahoo etc.).
Disallow: /yourfolder/ keeps that folder from being crawled. Note that its sub-folders will not be crawled either.
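
To see how a rule like this behaves, it can be checked with the robots.txt parser in Python’s standard library. This is only a sketch: it shows how that one parser reads the rule, real crawlers may differ in details, and the helper `is_allowed` and the file names are invented for the example. The folder names come from the Articles/Javascripts example at the top of the article.

```python
from urllib.robotparser import RobotFileParser

# Same syntax as above, applied to the Javascripts folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /Javascripts/
"""


def is_allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may crawl `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# The file names below are hypothetical.
print(is_allowed(ROBOTS_TXT, "Googlebot", "/Javascripts/menu.js"))      # False
print(is_allowed(ROBOTS_TXT, "Googlebot", "/Javascripts/lib/util.js"))  # False (sub-folder)
print(is_allowed(ROBOTS_TXT, "Googlebot", "/Articles/post.html"))       # True
```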

To keep all agents from crawling ALL folders, use this code.

User-agent: *
Disallow: /

You can target an individual crawler by putting its name in place of the *, for example Googlebot. If the rule is meant for all search engine crawlers, keep the * in the User-agent line.
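
The difference between a rule for every crawler and a rule for one named crawler can be sketched with Python’s standard-library parser. Treat it as an illustration under assumptions: the helper `is_allowed` and the page names are made up, "Slurp" simply stands in for any crawler other than Googlebot, and actual search engines may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Keeps every crawler out of the whole site.
BLOCK_EVERYONE = """\
User-agent: *
Disallow: /
"""

# Keeps only Googlebot out of one folder; other crawlers are not mentioned.
BLOCK_GOOGLEBOT_FOLDER = """\
User-agent: Googlebot
Disallow: /yourfolder/
"""


def is_allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may crawl `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Page names are hypothetical.
print(is_allowed(BLOCK_EVERYONE, "Googlebot", "/index.html"))                 # False
print(is_allowed(BLOCK_GOOGLEBOT_FOLDER, "Googlebot", "/yourfolder/a.html"))  # False
print(is_allowed(BLOCK_GOOGLEBOT_FOLDER, "Slurp", "/yourfolder/a.html"))      # True
```

A crawler that is not named anywhere in the file, and has no * rule to fall back on, is treated as allowed.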

Specifying a sitemap with the robots.txt file
Following a recent agreement, the major search engines now follow a common directive for detecting sitemaps from the robots.txt file. The directive is:

Sitemap: Sitemap url here
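
As a rough check that the Sitemap line is picked up, Python’s standard-library parser can list the sitemaps it finds. This sketch assumes Python 3.8 or later (for `site_maps()`), and www.example.com is a placeholder address, not a real sitemap.

```python
from urllib.robotparser import RobotFileParser

# Placeholder rules; replace the URL with your own sitemap address.
ROBOTS_TXT = """\
User-agent: *
Disallow: /yourfolder/

Sitemap: http://www.example.com/sitemap.xml
"""


def find_sitemaps(robots_txt: str) -> list:
    """Return the sitemap URLs declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(find_sitemaps(ROBOTS_TXT))
# prints: ['http://www.example.com/sitemap.xml']
```

Note that the sitemap is given as a full URL, including the http:// part.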

Robots.txt for Blogger users.

Blogger users cannot upload a robots.txt file. Instead, they can use the robots meta tag to control how bots crawl particular pages.

These tags should be included in the HEAD section of the particular page template (enclosed in angle brackets). Blogger templates are parsed as XML, so close each tag with /> or the template will not save.

<meta name="robots" content="noindex" />

This tag tells robots not to index the page that contains it.

<meta name="robots" content="nofollow" />

This tag tells robots not to follow the links on the page that carries it in its head section. Blogger users can use this option to their advantage when making posts.

If you want every new page to be crawled by the bots, include the following code in the head section of your Blogger template.

<meta name="robots" content="index, follow" />
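
To confirm which robots directives a page actually carries, the HEAD section can be scanned for the meta tag. The following is a minimal Python sketch using the standard-library HTML parser; the class `RobotsMetaParser`, the function `robots_directives`, and the sample page are all invented for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; values do not.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.append(part.strip().lower())


def robots_directives(html: str) -> list:
    """Return the robots directives found in the given HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical page with a self-closed tag, as a Blogger template needs.
SAMPLE_PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow" />
</head><body>Hello</body></html>"""

print(robots_directives(SAMPLE_PAGE))
# prints: ['noindex', 'nofollow']
```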

Happy driving the robots. :)


Written by Mani Karthik

Blogger, Web / Social Media Enthusiast & SEO with Flip Media. I'm always on the learning curve. Love to meet new people, feel free to befriend me.


27 Responses

  1. Hi…
    This is regarding tk domain. I’m using tk domain to promote my blogspot.
    After I switchover to tk domain even my meta tag verification get failed. How can I do successful meta tag verification in tk domain websites? Is it wise to promote my blog with tk domain? what are the hurdles I may face in future? please let me know…

    I look forward to hearing from you ASAP.

    Thanx in advanz…

  2. by d way.. niz site man.. keep it up! :)

  3. Han

    Hi,

    I was wondering if you know how to control how googlebot crawls the individual pages of a blogspot site. I have one particular post that I want to exclude from google searches. Is there any way to do that? How does one edit the html for just one particular post to include the meta tag you have given above? I want the rest of my blog site crawled by the bots but just one particular post to be excluded.

    Thanks.

  4. Han,

    Add the following tag to your Blogspot page.

    <meta name="robots" content="noindex" />

    This page won’t get indexed by the bots.

  5. Jamal,

    It’s definitely not a good idea to promote the .tk domain, because it’s a domain beyond your control. If you are on Blogger, I suggest using the custom domain option. Try to get a domain; it costs less than $10 a year.
    If you are using the .tk domain, go to the settings for the domain, where it asks for relevant keywords and a description. If you skipped this during registration, take the time to fill in the keywords and description fields. The .tk domain will automatically create the keywords and description meta tags on the site, which spiders can crawl.

    Having said that, it’s only an option if you are keen to use the domain. I suggest you rush to GoDaddy right now. :)

  6. astin

    hi,
    My blogspot is not showing adsense ads from yesterday, i think something is blocking google adsense crawler to my blogspot. I dont know how to allow google adsense crawler to crawl my blog. Plz help me..

  7. But it still show the same message:
    User-agent: *
    Disallow: /search

    How will i fixed this problem.

  8. Just a question , I have seen your robots.txt but you are not blocking the Category and even RSS links.. Is there any reason for this.. Since this creates duplicate contents.

  9. Romba nalla Article.. Hi very nice article.. really working on my site..thankyou very much..

  10. Nice article .. but can you help me out in fixing my blog, in my blog not a single post is indexed my google..
    thanks in advance.

  11. hey plz check my blog once and see if u can optimize its perfomance http://pcsoftwarez.blogspot.com

  12. Hi Mani Karthik,

    my blog is yeeern.blogspot.com.
    i am having problem adding the feed to the google webmaster tools.

    How do i change my robots.txt?

    Thanks in Advance.

  13. hello when i submitted sitemap.. i got robot.txt error on label files. why is it so. is it because of blogger templates m using pls see my blog and help

  14. nice post, now i know how to put robot.txt to my blog. thanks

  15. will this make my each blog post to be indexed by googlebot?

  16. How do i change my robots.txt?
    i use arabid blog by name

  17. Restricted/blocked links!! How can I change the robot file.

  18. Great info…I used your indexing info on my site at http:www.gicleeprintingoncanvas.com

  19. I’ve been a year involved with blogger site but still don’t understand how to create robots.txt file on it. Now I need some of my posts not to be indexed. I’ve tried your tip above but doesn’t worked. One confused me is HEAD section of the page template. Which one? Why don’t you clarify with snapshots? The template of blogger work for all the pages, doesn’t it? Thank, anyway for the tip.

  20. I’m so confused all about seo i read a lot put can’t apply in my blog

  21. also i can’t change robots.txt which say

    User-agent: *
    Disallow / serch

    i use blogger

  22. My robots.txt file says:
    User-agent: Mediapartners-Google
    Disallow:

    User-agent: *
    Disallow: /search

    It was once ‘allow’. How did this happen?

  23. my blog is not accepted to publish google ads and its telling that ur blog is voilations of policies .so what can i do to get acceptence of adsense for ads publishing.

    and mine problem is ” robot.txt “.as u told to insert the .the template is not saving and showing as below

    Your template could not be parsed as it is not well-formed. Please make sure all XML elements are closed properly.
    XML error message: The element type “meta” must be terminated by the matching end-tag “</meta>”.
