Google Site Index and Sitemap – Working together
Questions come up often on the Webmasters forums about a discrepancy: the number of pages indexed from your website differs between the Google site: operator and the Webmasters console.
A site:www.yoursite.com query gives the number of pages Google has in its regular index, while the Webmasters console dashboard shows how many of the pages in your submitted sitemap are indexed.
According to this post by Charlene, there is nothing to worry about, and it is not a bug in Google.
The site: operator only shows an approximate estimate of the number of pages from your site in Google’s regular index. It isn’t accurate, and it may not include recently added pages. Sometimes it also shows more pages than the sitemap, which could be because additional pages that are not listed in your sitemap were found through links.
So here are some conclusions we can draw about sitemaps and Google’s indexing patterns.
1 – A Sitemap is not an Order to Google
Submitting a sitemap only tells Google: “These files ought to be indexed, and here they are.”
While Google will in all likelihood index them, you cannot be sure that it will, especially when there are a lot of files and large sitemaps.
Google has its own criteria for deciding whether a page is worth keeping in its regular index, and submitting a sitemap does not override them.
Situations in which Google might not index all the files in the sitemap:
- When files in the sitemap have no external or internal links pointing to them at all.
- When pages are strikingly similar and could be considered duplicate content.
- When other signals tell Google that the pages aren’t relevant or technically correct.
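The first situation above can be checked before you submit anything. Here is a minimal sketch in Python, assuming you already have a list of your sitemap URLs and a map of which page links to which. The function name and the sample URLs are made up for illustration, and it only looks at internal links.

```python
def find_unlinked_pages(sitemap_urls, internal_links):
    """Return sitemap URLs that no other page on the site links to.

    internal_links maps each page URL to the set of URLs it links to
    (hypothetical input; build it from your own crawl or CMS export).
    """
    linked = set()
    for source, targets in internal_links.items():
        # A page linking to itself does not make it reachable from elsewhere.
        linked.update(t for t in targets if t != source)
    return sorted(u for u in sitemap_urls if u not in linked)


# Illustrative data only.
sitemap_urls = [
    "http://www.yoursite.com/",
    "http://www.yoursite.com/about",
    "http://www.yoursite.com/old-promo",
]
internal_links = {
    "http://www.yoursite.com/": {"http://www.yoursite.com/about"},
    "http://www.yoursite.com/about": {"http://www.yoursite.com/"},
}

print(find_unlinked_pages(sitemap_urls, internal_links))
```

Any URL this reports is in the sitemap but has no internal link pointing to it, so it is a candidate for the "no links at all" case.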
2 – The Google site index operator is a hint of your general site health
Google says:
Think of the site operator as a quick diagnosis of the general health of your site in Google’s index. Site operator results can show you:
- a rough estimate of how many pages have been indexed
- one indication of if your site has been hacked
- if you have duplicate titles or snippets
So sitemaps might not be the surest way to get all your pages indexed on Google. Technically, getting relevant links to each page would be the surest way, but that is very hard to achieve, so the best solution is to mix and match both.
To get Google to index as many pages from your site as possible:
1. Submit a sitemap with a proper weight assigned to each page/category.
2. Get relevant links to categories/subcategories and spread the Google juice.
3. Remove any dangling pages.
4. Keep the internal navigation sound so that Google can crawl every page properly.
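Step 1 can be sketched with Python's standard library. This is only an illustration under assumptions: the page list and the weights are invented, and `build_sitemap` is a hypothetical helper name. The `priority` element comes from the sitemaps.org protocol (values from 0.0 to 1.0) and is a hint, not an order.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, priority) pairs and return it as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, priority in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # The protocol expects a value between 0.0 and 1.0.
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


# Illustrative pages and weights: home page highest, deeper pages lower.
pages = [
    ("http://www.yoursite.com/", 1.0),
    ("http://www.yoursite.com/category/", 0.8),
    ("http://www.yoursite.com/category/post", 0.5),
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Use the same host form (with or without www) in every URL, matching the one your site actually serves.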

Thanks for the information.
Thanks, really useful information.
I do a few things whenever I start a new blog to get it indexed by Google. Normally my websites get indexed quickly. I have also written an article about it on my new blog:
http://amitsharma.co.in/internet/how-to-get-google-index-your-wordpress-blog-in-just-few-days/
I just wish that, since Google knows that of, say, 10 submitted pages it picked 4 to index, it would say which 4. Since it picked them and indexed them, it should tell us. How hard is that to understand? Also, since it didn’t pick the other 6 of the 10, why shouldn’t those be displayed in GWT too? That’s not hard to understand either. If it decided not to index them, it should tell us why too, period. Guessing is a horrible thing to be left with. It’s also poor when you put in a sitemap that did, say, 6 of 6, then you add 2 more and it drops to 4 of 8, and then you put back the one that worked better and it drops further to 3 of 6. After trying so many things here over the last year, I really feel that Google is just a pain and doesn’t work as well as it should. The fact is that to get your page showing in pages 1-10 we do need a good sitemap. For years I used sitemaps that were made with the hosting and online. All were made without www and got 0 indexed. As soon as I changed it to www, I got instant indexing, 6 of 6. With www I get at least some. But without it, it’s a guaranteed “0”.