Googlebot - Cached vs. Last Visited
Vanessa Fox is fast becoming one of my favorite "Googlers", right up there with Matt Cutts. Not only is she a devout Buffy the Vampire Slayer fan, like myself - but she follows through on what she says, and to most people that's golden. Last week, as guest host on WebMasterRadio.fm's Good Karma show, Vanessa and Matt talked about a lot of things dealing with how Google crawls sites and displays search results, including how Google handles accented characters and words.
What I found of greater value was the conversation about how Google knows when a page has been changed or modified in any way, and how that relates to what's cached and how that's displayed in the SERPs (Search Engine Result Pages). Prior to the Good Karma show on 8/31 and Vanessa's post on Google's Webmaster Central blog, things were a bit confusing for webmasters.
By watching traffic logs, webmasters can see when Googlebot visits a site, but what they can't see is what Google's actually sucking down and reading every time it comes for a crawl. This leads to a bit of confusion over what Google's displaying in the SERPs as the last visited date. Vanessa gives the following example:
1. Googlebot crawls a page on April 12, 2006.
2. Our cached version of that page notes that "This is G o o g l e's cache of http://www.example.com/ as retrieved on April 12, 2006 20:02:06 GMT."
3. Periodically, Googlebot checks to see if that page has changed, and each time, receives a Not-Modified response. For instance, on August 27, 2006, Googlebot checks the page, receives a Not-Modified response, and therefore, doesn't download the contents of the page.
4. On August 28, 2006, our cached version of the page still shows the April 12, 2006 date -- the date we last downloaded the page's contents, even though Googlebot last visited the day before.
Well, that's all changed now! Yes folks, this is a great example of a company listening to its "customers" and changing things so the experience is better for everyone - all around.
So now - the cached date is going to reflect the last date Googlebot visited your page, even if that visit only got a Not-Modified response. In the example above, Google's now going to display the cached date as August 27, 2006.
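If it helps to see the difference between the two dates, here's a rough sketch of the bookkeeping in Python. To be clear, this is my own toy model of Vanessa's example and not Google's code: `CacheRecord`, `record_visit`, `displayed_cache_date` and the `use_last_visit` switch are all names I made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CacheRecord:
    """Hypothetical per-page record: when content was fetched vs. when it was checked."""
    last_downloaded: Optional[date] = None
    last_visited: Optional[date] = None


def record_visit(record: CacheRecord, visited_on: date, status: int) -> None:
    """Update the record after one crawl of the page."""
    record.last_visited = visited_on
    if status == 200:
        # The server sent the page body, so the stored copy is refreshed.
        record.last_downloaded = visited_on
    # On a 304 (Not Modified) nothing is downloaded; only the visit date moves.


def displayed_cache_date(record: CacheRecord, use_last_visit: bool) -> Optional[date]:
    """Old behavior shows the download date; new behavior shows the last visit."""
    return record.last_visited if use_last_visit else record.last_downloaded


# Replaying the dates from the example above.
record = CacheRecord()
record_visit(record, date(2006, 4, 12), 200)   # first crawl, page downloaded
record_visit(record, date(2006, 8, 27), 304)   # later check, page unchanged

print(displayed_cache_date(record, use_last_visit=False))  # 2006-04-12
print(displayed_cache_date(record, use_last_visit=True))   # 2006-08-27
```

The content sitting in the cache is the same either way. The only thing that changes is which of the two dates gets shown.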
For a full rundown on how this works, including what Google does with a 304 (Not Modified) response from your server - check out the blog post on the Google Webmaster Central Blog.
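For the curious, here's roughly what the server side of that 304 exchange looks like. This is a generic sketch of an HTTP conditional GET check, assuming the crawler sends an `If-Modified-Since` header; it isn't taken from Google's post or from any particular web server, and `status_for_request` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def status_for_request(if_modified_since: Optional[str], last_modified: datetime) -> int:
    """Return 304 if the page is unchanged since the date the client sent, else 200."""
    if if_modified_since is None:
        return 200  # no conditional header, so send the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header and send the page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates only have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


# The page was last changed at the time shown in the cache example.
page_changed = datetime(2006, 4, 12, 20, 2, 6, tzinfo=timezone.utc)
header = format_datetime(page_changed, usegmt=True)

print(status_for_request(header, page_changed))  # 304
print(status_for_request(None, page_changed))    # 200
```

If your server never answers with a 304, a crawler has to download the whole page on every visit just to find out nothing changed.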







