Google cache insight via a bug (or is it?)
Posted by Mercury Thread | Posted in Google | Posted on 14-05-2010
Over the past couple of weeks I had a problem with one of my blogs: the DNS and everything went to pot. So I’ve set it up on a new host and, as it had links pointing to it, Google nicely spidered the domain with the empty installation of WordPress on it. No big deal – add some old content (no backup, so I hacked through the cache pages) and the ping will kick in and get it spidered. At least that was the thought.
Google’s cocked it up – no biggie just now, but some kinda weird results are coming out which give a little insight into something I know about and have seen before, just not on one of my own sites. So I’m having a look for the site name (“discover whisky”) and sub pages appear higher than the homepage. A touch strange but not overly unexpected – but it had the new homepage last night when I pinged some stuff out. Que sera – more DC madness, but for May 2010 not unusual.
But above my site in the SERPs is a listing for a different website! No reference to my site at all.
Discover Whisky Cache
Other Site Cache
The images are a wee bit small but you can see it’s the same page of content – and having a look, Google has indexed loads of ‘em. But you will see the same domain is listed within the wee grey box at the top (you may need a telescope, but the full size images are clearer if you click on the thumbs).
The kinda cool/dodgy/mistaken part in all of this is the way that Google appears to be indexing duplicate pages – I mean 100% duplicate pages: it’s giving every page of this sort the same reference string in the URL. Ever wondered what that funny string of random characters was on a cache result, like the “4m23wQS3MZAJ” in “q=cache:4m23wQS3MZAJ:www.discoverwhisky.co.uk”, in this segment of the URL for the cache page? It’s a page reference (of some sort). And my whisky blog homepage has the same bloody reference as the other site: “q=cache:4m23wQS3MZAJ:kellyguimont.com”. So for example the URL http://webcache.googleusercontent.com/search?q=cache:4m23wQS3MZAJ:www.google.co.uk/ gives you the same cache page despite the domain being different.
It doesn’t matter what you change the domain to in this string; as long as the reference stays the same, the same cache page is always returned.
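If you want to check two cache URLs against each other, here is a rough Python sketch that pulls the reference out of the “q=cache:” segment and rebuilds the URL with a different domain. The “cache:<reference>:<domain>” layout is only what I’ve inferred from the URLs above, not anything Google documents, and the function names are made up for illustration. It only handles strings; it doesn’t fetch anything.

```python
from urllib.parse import urlsplit, parse_qs

# Endpoint as it appears in the cache URLs quoted in this post.
CACHE_ENDPOINT = "http://webcache.googleusercontent.com/search"


def parse_cache_query(url_or_query):
    """Split a 'q=cache:<reference>:<domain>' query into (reference, domain).

    Accepts either a full cache URL or just the 'q=cache:...' segment.
    The layout is an assumption inferred from observed URLs.
    """
    if "?" in url_or_query:
        query = urlsplit(url_or_query).query
    else:
        query = url_or_query
    q_value = parse_qs(query)["q"][0]
    # maxsplit=2 keeps any later colons inside the domain part.
    prefix, reference, domain = q_value.split(":", 2)
    if prefix != "cache":
        raise ValueError("not a cache query: " + q_value)
    return reference, domain


def build_cache_url(reference, domain):
    """Rebuild a cache URL for the same reference under another domain."""
    return f"{CACHE_ENDPOINT}?q=cache:{reference}:{domain}"


if __name__ == "__main__":
    mine = parse_cache_query("q=cache:4m23wQS3MZAJ:www.discoverwhisky.co.uk")
    other = parse_cache_query("q=cache:4m23wQS3MZAJ:kellyguimont.com")
    # Same reference, different domains: the situation described above.
    print(mine[0] == other[0], mine[1] != other[1])
    print(build_cache_url(mine[0], "www.google.co.uk/"))
```

Two results sharing a reference but showing different domains is the tell-tale sign of the duplicate indexing described here.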
And so the reason for the issue becomes clearer – as far as I can ascertain, my domain has the links and this other one has been given ownership of the content. It’s the same issue you get when an affiliate steals your content, whacks it on their site and gets you kicked out of Google – or it could be a dodgy SEO company doing it to you by way of a ‘negative SEO’ campaign (something which one day I may go into at some length on this site).
On a number of cache requests since the May Day update I’ve seen the reference part of the URL string being wrong and no cache page being returned. So this may just be an extension of that issue.
Anyway, I thought some folks may find this duplicate content indexing issue kind of interesting or just a bit freaky. Any additional thoughts, let me know in the comments.



