Finding duplicate content using inurl


kflanagan28

New Member
One of the sites I am working on suffers a lot from duplicate content due to the CMS. If I do site:domain.com and compare it to site:domain.com/*, it would appear almost 90% of the site resides in the Supplemental index. Although, from what Google said in 2007, there is no Supplemental index any more, so I am not sure why that comparison shows such a difference. Maybe I am reading it wrong?
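For reference, the two commands I am comparing:

site:domain.com
site:domain.com/*

The first supposedly lists everything indexed, while the /* version seems to filter out the pages Google treats as lower value, which is why people still use the pair to estimate "supplemental" pages.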

But back to the question. I was looking into this and made some stupid mistakes (it happens). I was looking at the dynamic content with the command:

site:domain.com inurl:dynamicparameter

This was showing me some results, but most were buried in the omitted results. I took this to mean that the dynamic parameter was causing duplicate content (it seemed a good guess, because the page titles and meta descriptions were generic for these pages). I had been working on a nofollow/robots.txt strategy to help this site, and I think that may have caused me to go mental. So when I became rational, I figured the reason results are in the omitted results is that Google is saying there are lots of similar pages with the same URL.

Is that assumption correct, and was I totally off using that command to find duplicate content created by dynamic pages?
 

kflanagan28

New Member
Nice post, good advice. But I am more looking for info on how to find duplicate content when you are knee deep in it, i.e. a site with over 250k pages.

As I said, I was using the inurl command and grabbing dynamic parameters from the URL. From this I was assuming that the results in the omitted results were being counted as duplicate pages. That was probably wrong. But as I said in the original post, using those commands would suggest a lot of the content is being marked as duplicate...
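To make it concrete, at 250k pages I would rather check the site directly than fight the search operators. Something like this rough Python sketch is what I have in mind (urls.txt and the regex parsing are assumptions; feed it your own crawl or sitemap export):

# Group URLs whose <title> and meta description are identical --
# the generic title/description pattern described above.
import re
from collections import defaultdict

import requests  # third-party: pip install requests

TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)
DESC_RE = re.compile(
    r'<meta\s+name=["\']description["\']\s+content=["\'](.*?)["\']',
    re.IGNORECASE,
)

groups = defaultdict(list)
with open("urls.txt") as f:
    for url in (line.strip() for line in f):
        if not url:
            continue
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue  # skip unreachable URLs; log them in a real run
        title = TITLE_RE.search(html)
        desc = DESC_RE.search(html)
        key = (
            title.group(1).strip() if title else "",
            desc.group(1).strip() if desc else "",
        )
        groups[key].append(url)

# Any group with more than one URL is a duplicate-content suspect.
for (title, desc), urls in groups.items():
    if len(urls) > 1:
        print(f'{len(urls)} pages share the title "{title}":')
        for u in urls:
            print("   ", u)

Anything it groups together shares an identical title/description pair, which matches the generic pages I described in the first post.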
 

kflanagan28

New Member
Hey

Thanks for the reply. But just to point out: I am not looking for ways to stop duplicate content, as I know how to do that. I am looking for methods of finding duplicate content on a large site that produces a lot of dynamic content.
I got some feedback over on another forum that my initial idea of using the inurl parameter is a valid way to look for this.

Thanks for the replies...
 

MickyWall

New Member
Hi,

They won't have the same URL, but they'll all have the same title, meta description etc., and thus won't be unique.
You're right that Google obviously wants to serve up unique, quality pages with well-targeted content.

The site I'm working on at the moment has a lot of duplicate content issues, and searching towards the back end of Google's results, as well as checking GWT, shows some of the duplicate issues.

Can you make the page titles and descriptions dynamic?
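For example, something along these lines in the CMS's templating layer, so every page builds its title and description from its own data instead of one generic string (the field names here are made up):

title = f"{product_name} - {category_name} | Domain.com"
meta_description = f"{product_name} in {category_name}: {short_summary}"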
 

kflanagan28

New Member
Hey

Yeah, for the most part I am looking to create unique page titles etc. for each page. But in some cases it's simply not realistic. There are parts of the site that do not need to be indexed, as they add no real value in terms of organic traffic; they are always going to produce low-quality pages. So I am going to block these via robots.txt.
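Something like this is the plan (the paths and parameters are placeholders for the low-value sections I mean):

User-agent: *
Disallow: /search/
Disallow: /*?sessionid=
Disallow: /*?sort=

Googlebot honours the * wildcard in Disallow rules, so a whole family of dynamic URLs can be blocked with one pattern.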

Cheers for the reply
 

ericstan

New Member
Without knowing the URL of the website, it is very difficult to suggest the best way to find the duplicate content.
Try to find a unique element/word/parameter that appears in the duplicate content you don't need, and search for that using inurl:, intitle:, or just quotes.
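For example (each value is a placeholder for whatever marks out the duplicates on your site):

site:domain.com inurl:sessionid
site:domain.com intitle:"the generic title the CMS generates"
site:domain.com "a sentence that appears on every duplicated page"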
 