How can we resolve automatically generated URL errors once and for all?
Today’s “Ask An SEO” question comes from Bhaumik from Mumbai, who asks:
“I have a question about auto-generated URLs. My company had previously used different tools to generate sitemaps. But recently we started creating them manually by selecting the necessary URLs and blocking the others in robots.txt.
We are currently facing an issue with more than 50 auto-generated URLs.
For example, we have a page called “keyword keyword” URL: https://url.com/keyword-keyword/ and we have another Knowledge Center URL page: https://www.url.com/folder/keyword-keyword.
In the coverage issues we see errors under the 5xx series that created totally new URLs something like https://test.url.com/keyword-keyword/keyword-keyword. We tried several ways, but we don’t get the solution for this one.”
It’s an interesting situation you find yourself in.
The good news is that 5xx errors come from the server and tend to resolve themselves, so don’t worry about those.
The cannibalization problem you face is also more common than most people realize.
With e-commerce stores, for example, you can have the same product (or the same collection of products) appear in multiple folders.
So which one is the official version?
The same goes for your situation in the B2B financial space. (I removed your URL above and replaced it with “keyword keyword.”)
This is why search engines have created canonical links.
Canonical links are a way of telling search engines when a page is a duplicate of another and which page is the official page.
Let’s say you sell pink bunny slippers.
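These bunny slippers have their own product page, and they also appear under sale items, under shoes, and under pink, so the same product is reachable at four URLs.
The product page itself, https://url.com/products/pink-bunny-slippers, is the “official version” of the URL.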
This means it must have a canonical link pointing to itself.
The other three pages are duplicate versions. So when you set up their canonical links, each one should point to the official page.
In short, you’ll want to make sure all four pages have <link rel="canonical" href="https://url.com/products/pink-bunny-slippers"> in their head section, as this will de-duplicate them for search engines.
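To make that concrete, here is a minimal Python sketch of how you might check that a page carries the right canonical link. It uses only the standard library. The helper names (CanonicalFinder, canonical_tag, points_to_official) are made up for illustration, and it assumes you already have each page's HTML as a string.

```python
from html import escape
from html.parser import HTMLParser

# The official URL from the example above.
OFFICIAL_URL = "https://url.com/products/pink-bunny-slippers"


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated values; attributes may be None.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def canonical_tag(url):
    """Build the tag that belongs in the <head> of all four pages."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


def points_to_official(html, official_url=OFFICIAL_URL):
    """True only if the page has exactly one canonical link and it is the official URL."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals == [official_url]
```

Running points_to_official over the HTML of the product page and its three duplicates is a quick way to spot a page whose canonical link is missing, repeated, or pointing somewhere else.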
Next, make sure to remove any duplicate versions from your sitemap.
A sitemap is meant to showcase the most important and indexable pages on your website.
You don’t want to include unofficial versions of a page, pages blocked by robots.txt, or non-canonical URLs in your sitemaps.
Search engines don’t crawl your entire website every time, and if you send them to unimportant pages, you’re wasting their limited capacity to crawl and index the pages that matter.
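Since you are building the sitemap by hand, a small script can apply those rules for you. The following is a rough Python sketch, not a finished tool. It assumes you can supply a mapping of each page URL to its canonical URL and the lines of your robots.txt file; the function names sitemap_urls and build_sitemap are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_urls(candidates, robots_lines, user_agent="*"):
    """Keep only URLs that are their own canonical and are not blocked by robots.txt.

    candidates: dict mapping page URL -> canonical URL declared on that page
                (assumed to be collected beforehand).
    robots_lines: the lines of robots.txt as a list of strings.
    """
    robots = RobotFileParser()
    robots.parse(robots_lines)
    keep = []
    for url, canonical in candidates.items():
        if url != canonical:
            continue  # unofficial duplicate: leave it out
        if not robots.can_fetch(user_agent, url):
            continue  # blocked by robots.txt: leave it out
        keep.append(url)
    return keep


def build_sitemap(urls):
    """Serialize the kept URLs as a minimal <urlset> document (no XML declaration)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")
```

The sketch filters first and serializes second, so the list that reaches build_sitemap contains only official, crawlable pages.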
There is another situation that can occur here.
If site search is enabled, it can also create duplicate URLs.
If I type “pink bunny slippers” into your site search box, I’ll likely get a URL with the same keyword phrase in the URL – and also with parameters on it.
This would compound your problem, and your IT team would need to programmatically set canonical links on the search result pages, along with a meta robots tag of noindex, follow.
Another thing to look for: if I click through to the pink bunny slippers page from the search results, those parameters may remain on the URL.
If so, follow the same steps as mentioned above.
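As a rough illustration of what “programmatically” could look like, here is a short Python sketch. It assumes that dropping the query string is enough to recover the official URL on your site, which may not hold if some parameters select real content. The helper names strip_parameters and head_tags are hypothetical, not part of any CMS.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def strip_parameters(url):
    """Drop the query string and fragment, keeping scheme, host, and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def head_tags(canonical_url, is_search_results):
    """Return the <head> tags for a page as a list of strings.

    Every page gets a canonical link; site search result pages also get
    a meta robots tag of noindex, follow.
    """
    tags = [f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">']
    if is_search_results:
        tags.append('<meta name="robots" content="noindex, follow">')
    return tags
```

For a product page reached from site search with parameters still attached, head_tags(strip_parameters(requested_url), False) would yield a canonical link back to the clean product URL.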
Using proper canonical links and ensuring that your sitemap does not contain unofficial pages will help solve the problem of duplicate pages and ensure that you don’t waste a spider’s visit by making it crawl the wrong pages of your site.
Hope that helps!
Featured Image: Leremy/Shutterstock
Editor’s note: Ask an SEO is a weekly SEO tips column written by some of the top SEO experts in the industry, who have been handpicked by Search Engine Journal. Do you have a question about SEO? Fill out our form. You might see your answer in the next #AskanSEO post!