Here is an interesting response from John Mueller of Google on what to do with URLs that appear duplicated because of URL parameters, such as UTM tags at the end of the URL. He said you should definitely not 404 those URLs, which I don’t think anyone would argue with. He also said that you can use rel=canonical, since that is what it was made for. The kicker is that he said it probably doesn’t matter either way for SEO.
A user posted on Twitter:
Hello! I am new to the community but have been in SEO for 5 years. I started a new job as the sole SEO manager and am thinking about the crawl budget. There are 20k crawled but not indexed URLs compared to the 2k that are crawled and indexed. This is not due to error, but due to the high number of UTM/campaign-specific URLs and 404 pages.
I was hoping to balance this crawl budget a bit by blocking the UTM/campaign URLs from being crawled via robots.txt and by turning some of the 404s into 410s.
Can someone help me figure out if this is a good idea or if it could potentially cause harm?
John Mueller replied:
A page that doesn’t exist should return a 404. You don’t gain anything SEO-wise by making them 410. The only reason I have heard that I can follow is that it makes it easier to recognize accidental 404s vs. known removed pages as 410s.
For the whole conversation, you can check out the tweet on Twitter. It will help you learn more about handling URL parameters with canonical URLs.
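To make the rel=canonical suggestion a bit more concrete, here is a minimal Python sketch of how a site might derive a canonical URL from a campaign-tagged one. It is only an illustration, not something from the conversation: the function names `canonical_url` and `canonical_link_tag` are hypothetical, and it assumes that tracking parameters are exactly those whose names start with `utm_` and that the URL fragment can be dropped.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only parameters starting with "utm_" are treated as tracking.
TRACKING_PREFIXES = ("utm_",)


def canonical_url(url: str) -> str:
    """Return the URL without tracking parameters (and without a fragment)."""
    parts = urlsplit(url)
    # Keep every query parameter that is not a tracking parameter.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    # Rebuild the URL; the fragment is dropped (an assumption of this sketch).
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> tag for the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    tagged = "https://example.com/shoes?utm_source=news&color=red"
    print(canonical_link_tag(tagged))
```

The UTM-tagged page keeps returning a normal 200 response. The tag simply points search engines at the clean version of the URL, in line with the advice above not to 404 these URLs.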