In an ideal world, there would be only one version of each page. In practice, the web is messy, and the same content can exist at multiple locations on the same website and on other websites. Years ago, a solution was adopted to help with these duplicate content issues: the “canonical link element,” better known as “rel=canonical” or the “canonical tag.”
How does a canonical tag help with SEO?
The canonical tag helps solve duplicate content issues by setting the preferred version of a page and passing signals such as links to the preferred version. The tag helps consolidate duplicate content caused by issues such as:
- HTTP and HTTPS
- www and non-www
- parameters and faceted navigation
- session IDs
- trailing slashes
- index/default pages
- alternate page versions such as m. or AMP pages or print versions
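To make the list above concrete, here is a minimal Python sketch that collapses several of these variants into one preferred URL. The policy it encodes (HTTPS, non-www, no trailing slash) and the parameter and filename lists are assumptions chosen for illustration, not recommendations from this article. Adjust them to whatever your site treats as preferred.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical policy lists: adjust to the parameters and index files your site uses.
IGNORED_PARAMS = {"sessionid", "sid", "phpsessid"}
INDEX_FILES = {"index.html", "index.htm", "index.php", "default.aspx"}


def preferred_url(url: str) -> str:
    """Collapse common duplicate-URL variants into one preferred form.

    Assumed policy: HTTPS, non-www, no trailing slash (except the root).
    Ports, userinfo and fragments are dropped for brevity.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is already lowercased
    if host.startswith("www."):
        host = host[4:]

    path = parts.path or "/"
    # Drop index/default filenames: /shoes/index.html -> /shoes/
    head, _, last = path.rpartition("/")
    if last.lower() in INDEX_FILES:
        path = head + "/"
    # Remove the trailing slash everywhere except the root.
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/")

    # Strip session-style parameters and sort the rest for a stable order.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in IGNORED_PARAMS]
    return urlunsplit(("https", host, path, urlencode(sorted(kept)), ""))
```

Under these assumptions, http://www.example.com/shoes/index.html?sessionid=abc&color=red and https://example.com/shoes/?color=red both map to the same preferred URL, which is the value you would then place in the canonical tag.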
Where to add a canonical
Most SEOs are aware that you can use a canonical tag in the head section, such as:
<link rel="canonical" href="https://example.com/" />
What you may not realize is that a canonical tag can be set in the HTTP header as well, such as:
HTTP/1.1 200 OK
Link: <https://example.com/>; rel="canonical"
The canonical in the header can be used for any page, but the most common use case is setting a preferred version for non-HTML files such as PDFs, as Google did for its SEO Starter Guide when Dan Sharp hijacked it.
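If you want to check a response for a header-based canonical, the following Python sketch pulls the rel="canonical" target out of a Link header value. The function name is made up for this example, and the parsing is deliberately simplified: it assumes no commas or semicolons appear inside quoted parameter values.

```python
import re
from typing import Optional


def canonical_from_link_header(value: str) -> Optional[str]:
    """Return the first rel="canonical" target in a Link header value, or None.

    Simplified parsing: assumes quoted parameter values contain no ',' or ';'.
    """
    # Each link-value looks like: <URL>; param=value; param="value"
    for match in re.finditer(r'<([^>]*)>((?:\s*;\s*[^,;]+)*)', value):
        url, params = match.group(1), match.group(2)
        for name, val in re.findall(r';\s*([\w-]+)\s*=\s*("[^"]*"|[^\s;,]+)', params):
            # rel may hold several space-separated relation types.
            rels = val.strip('"').lower().split()
            if name.lower() == "rel" and "canonical" in rels:
                return url
    return None
```

Pass it the header value only (everything after "Link:"). With most HTTP clients you can read that value from the response headers and hand it straight to this function.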
‘My canonical tag isn’t working’
The canonical tag is a hint, not a directive, meaning that search engines can ignore it. The canonical version is the version of the page that should be used in sitemaps and internal links, and having conflicting URLs in either place can send mixed signals. A canonical tag may also be ignored if the pages aren’t a close enough match.
Other things can go wrong, such as copying pages without changing the canonical tag or leaving a placeholder in the canonical like “change me” or “replace me.” You should also use absolute URLs, not relative paths, in the canonical to help avoid errors. These mistakes, along with self-referential canonical tags on duplicate pages, can cause multiple pages to tell search engines that each is the preferred version, which doesn’t make sense. If a page has multiple canonical tags that point to different URLs, Google will ignore all of them.
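Several of these mistakes can be caught with a simple check of the page source. The Python sketch below uses only the standard library; the class and function names are hypothetical, and the placeholder list contains just the two strings mentioned above.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlsplit

PLACEHOLDERS = ("change me", "replace me")  # the placeholder strings mentioned in the text


class CanonicalCollector(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels:
            self.hrefs.append((attr.get("href") or "").strip())


def audit_canonicals(html: str) -> List[str]:
    """Return a list of human-readable problems found in the canonical tags."""
    collector = CanonicalCollector()
    collector.feed(html)
    collector.close()

    problems = []
    if len(set(collector.hrefs)) > 1:
        problems.append("multiple conflicting canonical tags")
    for href in collector.hrefs:
        if any(p in href.lower() for p in PLACEHOLDERS):
            problems.append(f"placeholder canonical: {href!r}")
        else:
            parts = urlsplit(href)
            # An absolute URL has both a scheme and a host.
            if not (parts.scheme and parts.netloc):
                problems.append(f"relative canonical: {href!r}")
    return problems
```

This only inspects the raw HTML. It will not see canonicals added by JavaScript or set in the HTTP header, and it cannot tell whether a copied page still carries the canonical of the page it was copied from.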
What will Google do if there are mixed signals such as those just mentioned? It will try to determine the best URL using various signals like the suggested canonicals, internal links and sitemap URLs, but there are other factors, too. For example, it may pick a shorter URL over a longer one or pick HTTPS over HTTP.
Google prefers HTTPS pages over equivalent HTTP pages as canonical, except when there are conflicting signals such as the following (per the Use canonical URLs page in Google’s Search Console Help documentation):
- The HTTPS page has an invalid SSL certificate.
- The HTTPS page contains insecure dependencies.
- The HTTPS page is roboted (and the HTTP page is not).
- The HTTPS page redirects users to or through an HTTP page.
- The HTTPS page has a rel="canonical" link to the HTTP page.
- The HTTPS page contains a noindex robots meta tag.
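The list above works well as an audit checklist. The Python sketch below records the six conditions as flags and reports which ones apply to an HTTPS page. It is only a bookkeeping aid with hypothetical field names, not a reproduction of how Google chooses a canonical; you would still have to determine each flag yourself.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class HttpsPageSignals:
    # Field names are hypothetical; each mirrors one bullet in the list above.
    invalid_certificate: bool = False
    insecure_dependencies: bool = False
    roboted_while_http_is_not: bool = False
    redirects_to_or_through_http: bool = False
    canonical_points_to_http: bool = False
    has_noindex: bool = False


def conflicting_signals(signals: HttpsPageSignals) -> List[str]:
    """Return the names of the conditions that are set, in checklist order."""
    return [f.name for f in fields(signals) if getattr(signals, f.name)]
```

An empty result means none of the listed conflicts were recorded for the HTTPS page.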
A rare case, but certainly one that can happen, is when coding errors cause the head section to end before it should. In this case, a canonical may actually end up in the body content, where it isn’t respected by search engines. What’s worse is that this issue won’t be detected by most tools like Screaming Frog or Deep Crawl, or even by viewing the source. The issue can be identified only by viewing the DOM (Document Object Model) itself, such as when using Inspect in Chrome DevTools.
For example, take a look at Home Depot’s canonical tag in the image below, and you’ll see that the head section has ended — and much of the content that appears in the head when viewing the source is actually in the body when viewing the DOM.
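Inspecting the rendered DOM is the reliable way to confirm this problem, but a rough static check can flag likely cases. The Python sketch below approximates one browser parsing rule: once a start tag that is not allowed in the head (or stray text) appears, the body has begun, and any canonical link after that point lands in the body. The names are hypothetical and the rule is simplified. It ignores anything injected by JavaScript and treats noscript contents as opaque, so treat its output as a prompt to check DevTools rather than a verdict.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Start tags a browser accepts without leaving the head. Any other start tag
# (or non-whitespace text) opens the body in this simplified model.
HEAD_ELEMENTS = {"html", "head", "base", "basefont", "bgsound", "link", "meta",
                 "title", "style", "script", "noscript", "noframes", "template"}
# Elements whose contents this sketch skips entirely.
OPAQUE = {"title", "style", "script", "noscript", "noframes", "template"}


class HeadBreakFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.inside = None          # opaque element currently open, if any
        self.body_opened_by = None  # tag name or "text" that started the body
        self.misplaced = []         # (href, culprit) for canonicals found in the body

    def handle_starttag(self, tag, attrs):
        if self.inside:
            return
        if tag in OPAQUE:
            self.inside = tag
        if self.body_opened_by is None and tag not in HEAD_ELEMENTS:
            self.body_opened_by = tag
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.body_opened_by:
            self.misplaced.append((attr.get("href") or "", self.body_opened_by))

    def handle_endtag(self, tag):
        if tag == self.inside:
            self.inside = None

    def handle_data(self, data):
        if self.inside is None and self.body_opened_by is None and data.strip():
            self.body_opened_by = "text"


def canonicals_outside_head(html: str) -> List[Tuple[str, str]]:
    """Return (href, culprit) pairs for canonical links that follow the start of the body."""
    finder = HeadBreakFinder()
    finder.feed(html)
    finder.close()
    return finder.misplaced
```

The culprit in each pair is the element that ended the head early, which is usually the thing to move or remove.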
A canonical tag can easily be wrong because of one small thing like a spelling mistake or a trailing slash, especially when it is part of a set, as with pagination or hreflang. In these sets, having a different page indexed than the one included in the tags will cause the set of pages not to consolidate as it should. One example is setting the canonical on page 2 of a paginated set to the URL of page 1. Noindex tags and canonicals should also not be used together. I’ve seen instances where the canonical tag seemed to be passing the noindex to the preferred version.
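For pages that belong to a set, a quick consistency check can surface these slips. The Python sketch below assumes you have already collected each page’s canonical and noindex status (for example, from a crawl export). The `PageTags` structure and function name are hypothetical, and the check assumes every page in the set is meant to canonicalize to itself.

```python
from typing import Dict, List, NamedTuple


class PageTags(NamedTuple):
    canonical: str
    noindex: bool = False


def audit_page_set(pages: Dict[str, PageTags]) -> List[str]:
    """Flag pages in a set (pagination, hreflang) whose tags look inconsistent."""
    problems = []
    for url, tags in pages.items():
        if tags.canonical != url:
            # Distinguish a near-miss (trailing slash) from a different target.
            if tags.canonical.rstrip("/") == url.rstrip("/"):
                problems.append(f"{url}: canonical differs only by trailing slash")
            else:
                problems.append(f"{url}: canonical points to {tags.canonical}")
        if tags.noindex:
            problems.append(f"{url}: noindex combined with a canonical tag")
    return problems
```

Each message names the page at fault, so a page 2 that canonicalizes to page 1 shows up directly in the output.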
Did you know canonical tags can be used across different domains? This is actually the preferred method to use when syndicating content. You should also canonicalize alternate versions of your website, such as mobile or AMP versions, back to the main version. Even better, according to Google, you won’t need to change your canonical tags for the upcoming mobile-first index.
Canonicals wouldn’t be needed in an ideal world
While canonicals are useful for consolidating signals across multiple pages, remember that in the ideal world there is only one version. Consolidating pages with other methods like redirects is better in the long run, since you can hopefully get down to a single accessible version of a page.