SEO Glossary Part 4: Cached Page to Comment Spam - Search Engine Optimization Directory

Fourth entry in the running SEO glossary, continuing alphabetically from “cached page” to “comment spam.”

Cached Page

A cached page is a stored snapshot of a webpage that a search engine or CDN keeps on hand to serve faster or as a fallback when the live version is temporarily unavailable. A search engine’s cached copy reflects the last successful crawl of that page, not necessarily what’s currently live, so cache timestamps are useful for confirming when a page was actually last crawled rather than assuming changes are picked up instantly. Comparing a cached, text-only rendering against the full live version can also expose JavaScript rendering problems, since content that only appears after client-side scripts run may be missing from the simpler cached view. Some pages never get cached at all, typically because a noarchive directive (in a robots meta tag or an X-Robots-Tag HTTP header) tells search engines not to keep a copy, or because a Cache-Control: no-store header tells CDNs and browsers not to store the response. Google removed the public-facing “cached” link from standard search results in 2024, and by that September confirmed that the cache: search operator no longer returns a cached page, so checking how Google last saw a page today means using Search Console’s URL Inspection tool instead.
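As a rough illustration of the noarchive check described above, the sketch below scans a page’s HTML for a robots meta tag and reports whether noarchive is among its directives. The class and function names are hypothetical, it only looks at meta tags (not HTTP headers), and it assumes the HTML has already been fetched.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        # "googlebot" is included as a crawler-specific variant of "robots".
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def has_noarchive(html: str) -> bool:
    """Return True if a robots meta tag in the HTML carries noarchive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives
```

A page can also send the same directive in an X-Robots-Tag response header, so a fuller audit would inspect the headers as well.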

Canonical URL

A canonical URL is the version of a page a site owner wants a search engine to treat as the authoritative one when multiple URLs show identical or near-identical content, declared through a rel="canonical" link element. Search engines treat canonical tags as a strong signal rather than an absolute rule, meaning a canonical hint can be overridden if other signals (like internal linking patterns or a redirect) point elsewhere. Every indexable page should carry a self-referencing canonical tag even when there’s no duplicate, as a baseline declaration of URL preference.
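A minimal sketch of how a self-referencing canonical could be checked, assuming the page’s HTML is already in hand. The function names and the returned labels ("missing", "multiple", "self", "other") are illustrative choices, and the URL comparison is deliberately simple: it ignores fragments and treats any query-string difference as a different URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalParser(HTMLParser):
    """Collects href values from <link rel="canonical"> elements."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        if "canonical" in attr.get("rel", "").lower().split() and attr.get("href"):
            self.canonicals.append(attr["href"])


def _normalize(url):
    # Compare scheme, host (case-insensitive), path, and query; drop fragment.
    parts = urlsplit(url)
    return (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query)


def canonical_status(page_url, html):
    """Classify a page's canonical declaration relative to its own URL."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.canonicals:
        return "missing"
    if len(parser.canonicals) > 1:
        return "multiple"
    target = urljoin(page_url, parser.canonicals[0])  # resolve relative hrefs
    return "self" if _normalize(target) == _normalize(page_url) else "other"
```

With this sketch, a parameterized URL such as a hypothetical https://example.com/shoes?color=red whose tag points at /shoes is reported as "other", which is the expected outcome for a duplicate that consolidates to a preferred URL.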

The mechanics of canonicalization, including how to handle URL parameters, session IDs, cross-domain syndication, and pagination, are covered in more depth in this site’s dedicated canonical URL and URL structure guides, so this entry is intentionally a short definition rather than a full implementation walkthrough.

Chrome

Chrome is Google’s web browser, launched in September 2008. It currently holds the largest share of global browser traffic by a wide margin, which matters for SEO in two practical ways. First, Chrome User Experience Report (CrUX) data, drawn from real Chrome users, is the field data behind Core Web Vitals, which Google uses as a ranking input. Second, Googlebot’s rendering engine is Chromium-based, typically running a version close to, but not always identical to, the latest stable Chrome release. Chrome DevTools also functions as a free, built-in environment for diagnosing rendering, performance, and console errors on any page, and its Lighthouse panel runs the same underlying audit engine that powers PageSpeed Insights, making it a convenient first stop for checking Core Web Vitals issues without leaving the browser.

Citation

A citation, in local SEO, is any mention of a business’s name, address, and phone number (commonly abbreviated NAP) across the web, whether or not that mention includes a clickable link. Structured citations from established business directories and platforms generally carry more weight than scattered, unstructured mentions, and consistency of the NAP details across sources matters more than the sheer number of places a business is listed. A citation without a link can still contribute to local search visibility, which is part of why citation building differs from conventional backlink-focused link building. Inconsistent NAP data, such as a slightly different business name, an old address, or a disconnected phone number, can create genuine confusion for local ranking systems trying to confirm that a business is legitimate and correctly located, even when it appears on only a handful of prominent platforms. That is why a periodic citation audit across major platforms tends to be worth the time for any business competing in local search.
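A citation audit of the kind described above can be sketched as a normalization step followed by a comparison. The function names, the sample data, and the normalization rules here are assumptions for illustration: the sketch lowercases text, strips punctuation, and compares only the last ten phone digits (a North American-style assumption). It does not reconcile abbreviations such as “St” versus “Street”.

```python
import re


def normalize_nap(name, address, phone):
    """Reduce a listing to a comparable (name, address, phone) key."""

    def clean(text):
        # Lowercase, replace punctuation with spaces, collapse whitespace.
        text = re.sub(r"[^\w\s]", " ", text.lower())
        return " ".join(text.split())

    digits = re.sub(r"\D", "", phone)
    # Assumption: the last 10 digits identify the number (drops a country code).
    return (clean(name), clean(address), digits[-10:])


def find_mismatches(reference, listings):
    """Return the sources whose normalized NAP differs from the reference."""
    ref_key = normalize_nap(*reference)
    return sorted(
        source for source, nap in listings.items()
        if normalize_nap(*nap) != ref_key
    )


# Hypothetical data: one listing differs only in formatting, one in name.
reference = ("Acme Plumbing", "12 Main St.", "(555) 010-1234")
listings = {
    "directory_a": ("ACME Plumbing", "12 Main St", "+1 555-010-1234"),
    "directory_b": ("Acme Plumbing LLC", "12 Main St.", "555-010-1234"),
}
mismatched = find_mismatches(reference, listings)  # ["directory_b"]
```

Formatting-only differences are treated as consistent here, while the changed business name is flagged for review.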

Click Depth

Click depth measures how many clicks away from the homepage a given page sits. Pages closer to the homepage tend to receive more crawl priority and more internal link authority than pages buried several levels deep, though the mechanism is really about internal linking proximity rather than a strict click count; a page technically four clicks deep in a rigid menu hierarchy but linked directly from the homepage’s body content behaves, for crawling purposes, much like a shallow page.

There’s no fixed, official Google threshold for how many clicks is “too many.” The commonly cited industry heuristic is a rough three-click guideline, but that’s a general practice convention, not a documented ranking cutoff, and claims of an exact number of clicks beyond which a page “may never get crawled” should be treated as unsourced. What’s better supported is the general pattern: excessive click depth compounds with limited crawl budget on large sites, making buried pages measurably less likely to be crawled and indexed promptly, even without a hard numeric rule. A practical way to reduce click depth without restructuring an entire navigation menu is adding contextual internal links from already-authoritative, well-linked pages directly to deeper content, effectively creating a shortcut that doesn’t require redesigning the site’s formal hierarchy.
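Click depth can be computed as a breadth-first search over a site’s internal link graph. The sketch below assumes the graph is already available as a dictionary mapping each URL to the URLs it links to (for example, from a crawl export); the function name and the sample paths are hypothetical.

```python
from collections import deque


def click_depths(links, home):
    """Return the minimum number of clicks from `home` to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site: a rigid hierarchy puts the guide four clicks deep.
site = {
    "/": ["/category"],
    "/category": ["/category/sub"],
    "/category/sub": ["/category/sub/item"],
    "/category/sub/item": ["/category/sub/item/guide"],
}
deep = click_depths(site, "/")["/category/sub/item/guide"]  # 4

# Adding one contextual link from the homepage creates the shortcut.
site["/"].append("/category/sub/item/guide")
shallow = click_depths(site, "/")["/category/sub/item/guide"]  # 1
```

Pages missing from the returned dictionary are unreachable from the homepage through internal links, which is usually worth investigating separately.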

Cloaking

Cloaking is a black hat technique that shows different content to search engine crawlers than to human visitors, typically by detecting the requester’s user agent and serving a version tailored to game rankings. This differs from legitimate practices like responsive design or approved dynamic serving, where the underlying content is substantively the same regardless of who’s requesting it. Because cloaking explicitly deceives crawlers about what a page actually contains, it’s treated as a serious guideline violation carrying real penalty risk. Misconfigured CDNs or security systems can accidentally produce cloaking-like behavior, so auditing server responses by user agent is a sensible step when a site has unexplained indexing problems. Manual actions for cloaking, once issued, generally require submitting a reconsideration request through Search Console after the underlying cause is fixed, and Google typically expects a clear explanation of what caused the discrepancy and what’s been changed to prevent it recurring.
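One way to audit responses by user agent is to fetch the same URL twice, once with a crawler-style user agent and once with a browser-style one, and compare the visible text. The sketch below covers only the comparison step and assumes both HTML bodies have already been fetched. The function names are hypothetical, the tag stripping is a crude regex approach rather than a full HTML parse, and any cutoff for “too different” would be a judgment call rather than an official threshold.

```python
import difflib
import re


def visible_text(html):
    """Roughly extract visible text: drop script/style blocks, then all tags."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return " ".join(text.split()).lower()


def content_similarity(html_for_crawler, html_for_browser):
    """Return a 0.0-1.0 word-level similarity between two responses."""
    crawler_words = visible_text(html_for_crawler).split()
    browser_words = visible_text(html_for_browser).split()
    return difflib.SequenceMatcher(None, crawler_words, browser_words).ratio()
```

A score near 1.0 suggests both requesters see essentially the same content. A low score does not prove cloaking on its own, since personalization or a bot-challenge page can also cause differences, but it indicates where to look.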

CMS (Content Management System)

A content management system is software that lets people create, edit, organize, and publish web content without writing code directly, by separating content storage from page presentation through templates. CMS platforms vary widely in how SEO-friendly they are out of the box; some (like most mainstream blogging platforms) handle basics like clean URLs and meta tags well by default, while others require significant custom work. Headless CMS setups, which separate the content backend from the front-end display entirely, offer more flexibility but require more technical resources to implement correctly. CMS performance has a direct effect on Core Web Vitals scores regardless of how well content itself is optimized, and migrating between CMS platforms carries real risk of SEO disruption if URL structures change without proper redirects. Plugin and theme bloat is a common, quieter CMS problem: each additional plugin adds its own scripts and stylesheets, and a site accumulating years of installed-but-rarely-audited plugins often carries a meaningful, avoidable performance cost that shows up directly in Core Web Vitals scores.

Comment Spam

Comment spam is the automated or manual posting of irrelevant, promotional, or manipulative comments on blogs, forums, or other user-generated platforms, typically to plant backlinks. Google introduced the nofollow attribute in 2005 specifically to reduce the incentive for this kind of spam, since a nofollowed link doesn’t pass the same ranking credit a normal link does; most mainstream CMS platforms have applied nofollow to comment links by default ever since.
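As a sketch of how a platform might apply this when rendering a commenter’s link, the hypothetical helper below escapes the user-supplied values and attaches a rel attribute. The default value combines nofollow with ugc, the attribute Google later introduced for user-generated content; the function name and the choice to drop non-HTTP(S) links are assumptions for illustration.

```python
from html import escape
from urllib.parse import urlsplit


def render_comment_link(url, text, rel="nofollow ugc"):
    """Render a user-submitted link with a rel attribute and escaped values."""
    # Assumption: only http(s) links are rendered as anchors; anything else
    # (for example javascript: URLs) falls back to plain escaped text.
    if urlsplit(url).scheme not in ("http", "https"):
        return escape(text)
    return '<a href="{}" rel="{}">{}</a>'.format(
        escape(url, quote=True), escape(rel, quote=True), escape(text)
    )
```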

Modern comment spam persists less for direct SEO benefit, given nofollow’s dampening effect, and more for referral traffic or brand visibility. A combination of automated spam filters and CAPTCHA-style verification substantially reduces its volume, and platforms that host large amounts of user-generated content generally need both algorithmic detection and human moderation to keep pace.

How the main crawl-control mechanisms differ

Mechanism	Blocks crawling?	Blocks indexing?	Passes link equity?
robots.txt disallow	Yes	No (URL can still be indexed without content)	No
noindex meta tag / header	No	Yes	Yes (until removed from index)
rel="canonical"	No	Consolidates to preferred URL	Yes, to the canonical target
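To make the “blocks crawling” column concrete, the sketch below uses Python’s standard-library robots.txt parser to test whether a disallow rule would stop a crawler from fetching a URL. The wrapper function, the sample rules, and the ExampleBot user agent are hypothetical. The check says nothing about indexing, which is the distinction the table draws.

```python
from urllib.robotparser import RobotFileParser


def crawl_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: everything under /private/ is off-limits to all bots.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

blocked = crawl_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/private/report.html")
allowed = crawl_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/blog/")
```

Because a disallowed URL is never fetched, a crawler cannot see a noindex directive on that page, which is why the two mechanisms are generally not combined on the same URL.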

Sources cited: Google Search Central (robots meta tag and noindex documentation); Google Search Central (canonicalization)