Article No. 42

Google Search Console Indexing Issues: A Guide to the Page Indexing Report

The Page Indexing report (Search Console’s central indexing dashboard, sometimes still called the Coverage report from its earlier name) groups every URL Google knows about into an indexed or excluded bucket, and for anything excluded, assigns one of a fixed set of exclusion-reason categories. Each category means something different, has a different real cause, and needs a different fix. This guide walks through what the report shows and how to act on each category. It isn’t a general troubleshooting guide; it’s specifically about reading and using this one report, and for topics that have their own dedicated deep-dive (robots.txt syntax, redirect mechanics, soft-404 detection, canonical tags), it points to those rather than re-explaining them here.

What the Report Shows and How to Read It

The report splits into two headline numbers: indexed URLs and not-indexed URLs, with the not-indexed side broken into named exclusion reasons, each showing a URL count and a trend line over time. Google’s own reference for these category definitions lives at support.google.com/webmasters/answer/7440203. Clicking into any category shows a sample of affected URLs (not necessarily the complete list on very large sites) along with the option to validate a fix once changes have been made.

A meaningful chunk of the value here is trend-reading, not just the current snapshot. A category that’s been flat for months isn’t urgent. A category climbing steadily, especially “Crawled – currently not indexed” or “Discovered – currently not indexed,” is worth investigating before it grows further, since it often signals something systemic (a template change, a content-quality issue affecting a whole section) rather than a handful of isolated pages.
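Trend-reading can be made concrete with a small script. The following is a minimal sketch, assuming the weekly counts have been copied by hand from each category's chart; the function name, the sample numbers, and the 20% growth threshold are all illustrative choices, not values Google publishes.

```python
from typing import Dict, List


def climbing_categories(
    history: Dict[str, List[int]], min_growth: float = 0.20
) -> List[str]:
    """Return categories whose latest count grew by at least min_growth.

    history maps a category name to URL counts ordered oldest to newest.
    The 20% default is an arbitrary illustration, not a Google threshold.
    """
    flagged = []
    for category, counts in history.items():
        if len(counts) < 2:
            continue  # a single data point has no trend
        first, last = counts[0], counts[-1]
        # max(first, 1) avoids dividing by zero for categories that start empty.
        growth = (last - first) / max(first, 1)
        if growth >= min_growth:
            flagged.append(category)
    return flagged


# Hypothetical numbers, for illustration only.
weekly_counts = {
    "Crawled - currently not indexed": [400, 450, 520, 610],
    "Page with redirect": [120, 121, 119, 120],
}
print(climbing_categories(weekly_counts))
```

With the sample data, only the "Crawled" category is flagged: it grew from 400 to 610, while "Page with redirect" stayed flat.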

“Discovered – Currently Not Indexed”

This means Google knows the URL exists, usually from a sitemap, an internal link, or an external link, but has not yet crawled it. This is a queue-priority signal more than a content-quality one: it usually means Google’s crawl-demand assessment put this URL lower in line, either because the site as a whole has more URLs than Google currently prioritizes crawling quickly, or because this specific URL doesn’t look important enough (weak internal linking to it, no external signals) to jump the queue.

A typical example: a large product catalog publishes 2,000 new SKUs in a single sitemap update. A meaningful share of them can sit in “Discovered” for days or weeks, not because anything is wrong with the pages, but because Google’s crawl demand for a site of that size hasn’t caught up to the sudden volume increase. That’s a different situation from a single new blog post on a small site sitting in “Discovered,” which more often points to weak internal linking (the post exists only in the sitemap, with no other page on the site actually linking to it).

Fix: strengthen internal linking to the affected URLs from pages Google already crawls often, confirm the URL is in an accurate, current sitemap, and if the volume is large, revisit whether crawl budget is genuinely a constraint for the site (a separate, dedicated topic, relevant mainly to very large sites).
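The "exists only in the sitemap" case can be checked mechanically. This is a rough sketch, assuming a crawl of the site has already produced a map of which pages link to which; the function name and the example.com URLs are hypothetical, and the crawl itself is out of scope here.

```python
from typing import Dict, Set


def sitemap_only_urls(
    sitemap_urls: Set[str], internal_links: Dict[str, Set[str]]
) -> Set[str]:
    """Return sitemap URLs that no other page on the site links to.

    internal_links maps a source page to the set of URLs it links to.
    """
    linked_to: Set[str] = set()
    for source, targets in internal_links.items():
        # Ignore self-links: a page linking to itself adds no discovery path.
        linked_to.update(t for t in targets if t != source)
    return sitemap_urls - linked_to


# Hypothetical data, for illustration only.
sitemap = {
    "https://example.com/",
    "https://example.com/blog/old-post",
    "https://example.com/blog/new-post",
}
links = {
    "https://example.com/": {"https://example.com/blog/old-post"},
    "https://example.com/blog/old-post": {"https://example.com/"},
}
print(sitemap_only_urls(sitemap, links))
```

Here the new post is the only URL returned, which makes it the first candidate for an internal link from a frequently crawled page.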

“Crawled – Currently Not Indexed”

This is a meaningfully different situation from “Discovered,” and conflating the two is one of the most common mistakes in interpreting this report. “Crawled” means Google did fetch the page, evaluated it, and made an active decision not to add it to the index. That’s almost always a content-quality or duplication judgment rather than a queue-priority one: thin content, content that closely duplicates another page on the site or elsewhere on the web, or a page Google judged not valuable enough to index given everything else it already has on a similar topic.

Fix: this category responds to content changes, not technical tweaks. That means substantively improving or consolidating thin pages, differentiating near-duplicate content, or, in some cases, accepting that a low-value page (an auto-generated tag page, for instance) genuinely doesn’t need to be indexed and removing it from the sitemap rather than continuing to request its indexing.

The distinction between these first two categories is worth restating in one sentence, since it’s the one most worth internalizing from this whole report: “Discovered” is a queue problem (Google hasn’t looked yet), “Crawled” is a judgment problem (Google looked and passed). Throwing internal links at a “Crawled – currently not indexed” problem, or rewriting content to fix a pure “Discovered” queue backlog, treats the wrong cause and won’t move the number.

“Duplicate Without User-Selected Canonical”

Google identified this URL as a duplicate of another page on the site, but the page declares no preferred canonical (no canonical tag or equivalent signal), giving Google no clear preference to follow, so Google picked a canonical version on its own.

Fix: add an explicit canonical tag to each URL in the duplicate set, pointing to whichever version should represent the group. The mechanics of canonical tag implementation, including cross-domain and parameter-based cases, are their own topic; the relevant action here is simply making sure every duplicate cluster has an unambiguous, explicit canonical rather than leaving Google to decide.
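Checking whether a page carries an explicit, unambiguous canonical can be scripted. Below is a minimal sketch using only Python's standard library; the class and function names are made up for illustration. It looks only at link tags in the HTML and does not consider canonicals declared through HTTP headers or sitemaps.

```python
from html.parser import HTMLParser
from typing import List, Optional


class CanonicalFinder(HTMLParser):
    """Collect href values from <link rel="canonical"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def declared_canonical(html: str) -> Optional[str]:
    """Return the one declared canonical URL, or None if absent or conflicting."""
    finder = CanonicalFinder()
    finder.feed(html)
    unique = set(finder.canonicals)
    return unique.pop() if len(unique) == 1 else None


# Hypothetical page markup, for illustration only.
page = '<html><head><link rel="canonical" href="https://example.com/shoes"></head></html>'
print(declared_canonical(page))  # https://example.com/shoes
```

Returning None for both "missing" and "conflicting" is a deliberate simplification: either way, the page is not giving a single clear preference.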

“Duplicate, Google Chose Different Canonical Than User”

A related but distinct category: the page does have an explicit, user-declared canonical tag, but Google evaluated the duplicate cluster and selected a different URL as canonical anyway. This happens when Google’s own signals (internal linking patterns, backlinks, historical indexing) point more strongly to a different URL than the one the site declared. It’s a signal worth taking seriously rather than dismissing, since it usually means the declared canonical isn’t actually the version Google (and by extension, users) treat as most authoritative.

Fix, in report context: check whether the URL Google selected is genuinely the better canonical; if so, updating the site’s own canonical tags to match resolves the disagreement. If the declared canonical really is correct, strengthening internal links and other signals toward it can shift Google’s selection over time, though there’s no guaranteed timeline for that.

“Soft 404”

Google is treating this URL as functionally a “not found” page even though it returned a success status code, usually because the content looks thin, generic, or “not found”-styled despite the 200 response. In this report it appears as an indexing exclusion, distinct from an actual 404 status (which has its own separate category). The full explanation of what causes a soft 404 and how to resolve the content/status-code mismatch is covered in this site’s dedicated 404 management guide; the short version in this context is that the fix is either serving a genuine 404/410 for pages that are actually gone, or substantively building out the content if the page is meant to be live.

“Blocked by Robots.txt”

Google found a reference to this URL but was not permitted to crawl it under the site’s current robots.txt rules. Being blocked doesn’t guarantee the URL stays out of search entirely; if it’s linked to from elsewhere, Google can still list it based on external signals without having read its content, typically showing no snippet.

Fix, in report context: confirm whether the block is intentional. If the URL should be crawlable, adjust the robots.txt rule; robots.txt directive syntax, precedence, and testing are covered in this site’s dedicated robots.txt guide. If the block is intentional and the goal is actually keeping the URL out of the index (not just uncrawled), robots.txt alone isn’t the right tool for that; a meta robots noindex tag is, and Google can only see that tag if it is allowed to crawl the page, so the robots.txt block has to be lifted for the noindex to take effect.
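A quick way to confirm whether a given rule set blocks a URL is Python's standard-library robots.txt parser. This is a sketch under one important assumption: urllib.robotparser does not reproduce Google's matching behavior exactly (for example, it does not handle wildcard patterns the way Google does), so it is a first-pass check for simple prefix rules, not a substitute for Google's own tooling. The function name and sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text disallows url for user_agent.

    Uses Python's parser, which only approximates Google's rule matching.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical rules, for illustration only.
rules = """User-agent: *
Disallow: /private/
"""
print(is_blocked(rules, "https://example.com/private/report.html"))  # True
print(is_blocked(rules, "https://example.com/blog/post"))  # False
```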

“URL Marked ‘Noindex’” and the 4xx-Status Categories

A handful of other categories round out the report and are usually self-explanatory once named correctly. “URL marked ‘noindex’” means Google crawled the page, found a noindex directive in the meta robots tag or X-Robots-Tag header, and honored it exactly as instructed; this is usually intentional and not a bug, though it’s worth periodically auditing noindex’d URLs to confirm none of them were meant to be indexable. “Not found (404)” and “Blocked due to access forbidden (403)” report exactly what they say: the URL returned that status when Google requested it. “Blocked due to unauthorized request (401)” typically shows up when a staging or password-protected environment gets accidentally crawled, or when authentication requirements changed without updating the corresponding robots.txt or noindex rules for pages that should stay private. None of these require deep investigation beyond confirming the status code matches the page’s intended state.
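For a periodic noindex audit, it helps to check both places the directive can live. The sketch below is a simplified illustration, assuming the page HTML and response headers have already been fetched; the names are hypothetical, and it ignores user-agent-scoped header values such as "googlebot: noindex".

```python
from html.parser import HTMLParser
from typing import Dict, List


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() in ("robots", "googlebot"):
            content = (attr_map.get("content") or "").lower()
            self.directives.extend(d.strip() for d in content.split(","))


def has_noindex(html: str, headers: Dict[str, str]) -> bool:
    """Return True if a noindex directive appears in the meta tag or the header."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    header_directives: List[str] = []
    for key, value in headers.items():
        if key.lower() == "x-robots-tag":
            header_directives.extend(d.strip() for d in value.lower().split(","))
    # "none" is shorthand for "noindex, nofollow".
    blocking = {"noindex", "none"}
    return bool(blocking & set(finder.directives + header_directives))


# Hypothetical inputs, for illustration only.
print(has_noindex('<meta name="robots" content="noindex, follow">', {}))  # True
print(has_noindex("<p>Hello</p>", {"X-Robots-Tag": "noindex"}))  # True
print(has_noindex("<p>Hello</p>", {}))  # False
```

Running this over the URLs that are supposed to be indexable (the sitemap is a reasonable source) surfaces any page where a noindex was left in place by mistake.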

“Server Error (5xx)” and “Page with Redirect”

Server error (5xx) means Google’s crawl attempt hit a server-side error. Isolated, occasional 5xx responses are normal under real-world server load; a persistent or climbing trend in this category signals a genuine infrastructure problem, since Google will pull back its crawl rate on a site that’s throwing frequent server errors, compounding the original issue.

Page with redirect means this specific, non-canonical URL redirects elsewhere, which is expected and correct behavior for a properly implemented redirect; it shows up here because the report is listing why the URL itself isn’t the indexed version, not because something is wrong. A related but distinct category, “Redirect error,” covers cases where the redirect chain is too long, loops, or is otherwise malformed. Both the mechanics of choosing the right redirect status code and the process for detecting and fixing chains and loops are covered in this site’s dedicated redirect guide; in the report itself, these categories are mainly useful for spotting redirect problems at scale rather than one at a time.
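The difference between a healthy redirect and a "Redirect error" can be illustrated with a small tracer. This is a minimal sketch, assuming the site's redirects have been collected into a simple source-to-target map; the function name, the status labels, and the default hop limit are illustrative assumptions rather than anything the report exposes.

```python
from typing import Dict, List, Tuple


def trace_redirects(
    start: str, redirects: Dict[str, str], max_hops: int = 10
) -> Tuple[List[str], str]:
    """Follow start through a redirect map and classify the result.

    Returns (path, status) where status is "ok", "loop", or "too_long".
    max_hops is an illustrative limit chosen for this sketch.
    """
    path = [start]
    seen = {start}
    current = start
    while current in redirects:
        current = redirects[current]
        path.append(current)
        if current in seen:
            return path, "loop"  # we came back to a URL already visited
        seen.add(current)
        if len(path) - 1 > max_hops:
            return path, "too_long"
    return path, "ok"


# Hypothetical redirect maps, for illustration only.
print(trace_redirects("/a", {"/a": "/b", "/b": "/c"}))
# (['/a', '/b', '/c'], 'ok')
print(trace_redirects("/x", {"/x": "/y", "/y": "/x"}))
# (['/x', '/y', '/x'], 'loop')
```

Run across every redirecting URL on a site, this kind of check finds chains and loops in bulk, which matches how the report categories are most useful.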

Realistic Timelines and What Request Indexing Does

Google’s general public guidance is that reprocessing a fixed URL, from validation request to a status update in the report, takes on the order of days to weeks, not hours. Requesting indexing on an individual URL through URL Inspection submits it into the crawl queue with elevated priority; it does not bypass the queue entirely or guarantee same-day reprocessing. That per-URL route is workable for a handful of URLs. For anything larger, using Search Console’s “Validate Fix” option on a specific exclusion category triggers Google to recrawl the affected sample and update the report once that’s done, which is generally the right approach after a fix rather than requesting indexing on each URL individually.

One category of claim worth actively distrusting: any specific, invented benchmark for “typical” percentage of crawled pages that get indexed (a “60-80%” figure, for instance, has circulated widely without a citable Google source). Google has not published an official baseline like this, and the right amount of indexation depends entirely on a given site’s content quality and duplication level, not a fixed industry percentage. A site with a large volume of legitimately thin or near-duplicate pages (faceted navigation, auto-generated location pages with little unique content) should expect a lower indexed share than a site built entirely from substantive, differentiated pages, and neither number says anything about site quality on its own without that context.

It’s also worth being realistic about what “Validate Fix” actually confirms. A successful validation means Google re-evaluated the sample of URLs shown in that category and no longer finds the same problem; it does not mean every affected URL on a large site was individually reviewed, and a validation can pass on the sampled URLs while a smaller residual set with the same issue remains elsewhere. Spot-checking a few additional URLs outside the sample after a validation passes is a reasonable extra step on large sites.

How to Prioritize When You Have Multiple Issue Types

When several exclusion categories are showing meaningful URL counts at once, a reasonable priority order:

  1. Server errors first. A climbing 5xx trend actively degrades Google’s willingness to crawl the rest of the site, so it compounds every other problem on this list until it’s fixed.
  2. Blocked-by-robots.txt and redirect errors next, since these are usually simple, mechanical misconfigurations with a fast, low-risk fix once identified.
  3. Duplicate-canonical issues, since resolving them is mostly a matter of adding explicit canonical tags rather than producing new content.
  4. Crawled/Discovered – currently not indexed last, not because they’re unimportant, but because they typically require either content-quality work or internal-linking changes that take longer to implement and longer to show results after the fix.

This order isn’t arbitrary; it follows how much each category’s problem compounds the others. A site actively throwing server errors will show worse numbers across every other category too, since Google is crawling it less overall during that period, which makes it look like a broader indexing crisis than it actually is once the underlying server issue is fixed. Working top-down through this list, rather than jumping straight to whichever category has the highest URL count, tends to resolve the report faster overall, since fixing an upstream cause often quietly improves a downstream category’s numbers on its own.
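The priority order above can be expressed as a simple sort. This is a sketch only: the tier numbers mirror the four-step list in this section, the category counts are invented, and any category not named in that list falls into a catch-all tier at the end.

```python
from typing import Dict, List, Tuple

# Tiers follow the four-step order described in this section; lower runs first.
PRIORITY = {
    "Server error (5xx)": 1,
    "Blocked by robots.txt": 2,
    "Redirect error": 2,
    "Duplicate without user-selected canonical": 3,
    "Duplicate, Google chose different canonical than user": 3,
    "Crawled - currently not indexed": 4,
    "Discovered - currently not indexed": 4,
}
UNLISTED_TIER = 5  # assumption: categories outside the list are handled last


def work_queue(counts: Dict[str, int]) -> List[Tuple[str, int]]:
    """Order non-empty categories by tier, then by URL count within a tier."""
    active = [(name, n) for name, n in counts.items() if n > 0]
    return sorted(
        active, key=lambda item: (PRIORITY.get(item[0], UNLISTED_TIER), -item[1])
    )


# Hypothetical counts, for illustration only.
report = {
    "Crawled - currently not indexed": 900,
    "Server error (5xx)": 12,
    "Redirect error": 40,
    "Blocked by robots.txt": 75,
    "Soft 404": 0,
}
for name, count in work_queue(report):
    print(name, count)
```

In the sample, the 12 server-error URLs come first even though "Crawled" has 900, which is the point of sorting by tier before count.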

The report’s real value is turning a vague sense that “some pages aren’t indexed” into a specific, named reason with a specific fix. Treating each category by its actual definition, rather than applying the same generic “improve content quality” advice across all of them, is what separates a fast fix from weeks of guessing.
