Article No. 42
How Googlebot Crawls and Renders Your Site
Googlebot is the name for the family of crawlers Google uses to discover and fetch content for Search. In practice, one crawler does nearly all of the work today: Googlebot Smartphone, which mimics a mobile user agent and has been the primary and, for the vast majority of sites, the only crawler used for indexing since Google completed its migration to mobile-first indexing.
That completion date is worth getting right, because a wrong version of it still circulates widely, including on this site until this rewrite. The rollout did not finish in March 2021; that was a target Google announced in 2020 and later extended. Google began rolling out mobile-first indexing in March 2018 (Google Search Central Blog, “Rolling out mobile-first indexing”), announced in May 2019 that it would become the default for newly registered domains effective July 1, 2019, and confirmed the migration was substantially complete in October 2023 (Google Search Central Blog, “Mobile-first indexing has landed”). The final step, moving the small remaining set of desktop-crawled sites over to mobile Googlebot, took effect July 5, 2024 (Google Search Central Blog, June 2024). Since that date, Googlebot Smartphone has effectively been the indexing crawler for all sites; Googlebot Desktop still shows up in logs occasionally for specialized tasks, but it is no longer the basis for indexing and ranking any site.
This guide covers Googlebot’s mechanics: what it is, how it decides what to crawl, how it renders JavaScript, and how to confirm a bot claiming to be Googlebot actually is. It doesn’t cover robots.txt syntax or the URL Inspection tool’s interface; those are their own topics.
What Googlebot Is
Googlebot isn’t a single fixed program, and it isn’t just one crawler either. Google runs several distinct bots under or alongside the Googlebot name, each serving a different purpose:
| Crawler | Purpose |
|---|---|
| Googlebot Smartphone | Primary crawler for virtually all web indexing today |
| Googlebot Desktop | Rarely used for general indexing now; still appears for specialized tasks |
| Googlebot Image | Crawls specifically for Google Images |
| Googlebot Video | Crawls specifically for video content and Google Video results |
| AdsBot | Checks ad landing-page quality; a separate special-case crawler, unrelated to organic indexing |
| Googlebot News | Robots.txt token for Google News; the fetching itself is done by the main Googlebot crawlers |
For the overwhelming majority of technical SEO work, Googlebot Smartphone is the only one of these that matters day to day. Still, an unfamiliar Google-associated user agent in server logs isn’t automatically suspicious; it may simply be one of these specialty crawlers doing its own job.
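Sorting log entries into the rows of that table comes down to matching the user agent tokens Google documents. A minimal sketch, assuming the tokens below (Googlebot/2.1, Googlebot-Image, Googlebot-Video, AdsBot-Google) still match Google’s current crawler list; the function name is hypothetical, and Googlebot News gets no branch because, as far as we know, it uses the main Googlebot user agents rather than a string of its own. The label is only what the client claims, not proof that the request came from Google.

```python
def classify_google_ua(user_agent: str) -> str:
    """Label a user agent with the crawler it *claims* to be (no verification)."""
    ua = user_agent.lower()
    # Order matters: specialty tokens are checked before the generic Googlebot one.
    if "adsbot-google" in ua:  # also matches AdsBot-Google-Mobile
        return "AdsBot"
    if "googlebot-image" in ua:
        return "Googlebot Image"
    if "googlebot-video" in ua:
        return "Googlebot Video"
    if "googlebot/" in ua:
        # Smartphone and Desktop both send "Googlebot/2.1"; the smartphone
        # string additionally presents itself as a mobile Android browser.
        if "android" in ua and "mobile" in ua:
            return "Googlebot Smartphone"
        return "Googlebot Desktop"
    return "Not a Googlebot user agent"
```

Because the user agent is trivially spoofable, a label like "Googlebot Smartphone" here is a starting point for the DNS verification described later, not a substitute for it.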
Googlebot renders pages with an “evergreen” engine: it runs on the same engine as current stable Chrome, and that engine updates automatically as Chrome itself updates (Google Search Central Blog, “The new evergreen Googlebot”). Before this change in 2019, Googlebot rendered pages with a Chrome 41 engine that was years out of date and couldn’t handle much modern JavaScript. Because the renderer is evergreen, any specific Chrome version number written down as “current” is guaranteed to go stale: whatever version is accurate today will be wrong within weeks. To check current rendering behavior for a specific page, look at the rendered HTML and screenshot from a live URL Inspection test, or check the site’s own server logs for the user agent string Googlebot is currently sending.
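Reading the advertised Chrome version out of logged user agents is one way to follow the evergreen renderer without hardcoding a number. A small sketch, assuming the Chrome/&lt;version&gt; token appears in Googlebot’s user agent the way it does in Google’s published strings; the function names are hypothetical, and the version number in the test data is made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Matches the numeric, dot-separated version after "Chrome/".
# Google's docs show a placeholder ("Chrome/W.X.Y.Z"), which deliberately won't match.
_CHROME_TOKEN = re.compile(r"Chrome/(\d+(?:\.\d+)*)")


def googlebot_chrome_version(user_agent: str) -> Optional[str]:
    """Return the Chrome version a Googlebot user agent advertises, or None."""
    if "Googlebot" not in user_agent:
        return None
    match = _CHROME_TOKEN.search(user_agent)
    return match.group(1) if match else None


def versions_seen(user_agents: Iterable[str]) -> Counter:
    """Count advertised Chrome versions across many logged user agents."""
    return Counter(
        v for v in map(googlebot_chrome_version, user_agents) if v is not None
    )
```

Calling versions_seen(...).most_common(1) on a recent log sample gives the version Googlebot is advertising right now, which is exactly the number not worth writing into an article.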
How Googlebot Decides What and How Often to Crawl
Crawl frequency isn’t uniform across a site. Googlebot allocates more frequent crawling to URLs it judges to be higher-value or more likely to have changed, based on signals like internal link structure, historical update patterns, and how often past crawls have found meaningful changes. On the same domain, a homepage that changes daily and carries links from dozens of other pages might get crawled multiple times a day, while a static About page with one internal link and no history of changes might go weeks between crawls. This ties into crawl demand, one of the two factors (alongside crawl capacity) that determine a site’s overall crawl budget. The strategy for managing crawl budget at scale, relevant mainly to very large or fast-changing sites, is a separate topic from Googlebot’s mechanics.
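Whether a site’s own URLs follow that pattern can be measured from access logs. A minimal sketch, assuming the input is already a list of (URL, timestamp) pairs for requests verified as real Googlebot; the function name is hypothetical, and the output is an observed average for the sample, not a figure Google reports.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def mean_recrawl_hours(hits: Iterable[Tuple[str, datetime]]) -> Dict[str, float]:
    """Average hours between consecutive Googlebot fetches, per URL.

    URLs fetched only once in the sample have no interval and are omitted.
    """
    by_url: Dict[str, List[datetime]] = defaultdict(list)
    for url, fetched_at in hits:
        by_url[url].append(fetched_at)

    result = {}
    for url, stamps in by_url.items():
        if len(stamps) < 2:
            continue
        stamps.sort()
        gaps = [(later - earlier).total_seconds() / 3600
                for earlier, later in zip(stamps, stamps[1:])]
        result[url] = mean(gaps)
    return result
```

Sorting the result by value puts the frequently recrawled pages (often the homepage) at one end and the rarely revisited ones at the other, which is the uneven distribution described above.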
Two-Wave Indexing: HTML Crawl vs. JS Rendering
Google processes pages that rely on JavaScript to load content in two passes. The first pass fetches the raw HTML response and can index whatever content it contains without executing any JavaScript. The second pass queues the page for rendering: Googlebot’s Web Rendering Service executes the page’s JavaScript, much as a browser would, and Google updates its understanding of the page based on what that rendering produces.
The gap between these two passes isn’t fixed. Google has described it as ranging from a short delay to a much longer one depending on rendering queue load at the time, a concept Google’s own developer advocates have discussed publicly since introducing the “two waves of indexing” framing at Google I/O in 2018. For a JavaScript-heavy site, this means content that only appears after client-side rendering can take meaningfully longer to be reflected in Google’s index than content present in the initial HTML. Server-side rendering, or hybrid approaches that deliver meaningful content in the initial HTML response, sidestep this delay because Google can index that content during the first pass without waiting on the second.
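One way to see which content depends on the second pass is to diff the text in the raw HTML response against the text in the rendered HTML. A rough sketch using Python’s standard-library HTMLParser, assuming the rendered HTML comes from somewhere like a live URL Inspection test or a headless browser; the word-level comparison is a deliberate simplification, and the names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Set


class _VisibleText(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def _words(html: str) -> Set[str]:
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return set(" ".join(parser.chunks).lower().split())


def render_only_words(raw_html: str, rendered_html: str) -> Set[str]:
    """Words visible after rendering that were absent from the raw HTML."""
    return _words(rendered_html) - _words(raw_html)
```

An empty result suggests the important text was already available in the first pass; a large result is the content whose indexing waits on the rendering queue.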
A specific JavaScript execution timeout of “around 5 seconds” circulates widely in SEO content as if it were an official Google figure. It isn’t; it’s a commonly repeated industry estimate that Google has not confirmed as a current, precise number in its own documentation, and it shouldn’t be presented as one.
A practical example of why this distinction matters: a single-page application that renders its entire product catalog client-side, with the initial HTML response containing little more than an empty <div id="root"> and a bundle of JavaScript, gives Google almost nothing to work with during the first pass. Every product page on that catalog depends entirely on the second, rendering pass to be understood at all, which means the whole catalog’s indexing is gated behind rendering queue capacity in a way a server-rendered equivalent wouldn’t be. This is the practical reason JavaScript-heavy architectures get flagged as an indexing risk: not because Google can’t render JavaScript at all (it generally can), but because doing so adds a queue dependency that a simpler architecture avoids entirely.
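A quick way to spot that pattern is to check whether the app’s mount point is empty in the raw HTML. A heuristic sketch, assuming the mount point is identified by an id such as root (a common default, but frameworks and projects vary, so container_id is an assumption to adjust); the names are hypothetical, and the check inspects only the initial response, saying nothing about how Google renders the page.

```python
from html.parser import HTMLParser
from typing import Optional


class _MountPointCheck(HTMLParser):
    """Records whether the element with the given id has any content."""

    def __init__(self, container_id: str) -> None:
        super().__init__()
        self.container_id = container_id
        self.inside = False                 # just past the container's opening tag
        self.empty: Optional[bool] = None   # None until the container is seen

    def _decide(self, empty: bool) -> None:
        self.empty = empty
        self.inside = False

    def handle_starttag(self, tag, attrs):
        if self.empty is not None:
            return                          # already decided; ignore the rest
        if self.inside:
            self._decide(False)             # a child element inside the container
        elif dict(attrs).get("id") == self.container_id:
            self.inside = True

    def handle_endtag(self, tag):
        if self.inside:
            self._decide(True)              # container closed with nothing in it

    def handle_data(self, data):
        if self.inside and data.strip():
            self._decide(False)             # non-whitespace text in the container


def is_empty_shell(raw_html: str, container_id: str = "root") -> bool:
    """True only if the mount point exists and is empty in the initial HTML."""
    checker = _MountPointCheck(container_id)
    checker.feed(raw_html)
    checker.close()
    return checker.empty is True
```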
Verifying Real Googlebot vs. Spoofed Bots
Because Googlebot’s user agent string is public, anything can claim to be Googlebot in its request headers, including scrapers and bad actors probing for vulnerabilities. Google publishes an official method for telling the two apart, based on DNS rather than the user agent string alone (Google Search Central, “Verifying Googlebot”):
- Run a reverse DNS lookup on the IP address that made the request, and confirm the resulting domain name ends in googlebot.com, google.com, or googleusercontent.com.
- Run a forward DNS lookup on that domain name, and confirm it resolves back to the same IP address the original request came from.
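The two steps translate directly into code. A minimal sketch using Python’s socket and ipaddress modules; the lookups are passed in as parameters so the logic can be exercised with stub resolvers, and the function names are hypothetical. The domain check matches on a dot boundary, so a name like evilgooglebot.com or googlebot.com.attacker.example doesn’t pass.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Tuple

# Domains accepted for the reverse DNS result, per the steps above.
GOOGLE_CRAWLER_DOMAINS = ("googlebot.com", "google.com", "googleusercontent.com")


def hostname_is_google(hostname: str) -> bool:
    """True if hostname is one of the domains above or a subdomain of one."""
    host = hostname.rstrip(".").lower()  # PTR answers may carry a trailing dot
    return any(host == d or host.endswith("." + d) for d in GOOGLE_CRAWLER_DOMAINS)


def _forward_addresses(hostname: str) -> Iterable[str]:
    """All IPv4/IPv6 addresses the hostname resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def verify_googlebot(
    ip: str,
    reverse_lookup: Callable[[str], Tuple] = socket.gethostbyaddr,
    forward_lookup: Callable[[str], Iterable[str]] = _forward_addresses,
) -> bool:
    try:
        claimed = ipaddress.ip_address(ip)
    except ValueError:
        return False  # not an IP address at all
    # Step 1: reverse DNS, then check the domain it points to.
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    if not hostname_is_google(hostname):
        return False
    # Step 2: forward DNS on that hostname must lead back to the same address.
    try:
        resolved = {ipaddress.ip_address(a) for a in forward_lookup(hostname)}
    except (OSError, ValueError):
        return False
    return claimed in resolved
```

Each check costs two DNS round trips, so caching the result per IP is worth adding before running this over a large log.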
If both steps check out, the request genuinely came from Google’s infrastructure. This two-step process, not the user agent string (which is trivially spoofable), is the method Google itself points to. It’s underused given how useful it is for anyone dealing with suspected bot abuse or trying to confirm Googlebot is actually reaching a given page.
A worked example: a server log shows a spike in requests from a user agent claiming to be “Googlebot/2.1” hitting /wp-login.php and other paths that have nothing to do with normal page crawling. A reverse DNS lookup on the source IP returns a hostname like 123-45-67-89.some-hosting-provider.example rather than something like crawl-123-45-67-89.googlebot.com. That fails the first check immediately, confirming the traffic comes from a scraper or bot spoofing Googlebot’s user agent string, not from Google. That IP can be blocked or rate-limited at the server level without any risk of blocking real Googlebot traffic.
Practical Implications: Server Response Time and Monitoring
Server response time feeds into Google’s crawl capacity calculations: a server that responds quickly and reliably lets Google fetch more URLs in the same crawl window, while a slow or error-prone server causes Google to lower its request rate to avoid adding load to a struggling system. Watching average response time and error rates in the Crawl Stats report over time catches a developing problem more reliably than waiting for a visible drop in indexed pages, because crawl-rate throttling tends to show up first. A site whose average response time climbs from 400ms to 1,200ms over a few weeks is showing a server problem well before it shows up as fewer indexed pages.
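The arithmetic behind that example is simple: 1,200ms against a 400ms baseline is a threefold slowdown. A small sketch of a trend check, assuming weekly average response times (read off the Crawl Stats report or computed from logs that record response time); the function names, the two-week baseline, and the 1.5x alert threshold are arbitrary assumptions, not Google figures.

```python
from statistics import mean
from typing import Sequence


def slowdown_ratio(weekly_avg_ms: Sequence[float], baseline_weeks: int = 2) -> float:
    """Latest weekly average divided by the mean of the first `baseline_weeks` weeks."""
    if len(weekly_avg_ms) <= baseline_weeks:
        raise ValueError("need at least one week after the baseline period")
    baseline = mean(weekly_avg_ms[:baseline_weeks])
    return weekly_avg_ms[-1] / baseline


def should_alert(weekly_avg_ms: Sequence[float], threshold: float = 1.5) -> bool:
    """Hypothetical rule: flag when the latest week exceeds threshold x baseline."""
    return slowdown_ratio(weekly_avg_ms) > threshold
```

Comparing against an early baseline rather than the previous week catches a slow, steady climb that never looks alarming week to week.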
Understanding what Googlebot is and how it behaves is the foundation for the other technical SEO decisions on a site. The one check worth running now, rather than filing away: pull a recent server log sample and confirm that requests claiming to be Googlebot actually pass the reverse-DNS verification above. That single check keeps spoofed-bot traffic from being mistaken for real crawl activity, and it’s the concrete first step before acting on the topics these mechanics underpin (robots.txt rules, sitemap structure, crawl budget management).
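Pulling the claims out of a log sample is mostly parsing. A minimal sketch, assuming Apache/Nginx “combined” log format (client IP first, user agent as the last quoted field); the function name is hypothetical, and verify stands for any function that takes an IP string and returns True only when it passes the reverse-and-forward DNS check described above. Each claimed IP is verified once, which keeps the number of DNS lookups down.

```python
import re
from typing import Callable, Iterable, Set, Tuple

# Combined log format: IP ident user [time] "request" status bytes "referer" "user-agent"
_COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def audit_googlebot_claims(
    lines: Iterable[str], verify: Callable[[str], bool]
) -> Tuple[Set[str], Set[str]]:
    """Split IPs whose requests claim to be Googlebot into (verified, spoofed)."""
    claimed = set()
    for line in lines:
        match = _COMBINED.match(line)
        if match and "googlebot" in match.group("ua").lower():
            claimed.add(match.group("ip"))
    verified = {ip for ip in claimed if verify(ip)}  # one check per distinct IP
    return verified, claimed - verified
```

The spoofed set is the list of candidates to block or rate-limit; lines that don’t match the assumed format are silently skipped, so a different log format needs a different pattern.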