SEO Glossary Part 3: Black Hat to Cache - Search Engine Optimization Directory

Third entry in the running SEO glossary, continuing alphabetically from “black hat” to “cache.” These eight terms share no theme beyond alphabetical proximity, so treat this entry as a glossary segment rather than a themed guide.

Black Hat

Black hat SEO refers to tactics that violate a search engine’s stated guidelines in an attempt to manipulate rankings, such as keyword stuffing, cloaking, participating in link schemes, or hiding text from users while showing it to crawlers. Penalties for these tactics range from a ranking demotion on specific pages to full removal from the index, depending on severity and whether the violation appears deliberate.

Detection methods evolve continuously, so techniques that worked at one point often stop working, sometimes abruptly, as search engines update their spam-detection systems. The practical case against black hat tactics isn’t only that they’re against the rules; it’s that the risk-reward math has gotten worse over time as detection has improved, while the technical debt and reputational cost of a manual action or algorithmic penalty can take far longer to unwind than the tactic took to implement.

The line between black hat and legitimate optimization isn’t always obvious from a single tactic in isolation; intent and scale usually matter more than the technique itself. Redirecting an old URL to a genuinely relevant new one is standard practice, while redirecting hundreds of unrelated expired domains to a single site to funnel their residual authority is the same underlying mechanism used manipulatively. Evaluating a tactic by asking whether it would still make sense if search engines didn’t exist, purely from a user’s perspective, is a reasonably reliable gut check for which side of that line a given approach falls on.

Blocklist

A blocklist, in the security and SEO sense, is a database of sites flagged for distributing malware, hosting phishing content, or otherwise violating platform guidelines, maintained by services like Google Safe Browsing and used by browsers, email providers, and other platforms to warn users away.

Browser warning screens for blocklisted sites suppress click-through, since most users who see an interstitial warning back out rather than proceed. No citable study establishes a precise percentage (a commonly repeated “95%+ of users abandon” figure has no locatable source and should be treated as an unverified claim, not a fact), but the directional effect, that most users don’t click past an active malware warning, is consistent with how these warnings are designed.

Recovering from a blocklist listing requires fixing the underlying violation first, then submitting a reconsideration or review request through the relevant platform; simply removing the bad content without requesting review typically won’t lift the warning. A compromised site (one hacked to inject malware or spam without the owner’s knowledge) can end up blocklisted through no fault of its own content strategy, so regular security monitoring is as much a part of protecting search visibility as any conventional SEO task: a single successful hack that triggers a blocklist warning can erase months of ranking progress almost overnight.

Bot

A bot is automated software that performs repetitive tasks without a human directly driving each action. In SEO contexts this includes search engine crawlers like Googlebot, which discover and index content; monitoring bots that check uptime or track rankings; and malicious bots that scrape content or attempt attacks. Not all bots are beneficial, and treating every bot request identically in server logs makes it harder to separate legitimate crawling from abuse.

Googlebot respects robots.txt directives, which site owners use to manage what it crawls. Crawl budget, the number of pages a crawler will request from a given site within a timeframe, becomes relevant mainly for large sites, where inefficient crawling can mean important pages get visited less often than they should.
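As a rough illustration of how robots.txt rules are evaluated per user agent, the sketch below uses the parser in Python's standard library. The robots.txt content and the example.com URLs are invented for the example, and the standard-library parser is a simple prefix matcher, so its behavior may differ from Googlebot's own matching in edge cases such as wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search

User-agent: Googlebot
Disallow: /staging/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network request)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A crawler with its own group follows that group only, not the "*" group.
for agent, url in [
    ("Googlebot", "https://example.com/staging/page"),
    ("Googlebot", "https://example.com/admin/"),
    ("OtherBot", "https://example.com/admin/users"),
    ("OtherBot", "https://example.com/blog/post"),
]:
    print(agent, url, parser.can_fetch(agent, url))
```

Note that in this sample file Googlebot is allowed to fetch /admin/, because its dedicated group replaces the catch-all group rather than adding to it.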

Distinguishing bot traffic from human traffic in analytics matters for accuracy as much as for security. Most analytics platforms filter out known bot user agents automatically, but poorly configured tracking or unusual crawler behavior can still inflate pageview and session counts, which is worth checking whenever traffic numbers look implausibly high relative to other signals like conversions or server load.
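A minimal sketch of separating self-identified bot hits from the rest of a log, assuming a simple substring check on the user agent. The marker list, the function names, and the sample user agents are assumptions made for illustration; user agents can be spoofed, so this only catches bots that identify themselves.

```python
# Assumed markers; real analytics platforms maintain much longer lists.
BOT_MARKERS = ("bot", "crawler", "spider")


def looks_like_bot(user_agent: str) -> bool:
    """Return True if the user agent contains a known bot marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def split_hits(user_agents):
    """Split a list of user-agent strings into (humans, bots)."""
    humans = [ua for ua in user_agents if not looks_like_bot(ua)]
    bots = [ua for ua in user_agents if looks_like_bot(ua)]
    return humans, bots


# Sample hits, invented for this example.
SAMPLE_HITS = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0 Safari/537.36",
    "ExampleSpider/1.0",  # hypothetical crawler name
]

humans, bots = split_hits(SAMPLE_HITS)
print(f"human hits: {len(humans)}, bot hits: {len(bots)}")
```

Comparing the two counts against conversions or server load is one way to spot the implausible traffic numbers described above.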

Bounce Rate

Bounce rate is the percentage of sessions where a visitor views a single page and leaves without any further interaction recorded. A high bounce rate is often read as a sign of poor content match or a weak user experience, but that reading needs context: a blog post that fully answers a reader’s question in one page and gets closed can register the same “bounce” as a page a visitor abandoned in frustration after two seconds, and industry-typical bounce rates vary a lot by content type, with blog and reference content commonly running much higher than e-commerce or transactional pages.
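The definition above reduces to simple arithmetic. The sketch below assumes a hypothetical Session record with a pageview count and an interaction count; real analytics platforms store sessions differently and may define a bounce differently.

```python
from dataclasses import dataclass


@dataclass
class Session:
    pageviews: int
    interactions: int = 0  # hypothetical field: clicks, form submits, etc.


def bounce_rate(sessions) -> float:
    """Percentage of sessions with one pageview and no recorded interaction."""
    if not sessions:
        return 0.0
    bounces = sum(
        1 for s in sessions if s.pageviews == 1 and s.interactions == 0
    )
    return 100.0 * bounces / len(sessions)


# Four sample sessions, two of which are single-page with no interaction.
sample = [Session(1), Session(1, interactions=2), Session(3), Session(1)]
print(bounce_rate(sample))  # 50.0
```

The calculation cannot tell a satisfied reader from a frustrated one: both single-page sessions in the sample count identically.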

Google Analytics 4 shifted its default reporting toward engagement-based metrics and redefined bounce rate as the share of sessions that were not engaged. Google has also stated that Analytics engagement metrics are not used directly as a ranking factor, so bounce rate is better treated as a diagnostic for content and UX than as something to optimize purely to satisfy search engines.

Breadcrumb

A breadcrumb is a secondary navigation element showing a visitor’s location within a site’s hierarchy, typically formatted as a path like “Home > Category > Subcategory > Current Page.” Breadcrumbs commonly appear in one of three flavors: location-based (showing where a page sits in the site structure), attribute-based (showing filters applied), or path-based (showing the actual click history). In the location-based form, the trail represents structural hierarchy rather than the literal path a visitor took to arrive at a page.

Marking up breadcrumbs with BreadcrumbList structured data allows them to display directly in search result snippets, and consistent placement across all non-homepage pages helps both users and crawlers understand the site’s structure.
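A sketch of what BreadcrumbList markup looks like when generated as JSON-LD. The helper name, page names, and example.com URLs are made up for illustration; the property names (itemListElement, ListItem, position, name, item) come from the schema.org vocabulary.

```python
import json


def breadcrumb_jsonld(trail):
    """Build a schema.org BreadcrumbList from (name, url) pairs, in order."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(trail, start=1)
        ],
    }


# Hypothetical trail for a product category page.
trail = [
    ("Home", "https://example.com/"),
    ("Shoes", "https://example.com/shoes/"),
    ("Running", "https://example.com/shoes/running/"),
]

markup = breadcrumb_jsonld(trail)
print(json.dumps(markup, indent=2))
```

The printed JSON would typically be embedded in the page inside a script element of type application/ld+json.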

Broken Link

A broken link points to a resource that no longer exists or has moved, typically resulting in a 404 (not found) or 410 (permanently gone) response. Common causes include URL changes made without a corresponding 301 redirect, deleted pages with no replacement, typos in manually created links, and simple link rot on external sites that removed or restructured their own content.

Internal broken links are fully within a site owner’s control to prevent, generally by pairing any URL change with a redirect. Broken links that involve other sites are harder to control. When an external site links to a URL of yours that no longer resolves, the lost value can sometimes be recovered through outreach asking the linking site to update the URL, or by redirecting the dead URL to its replacement. When a page you link out to stops working, the fix is to update or remove the link. A large volume of broken links can also reduce how efficiently a crawler works through a site, since crawl requests spent on dead URLs aren’t spent on live content.
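One way to check that URL changes were paired with redirects is to compare a site's internal links against its live URLs and its redirect map. The sketch below works on in-memory data rather than making HTTP requests; the function name, paths, and redirect map are all hypothetical.

```python
def classify_links(links, live_urls, redirects):
    """Sort internal links into ok, redirected, and broken buckets.

    links: iterable of link targets found on the site
    live_urls: set of paths that currently return content
    redirects: dict mapping an old path to its redirect target
    """
    result = {"ok": [], "redirected": [], "broken": []}
    for link in links:
        if link in live_urls:
            result["ok"].append(link)
        elif redirects.get(link) in live_urls:
            result["redirected"].append(link)
        else:
            # No live page and no redirect to a live page.
            result["broken"].append(link)
    return result


# Hypothetical site data.
live_urls = {"/", "/pricing", "/blog/new-post"}
redirects = {"/blog/old-post": "/blog/new-post"}
links = ["/pricing", "/blog/old-post", "/about-us"]

report = classify_links(links, live_urls, redirects)
print(report)
```

Links in the redirected bucket still work, but updating them to point at the final URL saves a crawler one request per link.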

Browser

A browser is software that retrieves and renders web content by interpreting HTML, CSS, and JavaScript. Different browsers use different underlying rendering engines (Chrome and Edge run on Blink, Safari runs on WebKit, Firefox runs on Gecko), which occasionally produces small differences in how a page displays or performs.

Global browser share shifts over time and should be checked against a live source rather than treated as fixed; as of 2026, Chrome holds roughly two-thirds of global all-device share, Safari sits in the high teens (driven heavily by iOS), Edge is in the mid-single digits, and Firefox has fallen to roughly the low single digits, according to StatCounter’s tracking. Because Google’s rendering infrastructure for crawling is Chromium-based, Chrome’s behavior is the practical testing baseline for most sites, though cross-browser testing on Safari in particular remains worthwhile given its concentration among iOS users.

Cache

A cache is temporary storage of web content, whether on a browser, a CDN, or a server, that allows faster delivery on subsequent requests instead of rebuilding or re-fetching content from scratch every time. Caching applies to static assets like images, CSS, and JavaScript, to full HTML pages with stable content, to API responses, and to DNS lookups.

Cache-Control HTTP headers govern how long content is allowed to be cached and under what conditions it should be revalidated, and getting this configuration wrong in either direction causes real problems: too little caching increases server load and slows delivery, while overly aggressive caching without proper invalidation can serve visitors an outdated version of a page after it’s been updated. Google’s own public-facing page cache is no longer part of this picture: the “cached” link in search results was retired in February 2024, and the cache: search operator stopped working later that same year, so a snapshot of Googlebot’s last successful crawl is no longer something a site owner or searcher can view directly through Google. The Internet Archive’s Wayback Machine is the practical substitute for viewing a page’s historical state today.
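To make the Cache-Control discussion concrete, here is a simplified sketch that parses a header value into its directives and reads the freshness lifetime from max-age. Both function names are invented for this example, and the parser splits on commas, so it does not handle quoted directive values that contain commas.

```python
def parse_cache_control(header: str) -> dict:
    """Parse a Cache-Control value into {directive: value or True}."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if sep else True
    return directives


def freshness_seconds(header: str):
    """Seconds a response may be reused without revalidation.

    Returns 0 for no-store or no-cache (simplified: never reuse unchecked)
    and None when the header sets no usable max-age.
    """
    directives = parse_cache_control(header)
    if "no-store" in directives or "no-cache" in directives:
        return 0
    value = directives.get("max-age")
    if isinstance(value, str) and value.isdigit():
        return int(value)
    return None


print(parse_cache_control("public, max-age=31536000, immutable"))
print(freshness_seconds("max-age=600"))
print(freshness_seconds("no-cache"))
```

A long max-age suits static assets whose URLs change when their content changes, while HTML that is updated in place usually needs a short lifetime or revalidation to avoid serving stale pages.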

Quick reference

Term	One-line definition
Black Hat	Manipulative tactics that violate search engine guidelines
Blocklist	A list of sites flagged as unsafe, used by browsers and platforms to warn users away
Bot	An automated program that crawls or interacts with web content
Bounce Rate	The share of single-page sessions with no further interaction
Breadcrumb	A secondary navigation trail showing a page's position in site hierarchy
Broken Link	A hyperlink pointing to a page that no longer exists or returns an error
Browser	The software used to render and display web pages
Cache	Temporary storage that speeds up delivery of previously requested content

Sources cited: Google Safe Browsing, StatCounter global browser market share