Programmatic SEO · The mechanics
Getting 200 pages indexed — and keeping them.
Crawling 200 pages isn’t the bottleneck — Google can crawl far more than that. Whether it indexes and keeps them is a quality question, and a structure question. Here’s the crawl reality for a small site, the sitemap and URL hygiene that actually matters, internal linking at scale, and what to do when Google indexes some pages and not others.
Crawling is easy. Getting indexed — and kept — is earned.
The fear that ships with a programmatic build is “what if Google never finds all these pages?” It’s the wrong fear. Google will find them — a clean site with a sitemap and internal links gets crawled, and crawling a few hundred or even a few thousand URLs is nothing for a search engine. The real question is whether Google decides each page is worth indexing, and whether it keeps it indexed after the first look. That’s not a crawl-budget problem; it’s a quality problem wearing a technical mask. So the work here is two things: don’t make the crawl harder than it needs to be, and build the internal-linking structure that signals “these pages matter and they belong together.” Get both right and partial indexation at launch turns into full indexation as the pages prove themselves.
Crawl budget — mostly a non-issue, but hygiene still matters
Crawl budget is real, but it’s an enterprise problem — a site with millions of URLs where Googlebot genuinely has to ration its visits. For a service business with a few hundred or a couple thousand pages, crawl budget is essentially never the reason something isn’t indexed. Google can crawl far more than that without breaking a sweat. If your 200 programmatic pages aren’t indexed, “I need to optimize crawl budget” is the wrong diagnosis nine times out of ten.
That said, URL hygiene still matters — not because Google can’t crawl mess, but because mess produces duplicate and low-value URLs that confuse the picture:
- One URL per page. No trailing-slash-vs-no-slash duplicates, no ?utm= variants getting indexed, no HTTP and HTTPS both live. Canonicalise. Pick one form and 301 the rest (a sketch of that normalization follows this list).
- A sane, flat-ish structure. /service-area/brandon/ beats /locations/florida/hillsborough/tampa-metro/brandon/page.html. Shallow, readable, predictable. The structure should mirror the cluster: hub at the top, spokes beneath.
- No infinite spaces. Faceted filters, calendar archives, session IDs in URLs — the classic ways a small site accidentally generates millions of junk URLs. Don’t let the programmatic build do this.
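The one-URL-per-page rule is easier to enforce at build time than to patch with redirects afterwards. Here is a minimal sketch of that normalization, assuming you’ve standardised on https, a non-www host, and no trailing slash — the specific choices matter less than applying one of them everywhere:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Query parameters that should never mint a separate indexable URL.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign",
                   "utm_term", "utm_content", "gclid", "fbclid"}

def canonical_url(url: str) -> str:
    """Normalize a URL to the single form we want crawled and indexed."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    scheme = "https"                                  # one protocol
    netloc = netloc.lower().removeprefix("www.")      # one host form (assumption: non-www)
    path = path.rstrip("/") or "/"                    # one slash convention (assumption: no trailing slash)
    kept = [(k, v) for k, v in parse_qsl(query) if k.lower() not in TRACKING_PARAMS]
    return urlunsplit((scheme, netloc, path, urlencode(kept), ""))

print(canonical_url("http://www.example.com/service-area/brandon/?utm_source=newsletter"))
# -> https://example.com/service-area/brandon
```

The 301s for legacy variants still belong at the server or CDN level; a helper like this only keeps the build itself from generating new ones.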
The Tampa web-design firm’s cluster wasn’t a crawl-budget situation — the URL count is manageable. The pages got indexed and ranked for 1,500+ keywords because the URL structure was clean, every page was in the sitemap, and the cluster’s internal linking told Google these pages were a coherent, supported set — not 184 orphans that happened to share a domain.
XML sitemaps and the basics
An XML sitemap is the list you hand Google of every URL you want indexed. It doesn’t force indexation — nothing does — but it’s how Google discovers pages efficiently, especially ones that aren’t yet richly linked. Submit it in Search Console. Keep it accurate: only the pages you actually want indexed go in it, which means the cells you’ve noindex’d (the ones that didn’t clear the bar — see the thin-content line) stay out of the sitemap. A sitemap stuffed with pages you’ve told Google not to index is a contradictory signal; keep it clean. Beyond that, the basics are unglamorous: a robots.txt that doesn’t accidentally block anything, no stray noindex left on from staging, and fast pages (slow pages get crawled less generously and convert worse — see why a site isn’t generating leads).
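Because the sitemap and the noindex decisions have to agree, it helps to generate both from the same data. A minimal sketch, assuming each page record carries an indexable flag set by whatever quality bar you apply at build time (the field names are illustrative, not any particular CMS’s API):

```python
from xml.sax.saxutils import escape

def build_sitemap(pages: list[dict]) -> str:
    """Emit sitemap XML containing only the pages we actually want indexed."""
    entries = [
        f"  <url><loc>{escape(p['url'])}</loc></url>"
        for p in pages
        if p.get("indexable", True)   # noindex'd cells never reach the sitemap
    ]
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + "\n".join(entries)
        + "\n</urlset>\n"
    )

pages = [
    {"url": "https://example.com/ac-repair/brandon", "indexable": True},
    {"url": "https://example.com/ac-repair/tiny-suburb", "indexable": False},  # didn't clear the bar
]
print(build_sitemap(pages))
```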
Internal linking at scale — no orphans
This is the part that actually moves the needle, and the part most programmatic builds skimp on. A page with no internal links pointing at it is an orphan — Google can technically reach it via the sitemap, but nothing on the site says it matters, so it’s treated accordingly: crawled rarely, indexed reluctantly, ranked poorly. At scale, you can generate orphans by the dozen without noticing. The structure has to be deliberate:
- Hub → spoke. The pillar / hub page links to every spoke in its cluster. That’s how a brand-new spoke inherits authority from day one instead of starting from zero.
- Spoke → sibling. Related spokes link to each other — the Brandon page links to the Riverview page, the “AC repair” comparison links to the “furnace repair” comparison. Not a random web; a sensible one, based on what’s actually related.
- Spoke → hub. Every spoke links back up to the pillar that ties the cluster together. The hub is the page that consolidates the topical signal; everything points home.
- Contextual, not just a footer dump. A block of 50 links in the footer is better than nothing, but a contextual link inside the prose — “if you’re in a neighboring service area, here’s that page” — carries far more weight and reads like a real site instead of a link farm. Build the contextual links into the template logic so they’re relevant per page, not boilerplate.
This is the same architecture that powers topical authority generally — programmatic SEO is just that architecture applied where the pattern repeats. The internal-linking discipline gets its own full treatment in internal link architecture for authority sites; the programmatic-specific point is that “at scale” means you have to bake the linking rules into the template, because you can’t hand-place links across 200 pages and stay sane. The programmatic SEO service builds the linking structure as part of the template, not as a cleanup pass.
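What baking the rules into the template can look like: every page’s link set is computed from the same dataset that generates the pages, so each spoke gets its hub link and a handful of genuinely related siblings. This is a sketch under assumed field names (slug, cluster, neighbors); your data model will differ:

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    slug: str                     # e.g. "ac-repair-brandon"
    cluster: str                  # the hub this spoke belongs to, e.g. "ac-repair"
    neighbors: list[str] = field(default_factory=list)  # related slugs from the data

def links_for(page: Page, hub_slug: str, all_pages: dict[str, Page],
              max_siblings: int = 4) -> dict:
    """Compute the internal links a spoke template should render."""
    siblings = [s for s in page.neighbors if s in all_pages][:max_siblings]
    return {
        "hub": hub_slug,        # spoke -> hub: every spoke points back to the pillar
        "siblings": siblings,   # spoke -> sibling: only genuinely related pages, not a random web
    }

def hub_links(cluster: str, all_pages: dict[str, Page]) -> list[str]:
    """Hub -> spoke: the pillar links to every spoke in its cluster."""
    return [slug for slug, p in all_pages.items() if p.cluster == cluster]
```

The relatedness lives in the data (adjacent service areas, sibling services), which is what makes the links contextual per page rather than the same boilerplate block repeated on all 200.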
An orphan page is a page the site itself isn’t vouching for. Don’t be surprised when Google takes the hint.
Watching indexation in Search Console
After launch, Search Console’s index-coverage report is where you watch what’s happening. Expect partial indexation at first — Google doesn’t index a new batch of pages all at once; it crawls, samples, indexes some, comes back. That’s normal. What you’re watching for over the following weeks: does the indexed count climb toward the full set, or stall well short of it? Climbing is healthy. Stalling is feedback.
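One way to watch that gap concretely, assuming you export the URL lists from the page-indexing report as CSV (Search Console lets you export the example URLs under each status): diff the export against the sitemap.

```python
import csv
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(path: str) -> set[str]:
    """Every URL the sitemap says should be indexed."""
    tree = ET.parse(path)
    return {loc.text.strip() for loc in tree.findall(".//sm:loc", NS)}

def exported_urls(path: str, column: str = "URL") -> set[str]:
    # Assumes a Search Console CSV export with a URL column; adjust the column name to your export.
    with open(path, newline="", encoding="utf-8") as f:
        return {row[column].strip() for row in csv.DictReader(f)}

submitted = sitemap_urls("sitemap.xml")
indexed = exported_urls("indexed_pages.csv")   # export of the pages Google reports as indexed
not_yet = sorted(submitted - indexed)
print(f"{len(not_yet)} of {len(submitted)} sitemap URLs not indexed yet:")
for url in not_yet[:20]:
    print(" ", url)
```

Run it weekly and the trend — shrinking gap or stalled gap — is the signal.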
When Google indexes some of your programmatic pages and not others, the explanation is usually not a crawl or technical problem — it’s a quality signal. Google looked at the un-indexed ones and decided they weren’t worth indexing: too thin, too near-duplicate, targeting demand that isn’t there. The “Crawled — currently not indexed” and “Discovered — currently not indexed” statuses are Google telling you, fairly directly, “I saw these and I’m not impressed.” The temptation is to fight it technically — more sitemap submissions, indexing API, “request indexing” in a loop. Don’t. You can’t force-index a thin page, and trying just confirms the page count was the goal rather than the value. The right response is to look at the un-indexed pages honestly: are they thin? Are they near-duplicates of the indexed ones? Is there no real search behind them? Fix the template and the data — or noindex and remove the cells that can’t be fixed — and the indexed count tends to recover. (Full triage: why aren’t my programmatic pages ranking.)
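Before reaching for technical fixes, it’s worth measuring how alike the pages actually are. A rough sketch using standard-library sequence matching on extracted page text; a real audit might use shingling or embeddings, but even this flags templates where only the city name changes:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """0..1 ratio of how much of the two texts is shared."""
    return SequenceMatcher(None, a, b).ratio()

def flag_near_duplicates(pages: dict[str, str],
                         threshold: float = 0.85) -> list[tuple[str, str, float]]:
    """pages maps URL -> extracted body text. Returns pairs that are suspiciously similar."""
    urls = list(pages)
    flagged = []
    for i, u in enumerate(urls):          # pairwise comparison is fine at a few hundred pages
        for v in urls[i + 1:]:
            r = similarity(pages[u], pages[v])
            if r >= threshold:
                flagged.append((u, v, round(r, 2)))
    return flagged
```

If the un-indexed half of the set dominates the flagged pairs, that’s your answer — the fix is the template and the data, not the indexing settings.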
If your pages are genuinely substantial and a chunk still isn’t indexed weeks in, give it more time before you panic — indexation of a new set isn’t instant, and a small or low-authority domain gets sampled more slowly. Patience, a clean sitemap, and good internal links do most of the work; there’s no switch that skips the wait. And if they’re not substantial, no amount of waiting or technical fiddling fixes it — that’s the thin-content problem, and the answer is the template and the data, not the indexing settings. The line between “too soon to worry” and “actually thin” is in will Google index 500 new pages.
Common questions
On indexation, specifically.
Will Google actually index 500 new pages?
Crawling 500 is no problem — Google crawls far more than that routinely. Whether it indexes and keeps all 500 depends on whether they’re worth it. Expect partial indexation at first, watch Search Console, and treat the un-indexed ones as feedback that they’re thin. A sitemap, clean URLs and internal links help; nothing force-indexes a thin page. Full version: will Google index 500 (or 5,000) new pages.
Google indexed half my pages and not the other half. Why?
Usually a quality signal, not a crawl problem. Google sampled the set, indexed the ones it found worth indexing, and parked the rest under “Crawled — currently not indexed” or similar. The un-indexed half is almost always the thinner, more near-duplicate, lower-demand half. Look at them honestly and fix the template and the data — or noindex and remove the cells that can’t be fixed. Don’t fight it with the indexing API. Triage: why aren’t my programmatic pages ranking.
How important is internal linking for a programmatic set?
Very — it’s the difference between a coherent cluster and 200 orphans. Hub links to spokes, spokes link to siblings and back to the hub, and the links are contextual, not just a footer dump. At scale you have to bake the linking rules into the template; you can’t hand-place links across hundreds of pages. The full discipline is in internal link architecture for authority sites — programmatic SEO is that architecture applied where the pattern repeats.
Tell us what’s broken — we’ll tell you straight if we can fix it.
No pitch deck. No sales sequence. You fill this in, we read it, and we give you a real answer — including “not a fit right now” if that’s the truth.
Get them indexed. Keep them indexed.
Send us your URL. We’ll send back a free 5-minute Loom — what’s indexed, what isn’t, what’s orphaned, and what we’d change in the structure and the pages. No call required.