We pointed a script at a public dataset of world cities and it produced more than 231,000 pages. Add countries, universities, airports, airlines, currencies, languages, and timezones and the total clears a quarter of a million URLs in the sitemap, each one a real page with real data.
That’s programmatic SEO. One template, one dataset, hundreds of thousands of pages.
It’s also the fastest way to get a manual penalty from Google if you do it wrong. So let’s talk about how to do it right.
This is a build log, not a pitch. The worked example is CityAPI, a free, no-auth JSON API for city and geo-data we run at BrotCode. You don’t need to use CityAPI to get anything out of this; the method is the point.
What Programmatic SEO Actually Is
Most content is written page by page. A human picks a topic, researches it, writes it. That doesn’t scale past a few hundred pages without a small newsroom.
Programmatic SEO inverts that. You start with structured data: a table, a database, an API response. Then you write one template that turns a single row into a page. Run it across every row and you have a page per row.
The math is the whole appeal. One template, 231,452 rows of city data, 231,452 city pages. Nobody wrote them. The data wrote them.
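A minimal sketch of the row-to-page idea, in Python for brevity. The rows, field names, and URL scheme here are hypothetical and are not CityAPI's actual schema or Rails views.

```python
import html
from string import Template

# Hypothetical rows; field names are illustrative only.
ROWS = [
    {"slug": "stuttgart", "name": "Stuttgart", "country": "Germany", "timezone": "Europe/Berlin"},
    {"slug": "hamburg", "name": "Hamburg", "country": "Germany", "timezone": "Europe/Berlin"},
]

# One template for every row.
PAGE = Template("<h1>$name, $country</h1>\n<p>Timezone: $timezone</p>")


def render_pages(rows):
    """Return a {url_path: html} mapping with one entry per row."""
    pages = {}
    for row in rows:
        # Escape every value so data can never inject markup into the page.
        safe = {key: html.escape(str(value)) for key, value in row.items()}
        pages[f"/cities/{row['slug']}"] = PAGE.substitute(safe)
    return pages


pages = render_pages(ROWS)
```

Adding a row adds a page, and editing `PAGE` changes every page at once. That is the whole mechanism; everything else in this post is about making the output worth indexing.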
Tripadvisor, Zillow, and Wise all run on this. So does every currency-converter site and every “weather in your-city” page you’ve landed on from a search. When you last searched “population of Stuttgart,” the result that knew the answer was probably generated by a template, not written by a person.
The trick is making sure the template produces something worth indexing. Most attempts don’t. That’s where this gets interesting.
The Dataset Is the Product
Before any template, you need data. Good programmatic SEO lives or dies on the dataset underneath it. Thin data makes thin pages, and thin pages get ignored or penalized.
We assembled CityAPI from seven open datasets. Each one is public, each one has a license, and each license has terms we have to honor.
- GeoNames (CC BY 4.0) supplies the cities, alternate names, timezones, admin regions, and country populations. The city list comes from their cities500 export: every populated place with more than 500 residents.
- mledoze/countries (ODbL) gives us country demographics, languages, currencies, capitals, borders, and flags.
- OurAirports (public domain) is the airport data: IATA and ICAO codes, coordinates, type.
- OpenTravelData (CC BY 4.0) covers airlines.
- Hipo’s university-domains-list (MIT) provides universities.
- Currency and language reference data round it out.
Here’s the part people skip: attribution isn’t optional. CC BY means you credit the source, visibly. ODbL means you keep derivative data open under the same terms.
Strip the credits and you’re violating the license, and the maintainers will eventually notice. Our footer credits every source on every page. That’s a legal requirement, not a courtesy.
The other thing about open data: it goes stale. Cities grow, airports close, currencies get redenominated. We re-import on a schedule and track when each source was last pulled, so a page can honestly say when its data was verified. That freshness is part of what keeps a page worth indexing.
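One way to track freshness, sketched under assumptions: the dates below are made up for illustration, and the 90-day threshold is an arbitrary choice, not our actual schedule.

```python
from datetime import date, timedelta

# Hypothetical last-import dates per source (illustrative values only).
LAST_PULLED = {
    "GeoNames": date(2025, 1, 10),
    "OurAirports": date(2024, 6, 1),
}


def stale_sources(last_pulled, today, max_age_days=90):
    """Names of sources whose last import is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, pulled in last_pulled.items() if pulled < cutoff)


def verified_label(sources, last_pulled):
    """A page is only as fresh as the oldest source it draws on."""
    oldest = min(last_pulled[name] for name in sources)
    return f"Data verified {oldest.isoformat()}"


overdue = stale_sources(LAST_PULLED, today=date(2025, 2, 1))
label = verified_label(["GeoNames", "OurAirports"], LAST_PULLED)
```

Reporting the oldest contributing source, not the newest, keeps the "verified" date honest when a page mixes data from several imports.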
One Template, A Quarter-Million Pages
Once the data’s clean, the template does the work. Here’s the actual page count by type, straight from the database:
| Page type | Count | Source |
|---|---|---|
| Cities | 231,452 | GeoNames |
| Universities | 10,219 | Hipo |
| Airports | 9,938 | OurAirports |
| Airlines | 1,144 | OpenTravelData |
| Timezones | 418 | GeoNames |
| Currencies | 307 | reference data |
| Languages | 183 | reference data |
| Countries | 250 | mledoze |
That’s over a quarter-million pages across eight entity types. One Rails view per type renders all of them. Change the city template and every city page updates at once. That’s the whole point.
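The total is easy to check from the table. This snippet only re-adds the counts listed above; the dictionary name is ours for illustration.

```python
# Page counts copied from the table above.
PAGE_COUNTS = {
    "Cities": 231_452,
    "Universities": 10_219,
    "Airports": 9_938,
    "Airlines": 1_144,
    "Timezones": 418,
    "Currencies": 307,
    "Languages": 183,
    "Countries": 250,
}

total = sum(PAGE_COUNTS.values())               # 253,911 pages
city_share = PAGE_COUNTS["Cities"] / total      # roughly 0.91
print(f"{total:,} pages, {city_share:.0%} of them cities")
```

Cities are about 91% of the total, so the quality of the city template decides almost everything.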
But a page count is not an index count, and conflating the two is the mistake that sinks most programmatic SEO projects. We generated the pages; how many Google decides to index depends entirely on whether each page is worth indexing. Generating a page is free, but earning a place in the index is not.
Why Templated Doesn’t Mean Thin
Google has a name for what happens when you do this lazily. In March 2024 they expanded their spam policies and introduced “scaled content abuse”: pages “generated for the primary purpose of manipulating search rankings” that don’t help users. The December 2024 spam update enforced it hard, and sites that had spun up thousands of near-identical pages watched their traffic evaporate.
There’s an older policy too: doorway pages. The canonical example is a site that spins up a “plumber in [town]” page for hundreds of towns, each one a template with the town name swapped in and nothing else different. Same words, different noun. That’s a doorway farm, and it’s been against the rules for over a decade.
So the question that decides everything: what makes a city page for Stuttgart genuinely different from a city page for Hamburg, beyond the name?
For us, four things.
Real data per page. Stuttgart’s page shows its actual population, coordinates, elevation, timezone, and admin region. Hamburg’s shows Hamburg’s. These aren’t variables in a sentence template; they’re distinct facts pulled from distinct rows, so a reader who came for “elevation of Stuttgart” gets a real answer Hamburg’s page doesn’t share.
Internal links from real relationships. Each city page lists the nearest cities within 50 kilometers, computed by actual distance, plus nearby airports, so Stuttgart links to Esslingen and Ludwigsburg while Hamburg links to its own neighbors. The link graph isn’t decoration: it’s geography, different on every page, and it’s how crawlers find the deep pages, since a city nobody links to externally still gets discovered through its neighbors.
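The nearby-cities idea can be sketched with the haversine formula. This is a stand-in, not our implementation: production does the distance query in PostgreSQL, while this version brute-forces a small list in plain Python. The coordinates are approximate and for illustration only.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearby(origin, cities, radius_km=50.0):
    """Names of cities within radius_km of origin, nearest first."""
    hits = []
    for city in cities:
        if city["name"] == origin["name"]:
            continue  # a page should not link to itself
        distance = haversine_km(origin["lat"], origin["lon"], city["lat"], city["lon"])
        if distance <= radius_km:
            hits.append((distance, city["name"]))
    return [name for _, name in sorted(hits)]


# Approximate coordinates, illustrative only.
CITIES = [
    {"name": "Stuttgart", "lat": 48.7758, "lon": 9.1829},
    {"name": "Esslingen", "lat": 48.7406, "lon": 9.3108},
    {"name": "Ludwigsburg", "lat": 48.8975, "lon": 9.1916},
    {"name": "Hamburg", "lat": 53.5511, "lon": 9.9937},
]

stuttgart_links = nearby(CITIES[0], CITIES)
```

Comparing every pair like this is quadratic, which is fine for a demo and hopeless for 231,452 cities. At that scale the query belongs in a database with a spatial index.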
Auto-generated Q&A from the data itself. Each city page builds a small FAQ from its own relationships: which airport serves it, how far it is from the capital, whether it has a university, what languages and currency the country uses, its elevation. The questions are templated, but the answers are computed per city from real values and marked up as FAQ structured data, so Stuttgart’s “how far from Berlin” answer is a different number than Hamburg’s.
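A sketch of how templated questions can carry computed answers, assuming a hypothetical city record. The field names and sample values are invented for illustration; only the FAQPage structure (`mainEntity`, `Question`, `acceptedAnswer`, `Answer`) follows Schema.org.

```python
def build_faq(city):
    """Return (question, answer) pairs, skipping any whose data is missing."""
    name = city["name"]
    faq = []
    if city.get("nearest_airport"):
        faq.append((f"Which airport serves {name}?",
                    f"The nearest airport to {name} is {city['nearest_airport']}."))
    if city.get("km_to_capital") is not None:
        faq.append((f"How far is {name} from {city['capital']}?",
                    f"{name} is about {city['km_to_capital']:,} km from {city['capital']}."))
    if city.get("elevation_m") is not None:
        faq.append((f"What is the elevation of {name}?",
                    f"{name} sits at {city['elevation_m']:,} m above sea level."))
    return faq


def faq_jsonld(faq):
    """Wrap question/answer pairs as Schema.org FAQPage markup."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {"@type": "Question", "name": question,
             "acceptedAnswer": {"@type": "Answer", "text": answer}}
            for question, answer in faq
        ],
    }


# Hypothetical record; the numbers are placeholders, not measured values.
city = {"name": "Stuttgart", "capital": "Berlin", "nearest_airport": "Stuttgart Airport",
        "km_to_capital": 500, "elevation_m": None}
faq = build_faq(city)
markup = faq_jsonld(faq)
```

A question with no data behind it is dropped, so no page ever publishes an answer it cannot back up.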
Structured data on every page. We emit Schema.org JSON-LD typed to the entity: City for cities, Country for countries, Airport for airports, Airline, Language, CollegeOrUniversity, plus BreadcrumbList and FAQPage markup, each carrying the entity’s real coordinates, population, codes, and a sameAs link to Wikidata where we have one. Search engines read this directly: it’s the difference between a page that looks like it’s about Stuttgart and one that tells a crawler, in machine-readable terms, exactly what it’s about.
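A hedged sketch of entity-typed JSON-LD for a city. Two assumptions to flag: Schema.org's `City` type has no dedicated population property that we know of, so this sketch carries it as an `additionalProperty`, which may differ from what our pages actually emit; and the field names on the input record are hypothetical.

```python
import json


def city_jsonld(city):
    """Build a Schema.org City document, omitting anything the row lacks."""
    doc = {"@context": "https://schema.org", "@type": "City", "name": city["name"]}
    if city.get("lat") is not None and city.get("lon") is not None:
        doc["geo"] = {"@type": "GeoCoordinates",
                      "latitude": city["lat"], "longitude": city["lon"]}
    if city.get("population") is not None:
        # Assumption: population expressed as a generic PropertyValue.
        doc["additionalProperty"] = {"@type": "PropertyValue",
                                     "name": "population", "value": city["population"]}
    if city.get("wikidata_id"):
        doc["sameAs"] = f"https://www.wikidata.org/wiki/{city['wikidata_id']}"
    return doc


def script_tag(doc):
    """Serialize for embedding in HTML; escape '</' so data cannot close the tag."""
    payload = json.dumps(doc, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


doc = city_jsonld({"name": "Stuttgart", "lat": 48.7758, "lon": 9.1829,
                   "population": None, "wikidata_id": None})
tag = script_tag(doc)
```

Missing values leave keys out entirely. An absent `sameAs` is harmless, while a `sameAs` pointing at the wrong entity is worse than none.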
None of this is magic. It’s just the discipline of making sure every page carries information that exists nowhere else on the site.
The Metadata Trap Nobody Warns You About
Here’s a failure mode specific to programmatic SEO: duplicate titles and descriptions. You can have 231,452 pages with genuinely unique bodies and still tank, because every one ships the title “City Page” and the meta description “Browse city data.”
Search engines treat near-identical metadata across thousands of URLs as a thin-content signal. It’s one of the most common ways programmatic sites quietly fail.
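Duplicate metadata is cheap to detect before a crawler does it for you. A minimal sketch, assuming each page is available as a dictionary with hypothetical `title` and `description` keys:

```python
from collections import Counter


def duplicate_values(pages, field):
    """Map each value of `field` that appears on more than one page to its count."""
    counts = Counter(page[field] for page in pages)
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical page records for illustration.
PAGES = [
    {"url": "/cities/stuttgart", "title": "City Page"},
    {"url": "/cities/hamburg", "title": "City Page"},
    {"url": "/cities/esslingen", "title": "Esslingen, Germany: population, timezone, coords"},
]

dupes = duplicate_values(PAGES, "title")
```

Run it over titles and descriptions separately; any non-empty result is a template bug.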
So our titles and descriptions are templated with real values and capped to fit. A city title reads “Stuttgart, Germany: population, timezone, coords” and stops at 60 characters. The description names the actual population, timezone, coordinates, and nearby-city count, and stops at 160.
When a value is missing, the clause that needed it gets dropped instead of rendering “population null.” Every page gets a title and description built from that row. At a quarter-million pages, that’s not something you write by hand. It’s something the template has to get right.
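The capping and clause-dropping logic might look like the following. It is a sketch, not our template: the field names are hypothetical, and truncating at a word boundary with an ellipsis is one reasonable choice among several.

```python
def truncate(text, limit):
    """Cut at a word boundary so the result never exceeds `limit` characters."""
    if len(text) <= limit:
        return text
    cut = text[: limit - 1].rsplit(" ", 1)[0]  # leave one character for the ellipsis
    return cut.rstrip(" ,;:") + "…"


def city_title(city):
    return truncate(f"{city['name']}, {city['country']}: population, timezone, coords", 60)


def city_description(city):
    """Build the description from whichever facts the row actually has."""
    clauses = []
    if city.get("population") is not None:
        clauses.append(f"population {city['population']:,}")
    if city.get("timezone"):
        clauses.append(f"timezone {city['timezone']}")
    if city.get("lat") is not None and city.get("lon") is not None:
        clauses.append(f"coordinates {city['lat']:.4f}, {city['lon']:.4f}")
    if city.get("nearby_count"):
        clauses.append(f"{city['nearby_count']} nearby cities")
    lead = f"{city['name']}, {city['country']}"
    text = f"{lead}: {'; '.join(clauses)}." if clauses else f"{lead}."
    return truncate(text, 160)


# Hypothetical row with a missing population and no nearby cities.
row = {"name": "Stuttgart", "country": "Germany", "population": None,
       "timezone": "Europe/Berlin", "lat": 48.7758, "lon": 9.1829, "nearby_count": 0}
title = city_title(row)
description = city_description(row)
```

Each clause is guarded on its own, so a missing value removes only its clause and never leaks "None" or "null" into a search snippet.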
Programmatic SEO Tools, Briefly
You don’t need a specialized stack. People reach for one of three setups.
The no-code route pairs a spreadsheet or Airtable with a site builder like Webflow or a tool like Whalesync. You keep rows in a sheet, map columns to template fields, and publish. It’s the fastest way to ship a few thousand pages and the right call if you’re not an engineer.
The framework route uses Next.js, Astro, or similar with getStaticPaths-style generation. You query your data at build time and emit a static page per row. This is where most serious programmatic sites live, because static pages are fast and cheap to serve.
The full-app route, which is what we run, renders pages from a database on request, with a generated sitemap pointing crawlers at every URL. CityAPI is a Rails app: one controller and one view per entity type, a sitemap.rb that walks every table, and PostgreSQL doing the geographic distance queries for the nearby-cities links. We went this way because the data updates often and we wanted server-side rendering with no build step across 250,000 pages.
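At this scale the sitemap cannot be a single file: the sitemap protocol caps each file at 50,000 URLs, so a quarter-million URLs need several files plus a sitemap index. The sketch below shows the chunking in Python with hypothetical file names and an `example.com` base URL; it is not our `sitemap.rb`.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file limit in the sitemap protocol
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>'


def chunk(urls, size=MAX_URLS_PER_FILE):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def urlset_xml(urls):
    entries = "".join(f"<url><loc>{escape(url)}</loc></url>" for url in urls)
    return f'{XML_HEADER}<urlset xmlns="{SITEMAP_NS}">{entries}</urlset>'


def sitemap_index_xml(sitemap_urls):
    entries = "".join(f"<sitemap><loc>{escape(url)}</loc></sitemap>" for url in sitemap_urls)
    return f'{XML_HEADER}<sitemapindex xmlns="{SITEMAP_NS}">{entries}</sitemapindex>'


def build_sitemaps(urls, base_url):
    """Return (index_xml, {filename: urlset_xml}) covering every URL."""
    files = {}
    for number, part in enumerate(chunk(urls), start=1):
        files[f"sitemap-{number}.xml"] = urlset_xml(part)
    index = sitemap_index_xml([f"{base_url}/{name}" for name in files])
    return index, files


urls = [f"https://example.com/cities/{n}" for n in range(253_911)]
index, files = build_sitemaps(urls, "https://example.com")
```

With 253,911 URLs this produces six files and one index. Splitting by entity type instead of by position is a common variation, since it makes per-type index coverage easier to read in search-console reports.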
The tool doesn’t matter much. The dataset and the template discipline decide most of the outcome.
When Programmatic SEO Is the Wrong Call
This model is seductive because the page count gets big fast. That’s exactly why it’s so often misapplied.
It’s wrong when the dataset is small. Look back at our own table: 183 languages, 307 currencies, 418 timezones. Those are real pages, but they’re a thin tail, without enough distinct, in-demand data to justify treating each one as an SEO play the way cities deserve.
We build them because they complete the API and they cross-link, not because “timezone pages” is a traffic strategy. If your entire dataset is a few hundred rows like that, programmatic SEO is the wrong hammer. Write those pages by hand and make each one excellent.
It’s wrong when there’s no unique value per row. If you can’t point to a real fact that makes row 4,000 different from row 4,001, you don’t have a dataset. You have a template with a noun slot, and that’s a doorway farm waiting for a penalty.
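One rough way to apply that test to a dataset is to count the facts each row carries beyond its own name. The identity fields and the threshold of three facts are assumptions for illustration, not a rule from Google or from our pipeline.

```python
IDENTITY_FIELDS = ("id", "name", "slug")  # fields that only say which row this is


def fact_count(row):
    """Number of non-empty fields that are not just the row's identity."""
    return sum(
        1 for key, value in row.items()
        if key not in IDENTITY_FIELDS and value not in (None, "")
    )


def noun_slot_rows(rows, min_facts=3):
    """Slugs of rows too sparse to carry a page of their own."""
    return [row["slug"] for row in rows if fact_count(row) < min_facts]


# Hypothetical rows: one rich, one that is only a name.
ROWS = [
    {"slug": "stuttgart", "name": "Stuttgart", "population": 600_000,
     "timezone": "Europe/Berlin", "elevation_m": 245},
    {"slug": "nowhere", "name": "Nowhere", "population": None,
     "timezone": "", "elevation_m": None},
]

too_thin = noun_slot_rows(ROWS)
```

Rows flagged this way are candidates for noindex or for exclusion from the sitemap until the data improves.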
And it’s wrong when nobody’s searching. Generating a page for every possible two-city combination is technically easy and almost entirely pointless, because nobody searches most of those combinations. Page count is not the goal; answering real queries is, and a thousand pages that answer real questions beat a million that answer none.
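The arithmetic makes the point. Counting unordered pairs among the 231,452 cities in our table:

```python
from math import comb

CITY_COUNT = 231_452  # from the page-count table

# Unordered city pairs: n * (n - 1) / 2
pair_pages = comb(CITY_COUNT, 2)
print(f"{pair_pages:,} possible two-city pages")
```

That is roughly 26.8 billion pages, about a hundred thousand times the size of the site, and almost none of them answer a query anyone types.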
The honest test is one question: if a human landed on this single page from a search, with no knowledge of the other 250,000, would it answer what they came for? If yes, generate away. If no, you’re building a liability.
What This Looks Like in Practice
The architecture here isn’t exotic: a clean data layer, a thin template layer, and structured data throughout. The same instincts that make a good API-first system make a good programmatic-SEO system: model the data well once, and the surfaces you build on top of it stay simple. Our city pages and our JSON API render from the same models, which is why a page can show you its own curl command and the exact JSON it would return.
If you’re weighing the stack for something like this, our take on choosing a tech stack for 2026 covers the same trade-off we made: boring, server-rendered, and fast beats clever. And if the dataset is multi-tenant or user-generated rather than public, the patterns in our guide to building a multi-tenant SaaS platform matter more than the SEO mechanics.
Programmatic SEO isn’t a growth hack. It’s a data-engineering problem wearing a marketing hat. Get the dataset right and make every page earn its uniqueness, and the scale takes care of itself; skip that work and you’ve built a quarter-million pages nobody will ever see.
Thinking about turning a dataset into a real content surface, or wondering whether yours is big enough to bother? Let’s talk it through. We’ll tell you honestly whether programmatic SEO fits, or whether you’re better off writing the pages by hand.