One of the questions we come across quite often is the deceptively simple "How many layers are there?" At the time of writing our front page says "over 790,000 distinct... layers", so that's the answer, right? Well, not quite, and why is that "distinct" in there anyway? There are actually quite a few potential answers, so let's go through them.
Note: All numbers in this post are correct at the time of writing but will certainly change within a few weeks as we continue to index more services.
That's a lot of layers
Let's start with the largest number: 1,224,379 layers. This is also the simplest number: it's the total number of layers that we find in all of the unique capabilities documents that we download ("capabilities documents" are what map servers use to tell the world what layers they have and what features they support). This is the easiest number to give, and the one most commonly given. It is "correct" in that there really are over 1.2 million layers out there across various service endpoints, but as you'll see from the other numbers, there are a few problems with using it.
Meaningless layers
We do a lot of work to try and weed out "meaningless" layers from our index. This isn't a reflection on the data inside the layers, but on the metadata in the capabilities document. For instance, there's no point in us indexing a layer that has a name of "1" and no other information. For all we know these layers may have great data behind them, but if there isn't even a meaningful name our users will never be able to find them, so we simply remove them to stop them cluttering up the results.
It's at this stage that we also remove layers that are pre-installed defaults, like the TOPP/Tasmania data that comes with GeoServer.
In total, all this filtering gets rid of over 38,500 layers, leaving us with 1,185,727 layers.
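To make the idea concrete, here is a minimal Python sketch of this kind of metadata filter. It is not GeoSeer's actual code: the `is_meaningful` function, the dictionary keys, the "no letters in the name" rule and the short `DEFAULT_LAYER_NAMES` set are all assumptions made up for illustration.

```python
import re

# Hypothetical sample of layer names that ship as defaults with a map server.
# The real list used for filtering is not given in this post.
DEFAULT_LAYER_NAMES = {"topp:tasmania_roads", "topp:tasmania_cities", "topp:states"}


def is_meaningful(layer: dict) -> bool:
    """Return True if the layer's metadata gives a user something to search on."""
    name = (layer.get("name") or "").strip()
    title = (layer.get("title") or "").strip()
    abstract = (layer.get("abstract") or "").strip()

    # Pre-installed demo data is dropped regardless of its metadata.
    if name.lower() in DEFAULT_LAYER_NAMES:
        return False

    # Assumption: a name with no letters at all (e.g. "1") is uninformative.
    name_is_bare = not re.search(r"[^\W\d_]", name)
    # Other information: an abstract, or a title that adds something to the name.
    has_other_info = bool(abstract) or (bool(title) and title != name)

    return (not name_is_bare) or has_other_info


layers = [
    {"name": "1", "title": "", "abstract": ""},
    {"name": "1", "title": "Flood zones 2019", "abstract": ""},
    {"name": "topp:tasmania_roads", "title": "Tasmania roads", "abstract": ""},
    {"name": "land_cover", "title": "", "abstract": ""},
]
kept = [layer for layer in layers if is_meaningful(layer)]
print([layer["title"] or layer["name"] for layer in kept])
```

Only the second and fourth layers survive: the bare "1" has nothing to search on, and the Tasmania layer is treated as pre-installed demo data.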
Many endpoints and the same layer
It turns out a lot of those layers are duplicates; there are many services out there which have lots of different endpoints (an endpoint is the URL you use to access a service) that all serve the same layer(s). In fact, there is one single layer that is served by over 2,400 endpoints on the same domain (we group services by domain as part of the de-duplication process). That's an exception, but there are over 580 layers that are duplicated over a hundred times on the same domain, and in total we identify over 387,000 duplicate layers. We don't get rid of them entirely (you may have noticed in the results that we list multiple capabilities URLs for some layers), but we don't count them as separate layers. Once we discount all of those, we're down to 798,045 layers.
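The grouping-by-domain idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the real de-duplication process: the `deduplicate` function, the record fields, the example URLs and the choice of (domain, name, title) as the "same layer" key are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def deduplicate(records):
    """Collapse records for the same layer on the same domain into one entry.

    Assumption: two records are "the same layer" when their domain, name and
    title all match. Every capabilities URL is kept so it can still be listed.
    """
    groups = defaultdict(list)
    for rec in records:
        domain = urlsplit(rec["capabilities_url"]).hostname
        key = (domain, rec["name"], rec["title"])
        groups[key].append(rec["capabilities_url"])

    return [
        {"domain": domain, "name": name, "title": title, "capabilities_urls": urls}
        for (domain, name, title), urls in groups.items()
    ]


records = [
    {"name": "rivers", "title": "Rivers",
     "capabilities_url": "https://maps.example.org/region_a/wms?request=GetCapabilities"},
    {"name": "rivers", "title": "Rivers",
     "capabilities_url": "https://maps.example.org/region_b/wms?request=GetCapabilities"},
    {"name": "rivers", "title": "Rivers",
     "capabilities_url": "https://gis.example.net/wms?request=GetCapabilities"},
]
distinct = deduplicate(records)
for layer in distinct:
    print(layer["domain"], layer["name"], len(layer["capabilities_urls"]))
```

Here three records become two distinct layers: the two endpoints on the first domain are merged into one entry with two capabilities URLs, while the identically named layer on the other domain stays separate.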
Different service types
The final component is what happens if a layer is served up from the same server as both WMS and WFS, or WMS/WCS/WMTS, etc. For our purposes we try and group them together and treat them as a single layer, but as you've likely noticed in the results, we do flag that a layer is available as multiple service types. There are surprisingly few of these: only 3,076 layers are used across service types. This is where we get our final, front-page number of 794,943 layers.
So which is it?
As you've probably gathered by now, there isn't a "right" number. We choose to use the lowest number because it's the most honest for our purposes: when you search GeoSeer you're searching 794,943 distinct geospatial layers. It's of no help to you if you get the same layer 127 times in the results because that's how many endpoints host it. Yet across all servers and endpoints, GeoSeer is searching what represents 1,185,727 publicly accessible layers.
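For anyone who wants to check the arithmetic, the short Python sketch below lines up the four exact totals quoted in this post and works out how many layers drop out at each stage. The stage labels and the `funnel` helper are just illustrative names. The drops are derived purely from the totals, so they won't exactly match the rounded or separately counted figures quoted in the text.

```python
# Exact totals quoted in this post, largest to smallest.
stages = [
    ("all layers in capabilities documents", 1_224_379),
    ("after removing meaningless and default layers", 1_185_727),
    ("after merging duplicates across endpoints", 798_045),
    ("after merging service types", 794_943),
]


def funnel(stages):
    """Return (label, total, removed_since_previous_stage) for each stage."""
    rows = []
    previous = None
    for label, total in stages:
        removed = 0 if previous is None else previous - total
        rows.append((label, total, removed))
        previous = total
    return rows


for label, total, removed in funnel(stages):
    print(f"{total:>9,}  (-{removed:,})  {label}")
```

The three drops come out as 38,652, 387,682 and 3,102 layers respectively.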