The spatial data search engine
We created GeoSeer to solve a problem: it's an absolute pain to find spatial data.
There are literally millions of free and open-source datasets out there and a huge number of them are spatial, but how do you find the one that you need? No-one really wants to go rummaging through dozens of CKAN portals simply to get a dataset about the location of fire stations in Warwickshire, for example.
In particular we wanted to solve the problem for spatial data because spatial data - by definition - already has a location associated with it. This means we can return more relevant results - in this example the fire stations in Butte, California, USA are unlikely to be of interest to you.
There are many thousands of public-facing web-services out there using the OGC standards (WMS, WFS, WCS, WMTS) to serve data, but they have only limited discoverability. We wanted to create a search engine that would bring all of these services into a single place, so we created GeoSeer.
How do I use these results?
All of these results are spatial datasets based around OGC web-services. Specifically, they're:
- WMS (Web Map Service) - basically, a map
- WFS (Web Feature Service) - raw vector data
- WCS (Web Coverage Service) - raw raster data
- WMTS (Web Map Tile Service) - a pre-rendered basemap
To access them, you'll need to use a GIS. QGIS is a popular, free GIS that supports all of these standards, and many other spatial and non-spatial formats besides. If you search around, you can find lots of tutorials explaining how to add the services to QGIS.
Once you have a GIS, you can find the GetCapabilities URL at the top of the layer result page; you'll need to feed this into the GIS. You can also find the layer name/title with it - you'll need these to choose the right layer.
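If you'd like to see which layer names and titles a service offers before opening a GIS, a GetCapabilities document can be read with a few lines of Python. This is only a minimal sketch: the sample XML is a made-up, heavily trimmed WMS response, and list_layers is a hypothetical helper, not part of GeoSeer. It ignores XML namespaces so it doesn't depend on one particular WMS version.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Return an element tag without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def list_layers(capabilities_xml):
    """Return (name, title) pairs for every named layer in the document."""
    root = ET.fromstring(capabilities_xml)
    layers = []
    for elem in root.iter():
        if local(elem.tag) != "Layer":
            continue
        name = title = None
        # Only look at direct children, so nested layers don't leak upwards.
        for child in elem:
            if local(child.tag) == "Name":
                name = (child.text or "").strip()
            elif local(child.tag) == "Title":
                title = (child.text or "").strip()
        # Layers without a Name are just grouping containers; skip them.
        if name:
            layers.append((name, title))
    return layers


# Made-up, trimmed-down sample; a real response is much larger.
SAMPLE = """<WMS_Capabilities xmlns="http://www.opengis.net/wms" version="1.3.0">
  <Capability>
    <Layer>
      <Title>Example service</Title>
      <Layer>
        <Name>fire_stations</Name>
        <Title>Fire stations</Title>
      </Layer>
      <Layer>
        <Name>1</Name>
        <Title>Untitled</Title>
      </Layer>
    </Layer>
  </Capability>
</WMS_Capabilities>"""

for name, title in list_layers(SAMPLE):
    print(name, "-", title)
```

The name is what the GIS sends to the service when it requests the layer; the title is the human-readable label you'll usually see in the layer picker.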
How does it work?
The GeoSeer spider scrapes lots of different sources (mostly CKAN portals), using various APIs to discover the many OGC web-services that are registered with them. It downloads the GetCapabilities document for every web-service it can find (GetCapabilities is an OGC standard XML document with lots of information about what layers exist in a service). We then post-process all of those GetCapabilities documents: removing duplicate layers, cleaning them up, determining the spatial extent, and finally making them searchable.
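To give a feel for the "removing duplicate layers" step, here is a rough sketch of one way it could be done. It is an illustration only, not GeoSeer's actual code: the record layout (dicts with "url" and "name" keys), the example URLs, and both function names are assumptions. The idea is that the same service is often registered on several portals with slightly different GetCapabilities links, so the links are normalised before comparing.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query parameters that describe the request rather than the service itself.
REQUEST_PARAMS = {"service", "request", "version"}


def normalise_service_url(url):
    """Reduce a GetCapabilities link to a comparable service address."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k.lower() not in REQUEST_PARAMS]
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path, urlencode(kept), ""))


def dedupe_layers(records):
    """Keep the first record seen for each (service, layer name) pair."""
    seen = set()
    unique = []
    for rec in records:
        key = (normalise_service_url(rec["url"]), rec["name"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical records, as if found via two different portals.
records = [
    {"url": "https://Example.com/wms?SERVICE=WMS&REQUEST=GetCapabilities",
     "name": "fire_stations"},
    {"url": "https://example.com/wms?request=GetCapabilities&service=WMS&version=1.3.0",
     "name": "fire_stations"},
    {"url": "https://example.com/wms?SERVICE=WMS&REQUEST=GetCapabilities",
     "name": "hydrants"},
]

for rec in dedupe_layers(records):
    print(rec["name"])
```

Other query parameters are deliberately kept, because some servers use them to tell different services apart at the same address.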
The end result is a database with hundreds of thousands of layers that sits behind a simple, fast web-page.
I don't like the results...
Unfortunately the results brought back by GeoSeer aren't perfect, in large part because the source data is often lacking. Let's be honest here, no-one likes writing metadata!
GeoSeer does its best to clear up bad data and hide it, but the following problems are things it can't do anything about and are remarkably common:
- Bad/missing/unclear layer descriptions/abstracts in the GetCapabilities documents - often they don't describe the layer at all.
- No suitable keywords in the layer.
- Incomplete/non-existent service-level metadata (contact info, costs, license, etc).
- Bad/missing/unclear layer names and titles. A layer called "1" doesn't help anyone know what it's about.
- Bad spatial information - wrong coordinates and/or wrong projections.
So, if you host an OGC service, please make sure your metadata is correct! If you go and update it now, we'll find it on our next crawl and everyone will benefit from better results.
For our part, we're working to improve both the quality and quantity of the results, including adding even more datasets to the mix.
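If you host a service and want a quick sanity check for the common problems listed above, something along these lines could be a starting point. It's a sketch under stated assumptions: the layer is represented as a plain dict with hypothetical keys, the bounding box is assumed to be (west, south, east, north) in longitude/latitude degrees, and the checks are examples rather than the rules GeoSeer applies.

```python
def metadata_problems(layer):
    """Return a list of human-readable problems found in one layer's metadata."""
    problems = []

    # A title like "1" doesn't tell anyone what the layer is about.
    title = (layer.get("title") or "").strip()
    if not title or title.isdigit():
        problems.append("unhelpful title")

    if not (layer.get("abstract") or "").strip():
        problems.append("missing abstract")

    if not layer.get("keywords"):
        problems.append("no keywords")

    # Assumed order: (west, south, east, north) in lon/lat degrees.
    bbox = layer.get("bbox")
    if bbox is None:
        problems.append("missing bounding box")
    else:
        west, south, east, north = bbox
        plausible = (-180 <= west <= east <= 180
                     and -90 <= south <= north <= 90)
        if not plausible:
            problems.append("implausible bounding box")

    return problems


# Hypothetical layers for illustration.
bad_layer = {"name": "1", "title": "1", "abstract": "",
             "keywords": [], "bbox": (200, 0, 210, 10)}
good_layer = {"name": "fire_stations", "title": "Fire stations",
              "abstract": "Locations of fire stations.",
              "keywords": ["fire", "emergency"],
              "bbox": (-1.96, 51.95, -1.17, 52.69)}

print(metadata_problems(bad_layer))
print(metadata_problems(good_layer))
```

A bounding box far outside the valid longitude/latitude range is often a sign that projected coordinates were written where degrees were expected.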
I have a suggestion for...
Cool! Email it to us at the Contact Us address below.
We're always up for new datasets, new CKAN portals, new search ideas, and anything else that will help us solve this problem.
If you have any feedback, questions, suggestions, or comments about GeoSeer, we'd love to hear them. Send us an email at: email@example.com
We're particularly interested in new data sources.
Boring legal stuff
We value privacy, including yours, and we don't like lawyers (who does?), or companies that have multi-thousand-word EULAs, so we keep this short and simple:
- We don't endorse any of the results. There's no filtering or censorship here; we only do the minimum-necessary data-cleansing for "make it work" and "no empty results" purposes.
- In the incredibly unlikely situation this website breaks your computer, burns down your house, or otherwise starts armageddon, it's not our fault!
- We don't set any cookies.
- We don't do any tracking.
- Please don't mine our site. If you want access to the raw data, talk to us (email in the Contact Us section).
- We do keep server-side access logs.