The GeoSeer Blog

All pages with the tag: New

GeoSeer Licensed Products Released

Posted on 2019-11-27

We hinted at it in August and now it's here; today we're releasing a licensed version of the database that sits behind GeoSeer, creatively called: GeoSeer Licensed.

GeoSeer Licensed content allows organisations and businesses to host and integrate their own local copy of GeoSeer's industry-leading database of spatial web services directly into their own applications, products, or services. We figure this can improve end-user workflows, make discovery of third-party data and services much easier, and help organisations realise some of the vast economic benefits Open Data presents - estimated at €75 billion across the EU in 2020 alone. Also, you can build cool things with it.

This nicely complements the GeoSeer API which was released back in April. Where the API allows organisations to easily build GeoSeer's search into web applications by making calls to our servers (like the GeoSeer WebGIS demo does), GeoSeer Licensed allows organisations to host the database locally, or release it in a product, meaning any sort of application can be built around it, not just search.

The Value Add

Let's be honest, there's nothing stopping you from building your own spider, finding 550+ data portals to use as seeds, scraping half a million web pages a month, and building your own database of geospatial web services. So why use GeoSeer Licensed?

  • Industry Leading Database - At the time of writing we're not aware of any database of geospatial web services anywhere near as large as this (and we've looked).
  • Current - We run regular crawls to make sure everything in GeoSeer Licensed is current. We provide monthly updates so you always have the latest data.
  • Pre-Cleaned - Despite these being standards, everyone likes to do things differently. We've pre-cleaned the fields and tried to standardise them to make them consistent. For example, we've found no fewer than 55 ways to say "No license fees" across 12 languages and turned that into a simple: "No".
  • XML Free - It's 2019 and fewer people want to deal with the hassle of namespaces, esoteric data models, and the other complexities XML brings. We've extracted the data from the XML documents and put it into a database, with some JSON sprinkled in where necessary.
  • Spatial Extents - For GeoSeer Datasets we include the extent bounding boxes in WGS84, along with scale-appropriate textual representations of the locations, potentially down to county level (like you see in GeoSeer Search).
  • Quick Start - Because we've done all the hard work, written up documentation about what each field means (so you don't have to read the standards), and packaged it in an SQLite database, it's super easy to get started with. Simply open your favourite database admin tool (it probably supports SQLite), and get querying with good old-fashioned SQL.
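
As a rough idea of what "get querying" could look like from a script rather than an admin tool, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and rows below are made up for illustration; the real field names are the ones described in the GeoSeer Licensed documentation.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only. The real GeoSeer
# Licensed tables and columns are described in its own documentation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (host TEXT, service_type TEXT, fees TEXT)")
conn.executemany(
    "INSERT INTO services VALUES (?, ?, ?)",
    [
        ("maps.example.org", "WMS", "No"),
        ("maps.example.org", "WFS", "No"),
        ("data.example.net", "WMS", "Yes"),
    ],
)


def services_per_type(connection):
    """Count services by type, largest first (ties broken alphabetically)."""
    rows = connection.execute(
        "SELECT service_type, COUNT(*) FROM services "
        "GROUP BY service_type ORDER BY COUNT(*) DESC, service_type"
    )
    return rows.fetchall()


print(services_per_type(conn))
```

With a real copy of the database you would pass the path of the SQLite file to `sqlite3.connect` instead of `":memory:"` and skip the table creation.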

The Products

GeoSeer Services is a database of all the current geospatial web services that GeoSeer knows about, along with information about their endpoints, hosts, and more. At the time of writing it holds over 215,000 services from 4,930 different hosts. This information is good for investigating who is hosting services, what sorts of services exist, INSPIRE deployment patterns/conformity, etc.

GeoSeer Datasets builds on GeoSeer Services, including not only all of the service information, but all of the dataset metadata as well. This includes: dataset extent bounding boxes, dataset keywords, declared projections, scale-appropriate textual location, metadata URLs, and more. GeoSeer Datasets is well suited to building search engines (surprise!), GIS, web-GIS, academic research, and much more.

Both products are available with a number of different license types, from a research license through to commercial licenses. We can also provide subsets of the database if you don't want everything.

We like to think we've built a great search engine around this data, so now it's your turn - what can you build with it? Find out more about GeoSeer licensing.


New Historical Statistics and Extent Plots

Posted on 2019-09-02

The GeoSeer stats page went live just shy of a year ago and we've been meaning to update it with more stats ever since. Today we've done just that, with a few new stats, and a lot of cool plots.

The first statistic is the simplest: the number of countries that are hosting OGC services. A country for our purposes is simply defined as having a unique ccTLD (the last part of a domain: .pl, .us, .br, .au, etc.). At the time of writing this blog post, it's 87 of the 244 defined ccTLDs. (Note this does include .eu for the European Union, which most people wouldn't actually consider a country.)
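
To make that definition concrete, here is a small sketch of counting distinct ccTLDs in a list of hostnames. It assumes any two-letter final label is a ccTLD rather than checking against the official list, and the hostnames are invented examples.

```python
def cctld(hostname):
    """Return the final label of a hostname if it is two ASCII letters, else None."""
    label = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    if len(label) == 2 and label.isascii() and label.isalpha():
        return label
    return None


# Invented hostnames, for illustration only.
hosts = [
    "geoportal.example.pl",
    "maps.example.us",
    "wms.example.PL",    # same country as the first host, different case
    "data.example.com",  # generic TLD, so not counted as a country
]

countries = {code for code in map(cctld, hosts) if code is not None}
print(len(countries), sorted(countries))
```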

Historical Data

GeoSeer has been live for almost 18 months now, and we've been crawling the WWW for OGC services for even longer. This means we have a trove of historical data about services, and the new stats expose some of that. If you look at the stats page now, you'll see the General Stats section has been tweaked slightly.

As well as continuing to show stats about the current state of OGC services "Now", we've added an extra column for "Ever" which shows the total numbers that we've ever found since we started doing this. Then with a little maths we show the percentage of the things we've ever found that are still alive now.

The Ephemeral Nature of Public Data

Datasets

The single most glaring statistic from this historical data is that we've found a total of 4,949,124 datasets since we started crawling, but only 1,865,660 are live and active in our index right now. Or put another way, just 37.7% of the datasets hosted by OGC services that were publicly available at some point in the past 18 months are still online!
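
The "little maths" behind that percentage is just the "Now" count divided by the "Ever" count. A quick sketch using the two dataset figures quoted above (the function name is ours, purely for illustration):

```python
def percent_alive(now, ever):
    """Share of everything ever found that is still live, as a percentage to one decimal place."""
    return round(100 * now / ever, 1)


# Dataset counts quoted in this post: live now vs. ever found.
print(percent_alive(1_865_660, 4_949_124))  # 37.7
```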

Services

And while that's the most stand-out statistic, the others also show how transient the OGC services that host these datasets are. Over the course of the past ~18 months we've found 291,779 different services, yet only 71.83% of them were online and responding on our last crawl.

Hosts

The final statistic of note here is the number of hosts. These are the domain names themselves, and different subdomains are counted as different hosts (so www.example.com is different from ogc.example.com). Even these have experienced considerable churn over what is a relatively short period of time, with only 85.5% of hosts remaining online. We should point out that we ignore the scheme (that's the http:// or https://) and ignore the port when we consider if something is a "host", so if a host changes from insecure to secure (and quite a few do), it won't make a difference to this statistic.
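
One way to express that rule in code is to reduce every URL to its hostname before comparing. This is only a sketch of the idea using Python's standard library, not a description of how GeoSeer itself does it, and the URLs are invented examples.

```python
from urllib.parse import urlsplit


def host_key(url):
    """Reduce a URL to its lowercased hostname, ignoring scheme, port, path and query."""
    return urlsplit(url).hostname


# Invented URLs, for illustration only.
urls = [
    "http://www.example.com:8080/wms?service=WMS",
    "https://www.example.com/wms",  # same host: only the scheme and port changed
    "https://ogc.example.com/wfs",  # different subdomain, so a different host
]

distinct_hosts = {host_key(u) for u in urls}
print(sorted(distinct_hosts))
```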

Thoughts

All of this change makes it harder for users to rely on this data even if they can find it. This is especially true for things like scientific research, which relies on repeatability, including the ability for other scientists to go back and take a second look at the original data; that's a difficult thing to do when the datasets/services/hosts have gone offline.

This also highlights the importance of keeping data portals current. Link rot is a real thing and data curators need to ensure they maintain their portals, otherwise the portals are worse than useless (because they're wasting everyone's time with bad links).

Extent Plots

The other part of this statistics update is a collection of extent map plots that show what parts of the world have datasets. We're going to do a separate blog post about them in the future.


GeoSeer API Goes Live

Posted on 2019-04-09

We've hinted at it in previous blog posts, but now it's time for the big reveal: the GeoSeer API is live!

Designed to allow you to integrate the power of GeoSeer's search into your business's Web GIS or other application, the API allows your users to easily and seamlessly search for datasets without having to leave their normal tooling. There's an entire page with information about it here.

As well as including all the features you're used to in the web-search, the API also includes some cool new features:
  • Bounding Box Search - Search for datasets that are within, disjoint from, or intersecting a given bounding box, while also using a search term. Ideal for searching for layers that overlap the user's current viewing area.
  • Lat/Lon Search - Easily find datasets that intersect a specific point. Your user selects a location and now they can find data that intersect it. Simple.
  • Service Type filter - Only find datasets that are of the OGC service type(s) that you're interested in. Does your application only support WMS and WFS for instance? Then filter results to only search those service types.
  • Service Search - The GeoSeer web search only allows users to search datasets/layers, but the API also allows searching by service. Readily find services hosted by anyone from local government through to globe-spanning organisations like the World Food Programme, and everyone in between.
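
For anyone unsure what "within", "disjoint", and "intersecting" mean for bounding boxes, or what a point search tests, here is a plain-geometry sketch. It says nothing about how the API is called or implemented; the class, function names, and coordinates are all illustrative, and boxes that cross the antimeridian are not handled.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Axis-aligned box in lon/lat degrees (illustrative type, not part of the API)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


def intersects(a, b):
    """True when the two boxes share at least one point."""
    return (a.min_lon <= b.max_lon and b.min_lon <= a.max_lon
            and a.min_lat <= b.max_lat and b.min_lat <= a.max_lat)


def disjoint(a, b):
    """True when the two boxes share no points at all."""
    return not intersects(a, b)


def within(a, b):
    """True when box a lies entirely inside box b."""
    return (b.min_lon <= a.min_lon and a.max_lon <= b.max_lon
            and b.min_lat <= a.min_lat and a.max_lat <= b.max_lat)


def contains_point(box, lon, lat):
    """True when the point falls inside (or on the edge of) the box."""
    return box.min_lon <= lon <= box.max_lon and box.min_lat <= lat <= box.max_lat


view = BBox(-5.0, 50.0, 2.0, 56.0)       # e.g. the user's current map view
layer_a = BBox(-3.0, 51.0, 0.0, 53.0)    # entirely inside the view
layer_b = BBox(1.0, 55.0, 10.0, 60.0)    # overlaps the view's corner
layer_c = BBox(20.0, 10.0, 30.0, 20.0)   # nowhere near the view

print(within(layer_a, view), intersects(layer_b, view), disjoint(layer_c, view))
```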

We've created the snazzy GeoSeer API WebGIS that demonstrates the API in action, giving you a feeling for what you can do with it and how it could integrate with your own application(s).

The API has several plans to cover various needs, and the Enterprise plan allows for considerable customisation so you can get exactly what you need. So take a look and find out more about the API.


A New Look for a New Year

Posted on 2019-01-09

We thought we'd welcome in 2019 with a slight update to the look of the site to improve usability. In particular, the GeoSeer website should now be much better behaved on mobile devices. There's also now more consistency in page navigation to help you find where you're going, and we've tweaked the search results page to better expose meaningful information, including the service's URL.

The changes are not just cosmetic; we've also improved the search functionality to try and provide better results for multi-term queries. So searches for things like tree preservation orders will now preferentially try and find results where the words are next to each other without your having to put quotes ("") around them. We've also done a fair amount of work on the location assignment service (the bit that decides what area of the planet a bounding box covers) so you should be getting better results there too.
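
The general idea of preferring adjacent words can be sketched in a few lines. This is a toy illustration of phrase-aware ranking, not GeoSeer's actual search code; the scoring values and titles are invented.

```python
def phrase_score(query, text):
    """Toy score: 2 if the query words appear next to each other in order,
    1 if they all appear but are scattered, 0 if any word is missing."""
    terms = query.lower().split()
    words = text.lower().split()
    if not terms or not all(term in words for term in terms):
        return 0
    n = len(terms)
    adjacent = any(words[i:i + n] == terms for i in range(len(words) - n + 1))
    return 2 if adjacent else 1


# Invented layer titles, for illustration only.
titles = [
    "Orders for the preservation of a tree",
    "Tree Preservation Orders 2018",
    "Road network",
]

query = "tree preservation orders"
ranked = sorted(titles, key=lambda title: phrase_score(query, title), reverse=True)
print(ranked)
```

Here the title containing the exact phrase is ranked first, the one with the same words scattered comes second, and the unrelated title comes last.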

And as if all that wasn't enough, there are a couple of new features - this blog now has an RSS feed so you can better keep up with our posts.
But we've been keeping the best until last - we now have an API! It's still in beta for the next few weeks but if you're interested in using it, do let us know. There's an entire page with information about the API on it, and we'll be doing a blog post about it when we launch it.


One Million Layers, and a Stats Page

Posted on 2018-09-27

GeoSeer has now hit the one million distinct spatial layers milestone in its index. That's a staggering amount of spatial data, and all of it is freely accessible via OGC standards, and of course, also easily searchable with GeoSeer. This actually represents over 1.7 million publicly available WMS, WFS, WCS, and WMTS layers - see this previous blog post for a discussion on why this number is even higher. This represents data from over 100,000 OGC services.

We've been steadily increasing the number of layers in our index since launch as a result of a combination of things: our ongoing efforts to expand where we collect data from, improvements to the GeoSeerBot (we feed it lots of veggies!), and ever more layers being added to services we already index.

How many more layers and services are there out there? We don't know; but we plan on doing a blog post about the number of services, so keep an eye out. And we're going to keep trying to find more.

What was that about a Stats Page?

That's right, because we're big data nerds (see what we did there?), we've also created a page that's got a high-level breakdown of statistics for what's in our index. You can find the new stats page here. We don't claim to have a complete index of all public OGC services, but we're fairly certain it's a large chunk of the ones that are out there, so this is a fairly representative sample of what's available on the internet.

The stats page will be updated about once a month and should always approximately represent what's in our index. In the future we plan on adding further and more detailed statistics including a breakdown of what middleware is used to run these services, so keep an eye out for it.

Need more stats? Ask away!

If there's any particular statistic you're interested in that's not on there, let us know and we'll consider adding it. Or, if we don't think others will find it interesting, we'll tell you directly; we try to be nice like that. (How many people really want to know that the average (mean) number of Layers per Endpoint is, at the time of writing, 12.99? Or that the median and mode are both 2, the minimum is 1, and the maximum is 4,629?) So ask away.

Blog content licensed as: CC-BY-SA 4.0