The GeoSeer Blog

GeoSeer Update: CSWs, Search Scoring, and Guatemala

Posted on 2018-05-16

Another month and another update. This month's update comprises two main components: scraping CSW services, and improved results scoring. Plus, as a bonus, many more layers for Guatemala!

CSW services

The most notable thing we've done this update is include over 60 CSW (Catalogue Service for the Web) services in our crawl. This didn't add as many services as we hoped, in large part because we already had most of them.
We learnt the hard way that despite being a standard, CSW services are highly temperamental and software-specific. Both GeoNetwork and PyCSW (the two most-deployed as far as we can see) have numerous bugs and idiosyncrasies that make getting their data very painful, even though both are CSW 2.0.2 "compliant".
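
For readers unfamiliar with CSW, here is a minimal sketch of paging through a CSW 2.0.2 catalogue with GetRecords requests. It is an illustration only, not GeoSeerBot's actual code: the function names and the example endpoint are made up, and real servers may need further parameters or workarounds. The end-of-results check is deliberately defensive, on the assumption that implementations differ in how they signal the last page.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"


def getrecords_url(endpoint, start=1, page_size=50):
    """Build a key-value-pair GetRecords URL for one page of records.

    `endpoint` is a hypothetical CSW base URL supplied by the caller.
    """
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "full",
        "startPosition": start,
        "maxRecords": page_size,
    }
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode(params)


def next_start(response_xml):
    """Return the start position of the next page, or None when finished."""
    root = ET.fromstring(response_xml)
    results = root.find(f"{{{CSW_NS}}}SearchResults")
    if results is None:
        # No SearchResults element (e.g. an exception report): stop paging.
        return None
    next_record = int(results.get("nextRecord", "0"))
    matched = int(results.get("numberOfRecordsMatched", "0"))
    # nextRecord="0" means no more records; also stop if a server
    # reports a position beyond the number of matched records.
    if next_record <= 0 or next_record > matched:
        return None
    return next_record
```

A crawler would fetch `getrecords_url(endpoint, start)` repeatedly, passing each response body to `next_start` until it returns `None`.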

Guatemala

We've also manually added about 9 new services for Guatemala, taking the number of layers that are searchable for that country from 95 to 800! A big thanks to Raul Calderon for bringing those services to light.

As a result of this update, and re-crawling all of our already-known services, the number of searchable layers has increased by about 10% to over 790,000 distinct layers. This is despite further improving the quality of the "remove junk layers" filter and removing over 10,000 more poorly-documented layers.

Improved Search

Finally, and possibly most importantly, we've done some work to improve the quality of the results. We now rate the quality of the metadata for each individual layer and use that as part of the search result scoring. You should hopefully see better-quality results for any given search now.
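
To give a feel for the idea, here is a minimal sketch of folding a metadata quality rating into a search score. This is not GeoSeer's actual scoring: the checks, the 50-character abstract threshold, the field names and the `quality_weight` parameter are all assumptions chosen for illustration.

```python
def metadata_quality(layer):
    """Rate a layer's metadata from 0.0 to 1.0 using simple, hypothetical checks."""
    title = (layer.get("title") or "").strip()
    abstract = (layer.get("abstract") or "").strip()
    keywords = layer.get("keywords") or []
    checks = [
        bool(title),            # has a title at all
        len(abstract) >= 50,    # has a reasonably descriptive abstract
        bool(keywords),         # has at least one keyword
    ]
    return sum(checks) / len(checks)


def final_score(text_relevance, layer, quality_weight=0.3):
    """Scale a text relevance score by metadata quality.

    With quality_weight=0.3, a layer with the worst metadata keeps 70%
    of its relevance score and a layer with the best keeps all of it.
    """
    quality = metadata_quality(layer)
    return text_relevance * (1 - quality_weight + quality_weight * quality)
```

With this kind of scheme, two layers that match a query equally well are ordered by how well they are documented, while a strong text match can still outrank a weak one.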

Feedback is always welcome and if you have any thoughts or suggestions on the search quality, or services you think we should be indexing, please do contact us.


GeoSeer's First Big Update: Over 250,000 New Layers

Posted on 2018-04-27

You may have noticed the number of layers that GeoSeer now has in its index has jumped dramatically. Previously we had about 450,000 layers; now we have around 715,000, which is over a quarter of a million more! And that's after we've improved the junk filter to get rid of a lot of the spurious test layers (it's unlikely anyone actually wants to see the GeoServer test layers, for instance) and layers with no names/titles.
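
A junk filter of this kind can be sketched in a few lines. The version below is a simplified illustration rather than the real filter: the field names are assumptions, the list holds just a few of the sample layers that ship with a default GeoServer install, and it assumes a layer counts as junk if either its name or its title is missing.

```python
# A few sample layers from a default GeoServer install (illustrative, not exhaustive).
SAMPLE_LAYER_NAMES = {"topp:states", "tiger:poi", "sf:roads", "nurc:Arc_Sample"}


def is_junk(layer):
    """Return True for layers with a missing name/title or a known sample name."""
    name = (layer.get("name") or "").strip()
    title = (layer.get("title") or "").strip()
    if not name or not title:
        return True
    return name in SAMPLE_LAYER_NAMES


def remove_junk(layers):
    """Keep only the layers that pass the junk check."""
    return [layer for layer in layers if not is_junk(layer)]
```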

These extra layers are a result of a whole bunch of work to improve the GeoSeerBot (the thing that goes crawling around the internet trying to find data). We now search many more data sources, and we're also now scraping numerous HTML pages. We haven't yet started scraping CSW services; that's our next goal.

We've also done some work to resolve a few behind-the-scenes niggles. For example, we previously didn't have the country of Chile in our spatial data (oops!), so no layers were being assigned to Chile.



Blog content licensed as: CC-BY-SA 4.0