GeoSeer Update: CSWs, Search Scoring, and GuatemalaPosted on 2018-05-16
Another month and another update. This month's update comprises two main components - scraping CSW services, and improved results scoring. Plus as a bonus, many more layers for Guatemala!
The most notable thing we've done this update is include over 60 CSW services into our crawl. This didn't add as many services as we hoped, in large part because we already have most of them.
We learnt the hard way that despite being a standard, CSW services are highly temperamental and software specific. Both GeoNetwork and PyCSW (the two most-deployed as far as we can see) have numerous bugs and idiosyncrasies that make getting their data very painful, even though both are CSW 2.0.2 "compliant".
We've also manually added about 9 new services for Guatemala, taking the number of layers that are searchable for that country from 95 to 800! A big thanks to Raul Calderon for bringing those services to light.
As a result of this update, and re-crawling all of our already-known services, the number of searchable layers has increased by about 10% to over 790,000 distinct layers. This is despite further improving the quality of the "remove junk layers" filter and removing over 10,000 more poorly-documented layers.
Finally, and possibly most importantly, we've done some work to improve the quality of the results. We now rate the quality of the metadata for each individual layer and use that as part of the search result scoring. You should hopefully see better quality results for any given search now.
Feedback is always welcome and if you have any thoughts or suggestions on the search quality, or services you think we should be indexing, please do contact us.