
Distributed versioning for geospatial data

Interested in distributed version control for geospatial data? Learn more about OpenGeo’s vision for a versioned geospatial future:

Spatial data has become one aspect of the greater information technology landscape for any given enterprise. Traditionally this data has been siloed and forced through specialized workflows, a process many subscribe to. We don’t. That may sound odd coming from what many would consider a GIS company (we’re not; we do spatial IT), but it’s true. OpenGeo is working to shift how geospatial information is viewed and used. Rather than spatial data being locked up in a single machine or database, we see a future where it lives in a collaborative infrastructure that can track the data’s origin and evolution, much like source code.

Just how much is geospatial information like source code, though? The comparison is apt in many ways. Just as many software users have no interest in gaining access to source code, most map viewers don’t need to engage with the data underlying a map. Yet those who use geospatial data, like those who design or build specialized software, value the ability to access and alter the data to suit their needs. Just as access to source code enables a developer to change software by adding to or changing its functionality and appearance, access to underlying geospatial data enables cartographers and analysts to fix mistakes, conduct analysis and modeling, and update a publicly available dataset with data they have collected themselves.

Enabling true collaboration around geospatial information can have profound implications for users of geospatial data. Open source collaboration has transformed the software landscape by creating a vast commons of powerful tools that anyone can use and improve. Similarly, geospatial crowd-sourcing efforts like OpenStreetMap and Ushahidi have significantly increased the availability of free, high-quality geospatial data. However, moving beyond sourcing information from crowds and towards a data commons collaboratively developed and shared by governments, NGOs, commercial companies, and individuals will require a substantial shift in how geospatial data is stored and distributed. Adopting the distributed version control model pioneered with source code can play a critical role in alleviating the difficulties that have historically plagued users of geospatial data. A distributed version control model can better address such problems as collaboration between users or organizations, maintaining authoritative data, and working with offline, low-bandwidth, or intermittent connectivity.
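To make the source-code analogy a little more concrete, here is a minimal Python sketch of what comparing two revisions of a dataset might look like. It is an illustration only and does not describe how any particular tool works: the function names, the feature ids, and the sample roads are all hypothetical, and geometry is simply stored as WKT text.

```python
import hashlib
import json


def feature_hash(feature):
    """Return a stable content hash for one feature (geometry plus attributes)."""
    canonical = json.dumps(feature, sort_keys=True)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def diff_features(old, new):
    """Compare two snapshots, each a dict mapping feature id -> feature dict."""
    added = sorted(fid for fid in new if fid not in old)
    removed = sorted(fid for fid in old if fid not in new)
    modified = sorted(
        fid for fid in new
        if fid in old and feature_hash(old[fid]) != feature_hash(new[fid])
    )
    return {"added": added, "removed": removed, "modified": modified}


# Two hypothetical revisions of a tiny road dataset.
rev1 = {
    "road-1": {"geometry": "LINESTRING (0 0, 1 1)", "name": "Main Stret"},
    "road-2": {"geometry": "LINESTRING (1 1, 2 2)", "name": "High Street"},
}
rev2 = {
    "road-1": {"geometry": "LINESTRING (0 0, 1 1)", "name": "Main Street"},
    "road-3": {"geometry": "LINESTRING (2 2, 3 3)", "name": "Mill Lane"},
}

changes = diff_features(rev1, rev2)
print(changes)
```

Here the spelling fix on `road-1` shows up as a modification, while `road-2` and `road-3` show up as a removal and an addition. A real system would also need to record who made each change and when, which this sketch leaves out.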

We’ve written more about our views and plans for versioning in a three-part series entitled “Distributed Versioning for Geospatial Data”:

  1. “Distributed Versioning for Geospatial, a new approach” offers an outline of our vision of geospatial versioning
  2. “Distributed Versioning Implementation” outlines the work we’ve already done
  3. “Distributed Versioning, Potential Development” lays out potential plans for the future

We’d like to hear your thoughts about where you see potential and where you think we may be wrong. OpenGeo is not going to create a distributed version control system for geospatial data alone, and we don’t want to. We’re interested in getting the conversation started and want you to be a part of it.


3 Responses to “Distributed versioning for geospatial data”

  1. Barry Rowlingson Says:

    So instead of downloading a shapefile, the vision is that I’d do something like:

    geogit clone geo://hostname/path/uk

    and get something I can map in QGIS. Someone upstream fixes some issues with the data and I do:

    geogit pull

    and voila I’ve got the latest revision. I do:

    geogit diff

    and I see highlighted the differences – perhaps someone has snapped some polygons correctly and the region is highlighted in yellow, or fixed a spelling mistake and I see a comparison of attribute tables. I then do

    geogit update

    and start working with the corrected data.

    If I’ve got write permission on the repo I can push.

    If there was something like github I could fork, edit, and send a “geopull” request? Oh that would be sweet.

  2. Gabriel Roldan Says:

    Hi Barry, indeed that’s gonna be possible, although seen as for “power users” on the longer term approach. Since the major target audience may not be so into command line interfaces it’d be wise to build the appropriate tools around geogit to make it as transparent and easy to use as possible. But yeah, there’s a benefit in having such a CLI interface starting with making the life of the ones that are used to it easier.

  3. [...] new multimodal transportation tool, OpenGeo’s demo of improved NHD editing, Chris Holmes theoretical augmentation of this NHD model, and other projects like Ride the City.  This applies to OpenStreetMap.  This makes it squarely a [...]
