Archive for the ‘Technology’ Category

GeoServer in a clustered configuration (part 2)

In our last post on clustering, we talked about the theory behind some different options for clustering. In this post, we’ll go into an example of clustering, taken from our recent experience with one of our OpenGeo Suite Enterprise clients. If you’ll be attending FOSS4G-NA and want to learn more about clustering and GeoServer, consider attending our GeoServer training and Juan Marin’s GeoServer in Production presentation (scheduled for 5/23/2013 at 11:30 am).

Clustering Scenario

In the following scenario, we will work through the installation and configuration of two GeoServers, each inside its own servlet container instance on the same machine. Each servlet container will use the same JRE and the same container binaries (Apache Tomcat 7), but they will have independent configurations that allow them to run on different ports. These two GeoServer/Tomcat instances will be fronted by a local software proxy called HAProxy, which acts as an HTTP/TCP load balancer. The load balancer configuration provides very basic “round robin” balancing of GeoServers. More sophisticated load-balancing configurations are possible, but are beyond the scope of this example. All GeoServers will be deployed as WAR files placed into each of the Tomcat webapps directories. It is possible to have multiple instances of Tomcat share a single web application through the use of contexts. This is useful if you anticipate your web application (GeoServer) will be changed/updated frequently, but isn’t necessary.

Alpha releases

One thing I love about open source development is the ‘alpha’ release.

Last week was an exciting week of alphas for OpenGeo: both OpenLayers 3.0 and GeoGit had their first releases and launched new websites. The two websites are admittedly not very sophisticated (I made geogit.org with GitHub’s page generator and Andreas pulled together ol3js.org with Bootstrap), but awesome websites can come later. The point of these alpha releases is to get something out in the world and widen the open source process to new users and potential contributors.

Alpha releases are rarely seen in proprietary software development since software in an alpha state is generally quite buggy. To quote Wikipedia: “alpha software can be unstable and could cause crashes or data loss.” At this point many would turn away and run as far from the software as possible, but to me it’s an awesome thing, an understood pact between the developers and the users that says: “hey, we’re not perfect, and we know our software is far from perfect, but if you understand the risks we’d be really excited to show it to you.”

The process opens up a dialog of equals—not the typical consumer relationship, but a collaborative one. The user of alpha software actually has a responsibility to communicate when (not if) things go wrong and to tell the developers how it crashes, what important option isn’t there, how the installation fails, or even how the website is confusing. In this way, responsibility can grow from being an alpha user to include helping with documentation, improving the website, debugging problems, contributing patches, and eventually building major new features as a core developer. Indeed the point of the alpha release is to put a stake in the ground and open the process to gain feedback from others, allowing users and developers to build the future together. Everyone is expected to be a true participant, in the fullest sense of the word, with responsibilities as well as privileges as opposed to just a passive consumer.

We encourage you to check out both the OpenLayers 3.0 and GeoGit alpha releases and let the teams know what you think. OpenLayers in particular has a very solid core but is looking for practical input from real users. We think the projects show a lot of potential, and we’re excited for your feedback, encouragement, and even contributions. Don’t hesitate to jump in and join us as we build the geospatial future together.

GeoServer in a clustered configuration (part 1)

Recently, we helped one of our clients who wanted to set up a GeoServer cluster. There are different ways to accomplish clustering depending on your specific needs, but we thought it would be illustrative to show what we did in this particular situation. Keep in mind this is a specific treatment and fairly tailored. We encourage you all to experiment with the newest features, but remember to do so in your testing environment!

We’ll start with some clustering theory and tips before launching into the actual details of how to do it.

Background

A computing cluster consists of two or more machines working together to provide a higher level of availability, reliability, and scalability than can be obtained from a single node. Nodes in a cluster are positioned behind a proxy server and/or load balancer that delegates requests to cluster members based on any one member’s ability/availability to handle load.

Clustering

Similar to other applications with long-running in-memory states and high data I/O, GeoServer sees performance gains with two (or more) nodes clustered behind a load balancer—even with the slight overhead of the load balancer that sits in front of the cluster.

Generally, there are two complementary purposes for clustering GeoServer:

  • To provide high-performance and/or throughput
  • To achieve high availability

In the most demanding situations, GeoServer can be deployed in combinations of high-performance and high-availability instances.

High-Performance Clusters

A high-performance GeoServer configuration deploys several instances of GeoServer on a single machine.

High-performance cluster

Each GeoServer instance is deployed into its own servlet container (Tomcat, Jetty, etc.). Individual servlet containers are configured independently and spin up their own JVM, each with its own memory and processor allocations (borrowed from the pool of resources on the host machine). GeoServer’s memory and CPU runtime footprint are optimized for high throughput under heavy concurrency with such a deployment, but always consider that these different deployed units will compete for the physical server’s resources. To find the best balance we recommend, as always, testing for your particular scenario.

A load balancer or proxy fronts the cluster, and directs traffic to the member of the cluster most able to handle the current request. In this case, nodes will likely share the same server name or IP address, but listen for requests on different ports. For example:

Load Balancer @ http://<server>:80/<alias> forwards to one of:

  • GeoServer 1 in Tomcat 1 @ http://<server>:8081/geoserver
  • GeoServer 2 in Tomcat 2 @ http://<server>:8082/geoserver
  • GeoServer 3 in Tomcat 3 @ http://<server>:8083/geoserver
  • GeoServer 4 in Tomcat 4 @ http://<server>:8084/geoserver
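
To make the forwarding idea concrete, here is a minimal sketch of the simplest balancing strategy, round robin, over four instances on one host. It is only an illustration of the selection logic, not how any particular load balancer is implemented; the `localhost` URLs and the `make_round_robin` helper are hypothetical names chosen for this example.

```python
from itertools import cycle


def make_round_robin(backends):
    """Return a function that hands out backends in turn, wrapping around."""
    backends = list(backends)
    if not backends:
        raise ValueError("at least one backend is required")
    pool = cycle(backends)
    return lambda: next(pool)


# Hypothetical cluster: four GeoServer/Tomcat instances on one machine,
# distinguished only by port (assumed values, matching the list above).
BACKENDS = [f"http://localhost:{port}/geoserver" for port in range(8081, 8085)]

next_backend = make_round_robin(BACKENDS)
first_five = [next_backend() for _ in range(5)]
```

The fifth pick wraps back to the first instance. A real balancer would also skip members that fail health checks, which this sketch leaves out.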

An approach that deploys multiple instances of GeoServer into the same servlet container is not recommended. In this case, since host resource allocation (to a common JVM) will not be sequestered as neatly, competition for those resources will occur, limiting the benefits.

Users might also consider using the built-in clustering capabilities found in Enterprise Application Servers (such as Oracle Weblogic or JBoss), however this is beyond the scope of this discussion.

High-Availability Clusters

A high-availability implementation will spread several GeoServer instances across several machines (nodes) in a cluster. These nodes can be physical or virtual machines.

High-availability cluster

Nodes are normally located behind a load balancer that redirects traffic to any single GeoServer based on traffic volume and availability. In this case, nodes will likely be on different servers or IP addresses and listen for requests on the same port. For example:

Load Balancer @ http://<server>:80/<alias> forwards to one of:

  • GeoServer 1 in Tomcat 1 @ http://<server1>:8080/geoserver
  • GeoServer 1 in Tomcat 1 @ http://<server2>:8080/geoserver
  • GeoServer 1 in Tomcat 1 @ http://<server3>:8080/geoserver

Data directory location and catalog reloads

Some important considerations to be made when clustering several instances of GeoServer concern the location of the GeoServer data directory and a strategy for reloading all cluster members’ data catalogs.

The GeoServer data directory is the location in the file system where GeoServer stores its configuration information. The configuration defines things such as what data is served by GeoServer, where it is stored, and how services such as WFS and WMS interact with and serve the data. The data directory also contains a number of support files used by GeoServer for various purposes.

The spatial data accessed by GeoServer doesn’t need to reside within the GeoServer data directory, just pointers to the data locations. This should be obvious for data stored in spatial databases, which are certainly in different locations (on disk) and often on different machines; however the same is true for file-based spatial data. (Read more about the GeoServer data directory.)

GeoServer’s catalog is an in-memory representation of the configurations in the data directory. Storing the configurations in memory means that GeoServer can access this information faster than by reading these instructions off disk. However, this sometimes requires that the in-memory catalog be refreshed when configuration changes are made to the disk-based GeoServer data directory, or to the actual data served in GeoServer.

Unless catalog (re)configurations are largely static, or some amount of catalog discrepancy or availability is acceptable, a common GeoServer data directory location for all clustered instances is highly recommended.

The location of the GeoServer data directory is stored in the GEOSERVER_DATA_DIR variable. It can be configured in one of three ways: in each instance’s web.xml file (/webapps/geoserver/WEB-INF), through a common environment variable, or through a parameter passed to the JVM in the container start-up command.

Some implementations have clustered GeoServer instances using separate data directories that are synchronized either manually (when changes are infrequent) or automatically (using rsync), but neither approach is as common or recommended as a shared data directory.

Regardless of the mechanism for synchronization, changes to the data directory and the in-memory catalog will normally be directed by one master GeoServer. This can be enforced by disabling the GeoServer user interface on all “slave” GeoServers or by configuring the front-end load balancer to only direct user interface requests to /geoserver/web to the master GeoServer.

Changes to the master GeoServer’s data catalog must be explicitly refreshed on slave instances. This can be accomplished manually through the GeoServer Admin web UI (/geoserver/web), or with some measure of automation (on a schedule, or after a trigger is fired) using GeoServer’s REST API (e.g. by sending a POST/PUT request to /geoserver/rest/reload?recurse=true).
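
As a rough sketch of what the automated option could look like, the snippet below sends the reload request mentioned above to each slave instance using only the Python standard library. The node URLs, the credentials, and the `reload_urls` and `reload_catalogs` helper names are assumptions for illustration; HTTP basic authentication is also assumed, so adapt it to however your REST endpoint is secured.

```python
import base64
import urllib.request


def reload_urls(slave_base_urls):
    """Build the reload endpoint for each slave, e.g. http://node:8080/geoserver."""
    return [base.rstrip("/") + "/rest/reload?recurse=true" for base in slave_base_urls]


def reload_catalogs(slave_base_urls, username, password,
                    opener=urllib.request.urlopen, timeout=30):
    """POST a reload request to every slave; return {url: status or exception}."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    results = {}
    for url in reload_urls(slave_base_urls):
        request = urllib.request.Request(
            url,
            data=b"",  # empty body; the request itself triggers the reload
            method="POST",
            headers={"Authorization": "Basic " + token},
        )
        try:
            with opener(request, timeout=timeout) as response:
                results[url] = response.status
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            results[url] = exc
    return results
```

Called from a scheduled job or a post-change hook on the master, something like `reload_catalogs(["http://node2:8080/geoserver"], "admin", "secret")` (hypothetical values) reports per-node results, so one unreachable slave does not hide the outcome for the others.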

Clustering Enhancements

Enhancements to our clustering story are coming! Specifically, in future releases of GeoServer the data directory will have the option to be database-backed. This means that a central configuration store can be queried more optimally than a file-based counterpart and doesn’t all need to be read into memory.

In the next post, we’ll go into the details on setting up a clustered instance. Remember, Enterprise: Platform clients and higher get custom clustering and deployment advice included in their maintenance agreements.

Have you been looking at deploying GeoServer in a clustered environment? Tell us about it!

Why We Sprint

I spent last week in Boston, attending an annual code sprint for C-based open source geospatial projects.  I’ve been doing this every year since 2008.  Since getting back, I’ve had to explain the event to several people, technical and non-technical, since the concept isn’t obvious at all.

Open source development is characterized by some features that differ a great deal from traditional work environments:

  • the developers work asynchronously, often in different time zones, usually in different locations,
  • the developers coordinate exclusively using text tools, like e-mail, issue tracking systems, and sometimes instant messaging

Because there is no need to be in the same space with other developers, either physically or even temporally, the barriers to entry to a project are lowered. More people can participate than otherwise.

However, there are disadvantages to working asynchronously and with text communications.

  • asking for help when you get stuck can be time consuming, because your colleagues might be asleep at the moment when help would be most useful
  • issues of subtlety or complexity take a great deal of text to describe, and any misunderstandings on the part of a reader take even more text to correct
  • discussion of emotional issues can lead to conflict due to the limited emotional nuance in text communication

A code sprint is a chance to work for a time with your open source colleagues “the old fashioned way”, face to face, on the same clock.

Because everyone is together, and communications are high-bandwidth and high-fidelity, a code sprint is a great time for:

  • planning and designing large scale changes to the code
  • designing new APIs or new user interfaces, and
  • triaging ticket lists to prepare for release

I usually spend the first half of a sprint on communication-heavy tasks like the ones above. The second half I usually spend heads down on a hard piece of code.

If the right experts are around, code sprints are an excellent time to attack a new piece of code you don’t quite understand. Learning how a module works from the expert who wrote it is far faster than doing it alone at home.

And finally, having lunch and dinner and socializing usually provides the social space for unexpected topics to slip out and get discussed, whether they be uncomfortable issues like dealing with a difficult team member or just a crazy feature idea that turns out to be not so crazy at all when discussed with the group.

If you have a chance to participate in a code sprint on a project you contribute to, don’t pass it up!

GeoScript in Action: Part One

This is the first post of a three-part series dedicated to showing the versatility and functionality of GeoScript. 

The rumors are true: GeoScript is pretty awesome. How awesome? We’ll let you be the judge. In this post we’ll focus on data exploration and show you how to create a few visualizations right from the GeoScript command line, but GeoScript can do much more than that. In subsequent posts we’ll build up enough code and data to create other GeoScript products, then we’ll show how to easily refactor this code into processing web services. If you’d like to follow along with this post please install GeoScript. For the purposes of these exercises make sure that you download the latest version. For our examples we’ll be using the Python flavor of GeoScript. Make sure to keep the API doc close at hand and don’t hesitate to experiment. If you get stuck, please contact us and we’ll try to help you out.

Getting the Source Data. We’ll be using solar resource data from the National Renewable Energy Lab. The data set contains direct normal irradiance (DNI) average values for the contiguous United States and Hawaii. If you want to find out more, details should be contained within the metadata. You can download the entire data set to work through the examples below. Unzip to a directory, navigate to it, and fire up the GeoScript shell. (If you see some diagnostic messages along the way, just ignore them and remember that GeoScript is a work in progress.) Let’s get started!

Loading and Exploring Data. First, let’s load our two shapefiles into GeoScript:

>>> from geoscript.layer import Shapefile
>>> solar_all_poly = Shapefile("solar_dni_polygons.shp")
>>> states = Shapefile("usa_l48.shp")

Learning How to use Spatial Data for Disaster Risk Management

I was recently presented with a fantastic opportunity; my manager approached me and asked:

“Hey Ian, what do you think about leading a GeoNode training in the Caribbean?”

Without hesitation, or asking for details, I said, “Sign me up!” Though my manager’s smirk should have tipped me off, I had no idea how challenging the assignment would be. Nor did I realize that I would find the trip rewarding in ways other than what one typically thinks of when that sunny, sandy, piña colada-filled, cerulean-hued region of the world is mentioned.

The training was a part of the OpenDRI initiative and took place Feb 18-23 at the St. Augustine Campus of the University of the West Indies (UWI) in Trinidad and Tobago. The event, co-sponsored by the World Bank, the Global Facility for Disaster Reduction and Recovery (GFDRR), and the University of the West Indies, brought together 40 GIS specialists and developers from around the Caribbean to provide them with the skills needed to better use, integrate, and extend GeoNode as a component of their specific spatial data infrastructure.

I led a developer workshop in which 13 participants learned to install and configure GeoNode, create a custom “project”, theme it, and add new functionality related to security. We also threw in an unexpected (read: bonus) section on virtual networking (we used a VirtualBox appliance and ran into potential problems related to running a server with a changing IP address). In addition to my inane jokes, I shared all of my bash shortcuts and Python programming tips.

In the evening, there were discussions and presentations on topics such as OpenStreetMap, OpenDataKit, and NASA’s Pilot DRM Program. I’ll miss all of the fantastic Trinidadian food (chokas, coconut bake, roti, buss-up-shut and saltfish to name a few), along with the great people I met. Unfortunately I didn’t allot enough time to enjoy much of the island, but I will look back on my one free afternoon in Maracas Bay fondly.

My special thanks to the World Bank, the Global Facility for Disaster Reduction and Recovery (GFDRR), and the University of the West Indies for bringing me down. If you’re interested in such a training, or just want to fly me to nice sunny places, please let us know!

How to publish GDAL/MrSID image formats on a production GeoServer on Windows

We had a support ticket recently about enabling the GDAL and MrSID image formats so that they can be published by GeoServer. The client’s production server was Windows Server 2012 running Tomcat as a service. Below is a description of the steps taken to accomplish this.

We followed the instructions in our User Manual, but here I’ve added some screenshots from the specific implementation. For more information, please see the section on Enabling GDAL image formats support.

Instructions for other operating systems are to be found at the above link, but are similar to this procedure. Also, this example is specific to Tomcat running as a service—instructions are slightly different when Tomcat is run as a local user process.

Out of the box

This is what things look like with a stock installation. Notice that there is no OGR format driver among the Vector Store types, and there aren’t that many Raster Store options.

Put the GDAL JAR in place

The next step is to copy the GDAL JAR to the classpath of GeoServer, which in this case was <TOMCAT_HOME>\webapps\geoserver\WEB-INF\lib. While in our case there already was a GDAL JAR there, I overwrote the existing file to ensure that I had the latest and greatest version (at the time of writing, gdal-1.9.1.jar).

Add the GDAL libraries to Tomcat

Now we had to add the GDAL libraries to the Tomcat bin directory so that Tomcat would pick them up. This was also pretty straightforward—I copy and paste like a champion.

As with most library changes though, the application will need to be restarted before any changes will occur.

Restart Tomcat

Restarting Tomcat gets us most of the way there. OGR is now available as a Vector Store type and the Raster Store types include some but not all of the GDAL formats. These are the open format types, the ones that don’t require any additional proprietary drivers.

Add the MrSID libraries to Tomcat

The next step installs the piece needed to access proprietary rasters, in this case JP2MRSID and MrSID format. I downloaded the MrSID binaries from the link listed in the first step of the instructions and extracted it to <TOMCAT_HOME>\bin\gdalplugins.

One more restart and MrSID appears among the available store types and can be published through GeoServer.
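
If you need to repeat these copy steps on several servers, they can be scripted. The following is a minimal sketch, assuming you have already downloaded and extracted the GDAL JAR, the GDAL native libraries, and the MrSID plugin files; the `install_gdal_support` helper and any file names passed to it are hypothetical, while the three destination folders are the ones described above. Tomcat still needs to be restarted afterwards.

```python
import shutil
from pathlib import Path


def install_gdal_support(tomcat_home, gdal_jar, gdal_libs, mrsid_plugins):
    """Copy GDAL/MrSID files into a Tomcat install; return the copied paths."""
    tomcat_home = Path(tomcat_home)
    lib_dir = tomcat_home / "webapps" / "geoserver" / "WEB-INF" / "lib"
    bin_dir = tomcat_home / "bin"
    plugin_dir = bin_dir / "gdalplugins"

    # Pair every source file with the folder it belongs in.
    plan = [(Path(gdal_jar), lib_dir)]
    plan += [(Path(p), bin_dir) for p in gdal_libs]
    plan += [(Path(p), plugin_dir) for p in mrsid_plugins]

    copied = []
    for source, dest_dir in plan:
        dest_dir.mkdir(parents=True, exist_ok=True)
        # copy2 overwrites an existing file of the same name, like the manual step
        copied.append(Path(shutil.copy2(source, dest_dir)))
    return copied
```

Stopping the Tomcat service before copying and starting it afterwards avoids file locks on Windows; that part is left out here because it depends on how the service is named on your server.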

Have you published any interesting data using MrSID or GDAL image formats in GeoServer? We’d like to hear about it! Tell us in the comments below, or send us a note.

Enter GeoScript

At OpenGeo we’re committed to helping IT professionals break out of the traditional “GIS” workflow. Our goal is not to recreate desktop GIS on the web, but to bring “spatial” into the broader IT ecosystem. One way we’re doing this is with tools like GeoScript, and by bringing processing into our platform with OGC’s Web Processing Service (WPS) specification. By combining WPS and GeoScript, a web developer can create processes that perform complex analyses using familiar scripting languages like Python or JavaScript. This enables IT professionals to build web applications that can run spatial processes against data stored anywhere using standard web development practices.

Read more about these tools after the break. And if you’d still like to find out more about GeoScript, web processing with the OpenGeo Suite, or anything else ‘open’ or ‘geo’, consider attending FedGeo Day this Thursday, 2/28/2013, in Washington, DC. I’ll be speaking about these topics at 1:30pm; it would be great to see you there.

OpenGeo Suite 3.0.2 Released

This week we released version 3.0.2 of the OpenGeo Suite. The primary reason for this release was to fix a small security issue in GeoExplorer. We encourage you all to upgrade to the most recent release. Over the last year there have been a good many updates to GeoExplorer; you can check them out online or download the Suite. As always, OpenGeo Suite 3.0.2 is available for download free of charge with a 30-day trial of OpenGeo’s commercial support.

During the last release cycle we introduced two new support offerings, Community One-Time and Enterprise Plus, and they’ve been met with enthusiasm. For anyone who may have missed the 3.0.1 post, we’d like to again let everyone know about these packages. The Community One-Time and Enterprise Plus packages have been added to meet growing demand from smaller organizations seeking support for open source geospatial software. Community One-Time offers enterprise support to Community Edition users for one incident for up to 15 business days. Enterprise Plus provides a year of enterprise support for smaller production environments with straightforward support requirements, such as those publishing data from basic geospatial formats and serving a limited number of users. These offerings have been out in the wild for about three months and are growing in popularity. If you think these, or any other package, would be a good fit for you, don’t hesitate to contact us.

More information is available in the release notes and our pricing page.

Optimizing OpenLayers for Mobile Applications

Over the last few months we’ve had the pleasure of creating an HTML5 application using OpenLayers and Sencha Touch. This application is specifically designed to run on Motorola Xyboard 8.2 inch tablets running Android 3.2, but we may extend support for other tablets in the future. Our initial setup included the latest version of OpenLayers and rendered points using a vector layer with a canvas renderer. However, this resulted in poor map display performance and touch selection issues on Android 3.

We decided to explore additional options for touch selection. Our goal was to increase performance so that when the user hit a point icon it registered the first time, every time. To meet this goal we implemented the following:

  • Using an invisible (almost transparent) stroke around the points to provide a larger hit target. This approach only slightly improved things and made it hard for the user to hit the right point when they overlapped.
  • Using separate markers (DOM elements) instead of a canvas. We created a new markers renderer so we could reuse the existing vector layer configuration. Touch selection improved, but since each icon was a separate DOM element the map became unresponsive to dragging.
  • Leveraging the headless renderer concept. In this approach the POI overlay is rendered by a WMS, but the vector data is also available locally and used for feature selection. Whenever a touch event is received, hit points are determined by comparing their coordinates with the touch location. Then the selected feature is rendered client side with a different symbol to mark it as selected. This greatly improved touch selection and map dragging and was ultimately the solution selected.
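
The coordinate comparison at the heart of the headless approach can be sketched in a few lines. This is not the code from the application (which is JavaScript); it is a minimal Python illustration that assumes point features are already available as pixel coordinates and that a fixed pixel tolerance defines a hit. The `hit_test` name and its arguments are hypothetical.

```python
from math import hypot


def hit_test(features, touch_xy, tolerance):
    """Return the id of the feature nearest the touch, or None if none is close.

    features  -- iterable of (feature_id, (x, y)) in screen pixels
    touch_xy  -- (x, y) of the touch event in the same pixel space
    tolerance -- maximum distance in pixels that still counts as a hit
    """
    touch_x, touch_y = touch_xy
    best_id, best_distance = None, tolerance
    for feature_id, (x, y) in features:
        distance = hypot(x - touch_x, y - touch_y)
        if distance <= best_distance:
            best_id, best_distance = feature_id, distance
    return best_id


# Two overlapping icons: the touch lands closer to "b", so "b" is selected.
points = [("a", (10, 10)), ("b", (14, 10))]
selected = hit_test(points, (13, 11), tolerance=20)
```

Picking the nearest point rather than the first one within range is what lets overlapping icons be selected reliably, which was the weakness of the enlarged-stroke approach.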

We also tried several approaches to improve the map’s drag responsiveness and visual feedback:

  • Using accumulated drag deltas. This approach limits how often the map updates during a drag operation. This improved responsiveness, but resulted in distracting visual feedback during dragging.
  • Using 3d transforms to enable GPU support. Most mobile devices have advanced graphics processing capabilities, allowing for rich graphics in mobile gaming. By off-loading the work to the GPU we observed significant performance improvements. In a follow up project we were able to further refine this approach and add smooth animated zooming to OpenLayers.
  • Keeping references to image tiles for reuse. While working on a separate project we created a TileManager for smart tile queuing and caching. On mobile devices setting the source of an image is an expensive operation. For example, even if the tiles shown on the map were on previously viewed locations, dragging almost always creates a new tile. With the TileManager, tiles get cached so image sources only need to be set the first time a location is viewed. The project was delivered with an early version of the TileManager.
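
The caching idea in the last item can be illustrated with a small sketch. This is not the OpenLayers TileManager; it is a simplified Python stand-in showing how keeping references means the expensive load happens only on the first request for a tile. The `TileCache` class, the `(z, x, y)` key, the `load_tile` callback, and the least-recently-used eviction are all assumptions made for the example.

```python
from collections import OrderedDict


class TileCache:
    """Keep loaded tiles by key so the expensive load runs once per tile."""

    def __init__(self, load_tile, max_tiles=256):
        self._load_tile = load_tile  # expensive step, e.g. setting an image source
        self._max_tiles = max_tiles
        self._tiles = OrderedDict()

    def get(self, key):
        if key in self._tiles:
            self._tiles.move_to_end(key)  # mark as most recently used
            return self._tiles[key]
        tile = self._load_tile(key)
        self._tiles[key] = tile
        if len(self._tiles) > self._max_tiles:
            self._tiles.popitem(last=False)  # drop the least recently used tile
        return tile


loads = []
cache = TileCache(lambda key: loads.append(key) or f"tile{key}", max_tiles=2)
cache.get((3, 1, 2))
cache.get((3, 1, 2))  # served from the cache, no second load
```

Bounding the cache matters on tablets, where memory is limited; the limit of 256 tiles used as the default here is an arbitrary placeholder.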

With the exception of the headless renderer, all of the improvements mentioned have been contributed back to the project and are now available in OpenLayers. The headless renderer was left out since it is so specific to the application, but it is available as a pull request. This application turned out to be a great opportunity to further OpenLayers’ mobile features, and we look forward to working on them more in the future.