Author Archive

GeoServer in a clustered configuration (part 2)

In our last post on clustering, we talked about the theory behind some different options for clustering. In this post, we’ll go into an example of clustering, taken from our recent experience with one of our OpenGeo Suite Enterprise clients. If you’ll be attending FOSS4G-NA and want to learn more about clustering and GeoServer, consider attending our GeoServer training and Juan Marin’s GeoServer in Production presentation (scheduled for 5/23/2013 at 11:30 am).

Clustering Scenario

In the following scenario, we will work through the installation and configuration of two GeoServers, each inside its own servlet container instance on the same machine. Each servlet container will use the same JRE and the same container binaries (Apache Tomcat 7), but they will have independent configurations that allow them to run on different ports. These two GeoServer/Tomcat instances will be fronted by a local software proxy called HAProxy, which acts as an HTTP/TCP load balancer. The load balancer configuration provides very basic “round robin” balancing of GeoServers. More sophisticated load-balancing configurations are possible, but are beyond the scope of this example. All GeoServers will be deployed as WAR files placed into each of the Tomcat webapps directories. It is possible to have multiple instances of Tomcat share a single web application through the use of contexts. This is useful if you anticipate your web application (GeoServer) will be changed or updated frequently, but it isn’t necessary.

GeoServer in a clustered configuration (part 1)

Recently, we helped one of our clients who wanted to set up a GeoServer cluster. There are different ways to accomplish clustering depending on your specific needs, but we thought it would be illustrative to show what we did in this particular situation. Keep in mind this is a specific treatment and fairly tailored. We encourage you all to experiment with the newest features, but remember to do so in your testing environment!

We’ll start with some clustering theory and tips before launching into the actual details of how to do it.

Background

A computing cluster consists of two or more machines working together to provide a higher level of availability, reliability, and scalability than can be obtained from a single node. Nodes in a cluster are positioned behind a proxy server and/or load balancer that delegates requests to cluster members based on any one member’s ability/availability to handle load.

Clustering

Similar to other applications with long-running in-memory states and high data I/O, GeoServer sees performance gains with two (or more) nodes clustered behind a load balancer—even with the slight overhead of the load balancer that sits in front of the cluster.

Generally, there are two complementary purposes for clustering GeoServer:

  • To provide high-performance and/or throughput
  • To achieve high availability

In the most demanding situations, GeoServer can be deployed in combinations of high-performance and high-availability instances.

High-Performance Clusters

A high-performance GeoServer configuration deploys several instances of GeoServer on a single machine.

High-performance cluster

Each GeoServer instance is deployed into its own servlet container (Tomcat, Jetty, etc.). Individual servlet containers are configured independently and spin up their own JVM, each with its own memory and processor allocations (borrowed from the pool of resources on the host machine). GeoServer’s memory and CPU runtime footprint are optimized for high throughput under heavy concurrency with such a deployment, but always consider that these different deployed units will compete for the physical server’s resources. To find the best balance, we recommend, as always, testing with your particular scenario.

A load balancer or proxy fronts the cluster, and directs traffic to the member of the cluster most able to handle the current request. In this case, nodes will likely share the same server name or IP address, but listen for requests on different ports. For example:

Load Balancer @ http://<server>:80/<alias> forwards to one of:

  • GeoServer 1 in Tomcat 1 @ http://<server>:8081/geoserver
  • GeoServer 2 in Tomcat 2 @ http://<server>:8082/geoserver
  • GeoServer 3 in Tomcat 3 @ http://<server>:8083/geoserver
  • GeoServer 4 in Tomcat 4 @ http://<server>:8084/geoserver
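
As a rough illustration of the simplest balancing policy (plain round robin, not the load-aware routing a production balancer may offer), cycling requests across the four instances can be sketched in Python. The host name gis.example.org is a placeholder and the function names are made up for this sketch; a real deployment would leave this job to the load balancer itself.

```python
from itertools import cycle

# Hypothetical host name; the ports mirror the four Tomcat instances in the list.
HOST = "gis.example.org"
PORTS = [8081, 8082, 8083, 8084]


def backend_urls(host, ports, context="geoserver"):
    """Build one base URL per GeoServer instance (same host, different ports)."""
    return [f"http://{host}:{port}/{context}" for port in ports]


def round_robin(backends):
    """Return an iterator that hands out backends in turn, wrapping around forever."""
    return cycle(backends)


if __name__ == "__main__":
    picker = round_robin(backend_urls(HOST, PORTS))
    # Six consecutive requests: the fifth and sixth wrap back to the first two nodes.
    for _ in range(6):
        print(next(picker))
```

Round robin ignores how busy each node is, which is why it is only a starting point for a cluster where requests vary widely in cost.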

An approach that deploys multiple instances of GeoServer into the same servlet container is not recommended. In this case, since host resource allocation (to a common JVM) will not be sequestered as neatly, competition for those resources will occur, limiting the benefits.

Users might also consider using the built-in clustering capabilities found in Enterprise Application Servers (such as Oracle Weblogic or JBoss), however this is beyond the scope of this discussion.

High-Availability Clusters

A high-availability implementation will spread several GeoServer instances across several machines (nodes) in a cluster. These nodes can be physical or virtual machines.

High-availability cluster

Nodes are normally located behind a load balancer that redirects traffic to any single GeoServer based on traffic volume and availability. In this case, nodes will likely be on different servers or IP addresses and listen for requests on the same port. For example:

Load Balancer @ http://<server>:80/<alias> forwards to one of:

  • GeoServer 1 in Tomcat 1 @ http://<server1>:8080/geoserver
  • GeoServer 2 in Tomcat 2 @ http://<server2>:8080/geoserver
  • GeoServer 3 in Tomcat 3 @ http://<server3>:8080/geoserver

Data directory location and catalog reloads

Some important considerations to be made when clustering several instances of GeoServer concern the location of the GeoServer data directory and a strategy for reloading all cluster members’ data catalogs.

The GeoServer data directory is the location in the file system where GeoServer stores its configuration information. The configuration defines things such as what data is served by GeoServer, where it is stored, and how services such as WFS and WMS interact with and serve the data. The data directory also contains a number of support files used by GeoServer for various purposes.

The spatial data accessed by GeoServer doesn’t need to reside within the GeoServer data directory, just pointers to the data locations. This should be obvious for data stored in spatial databases, which are certainly in different locations (on disk) and often on different machines; however the same is true for file-based spatial data. (Read more about the GeoServer data directory.)

GeoServer’s catalog is an in-memory representation of the configurations in the data directory. Storing the configurations in memory means that GeoServer can access this information faster than by reading these instructions off disk. However, this sometimes requires that the in-memory catalog be refreshed when configuration changes are made to the disk-based GeoServer data directory, or to the actual data served in GeoServer.

Unless catalog (re)configurations are largely static, or some amount of catalog discrepancy or availability is acceptable, a common GeoServer data directory location for all clustered instances is highly recommended.

The location of the GeoServer data directory is stored in the GEOSERVER_DATA_DIR variable. It can be configured in one of three ways: in each instance’s web.xml file (/webapps/geoserver/WEB-INF), through a common environment variable, or through a parameter passed to the JVM in the container start-up command.
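
To make the three options concrete, here is a small Python sketch that writes out the same setting in each form. The directory /srv/geoserver/data is a made-up path, the export line assumes a POSIX shell, and how the JVM flag reaches the container (for Tomcat, typically through its start-up options) depends on your installation.

```python
from xml.sax.saxutils import escape


def data_dir_settings(path):
    """Return one data directory setting in the three forms described above."""
    # Form 1: a context parameter inside the instance's WEB-INF/web.xml.
    web_xml = (
        "<context-param>\n"
        "  <param-name>GEOSERVER_DATA_DIR</param-name>\n"
        f"  <param-value>{escape(path)}</param-value>\n"
        "</context-param>"
    )
    return {
        "web.xml": web_xml,
        # Form 2: an environment variable visible to every container (POSIX shell syntax).
        "environment": f"export GEOSERVER_DATA_DIR={path}",
        # Form 3: a system property passed to the JVM in the container start-up command.
        "jvm": f"-DGEOSERVER_DATA_DIR={path}",
    }


if __name__ == "__main__":
    # Hypothetical shared location, for illustration only.
    for name, text in data_dir_settings("/srv/geoserver/data").items():
        print(f"[{name}]\n{text}\n")
```

Whichever form you pick, using the same one on every node makes it easier to confirm that all instances really point at the shared directory.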

Some implementations have clustered GeoServer instances using separate data directories that are synchronized either manually (when changes are infrequent) or automatically (using rsync), but neither approach is as common or as recommended as a shared data directory.

Regardless of the mechanism for synchronization, changes to the data directory and the in-memory catalog will normally be directed by one master GeoServer. This can be enforced by disabling the GeoServer user interface on all “slave” GeoServers or by configuring the front-end load balancer to only direct user interface requests to /geoserver/web to the master GeoServer.

Changes to the master GeoServer’s data catalog must be explicitly refreshed on slave instances. This can be accomplished manually through the GeoServer Admin web UI (/geoserver/web), or with some measure of automation (on a schedule, or after a trigger is fired) using GeoServer’s REST API (e.g. by sending a POST/PUT request to /geoserver/rest/reload?recurse=true).
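
One way to automate the refresh is a short script that posts to the reload endpoint on each slave in turn. The Python sketch below uses only the standard library and takes the endpoint path exactly as quoted above; the node URLs and credentials in the usage comment are placeholders, and the use of HTTP Basic authentication is an assumption about how your GeoServer is secured.

```python
import base64
import urllib.request

# Endpoint path as quoted in the text above.
RELOAD_PATH = "/rest/reload?recurse=true"


def build_reload_request(base_url, user, password):
    """Prepare (but do not send) a POST to one node's catalog reload endpoint."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + RELOAD_PATH,
        method="POST",
        headers={"Authorization": f"Basic {token}"},  # assumes HTTP Basic auth
    )


def reload_nodes(base_urls, user, password, timeout=30):
    """Ask every node to reload; map each URL to its HTTP status or the error raised."""
    results = {}
    for base_url in base_urls:
        request = build_reload_request(base_url, user, password)
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                results[base_url] = response.status
        except OSError as exc:  # URLError and HTTPError both derive from OSError
            results[base_url] = exc
    return results


# Example call (hypothetical hosts and credentials):
# reload_nodes(["http://server2:8080/geoserver", "http://server3:8080/geoserver"],
#              "admin", "secret")
```

Collecting a result per node, rather than stopping at the first failure, makes it easier to spot a single slave that is still serving a stale catalog.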

Clustering Enhancements

Enhancements to our clustering story are coming! Specifically, in future releases of GeoServer the data directory will have the option to be database-backed. This means that a central configuration store can be queried more optimally than a file-based counterpart and doesn’t all need to be read into memory.

In the next post, we’ll go into the details on setting up a clustered instance. Remember, Enterprise: Platform clients and higher get custom clustering and deployment advice included in their maintenance agreements.

Have you been looking at deploying GeoServer in a clustered environment? Tell us about it!

Creating dynamic contour maps with server-side processing

As an avid cartophile and hiker, I’ve always enjoyed working with contour maps—vector maps in which lines circumscribe areas of similar or equal value—as they provide just the right amount of context for the vertical aspect of a trail. I routinely use contour maps when evaluating hikes, since being able to see the densely-spaced contour lines at given points is much more helpful than just knowing the total elevation gain of a trail.

So, when I learned that the OpenGeo Suite 3.0 contains a process to generate contour lines from raster data, I jumped at the opportunity to play with it. Neat, I can create my own contour maps! My thoughts turned to Mt. Rainier National Park, where I spent some time this past summer; I was eager to test the contour generation process on the most topographically prominent peak in the contiguous United States.

Mount Rainier (Wikipedia)

Mt. Rainier raster data in GeoExplorer

The first step was to acquire topographic raster data. Despite a somewhat cumbersome download process, I was able to obtain high-quality data from the National Elevation Dataset provided by the USGS. I eventually acquired a GeoTIFF with one-arcsecond resolution and loaded the data into GeoServer. While I could have taken advantage of new raster support in PostGIS 2.0, those features primarily support analysis, not visualization.

There are two ways of generating a contour map using the tools in the OpenGeo Suite: statically and dynamically.

I started out with the static case, processing the raster data and generating a vector layer of the contours using the Web Processing Service (WPS), which allows for server-side processing of data in GeoServer. With WPS, you can perform complex calculations and conversions, either by pulling from almost 100 built-in processes or by creating your own. For my case, the gs:Contour process was sufficient. I used the WPS Request Builder in the GeoServer UI to generate my contour bands; while the tool is quite rudimentary, it’s more straightforward than having to write the raw XML for executing the process.

Contour process in WPS Request Builder

The data that I possessed had elevation data in meters, from a low of about 500 meters above sea level to Mt. Rainier’s peak at 4,392 meters. For a nice round number I chose 100 meter increments and then ran this through the WPS process, piping the output back into GeoServer. In this case I saved the output as a shapefile and then imported the layer back into GeoServer, but I could just as easily have chained the output of gs:Contour to the input of gs:Import to accomplish the same thing.

With this vector data published, I styled the output by adding rules to label the lines and draw the 500 meter interval lines thicker, all features one would expect from a contour map. I also optimized my cartography for the web by only drawing the 100 meter bands at certain zoom levels to keep the map from becoming too cluttered when zoomed out.
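
To get a feel for the numbers involved, the contour levels for this elevation range, and the subset drawn thicker, can be worked out with a few lines of Python. This is only arithmetic on the figures quoted above (500 to 4,392 meters, 100 meter steps, 500 meter major lines); it is not how gs:Contour computes its output, and the function names are made up for the sketch.

```python
def contour_levels(min_elev, max_elev, interval):
    """List every multiple of `interval` between min_elev and max_elev (inclusive)."""
    first = -(-min_elev // interval) * interval  # round min_elev up to a multiple
    return list(range(first, max_elev + 1, interval))


def major_levels(levels, major_interval):
    """Pick out the levels that would be drawn with a thicker line."""
    return [level for level in levels if level % major_interval == 0]


if __name__ == "__main__":
    levels = contour_levels(500, 4392, 100)
    majors = major_levels(levels, 500)
    print(len(levels), "contour levels from", levels[0], "to", levels[-1])
    print(len(majors), "major levels:", majors)
```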

Contour map of Mt. Rainier

Once satisfied that my style worked with the generated vector output, I set about creating a contour map dynamically by utilizing new processing features in OpenGeo Suite 3.0. Using rendering transformations eliminated the intermediate step of generating derivative data by instead applying the gs:Contour process directly to the raster data using the layer’s style. I only needed to specify the transformation in the SLD and associate it with my raster layer to get exactly the same output in real time, without generating any new vector layers!

Contour map of Mt. Rainier in GeoExplorer

Though I stopped at this point, there are other ways I could have improved this further. For example, instead of using the SLD to selectively display the 100 or 500 meter intervals, I could have tied the interval parameter to the scale denominator of the map. This way, I could zoom the map in all the way and have bands with, say, 10 meter intervals, and then zoom out and render bands with 1000 meter intervals. This would further improve the interactivity of the web map without adding much overhead.
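
The idea of tying the interval to the scale denominator amounts to a simple lookup. The Python sketch below shows the logic only; the scale breakpoints are invented for illustration, and in practice the mapping would be expressed in the SLD rather than in a script.

```python
# Hypothetical breakpoints: (largest scale denominator, contour interval in meters).
SCALE_BREAKS = [
    (25_000, 10),     # zoomed in close: fine 10 meter bands
    (100_000, 100),
    (500_000, 500),
]


def interval_for_scale(scale_denominator, breaks=SCALE_BREAKS, fallback=1000):
    """Return the contour interval for a map scale; coarser when zoomed out."""
    for max_denominator, interval in breaks:
        if scale_denominator <= max_denominator:
            return interval
    return fallback  # zoomed out past every breakpoint


if __name__ == "__main__":
    for scale in (10_000, 50_000, 250_000, 2_000_000):
        print(f"1:{scale} uses {interval_for_scale(scale)} m intervals")
```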

Learn more about creating a dynamic contour map with GeoServer WPS, including the SLD code used in the creation of this map.

Have you tried using GeoServer WPS processes? What have you accomplished with them? Let us know about your experiences by commenting below. And, if you’re ever at Mt. Rainier, I highly recommend the Skyline Trail. Just be sure to bring winter clothing: even in summer and at lower elevations, there’s likely to be snow!

Building closed-source applications with OpenGeo Suite

At OpenGeo, we get lots of questions regarding licensing. The truth is that open source licenses are varied, broad, sometimes confusing, and are definitely not all created equal. We’ve written about licensing before on this blog, but there is of course more to say.

One question we get often from both prospective and current clients is:
Can a company that does not want to open their source code use the OpenGeo Suite?

Short answer

Yes.

Longer answer

The OpenGeo Suite is licensed GPL v2. This license applies to all flavors of the OpenGeo Suite (Community, Enterprise, Cloud, etc.). The components of the OpenGeo Suite use the following licenses:

With all of this in mind, it is possible to create closed-source applications that depend on the OpenGeo Suite without having to distribute the code. However, if you are making code-level changes to the OpenGeo Suite or its components, you are obligated to redistribute those changes if you redistribute the software. So, the obligation to open source applies to modifications of the source code, not to any applications that leverage the software. You are free to license your own code however you deem appropriate.

Many people ask about dual-licensing the OpenGeo Suite (much like how Sencha does with Ext JS). However, as we’re a true open source company, we don’t own all of the intellectual property for our source code and are just as obligated by the license to distribute modifications as anyone else is. So, dual-licensing just isn’t an option.

While we promote open source application development (and do so in house), we respect that everyone’s needs are different and strive to be as accommodating as possible to any organization that wishes to use the OpenGeo Suite.

Hopefully this will clear up some confusion surrounding licensing, but as always, if you have questions, please feel free to comment on this post, send us a message on Twitter, or even a private message through our contact form. Happy coding!

Five things you didn’t know about GeoExplorer

GeoExplorer is a map composition tool that comes bundled with the OpenGeo Suite. Most people know that you can use it as a layer browser for displaying content not only from your local GeoServer but from any compliant WMS, including MapServer and Esri ArcGIS Server. It even supports hosted services like Google Maps and MapBox.

But, if you think you know everything there is to know about GeoExplorer, think again. Here are five things you may not have known about GeoExplorer:

1. GeoExplorer has built-in tools for graphical styling and editing

Back in the halcyon days of 2011, we had three demonstration applications: GeoExplorer for map composition, Styler for graphical styling, and GeoEditor for graphical editing. Eventually, we realized our users preferred one tool that could accomplish all of these without having to switch back and forth. So, as we refined and rebuilt GeoExplorer, we added in those tools that allowed for styling and editing.

The styling tool has a rule editor, where one can set options such as color, opacity, and shape. You can also set conditions for display, such as scale rules. The results are saved directly back to GeoServer and are displayed in real time.

With the editing tools, one can edit both the attributes and geometries of a feature by clicking directly on the map. One can also create new features and delete features as well. The results are posted back to GeoServer through WFS-Transactions.

Styling and editing require that GeoExplorer be deployed in the same container as GeoServer and that GeoExplorer be authenticated to this GeoServer instance. After all, security is very important in web publishing—you don’t want to be able to allow read/write access to any application in the wild!

Note: Styler and GeoEditor are still available in the latest version of the OpenGeo Suite. However, development has been discontinued and they will be removed when the upcoming version is released in a few months. While not linked from the Dashboard anymore, they can still be found in the same place, by default http://localhost:8080/styler and http://localhost:8080/geoeditor.

2. GeoExplorer allows uploading of Shapefiles and GeoTIFFs

While importing data to GeoServer has been possible for a while, you can actually upload Shapefiles and GeoTIFFs directly into GeoServer through the GeoExplorer interface. Just click on the Upload Layers button and select your files. Shapefiles need to be zipped but GeoTIFFs don’t. (While you can technically zip up a whole directory of shapefiles and upload them in one go, we recommend using the GeoServer layer importer, available from the Dashboard or GeoServer sidebar, for that operation.)

3. GeoExplorer makes use of server caching

Wondering what makes GeoExplorer so speedy? That’s because it uses GeoWebCache, the built-in caching server in GeoServer, to cache tiles on the fly. To avoid stale tiles, when a change happens to a layer in GeoExplorer (via styling or editing), a request is sent back to GeoWebCache to truncate the cache.

Don’t want to use caching in your GeoExplorer display? No problem. Simply click on the Layer Properties for the specific layer, go to the Display tab, and uncheck the “Use cached tiles” option.

4. GeoExplorer can export maps to PDF

With GeoExplorer, you can compose a map and click the Print button to export the map view as a fully vectorized PDF. While the tool is still a bit rudimentary for professional map publishing standards, it is often sufficient for basic uses.

5. GeoExplorer is built with the OpenGeo Client SDK

At OpenGeo, we not only build our own tools, but we use them as well. GeoExplorer is built using the Client SDK, a toolset (built on GeoExt and OpenLayers) for building web mapping applications using simple JSON for configuration. GeoExplorer is just one example of what is possible with the Client SDK. If you’re just starting out, we have a tutorial on building an app with the SDK.

What about you?

What cool things have you done with GeoExplorer? Let us know in the comments below, or by sending us a message on Twitter. If you haven’t tried out GeoExplorer, you can get it as part of the OpenGeo Suite.

OpenGeo Suite 2.4.5 released

We are excited to release a new version of the OpenGeo Suite! In order to capture the many improvements and bug fixes happening in the open source community, we are moving toward a more rapid release cycle. For example, GeoServer now has JDBC datastore session startup/teardown SQL commands, as well as support for paletted PNG images with alpha transparency.

In GeoExplorer (which really is pretty amazing, if you haven’t seen it recently) there is now smoother tile display, including fade-in. Also, the map legend has now been integrated directly into the layer tree. Finally, we have changed the default base layer to be MapQuest OSM, moving away from Google (though Google base layers are still available).

All of these new features are available in the Community Edition, Enterprise Edition (which includes a free 30 day trial of our support), and all Cloud Editions! Try any version you’d like and contact us to purchase the support you need to put your project into production!

OpenGeo Suite 2.4.4 released

The OpenGeo team is excited to announce the release of OpenGeo Suite 2.4.4. This is the first new version in a few months so there have been lots of stability improvements and updates.

GeoServer incorporates the new features from the recently released GeoServer 2.1.3. It now has Basic HTTP authentication for cascaded WMS servers, a feature that has been asked for by a number of our clients. GeoServer also has support for non-advertised layers, with layers configured and active, yet not publicized in the capabilities documents. We’ve also incorporated the Web Processing Service (WPS) extension and, for our European friends, enhanced the GeoServer INSPIRE extension to better support View Service.

The GeoServer-embedded GeoWebCache now has a significantly improved UI, exposing many options previously only configurable via a text editor. It’s now possible to add a new layer, configure tile size, view disk quotas, enable GWC services and cache formats.

GeoExplorer has improved stability when deployed under Glassfish and WebSphere containers. Logout functionality has now been exposed, based on many user requests. In general, GeoExplorer now loads JavaScript resources faster.

The OpenGeo Suite is and continues to be 100% open source and we’ve migrated the source code onto GitHub to improve our development process and make it easier for anyone to check out our source code.

We invite everyone to check out our new release—register for a trial of the Enterprise Edition or download the free (but unsupported) Community Edition. If you’re looking for support, unlimited bug fixes, access to core developers, updates, telephone support, and even custom development hours, we invite you to consider becoming an OpenGeo Suite Enterprise Edition client.

Thanks to everyone who submitted bug reports and feature requests. Thanks as well to all developers involved in our component projects. Finally, thanks to our current Enterprise Edition clients, who enable us to continue to develop the best geospatial software.

OpenGeo Suite now on GitHub

The OpenGeo Suite team has migrated all of our source code over to Git from Subversion, and we are now hosting the code on GitHub. This follows the trend of lots of open source software projects toward a distributed version control system.

Switching from Subversion to Git has all sorts of benefits for the development team, as well as for anyone interested in playing with the code. There are numerous sites that detail the advantages of Git (we particularly like this one), but it will allow us to more easily incorporate features for our clients, manage multiple release streams, and work simultaneously without breaking development for everyone else. As the client base of the OpenGeo Suite grows (and as more and more people download the free Community Edition) this change has been a long time in coming.

You can also visit OpenGeo’s main GitHub repository as well as the main repositories for GeoExplorer, GXP, and more. Please fork the code and play around. If you have patches, feel free to send us a pull request. While we can’t guarantee that all patches will be accepted, we value every suggestion we receive.

If you have thoughts about our svn to git conversion, we’d love to hear about them in the comments section. Though please, no x-is-better-than-y wars. Each one of us is correct!

Why choose? A hybrid approach to GIS

Earlier this year Esri released a white paper highlighting the benefits of open source and open specifications. No, that wasn’t a joke; the article is real, and well worth a read. It is a solid summation of open source in the marketplace and discusses the differences between open source software and open standards (what Esri calls “open specifications”). But more fundamentally, in this paper, Esri makes a bold and sensible claim that may surprise some people:

Deciding between open source and ArcGIS is not an either/or question. Esri encourages users to choose a hybrid model, a combination of open source and closed source technology, based on their needs.

The paper goes on to talk about Esri’s integration with various open source projects, from their ArcGIS Editor for OpenStreetMap, to their integration of Python into ArcGIS 10 (ArcPy), to their Geoportal Server, which is hosted on SourceForge. On the use of hybrid technology, we are in firm agreement. From the beginning, we have designed our software with integration in mind. For example, the OpenGeo Suite can connect to a number of proprietary databases, including ArcSDE, Oracle Spatial, IBM DB2, and Microsoft SQL Server, and the list continues to grow. In addition, with GeoCat Bridge you can publish data from ArcGIS Desktop to the web with the OpenGeo Suite.

Why would we advocate for proprietary systems? Simply put, we always suggest using the right tool for the job. Esri has great desktop tools, but on the server side there are faster, more reliable, more flexible options that support more standards. It can make sense to use ArcGIS Desktop and then use GeoCat Bridge to publish directly to the OpenGeo Suite. Or to use ArcSDE for data collaboration, then connect to the OpenGeo Suite to serve to the web. We know you have options when choosing any piece of software: Apache Tomcat versus IBM WebSphere, PostgreSQL versus Oracle Spatial, QGIS or uDig versus ArcGIS Desktop, and, of course, the OpenGeo Suite versus ArcGIS Server. While we feel that open source holds the best route forward for software development, we are happy to give advice on the pros and cons of various architectures. The OpenGeo Suite Enterprise Edition clients use a variety of solutions that meet their needs. We’ll be publishing some white papers in the near future to help you compare the different software options in the marketplace; and we applaud Esri for their moves toward open source, and appreciate their candor in promoting a hybrid model.

OpenGeo Suite 2.4.3 released

We’re happy to announce the release of OpenGeo Suite 2.4.3!

For the first time ever, we’re releasing the Enterprise and Community editions of the OpenGeo Suite simultaneously. We’re even updating our Cloud offerings on both Skygone and Amazon Web Services. Aligning our release process to account for all tiers seems to be a sensible step, and one that we have been working toward for a while behind the scenes.

So what’s the difference? Glad you asked. The OpenGeo Suite Enterprise Edition comes with valuable add-ons for administrators, such as Suite Analytics for graphically viewing and managing server load. No need to go digging through the logs when you can get a report of all the failed requests right in your browser. You can even see where your requests are originating from, due to an embedded IP-based geolocation service.

It’s more than just the add-ons. The OpenGeo Suite comes complete with the entire OpenGeo Suite team! (We’re glad companies don’t ship software boxes anymore.) You get access to the core developers of all the components, unlimited bug fixes, updates, and even custom development hours on some plans. We understand that commercial support is one of the key barriers to adoption of open source software, and our clients allow us to reinvest directly into our communities, furthering development of the software in line with our core mission of bringing the best practices of open source software to organizations around the world.

See what’s new in this release. And then download a free trial of the Enterprise Edition (or the Community Edition) today!