GSIP 69
Improved vertical scalability of Catalog resources (i.e. being able to efficiently manage hundreds of thousands of layers, styles, etc).
Gabriel Roldan
GeoServer 2.3.x master branch.
Under Discussion, In Progress, Completed, Rejected, Deferred
With the arrival of Virtual Services with workspaces, Workspace Local Services, and Workspace Local Settings, GeoServer becomes better suited to multitenancy, and hence supporting a large number of configuration resources becomes even more important.
Prior art in this regard includes the development of the [DBConfig Module], which externalizes the storage of the configuration objects to an RDBMS using Hibernate O/R mapping, and hence lets the Catalog scale up to an unbounded number of workspaces, stores, layers, etc.
Regardless of the Catalog backend's ability to scale up, GeoServer itself doesn't scale gracefully as the number of configuration objects in the catalog increases. The current Catalog API is designed around the assumption that full scans and defensive copies of lists of catalog resources are cheap in both processing time and memory consumption.
This proposal aims to solve this problem in a way that allows any API change to be adopted progressively throughout the code base, wherever the benefits are clear and measurable.
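The cost of the defensive-copy assumption described above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: `CopyingCatalog`, `getLayers()` and `countLayers()` are stand-ins written for this page and are not GeoServer classes. The sketch only assumes that a list accessor copies its backing list on every call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class DefensiveCopyCost {

    /** Hypothetical stand-in for a catalog that hands out defensive copies. */
    static final class CopyingCatalog {
        private final List<String> layerNames = new ArrayList<>();
        int copiesMade = 0; // number of full-list copies triggered by callers

        void add(String name) {
            layerNames.add(name);
        }

        /** Mimics a getLayers()-style accessor: O(n) time and memory on every call. */
        List<String> getLayers() {
            copiesMade++;
            return Collections.unmodifiableList(new ArrayList<>(layerNames));
        }

        /** Answers "how many?" without materializing a copy. */
        int countLayers() {
            return layerNames.size();
        }
    }

    public static void main(String[] args) {
        CopyingCatalog catalog = new CopyingCatalog();
        for (int i = 0; i < 100_000; i++) {
            catalog.add("layer" + i);
        }
        // Three innocent-looking size checks each copy the whole list.
        int viaCopies = 0;
        for (int i = 0; i < 3; i++) {
            viaCopies = catalog.getLayers().size();
        }
        System.out.println(viaCopies + " layers, " + catalog.copiesMade + " full copies");
        System.out.println(catalog.countLayers() + " layers, no additional copies");
    }
}
```

With a few layers the difference is invisible; with hundreds of thousands, every caller that only needs a size or a small subset pays for the full copy.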
Given a relatively large number of Catalog configuration objects:
- Identify some exemplary use cases that result in scalability/performance bottlenecks throughout the GeoServer code base;
- Identify the requirements and main QA goals needed to satisfactorily solve the problems described in the use cases;
- Design Catalog API enhancements that fulfill the requirements;
- Validate the API design by providing more than one concrete backend implementation, and by upgrading the Catalog client code from the exemplary use cases;
- Provide general guidelines on how and when to progressively adopt the new API methods.
- It is not in this proposal’s scope to allow applications outside GeoServer to directly edit the backend’s (RDBMS or other) configuration objects. CatalogFacade and GeoServerFacade implementations are free to use whatever storage format and mechanisms they see fit. That said, this proposal also doesn’t forbid Catalog/Config backend implementations from allowing applications outside GeoServer to directly edit the configuration objects.
Check the GSIP 69 - Use Cases page for further detail.
In light of the above use cases, the Catalog API change proposal shall meet the following high-level requirements and QA goals:
- Filtering: Shall allow for filtering of catalog objects through arbitrary query criteria;
- Streaming: Shall allow for a streamed approach to catalog objects retrieval;
- Paging: Shall allow for paged queries. Catalog backends shall provide a consistent “natural order” of resources, which doesn’t need to be based on id or any other prescribed property;
- Leverage query engines: Shall allow any in-process filtering criteria to be moved back to the backend, allowing for optimization in the common cases;
- Query generality: in-process filtering shall work out of the box for the general case;
- Compactness: API changes should be additive and minimal;
- Usability: Ease of use and compactness are highly desired;
- Incremental adoption: Shall allow for progressive/iterative adoption;
- Leverage sub-system cohesion: Shall introduce no external dependencies at the API level.
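As a rough illustration of how the filtering, streaming, paging and "in-process fallback" requirements fit together, the following sketch filters, optionally sorts, and pages an in-memory list, returning an iterator rather than a materialized list. The class `InProcessQuery` and its `list` and `count` methods are hypothetical names chosen for this example; the actual signatures are defined on the API Proposal page, not here.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Stream;

class InProcessQuery {

    /**
     * Hypothetical in-process fallback: filter, optionally sort, then page.
     * A null offset means "from the start"; a null count means "no limit".
     */
    static <T> Iterator<T> list(List<T> source, Predicate<T> filter,
                                Integer offset, Integer count, Comparator<T> sortOrder) {
        Stream<T> stream = source.stream().filter(filter);
        if (sortOrder != null) {
            stream = stream.sorted(sortOrder); // otherwise keep the source's own order
        }
        if (offset != null) {
            stream = stream.skip(offset);
        }
        if (count != null) {
            stream = stream.limit(count);
        }
        return stream.iterator();
    }

    /** Counting reuses the filter but never builds a result list. */
    static <T> int count(List<T> source, Predicate<T> filter) {
        return (int) source.stream().filter(filter).count();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("topp:roads", "sf:streams", "topp:states", "topp:lakes");
        Predicate<String> inTopp = n -> n.startsWith("topp:");

        System.out.println(count(names, inTopp)); // 3
        // Second and third matches in name order: topp:roads, topp:states
        Iterator<String> page = list(names, inTopp, 1, 2, Comparator.naturalOrder());
        while (page.hasNext()) {
            System.out.println(page.next());
        }
    }
}
```

Note that sorting in process has to buffer every match before the first element is returned, which is one reason a backend with its own query engine may prefer to handle filtering, sorting and paging itself.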
Check the GSIP 69 - API Proposal page for further detail.
This section presents two ways of validating the Catalog API extension from this proposal. First, we’ll migrate the code from the use cases to the new API to verify its usability and correctness. Then we’ll provide a couple of Catalog backend implementations to verify its implementability and effectiveness.
GSIP 69 - Use Case Code Migration
In addition to the default Catalog implementation, a JDBC-based catalog and configuration storage has been developed.
The current prototype for the JDBC backend is located at this github branch. The jdbcconfig community module is based on the spring-jdbc framework, and uses an RDBMS (either H2 or PostgreSQL at the time of writing) as a key/value store with extra indices for the ‘searchable’ properties of Catalog objects. The key on this single-table store is the object identifier and the value is its XStream representation, leveraging exactly the same serialization mechanism GeoServer uses for the on-disk catalog persistence. This minimizes maintenance costs as the Catalog and configuration object model evolves, since only the XStream persistence code has to be maintained for both the on-disk and database backends.
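The key/value-plus-indices idea can be sketched with plain in-memory maps. This is only an illustration of the storage shape described above, not the jdbcconfig schema: the class `KeyValueCatalogStore`, its methods, and the layout of the index are assumptions made for the example, and an opaque string stands in for the XStream output.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class KeyValueCatalogStore {

    // Object id -> serialized form (an opaque String stands in for the XStream XML).
    private final Map<String, String> blobs = new LinkedHashMap<>();
    // Property name -> property value -> ids of the objects having that value.
    private final Map<String, Map<String, List<String>>> index = new HashMap<>();

    /** Stores the blob and indexes only the properties declared searchable (updates and removals omitted). */
    void put(String id, String serialized, Map<String, String> searchableProps) {
        blobs.put(id, serialized);
        for (Map.Entry<String, String> e : searchableProps.entrySet()) {
            index.computeIfAbsent(e.getKey(), k -> new HashMap<>())
                 .computeIfAbsent(e.getValue(), v -> new ArrayList<>())
                 .add(id);
        }
    }

    /** Equality lookup answered from the index; blobs are read only for matching ids. */
    List<String> findByProperty(String property, String value) {
        Map<String, List<String>> byValue = index.getOrDefault(property, Collections.emptyMap());
        List<String> ids = byValue.getOrDefault(value, Collections.emptyList());
        List<String> result = new ArrayList<>();
        for (String id : ids) {
            result.add(blobs.get(id));
        }
        return result;
    }

    public static void main(String[] args) {
        KeyValueCatalogStore store = new KeyValueCatalogStore();
        Map<String, String> props = new HashMap<>();
        props.put("name", "roads");
        props.put("workspace.name", "topp");
        store.put("layer-1", "<layer><name>roads</name></layer>", props);
        System.out.println(store.findByProperty("workspace.name", "topp"));
    }
}
```

The point of the design is that the serialized value never needs to be parsed to answer a query on an indexed property, while the object model itself can evolve without schema changes.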
- If you need to get a count of Catalog objects, use the count method instead of getXXX().size():

      int allLayers = catalog.count(LayerInfo.class, Predicates.acceptAll());
      int workspaceLayers = catalog.count(LayerInfo.class, Predicates.equal("resource.workspace.id", workspaceId));
- If only a subset of objects is needed, consider using a Filter instead of in-process filtering:

      //BAD:
      for (LayerInfo layer : catalog.getLayers()) {
          if ("topp".equals(layer.getResource().getStore().getWorkspace().getName())) {
              // do something with layer
          }
      }

      //GOOD:
      Filter filter = Predicates.equal("resource.store.workspace.name", "topp");
      Iterator<LayerInfo> layers = catalog.list(LayerInfo.class, filter);
      try {
          while (layers.hasNext()) {
              LayerInfo layer = layers.next();
              // do something with layer
          }
      } finally {
          // release any backend resources held by the iterator
          CloseableIteratorAdapter.close(layers);
      }
- Push sorting to the backend:

      //BAD:
      List<StyleInfo> styles = new ArrayList<StyleInfo>(catalog.getStyles());
      Comparator<StyleInfo> comparator = new Comparator<StyleInfo>() {
          @Override
          public int compare(StyleInfo s1, StyleInfo s2) {
              return s1.getName().compareTo(s2.getName());
          }
      };
      Collections.sort(styles, comparator);

      //GOOD:
      boolean ascending = true;
      SortBy sortOrder = Predicates.sortBy("name", ascending);
      Iterator<StyleInfo> styles = catalog.list(StyleInfo.class, acceptAll(), null, null, sortOrder);
- Use the catalog backend’s paging, even if what you really want is a List and not an Iterator:

      int startIndex = 50;
      int pageSize = 25;

      //BAD:
      List<LayerInfo> layers = catalog.getLayers();
      List<LayerInfo> page = layers.subList(startIndex, startIndex + pageSize);

      //GOOD:
      Iterator<LayerInfo> pageIterator = catalog.list(LayerInfo.class, acceptAll(), startIndex, pageSize, null);
      List<LayerInfo> page;
      try {
          page = com.google.common.collect.Lists.newArrayList(pageIterator);
      } finally {
          CloseableIteratorAdapter.close(pageIterator);
      }
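The try/finally pattern in the guidelines above can be wrapped once in a small helper so that callers cannot forget to close the iterator. The sketch below is self-contained and uses hypothetical names throughout: `ClosingIterator` is a local stand-in for an iterator that holds backend resources, and `drain` and `Tracking` exist only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class PageToList {

    /** Local stand-in for an iterator that holds backend resources until closed. */
    interface ClosingIterator<T> extends Iterator<T>, AutoCloseable {
        @Override
        void close(); // narrowed so callers need not handle a checked exception
    }

    /** Copies whatever the iterator yields into a list and always closes it. */
    static <T> List<T> drain(ClosingIterator<T> it) {
        try (ClosingIterator<T> closing = it) {
            List<T> out = new ArrayList<>();
            while (closing.hasNext()) {
                out.add(closing.next());
            }
            return out;
        }
    }

    /** Simple implementation that records whether close() was called. */
    static final class Tracking<T> implements ClosingIterator<T> {
        private final Iterator<T> delegate;
        boolean closed = false;

        Tracking(Iterator<T> delegate) {
            this.delegate = delegate;
        }

        @Override public boolean hasNext() { return delegate.hasNext(); }
        @Override public T next() { return delegate.next(); }
        @Override public void close() { closed = true; }
    }

    public static void main(String[] args) {
        Tracking<String> it = new Tracking<>(Arrays.asList("a", "b", "c").iterator());
        List<String> page = drain(it);
        System.out.println(page + " closed=" + it.closed);
    }
}
```

Because the page size bounds the list, materializing one page this way stays cheap regardless of how many objects the catalog holds.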
This section should contain feedback provided by PSC members who may have a problem with the proposal.
Backwards compatibility is preserved since the API changes are additive only. All existing code using the current API will keep working untouched.
- Andrea Aime: +1
- Alessio Fabiani:
- Ben Caradoc Davies: +1
- Gabriel Roldan: +1
- Justin Deoliveira: +1
- Jody Garnett: +1
- Simone Giannecchini: +1
- JIRA Task
- Email Discussion
  - http://osgeo-org.1560.n6.nabble.com/Re-GSIP-69-Catalog-scalability-enhancements-OGC-Filters-VS-predicate-tc4936553.html
  - http://osgeo-org.1560.n6.nabble.com/Re-GSIP-69-Catalog-scalability-enhancements-fast-startup-tc4941645.html
  - http://osgeo-org.1560.n6.nabble.com/Re-GSIP-69-Catalog-scalability-enhancements-proof-of-concept-tc4936635.html
- Current development branch
©2020 Open Source Geospatial Foundation