Curator has moved to Apache. Please see: http://curator.apache.incubator.org
The previous Netflix branch is now in a branch named "archive".
- Possible final release - removed code branches that are now in Apache. Added a version of ZkClient bridge that uses Apache Curator.
- Issue 257: Fixed a race condition in LeaderLatch that caused the recipe to create two nodes in some edge cases.
-
Issue 250: Restore support for passing null to usingWatcher().
-
Issue 251: Allow a custom Executor Service to be used for PathChildrenCache.
-
DistributedDoubleBarrier wasn't handling wait expiration correctly and was sending negative numbers to wait().
-
Issue 254: Check that executorService isn't null before closing.
-
Pull 258: Fix poorly performing use of Guava's transform.
-
MAJOR BUG FIX - Issue 232: ZooKeeper guarantees that "A watch object, or function/context pair, will only be triggered once for a given notification." Curator was breaking this guarantee by internally creating a new Watcher object each time one was needed. This is now fixed and ZooKeeper's guarantee is restored. Big thanks to user barkbay for his persistence and help on this.
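
One way to keep that guarantee, sketched in plain Java rather than taken from Curator's source, is to cache a single wrapper per user-supplied watcher so that the same object is registered every time. The Watcher interface below is a stand-in, and WatcherCacheSketch, WrappedWatcher and wrap() are hypothetical names.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class WatcherCacheSketch {
    /** Stand-in for a watcher callback; NOT the ZooKeeper interface. */
    interface Watcher {
        void process(String event);
    }

    /** Hypothetical internal wrapper that would be registered with the server. */
    static final class WrappedWatcher implements Watcher {
        private final Watcher delegate;

        WrappedWatcher(Watcher delegate) {
            this.delegate = delegate;
        }

        @Override
        public void process(String event) {
            delegate.process(event);
        }
    }

    private final Map<Watcher, WrappedWatcher> wrappers = new ConcurrentHashMap<>();

    /** Returns the same wrapper instance for the same user watcher. */
    WrappedWatcher wrap(Watcher userWatcher) {
        return wrappers.computeIfAbsent(userWatcher, WrappedWatcher::new);
    }

    /** Wraps one watcher several times and counts the distinct objects produced. */
    static int distinctRegistrations(int times) {
        WatcherCacheSketch cache = new WatcherCacheSketch();
        Watcher user = event -> { };
        // WrappedWatcher does not override equals(), so this set compares by identity.
        Set<Watcher> registered = new HashSet<>();
        for (int i = 0; i < times; i++) {
            registered.add(cache.wrap(user));
        }
        return registered.size();
    }
}
```

Because the wrapper is created once and reused, a server that de-duplicates watches by object identity sees a single registration no matter how often the watcher is set.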
-
Issue 247: POST_INITIALIZED_EVENT wasn't correctly handling an initially empty node.
-
Issue 245: Auth info specified in the CuratorFrameworkFactory.Builder was not being re-set in cases where the internal ZooKeeper handle was recreated, i.e. if the cluster had issues the auth info would be lost.
-
The default watcher in the ZooKeeper handle is now cleared before the ZooKeeper handle is closed. This avoids an edge case where events meant for the old ZooKeeper handle get processed.
-
Tightened up a possible race deep inside the connection management.
-
PathChildrenCache.rebuild() and PathChildrenCache.rebuildNode() were not handling deleted nodes.
-
Issue 237: New feature. PathChildrenCache now optionally posts an event when the initial cache is populated. To accommodate this behavior there is a new version of start() that takes an enum. See the Javadoc for each value. For this new behavior, use StartMode.POST_INITIALIZED_EVENT. Once the cache is initialized a PathChildrenCacheEvent.Type.INITIALIZED will be posted. Huge thanks to user philflesh for the idea and co-implementation.
-
MAJOR CHANGE (thus a version bump): I'd always thought that if the client is disconnected from the server long enough then an Expired event would be generated. Testing, however, shows this not to be the case. I believe it's related to ZOOKEEPER-1159. The behavior associated with this is that if the clients lost connection to the cluster for longer than the session expiration they would never be able to reconnect. The connection would be permanently lost. Many users were seeing this as endless log messages indicating "Connection timed out for connection...". As a workaround, in 1.3.0+ when the Curator state changes to LOST, a flag will be set so that the next time Curator needs to get the ZooKeeper instance, the current instance will be closed and a new ZooKeeper instance will be allocated (as if the session had expired).
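
The workaround can be pictured roughly as follows. This is a minimal sketch in plain Java, not Curator's code: Handle stands in for the ZooKeeper instance, and HandleHolderSketch, markLost() and getHandle() are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class HandleHolderSketch {
    /** Stand-in for a ZooKeeper client handle. */
    static final class Handle {
        final int id;
        boolean closed;

        Handle(int id) {
            this.id = id;
        }
    }

    private final AtomicInteger nextId = new AtomicInteger();
    private final AtomicBoolean resetRequested = new AtomicBoolean(false);
    private Handle current = new Handle(nextId.incrementAndGet());

    /** Called when the connection state changes to LOST: only sets a flag. */
    void markLost() {
        resetRequested.set(true);
    }

    /** Returns the live handle, closing and replacing it first if a reset was requested. */
    synchronized Handle getHandle() {
        if (resetRequested.compareAndSet(true, false)) {
            current.closed = true;                          // close the old instance
            current = new Handle(nextId.incrementAndGet()); // allocate a new one
        }
        return current;
    }
}
```

The replacement is deferred until the next access, so nothing is torn down on the thread that reports the state change.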
-
Added checks for illegal namespaces.
-
Issue 232: NodeCache wasn't handling server connection issues well. It would repeatedly execute checkExists() with a watcher causing the heap to fill with watcher objects.
-
Issue 233: An internal idiom being used to create an EnsurePath instance with the parent of a passed in path wasn't correct. Due to an unfortunate implementation of ZKPaths.PathAndNode (mea culpa) the root path is specified differently than non-root paths. To work around this, I added a method to EnsurePath - excludingLast() - that can be used instead of the idiom.
-
Issue 230: Added a filter to control which IP addresses are returned by ServiceInstanceBuilder.getAllLocalIPs(). Set the filter via ServiceInstanceBuilder.setLocalIpFilter().
-
Issue 214: Added rebuildNode method to PathChildrenCache.
-
Added a NodeCache to complement the PathChildrenCache. The doc is here: https://github.com/Netflix/curator/wiki/Node-Cache
-
Creating nodes in background wasn't handling createParentsIfNeeded.
-
Issue 216: Rewrote LeaderLatch to better handle connection/server instability. At the same time, made most of the calls async which will help concurrency and performance.
-
Issue 217: DistributedAtomicLong (et al) should use ensurePath internally to be consistent with other recipes.
-
Issue 220: When creating a ServiceCacheImpl, a PathChildrenCache is created. The cache loads all existing services, but because preloading does not create events, ServiceCacheImpl never notices this. ServiceCacheImpl.getInstances() will return an empty list.
-
Issue 221: client.getACL().forPath("/") throws a NullPointerException, because the Zookeeper API expects a Stat, but GetACLBuilderImpl initializes responseStat to null.
-
Issue 222: Counter and log messages reversed in RetryLoop.takeException().
-
New feature: CuratorTempFramework. Temporary CuratorFramework instances are meant for single requests to ZooKeeper ensembles over a failure prone network such as a WAN. The APIs available from CuratorTempFramework are limited. Further, the connection will be closed after a period of inactivity. Based on an idea mentioned in a post by Camille Fournier: http://whilefalse.blogspot.com/2012/12/building-global-highly-available.html - details here: https://github.com/Netflix/curator/wiki/Temporary-Framework
-
Issue 224: ExponentialBackoffRetry was not protected against edge-cases where a too big maxRetries argument was used. It now also incorporates a maxSleep value.
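
A minimal sketch of the idea, assuming a randomized exponential backoff. The cap of 29 on the exponent, the class name BackoffSketch and the method sleepMs() are illustrative choices, not Curator's actual values or API.

```java
import java.util.Random;

final class BackoffSketch {
    /** Hypothetical cap on the exponent so that the int shift below cannot overflow. */
    static final int MAX_EXPONENT = 29;

    /**
     * Computes a randomized exponential sleep time for the given retry,
     * clamped to maxSleepMs so that large retry counts stay bounded.
     */
    static long sleepMs(long baseSleepMs, long maxSleepMs, int retryCount, Random random) {
        int exponent = Math.max(0, Math.min(retryCount, MAX_EXPONENT));
        // 1 << (exponent + 1) is at most 2^30, which is still a positive int.
        long multiplier = Math.max(1, random.nextInt(1 << (exponent + 1)));
        return Math.min(baseSleepMs * multiplier, maxSleepMs);
    }
}
```

With a base of 100 ms and a maximum of 5000 ms, every computed sleep stays between 100 and 5000 ms, even for a retry count in the thousands.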
-
Depend on ZooKeeper 3.4.5
-
Issue 177: PathChildrenCache wasn't shutting down the executor when closed. Also, reworked the event queue to avoid potential herding of messages in unstable conditions. The herding could result in runaway memory allocation as reported in the issue. NOTE: due to this change, the PathChildrenCache node refresh code and the PathChildrenCacheListener notification threads have been merged. Do not block for very long inside of your PathChildrenCacheListener or you will prevent the cache from getting updated.
-
Issue 200: Post-creation services registered in ServiceDiscovery via registerService() were not being treated the same as the service passed in the constructor. Consequently they wouldn't get re-registered if there were connection problems.
-
Creating nodes withProtection() is now supported in the background. e.g. client.create().withProtection().inBackground()...
-
Added methods to InterProcessSemaphoreV2: setNodeData() and getParticipantNodes() and, to the Lease interface, getData().
-
Issue 205 - already started error message was misleading.
-
Pull 209 - Fixed inconsistent API for get() in DiscoveryResource.java - thanks to user dougnukem
-
Issue 211 - Added getState() method to CuratorFramework.
-
Issue 212 - There wasn't a good way to update the data for a Service. I've added a new method ServiceDiscovery: updateService(). NOTE: this method requires all ServiceDiscovery instances to be using version 1.2.5 of Curator. Internally, ServiceCache now uses PathChildrenCache.
-
Pull 210 - For convenience, added a version of DiscoveryContext that uses any generic type as the payload. Thanks to user dougnukem.
-
Depend on ZooKeeper 3.4.4
-
Added a new Examples sub project - better late than never.
-
Guaranteed deletes were not working correctly if CuratorFramework.usingNamespace() was used.
-
I can't believe this has been like this for so long. The executor passed to listeners was never used. Doh!!! Major bug.
-
Issue 188: Display a meaningful message if the value node is corrupted
-
Issue 194: Initial sync() operation should occur immediately - like the change in 1.2.3 for all "background" operations.
-
Added support for ZK 3.4's read only operation as described here: http://wiki.apache.org/hadoop/ZooKeeper/GSoCReadOnlyMode - CuratorFrameworkFactory.Builder has a new method to set canBeReadOnly(). There is a new ConnectionState: READ_ONLY. Note: Your servers need the system property "readonlymode.enabled" set to true. This isn't documented anywhere that I can see.
-
Pull Request: 196 - Fix some issues with NamespaceFacade stemming from inconsistent state. Thanks to Answashe.
-
Issue 197 - Possible NullPointerException from ConnectionStateManager line 133 that is caused by a race condition. In CuratorFrameworkImpl, connectionStateManager.start() is called after client.start().
-
Previously, all background operations (i.e. when the inBackground() method is used) were put into a queue and processed in an internal thread. Curator does this to handle retries in background operations. This can be improved, however. The first time the background operation is executed the ZooKeeper method can be called directly - without queuing. This will get the operation into ZooKeeper immediately and will help prevent Curator's internal queue from backing up.
-
Issue 173: The DistributedQueue (and, thus, all the other queue recipes) was unnecessarily calling getChildren (with a watch) after each group of children was processed. It can just as easily wait for the internal cache to get its watch notified. This change creates an edge case, though, for ErrorMode.REQUEUE. Consequently, when in mode ErrorMode.REQUEUE the DistributedQueue now deletes the bad message and re-creates it. This required the use of ZooKeeper 3.4.x's transactions. So, if you use ErrorMode.REQUEUE you MUST be running ZooKeeper 3.4+.
-
NOTE: The 1.0.x branch is not being released and has been deprecated. It was advised many versions ago that this was coming. So, here it is.
-
For ZKClient Bridge: 1. Previous method of sending initial connect event to ZKClient was unreliable; 2. Added an option to not close the curator instance
-
The default connection timeout has increased to 15 seconds. The default session timeout has increased to 60 seconds. These both can now be overridden via system properties: "curator-default-connection-timeout" and "curator-default-session-timeout".
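
For illustration, defaults like these can be read from system properties as sketched below. The property names and the 15 and 60 second defaults come from the note above; the class and method names are hypothetical, and treating the property values as milliseconds is an assumption.

```java
final class TimeoutDefaultsSketch {
    static final String CONNECTION_PROP = "curator-default-connection-timeout";
    static final String SESSION_PROP = "curator-default-session-timeout";

    /** Connection timeout: the system property if set, otherwise 15 seconds (assumed ms). */
    static int connectionTimeoutMs() {
        return Integer.getInteger(CONNECTION_PROP, 15 * 1000);
    }

    /** Session timeout: the system property if set, otherwise 60 seconds (assumed ms). */
    static int sessionTimeoutMs() {
        return Integer.getInteger(SESSION_PROP, 60 * 1000);
    }
}
```

Under that assumption, starting the JVM with -Dcurator-default-session-timeout=30000 would lower the session timeout to 30 seconds.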
-
Thanks to Ben Bangert: the InterProcessSemaphore waiting semantics weren't ideal. The nth waiting node has to wait for all nodes in front of it. I've improved this a bit. However, the algorithm used still suffers from potential out of order acquisition as well as potential starvation if a given client does not release a lease. Therefore, I'm deprecating InterProcessSemaphore in favor of the new InterProcessSemaphoreV2 which is based on Ben's algorithm.
-
Issue 164: The PathChildrenCache no longer clears its internal state when there is a connection issue. Consequently, the PathChildrenCacheEvent.Type values have changed. Instead of a RESET event there are events that match the ConnectionState events.
- New extension project: "ZKClient Bridge". A bridge between Curator and ZKClient. Useful for projects that would like to use Curator but don't want to risk porting working code that uses ZKClient.
-
Issue 132: If namespace is set in builder, getNamespace() does not return it
-
Issue 131: If connection is lost to the server, the ServiceInstance needs to re-register once there is a re-connection.
-
PathChildrenCache was not sending UPDATE messages when a node's data changed in the case that false was passed in the constructor for cacheData.
-
Merge 136 from wt: Add eclipse support to gradle.
-
Merge 137 from pbh101: ConnectionState declares IOException, never throws it
-
Merge 114 from amuraru: Make sure internal executor services are not started until startup.
-
Merge 116 from samuelgmartinez: Fix for Issue 115: Wrong behaviour in LeaderLatch when a candidate loses connection
-
Issue 118: Ignore nonode exceptions when deleting lock path
-
Added a non-reentrant mutex: InterProcessSemaphoreMutex. This mutex doesn't have the threading restrictions that InterProcessMutex has. This should help with issues 75 and 117.
-
Merge 122 from ithaka that addresses Issue #98 - JsonInstanceSerializer does not deserialize custom payload types. IMPORTANT! This change introduces a breaking incompatibility with List payloads that will show up in environments that mix the old code and the new code. The new code will throw a JsonMappingException when it attempts to read a List payload written by the old code. The old code, when it reads a List payload generated by the new code, will generate a list with two elements, the first of which is the class name of the List type, and the second of which is the original list.
-
Issue 121: Apply bytecode rewriting to turn off JMX registrations to TestingServer as well as TestingCluster.
-
Issue 125: Use ScheduledThreadPoolExecutor instead of blocking threads that have periodic work.
-
Issue 126: Added getNamespace() method.
-
Issue 120: Additional check for connection loss in DistributedDoubleBarrier.
-
Added ChildReaper. This builds on the newly added Reaper. This utility monitors a single node and reaps empty children of that node.
-
Issue 107: The namespace wrapper was corrupting the path if the EnsurePath handler had an error. The best thing to do is let the code continue.
-
Issue 109: Make duplicate close() calls in CuratorFrameworkImpl a NOP instead of an error.
-
A more complete solution for background build-ups. The previous implementation did the retry sleep in the background process which ends up blocking ZooKeeper. During connection problems, this would cause ZooKeeper packets/watchers to back up. The new implementation uses a DelayQueue to simulate a sleep in the background. NOTE: this caused a change to the RetryPolicy APIs.
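
The DelayQueue idea can be sketched as follows. This is not Curator's implementation: PendingOp, RetryQueueSketch and drainInReadyOrder() are hypothetical names used only to show how a retry can wait out its sleep inside the queue instead of blocking a thread.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

final class RetryQueueSketch {
    /** A queued operation that becomes available only after its retry delay. */
    static final class PendingOp implements Delayed {
        final String name;
        private final long readyAtNanos;

        PendingOp(String name, long delayMs) {
            this.name = name;
            this.readyAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMs);
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(readyAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    /** Queues a slow and a fast retry, then takes them in the order they become ready. */
    static String[] drainInReadyOrder(long slowMs, long fastMs) {
        DelayQueue<PendingOp> queue = new DelayQueue<>();
        queue.add(new PendingOp("retry-slow", slowMs));
        queue.add(new PendingOp("retry-fast", fastMs));
        try {
            // take() blocks only until the earliest delay expires.
            return new String[] { queue.take().name, queue.take().name };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The consumer thread never sleeps on behalf of one operation, so an operation with a shorter delay is not held up behind one with a longer delay.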
-
Merge #100 from bbeck: Added BoundedExponentialBackoffRetry.
-
Merge #102 from artemip: Added REAP_UNTIL_GONE mode to Reaper; Remove items from activePaths once they are deleted; Tests
-
Issue 99: The Double Barrier should allow more than the max to enter the barrier. I don't see any harm in this.
-
Issue 103: Important change/fix for ExhibitorEnsembleProvider: the previous implementation wasn't handling outages very well. The connectionString could get stuck to an old value if the list of Exhibitors all went down and couldn't be contacted. Now, a backup provider is required and the backup is used to update the list of Exhibitors should there be connection problems.
-
IMPORTANT NOTE: The 1.0.x branch of Curator is now end of life. There will be a few more releases but please migrate to the 1.1.x branch.
-
New queue features: a) bounded queues: use setMaxItems() in the builder to set an (approx) upper bound on the queue size; b) the builder now has an option to turn off background puts; c) queues now default to flushing remaining puts when closed - this can be controlled in the builder via finalFlushTime().
-
Issue 82: Generalized (and deprecated) nonNamespaceView() by adding the usingNamespace() method to allow getting a facade of the client that uses a specified namespace.
-
createParentsIfNeeded() should now perform better. Instead of "pre" checking, it now only does the check if KeeperException.NoNodeException is thrown. LockInternals now uses this method and, so, should perform a bit better.
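
A rough sketch of this "create first, fix up parents only on failure" approach, using an in-memory set of paths instead of ZooKeeper. NoNodeException here is a local stand-in for KeeperException.NoNodeException, and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class LazyParentsSketch {
    /** Local stand-in for KeeperException.NoNodeException. */
    static final class NoNodeException extends RuntimeException { }

    private final Set<String> nodes = new HashSet<>();
    int createCalls = 0; // counts simulated round trips to the server

    LazyParentsSketch() {
        nodes.add("/");
    }

    private static String parentOf(String path) {
        int idx = path.lastIndexOf('/');
        return idx <= 0 ? "/" : path.substring(0, idx);
    }

    /** Plain create: fails if the parent node does not exist. */
    void create(String path) {
        createCalls++;
        if (!nodes.contains(parentOf(path))) {
            throw new NoNodeException();
        }
        nodes.add(path);
    }

    /** Tries the create first and builds the parents only if it fails. */
    void createWithParents(String path) {
        try {
            create(path);
        } catch (NoNodeException e) {
            createWithParents(parentOf(path));
            create(path);
        }
    }

    boolean exists(String path) {
        return nodes.contains(path);
    }
}
```

In the common case where the parents already exist, the create costs a single call and no existence checks.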
-
Added a new utility: Reaper. This can be used to clean up parent lock nodes so that they don't stay around as garbage. See the Utilities wiki for details: https://github.com/Netflix/curator/wiki/Utilities
-
Unit tests should be a lot less noisy. A system property now turns off most internal error logging.
-
Issue 88: Children processor should wait for all nodes to be processed before fetching more items
-
Pull Request 81: Avoid invalid ipv6 localhost addresses
-
Another big bug: guaranteed deletions were not working with namespaces.
-
MAJOR BUG FIX!!!! Many of the Curator recipes rely on the internal class LockInternals. It has a bug that exhibits when the ZooKeeper cluster is unstable. There is an edge case that causes LockInternals to leak a node in the lock path that it is managing. This results in a deadlock. The leak occurs when retries are exhausted. NOTE: TestLockCleanlinessWithFaults now tests for this condition.
-
Added some missing combinations in the backgrounding API
-
Added QueueSharder utility. Due to limitations in ZooKeeper's transport layer, a single queue will break if it has more than 10K-ish items in it. This class provides a facade over multiple distributed queues. It monitors the queues and if any one of them goes over a threshold, a new queue is added. Puts are distributed amongst the queues.
-
Issue 80: Check for null data before decompressing data in getData().
-
Merge from user bbeck - enhanced the testing in-memory ZK server to handle some edge cases. A nice benefit is that it starts up faster. Thanks Brandon!
-
Generalized the ProtectedEphemeralSequential so that it works with any create mode. withProtectedEphemeralSequential() is deprecated in favor of the new method withProtection().
-
Update all uses of Preconditions to make sure they print a reasonable diagnostic message.
-
Added a new wrapped Watcher type that can throw exceptions as a convenience. The various usingWatcher() methods now can take CuratorWatcher instances.
-
InterProcessSemaphore and LeaderSelector weren't respecting the default bytes feature.
-
Make the default data for nodes be the local IP address. This helps in debugging and enables the deadlock analysis in Exhibitor.
-
New recipe added: DistributedDelayQueue
-
Based on suggestion in Issue 67: Added new concept of UriSpec to the ServiceInstance in the Service Discovery Curator extension.
-
User "Pierre-Luc Bertrand" pointed out a potential race condition that would cause a SyncConnected to get sent before an Expired. So, now I push the event to the parent watcher before resetting the connection in ConnectionState.process(WatchedEvent).
-
New Feature: SessionFailRetryLoop. Huge thanks to Pierre-Luc Bertrand for his work on this. SessionFailRetryLoop is a special type of retry loop that causes all Curator methods in a thread to fail when a session failure is detected. This enables sets of Curator operations that must be tied to a single ZooKeeper session. See Tech Note 3 for details: https://github.com/Netflix/curator/wiki/Tech-Note-3
-
Several users have expressed dissatisfaction with the LeaderSelector implementation - requiring a thread, etc. So, LeaderLatch has been added which behaves a lot like a CountDownLatch but for leader selection.
-
Added methods to compress data via create() and setData() and to decompress data via getData(). The compression is GZIP by default. You can change this via the CuratorFrameworkFactory by specifying a CompressionProvider.
-
Added ZookeeperFactory to the client as a testing aid.
-
Added ACLProvider to make it easier to use ACLs and recipes. It can be set via the CuratorFrameworkFactory builder.
-
Several of the recipes were creating new watcher objects each time they were needed when the watcher(s) can be created once in the constructor.
-
Issue 62: DistributedQueue wasn't handling getting interrupted very well. It was logging an error.
-
Issue 64: wasn't handling SASL events. Any non-SyncConnected event was being treated as a disconnection.
-
Issue 65: Accepted a pull request that fixes a bug in RetryUntilElapsed.
-
Issue 66: Bad log string - needed String.format()
-
Accepted a change so that testng is testCompile in Gradle
-
Rewrote TestingServer and TestingCluster based on work by Jeremie BORDIER (ahfeel)
-
Rewrote the log4j property files
-
Moved to ZK 3.4.3
-
More work on the Exhibitor integration
-
Moved to Gradle as the build system.
-
Added SimpleDistributedQueue, a drop-in replacement for the DistributedQueue that comes with the ZK distribution.
-
IMPORTANT CHANGE TO LeaderSelector. Previous versions of Curator overloaded the start() method to allow re-queueing. THIS IS NO LONGER SUPPORTED. Instead, there is a new method, requeue(), that does this. Calling start() more than once will now throw an exception.
-
LeaderSelector now supports auto re-queueing. In previous versions, it wasn't trivial to requeue the instance. Now, make a call to autoRequeue() to put the instance in a mode where it will requeue itself when the leader selector listener returns.
-
The mechanism that calls any kind of Curator listener wasn't protected against exceptions. Thus, an exception in a listener could break the listener event thread.
-
deleteDirectoryContents() no longer checks for sym links. This was a major issue in the Guava version and possibly one of the reasons they removed the method altogether.
-
Introduced a parent interface for Queues so that they can have some common methods
-
Added new Recipe: DistributedIdQueue - a version of DistributedQueue that allows IDs to be associated with queue items. Items can then be removed from the queue if needed.
-
Curator can now be configured to poll a cluster of Exhibitor (https://github.com/Netflix/exhibitor) instances to get the connection string to use with the ZooKeeper client. Should the connection string change, any new connections will use the new connection string.
-
Issue 27: This bug exposed major problems with the PathChildrenCache. I ended up completely rewriting it. The original version was very inefficient and prone to herding. This new version is as efficient as possible and adds some nice new features. The major new feature is that when calling start(), you can have the cache load an initial working set of data.
-
Issue 31: It turns out an instance of InterProcessMutex could not be shared in multiple threads. My assumption was that users would create a new InterProcessMutex per thread. But, this restriction is arbitrary. For comparison, the JDK Lock doesn't have this requirement. I've fixed this; however, it was a significant change internally. I'm counting on my tests to prove correctness.
-
EnsurePath wasn't doing its work in a RetryLoop.
-
Added a new class to the Test module, Timing, that is used to better coordinate timings in tests
-
LockInternals had a retry loop for all failures when it was only needed if the session expired and the lock node was lost. So, I refined the code to handle this specific case.
-
Issue 34: PathChildrenCache should ensure the path
-
Moved to Guava 11.x
-
Lots of work on the Gradle build. NOTE: Gradle will soon become the build system for Curator
-
Added listener to Queues to listen for put completion
-
Issue 24: If InterProcessMutex.release() failed to delete the node (due to connection errors, etc.) the instance was left in an inconsistent state that would cause a future call to acquire() to succeed without actually creating the lock. A new feature (see next bullet) was added to solve this problem: guaranteed deletes. The various lock-based recipes now use this feature.
-
New feature: guaranteed deletes. The delete builder now has a method that will record failed node deletions and attempt to delete them in the background until successful. NOTE: you will still get an exception when the deletion fails. But, you can be assured that as long as the CuratorFramework instance is open attempts will be made to delete the node: client.delete().guaranteed() ...
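
The behavior described above can be sketched in plain Java as follows. This is an illustration under simplifying assumptions, not Curator's code: the "server" is an in-memory set, a failed delete is simulated with an IllegalStateException, and retryFailedDeletes() is called by hand where Curator would retry in the background. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

final class GuaranteedDeleteSketch {
    private final Set<String> nodes = new HashSet<>();
    private final Queue<String> failedDeletes = new ArrayDeque<>();
    boolean connected = true; // toggled to simulate connection loss

    void addNode(String path) {
        nodes.add(path);
    }

    boolean exists(String path) {
        return nodes.contains(path);
    }

    private void rawDelete(String path) {
        if (!connected) {
            throw new IllegalStateException("connection loss");
        }
        nodes.remove(path);
    }

    /** Deletes the node; on failure records the path for later retry and still throws. */
    void deleteGuaranteed(String path) {
        try {
            rawDelete(path);
        } catch (IllegalStateException e) {
            failedDeletes.add(path);
            throw e;
        }
    }

    /** One retry pass over the recorded failures; returns how many deletes succeeded. */
    int retryFailedDeletes() {
        int deleted = 0;
        for (int i = failedDeletes.size(); i > 0; i--) {
            String path = failedDeletes.poll();
            try {
                rawDelete(path);
                deleted++;
            } catch (IllegalStateException e) {
                failedDeletes.add(path); // keep it for the next pass
            }
        }
        return deleted;
    }
}
```

The caller still sees the failure, but the path stays on the retry list until a later pass succeeds.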
-
Issue 22: Make ServiceCache close itself down properly.
-
Issue 21: Move TestNG to the top-level pom and define its scope as test
-
Issue 17: ConnectionStateManager should use the builder's thread factory if present
-
1.1.x marks a separate branch of Curator:
- 1.0.x will stay compatible with ZooKeeper 3.3.x
- 1.1.x+ will require ZooKeeper 3.4.x+
-
Added support for ZooKeeper 3.4's Transactions:
- CuratorFramework has a new method: inTransaction() that starts a transaction builder
- See TestTransactions for examples
- Updated and tested against ZooKeeper 3.4.2
- Added a REST server for Service Discovery
- Switched to slf4j for logging
- Moved to 1.0 version
- Curator is now feature complete
-
Added Barrier
-
Added Double Barrier
-
Added Read/Write lock
-
Added revocation to InterProcessMutex
-
Fixed (hopefully) intermittent failures with testRetry()
-
Updates/enhancements to Discovery based on suggestions from Eishay Smith
- Added Service Discovery
-
Added new methods to LeaderSelector to identify/get all Participants
-
Moved to ZooKeeper 3.3.3
-
Made the TestingCluster not throw an assertion error due to internal JMX registrations in ZK. This is done with Javassist ugliness.
-
Refactored listeners in Curator to a common methodology
-
Major changes to error handling. Adding a ConnectionStateManager that allows users to listen for connection changes. Connection loss is first treated as a recoverable Suspension. If the connection is not re-established, the state changes to connection loss. Any recipes that are affected by this have been updated.
-
PathChildrenCache now handles connection state changes much better.
-
All Curator created threads now have a meaningful name.
-
Jérémie Bordier posted on the ZK mailing list about a split brain issue with the Leader Selector. If the Leader is connected to a server that suffers a network partition, it needs to get notified that it has lost leadership. Curator handled this somewhat but only if the client application executed periodic ZooKeeper operations. I've enhanced the CuratorFramework implementation to check for disconnection and execute a background sync (with retries). This will cause any listener's unhandledError() method to get called when there is a network partition.
-
New utility: TestingCluster. Allows for testing with an in-memory ZK ensemble.
-
Reworked distributed atomic implementations. I was unhappy with the complexity of the previous one. Now, there's a simpler master implementation DistributedAtomicValue that is the basis for the others. Adding new versions should be simpler as well.
- Another pass at fixing the semaphore. Went back to the model of the count being merely a convention. Added a new recipe for a SharedCount that can be used in place of the count convention with the semaphore. This is the best of both worlds. The semaphore code is a lot simpler and will perform better. Thanks to Monal Daxini for the idea.