Skip to content
This repository has been archived by the owner on Apr 1, 2024. It is now read-only.

Bump pulsar version to 3.0.0-SNAPSHOT #6034

Closed
wants to merge 481 commits into from

Conversation

streamnativebot
Copy link

This is a PR created by snbot to trigger the check suite in each repository.

zymap and others added 30 commits August 17, 2023 16:13
…re conf (apache#20804)

### Motivation

The offload policies have limited the configurations for the offloaders.  That means if the offloader needs more configurations, we need to extend more fields in the OffloadPoliciesImpl. That doesn't make sense. We should make it extendable easily. Add a configuration map support to allow it to set more configurations.
…ocket.conf (apache#20840)

Motivation: Since the PR apache#16234 add the prop `cryptoKeyReaderFactoryClassName` for the WebSocket Proxy, but did not add this prop to `websocket.conf`.  This can make the script which try to replacement attribute a bit difficult to write

Modifications: add the conf `cryptoKeyReaderFactoryClassName` into the file `websocket.conf`
…currently (apache#20971)

### Motivation

**Background**: when calling `pulsar-admin topics stats --get-earliest-time-in-backlog <topic name>`, Pulsar will read the first entry which is not acknowledged, and respond with the entry write time. The flow is like this:
- get the mark deleted position of the subscription
- if no backlog, response `-1`
- else read the next position of the mark deleted position, and respond with the entry write time.

**Issue**: if the command `pulsar-admin topics stats --get-earliest-time-in-backlog <topic name>` and `consumer.acknowledge` are executed at the same time, the step 2 in above flow will get a position which is larger than the last confirmed position, lead a read entry error.

| time | `pulsar-admin topics stats --get-earliest-time-in-backlog <topic name>` | `consumer.acknowledge` |
| --- | --- | --- |
| 1 | mark deleted position is `3:1` and LAC is `3:2` now |
| 2 | the check `whether has backlog` is passed |
| 3 | | acknowledged `3:2`, mark deleted position is `3:2` now |
| 4 | calculate next position: `3:3` |
| 5 | Read `3:3` and get an error: `read entry failed` |

Note: the test in PR is not intended to reproduce the issue.

### Modifications

Respond `-1` if the next position of the mark deleted position is larger than the LAC
…pache#21035)

Motivation: After [PIP-118: reconnect broker when ZooKeeper session expires](apache#13341), the Broker will not shut down after losing the connection of the local metadata store in the default configuration. However, before the ZK client is reconnected, the events of BK online and offline are lost, resulting in incorrect BK info in the memory. You can reproduce the issue by the test `BkEnsemblesChaosTest. testBookieInfoIsCorrectEvenIfLostNotificationDueToZKClientReconnect`(90% probability of reproduce of the issue, run it again if the issue does not occur)

Modifications: Refresh BK info in memory after the ZK client is reconnected.
(cherry picked from commit db20035)
…ageId read reaches lastReadId (apache#20988)

(cherry picked from commit 9e2195c)
…ed on topic, when dedup is enabled and no producer is there (apache#20951)

(cherry picked from commit 30073db)
…ata when doing lookup (apache#21063)

Motivation: If we set `allowAutoTopicCreationType` to `PARTITIONED`, the flow of the create topic progress is like the below:
1. `Client-side`: Lookup topic to get partitioned topic metadata to create a producer.
1. `Broker-side`: Create partitioned topic metadata.
1. `Broker-side`: response `{"partitions":3}`.
1. `Client-side`: Create separate connections for each partition of the topic.
1. `Broker-side`: Receive 3 connect requests and create 3 partition-topics.

In the `step 2` above, the flow of the progress is like the below:
1. Check the policy of topic auto-creation( the policy is `{allowAutoTopicCreationType=PARTITIONED, defaultNumPartitions=3}` )
1. Check the partitioned topic metadata already exists.
1. Try to create the partitioned topic metadata if it does not exist.
1. If created failed by the partitioned topic metadata already exists( maybe another broker is also creating now), read partitioned topic metadata from the metadata store and respond to the client.

There is a race condition that makes the client get non-partitioned metadata of the topic:
| time | `broker-1` | `broker-2` |
| --- | --- | --- |
| 1 | get policy: `PARTITIONED, 3` | get policy: `PARTITIONED, 3` |
| 2 | check the partitioned topic metadata already exists | Check the partitioned topic metadata already exists |
| 3 | Partitioned topic metadata does not exist, the metadata cache will cache an empty optional for the path | Partitioned topic metadata does not exist, the metadata cache will cache an empty optional for the path |
| 4 |  | succeed create the partitioned topic metadata |
| 5 | Receive a ZK node changed event to invalidate the cache of the partitioned topic metadata |
| 6 | Creating the metadata failed due to it already exists |
| 7 | Read the partitioned topic metadata again |

If `step-5` is executed later than `step-7`, `broker-1` will get an empty optional from the cache of the partitioned topic metadata and respond non-partitioned metadata to the client.

**What thing would make the `step-5` is executed later than `step-7`?**
Provide a scenario: Such as the issue that the PR apache#20303 fixed, it makes `zk operation` and `zk node changed notifications`  executed in different threads: `main-thread of ZK client` and `metadata store thread`.

Therefore, the mechanism of the lookup partitioned topic metadata is fragile and we need to optimize it.

Modifications: Before reading the partitioned topic metadata again, refresh the cache first.
(cherry picked from commit d099ac4)
…che#21070)

### Motivation

Current, when the producer resend the chunked message like this:
- M1: UUID: 0, ChunkID: 0
- M2: UUID: 0, ChunkID: 0 // Resend the first chunk
- M3: UUID: 0, ChunkID: 1

When the consumer received the M2, it will find that it's already tracking the UUID:0 chunked messages, and will then discard the message M1 and M2. This will lead to unable to consume the whole chunked message even though it's already persisted in the Pulsar topic.

Here is the code logic:
https://github.com/apache/pulsar/blob/44a055b8a55078bcf93f4904991598541aa6c1ee/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java#L1436-L1482

The bug can be easily reproduced using the testcase `testResendChunkMessages` introduced by this PR.

### Modifications

- When receiving the new duplicated first chunk of a chunked message, the consumer discard the current chunked message context and create a new context to track the following messages. For the case mentioned in Motivation, the M1 will be released and the consumer will assemble M2 and M3 as the chunked message.

(cherry picked from commit eb2e3a2)
… was failed (apache#20935)

The progress Persist mark deleted position is like this:
- persist to BK
- If failed to persist to BK, try to persist to ZK

But in the current implementation: if the cursor ledger was created failed, Pulsar will not try to persist to ZK. It makes if the cursor ledger created fails, a lot of ack records can not be persisted, and we will get a lot of repeat consumption after the BK recover.

Modifications: Try to persist the mark deleted position to ZK if the cursor ledger was created failed
(cherry picked from commit 843b830)
Technoboy- and others added 28 commits February 9, 2024 11:14
…22022)

### Motivation

For some use case, the users need to store all the messages even though these message are acked by all subscription.
So they set the retention policy of the namespace to infinite retention (setting both time and size limits to `-1`).  But the data in the system topic does not need for infinite retention. 

### Modifications

For system topics, do not retain messages that have already been acknowledged.
…e-auth-data` flag to update function/sink/source (apache#19450)

Co-authored-by: tison <[email protected]>
(cherry picked from commit 2b92ed1)
…rect zip/bytecode access (apache#22122)

(cherry picked from commit bbc6224)
…-io/solr (apache#22047)

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit 7a90426)
… exist and do not expect to create a new one. apache#21995 (apache#22004)

Co-authored-by: Jiwe Guo <[email protected]>
(cherry picked from commit 1c652f5)
…ack and delivery concurrency (apache#22090)

(cherry picked from commit 1b1cfb5)
@streamnativebot streamnativebot deleted the branch-3.0.0-SNAPSHOT branch March 4, 2024 21:06
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.