From 88d70e88fd61bed3a336c9a2e4675538e20c1c85 Mon Sep 17 00:00:00 2001
From: Haochen Pan
Date: Sat, 27 Jan 2024 09:25:26 -0600
Subject: [PATCH] readme

---
 README.md                 |  24 ++---
 docs/README_old.md        | 185 --------------------------------------
 docs/quickstart.md        |  27 +++---
 tests/unit/test_client.py |   2 +-
 4 files changed, 25 insertions(+), 213 deletions(-)
 delete mode 100644 docs/README_old.md

diff --git a/README.md b/README.md
index 623aa0f..be08b30 100644
--- a/README.md
+++ b/README.md
@@ -1,27 +1,27 @@
-
-Diaspora Event Fabric: Resilience-enabling services for science from HPC to edge
-
+# Diaspora: Resilience-enabling services for science from HPC to edge
 
-## Installation
-### Recommended Installation with Kafka Client Library
-The `KafkaProducer` and `KafkaConsumer` classes within the SDK are designed for seamless integration with Diaspora Event Fabric using pre-configured settings. For utilizing these classes, the `kafka-python` library is necessary.
+## Event Fabric SDK Installation Guide
+### Recommended Method: Use with `kafka-python`
+For easy integration with Diaspora Event Fabric, use the `KafkaProducer` and `KafkaConsumer` classes from our SDK. This requires the `kafka-python` library.
 
-To install the Diaspora Event SDK along with `kafka-python`, execute:
+To install the Event Fabric SDK and `kafka-python`, use the following command:
 ```bash
 pip install "diaspora-event-sdk[kafka-python]"
 ```
 
-### Installation Without Kafka Client Library
-If you prefer using different client libraries for Kafka communication, you can install the SDK without the `kafka-python` dependency. The SDK still serves for topic-level access control (authorization) and login credential management (authentication).
+### Alternative Installation: Without Kafka Client Library
+To use alternative Kafka client libraries (e.g., `confluent-kafka-python`, `aiokafka`, and libraries for other programming languages), you can install the SDK without the `kafka-python` dependency. This option still provides topic-level access control (authorization) and login credential management (authentication) features.
 
-To install the SDK without client libraries, simply run:
+To install the SDK without `kafka-python`, use:
 ```bash
 pip install diaspora-event-sdk
 ```
-Note: This does not install the necessary dependency for the `KafkaProducer` and `KafkaConsumer` classes.
+Note: This method does not include the dependencies for the `KafkaProducer` and `KafkaConsumer` classes mentioned in the QuickStart guide.
 
 ## Use Diaspora Event Fabric SDK
-Please refer to our [QuickStart Guide](docs/quickstart.md) for recommended use with the `kafka-python` library as well as steps to use your own Kafka client.
+**Getting Started**: Visit our [QuickStart Guide](docs/quickstart.md) for details on using the SDK with the `kafka-python` library and instructions for other Kafka clients.
 
-Please refer to our [Troubleshooting Guide](docs/troubleshooting.md) for debugging common problems and effective key management strategies.
+**Troubleshooting and Credential Management**: Consult our [Troubleshooting Guide](docs/troubleshooting.md) for solving common issues and tips on managing keys effectively.
 
-[Topic: Use faust to Process Records](docs/faust_weather_app.md)
\ No newline at end of file
+**Advanced Usage**: Explore the [Faust Streaming Guide](docs/faust_weather_app.md) for advanced event streaming with Faust.

diff --git a/docs/README_old.md b/docs/README_old.md
deleted file mode 100644
index 4c558fc..0000000
--- a/docs/README_old.md
+++ /dev/null
@@ -1,185 +0,0 @@
-
-Diaspora Event Fabric: Resilience-enabling services for science from HPC to edge
-
-
-- [Installation](#installation)
-  * [Recommended Installation with Kafka Client Library](#recommended-installation-with-kafka-client-library)
-  * [Installation Without Kafka Client Library](#installation-without-kafka-client-library)
-- [Use Diaspora Event SDK](#use-diaspora-event-sdk)
-  * [Use the SDK to communicate with Kafka (kafka-python Required)](#use-the-sdk-to-communicate-with-kafka--kafka-python-required-)
-    + [Register Topic (create topic ACLs)](#register-topic--create-topic-acls-)
-    + [Block Until Ready](#block-until-ready)
-    + [Start Producer](#start-producer)
-    + [Start Consumer](#start-consumer)
-    + [Unregister Topic (remove topic ACLs)](#unregister-topic--remove-topic-acls-)
-  * [Use Your Preferred Kafka Client Library](#use-your-preferred-kafka-client-library)
-    + [Register and Unregister Topic](#register-and-unregister-topic)
-    + [Cluster Connection Details](#cluster-connection-details)
-  * [Advanced Usage](#advanced-usage)
-    + [Password Refresh](#password-refresh)
-- [Common Issues](#common-issues)
-  * [ImportError: cannot import name 'KafkaProducer' from 'diaspora_event_sdk'](#importerror--cannot-import-name--kafkaproducer--from--diaspora-event-sdk-)
-  * [kafka.errors.NoBrokersAvailable and kafka.errors.NodeNotReadyError](#kafkaerrorsnobrokersavailable-and-kafkaerrorsnodenotreadyerror)
-  * [kafka.errors.KafkaTimeoutError: KafkaTimeoutError: Failed to update metadata after 60.0 secs.](#kafkaerrorskafkatimeouterror--kafkatimeouterror--failed-to-update-metadata-after-600-secs)
-  * [ssl.SSLCertVerificationError](#sslsslcertverificationerror)
-
-
-## Installation
-### Recommended Installation with Kafka Client Library
-The `KafkaProducer` and `KafkaConsumer` classes in the SDK enable using Diaspora Event Fabric with ready-made configurations. To use these two classes, the `kafka-python` library is required.
-
-To install the Diaspora Event SDK with `kafka-python`, run:
-```bash
-pip install "diaspora-event-sdk[kafka-python]"
-```
-
-### Installation Without Kafka Client Library
-If you plan to use other client libraries to communicate with Kafka, you can install the SDK without the `kafka-python` dependency.
-
-To install the SDK without client libraries, simply run:
-```bash
-pip install diaspora-event-sdk
-```
-Note that this option does not install the necessary dependency for the `KafkaProducer` and `KafkaConsumer` classes below to work.
-
-## Use Diaspora Event SDK
-### Use the SDK to communicate with Kafka (kafka-python Required)
-
-#### Register Topic (create topic ACLs)
-
-Before you can create, describe, and delete topics, we need to set the appropriate ACLs in ZooKeeper. Here we use the Client to register ACLs for the desired topic name.
-
-```python
-from diaspora_event_sdk import Client as GlobusClient
-c = GlobusClient()
-topic = "topic-" + c.subject_openid[-12:]
-print(c.register_topic(topic))
-print(c.list_topics())
-
-# registration_status = c.register_topic(topic)
-# print(registration_status)
-# assert registration_status["status"] in ["no-op", "success"]
-
-# registered_topics = c.list_topics()
-# print(registered_topics)
-# assert topic in registered_topics["topics"]
-```
-Registering a topic also creates it if it does not already exist.
-
-#### Block Until Ready
-`KafkaProducer` and `KafkaConsumer` internally call `create_key` if the connection credential is not found locally (e.g., when you first authenticated with Globus). Behind the scenes, the middle service contacts AWS to initialize the asynchronous process of creating and associating the secret.
-The method below blocks until the credential is ready to be used by the producer and consumer. When the method finishes, it returns True, and the producer and consumer code below should work without further waiting. By default, the method retries in a loop for five minutes before giving up and returning False. Use the `max_minutes` parameter to change the maximum wait time.
-
-```python
-from diaspora_event_sdk import block_until_ready
-assert block_until_ready()
-```
-
-#### Start Producer
-
-Once the topic is created, we can publish to it. The KafkaProducer wraps the [Python KafkaProducer](https://kafka-python.readthedocs.io/en/master/apidoc/KafkaProducer.html). Event publication can be either synchronous or asynchronous. Below demonstrates the synchronous approach.
-
-```python
-from diaspora_event_sdk import KafkaProducer
-producer = KafkaProducer()
-future = producer.send(
-    topic, {'message': 'Synchronous message from Diaspora SDK'})
-print(future.get(timeout=10))
-```
-
-#### Start Consumer
-
-A consumer can be configured to monitor the topic and act on events as they are published. The KafkaConsumer wraps the [Python KafkaConsumer](https://kafka-python.readthedocs.io/en/master/apidoc/KafkaConsumer.html). Here we use `auto_offset_reset` to consume from the first event published to the topic. Removing this field will have the consumer act only on new events.
-
-```python
-from diaspora_event_sdk import KafkaConsumer
-consumer = KafkaConsumer(topic, auto_offset_reset='earliest')
-for msg in consumer:
-    print(msg)
-```
-
-#### Unregister Topic (remove topic ACLs)
-```python
-from diaspora_event_sdk import Client as GlobusClient
-c = GlobusClient()
-topic = "topic-" + c.subject_openid[-12:]
-print(c.unregister_topic(topic))
-print(c.list_topics())
-```
-
-### Use Your Preferred Kafka Client Library
-
-#### Register and Unregister Topic
-The steps are the same as above, using the `register_topic`, `unregister_topic`, and `list_topics` methods from the `Client` class.
-
-#### Cluster Connection Details
-| Configuration     | Value                                                               |
-| ----------------- | ------------------------------------------------------------------- |
-| Bootstrap Servers | [`MSK_SCRAM_ENDPOINT`](/diaspora_event_sdk/sdk/_environments.py#L6) |
-| Security Protocol | `SASL_SSL`                                                          |
-| Sasl Mechanism    | `SCRAM-SHA-512`                                                     |
-| Api Version       | `3.5.1`                                                             |
-| Username          | (See instructions below)                                            |
-| Password          | (See instructions below)                                            |
-
-Execute the code snippet below to obtain your unique username and password for the Kafka cluster:
-```python
-from diaspora_event_sdk import Client as GlobusClient
-c = GlobusClient()
-print(c.retrieve_key())
-```
-
-### Advanced Usage
-#### Key Migration
-Suppose you want to use the same credential (OpenID, secret key) on a second machine. A call to `create_key()` invalidates all previous keys, so you cannot rely on that API to generate a new key for the new machine while keeping the old machine's key valid. Starting with 0.0.19, you can call the following code to retrieve the active secret key on the first machine and store it on the second machine, so that both machines hold valid keys.
-
-On the first machine, call:
-```python
-from diaspora_event_sdk import Client as GlobusClient
-c = GlobusClient()
-print(c.get_secret_key())  # note down the secret key
-```
-
-On the second machine, call:
-```python
-from diaspora_event_sdk import Client as GlobusClient
-from diaspora_event_sdk import block_until_ready
-
-c = GlobusClient()
-c.put_secret_key("<secret key noted down from the first machine>")
-assert block_until_ready()  # should unblock immediately
-```
-Note that a call to create_key() on any machine would invalidate the current key on both machines.
-
-#### Password Refresh
-If you need to invalidate all previously issued passwords and generate a new one, call the `create_key` method from the `Client` class:
-```python
-from diaspora_event_sdk import Client as GlobusClient
-c = GlobusClient()
-print(c.create_key())
-```
-Subsequent calls to `retrieve_key` will return the new password from the cache. This cache is reset with a logout or a new `create_key` call.
-
-## Common Issues
-
-### ImportError: cannot import name 'KafkaProducer' from 'diaspora_event_sdk'
-
-It seems that you ran `pip install diaspora-event-sdk` to install the Diaspora Event SDK without `kafka-python`. Run `pip install kafka-python` to install the necessary dependency for our `KafkaProducer` and `KafkaConsumer` classes.
-
-### kafka.errors.NoBrokersAvailable and kafka.errors.NodeNotReadyError
-These messages might pop up if `create_key` is called shortly before instantiating a Kafka client. This is because there's a delay for AWS Secrets Manager to associate the newly generated credential with MSK. Note that `create_key` is called internally by `kafka_client.py` the first time you create one of these clients. Please wait a while (around 1 minute) and retry.
-
-### kafka.errors.KafkaTimeoutError: KafkaTimeoutError: Failed to update metadata after 60.0 secs.
-**Step 1: Verify Topic Creation and Access:**
-Before interacting with the producer/consumer, ensure that the topic has been successfully created and access is granted to you. Execute the following command:
-
-```python
-from diaspora_event_sdk import Client as GlobusClient
-c = GlobusClient()
-# topic =
-print(c.register_topic(topic))
-```
-This should return a `status: no-op` message, indicating that the topic is already registered and accessible.
-
-**Step 2: Wait for Automatic Key Creation in KafkaProducer and KafkaConsumer**
-`KafkaProducer` and `KafkaConsumer` internally call `create_key` if the keys are not found locally (e.g., when you first authenticated with Globus). Behind the scenes, the middle service contacts AWS to initialize the asynchronous process of creating and associating the secret. Please wait a while (around 1 minute) and retry.
-
-### ssl.SSLCertVerificationError
-This is common on macOS systems; see [this StackOverflow answer](https://stackoverflow.com/a/53310545).
\ No newline at end of file

diff --git a/docs/quickstart.md b/docs/quickstart.md
index 9e77167..674c3a0 100644
--- a/docs/quickstart.md
+++ b/docs/quickstart.md
@@ -11,7 +11,7 @@
 topic = ...  # e.g., "topic-" + c.subject_openid[-12:]
 print(c.register_topic(topic))
 print(c.list_topics())
 ```
-Expect `success` or `no-op` for the first print, and a list including your topic for the second. For group projects, contact Haochen or Ryan to access topics that have been registered by others before.
+Expect `success` or `no-op` for the first print, and a list including your topic for the second. For group projects, contact Haochen or Ryan to access topics that have been registered by other users before.
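+
+As a quick sanity check, you can assert on the returned responses instead of reading the prints. Below is a minimal sketch that continues from the snippet above; the `status` and `topics` response fields are assumptions based on earlier SDK examples:
+```python
+# Verify registration succeeded (or was a no-op) and the topic is listed
+registration_status = c.register_topic(topic)
+assert registration_status["status"] in ["no-op", "success"]
+assert topic in c.list_topics()["topics"]
+```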
 
 Aside from topic authorization, authentication requires a username (user's OpenID) and password (AWS secret key in `$HOME/.diaspora/storage.db`). If the secret key is missing (e.g., new login), a `KafkaProducer` or `KafkaConsumer` instance will internally call `create_key()` to generate and store one. However, the key takes 30 seconds to 2 minutes to activate due to AWS processing.
@@ -23,13 +23,12 @@
 assert block_until_ready()  # now the secret key is ready to use
 ```
 
-This function waits until the key activates, usually within 30 seconds to 2 minutes. Subsequent `block_until_ready()` calls should return in 1-10 seconds. Still, include this primarily in test/setup scripts, but not on the critical (happy) path.
-
+This function waits until the key activates, usually within a few seconds. Subsequent `block_until_ready()` calls should return even faster. Still, include this in test/setup scripts, but not on the critical (happy) path.
 
 ## Producing or Consuming
 
-Once the topic is registered and the secret key is ready, we can publish messages to it. The `KafkaProducer` wraps the [Python KafkaProducer](https://kafka-python.readthedocs.io/en/master/apidoc/KafkaProducer.html), and event publication can be either synchronous or asynchronous. Below demonstrates the synchronous approach.
+Once the topic is registered and the access key and secret key are ready (through `c = GlobusClient()`), we can publish messages to it. The `KafkaProducer` wraps the [Python KafkaProducer](https://kafka-python.readthedocs.io/en/master/apidoc/KafkaProducer.html), and event publication can be either synchronous or asynchronous. Below demonstrates the synchronous approach.
 
 ```python
 from diaspora_event_sdk import KafkaProducer
@@ -55,28 +54,26 @@
 To prevent reaching AWS's limit on topic partitions, unregister a topic from Diaspora Event Fabric:
 ```python
 from diaspora_event_sdk import Client as GlobusClient
 c = GlobusClient()
-topic = "topic-" + c.subject_openid[-12:]
 print(c.unregister_topic(topic))
 print(c.list_topics())
 ```
-Once a topic is successfully unregistered, you'll lose access to it. However, it becomes available for registration and access by other users.
+Once a topic is successfully unregistered, you'll lose access to it, and it becomes available for registration and access by other users.
 
 ## Integrating Your Own Kafka Client with Diaspora Event Fabric
 
 If you're opting to use a custom Kafka client library, here are the necessary cluster connection details:
 
-| Configuration     | Value                                                               |
-| ----------------- | ------------------------------------------------------------------- |
-| Bootstrap Servers | (See instructions below)                                            |
-| Security Protocol | `SASL_SSL`                                                          |
-| Sasl Mechanism    | `OAUTHBEARER`                                                       |
-| Api Version       | `3.5.1`                                                             |
-| Username          | (See instructions below)                                            |
-| Password          | (See instructions below)                                            |
+| Configuration     | Value                                  |
+| ----------------- | -------------------------------------- |
+| Bootstrap Servers | `c.retrieve_key()['endpoint']`         |
+| Security Protocol | `SASL_SSL`                             |
+| Sasl Mechanism    | `OAUTHBEARER`                          |
+| Username          | `c.retrieve_key()['access_key']`       |
+| Password          | `c.retrieve_key()['secret_key']`       |
 
-The bootstrap server address, OAuth access key, and OAuth secret key can be retrieved through `retrieve_key()` and invalidated through `create_key()`. Please refer to our [Troubleshooting Guide](docs/troubleshooting.md) for secret key management.
+The bootstrap server address, OAuth access key, and OAuth secret key can be retrieved through `retrieve_key()` and invalidated through `create_key()`.
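+
+For illustration, here is a minimal sketch of wiring these values into a client configuration dictionary. The `endpoint`, `access_key`, and `secret_key` names follow the table above; the exact option names expected by your chosen client library are assumptions you should adapt:
+
+```python
+from diaspora_event_sdk import Client as GlobusClient
+
+c = GlobusClient()
+key = c.retrieve_key()  # returns the endpoint and OAuth credentials
+conf = {
+    "bootstrap_servers": key["endpoint"],
+    "security_protocol": "SASL_SSL",
+    "sasl_mechanism": "OAUTHBEARER",
+    "sasl_username": key["access_key"],
+    "sasl_password": key["secret_key"],
+}
+# Pass conf (with option names adjusted) to your Kafka client of choice.
+```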
+To use Diaspora Event Fabric with `confluent-kafka-python`, please refer to [this guide](https://github.com/aws/aws-msk-iam-sasl-signer-python?tab=readme-ov-file). For other programming languages, please refer to [this post](https://aws.amazon.com/blogs/big-data/amazon-msk-iam-authentication-now-supports-all-programming-languages/).
 
 ```python
 from diaspora_event_sdk import Client as GlobusClient
 c = GlobusClient()

diff --git a/tests/unit/test_client.py b/tests/unit/test_client.py
index a2983da..fdf51cb 100644
--- a/tests/unit/test_client.py
+++ b/tests/unit/test_client.py
@@ -1,5 +1,5 @@
 import pytest
-from unittest.mock import Mock, patch
+from unittest.mock import Mock
 from unittest.mock import MagicMock
 from diaspora_event_sdk import Client
 from diaspora_event_sdk.sdk.web_client import WebClient