Log event to sql database (#696)
* support _token and remove prefix in login

* allow pass workspace when login

* Support current workspace for http endpoint

* fix login optional

* increase page size for list workspaces

* Bump version for hypha-rpc 0.20.38

* Update change logs and login instructions

* Support artifact endpoint via http

* Implement sql database

* change _id to _prefix

* clean up

* redirect login

* use sql to store workspace info

* add stage_files

* Update helm charts

* Fix workspaces db

* skip default database uri

* Use in-memory sql for artifacts

* Fix workspace loading error

* restore workspace info

* rename it to test-3

* restore version

* add logging service

* Fix artifacts

* Merge event log to workspace

* make sure error is raised

* support observability

* Add tests

* test observability

* Fix counter duplicated error

* add change log

* Support download statistics

* Update docs

* Update change log

* Remove set logging service
oeway authored Oct 10, 2024
1 parent a30119e commit fdcc433
Showing 16 changed files with 835 additions and 146 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,8 +1,13 @@
# Hypha Change Log

### 0.20.38

- Support event logging in the workspace: use `log_event` to log events in the workspace and `get_events` to retrieve them. The events are persisted in the SQL database (see the sketch after this list).
- Allow passing `workspace` and `expires_in` to the `login` function to generate a workspace-specific token.
- When using the HTTP endpoint to access a service, you can now pass a workspace-specific token in the `Authorization` HTTP header. (Previously, all services were assumed to be accessed from the service provider's workspace.)
- Breaking Change: Remove `info`, `warning`, `error`, `critical`, and `debug` from the `hypha` module; use `log` or `log_event` instead.
- Support basic observability for the workspace, including workspace status, event bus and websocket connection status.
- Support download statistics for the artifacts in the artifact manager.
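
Below is a minimal sketch of the new event-logging API, assuming `log_event` and `get_events` are exposed on the workspace connection returned by `connect_to_server`, and that `log_event` takes an event type plus an optional data payload (the exact signatures may differ):

```python
import asyncio
from hypha_rpc.websocket_client import connect_to_server

async def main():
    # Connect to the Hypha server (the URL shown is illustrative)
    server = await connect_to_server({"server_url": "https://hypha.aicell.io"})

    # Log an event in the current workspace; event type and payload are examples
    await server.log_event("dataset_uploaded", {"dataset": "example-dataset"})

    # Retrieve events previously logged in this workspace
    events = await server.get_events()
    print(events)

asyncio.run(main())
```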

### 0.20.37
- Add s3-proxy to allow accessing S3 presigned URLs when the S3 server is not directly accessible. Use `--enable-s3-proxy` to enable the S3 proxy when starting Hypha.
107 changes: 45 additions & 62 deletions docs/artifact-manager.md
@@ -1,17 +1,18 @@
# Artifact Manager

The `Artifact Manager` is a built-in Hypha service for indexing, managing, and storing resources such as datasets, AI models, and applications. It provides a structured way to manage datasets and similar resources, enabling efficient listing, uploading, updating, and deleting of files. It also now supports tracking download statistics for each artifact.

A typical use case for the `Artifact Manager` is as a backend for a single-page web application that displays a gallery of datasets, AI models, applications, or other types of resources. The default metadata of an artifact is designed to render a grid of cards on a webpage. It also supports tracking download statistics.

**Note:** The `Artifact Manager` is only available when your Hypha server has S3 storage enabled.

---

## Getting Started

### Step 1: Connecting to the Artifact Manager Service

To use the `Artifact Manager`, you first need to connect to the Hypha server. This API allows you to create, read, edit, and delete datasets in the artifact registry (stored in an S3 bucket for each workspace).

```python
from hypha_rpc.websocket_client import connect_to_server
```

@@ -25,7 +26,7 @@ artifact_manager = await server.get_service("public/artifact-manager")

### Step 2: Creating a Dataset Gallery Collection

Once connected, you can create a collection to organize datasets in the gallery.

```python
# Create a collection for the Dataset Gallery
@@ -59,13 +60,15 @@ await artifact_manager.create(prefix="collections/dataset-gallery/example-datase
print("Dataset added to the gallery.")
```

### Step 4: Uploading Files to the Dataset with Download Statistics

Once you have created a dataset, you can upload files to it by generating a pre-signed URL. This URL allows you to upload the actual files to the artifact's S3 bucket.

Additionally, when uploading files to an artifact, you can specify a `download_weight` for each file. This weight determines how the file impacts the artifact's download count when it is accessed. For example, primary files might have a higher `download_weight`, while secondary files might have no impact. The download count is automatically updated whenever users download files from the artifact.

```python
# Get a pre-signed URL to upload a file, with a download_weight assigned
put_url = await artifact_manager.put_file(prefix="collections/dataset-gallery/example-dataset", file_path="data.csv", options={"download_weight": 0.5})

# Upload the file using an HTTP PUT request
with open("path/to/local/data.csv", "rb") as f:
```

@@ -99,7 +102,7 @@

## Full Example: Creating and Managing a Dataset Gallery

Here’s a full example that shows how to connect to the service, create a dataset gallery, add a dataset, upload files with download statistics, and commit the dataset.

```python
import asyncio
@@ -135,8 +138,8 @@ async def main():
await artifact_manager.create(prefix="collections/dataset-gallery/example-dataset", manifest=dataset_manifest, stage=True)
print("Dataset added to the gallery.")

# Get a pre-signed URL to upload a file, with a download_weight assigned
put_url = await artifact_manager.put_file(prefix="collections/dataset-gallery/example-dataset", file_path="data.csv", options={"download_weight": 0.5})

# Upload the file using an HTTP PUT request
with open("path/to/local/data.csv", "rb") as f:
@@ -217,43 +220,10 @@ await artifact_manager.commit(prefix="collections/schema-dataset-gallery/valid-d
print("Valid dataset committed.")
```

### Step 3: Accessing the collection via HTTP API

You can access the collection via the HTTP API to retrieve the schema and datasets.
This can be used for rendering a gallery of datasets on a webpage.

```javascript
// Fetch the schema for the collection
fetch("https://hypha.aicell.io/my-workspace/artifact/public/collections/schema-dataset-gallery")
.then(response => response.json())
.then(data => console.log("Schema:", data.collection_schema));
```

## API Reference

This section details the core functions provided by the `Artifact Manager` for creating, managing, and validating artifacts such as datasets and collections.

### `create(prefix: str, manifest: dict, overwrite: bool = False, stage: bool = False) -> dict`

Creates a new artifact or collection with the provided manifest. If the artifact already exists, you must set `overwrite=True` to overwrite it.

**Parameters:**

- `prefix`: The path where the artifact or collection will be created (e.g., `"collections/dataset-gallery"`).
- `manifest`: The manifest describing the artifact (must include fields like `id`, `name`, and `type`).
- `overwrite`: Optional. If `True`, it will overwrite an existing artifact. Default is `False`.
- `stage`: Optional. If `True`, it will put the artifact into staging mode. Default is `False`.

**Returns:** The created manifest as a dictionary.

**Example:**

```python
await artifact_manager.create(prefix="collections/dataset-gallery", manifest=gallery_manifest)
```

---


### `edit(prefix: str, manifest: dict) -> None`

Edits an existing artifact. You provide the new manifest to update the artifact. The updated manifest is stored temporarily as `_manifest.yaml`.
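
**Example** (a minimal sketch; `updated_manifest` stands in for a manifest dictionary like the ones shown earlier):

```python
# Update the artifact's manifest; the change is staged as _manifest.yaml until committed
await artifact_manager.edit(prefix="collections/dataset-gallery/example-dataset", manifest=updated_manifest)
```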
@@ -303,21 +273,24 @@ await artifact_manager.delete(prefix="collections/dataset-gallery/example-datase

---

### `put_file(prefix: str, file_path: str, options: dict = None) -> str`

Generates a pre-signed URL to upload a file to an artifact. You can then use the URL with an HTTP `PUT` request to upload the file.

**Parameters:**

- `prefix`: The path of the artifact where the file will be uploaded (e.g., `"collections/dataset-gallery/example-dataset"`).
- `file_path`: The relative path of the file to upload within the artifact (e.g., `"data.csv"`).
- `options`: Optional. Additional options for the file upload. Default is `None`. The options can include:
  - `download_weight`: A float value representing the impact of the file on the download count. Default is `0`.

**Returns:** A pre-signed URL for uploading the file.

**Example:**

```python
put_url = await artifact_manager.put_file(prefix="collections/dataset-gallery/example-dataset", file_path="data.csv", options={"download_weight": 1.0})
```

---
Expand All @@ -339,14 +312,17 @@ await artifact_manager.remove_file(prefix="collections/dataset-gallery/example-d

---

### `get_file(prefix: str, path: str, options: dict = None) -> str`

Generates a pre-signed URL to download a file from the artifact.

**Parameters:**

- `prefix`: The path of the artifact (e.g., `"collections/dataset-gallery/example-dataset"`).
- `path`: The relative path of the file to download (e.g., `"data.csv"`).
- `options`: Optional. Additional options for the file download. Default is `None`. The options can include:
  - `silent`: A boolean flag to suppress download statistics. Default is `False`.

**Returns:** A pre-signed URL for downloading the file.

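**Example** (a minimal sketch reusing the example dataset prefix from above; `silent` is set so the request does not count toward download statistics):

```python
# Generate a pre-signed download URL without affecting the download count
get_url = await artifact_manager.get_file(prefix="collections/dataset-gallery/example-dataset", path="data.csv", options={"silent": True})
```
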
@@ -460,6 +436,20 @@ print("Datasets in the gallery:", datasets)



### Resetting Download Statistics

You can reset the download statistics of a dataset using the `reset_stats` function.

```python
await artifact_manager.reset_stats(prefix="collections/dataset-gallery/example-dataset")
print("Download statistics reset.")
```

## HTTP API for Accessing Artifacts and Download Counts

The `Artifact Manager` provides an HTTP endpoint for retrieving artifact manifests, data, and download statistics. This is useful for public-facing web applications that need to access datasets, models, or applications.

### Endpoint: `/{workspace}/artifact/{path:path}`

- **Workspace**: The workspace in which the artifact is stored.
@@ -472,17 +462,18 @@ The `Artifact Manager` provides an HTTP endpoint for retrieving artifact manifes
- **Method**: `GET`
- **Parameters**:
- `workspace`: The workspace in which the artifact is stored.
- `path`: The path to the artifact (e.g., `public/collections/dataset-gallery/example-dataset`).
- `stage` (optional): A boolean flag to indicate whether to fetch the staged version of the manifest (`_manifest.yaml`). Default is `False`.

### Response:

- **For public artifacts**: Returns the artifact manifest if it exists under the `public/` prefix, including any download statistics.
- **For private artifacts**: Returns the artifact manifest if the user has the necessary permissions.

### Example: Fetching a public artifact with download statistics

```python
import requests
@@ -493,17 +484,9 @@ response = requests.get(f"{SERVER_URL}/{workspace}/artifact/public/collections/d
if response.ok:
artifact = response.json()
print(artifact["name"]) # Output: Example Dataset
print(artifact["_stats"]["download_count"]) # Output: Download count for the dataset
else:
print(f"Error: {response.status_code}")
```
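
To fetch the staged version of an artifact instead, the `stage` flag described above can be supplied. A hedged sketch, assuming `stage` is passed as a query parameter and reusing `SERVER_URL` and `workspace` from the example above:

```python
# Request the staged manifest (_manifest.yaml); the query-parameter form is an assumption
response = requests.get(
    f"{SERVER_URL}/{workspace}/artifact/public/collections/dataset-gallery/example-dataset",
    params={"stage": "true"},
)
if response.ok:
    staged_manifest = response.json()
```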

### Example: Fetching a private artifact

```python
response = requests.get(f"{SERVER_URL}/{workspace}/artifact/collections/private-dataset-gallery/private-example-dataset")
if response.ok:
artifact = response.json()
print(artifact["name"]) # Output: Private Example Dataset
else:
print(f"Error: {response.status_code}")
```
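
For private artifacts, the request must be authenticated. A hedged sketch, assuming a workspace-specific token (for example, one generated via the `login` function described in the change log) is sent in the `Authorization` header using the `Bearer` scheme:

```python
token = "..."  # a workspace-specific token; how it is obtained depends on your login flow
response = requests.get(
    f"{SERVER_URL}/{workspace}/artifact/collections/private-dataset-gallery/private-example-dataset",
    headers={"Authorization": f"Bearer {token}"},  # the Bearer prefix is an assumption
)
if response.ok:
    print(response.json()["name"])  # Output: Private Example Dataset
```
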
2 changes: 1 addition & 1 deletion hypha/VERSION
@@ -1,3 +1,3 @@
{
"version": "0.20.37.post4"
"version": "0.20.38"
}