Log event to sql database #696

Merged
merged 36 commits into from
Oct 10, 2024
c17958e
support _token and remove prefix in login
oeway Sep 28, 2024
dc66b2e
allow pass workspace when login
oeway Sep 28, 2024
efc4517
Support current workspace for http endpoint
oeway Sep 28, 2024
2e87f31
fix login optional
oeway Sep 28, 2024
e1a691c
increase page size for list workspaces
oeway Sep 28, 2024
e9111a1
Bump version for hypha-rpc 0.20.38
oeway Sep 28, 2024
b4458a5
Update change logs and login instructions
oeway Sep 28, 2024
a3dba93
Support artifact endpoint via http
oeway Sep 29, 2024
1d71bc8
Implement sql database
oeway Sep 30, 2024
16f19b4
change _id to _prefix
oeway Sep 30, 2024
20901b9
clean up
oeway Sep 30, 2024
710eeb2
redirect login
oeway Sep 30, 2024
be07e79
use sql to store workspace info
oeway Oct 1, 2024
b1f894c
add stage_files
oeway Oct 6, 2024
e70b741
Update helm charts
oeway Oct 6, 2024
4cc49e0
Fix workspaces db
oeway Oct 7, 2024
24a4b7f
skip default database uri
oeway Oct 7, 2024
5e455ae
Use in-memory sql for artifacts
oeway Oct 7, 2024
f490b0b
Fix workspace loading error
oeway Oct 7, 2024
d20f619
restore workspace info
oeway Oct 7, 2024
6a89e0f
rename it to test-3
oeway Oct 7, 2024
7c4db80
restore version
oeway Oct 7, 2024
6a5c617
add logging service
oeway Oct 7, 2024
9efd1dc
Merge remote-tracking branch 'origin/main' into log-event
oeway Oct 7, 2024
2de43e3
Fix artifacts
oeway Oct 10, 2024
e379b74
Merge event log to workspace
oeway Oct 10, 2024
3fad225
make sure error is raised
oeway Oct 10, 2024
d4af37c
support observability
oeway Oct 10, 2024
a5e408d
Add tests
oeway Oct 10, 2024
782506a
test observability
oeway Oct 10, 2024
9debe04
Fix counter duplicated error
oeway Oct 10, 2024
c08ba6e
add change log
oeway Oct 10, 2024
66882d7
Support download statistics
oeway Oct 10, 2024
d75dc6e
Update docs
oeway Oct 10, 2024
af4f310
Update change log
oeway Oct 10, 2024
fee518b
Remove set logging service
oeway Oct 10, 2024
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,8 +1,13 @@
# Hypha Change Log

### 0.20.38

- Support event logging in the workspace: use `log_event` to log events and `get_events` to retrieve them. Events are persisted in the SQL database.
- Allow passing `workspace` and `expires_in` to the `login` function to generate a workspace-specific token.
- When accessing a service through the HTTP endpoint, you can now pass a workspace-specific token in the `Authorization` HTTP header. (Previously, all services were assumed to be accessed from the same workspace as the service provider.)
- Breaking Change: Remove `info`, `warning`, `error`, `critical`, and `debug` from the `hypha` module; use `log` or `log_event` instead.
- Support basic observability for the workspace, including workspace status, event bus and websocket connection status.
- Support download statistics for the artifacts in the artifact manager.
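
To make the event-logging entry above more concrete, here is a minimal sketch of persisting workspace events in a SQL table, using Python's built-in `sqlite3`. It is not Hypha's implementation: the table layout, the `open_event_log` helper, and the argument names of `log_event` and `get_events` are assumptions chosen only to mirror the names in this change log.

```python
import json
import sqlite3
import time


def open_event_log(path=":memory:"):
    """Open (or create) a hypothetical event table; the schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "workspace TEXT NOT NULL, "
        "event_type TEXT NOT NULL, "
        "data TEXT, "
        "timestamp REAL NOT NULL)"
    )
    return conn


def log_event(conn, workspace, event_type, data=None):
    """Store one event; the payload is serialized as JSON text."""
    conn.execute(
        "INSERT INTO events (workspace, event_type, data, timestamp) "
        "VALUES (?, ?, ?, ?)",
        (workspace, event_type, json.dumps(data), time.time()),
    )
    conn.commit()


def get_events(conn, workspace, event_type=None):
    """Return the events of one workspace in insertion order, optionally filtered by type."""
    query = "SELECT event_type, data FROM events WHERE workspace = ?"
    params = [workspace]
    if event_type is not None:
        query += " AND event_type = ?"
        params.append(event_type)
    query += " ORDER BY id"
    rows = conn.execute(query, params).fetchall()
    return [{"event_type": t, "data": json.loads(d)} for t, d in rows]
```

Scoping every query by `workspace` keeps events from different workspaces separate even though they share one table.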

### 0.20.37
- Add s3-proxy to allow accessing s3 presigned url in case the s3 server is not directly accessible. Use `--enable-s3-proxy` to enable the s3 proxy when starting Hypha.
107 changes: 45 additions & 62 deletions docs/artifact-manager.md
@@ -1,17 +1,18 @@
# Artifact Manager

The `Artifact Manager` is a built-in Hypha service for indexing, managing, and storing resources such as datasets, AI models, and applications. It provides a structured way to manage datasets and similar resources, enabling efficient listing, uploading, updating, and deleting of files.
The `Artifact Manager` is a built-in Hypha service for indexing, managing, and storing resources such as datasets, AI models, and applications. It provides a structured way to manage datasets and similar resources, enabling efficient listing, uploading, updating, and deleting of files. It also now supports tracking download statistics for each artifact.

A typical use case for the `Artifact Manager` is as a backend for a single-page web application that displays a gallery of datasets, AI models, applications, or other types of resources. The default metadata of an artifact is designed to render a grid of cards on a webpage.
A typical use case for the `Artifact Manager` is as a backend for a single-page web application that displays a gallery of datasets, AI models, applications, or other types of resources. The default metadata of an artifact is designed to render a grid of cards on a webpage. It also supports tracking download statistics.

**Note:** The `Artifact Manager` is only available when your Hypha server has S3 storage enabled.

---

## Getting Started

### Step 1: Connecting to the Artifact Manager Service

To use the `Artifact Manager`, you first need to connect to the Hypha server. This API allows you to create, read, edit, and delete datasets in the artifact registry (stored in a S3 bucket for each workspace).
To use the `Artifact Manager`, you first need to connect to the Hypha server. This API allows you to create, read, edit, and delete datasets in the artifact registry (stored in an S3 bucket for each workspace).

```python
from hypha_rpc.websocket_client import connect_to_server
@@ -25,7 +26,7 @@ artifact_manager = await server.get_service("public/artifact-manager")

### Step 2: Creating a Dataset Gallery Collection

Once connected, you can create a collection to organize datasets in the gallery.
Once connected, you can create a collection to organize datasets in the gallery.

```python
# Create a collection for the Dataset Gallery
@@ -59,13 +60,15 @@ await artifact_manager.create(prefix="collections/dataset-gallery/example-datase
print("Dataset added to the gallery.")
```

### Step 4: Uploading Files to the Dataset
### Step 4: Uploading Files to the Dataset with Download Statistics

Once you have created a dataset, you can upload files to it by generating a pre-signed URL. This URL allows you to upload the actual files to the artifact's S3 bucket.

Once you have created a dataset, you can upload files to it by generating a pre-signed URL.
Additionally, when uploading files to an artifact, you can specify a `download_weight` for each file. This weight determines how the file impacts the artifact's download count when it is accessed. For example, primary files might have a higher `download_weight`, while secondary files might have no impact. The download count is automatically updated whenever users download files from the artifact.
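
The weighting described above can be modelled roughly as follows. This is a sketch of the idea only, assuming that each download adds the file's `download_weight` to the artifact's count and that files without a weight contribute `0`; the `DownloadCounter` class and its methods are hypothetical and not part of the `Artifact Manager` API.

```python
class DownloadCounter:
    """Toy model of a weighted download count for one artifact (hypothetical)."""

    def __init__(self):
        self.weights = {}          # file_path -> download_weight
        self.download_count = 0.0  # accumulated, weighted count

    def put_file(self, file_path, download_weight=0.0):
        """Register a file and the weight it contributes per download."""
        if download_weight < 0:
            raise ValueError("download_weight must be non-negative")
        self.weights[file_path] = download_weight

    def record_download(self, file_path):
        """Add the file's weight to the count; unknown files count as 0."""
        self.download_count += self.weights.get(file_path, 0.0)
        return self.download_count
```

Under this model, a dataset split into two primary files with weight `0.5` each counts as one download once both files have been fetched, while a README with weight `0` never changes the count.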

```python
# Get a pre-signed URL to upload a file
put_url = await artifact_manager.put_file(prefix="collections/dataset-gallery/example-dataset", file_path="data.csv")
# Get a pre-signed URL to upload a file, with a download_weight assigned
put_url = await artifact_manager.put_file(prefix="collections/dataset-gallery/example-dataset", file_path="data.csv", options={"download_weight": 0.5})

# Upload the file using an HTTP PUT request
with open("path/to/local/data.csv", "rb") as f:
@@ -99,7 +102,7 @@ print("Datasets in the gallery:", datasets)

## Full Example: Creating and Managing a Dataset Gallery

Here’s a full example that shows how to connect to the service, create a dataset gallery, add a dataset, upload files, and commit the dataset.
Here’s a full example that shows how to connect to the service, create a dataset gallery, add a dataset, upload files with download statistics, and commit the dataset.

```python
import asyncio
@@ -135,8 +138,8 @@ async def main():
    await artifact_manager.create(prefix="collections/dataset-gallery/example-dataset", manifest=dataset_manifest, stage=True)
    print("Dataset added to the gallery.")

    # Get a pre-signed URL to upload a file
    put_url = await artifact_manager.put_file(prefix="collections/dataset-gallery/example-dataset", file_path="data.csv")
    # Get a pre-signed URL to upload a file, with a download_weight assigned
    put_url = await artifact_manager.put_file(prefix="collections/dataset-gallery/example-dataset", file_path="data.csv", options={"download_weight": 0.5})

    # Upload the file using an HTTP PUT request
    with open("path/to/local/data.csv", "rb") as f:
@@ -217,43 +220,10 @@ await artifact_manager.commit(prefix="collections/schema-dataset-gallery/valid-d
print("Valid dataset committed.")
```

### Step 3: Accessing the collection via HTTP API

You can access the collection via the HTTP API to retrieve the schema and datasets.
This can be used for rendering a gallery of datasets on a webpage.

```javascript
// Fetch the schema for the collection
fetch("https://hypha.aicell.io/my-workspace/artifact/public/collections/schema-dataset-gallery")
.then(response => response.json())
.then(data => console.log("Schema:", data.collection_schema));
```

## API Reference

This section details the core functions provided by the `Artifact Manager` for creating, managing, and validating artifacts such as datasets and collections.

### `create(prefix: str, manifest: dict, overwrite: bool = False, stage: bool = False) -> dict`

Creates a new artifact or collection with the provided manifest. If the artifact already exists, you must set `overwrite=True` to overwrite it.

**Parameters:**

- `prefix`: The path where the artifact or collection will be created (e.g., `"collections/dataset-gallery"`).
- `manifest`: The manifest describing the artifact (must include fields like `id`, `name`, and `type`).
- `overwrite`: Optional. If `True`, it will overwrite an existing artifact. Default is `False`.
- `stage`: Optional. If `True`, it will put the artifact into staging mode. Default is `False`.

**Returns:** The created manifest as a dictionary.

**Example:**

```python
await artifact_manager.create(prefix="collections/dataset-gallery", manifest=gallery_manifest)
```

---

## API References

### `edit(prefix: str, manifest: dict) -> None`

Edits an existing artifact. You provide the new manifest to update the artifact. The updated manifest is stored temporarily as `_manifest.yaml`.
@@ -303,21 +273,24 @@ await artifact_manager.delete(prefix="collections/dataset-gallery/example-datase

---

### `put_file(prefix: str, file_path: str) -> str`
### `put_file(prefix: str, file_path: str, options: dict = None) -> str`

Generates a pre-signed URL to upload a file to an artifact. You can then use the URL with an HTTP `PUT` request to upload the file.

**Parameters:**

- `prefix`: The path of the artifact where the file will be uploaded (e.g., `"collections/dataset-gallery/example-dataset"`).
- `file_path`: The relative path of the file to upload within the artifact (e.g., `"data.csv"`).
- `options`: Optional. Additional options for the file upload. Default is `None`.
The options can include:
- `download_weight`: A float value representing the impact of the file on the download count. Default is `0`.

**Returns:** A pre-signed URL for uploading the file.

**Example:**

```python
put_url = await artifact_manager.put_file(prefix="collections/dataset-gallery/example-dataset", file_path="data.csv")
put_url = await artifact_manager.put_file(prefix="collections/dataset-gallery/example-dataset", file_path="data.csv", options={"download_weight": 1.0})
```

---
@@ -339,14 +312,17 @@ await artifact_manager.remove_file(prefix="collections/dataset-gallery/example-d

---

### `get_file(prefix: str, path: str) -> str`
### `get_file(prefix: str, path: str, options: dict = None) -> str`

Generates a pre-signed URL to download a file from the artifact.

**Parameters:**

- `prefix`: The path of the artifact (e.g., `"collections/dataset-gallery/example-dataset"`).
- `path`: The relative path of the file to download (e.g., `"data.csv"`).
- `options`: Optional. Additional options for the file download. Default is `None`.
The options can include:
- `silent`: A boolean flag to suppress download statistics. Default is `False`.

**Returns:** A pre-signed URL for downloading the file.
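
One plausible use of `silent` is to keep automated fetches (previews, health checks, mirrors) out of the statistics. The helper below is only a sketch of that decision, assuming a caller-defined notion of "purpose"; `download_options` and the purpose names are hypothetical, and only the `silent` key comes from the options described above.

```python
# Purposes that should not be counted as downloads (hypothetical names).
UNCOUNTED_PURPOSES = {"preview", "health-check", "mirror"}


def download_options(purpose="user-download"):
    """Build the `options` dict for get_file; silent=True suppresses statistics."""
    return {"silent": purpose in UNCOUNTED_PURPOSES}
```

The resulting dict would then be passed as `options=download_options("preview")` when requesting the download URL.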

@@ -460,6 +436,20 @@ print("Datasets in the gallery:", datasets)

The `Artifact Manager` provides an HTTP endpoint for retrieving artifact manifests and data. This is useful for public-facing web applications that need to access datasets, models, or applications.


### Resetting Download Statistics

You can reset the download statistics of a dataset using the `reset_stats` function.

```python
await artifact_manager.reset_stats(prefix="collections/dataset-gallery/example-dataset")
print("Download statistics reset.")
```

## HTTP API for Accessing Artifacts and Download Counts

The `Artifact Manager` provides an HTTP endpoint for retrieving artifact manifests, data, and download statistics. This is useful for public-facing web applications that need to access datasets, models, or applications.

### Endpoint: `/{workspace}/artifact/{path:path}`

- **Workspace**: The workspace in which the artifact is stored.
@@ -472,17 +462,18 @@ The `Artifact Manager` provides an HTTP endpoint for retrieving artifact manifes
- **Method**: `GET`
- **Parameters**:
- `workspace`: The workspace in which the artifact is stored.
- `path`: The path to the artifact (e.g., `public/collections/dataset-gallery/example-dataset`).
- `path`:

The path to the artifact (e.g., `public/collections/dataset-gallery/example-dataset`).
- `stage` (optional): A boolean flag to indicate whether to fetch the staged version of the manifest (`_manifest.yaml`). Default is `False`.
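
The parameters above map onto the URL pattern `/{workspace}/artifact/{path:path}`. A small helper can assemble such a URL; this is a sketch, and both the `artifact_url` name and the encoding of the boolean as `stage=true` in the query string are assumptions rather than documented behaviour.

```python
from urllib.parse import quote, urlencode


def artifact_url(server_url, workspace, path, stage=False):
    """Build the artifact endpoint URL; slashes inside `path` are preserved."""
    url = f"{server_url.rstrip('/')}/{quote(workspace)}/artifact/{quote(path)}"
    if stage:
        # Assumed encoding of the optional boolean flag.
        url += "?" + urlencode({"stage": "true"})
    return url
```

For example, `artifact_url("https://hypha.aicell.io", "my-workspace", "public/collections/dataset-gallery/example-dataset")` yields a URL that can be passed directly to `requests.get`.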

### Response:

- **For public artifacts**: Returns the artifact manifest if it exists under the `public/` prefix.
- **For public artifacts**: Returns the artifact manifest if it exists under the `public/` prefix, including any download statistics.
- **For private artifacts**: Returns the artifact manifest if the user has the necessary permissions.

### Example:

#### Fetching a public artifact:
### Example: Fetching a public artifact with download statistics

```python
import requests
@@ -493,17 +484,9 @@ response = requests.get(f"{SERVER_URL}/{workspace}/artifact/public/collections/d
if response.ok:
    artifact = response.json()
    print(artifact["name"])  # Output: Example Dataset
    print(artifact["_stats"]["download_count"])  # Output: Download count for the dataset
else:
    print(f"Error: {response.status_code}")
```
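
Indexing `artifact["_stats"]["download_count"]` directly raises `KeyError` if a manifest carries no statistics. A defensive reader might look like the sketch below; whether `_stats` can actually be absent or `null` in a response is an assumption, and `get_download_count` is a hypothetical helper.

```python
def get_download_count(artifact):
    """Return the download count from a manifest dict, or 0 if no stats are present."""
    stats = artifact.get("_stats") or {}
    return stats.get("download_count", 0)
```

This lets a gallery page render a count of `0` for artifacts that have no recorded statistics instead of failing.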

#### Fetching a private artifact:

```python
response = requests.get(f"{SERVER_URL}/{workspace}/artifact/collections/private-dataset-gallery/private-example-dataset")
if response.ok:
    artifact = response.json()
    print(artifact["name"])  # Output: Private Example Dataset
else:
    print(f"Error: {response.status_code}")
```
2 changes: 1 addition & 1 deletion hypha/VERSION
@@ -1,3 +1,3 @@
{
"version": "0.20.37.post4"
"version": "0.20.38"
}