Fix zip supports
oeway committed Nov 25, 2024
1 parent 82d9a3d commit 9a010d6
Showing 6 changed files with 728 additions and 357 deletions.
233 changes: 195 additions & 38 deletions docs/artifact-manager.md
@@ -706,70 +706,227 @@ datasets = await artifact_manager.list(collection.id)
print("Datasets in the gallery:", datasets)
```


## HTTP API for Accessing Artifacts and Download Counts

The `Artifact Manager` provides an HTTP endpoint for retrieving artifact manifests, data, and download statistics. This is useful for public-facing web applications that need to access datasets, models, or applications.
The `Artifact Manager` provides an HTTP API for retrieving artifact manifests, data, file statistics, and managing zip files. These endpoints are designed for public-facing web applications that need to interact with datasets, models, or applications.

### Endpoints:
---
### Artifact Metadata and File Access Endpoints

- `/{workspace}/artifacts/{artifact_alias}` for fetching the artifact manifest.
- `/{workspace}/artifacts/{artifact_alias}/children` for listing all artifacts in a collection.
- `/{workspace}/artifacts/{artifact_alias}/files` for listing all files in the artifact.
- `/{workspace}/artifacts/{artifact_alias}/files/{file_path:path}` for downloading a file from the artifact (will be redirected to a pre-signed URL).
#### Endpoints:

- `/{workspace}/artifacts/{artifact_alias}`: Fetch the artifact manifest.
- `/{workspace}/artifacts/{artifact_alias}/children`: List all artifacts in a collection.
- `/{workspace}/artifacts/{artifact_alias}/files`: List all files in the artifact.
- `/{workspace}/artifacts/{artifact_alias}/files/{file_path:path}`: Download a file from the artifact (redirects to a pre-signed URL).

### Request Format:
#### Request Format:

- **Method**: `GET`
- **Headers**:
  - `Authorization`: Optional. The user's token for accessing private artifacts (obtained via the login logic or created by `api.generate_token()`). Not required for public artifacts.

### Path Parameters:
- **Headers**:
  - `Authorization`: Optional. The user's token for accessing private artifacts (obtained via login logic or created by `api.generate_token()`). Not required for public artifacts.

The path parameters are used to specify the artifact or file to access. The following parameters are supported:
#### Path Parameters:

- **workspace**: The workspace in which the artifact is stored.
- **artifact_alias**: The alias or id of the artifact to access. This can be an artifact id generated by `create` or `edit` function, or it can be an alias of the artifact under the current workspace. Note that this artifact_alias should not contain the workspace.
- **file_path**: Optional. The relative path to a file within the artifact; only required when downloading a file.
- **artifact_alias**: The alias or ID of the artifact to access. This can be an artifact ID generated by the `create` or `edit` functions, or an alias of the artifact under the current workspace.
- **file_path**: (Optional) The relative path to a file within the artifact.

#### Response Examples:

- **Artifact Manifest**:
```json
{
  "manifest": {
    "name": "Example Dataset",
    "description": "A dataset for testing.",
    "version": "1.0.0"
  },
  "view_count": 150,
  "download_count": 25
}
```

- **Files in Artifact**:
```json
[
{"name": "example.txt", "type": "file"},
{"name": "nested", "type": "directory"}
]
```

- **Download File**: A redirect to a pre-signed URL for the file.
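
The endpoint paths above can be assembled with a small helper. The following is a minimal sketch using only the standard library; `artifact_url` is a hypothetical helper name (not part of Hypha), and the server URL, workspace, and alias are the placeholder values used in the other examples of this document.

```python
from urllib.parse import quote


def artifact_url(server_url, workspace, artifact_alias, suffix="", file_path=None):
    """Build a URL for one of the artifact endpoints listed above.

    Hypothetical helper for illustration only.
    suffix: "" (manifest), "children", or "files".
    file_path: relative path of a file inside the artifact (use with suffix="files").
    """
    url = f"{server_url.rstrip('/')}/{quote(workspace)}/artifacts/{quote(artifact_alias)}"
    if suffix:
        url += f"/{suffix}"
    if file_path is not None:
        # Keep "/" so nested paths stay intact; percent-encode everything else.
        url += "/" + quote(file_path, safe="/")
    return url


SERVER_URL = "https://hypha.aicell.io"
manifest_url = artifact_url(SERVER_URL, "my-workspace", "example-dataset")
files_url = artifact_url(SERVER_URL, "my-workspace", "example-dataset", "files")
download_url = artifact_url(
    SERVER_URL, "my-workspace", "example-dataset", "files", "nested/example2.txt"
)
print(manifest_url)
print(files_url)
print(download_url)
```

Any of these URLs can be passed to `requests.get`. Because the file endpoint answers with a redirect to a pre-signed URL, the client has to follow redirects, which `requests` does by default for `GET`.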

### Query Parameters:

Query parameters are passed after the `?` in the URL and are used to control the behavior of the API. The following query parameters are supported:
---

- **stage**: A boolean flag to fetch the staged version of the manifest. Default is `False`.
- **silent**: A boolean flag to suppress the view count increment. Default is `False`.
### Dynamic Zip File Creation Endpoint

- **keywords**: A list of search terms used for fuzzy searching across all manifest fields, separated by commas.
- **filters**: A dictionary of filters to apply to the search, in the format of a JSON string.
- **mode**: The mode for combining multiple conditions. Default is `AND`.
- **offset**: The number of artifacts to skip before listing results. Default is `0`.
- **limit**: The maximum number of artifacts to return. Default is `100`.
- **order_by**: The field used to order results. Default is ascending by id.
- **silent**: A boolean flag to prevent incrementing the view count for the parent artifact when listing children, listing files, or reading the artifact. Default is `False`.
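
As a rough illustration of how these options might be encoded, the sketch below joins the keywords with commas and serializes the filters as a JSON string, as described above. `children_query` is a hypothetical helper, the `type` filter key is only an illustrative value, and the assumption is that the resulting query string is appended to the `children` endpoint.

```python
import json
from urllib.parse import parse_qs, urlencode


def children_query(keywords=None, filters=None, mode="AND", offset=0, limit=100):
    """Encode search options as a URL query string (hypothetical helper)."""
    params = {"mode": mode, "offset": offset, "limit": limit}
    if keywords:
        # Keywords are sent as a single comma-separated value.
        params["keywords"] = ",".join(keywords)
    if filters:
        # Filters are sent as a JSON string.
        params["filters"] = json.dumps(filters)
    return urlencode(params)


query = children_query(
    keywords=["cell", "nucleus"],
    filters={"type": "dataset"},  # illustrative filter, not a documented field
    limit=10,
)
print(query)

# Decoding the query string recovers the original values.
decoded = parse_qs(query)
print(decoded["keywords"][0].split(","))
print(json.loads(decoded["filters"][0]))
```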
#### Endpoint:

### Response:
- `/{workspace}/artifacts/{artifact_alias}/create-zip-file`: Stream a dynamically created zip file containing selected or all files in the artifact.

For `/{workspace}/artifacts/{artifact_alias}`, the response will be a JSON object representing the artifact manifest. For `/{workspace}/artifacts/{artifact_alias}/files/{file_path:path}`, the response will be a pre-signed URL to download the file. The artifact manifest will also include metadata such as the download statistics, e.g. `view_count` and `download_count`. For private artifacts, make sure the user has the necessary permissions.
#### Request Format:

For `/{workspace}/artifacts/{artifact_alias}/children`, the response will be a list of artifacts in the collection.
- **Method**: `GET`
- **Query Parameters**:
  - **file**: (Optional) A list of files to include in the zip file. If omitted, all files in the artifact are included.
  - **token**: (Optional) User token for private artifact access.
  - **version**: (Optional) The version of the artifact to fetch files from.
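
When a list is passed through `params`, `requests` encodes it as a repeated query key (`file=...&file=...`). The sketch below reproduces that encoding with the standard library; it assumes the server reads the repeated `file` keys as the list of files to include.

```python
from urllib.parse import parse_qs, urlencode

files = ["example.txt", "nested/example2.txt"]

# doseq=True emits one "file=..." pair per list item instead of
# turning the whole list into a single string value.
query = urlencode({"file": files}, doseq=True)
print(query)

# Decoding the query string yields the same list back.
print(parse_qs(query)["file"])
```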

For `/{workspace}/artifacts/{artifact_alias}/files`, the response will be a list of files in the artifact, each file is a dictionary with the `name` and `type` fields.
#### Response:

For `/{workspace}/artifacts/{artifact_alias}/files/{file_path:path}`, the response will be a pre-signed URL to download the file.
- Streams the zip file back to the client.
- **Headers**:
  - `Content-Disposition`: Attachment with the artifact alias as the filename.

### Example: Fetching a public artifact with download statistics
#### Example Usage:

```python
import requests

SERVER_URL = "https://hypha.aicell.io"
workspace = "my-workspace"
response = requests.get(f"{SERVER_URL}/{workspace}/artifacts/example-dataset")
if response.ok:
    artifact = response.json()
    print(artifact["manifest"]["name"])  # Output: Example Dataset
    print(artifact["download_count"])  # Output: Download count for the dataset
artifact_alias = "example-dataset"
files = ["example.txt", "nested/example2.txt"]

response = requests.get(
    f"{SERVER_URL}/{workspace}/artifacts/{artifact_alias}/create-zip-file",
    params={"file": files},
    stream=True,
)
if response.status_code == 200:
    with open("artifact_files.zip", "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    print("Zip file created successfully.")
else:
    print(f"Error: {response.status_code}")
```

---

### Zip File Access Endpoints

These endpoints allow direct access to zip file contents stored in the artifact without requiring the entire zip file to be downloaded or extracted.

#### Endpoints:

1. **`/{workspace}/artifacts/{artifact_alias}/zip-files/{zip_file_path:path}?path=...`**
- Access the contents of a zip file, specifying the path within the zip file using a query parameter (`?path=`).

2. **`/{workspace}/artifacts/{artifact_alias}/zip-files/{zip_file_path:path}/~/{path:path}`**
- Access the contents of a zip file, separating the zip file path and the internal path using `/~/`.
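
Both forms address the same entry inside the zip file. A minimal sketch of building them side by side, using only the standard library; `zip_urls` is a hypothetical helper, and treating an empty inner path as the root directory is an assumption based on the stated default.

```python
from urllib.parse import quote, urlencode


def zip_urls(server_url, workspace, artifact_alias, zip_file_path, inner_path=""):
    """Return (query_form, tilde_form) URLs for a path inside a zip file.

    Hypothetical helper for illustration only.
    """
    base = (
        f"{server_url.rstrip('/')}/{quote(workspace)}/artifacts/"
        f"{quote(artifact_alias)}/zip-files/{quote(zip_file_path, safe='/')}"
    )
    # Form 1: the inner path travels in the `path` query parameter.
    query = urlencode({"path": inner_path})
    query_form = f"{base}?{query}" if inner_path else base
    # Form 2: the inner path follows the `/~/` separator.
    tilde_form = f"{base}/~/{quote(inner_path, safe='/')}"
    return query_form, tilde_form


query_form, tilde_form = zip_urls(
    "https://hypha.aicell.io", "my-workspace", "example-dataset", "example.zip", "nested/"
)
print(query_form)
print(tilde_form)
```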

---

#### Endpoint 1: `/{workspace}/artifacts/{artifact_alias}/zip-files/{zip_file_path:path}?path=...`

##### Functionality:

- **If `path` ends with `/`:** Lists the contents of the directory specified by `path` inside the zip file.
- **If `path` specifies a file:** Streams the file content from the zip.

##### Request Format:

- **Method**: `GET`
- **Path Parameters**:
  - **workspace**: The workspace in which the artifact is stored.
  - **artifact_alias**: The alias or ID of the artifact to access.
  - **zip_file_path**: Path to the zip file within the artifact.
- **Query Parameters**:
  - **path**: (Optional) The relative path inside the zip file. Defaults to the root directory.

##### Response Examples:

1. **Listing Directory Contents**:
```json
[
{"type": "file", "name": "example.txt", "size": 123, "last_modified": 1732363845.0},
{"type": "directory", "name": "nested"}
]
```

2. **Fetching a File**:
Streams the file content from the zip.
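
A directory listing in the shape shown above can be post-processed on the client. The sketch below splits the entries into files and directories and converts `last_modified` into a readable timestamp; it assumes `last_modified` is a Unix timestamp in seconds and `size` is in bytes, which the source does not state explicitly. `summarize_listing` is a hypothetical helper.

```python
from datetime import datetime, timezone

# Sample listing copied from the response example above.
listing = [
    {"type": "file", "name": "example.txt", "size": 123, "last_modified": 1732363845.0},
    {"type": "directory", "name": "nested"},
]


def summarize_listing(entries):
    """Group a zip directory listing into files and directories (hypothetical helper)."""
    files = [e for e in entries if e["type"] == "file"]
    directories = [e["name"] for e in entries if e["type"] == "directory"]
    modified = {
        # Assumption: last_modified is seconds since the Unix epoch.
        e["name"]: datetime.fromtimestamp(e["last_modified"], tz=timezone.utc).isoformat()
        for e in files
        if "last_modified" in e
    }
    return {
        "files": [e["name"] for e in files],
        "directories": directories,
        "total_size": sum(e.get("size", 0) for e in files),
        "modified": modified,
    }


summary = summarize_listing(listing)
print(summary)
```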

---

#### Endpoint 2: `/{workspace}/artifacts/{artifact_alias}/zip-files/{zip_file_path:path}/~/{path:path}`

##### Functionality:

- **If `path` ends with `/`:** Lists the contents of the directory specified by `path` inside the zip file.
- **If `path` specifies a file:** Streams the file content from the zip.

##### Request Format:

- **Method**: `GET`
- **Path Parameters**:
  - **workspace**: The workspace in which the artifact is stored.
  - **artifact_alias**: The alias or ID of the artifact to access.
  - **zip_file_path**: Path to the zip file within the artifact.
  - **path**: (Optional) The relative path inside the zip file. Defaults to the root directory.

##### Response Examples:

1. **Listing Directory Contents**:
```json
[
{"type": "file", "name": "example.txt", "size": 123, "last_modified": 1732363845.0},
{"type": "directory", "name": "nested"}
]
```

2. **Fetching a File**:
Streams the file content from the zip.

---

#### Example Usage for Both Endpoints

##### Listing Directory Contents:

```python
import requests

SERVER_URL = "https://hypha.aicell.io"
workspace = "my-workspace"
artifact_alias = "example-dataset"
zip_file_path = "example.zip"

# Using the query parameter method
response = requests.get(
    f"{SERVER_URL}/{workspace}/artifacts/{artifact_alias}/zip-files/{zip_file_path}",
    params={"path": "nested/"},
)
print(response.json())

# Using the tilde method
response = requests.get(
    f"{SERVER_URL}/{workspace}/artifacts/{artifact_alias}/zip-files/{zip_file_path}/~/nested/"
)
print(response.json())
```

##### Fetching a File:

```python
# Reuses `requests`, SERVER_URL, workspace, artifact_alias, and zip_file_path
# from the directory-listing example above.

# Using the query parameter method
response = requests.get(
    f"{SERVER_URL}/{workspace}/artifacts/{artifact_alias}/zip-files/{zip_file_path}",
    params={"path": "nested/example2.txt"},
    stream=True,
)
with open("example2.txt", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)

# Using the tilde method
response = requests.get(
    f"{SERVER_URL}/{workspace}/artifacts/{artifact_alias}/zip-files/{zip_file_path}/~/nested/example2.txt",
    stream=True,
)
with open("example2.txt", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)
```
2 changes: 2 additions & 0 deletions hypha/apps.py
@@ -380,6 +380,7 @@ async def start(
+ f"&server_url={server_url}"
+ (f"&token={token}" if token else "")
+ (f"&version={version}" if version else "")
+ (f"&use_proxy=true")
)
server_url = self.public_base_url
public_url = (
@@ -389,6 +390,7 @@ async def start(
+ f"&server_url={server_url}"
+ (f"&token={token}" if token else "")
+ (f"&version={version}" if version else "")
+ (f"&use_proxy=true")
)

runner = random.choice(self._runner)