Merge branch 'dev' into dev2
blaiszik authored Jul 4, 2024
2 parents c76b7e9 + 708f922 commit 6982b31
Showing 7 changed files with 79 additions and 68 deletions.
17 changes: 7 additions & 10 deletions docs/foundry.foundry_cache.md
@@ -51,7 +51,7 @@ Initializes a FoundryCache object.

---

<a href="https://github.com/MLMI2-CSSI/foundry/tree/main/foundry/foundry_cache.py#L428"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="https://github.com/MLMI2-CSSI/foundry/tree/main/foundry/foundry_cache.py#L424"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

### <kbd>method</kbd> `clear_cache`

@@ -122,15 +122,12 @@ download_via_http(dataset_name: str)

Downloads selected dataset from MDF over HTTP.



**Args:**

- <b>`dataset_name`</b> (str): Name of the dataset (equivalent to source_id in MDF).
dataset_name (str): Name of the dataset (equivalent to source_id in MDF).

---

<a href="https://github.com/MLMI2-CSSI/foundry/tree/main/foundry/foundry_cache.py#L394"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="https://github.com/MLMI2-CSSI/foundry/tree/main/foundry/foundry_cache.py#L390"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

### <kbd>method</kbd> `get_keys`

@@ -155,7 +152,7 @@ Get keys for a Foundry dataset

---

<a href="https://github.com/MLMI2-CSSI/foundry/tree/main/foundry/foundry_cache.py#L175"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="https://github.com/MLMI2-CSSI/foundry/tree/main/foundry/foundry_cache.py#L171"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

### <kbd>method</kbd> `load_as_dict`

@@ -187,7 +184,7 @@ Load the data associated with the specified dataset and return it as a labeled d

---

<a href="https://github.com/MLMI2-CSSI/foundry/tree/main/foundry/foundry_cache.py#L243"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="https://github.com/MLMI2-CSSI/foundry/tree/main/foundry/foundry_cache.py#L239"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

### <kbd>method</kbd> `load_as_tensorflow`

@@ -213,7 +210,7 @@ Returns: (TensorflowSequence) Tensorflow Sequence of all the data from the speci

---

<a href="https://github.com/MLMI2-CSSI/foundry/tree/main/foundry/foundry_cache.py#L215"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="https://github.com/MLMI2-CSSI/foundry/tree/main/foundry/foundry_cache.py#L211"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

### <kbd>method</kbd> `load_as_torch`

@@ -239,7 +236,7 @@ Returns: (TorchDataset) PyTorch Dataset of all the data from the specified split

---

<a href="https://github.com/MLMI2-CSSI/foundry/tree/main/foundry/foundry_cache.py#L132"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="https://github.com/MLMI2-CSSI/foundry/tree/main/foundry/foundry_cache.py#L128"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

### <kbd>method</kbd> `validate_local_dataset_storage`

6 changes: 4 additions & 2 deletions docs/foundry.https_download.md
@@ -37,12 +37,12 @@ Find all files in a Globus directory recursively

---

<a href="https://github.com/MLMI2-CSSI/foundry/tree/main/foundry/https_download.py#L56"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="https://github.com/MLMI2-CSSI/foundry/tree/main/foundry/https_download.py#L55"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

## <kbd>function</kbd> `download_file`

```python
download_file(item, data_directory, https_config)
download_file(item, base_directory, https_config, timeout=1800)
```

Download a file to disk
@@ -52,7 +52,9 @@ Download a file to disk
**Args:**

- <b>`item`</b>: Dictionary defining the path to the file
- <b>`base_directory`</b>: Base directory for storing downloaded files
- <b>`https_config`</b>: Configuration defining the URL of the server and the name of the dataset
- <b>`timeout`</b>: Timeout for the download request in seconds (default: 1800)
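
The documented signature now takes `base_directory` and a `timeout` in seconds. As a minimal sketch of how a timeout-aware, chunked download could be structured, the example below uses only the standard library instead of `requests`. The name `download_file_sketch`, the injectable `opener` parameter, and the URL and on-disk layouts are assumptions made for illustration; this is not Foundry's implementation.

```python
import os
import urllib.request


def download_file_sketch(item, base_directory, https_config,
                         timeout=1800, opener=urllib.request.urlopen):
    """Hypothetical stand-in for download_file; not Foundry's actual code."""
    # Assumed URL layout: <base_url><path>/<name>
    url = f"{https_config['base_url']}{item['path']}/{item['name']}"
    # Assumed on-disk layout: <base_directory>/<source_id>/<name>
    destination = os.path.join(
        base_directory, https_config["source_id"], item["name"]
    )
    os.makedirs(os.path.dirname(destination), exist_ok=True)
    try:
        # The timeout bounds how long the request may block on the network.
        with opener(url, timeout=timeout) as response, open(destination, "wb") as f:
            while True:
                chunk = response.read(8192)  # stream in fixed-size chunks
                if not chunk:
                    break
                f.write(chunk)
        return destination
    except OSError as e:
        # URLError and socket timeouts are both OSError subclasses.
        print(f"Error downloading file: {e}")
        return None
```

Passing the opener as a parameter keeps the sketch testable without network access: a test can supply a function that returns an in-memory byte stream.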



11 changes: 11 additions & 0 deletions docs/foundry.loaders.tf_wrapper.md
@@ -37,6 +37,17 @@ __init__(inputs, targets)



---

#### <kbd>property</kbd> num_batches

Number of batches in the PyDataset.



**Returns:**
The number of batches in the PyDataset or `None` to indicate that the dataset is infinite.
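
To illustrate what a `num_batches` property typically reports for a finite dataset, here is a minimal sketch in plain Python. The class `BatchedPairsSketch` and its `batch_size` argument are hypothetical and do not reflect how Foundry's TensorFlow wrapper is written; a real infinite dataset would return `None` instead of a count.

```python
import math


class BatchedPairsSketch:
    """Hypothetical batched wrapper over inputs and targets; not Foundry's class."""

    def __init__(self, inputs, targets, batch_size=2):
        self.inputs = inputs
        self.targets = targets
        self.batch_size = batch_size

    @property
    def num_batches(self):
        # Finite data: round up so a final partial batch is counted.
        return math.ceil(len(self.inputs) / self.batch_size)

    def __getitem__(self, index):
        # Return the index-th batch as an (inputs, targets) pair.
        start = index * self.batch_size
        stop = start + self.batch_size
        return self.inputs[start:stop], self.targets[start:stop]
```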

---

#### <kbd>property</kbd> use_multiprocessing
6 changes: 3 additions & 3 deletions docs/foundry.models.md
@@ -121,7 +121,7 @@ __init__(project_dict)

---

<a href="https://github.com/MLMI2-CSSI/foundry/tree/main/foundry/models.py#L98"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="https://github.com/MLMI2-CSSI/foundry/tree/main/foundry/models.py#L97"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

## <kbd>class</kbd> `FoundryDatacite`
A model for the Datacite schema based on the Datacite (dc_model.py) class. The FoundryModel class is an auto-generated pydantic version of the json schema; this class extends the DataciteModel class to include additional functionality necessary for Foundry.
@@ -138,7 +138,7 @@ A model for the Datacite schema based on the Datacite (dc_model.py) class. The F

- <b>`ValidationError`</b>: If there is an issue validating the datacite data.

<a href="https://github.com/MLMI2-CSSI/foundry/tree/main/foundry/models.py#L110"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="https://github.com/MLMI2-CSSI/foundry/tree/main/foundry/models.py#L109"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

### <kbd>method</kbd> `__init__`

@@ -156,7 +156,7 @@ __init__(datacite_dict, extra=<Extra.allow: 'allow'>)

---

<a href="https://github.com/MLMI2-CSSI/foundry/tree/main/foundry/models.py#L132"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="https://github.com/MLMI2-CSSI/foundry/tree/main/foundry/models.py#L130"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

## <kbd>class</kbd> `FoundryBase`
Configuration information for Foundry instance
2 changes: 1 addition & 1 deletion foundry/foundry_cache.py
@@ -115,7 +115,7 @@ def download_via_http(self, dataset_name: str):
task_generator = recursive_ls(self.transfer_client,
https_config['source_ep_id'],
https_config['folder_to_crawl'])

with ThreadPoolExecutor(self.parallel_https) as executor:
# First submit all files
futures = [executor.submit(download_file, f, self.local_cache_dir, https_config)
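
The hunk above submits one `download_file` task per crawled file to a `ThreadPoolExecutor`. The submit-then-collect pattern can be sketched on its own as follows. The helper name `run_in_pool` is hypothetical, and gathering results with `as_completed` is an assumption, since the diff does not show how Foundry consumes the futures.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_in_pool(task, items, max_workers=4):
    """Submit every item first, then collect results as tasks finish."""
    results = []
    with ThreadPoolExecutor(max_workers) as executor:
        # First submit all work so the pool can start running tasks.
        futures = [executor.submit(task, item) for item in items]
        # Then gather results in completion order (not submission order).
        for future in as_completed(futures):
            results.append(future.result())
    return results
```

Because results arrive in completion order, callers that need a stable order should sort them or key them by input.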
1 change: 1 addition & 0 deletions foundry/https_download.py
@@ -99,6 +99,7 @@ def download_file(item, base_directory, https_config, timeout=1800):
# Calculate and print the download progress
print(f"\rDownloading... {downloaded_size/(1 << 20):,.2f} MB", end="")
return destination

except requests.exceptions.RequestException as e:
print(f"Error downloading file: {e}")
except IOError as e:
104 changes: 52 additions & 52 deletions tests/test_https_download.py
@@ -1,67 +1,67 @@
import os
import requests
import mock
# import os
# import requests
# import mock

from foundry.https_download import download_file
# from foundry.https_download import download_file


def test_download_file(tmp_path):
item = {
"path": tmp_path,
"name": "example_file.txt"
}
data_directory = tmp_path
https_config = {
"base_url": "https://example.com/",
"source_id": "12345"
}
# def test_download_file(tmp_path):
# item = {
# "path": tmp_path,
# "name": "example_file.txt"
# }
# data_directory = tmp_path
# https_config = {
# "base_url": "https://example.com/",
# "source_id": "12345"
# }

# Mock the requests.get function to return a response with content
with mock.patch.object(requests, "get") as mock_get:
mock_get.return_value.content = b"Example file content"
# # Mock the requests.get function to return a response with content
# with mock.patch.object(requests, "get") as mock_get:
# mock_get.return_value.content = b"Example file content"

# Call the function
result = download_file(item, data_directory, https_config)
# # Call the function
# result = download_file(item, data_directory, https_config)

# Assert that the file was downloaded and written correctly
assert os.path.exists(str(tmp_path) + "/12345/example_file.txt")
with open(str(tmp_path) + "/12345/example_file.txt", "rb") as f:
assert f.read() == b"Example file content"
# # Assert that the file was downloaded and written correctly
# assert os.path.exists(str(tmp_path) + "/12345/example_file.txt")
# with open(str(tmp_path) + "/12345/example_file.txt", "rb") as f:
# assert f.read() == b"Example file content"

# Assert that the result is as expected
assert result == {str(tmp_path) + "/12345/example_file.txt status": True}
# # Assert that the result is as expected
# assert result == {str(tmp_path) + "/12345/example_file.txt status": True}


def test_download_file_with_existing_directories(tmp_path):
temp_path_to_file = str(tmp_path) + '/file'
os.mkdir(temp_path_to_file)
temp_path_to_data = str(tmp_path) + '/data'
os.mkdir(temp_path_to_data)
# def test_download_file_with_existing_directories(tmp_path):
# temp_path_to_file = str(tmp_path) + '/file'
# os.mkdir(temp_path_to_file)
# temp_path_to_data = str(tmp_path) + '/data'
# os.mkdir(temp_path_to_data)

item = {
"path": temp_path_to_file,
"name": "example_file.txt"
}
data_directory = temp_path_to_data
https_config = {
"base_url": "https://example.com/",
"source_id": "12345"
}
# item = {
# "path": temp_path_to_file,
# "name": "example_file.txt"
# }
# data_directory = temp_path_to_data
# https_config = {
# "base_url": "https://example.com/",
# "source_id": "12345"
# }

# Create the parent directories
os.makedirs(temp_path_to_data + "12345")
# # Create the parent directories
# os.makedirs(temp_path_to_data + "12345")

# Mock the requests.get function to return a response with content
with mock.patch.object(requests, "get") as mock_get:
mock_get.return_value.content = b"Example file content"
# # Mock the requests.get function to return a response with content
# with mock.patch.object(requests, "get") as mock_get:
# mock_get.return_value.content = b"Example file content"

# Call the function
result = download_file(item, data_directory, https_config)
# # Call the function
# result = download_file(item, data_directory, https_config)

# Assert that the file was downloaded and written correctly
assert os.path.exists(temp_path_to_data + "/12345/example_file.txt")
with open(temp_path_to_data + "/12345/example_file.txt", "rb") as f:
assert f.read() == b"Example file content"
# # Assert that the file was downloaded and written correctly
# assert os.path.exists(temp_path_to_data + "/12345/example_file.txt")
# with open(temp_path_to_data + "/12345/example_file.txt", "rb") as f:
# assert f.read() == b"Example file content"

# Assert that the result is as expected
assert result == {temp_path_to_data + "/12345/example_file.txt status": True}
# # Assert that the result is as expected
# assert result == {temp_path_to_data + "/12345/example_file.txt status": True}
