updating to reflect new approach to publishing

MLMI2-CSSI · Mar 18, 2024 · c03fd76
1 parent fbdf3ff
Showing 1 changed file with 74 additions and 24 deletions.
diff --git a/examples/publishing-guides/dataset_publishing.ipynb b/examples/publishing-guides/dataset_publishing.ipynb
@@ -156,7 +156,7 @@
     "id": "eA1nPvoZe68H"
    },
    "source": [
-    "This section describes and defines the variables for all of the possible arguments you could pass to `f.publish()`, for illustrative purposes."
+    "This section describes and defines the variables for all of the elements needed to construct a `FoundryDataset` object and publish it."
    ]
   },
   {
@@ -263,6 +263,34 @@
     "The DataCite Metadata Schema is a list of core metadata properties chosen for an accurate and consistent identification of a resource for citation and retrieval purposes. More information about this schema and the larger DataCite project can be found at https://datacite.org/"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "example_iris_datacite = {'identifier': {'identifier': '10.xx/xx', 'identifierType': 'DOI'},\n",
+    "                         'rightsList': [{'rights': 'CC-BY 4.0'}],\n",
+    "                         'creators': [{'creatorName': 'Brown, C', 'familyName': 'Brown', 'givenName': 'Charles'},\n",
+    "                                      {'creatorName': 'Van Pelt, L', 'familyName': 'Van Pelt', 'givenName': 'Lucia'}],\n",
+    "                         'subjects': [{'subject': 'blockheads'},\n",
+    "                                      {'subject': 'foundry'},\n",
+    "                                      {'subject': 'test_data'}],\n",
+    "                         'publicationYear': 2024,\n",
+    "                         'publisher': 'Materials Data Facility',\n",
+    "                         'dates': [{'date': '2024-08-03', 'dateType': 'Accepted'}],\n",
+    "                         'titles': [{'title': \"You're a Good man, Charlie Brown\"}],\n",
+    "                         'resourceType': {'resourceTypeGeneral': 'Dataset', \n",
+    "                                          'resourceType': 'Dataset'}}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Creating a FoundryDataset object"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 3,
@@ -271,6 +299,12 @@
    },
    "outputs": [],
    "source": [
+    "\"\"\"\n",
+    "This is the deprecated way of passing all of the DataCite information via kwargs to the foundry.publish() method.\n",
+    "Keeping it around for now, as we might want to create a datacite_generator function that builds a DataCite JSON\n",
+    "object from kwargs, so that users don't have to deal with JSON formatting.\n",
+    "\"\"\"\n",
+    "\n",
     "from datetime import datetime\n",
     "timestamp = datetime.now().timestamp()\n",
     "\n",
@@ -301,43 +335,40 @@
     "publication_year = 2023"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now that the metadata and DataCite information are contained in the JSON objects we created above, we can create an instance of a `FoundryDataset` object. It serves as a container that holds and organizes the data as well as the metadata for the dataset. We need just one additional piece of information: a `dataset_name` to associate with the dataset."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
-    "{'identifier': {'identifier': '10.xx/xx', 'identifierType': 'DOI'},\n",
-    " 'rightsList': [{'rights': 'CC-BY 4.0'}],\n",
-    " 'creators': [{'creatorName': 'Brown, C', 'familyName': 'Brown', 'givenName': 'Charles'},\n",
-    "              {'creatorName': 'Van Pelt, L', 'familyName': 'Van Pelt', 'givenName': 'Lucia'}],\n",
-    " 'subjects': [{'subject': 'blockheads'},\n",
-    "              {'subject': 'foundry'},\n",
-    "              {'subject': 'test_data'}],\n",
-    " 'publicationYear': 2024,\n",
-    " 'publisher': 'Materials Data Facility',\n",
-    " 'dates': [{'date': '2024-08-03', 'dateType': 'Accepted'}],\n",
-    " 'titles': [{'title': \"You're a Good man, Charlie Brown\"}],\n",
-    " 'resourceType': {'resourceTypeGeneral': 'Dataset', \n",
-    "                  'resourceType': 'Dataset'}}"
+    "dataset_name = 'charlies_iris_dataset'"
    ]
   },
   {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "3YNK1e5UfTaN"
-   },
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
-    "We won't use all of these variables in our call to `f.publish()`, because many of the default values for the parameters (such as \"MDF\" for `publisher`) work well for our use case. \n",
-    "\n",
-    "However, the **metadata**, **data path** (HTTPS) or **data source** (Globus Connect Client), **title**, and **authors** are all required."
+    "from foundry import FoundryDataset"
    ]
   },
   {
-   "cell_type": "markdown",
+   "cell_type": "code",
+   "execution_count": null,
    "metadata": {},
+   "outputs": [],
    "source": [
-    "Note that instead of `https_data_path`, you'll want to specify `globus_data_source` if you are uploading data using Globus Connect Client instead of HTTPS (see _Uploading via Globus Connect Client_ at the end of this notebook)."
+    "iris_dataset = FoundryDataset(dataset_name, \n",
+    "                              example_iris_metadata, \n",
+    "                              example_iris_datacite)"
    ]
   },
   {
@@ -349,6 +380,24 @@
     "## Publishing to Foundry"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "3YNK1e5UfTaN"
+   },
+   "source": [
+    "We won't use all of these variables in our call to `f.publish_dataset()`, because many of the default values for the parameters (such as \"MDF\" for `publisher`) work well for our use case. \n",
+    "\n",
+    "However, the **metadata**, **data path** (HTTPS) or **data source** (Globus Connect Client), **title**, and **authors** are all required."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Note that instead of `https_data_path`, you'll want to specify `globus_data_source` if you are uploading data using Globus Connect Client instead of HTTPS (see _Uploading via Globus Connect Client_ at the end of this notebook)."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -358,7 +407,8 @@
    "outputs": [],
    "source": [
     "# publish to Foundry! returns a result object we can inspect\n",
-    "res = f.publish_dataset(example_iris_metadata, title, authors, https_data_path=data_path, short_name=short_name)"
+    "res = f.publish_dataset(iris_dataset, \n",
+    "                        https_data_path=data_path)"
    ]
   },
   {