fairdataihub · slugb0t · Aug 11, 2023 · Aug 11, 2023 · Aug 11, 2023 · Aug 11, 2023
@@ -1,9 +1,8 @@
 ## [2.12.3](https://github.com/fairdataihub/dev.fairdataihub.org/compare/v2.12.2...v2.12.3) (2023-07-31)
 
-
 ### Bug Fixes
 
-* **deps:** update dependency mermaid to v10.3.0 ([#138](https://github.com/fairdataihub/dev.fairdataihub.org/issues/138)) ([2c76932](https://github.com/fairdataihub/dev.fairdataihub.org/commit/2c76932700736f2e621cc61341b5450d9242e69b))
+- **deps:** update dependency mermaid to v10.3.0 ([#138](https://github.com/fairdataihub/dev.fairdataihub.org/issues/138)) ([2c76932](https://github.com/fairdataihub/dev.fairdataihub.org/commit/2c76932700736f2e621cc61341b5450d9242e69b))
 
 ## [2.12.2](https://github.com/fairdataihub/dev.fairdataihub.org/compare/v2.12.1...v2.12.2) (2023-07-03)
 

@@ -209,6 +209,14 @@ function sidebarGuide() {
           text: 'SODA Server',
           link: '/soda-for-sparc/soda-server.md',
         },
+        {
+          text: 'Upload Dataset Workflows',
+          link: '/soda-for-sparc/upload-workflow.md',
+        },
+        {
+          text: 'Import Pennsieve Dataset Workflow',
+          link: '/soda-for-sparc/dataset-import-workflow.md',
+        },
       ],
     },
 

@@ -0,0 +1,38 @@
+---
+lang: en-US
+title: Function flowcharts
+description: Understanding the import Pennsieve dataset function in SODA for SPARC
+---
+
+# Overview
+
+The page outlines how the import function works for Pennsieve datasets. It describes the backend process of using the Pennsieve API to fetch a dataset and import it into SODA for SPARC.
+
+## Import Pennsieve dataset
+
+The import Pennsieve dataset function is the process of importing a Pennsieve dataset into SODA for SPARC. The process is initiated by the client. The client sends
+request to the server to import. The server then goes through a series of steps to place all files/folders in the SODA JSON Structure.
+
+```mermaid
+graph TB
+  subgraph main["import_pennsieve-dataset()"]
+    direction TB
+    A[[Get access token]] .-> B[[Get dataset name from JSON structure]]
+    B .-> C[[Check if user has editing permissions]]
+    C .-> D[[Get dataset ID from Pennsieve]]
+    D .-> E[[Get amount of folders/files that need to be imported]]
+    E .-> F[[Iterate through the root folder and organize metadata files + high level folders]]
+    F .-> G[[Iterate through the high level folders to find manifest files]]
+    G .-> H[[If manifest file is found, create a dataframe from the manifest file and convert to dictionary]]
+
+    subgraph recursive["createFolderStructure()"]
+      direction TB
+      I[[Requests files/folders of current folder from Pennsieve]] --> J[[If files/folders are found, begin iterating through them and apply information to the SODA JSON structure]]
+      J --> K[[Apply manifest file information to the SODA JSON structure]]
+      K --> L[[Recusively iterate through each folder]]
+    end
+
+    H --> recursive
+  end
+
+```
@@ -0,0 +1,170 @@
+---
+lang: en-US
+title: Function flowcharts
+description: Understanding major functions in SODA for SPARC
+---
+
+# Overview
+
+This page outlines the major functions in SODA for SPARC. It describes the upload and import processes. It also describes the process of creating a new dataset and uploading data to it. To aid in understanding key concepts links to the flask_restx documentation are included.
+
+## Main upload process
+
+The main upload process goes through a series of checks to ensure that the upload process will be successful. When uploading to Pennsieve a valid Pennsieve dataset and account is needed to begin. Local files/folders that will be uploaded are also validated
+to ensure the paths are correct.
+The process is initiated by the client. The client sends a request to the SODA server to upload data to a Pennsieve dataset.
+The upload process is the process of uploading data to a Pennsieve dataset. The process is initiated by the client. The client sends a request to the server to upload data to a Pennsieve dataset. The server then sends a request to the Pennsieve Agent to
+upload the data. The Agent then uploads the data to the Pennsieve dataset. The server then sends a request to the Pennsieve service to import the data. The Pennsieve service then imports the data into the Pennsieve dataset.
+
+```mermaid
+graph TB
+A[SODA-for-SPARC] -- Upload Request --> B[(SODA Server)]
+
+subgraph main["main_curate_function()"]
+    direction TB
+    subgraph prechecks["Checking for potential errors"]
+        direction TB
+        C[[If local dataset, ensure destination is valid]] .-> D[[Verify Pennsieve dataset is valid]]
+        D .-> E[[Verify Pennsieve account is valid]]
+        E .-> F[[Ensure locally selected files paths are valid and are over 0KB]]
+        F .-> G[[If uploading to an existing dataset on Pennsieve check if files/folder are valid on Pennsieve]]
+    end
+    subgraph localgen[" "]
+        direction LR
+        I(generate_dataset_locally)
+    end
+    subgraph newgen[" "]
+        direction LR
+        K(ps_upload_to_dataset)
+    end
+    subgraph existinggen[" "]
+        direction LR
+        L(ps_update_existing_dataset)
+    end
+    prechecks -- Generate Locally --> localgen
+    prechecks -- Generate New Pennsieve Dataset --> newgen
+    prechecks -- Generate To Existing Pennsieve Dataset --> existinggen
+end
+
+B --> main
+```
+
+## Generate Dataset Locally
+
+When generating datasets locally, the server will gather all files/folders and create them in the SDS 2.0 format. The server will then create a dataset on the user's local machine.
+
+```mermaid
+graph LR
+
+subgraph localgen["generate_dataset_locally()"]
+    direction TB
+    A[[Create new folder for dataset or use existing folder if 'merge existing' is selected]]
+    A .-> B[[Scan the dataset structure and create all folders with new name if renamed]]
+    B .-> C[[Compile a list of files to be copied and a list of files to be moved with new name recorded if renamed]]
+    C .-> D[[Add high-level metadata files in the list]]
+    D .-> E[[Add manifest files in the list]]
+    E .-> F[[Add manifest files in the list]]
+    F .-> G[[Move files into new location]]
+    G .-> H[[Copy files into new location and track amount of copied files for loggin purposes]]
+    H .-> I[[Delete manifest folder and original folder if merge requested and rename new folder]]
+end
+
+
+```
+
+## Generate New Dataset To Pennsieve
+
+When generating new datasets to Pennsieve there are less pre-checks than
+when uploading to an existing dataset. The server will still check if the
+Pennsieve account and dataset are valid. Depending if the dataset is newly
+created on SODA and is being uploaded to a new or existing dataset will determine
+the level of checks needed.
+The server will recursively create the folders on Pennsieve before
+uploading to files to the dataset if you are uploading to an existing dataset
+even when uploading a newly created dataset. This ensures that
+existing files/folders are handled properly for renaming, moving and deleting.
+The server will then make a list of the files that will be uploaded to Pennsieve
+and send it to the Pennsieve Agent. When a dataset is newly created on SODA
+and being uploaded to a new Pennsieve dataset the folders will be created by the
+agent.
+The Agent will create a manifest of all the files that will be uploaded
+to Pennsieve. We add a subscriber to the upload process to track the progress
+of the upload. This process will happen again for metadata files and manifest files.
+Three manifest files will be created for the Pennsieve agent to upload the dataset.
+During this process, if files that need to be renamed are detected, the SODA server
+will create a dictionary of the files that need to be renamed and the new name. The
+relative path will be the parent key of these files. Once the upload is completed,
+the server will iterate through the relative paths to find the Pennsieve ID of the
+files that need to be renamed. At times not all files will have been processed
+by Pennsieve so the server will poll the relative path until necessary
+information is given to rename the files. Once all files have been renamed
+the upload is complete.
+
+### New SODA Dataset to New Pennsieve Dataset
+
+```mermaid
+graph LR
+
+subgraph newgen["ps_upload_to_dataset()"]
+    direction TB
+    A[[If dataset is newly created and being uploaded to a new dataset on Pennsieve]] .-> B[[Recursively add into a list all files' paths, name, and final name]]
+    B .-> C[[Gather metadata files into new list with their local paths]]
+    C .-> D[[Gather manifest files into new list with the local paths]]
+    D .-> E[[Iterate through list of gathered files to add to Pennsieve agent, if file has been renamed add information to another list to rename at the end of the uploads]]
+    E .-> F[[Using the Pennsieve agent, upload the files to Pennsieve]]
+    F .-> G[[Add metadata files to a manifest for the agent and upload]]
+    G .-> H[[Add manifest files to a manifest for the agent and upload]]
+    H .-> I[[If any files are in the rename dictionary, begin iterating through the relative folder paths to find the file's Pennsieve ID]]
+    I .-> J[[Store the package ID and iterate until all or most files have been renamed]]
+    J .-> K[[Once all files have been iterated to gather their ID's begin renaming the files on Pennsieve]]
+    K .-> L[[If any file does not have an ID try to see if file has been processed by Pennsieve and try to rename again until successful]]
+end
+```
+
+### New SODA Dataset to Existing Pennsieve Dataset
+
+```mermaid
+graph LR
+
+subgraph newgen["ps_upload_to_dataset()"]
+    direction TB
+    A[[Scan the dataset structure to create all non-existing folders on Pennsieve]] .-> B[[Create a tracking dictionary which would track the generation of the dataset on Pennsieve]]
+    B .-> C[[Iterate through list of gathered files to add to Pennsieve agent, if file has been renamed add information to another list to rename at the end of the uploads]]
+    C .-> D[[Add high level metadata files to a list]]
+    D .-> E[[Add manifest files to a list]]
+    E .-> F[[Using the Pennsieve agent, upload the files to Pennsieve]]
+    F .-> G[[Upload the Metadata files]]
+    G .-> H[[Upload the manifest files]]
+    H .-> I[[If any files are in the rename dictionary, begin iterating through the relative folder paths to find the file's Pennsieve ID]]
+    I .-> J[[Store the package ID and iterate until all or most files have been renamed]]
+    J .-> K[[Once all files have been iterated to gather their ID's begin renaming the files on Pennsieve]]
+    K .-> L[[If any file does not have an ID try to see if file has been processed by Pennsieve and try to rename again until successful]]
+end
+```
+
+## Generate Dataset To Existing Pennsieve Dataset
+
+When generating datasets to an existing Pennsieve dataset there are a few
+more pre-checks than when uploading to a new Pennsieve dataset.
+The server will still check if the Pennsieve account and dataset are valid but also
+determine if any existing files/folders have been marked as deleted, moved
+or renamed. Once files/folders that already exist on Pennsieve have been accounted
+the function ps_upload_to_dataset() will run, which will upload all newly
+added files.
+
+```mermaid
+graph LR
+
+subgraph existinggen["ps_update_existing_dataset()"]
+    direction TB
+    A[[Remove all existing files on Pennsieve that user deleted]] .-> B[[Rename and append '-DELETED' to any folders marked as deleted on Pennsieve]]
+    B .-> C[[Rename any folder done by the user]]
+    C .-> D[[Get the status of all files currently on Pennsieve and create the folder path for all items in dataset structure]]
+    D .-> E[[Move any files that are marked as moved on Pennsieve]]
+    E .-> F[[Rename any Pennsieve files that are marks as 'renamed']]
+    F .-> G[[Delete any Pennsieve folders that are marked as 'deleted']]
+    G .-> H[[Delete any metadata files that are marked as 'deleted']]
+    H .-> I[[Run the original code to upload any new files to Pennsieve dataset]]
+    I --> J(ps_upload_to_dataset)
+end
+```