diff --git a/docs/dev-docs/file-upload.md b/docs/dev-docs/file-upload.md new file mode 100644 index 000000000..71325ba0f --- /dev/null +++ b/docs/dev-docs/file-upload.md @@ -0,0 +1,30 @@ +### How File Upload works in Chaise + +A sequence of operations is performed to upload files, and it can be tracked in `upload-progress-modal.tsx`. This component interacts with `hatrac.js` in `ERMrestJS` to communicate with the hatrac server. + +1. The component receives an array of rows from the recordedit app. These rows contain objects for files to be uploaded. The code iterates over all rows, looking for `File` objects, and creates an `UploadFileObject` for each file to be uploaded. It calls `calculateChecksum` if there are any files to upload. + +2. `calculateChecksum` calls `calculateChecksum` in `hatrac.js` for the `hatracObj`. It keeps track of checksum calculation progress for each file and, once all are done, calls `checkFileExists`. + - After each checksum is completed, use the returned url to check if an existing file upload job exists in the local memory for this file + - If it does, mark this file upload job as a partial upload so the upload is continued instead of restarted + - currently the "local memory" only lives within the same JavaScript session and is NOT persisted to local storage (future implementation) + +3. The `checkFileExists` function checks whether a file already exists by calling `fileExists` in `hatrac.js` for the `hatracObj`. A parameter including the `previousJobUrl` is passed to this call for resuming file upload. It keeps track of the progress of the `checkFileExists` calls for each file and, once all are done, calls `createUploadJob`. + - If the file already exists, creating the upload job is skipped and marked as complete. `filesToUploadCt` is reduced by 1 + - If a 403 is returned (the job/file exists but the current user can't read it), use the same job with a new version + - If a 409 is returned, it could mean the namespace already exists + - If this occurs, check if we have an existing job for that namespace that we know is partially uploaded + +4. `createUploadJob` creates an upload job for each file by calling `createUploadJob` in `hatrac.js` for the `hatracObj`. It keeps track of the upload job progress for each file and, once all are done, calls `startUpload`. + - If the file was marked to be skipped, the upload job is marked as complete (and never created) + +5. The `startUpload` function calls `startQueuedUpload`, which queues the files to be uploaded, iterates over the queue, and calls the `start` function in `hatrac.js` for the `hatracObj` for each file. A parameter including the `startChunkIdx` is passed to this call for resuming file upload. It keeps track of the upload progress for each file and, once all are done, calls `doQueuedJobCompletion`. + - If a `startChunkIdx` is passed, `hatrac.js` will continue a previous file upload job from the `startChunkIdx` instead of from the start of the file + - During the file upload process, after each chunk has completed, update the `lastContiguousChunkMap` with information about the current file upload job in case it is interrupted and needs to be resumed later + +6. `completeUpload` calls `hatrac.js` to send the upload job closure call to the hatrac server for all files. It keeps track of job-completion progress for each file and, once all are done, sets the url in the row and closes the modal. 
+ - When an upload job is completed, update the `lastContiguousChunkMap` for that upload job in case an interruption occurs while finalizing all of the uploads + +7. The recordedit app listens to the modal close event. It saves the rows that were updated while uploading files by calling ERMrest. + +8. If any checksum or network error occurs during the above steps, in some cases all uploads are aborted, the modal closes, and the recordedit app renders an error message. \ No newline at end of file diff --git a/docs/dev-docs/manual-test.md index 192bf52f3..7001b6c1e 100644 --- a/docs/dev-docs/manual-test.md +++ b/docs/dev-docs/manual-test.md @@ -141,6 +141,18 @@ In [ErmrestDataUtils](https://github.com/informatics-isi-edu/ErmrestDataUtils), - `cd /var/www/hatrac/js/chaise///` - `ls -al` to list all contents and file sizes +## Test resume file upload + - Test that a file upload process can be resumed in the recordedit app when the connection to the server is lost (without refreshing the page) + - NOTE: I suggest modifying the # of retries in ERMrestJS so this can be tested more quickly. + 1. Go to recordedit for the `upload:file` table created using the script above. + 2. Fill in the form and use a file that is > 25 MB. + 3. Open the Developer console to the network tab. + 4. Submit the form. + 5. Once you see 4 similar requests in the console output for uploading chunks, change your network connection from "No throttling" to "Offline". + - This will simulate the connection to the server being lost, and Chaise will return the user to recordedit with an error alert showing. + 6. Change the network connection back to "No throttling" and click submit again. + 7. While still watching the network tab, verify that the original 4 chunks are not uploaded again and that the file upload process succeeds. + # Testing Session Timed out (and different user) data mutation events The UX currently doesn't update the user when their session state has changed. In some cases a user could log in and navigate to a page that allows create or update, then have their log in status change prior to submitting the data to be mutated. They could have had their session time out (treated as an anonymous user) or changed to a different user entirely. This pertains to create/update in `recordedit`, pure and binary add in `record`, and anywhere that we show tables with rows that can be deleted. 
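The resume bookkeeping described in `file-upload.md` above (steps 2, 5, and 6) boils down to a small amount of per-file state. Below is a minimal, hypothetical sketch of how the `LastChunkMap` entry added in this PR drives the decision to resume — the `makeUploadKey` and `shouldResume` helper names are invented for illustration; the actual checks live inline in `upload-progress-modal.tsx`:

```typescript
// Shapes mirrored from src/models/recordedit.ts in this PR.
interface LastChunkObject {
  lastChunkIdx: number;    // index of the last contiguous chunk that was uploaded
  jobUrl: string;          // e.g. '/hatrac/path/to/file.png;upload/somehash'
  fileSize: number;        // size in bytes of the file being uploaded
  uploadVersion?: string;  // set on completion, e.g. '/hatrac/path/to/file.png:version'
}
interface LastChunkMap {
  [key: string]: LastChunkObject;
}

// Key format used by the PR: `${file.md5_base64}_${column_name}_${record_index}`.
function makeUploadKey(md5Base64: string, columnName: string, rowIdx: number): string {
  return `${md5Base64}_${columnName}_${rowIdx}`;
}

// Mirrors the checksum-completion check: resume only when the tracked job url
// contains the newly generated (encoded) url, at least one chunk finished
// (lastChunkIdx > -1), the file size matches, and the job was never finalized.
function shouldResume(map: LastChunkMap | null, key: string, newUrl: string, fileSize: number): boolean {
  const entry = map?.[key];
  if (!entry) return false;
  return entry.jobUrl.indexOf(newUrl) > -1 &&
    entry.lastChunkIdx > -1 &&
    entry.fileSize === fileSize &&
    !entry.uploadVersion;
}

// When shouldResume(...) is true, the modal passes entry.jobUrl to
// hatracObj.fileExists(...) and resumes the upload at chunk entry.lastChunkIdx + 1.
```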
diff --git a/src/components/modals/upload-progress-modal.tsx b/src/components/modals/upload-progress-modal.tsx index 4b78dbe9c..393365fd4 100644 --- a/src/components/modals/upload-progress-modal.tsx +++ b/src/components/modals/upload-progress-modal.tsx @@ -10,10 +10,14 @@ import useStateRef from '@isrd-isi-edu/chaise/src/hooks/state-ref'; import useRecordedit from '@isrd-isi-edu/chaise/src/hooks/recordedit'; // models -import { FileObject, UploadFileObject } from '@isrd-isi-edu/chaise/src/models/recordedit'; +import { + FileObject, LastChunkMap, + LastChunkObject, UploadFileObject +} from '@isrd-isi-edu/chaise/src/models/recordedit'; // utils import { humanFileSize } from '@isrd-isi-edu/chaise/src/utils/input-utils'; +import { fixedEncodeURIComponent } from '@isrd-isi-edu/chaise/src/utils/uri-utils'; export interface UploadProgressModalProps { /** @@ -36,7 +40,7 @@ export interface UploadProgressModalProps { const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgressModalProps) => { - const { reference } = useRecordedit(); + const { reference, setLastContiguousChunk, lastContiguousChunkRef } = useRecordedit(); const [title, setTitle] = useState(''); @@ -116,7 +120,7 @@ const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgress let tempFilesCt = 0, tempTotalSize = 0; // Iterate over all rows that are passed as parameters to the modal controller - rows.forEach((row: any) => { + rows.forEach((row: any, rowIdx: number) => { // Create a tuple for the row const tuple: UploadFileObject[] = []; @@ -145,7 +149,7 @@ const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgress tempFilesCt++; tempTotalSize += row[k].file.size; - tuple.push(createUploadFileObject(row[k], column, row)); + tuple.push(createUploadFileObject(row[k], column, row, rowIdx)); } else { row[k] = (row[k] && row[k].url && row[k].url.length) ? 
row[k].url : null; } @@ -209,7 +213,11 @@ const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgress setIsFileExists(true); uploadRowsRef.current.forEach((row: UploadFileObject[]) => { row.forEach((item: UploadFileObject) => { - item.hatracObj.fileExists().then( + let previousJobUrl = null; + if (item.partialUpload) previousJobUrl = lastContiguousChunkRef?.current?.[item.uploadKey].jobUrl; + + // if there is a previous upload job that was tracked, send the url and use it for the upload job + item.hatracObj.fileExists(previousJobUrl).then( () => onFileExistSuccess(item), onError); }); @@ -258,7 +266,7 @@ const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgress speedIntervalTimer = setInterval(() => { const diff = sizeTransferredRef.current - lastByteTransferredRef.current; setLastByteTransferred(sizeTransferredRef.current); - + if (diff > 0) setSpeed(humanFileSize(diff) + 'ps'); }, 1000); }; @@ -271,7 +279,10 @@ const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgress const item = queueRef.current.shift(); if (!item) return; - item.hatracObj.start().then( + let startChunkIdx = 0; + if (item.partialUpload) startChunkIdx = lastContiguousChunkRef.current?.[item.uploadKey].lastChunkIdx + 1; + + item.hatracObj.start(startChunkIdx).then( () => onUploadCompleted(item), onError, (size: number) => onProgressChanged(item, size)); @@ -328,10 +339,11 @@ const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgress * @param {FileObject} data - FileObject for the file column * @param {Ermrest.Column} column - Column Object * @param {Object} row - Json key value Object of row values from the recordedit form + * @param {number} rowIdx - index of the record form this UploadFileObject is associated with (the attached `row`'s index) * @desc * Creates an uploadFile obj to keep track of file and its upload. 
*/ - const createUploadFileObject = (data: FileObject, column: any, row: any): UploadFileObject => { + const createUploadFileObject = (data: FileObject, column: any, row: any, rowIdx: number): UploadFileObject => { const file = data.file; const uploadFileObject: UploadFileObject = { @@ -341,6 +353,10 @@ const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgress checksumProgress: 0, checksumPercent: 0, checksumCompleted: false, + partialUpload: false, + // a key for storing in the `lastContiguousChunk` map, made up of the file checksum, column name, and form index + // initialized without a checksum value since none has been calculated yet + uploadKey: `_${column.name}_${rowIdx}`, jobCreateDone: false, fileExistsDone: false, uploadCompleted: false, @@ -352,7 +368,8 @@ const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgress url: '', column: column, reference: reference, - row: row + row: row, + rowIdx: rowIdx } return uploadFileObject; @@ -400,6 +417,38 @@ const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgress ufo.checksumCompleted = true; ufo.url = url; setChecksumCompleted((prev: number) => prev + 1); + + // update uploadKey + // use the calculated md5 to see if we have a partial upload in case of a timeout + ufo.uploadKey = `${ufo.hatracObj.hash.md5_base64}_${ufo.column.name}_${ufo.rowIdx}`; + // lastContiguousChunk is initialized to null, make sure it has been defined (meaning an upload didn't complete) + if (lastContiguousChunkRef?.current) { + const lccMap: LastChunkObject = lastContiguousChunkRef.current[ufo.uploadKey]; + + // the 'jobUrl' we stored in lastContiguousChunkRef is what was returned from the server, where each part of the path is url encoded + // do the same for the newly generated url, only for comparison + // NOTE: the file upload path encoding is done per folder in the path and is handled by the server + const urlParts = url.split('/'); + urlParts.forEach((part: string, idx: number) => { + urlParts[idx] = fixedEncodeURIComponent(part); + }); + const newUrl = urlParts.join('/'); + + // check for the following: + // - newUrl being part of a tracked partial upload job + // - lastChunkIdx is 0 or greater, meaning the partial upload job has some chunks uploaded + // - file size for partial upload job matches file to upload + // - there is no upload version set + // NOTE: newUrl and jobUrl should be almost the same, + // with jobUrl having more info appended about the existing uploadJob + if (lccMap?.jobUrl.indexOf(newUrl) > -1 && + lccMap.lastChunkIdx > -1 && + lccMap.fileSize === ufo.size && + !lccMap.uploadVersion + ) { + ufo.partialUpload = true; + } + } } }; @@ -490,6 +539,41 @@ const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgress if (erred.current || aborted.current) return; + // when the chunks array is created at hatracObj, chunkTracker is initialized with n `empty` values where n is the number of chunks + if (ufo.hatracObj.chunkTracker.length > 0) { + for (let i = 0; i < ufo.hatracObj.chunkTracker.length; i++) { + // once we've found the first null or undefined value, set lastChunkIdx to the index before it + // this could be index 0, which sets the value to -1, meaning no chunks have been uploaded yet + if (ufo.hatracObj.chunkTracker[i] === null || ufo.hatracObj.chunkTracker[i] === undefined) { + setLastContiguousChunk((prevVal: LastChunkMap) => { + let tempMap: LastChunkMap; + // lastContiguousChunk (prevVal) is null until a chunk has been 
uploaded + if (prevVal) { + tempMap = { ...prevVal }; + } else { + tempMap = {}; + } + + if (!tempMap[ufo.uploadKey]) { + // initialize the object for the uploadKey + tempMap[ufo.uploadKey] = { + lastChunkIdx: i - 1, + jobUrl: ufo.hatracObj.chunkUrl, + fileSize: ufo.size + }; + } else { + // update the last chunk index + tempMap[ufo.uploadKey].lastChunkIdx = i - 1; + } + + return tempMap; + }); + + break; + } + } + } + // This code updates the individual progress bar for uploading file ufo.uploadStarted = true; ufo.completeUploadJob = false; @@ -558,6 +642,15 @@ const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgress ufo.completeUploadJob = true; ufo.versionedUrl = url; + ufo.partialUpload = false; + setLastContiguousChunk((prevVal: LastChunkMap) => { + const tempMap: LastChunkMap = { ...prevVal }; + + // set the uploadVersion to communicate the job has completed + // ensures an upload won't resume and instead be skipped to finished + if (tempMap[ufo.uploadKey]) tempMap[ufo.uploadKey].uploadVersion = url; + return tempMap; + }); // This code updates the main progress bar for job completion progress for all files let progress = 0; diff --git a/src/models/recordedit.ts index e20f61ba3..bf26b36f2 100644 --- a/src/models/recordedit.ts +++ b/src/models/recordedit.ts @@ -103,7 +103,7 @@ export type RecordeditForeignkeyCallbacks = { * - domainFilterFormNumber: The formNumber that should be used for generating the filteredRef * (this is useful for multi-form input where we're not necessarily have the first form selected) */ - onAttemptToChange?: () => Promise<{allowed: boolean, domainFilterFormNumber?: number}>; + onAttemptToChange?: () => Promise<{ allowed: boolean, domainFilterFormNumber?: number }>; } export interface RecordeditColumnModel { @@ -151,6 +151,8 @@ export interface UploadFileObject { checksumProgress: number; checksumPercent: number; checksumCompleted: boolean; + partialUpload: boolean; + uploadKey: string; skipUploadJob?: boolean; jobCreateDone: boolean; fileExistsDone: boolean; @@ -164,7 +166,8 @@ versionedUrl?: string; column: any, reference: any, - row: any + row: any, + rowIdx: number } export interface PrefillObject { @@ -190,6 +193,22 @@ rowname: any; } +export interface LastChunkMap { + /* key in the form of `${file.md5_base64}_${column_name}_${record_index}` */ + [key: string]: LastChunkObject; +} + +export interface LastChunkObject { + /* the index of the last contiguous chunk that was uploaded */ + lastChunkIdx: number; + /* the path to the file being uploaded and its specific upload job */ + jobUrl: string; + /* the size of the file being uploaded */ + fileSize: number; + /* the path to the file after it's been uploaded and version info is generated */ + uploadVersion?: string; +} + export interface UploadProgressProps { /** * rows of data from recordedit form to get file values from diff --git a/src/providers/recordedit.tsx index 2de27983e..ca513cc46 100644 --- a/src/providers/recordedit.tsx +++ b/src/providers/recordedit.tsx @@ -7,7 +7,7 @@ import useStateRef from '@isrd-isi-edu/chaise/src/hooks/state-ref'; // models import { - appModes, PrefillObject, RecordeditColumnModel, + appModes, LastChunkMap, PrefillObject, RecordeditColumnModel, RecordeditConfig, RecordeditDisplayMode, RecordeditForeignkeyCallbacks, RecordeditModalOptions } from '@isrd-isi-edu/chaise/src/models/recordedit'; import { LogActions, LogReloadCauses, 
LogStackPaths, LogStackTypes } from '@isrd-isi-edu/chaise/src/models/log'; @@ -98,6 +98,10 @@ export const RecordeditContext = createContext<{ showSubmitSpinner: boolean, resultsetProps?: ResultsetProps, uploadProgressModalProps?: UploadProgressProps, + /* for updating the last contiguous chunk tracking info */ + setLastContiguousChunk: (arg0: any) => void, + /* ref (from the useStateRef hook) to the current lastContiguousChunk value */ + lastContiguousChunkRef: any, /* max rows allowed to add constant */ MAX_ROWS_TO_ADD: number, /** @@ -224,6 +228,28 @@ export default function RecordeditProvider({ const [showSubmitSpinner, setShowSubmitSpinner] = useState(false); const [resultsetProps, setResultsetProps] = useState(); const [uploadProgressModalProps, setUploadProgressModalProps] = useState(); + /* + * Object for keeping track of each file and their existing upload jobs so we can resume on interruption + * + * For example, we have the following 3 scenarios: + * 1. contiguous offset: 1; chunks in flight with index 2, 3; chunk completed with index 4 (after chunk at index 4 is acknowledged w/ 204) + * - [0, 1, empty, empty, 4] + * 2. contiguous offset: 1; chunks in flight with index 2; chunks completed with index 3, 4 (after chunk at index 3 is acknowledged w/ 204) + * - [0, 1, empty, 3, 4] + * 3. contiguous offset: 4; (after chunk at index 2 is acknowledged w/ 204) + * - [0, 1, 2, 3, 4] + * + * Object structure is as follows where index is the index of the last contiguous chunk that was uploaded. + * { + * `${file.md5_base64}_${column_name}_${record_index}`: { + * lastChunkIdx: index, + * jobUrl: uploadJob.hash (in the form of '/hatrac/path/to/file.png;upload/somehash') + * fileSize: size_in_bytes, + * uploadVersion: versioned_url (in the form of '/hatrac/path/to/file.png:version') + * } + * } + */ + const [lastContiguousChunk, setLastContiguousChunk, lastContiguousChunkRef] = useStateRef(null); const [tuples, setTuples, tuplesRef] = useStateRef(Array.isArray(initialTuples) ? initialTuples : []); @@ -1064,6 +1090,8 @@ export default function RecordeditProvider({ showSubmitSpinner, resultsetProps, uploadProgressModalProps, + setLastContiguousChunk, + lastContiguousChunkRef, MAX_ROWS_TO_ADD: maxRowsToAdd, // log related:
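For reference, here is a minimal, hypothetical sketch of how the contiguous offset in the provider comment above is derived. The helper name and `unknown[]` typing are invented for illustration; the PR performs this scan inline over `ufo.hatracObj.chunkTracker` in the upload progress handler of `upload-progress-modal.tsx`:

```typescript
// chunkTracker has one slot per chunk; a slot stays empty/undefined until its
// chunk is acknowledged by the server, so the first hole ends the contiguous prefix.
function lastContiguousChunkIdx(chunkTracker: unknown[]): number {
  for (let i = 0; i < chunkTracker.length; i++) {
    if (chunkTracker[i] === null || chunkTracker[i] === undefined) {
      return i - 1; // -1 means no chunks have completed yet
    }
  }
  return chunkTracker.length - 1; // no holes: every chunk is uploaded
}

// Applied to the three scenarios from the provider comment:
//   [0, 1, empty, empty, 4] -> 1
//   [0, 1, empty, 3, 4]     -> 1
//   [0, 1, 2, 3, 4]         -> 4
```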