Continue file upload if recordedit hasn't reloaded #2457

Merged · 11 commits · May 29, 2024
30 changes: 30 additions & 0 deletions docs/dev-docs/file-upload.md
@@ -0,0 +1,30 @@
### How File Upload works in Chaise

There is a sequence of operations performed to upload files, which can be tracked in `upload-progress-modal.tsx`. This component interacts with `hatrac.js` in `ERMrestJS` to communicate with the hatrac server.

1. The component receives an array of rows from the recordedit app. These rows contain objects for the files to be uploaded. The code iterates over all rows, looking for `File` objects, and creates an `UploadFileObject` for each file to be uploaded. It then calls `calculateChecksum` if there are any files to upload.

2. `calculateChecksum` calls `calculateChecksum` in `hatrac.js` for the `hatracObj`. It tracks checksum-calculation progress for each file and, once all are done, calls `checkFileExists`.
    - After each checksum completes, the returned URL is used to check whether an existing upload job for this file is tracked in local memory (see the sketch below).
    - If one is found, the file's upload job is marked as a partial upload so it is continued instead of restarted.
    - Currently the "local memory" lives only within the same JavaScript session and is NOT persisted to local storage (a future implementation).
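A minimal sketch of that lookup, assuming the `LastChunkMap` shape added in `src/models/recordedit.ts`. The `encodePath` and `makeUploadKey` helpers here are illustrative only; the real code uses `fixedEncodeURIComponent` and builds the key inline in `upload-progress-modal.tsx`:

```typescript
interface LastChunkObject {
  lastChunkIdx: number;   // index of the last contiguous chunk that was uploaded
  jobUrl: string;         // path to the file plus its upload-job hash
  fileSize: number;       // size of the file being uploaded
  uploadVersion?: string; // set only once the upload job has completed
}
type LastChunkMap = { [key: string]: LastChunkObject };

// the server url-encodes each part of the path, so encode the freshly
// generated url the same way before comparing
const encodePath = (url: string): string =>
  url.split('/').map(encodeURIComponent).join('/');

// the key combines the file checksum, column name, and form (row) index
const makeUploadKey = (md5: string, columnName: string, rowIdx: number): string =>
  `${md5}_${columnName}_${rowIdx}`;

// decide whether an earlier, interrupted job for this exact file can be resumed
function isPartialUpload(
  map: LastChunkMap | null, key: string, url: string, size: number
): boolean {
  const entry = map?.[key];
  if (!entry) return false;
  return entry.jobUrl.indexOf(encodePath(url)) > -1 && // same file/namespace
    entry.lastChunkIdx > -1 &&                         // at least one chunk uploaded
    entry.fileSize === size &&                         // same file size
    !entry.uploadVersion;                              // job never finalized
}
```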

3. `checkFileExists` checks whether a file already exists by calling `fileExists` in `hatrac.js` for the `hatracObj`. A parameter including the `previousJobUrl` is passed to this call so a file upload can be resumed. It tracks the progress of the `checkFileExists` calls for each file and, once all are done, calls `createUploadJob`.
    - If the file already exists, creating the upload job is skipped and marked as complete, and `filesToUploadCt` is reduced by 1.
    - If a 403 is returned (the job/file exists but the current user can't read it), the same job is used with a new version.
    - If a 409 is returned, it could mean the namespace already exists.
        - If this occurs, check whether we have an existing job for that namespace that we know is partially uploaded (the decision is sketched below).
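The status-code handling itself lives in `hatrac.js` in `ERMrestJS`; the sketch below is only an approximation of the branches described above, with `headStatus` standing in for the result of the existence check:

```typescript
type FileExistsAction = 'skip' | 'new-version' | 'resume' | 'create';

// approximate decision table for checkFileExists; not the actual hatrac.js code
function resolveFileExists(headStatus: number, previousJobUrl: string | null): FileExistsAction {
  if (headStatus === 200) return 'skip';        // file already there: no upload job needed
  if (headStatus === 403) return 'new-version'; // exists but unreadable: same job, new version
  if (headStatus === 409 && previousJobUrl) {
    return 'resume';                            // namespace exists and we know a partial job
  }
  return 'create';                              // otherwise create a fresh upload job
}
```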

4. `createUploadJob` creates an upload job for each file by calling `createUploadJob` in `hatrac.js` for the `hatracObj`. It tracks upload-job creation progress for each file and, once all are done, calls `startUpload`.
    - If the file was marked to be skipped, the upload job is marked as complete (and never created).

5. `startUpload` calls `startQuededUpload`, which queues the files to be uploaded and then iterates over the queue, calling the `start` function in `hatrac.js` for the `hatracObj` for each file. A parameter including the `startChunkIdx` is passed to this call for resuming a file upload. It tracks upload progress for each file and, once all are done, calls `doQueuedJobCompletion`.
    - If a `startChunkIdx` is passed, `hatrac.js` continues the previous upload job from `startChunkIdx` instead of from the start of the file.
    - During the upload, after each chunk completes, the `lastContiguousChunkMap` is updated with information about the current upload job in case it is interrupted and resumed later (see the sketch below).
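That bookkeeping scans the sparse `chunkTracker` array kept by `hatrac.js` for the first unacknowledged slot; a minimal sketch, modeled on the loop added in `upload-progress-modal.tsx`:

```typescript
// returns the index of the last contiguous uploaded chunk, or -1 when no
// chunks have completed yet; a slot stays `empty` until its chunk is acknowledged
function lastContiguousIndex(chunkTracker: unknown[]): number {
  for (let i = 0; i < chunkTracker.length; i++) {
    if (chunkTracker[i] === null || chunkTracker[i] === undefined) return i - 1;
  }
  return chunkTracker.length - 1; // every chunk acknowledged
}

// e.g. chunks 0, 1, and 4 acknowledged while 2 and 3 are still in flight:
const tracker = [true, true, undefined, undefined, true];
console.log(lastContiguousIndex(tracker)); // -> 1, so a resume starts at chunk 2
```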

6. `completeUpload` calls `hatrac.js` to send the upload-job closure call to ERMrest for all files. It tracks job-completion progress for each file and, once all are done, sets the URL in the row and closes the modal.
    - When an upload job completes, its `lastContiguousChunkMap` entry is updated in case an interruption occurs while the remaining uploads are being finalized (sketched below).
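Marking completion amounts to stamping the tracked entry with the versioned URL, as in the sketch below (the type is re-declared inline for self-containment; the real update goes through `setLastContiguousChunk`):

```typescript
type LastChunkMap = {
  [key: string]: { lastChunkIdx: number; jobUrl: string; fileSize: number; uploadVersion?: string }
};

// once set, `uploadVersion` tells a later retry to skip this job rather than resume it
function markJobComplete(map: LastChunkMap, uploadKey: string, versionedUrl: string): LastChunkMap {
  const next = { ...map };
  if (next[uploadKey]) next[uploadKey] = { ...next[uploadKey], uploadVersion: versionedUrl };
  return next;
}
```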

7. The recordedit app listens for the modal close event. It saves the rows that were updated while uploading files by calling ERMrest.

8. If a checksum or network error occurs during the above steps, in some cases all uploads are aborted, the modal closes, and the recordedit app renders an error message.
12 changes: 12 additions & 0 deletions docs/dev-docs/manual-test.md
@@ -141,6 +141,18 @@ In [ErmrestDataUtils](https://github.com/informatics-isi-edu/ErmrestDataUtils),
- `cd /var/www/hatrac/js/chaise/<timestamp_txt-value>/<id-value>/`
- `ls -al` to list all contents and file sizes

## Test resume file upload
- Test that a file upload process can be resumed in the recordedit app when the connection to the server is lost (without refreshing the page).
- NOTE: I suggest modifying the number of retries in ERMrestJS so this can be tested more quickly.
1. Go to recordedit for the `upload:file` table created using the script above.
2. Fill in the form and use a file that is > 25 MB.
3. Open the developer tools and switch to the Network tab.
4. Submit the form.
5. Once you see 4 similar requests in the console output for uploading chunks, change your network connection from "No throttling" to "Offline".
   - This simulates the connection to the server being lost; Chaise will return the user to recordedit with an error alert showing.
6. Change the network connection back to "No throttling" and click submit again.
7. Still watching the network tab, verify that the original 4 chunks are not uploaded again and that the file upload process succeeds.

# Testing Session Timed out (and different user) data mutation events
The UX currently doesn't notify the user when their session state has changed. In some cases a user could log in and navigate to a page that allows create or update, then have their login status change before they submit the data to be mutated. Their session could have timed out (so they are treated as an anonymous user) or they could have changed to a different user entirely. This pertains to create/update in `recordedit`, pure and binary add in `record`, and anywhere that we show tables with rows that can be deleted.

111 changes: 102 additions & 9 deletions src/components/modals/upload-progress-modal.tsx
@@ -10,10 +10,14 @@ import useStateRef from '@isrd-isi-edu/chaise/src/hooks/state-ref';
import useRecordedit from '@isrd-isi-edu/chaise/src/hooks/recordedit';

// models
import { FileObject, UploadFileObject } from '@isrd-isi-edu/chaise/src/models/recordedit';
import {
FileObject, LastChunkMap,
LastChunkObject, UploadFileObject
} from '@isrd-isi-edu/chaise/src/models/recordedit';

// utils
import { humanFileSize } from '@isrd-isi-edu/chaise/src/utils/input-utils';
import { fixedEncodeURIComponent } from '@isrd-isi-edu/chaise/src/utils/uri-utils';

export interface UploadProgressModalProps {
/**
@@ -36,7 +40,7 @@

const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgressModalProps) => {

const { reference } = useRecordedit();
const { reference, setLastContiguousChunk, lastContiguousChunkRef } = useRecordedit();

const [title, setTitle] = useState<string>('');

@@ -116,7 +120,7 @@ const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgress
let tempFilesCt = 0,
tempTotalSize = 0;
// Iterate over all rows that are passed as parameters to the modal controller
rows.forEach((row: any) => {
rows.forEach((row: any, rowIdx: number) => {

// Create a tuple for the row
const tuple: UploadFileObject[] = [];
@@ -145,7 +149,7 @@ const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgress
tempFilesCt++;
tempTotalSize += row[k].file.size;

tuple.push(createUploadFileObject(row[k], column, row));
tuple.push(createUploadFileObject(row[k], column, row, rowIdx));
} else {
row[k] = (row[k] && row[k].url && row[k].url.length) ? row[k].url : null;
}
@@ -209,7 +213,11 @@ const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgress
setIsFileExists(true);
uploadRowsRef.current.forEach((row: UploadFileObject[]) => {
row.forEach((item: UploadFileObject) => {
item.hatracObj.fileExists().then(
let previousJobUrl = null;
if (item.partialUpload) previousJobUrl = lastContiguousChunkRef?.current?.[item.uploadKey].jobUrl;

// if there is a previous upload job that was tracked, send the url and use it for the upload job
item.hatracObj.fileExists(previousJobUrl).then(
() => onFileExistSuccess(item),
onError);
});
@@ -258,7 +266,7 @@ const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgress
speedIntervalTimer = setInterval(() => {
const diff = sizeTransferredRef.current - lastByteTransferredRef.current;
setLastByteTransferred(sizeTransferredRef.current);

if (diff > 0) setSpeed(humanFileSize(diff) + 'ps');
}, 1000);
};
@@ -271,7 +279,10 @@ const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgress
const item = queueRef.current.shift();
if (!item) return;

item.hatracObj.start().then(
let startChunkIdx = 0;
if (item.partialUpload) startChunkIdx = lastContiguousChunkRef.current?.[item.uploadKey].lastChunkIdx + 1;

item.hatracObj.start(startChunkIdx).then(
() => onUploadCompleted(item),
onError,
(size: number) => onProgressChanged(item, size));
@@ -328,10 +339,11 @@ const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgress
* @param {FileObject} data - FileObject for the file column
* @param {Ermrest.Column} column - Column Object
* @param {Object} row - Json key value Object of row values from the recordedit form
* @param {number} rowIdx - index of the record form this UploadFileObject is associated with (the attached `row`'s index)
* @desc
* Creates an uploadFile obj to keep track of file and its upload.
*/
const createUploadFileObject = (data: FileObject, column: any, row: any): UploadFileObject => {
const createUploadFileObject = (data: FileObject, column: any, row: any, rowIdx: number): UploadFileObject => {
const file = data.file;

const uploadFileObject: UploadFileObject = {
@@ -341,6 +353,10 @@ const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgress
checksumProgress: 0,
checksumPercent: 0,
checksumCompleted: false,
partialUpload: false,
// a key for storing in `lastContiguousChunk` map made up of the file checksum, column name, and form index
// initialized without a checksum value since none has been calculated yet
uploadKey: `_${column.name}_${rowIdx}`,
jobCreateDone: false,
fileExistsDone: false,
uploadCompleted: false,
@@ -352,7 +368,8 @@ const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgress
url: '',
column: column,
reference: reference,
row: row
row: row,
rowIdx: rowIdx
}

return uploadFileObject;
@@ -400,6 +417,38 @@ const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgress
ufo.checksumCompleted = true;
ufo.url = url;
setChecksumCompleted((prev: number) => prev + 1);

// update uploadKey
// use the calculated md5 to see if we have a partial upload in case of a timeout
ufo.uploadKey = `${ufo.hatracObj.hash.md5_base64}_${ufo.column.name}_${ufo.rowIdx}`;
// lastContiguousChunk is initialized to null, so make sure it has been defined (meaning an upload didn't complete)
if (lastContiguousChunkRef?.current) {
const lccMap: LastChunkObject = lastContiguousChunkRef.current[ufo.uploadKey];

// the 'jobUrl' we stored in lastContiguousChunkRef is what was returned from the server where each part of the path is url encoded
// do the same for the newly generated url only for comparison
// NOTE: the file upload path is encoded per folder in the path, matching how the server handles it
const urlParts = url.split('/');
urlParts.forEach((part: string, idx: number) => {
urlParts[idx] = fixedEncodeURIComponent(part);
})
const newUrl = urlParts.join('/');

// check for the following:
// - newUrl being part of a tracked partial upload job
// - lastChunkIdx is 0 or greater meaning partial upload job has some chunks uploaded
// - file size for partial upload job matches file to upload
// - there is no upload version set
// NOTE: newUrl and jobUrl should be almost the same,
// with jobUrl having more info appended about the existing uploadJob
if (lccMap?.jobUrl.indexOf(newUrl) > -1 &&
lccMap.lastChunkIdx > -1 &&
lccMap.fileSize === ufo.size &&
!lccMap.uploadVersion
) {
ufo.partialUpload = true;
}
}
}
};

@@ -490,6 +539,41 @@ const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgress

if (erred.current || aborted.current) return;

// when the chunks array is created at hatracObj, chunkTracker is initialized with n `empty` values where n is the number of chunks
if (ufo.hatracObj.chunkTracker.length > 0) {
for (let i = 0; i < ufo.hatracObj.chunkTracker.length; i++) {
// once we've found the first null or undefined value, set the lastChunkIdx to the index before the first null/undefined
// this could be index 0 which sets the value to -1, meaning no chunks have been uploaded yet
if (ufo.hatracObj.chunkTracker[i] === null || ufo.hatracObj.chunkTracker[i] === undefined) {
setLastContiguousChunk((prevVal: LastChunkMap) => {
let tempMap: LastChunkMap;
// lastContiguousChunk (prevVal) is null until a chunk has been uploaded
if (prevVal) {
tempMap = { ...prevVal }
} else {
tempMap = {};
}

if (!tempMap[ufo.uploadKey]) {
// initialize the object for the uploadKey
tempMap[ufo.uploadKey] = {
lastChunkIdx: i - 1,
jobUrl: ufo.hatracObj.chunkUrl,
fileSize: ufo.size
}
} else {
// update the last chunk index
tempMap[ufo.uploadKey].lastChunkIdx = i - 1;
}

return tempMap;
});

break;
}
}
}

// This code updates the individual progress bar for uploading file
ufo.uploadStarted = true;
ufo.completeUploadJob = false;
@@ -558,6 +642,15 @@ const UploadProgressModal = ({ rows, show, onSuccess, onCancel }: UploadProgress

ufo.completeUploadJob = true;
ufo.versionedUrl = url;
ufo.partialUpload = false;
setLastContiguousChunk((prevVal: LastChunkMap) => {
const tempMap: LastChunkMap = { ...prevVal }

// set the uploadVersion to communicate the job has completed
// ensures an upload won't resume and instead be skipped to finished
if (tempMap[ufo.uploadKey]) tempMap[ufo.uploadKey].uploadVersion = url;
return tempMap;
});

// This code updates the main progress bar for job completion progress for all files
let progress = 0;
23 changes: 21 additions & 2 deletions src/models/recordedit.ts
@@ -103,7 +103,7 @@ export type RecordeditForeignkeyCallbacks = {
* - domainFilterFormNumber: The formNumber that should be used for generating the filteredRef
* (this is useful for multi-form input where we don't necessarily have the first form selected)
*/
onAttemptToChange?: () => Promise<{allowed: boolean, domainFilterFormNumber?: number}>;
onAttemptToChange?: () => Promise<{ allowed: boolean, domainFilterFormNumber?: number }>;
}

export interface RecordeditColumnModel {
@@ -151,6 +151,8 @@ export interface UploadFileObject {
checksumProgress: number;
checksumPercent: number;
checksumCompleted: boolean;
partialUpload: boolean;
uploadKey: string;
skipUploadJob?: boolean;
jobCreateDone: boolean;
fileExistsDone: boolean;
@@ -164,7 +166,8 @@
versionedUrl?: string;
column: any,
reference: any,
row: any
row: any,
rowIdx: number
}

export interface PrefillObject {
@@ -190,6 +193,22 @@
rowname: any;
}

export interface LastChunkMap {
/* key in the form of `${file.md5_base64}_${column_name}_${record_index}` */
[key: string]: LastChunkObject;
}

export interface LastChunkObject {
/* the index of the last chunk that was uploaded */
lastChunkIdx: number;
/* the path to the file being uploaded and its specific upload job */
jobUrl: string;
/* the size of the file being uploaded */
fileSize: number;
/* the path to the file after it's been uploaded and version info is generated */
uploadVersion?: string;
}
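
// For illustration only (hypothetical values, following the structure comment
// in src/providers/recordedit.tsx): a populated entry tracking a partially
// uploaded 50 MB file might look like this.
const exampleMap: LastChunkMap = {
  // key: `${file.md5_base64}_${column_name}_${record_index}`
  'q1w2e3r4t5y6==_file_col_0': {
    lastChunkIdx: 3,                                    // chunks 0-3 acknowledged contiguously
    jobUrl: '/hatrac/path/to/file.png;upload/somehash', // job url returned by the hatrac server
    fileSize: 52428800,                                 // 50 MB
    // uploadVersion stays unset until the job is finalized,
    // e.g. uploadVersion: '/hatrac/path/to/file.png:version'
  }
};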

export interface UploadProgressProps {
/**
* rows of data from recordedit form to get file values from
30 changes: 29 additions & 1 deletion src/providers/recordedit.tsx
@@ -7,7 +7,7 @@ import useStateRef from '@isrd-isi-edu/chaise/src/hooks/state-ref';

// models
import {
appModes, PrefillObject, RecordeditColumnModel,
appModes, LastChunkMap, PrefillObject, RecordeditColumnModel,
RecordeditConfig, RecordeditDisplayMode, RecordeditForeignkeyCallbacks, RecordeditModalOptions
} from '@isrd-isi-edu/chaise/src/models/recordedit';
import { LogActions, LogReloadCauses, LogStackPaths, LogStackTypes } from '@isrd-isi-edu/chaise/src/models/log';
@@ -98,6 +102,10 @@ export const RecordeditContext = createContext<{
showSubmitSpinner: boolean,
resultsetProps?: ResultsetProps,
uploadProgressModalProps?: UploadProgressProps,
/* for updating the last contiguous chunk tracking info */
setLastContiguousChunk: (arg0: any) => void,
/* ref (via useStateRef) to the current lastContiguousChunk value */
lastContiguousChunkRef: any,
/* max rows allowed to add constant */
MAX_ROWS_TO_ADD: number,
/**
@@ -224,6 +228,28 @@ export default function RecordeditProvider({
const [showSubmitSpinner, setShowSubmitSpinner] = useState(false);
const [resultsetProps, setResultsetProps] = useState<ResultsetProps | undefined>();
const [uploadProgressModalProps, setUploadProgressModalProps] = useState<UploadProgressProps | undefined>();
/*
* Object for keeping track of each file and its existing upload job so we can resume on interruption
*
* For example, we have the following 3 scenarios:
* 1. contiguous offset: 1; chunks in flight with index 2, 3; chunk completed with index 4 (after chunk at index 4 is acknowledged w/ 204)
* - [0, 1, empty, empty, 4]
* 2. contiguous offset: 1; chunks in flight with index 2; chunks completed with index 3, 4 (after chunk at index 3 is acknowledged w/ 204)
* - [0, 1, empty, 3, 4]
* 3. contiguous offset: 4; (after chunk at index 2 is acknowledged w/ 204)
* - [0, 1, 2, 3, 4]
*
* Object structure is as follows where index is the index of the last contiguous chunk that was uploaded.
* {
* `${file.md5_base64}_${column_name}_${record_index}`: {
* lastChunkIdx: index
* jobUrl: uploadJob.hash ( in the form of '/hatrac/path/to/file.png;upload/somehash')
* fileSize: size_in_bytes,
* uploadVersion: versioned_url ( in the form of '/hatrac/path/to/file.png:version')
* }
* }
*/
const [lastContiguousChunk, setLastContiguousChunk, lastContiguousChunkRef] = useStateRef<LastChunkMap | null>(null);

const [tuples, setTuples, tuplesRef] = useStateRef<any[]>(Array.isArray(initialTuples) ? initialTuples : []);

@@ -1064,6 +1090,8 @@ export default function RecordeditProvider({
showSubmitSpinner,
resultsetProps,
uploadProgressModalProps,
setLastContiguousChunk,
lastContiguousChunkRef,
MAX_ROWS_TO_ADD: maxRowsToAdd,

// log related: