author | title |
---|---|
Rick Stevens | CANDLE Project KPP-1 Verification |
The CANDLE challenge problem is to solve large-scale machine learning (ML) problems for two cancer-related pilot applications: (1) predicting drug response and (2) predicting cancer phenotypes and treatment trajectories from patient documents. The CANDLE pilot application that involves predicting the state of molecular dynamics simulations is treated as a stretch goal. The CANDLE project has specific strategies to address these challenges. For the drug response problem, unsupervised ML methods are used to capture the complex, nonlinear relationships between the properties of drugs and the properties of tumors, in order to predict response to treatment and thereby develop a model that can provide treatment recommendations for a given tumor. For the treatment strategy problem, semi-supervised ML is used to automatically read and encode millions of clinical reports into a form that can be computed upon. Each problem requires a different approach to the embedded learning problem, but all are supported by the same scalable deep learning code in CANDLE.
The challenge for exascale manifests in the need to train large numbers of models. One need inherent to each of the pilot applications is producing high-resolution models that cover the space of specific predictions, individualized in the precision-medicine sense. For example, consider training a model that is specific to a particular drug and an individual cancer. Starting with 1000 different cancer cell lines and 1000 different drugs, a leave-one-out strategy that creates a high-resolution model for every drug-by-cancer combination requires approximately 1 million models. Yet these models are similar enough that a transfer learning strategy, in which weights are shared during training in a way that avoids information leakage, can significantly reduce the time needed to train the full set of models.
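As a rough illustration of the model-count arithmetic and of how such a workload might be grouped for weight transfer, the following Python sketch uses hypothetical identifiers (`cell_lines`, `drugs`, and a naive grouping-by-drug rule); none of these names are taken from the CANDLE codebase.

```python
from itertools import product

# Hypothetical identifiers standing in for the catalogs of ~1000 cancer
# cell lines and ~1000 drugs mentioned above.
cell_lines = [f"CL{i:04d}" for i in range(1000)]
drugs = [f"D{j:04d}" for j in range(1000)]

# Leave-one-out over the drug-by-cell-line grid: one specialized model per
# (cell line, drug) pair, i.e. roughly 10^6 models in total.
model_specs = list(product(cell_lines, drugs))
print(len(model_specs))  # 1000000

# Transfer-learning idea: instead of training every model from scratch,
# group "similar enough" specs (here, naively, by drug) under one parent
# model whose trained weights seed every child model in the group.
groups: dict[str, list[tuple[str, str]]] = {}
for cell_line, drug in model_specs:
    groups.setdefault(drug, []).append((cell_line, drug))

# Each group pays for one full training run; the remaining members of the
# group only need a short fine-tuning pass from the shared weights.
print(len(groups))  # 1000 parent models rather than 10^6 full trainings
```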
In practice, the speedup attributable to weight sharing can be discussed in the context of the challenge problem by comparing the work actually done against the naive work that would be done if every model were trained from scratch.
Several parameters must be considered when accelerating model training via the transfer of weights. These include the number of transfer events, how the input data are partitioned, and how many epochs are run before a transfer occurs. Additional considerations include which weights to transfer and whether to allow those weights to be updated in subsequent models.
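The sketch below illustrates these knobs in a minimal Keras-style workflow. The network architecture, the synthetic data, and names such as `make_base_model`, `shared_1`, and `shared_2` are assumptions for illustration only, not the CANDLE benchmark models.

```python
import numpy as np
from tensorflow import keras

def make_base_model(n_features: int, n_outputs: int) -> keras.Model:
    """Small feed-forward network standing in for a CANDLE-style model."""
    return keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(256, activation="relu", name="shared_1"),
        keras.layers.Dense(128, activation="relu", name="shared_2"),
        keras.layers.Dense(n_outputs, name="head"),
    ])

# Synthetic stand-ins for two related (drug, cell-line) training partitions.
rng = np.random.default_rng(0)
x_a, y_a = rng.normal(size=(512, 64)), rng.normal(size=(512, 1))
x_b, y_b = rng.normal(size=(512, 64)), rng.normal(size=(512, 1))

# 1. Train the source model for a fixed number of epochs before transfer.
source = make_base_model(64, 1)
source.compile(optimizer="adam", loss="mse")
source.fit(x_a, y_a, epochs=5, verbose=0)   # epochs-before-transfer knob

# 2. Transfer: copy weights into a fresh model for the next partition.
target = make_base_model(64, 1)
target.set_weights(source.get_weights())    # which weights to transfer

# 3. Optionally freeze the transferred layers so only the head is updated.
for layer in target.layers:
    if layer.name.startswith("shared_"):
        layer.trainable = False             # allow/deny further updates

target.compile(optimizer="adam", loss="mse")
target.fit(x_b, y_b, epochs=2, verbose=0)   # short fine-tuning pass
```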
Table: Challenge problem details
Functional requirement | Minimum criteria |
---|---|
Physical phenomena and associated models | Deep learning neural networks for cancer: feed-forward, auto-encoder, and recurrent neural networks. |
Numerical approach, optimization and associated models | Gradient descent of model parameters of the loss function; network activation functions; regularization and learning rate scaling methods. |
Simulation details | Large-scale ML solutions will be computed for the three cancer pilots. |
Demonstration calculation requirements | The computation performed at scale will be standard neural network computations: matrix multiplies, 2D convolutions, pooling, and so on. These will be specifically defined by the models chosen to demonstrate transfer learning. The computations performed at scale will require weight sharing. |
List significant I/O, workflow, and/or third-party library requirements that need facility support. (Note: this is outside of standard MPI and ECP-supported software technologies.)
Describe your problem inputs, setup, estimate of resource allocation, and runtime settings
Describe the artifacts produced during the simulation, e.g., output files, and the mechanisms that are used to process the output to verify that capabilities have been demonstrated.
Give evidence that:
- The FOM measurement met the threshold ($>50$)
- The executed problem met the challenge problem minimum criteria