1. Label Embedded Dictionary Learning for Image Classification
Recently, the label consistent K-SVD (LC-KSVD) algorithm has been successfully applied to image classification. The objective function of LC-KSVD consists of reconstruction error, classification error and discriminative sparse code error terms with an $\ell_0$-norm sparse regularization term. The $\ell_0$-norm, however, makes the problem NP-hard. Although methods such as orthogonal matching pursuit can mitigate this to some extent, it remains difficult to find the optimal sparse solution. To overcome this limitation, we propose a label embedded dictionary learning (LEDL) method that uses the $\ell_1$-norm as the sparse regularization term, so that the hard-to-optimize problem is avoided by solving a convex optimization problem instead. The alternating direction method of multipliers (ADMM) and blockwise coordinate descent algorithms are then used to optimize the corresponding objective function. Extensive experimental results on six benchmark datasets show that the proposed algorithm achieves superior performance compared to conventional classification algorithms.
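To make the formulation concrete, a plausible form of the LEDL objective, written in our own notation with the $\ell_0$ term of LC-KSVD replaced by an $\ell_1$ term (the paper's exact weights and constraints may differ), is

$$ \min_{D,A,W,X}\ \|Y - DX\|_F^2 + \alpha\,\|Q - AX\|_F^2 + \beta\,\|H - WX\|_F^2 + \lambda\,\|X\|_1, $$

where $Y$ holds the training samples, $D$ is the dictionary, $X$ the sparse codes, $Q$ the target discriminative sparse codes with linear transform $A$, $H$ the label matrix with linear classifier $W$, and $\alpha,\beta,\lambda$ balance the three error terms against sparsity. Unlike the $\ell_0$ version, each block subproblem here is convex, which is what makes ADMM and blockwise coordinate descent applicable.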
2. Attack Type Agnostic Perceptual Enhancement of Adversarial Images
Adversarial images are samples that are intentionally modified to deceive machine learning systems. They are widely used in applications such as CAPTCHAs to help distinguish legitimate human users from bots. However, the noise introduced during the adversarial image generation process degrades perceptual quality and introduces artificial colours, making it difficult for humans as well to classify images and recognise objects. In this letter, we propose a method to enhance the perceptual quality of these adversarial images. The proposed method is attack type agnostic and can be used in conjunction with existing attacks in the literature. Our experiments show that the generated adversarial images have lower Euclidean distances to the original images while maintaining the same adversarial attack performance. Distances are reduced by 5.88% to 41.27%, with an average reduction of 22%, over the different attack and network types.
3. Correction of Electron Back-scattered Diffraction datasets using an evolutionary algorithm
In materials science and particularly electron microscopy, Electron Back-scatter Diffraction (EBSD) is a common and powerful mapping technique for collecting local crystallographic data at the sub-micron scale. The quality of the reconstruction of the maps is critical to study the spatial distribution of phases and crystallographic orientation relationships between phases, a key interest in materials science. However, EBSD data is known to suffer from distortions that arise from several instrument and detector artifacts. In this paper, we present an unsupervised method that corrects those distortions, and enables or enhances phase differentiation in EBSD data. The method uses a segmented electron image of the phases of interest (laths, precipitates, voids, inclusions), gathered using detectors that generate less distorted data, of the same area as the EBSD map, and then searches for the best transformation to correct the distortions of the initial EBSD data. To do so, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is implemented to distort the EBSD data until it matches the reference electron image. Fast and versatile, this method does not require any human annotation and can be applied to large datasets and wide areas where the distortions are significant. Moreover, the method makes very few assumptions about the shape of the distortion function. Application examples in multiphase materials with feature sizes down to 1 $\mu$m are presented, including a Titanium alloy and a Nickel-base superalloy.
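As an illustration of the search step, here is a minimal Python sketch using the off-the-shelf cma package to fit a distortion model by minimizing the mismatch between the warped EBSD map and the segmented reference image. The simple affine parameterization, the mean-squared-error measure and the placeholder data are our assumptions for the sketch, not the paper's exact choices.

    import numpy as np
    import cma  # pip install cma
    from scipy.ndimage import map_coordinates

    def warp_affine(img, p):
        # p = [a, b, tx, c, d, ty]: affine warp of pixel coordinates
        h, w = img.shape
        ys, xs = np.mgrid[0:h, 0:w].astype(float)
        xw = p[0] * xs + p[1] * ys + p[2]
        yw = p[3] * xs + p[4] * ys + p[5]
        return map_coordinates(img, [yw, xw], order=1, mode='nearest')

    def mismatch(p, ebsd, ref):
        # mean squared difference between warped EBSD map and reference
        return float(np.mean((warp_affine(ebsd, p) - ref) ** 2))

    # placeholder data: in practice these are the EBSD phase map and the
    # segmented, less-distorted electron image of the same area
    rng = np.random.default_rng(0)
    reference_img = rng.random((64, 64))
    ebsd_map = warp_affine(reference_img, [1.01, 0.02, 1.5, -0.01, 0.99, -2.0])

    x0 = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]  # start from the identity transform
    es = cma.CMAEvolutionStrategy(x0, 0.05, {'maxiter': 200})
    while not es.stop():
        cands = es.ask()
        es.tell(cands, [mismatch(c, ebsd_map, reference_img) for c in cands])
    corrected = warp_affine(ebsd_map, es.result.xbest)

Because CMA-ES only needs fitness values, any richer distortion model (e.g., higher-order polynomials) can be swapped into warp_affine without changing the optimization loop.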
4. Ultrasound Image Representation Learning by Modeling Sonographer Visual Attention
Image representations are commonly learned from class labels, which are a simplistic approximation of human image understanding. In this paper we demonstrate that transferable representations of images can be learned without manual annotations by modeling human visual attention. The basis of our analyses is a unique gaze tracking dataset of sonographers performing routine clinical fetal anomaly screenings. Models of sonographer visual attention are learned by training a convolutional neural network (CNN) to predict gaze on ultrasound video frames through visual saliency prediction or gaze-point regression. We evaluate the transferability of the learned representations to the task of ultrasound standard plane detection in two contexts. Firstly, we perform transfer learning by fine-tuning the CNN with a limited number of labeled standard plane images. We find that fine-tuning the saliency predictor is superior to training from random initialization, with an average F1-score improvement of 9.6% overall and 15.3% for the cardiac planes. Secondly, we train a simple softmax regression on the feature activations of each CNN layer in order to evaluate the representations independently of transfer learning hyper-parameters. We find that the attention models derive strong representations, approaching the precision of a fully-supervised baseline model for all but the last layer.
5. Temporal Registration in Application to In-utero MRI Time Series
We present a robust method to correct for motion in volumetric in-utero MRI time series. Time-course analysis of such series often suffers from substantial and unpredictable fetal motion. Registration provides voxel correspondences between images and is commonly employed for motion correction. Current registration methods often fail when aligning images that are substantially different from a template (reference image). To achieve accurate and robust alignment, we make a Markov assumption on the nature of motion and take advantage of the temporal smoothness in the image data. Forward message passing in the corresponding hidden Markov model (HMM) yields an estimation algorithm that only has to account for relatively small motion between consecutive frames. We evaluate the utility of the temporal model in the context of in-utero MRI time series alignment by examining the accuracy of propagated segmentation label maps. Our results suggest that the proposed model accurately captures the temporal dynamics of transformations in in-utero MRI time series.
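To illustrate why the Markov assumption helps, here is a toy Python sketch in which each frame is registered only to its immediate predecessor and the small frame-to-frame motions are composed, standing in for the forward pass of the HMM. Translation-only phase correlation replaces the paper's full transformation model and is purely our simplification.

    import numpy as np

    def shift_between(a, b):
        # translation from frame a to frame b via phase correlation
        F = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
        r = np.fft.ifft2(F / (np.abs(F) + 1e-9)).real
        dy, dx = np.unravel_index(np.argmax(r), r.shape)
        h, w = a.shape
        return (dy - h if dy > h // 2 else dy,
                dx - w if dx > w // 2 else dx)

    def forward_alignment(frames):
        """Compose small frame-to-frame motions so that every frame only
        needs to be registered to its immediate temporal neighbour,
        rather than to a possibly very different template."""
        total = np.zeros(2)
        offsets = [tuple(total)]
        for prev, cur in zip(frames, frames[1:]):
            total = total + np.array(shift_between(prev, cur))
            offsets.append(tuple(total))
        return offsets  # cumulative shift of each frame relative to frame 0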
6. COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis
There is a substantial number of instructional videos on the Internet, which enable us to acquire knowledge for completing various tasks. However, most existing datasets for instructional video analysis have limitations in diversity and scale, which makes them far from many real-world applications where more diverse activities occur. Moreover, it still remains a great challenge to organize and harness such data. To address these problems, we introduce a large-scale dataset called "COIN" for COmprehensive INstructional video analysis. Organized with a hierarchical structure, the COIN dataset contains 11,827 videos of 180 tasks in 12 domains (e.g., vehicles, gadgets, etc.) related to our daily life. With a newly developed toolbox, all the videos are annotated effectively with a series of step descriptions and the corresponding temporal boundaries. Furthermore, we propose a simple yet effective method to capture the dependencies among different steps, which can be easily plugged into conventional proposal-based action detection methods for localizing important steps in instructional videos. In order to provide a benchmark for instructional video analysis, we evaluate a wide range of approaches on the COIN dataset under different evaluation criteria. We expect the introduction of the COIN dataset to promote future in-depth research on instructional video analysis in the community.
7. Exploit fully automatic low-level segmented PET data for training high-level deep learning algorithms for the corresponding CT data
In this study, we present an approach for fully automatic urinary bladder segmentation in CT images with artificial neural networks. Automatic medical image analysis has become an invaluable tool in the different treatment stages of diseases. Medical image segmentation in particular plays a vital role, since segmentation is often the initial step in an image analysis pipeline. Since deep neural networks have made a large impact on the field of image processing in the past years, we use two different deep learning architectures to segment the urinary bladder. Both of these architectures are based on pre-trained classification networks that are adapted to perform semantic segmentation. Since deep neural networks require a large amount of training data, specifically images and corresponding ground truth labels, we furthermore propose a method to generate such a suitable training dataset from Positron Emission Tomography/Computed Tomography image data. This is done by applying thresholding to the Positron Emission Tomography data to obtain ground truth labels, and by utilizing data augmentation to enlarge the dataset. We discuss the influence of data augmentation on the segmentation results, and compare and evaluate the proposed architectures in terms of qualitative and quantitative segmentation performance. The presented results suggest that deep neural networks are a promising approach to segmenting the urinary bladder in CT images.
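The label-generation idea can be sketched in a few lines of Python. The 40% threshold fraction and the flip-only augmentation are illustrative assumptions, not the paper's exact settings; the key point is that image and label must be transformed jointly.

    import numpy as np

    def pet_to_labels(pet_volume, frac=0.4):
        """Binary ground-truth mask from a co-registered PET volume by
        thresholding at a fraction of the maximum uptake (the fraction
        0.4 is an illustrative assumption)."""
        return (pet_volume >= frac * pet_volume.max()).astype(np.uint8)

    def augment_pair(image, label, rng):
        # joint augmentation: random flips applied to both the CT slice
        # and its PET-derived label so that they stay aligned
        if rng.random() < 0.5:
            image, label = image[:, ::-1], label[:, ::-1]
        if rng.random() < 0.5:
            image, label = image[::-1, :], label[::-1, :]
        return image.copy(), label.copy()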
8. Active Scene Learning
Sketch recognition allows natural and efficient interaction in pen-based interfaces. A key obstacle to building accurate sketch recognizers has been the difficulty of creating large amounts of annotated training data. Several authors have attempted to address this issue by creating synthetic data, and by building tools that support efficient annotation. Two prominent sets of approaches stand out. Both use interim classifiers trained with a small set of labeled data to aid the labeling of the remainder of the data. The first set of approaches uses a classifier trained with a partially labeled dataset to automatically label unlabeled instances. The second, based on active learning, saves annotation effort by giving priority to labeling informative data instances. The former is sub-optimal since it does not prioritize the order of labeling to favor informative instances, while the latter makes the strong assumption that unlabeled data comes in an already segmented form (i.e. the ink in the training data is already assembled into groups forming isolated object instances). In this paper, we propose an active learning framework that combines the strengths of these methods while addressing their weaknesses. In particular, we propose two methods for deciding how batches of unsegmented sketch scenes should be labeled. The first method, scene-wise selection, assesses the informativeness of each drawing (sketch scene) as a whole, and asks the user to annotate all objects in the drawing. The second, segment-wise selection, attempts more precise targeting to locate informative fragments of drawings for user labeling. We show that both selection schemes outperform random selection. Furthermore, we demonstrate that precise targeting yields superior performance. Overall, our approach reaches top accuracy figures with up to 30% savings in annotation cost.
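The contrast between the two selection schemes can be sketched as follows, using predictive entropy as a stand-in informativeness measure (the paper's actual measure may differ; scene_probs is assumed to be a list of scenes, each a list of per-segment class-probability vectors).

    import numpy as np

    def entropy(p):
        # predictive entropy of one class-probability vector
        p = np.clip(p, 1e-12, 1.0)
        return float(-np.sum(p * np.log(p)))

    def scene_wise_selection(scene_probs, k):
        """Rank whole scenes by the mean uncertainty of their segments
        and request annotation of the k most informative scenes."""
        scores = [np.mean([entropy(p) for p in segs]) for segs in scene_probs]
        return np.argsort(scores)[::-1][:k]

    def segment_wise_selection(scene_probs, k):
        """Rank individual segments across all scenes and request
        annotation of only the k most informative fragments."""
        ranked = sorted(
            ((entropy(p), (i, j))
             for i, segs in enumerate(scene_probs)
             for j, p in enumerate(segs)),
            key=lambda t: -t[0])
        return [idx for _, idx in ranked[:k]]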
9. Weakly Supervised Complementary Parts Models for Fine-Grained Image Classification from the Bottom Up
Given a training dataset composed of images and corresponding category labels, deep convolutional neural networks show a strong ability in mining discriminative parts for image classification. However, deep convolutional neural networks trained with image level labels only tend to focus on the most discriminative parts while missing other object parts, which could provide complementary information. In this paper, we approach this problem from a different perspective. We build complementary parts models in a weakly supervised manner to retrieve information suppressed by dominant object parts detected by convolutional neural networks. Given image level labels only, we first extract rough object instances by performing weakly supervised object detection and instance segmentation using Mask R-CNN and CRF-based segmentation. Then we estimate and search for the best parts model for each object instance under the principle of preserving as much diversity as possible. In the last stage, we build a bi-directional long short-term memory (LSTM) network to fuse and encode the partial information of these complementary parts into a comprehensive feature for image classification. Experimental results indicate that the proposed method not only achieves significant improvement over our baseline models, but also outperforms state-of-the-art algorithms by a large margin (6.7%, 2.8% and 5.2%, respectively) on Stanford Dogs 120, Caltech-UCSD Birds-200-2011 and Caltech 256.
10. SR-LSTM: State Refinement for LSTM towards Pedestrian Trajectory Prediction
In crowd scenarios, reliable trajectory prediction of pedestrians requires an insightful understanding of their social behaviors. These behaviors have been well investigated in many studies, yet they are hard to fully express with hand-crafted rules. Recent studies based on LSTM networks have shown great ability to learn social behaviors. However, many of these methods rely on previous neighboring hidden states while ignoring the important current intentions of the neighbors. To address this issue, we propose a data-driven state refinement module for LSTM networks (SR-LSTM), which activates the utilization of the current intentions of neighbors and jointly and iteratively refines the current states of all participants in the crowd through a message passing mechanism. To effectively extract the social effect of neighbors, we further introduce a social-aware information selection mechanism, consisting of an element-wise motion gate and pedestrian-wise attention, to select useful messages from neighboring pedestrians. Experimental results on two public datasets, i.e. ETH and UCY, demonstrate the effectiveness of our proposed SR-LSTM, which achieves state-of-the-art results.
11. Hair Segmentation on Time-of-Flight RGBD Images
Robust segmentation of hair from portrait images remains challenging: hair does not conform to a uniform shape, style or even color, and dark hair in particular lacks features. We present a novel computational imaging solution that tackles the problem from both the input and processing fronts. We explore using Time-of-Flight (ToF) RGBD sensors on recent mobile devices. We first conduct a comprehensive analysis to show that scattering and inter-reflection cause different noise patterns on hair vs. non-hair regions in ToF images, by changing the light path and/or combining multiple paths. We then develop a deep network based approach that employs both the ToF depth map and the RGB gradient maps to produce an initial hair segmentation with labeled hair components. We then refine the result by imposing a ToF noise prior within a conditional random field. We collect the first ToF RGBD hair dataset, with 20k+ head images captured from 30 human subjects with a variety of hairstyles at different view angles. Comprehensive experiments show that our approach outperforms RGB-based techniques in accuracy and robustness and can handle traditionally challenging cases such as dark hair, similar hair/background, similar hair/foreground, etc.
12. Using DP Towards A Shortest Path Problem-Related Application
The detection of curved lanes is still challenging for autonomous driving systems. Although current cutting-edge approaches have performed well in real applications, most of them are based on strict model assumptions. Similar to other visual recognition tasks, lane detection can be formulated as a two-dimensional graph searching problem, which can be solved by finding several optimal paths along line segments and boundaries. In this paper, we present a directed graph model in which dynamic programming is used to solve a specific shortest path problem. This model is particularly suitable for representing objects with a long continuous shape structure, e.g., lanes and roads. We apply the designed model and propose an algorithm for detecting lanes by formulating detection as a shortest path problem. To evaluate the performance of the proposed algorithm, we tested five sequences (comprising 1573 frames) from the KITTI database. The results show that our method achieves an average successful detection precision of 97.5%.
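To make the graph-searching formulation concrete, here is a self-contained dynamic-programming sketch that finds a minimal-cost top-to-bottom path through a 2D cost map, the kind of column-continuous path a lane boundary traces out. The cost-map setup and the one-column transition constraint are our illustrative choices, not the paper's exact graph construction.

    import numpy as np

    def dp_shortest_path(cost):
        """Minimal-cost path from the top row to the bottom row of a 2D
        cost map; each step moves one row down and at most one column
        sideways, enforcing a long continuous shape."""
        h, w = cost.shape
        acc = cost.astype(float).copy()
        back = np.zeros((h, w), dtype=int)
        for y in range(1, h):
            for x in range(w):
                lo, hi = max(0, x - 1), min(w, x + 2)
                j = int(np.argmin(acc[y - 1, lo:hi])) + lo
                acc[y, x] += acc[y - 1, j]
                back[y, x] = j
        # backtrack from the cheapest endpoint in the last row
        x = int(np.argmin(acc[-1]))
        path = [x]
        for y in range(h - 1, 0, -1):
            x = back[y, x]
            path.append(x)
        return path[::-1]  # column index of the path in each row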
13. RAVEN: A Dataset for Relational and Analogical Visual rEasoNing
Dramatic progress has been witnessed in basic vision tasks involving low-level perception, such as object recognition, detection, and tracking. Unfortunately, there is still an enormous performance gap between artificial vision systems and human intelligence in terms of higher-level vision problems, especially ones involving reasoning. Earlier attempts in equipping machines with high-level reasoning have hovered around Visual Question Answering (VQA), one typical task associating vision and language understanding. In this work, we propose a new dataset, built in the context of Raven's Progressive Matrices (RPM) and aimed at lifting machine intelligence by associating vision with structural, relational, and analogical reasoning in a hierarchical representation. Unlike previous works measuring abstract reasoning with RPM, we establish a semantic link between vision and reasoning by providing structure representation. This addition enables a new type of abstract reasoning that jointly operates on the structure representation. We evaluate the reasoning ability of modern computer vision models on this newly proposed dataset, and also provide human performance as a reference. Finally, we show consistent improvement across all models by incorporating a simple neural module that combines visual understanding and structure reasoning.
14. CE-Net: Context Encoder Network for 2D Medical Image Segmentation
Medical image segmentation is an important step in medical image analysis. With the rapid development of convolutional neural networks in image processing, deep learning has been used for medical image segmentation, such as optic disc segmentation, blood vessel detection, lung segmentation, cell segmentation, etc. Previously, U-Net based approaches have been proposed. However, the consecutive pooling and strided convolutional operations lead to the loss of some spatial information. In this paper, we propose a context encoder network (referred to as CE-Net) to capture more high-level information and preserve spatial information for 2D medical image segmentation. CE-Net mainly contains three major components: a feature encoder module, a context extractor and a feature decoder module. We use pretrained ResNet blocks as the fixed feature extractor. The context extractor module is formed by a newly proposed dense atrous convolution (DAC) block and a residual multi-kernel pooling (RMP) block. We applied the proposed CE-Net to different 2D medical image segmentation tasks. Comprehensive results show that the proposed method outperforms the original U-Net method and other state-of-the-art methods for optic disc segmentation, vessel detection, lung segmentation, cell contour segmentation and retinal optical coherence tomography layer segmentation.
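As a rough illustration of the atrous-convolution idea behind the DAC block, here is a minimal PyTorch sketch of a block with parallel dilated branches. The dilation rates, branch count and residual wiring are our assumptions for illustration, not the paper's precise DAC topology.

    import torch
    import torch.nn as nn

    class DilatedBranchBlock(nn.Module):
        """Sketch of a DAC-style block: parallel 3x3 convolutions with
        increasing dilation rates enlarge the receptive field without
        pooling, so spatial resolution is preserved."""
        def __init__(self, ch):
            super().__init__()
            self.branches = nn.ModuleList([
                nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in (1, 3, 5)
            ])
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            out = x
            for branch in self.branches:
                out = out + self.relu(branch(x))  # residual accumulation
            return out

With padding equal to the dilation rate, each branch keeps the spatial size of its input, which is what lets the block be dropped between encoder and decoder without disturbing the rest of the network.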
15. Learning deep neural networks in blind deblurring framework
Recently, end-to-end learning methods based on deep neural networks (DNNs) have proven effective for blind deblurring. Without hand-made assumptions and numerical algorithms, they are able to restore blurry images with fewer artifacts and better perceptual quality. However, without theoretical guidance, these methods sometimes generate unreasonable results and often perform worse when the motion is complex. In this paper, to overcome these drawbacks, we integrate deep convolutional neural networks into a conventional deblurring framework. Specifically, we build a Stacked Estimate Residual Net (SEN) to estimate the motion flow map, and a Recurrent Prior Generative and Adversarial Net (RP-GAN) to learn the image-prior-constrained term in a half-quadratic splitting algorithm. The generator and discriminators are also designed to be adaptive to the iterative optimization. Compared with state-of-the-art end-to-end learning based methods, our method restores reasonable details and shows better generalization ability.
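For context, half-quadratic splitting introduces an auxiliary variable $z$ and alternates two sub-problems; written schematically for uniform blur in our own notation ($k$ is the blur kernel, $y$ the blurry image, $\Phi$ the image prior that a learned network such as RP-GAN can replace):

$$ x^{t+1} = \arg\min_x \tfrac{1}{2}\|k * x - y\|_2^2 + \tfrac{\mu}{2}\|x - z^{t}\|_2^2, \qquad z^{t+1} = \arg\min_z \lambda\,\Phi(z) + \tfrac{\mu}{2}\|z - x^{t+1}\|_2^2 . $$

The $x$-step is a quadratic deconvolution with a closed-form solution in the Fourier domain, while the $z$-step is a denoising-type proximal step, which is exactly where a learned prior network can be substituted for a hand-crafted one.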
16. Graphical Contrastive Losses for Scene Graph Generation
Most scene graph generators use a two-stage pipeline to detect visual relationships: the first stage detects entities, and the second predicts the predicate for each entity pair using a softmax distribution. We find that such pipelines, trained with only a cross entropy loss over predicate classes, suffer from two common errors. The first, Entity Instance Confusion, occurs when the model confuses multiple instances of the same type of entity (e.g. multiple cups). The second, Proximal Relationship Ambiguity, arises when multiple subject-predicate-object triplets appear in close proximity with the same predicate, and the model struggles to infer the correct subject-object pairings (e.g. mis-pairing musicians and their instruments). We propose a set of contrastive loss formulations that specifically target these types of errors within the scene graph generation problem, collectively termed the Graphical Contrastive Losses. These losses explicitly force the model to disambiguate related and unrelated instances through margin constraints specific to each type of confusion. We further construct a relationship detector, called RelDN, using the aforementioned pipeline to demonstrate the efficacy of our proposed losses. Our model outperforms the winning method of the OpenImages Relationship Detection Challenge by 4.7% (16.5% relative) on the test set. We also show improved results over the best previous methods on the Visual Genome and Visual Relationship Detection datasets.
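To illustrate the general shape of such a margin constraint, here is a minimal PyTorch sketch of a single hinge term; it is our simplification, since the actual Graphical Contrastive Losses define several class-agnostic and entity-class-specific terms per confusion type. pos_scores and neg_scores are assumed to be 1-D tensors of predicate affinities for related and confusable unrelated pairs, and the margin value is illustrative.

    import torch
    import torch.nn.functional as F

    def margin_contrastive(pos_scores, neg_scores, margin=0.2):
        """Push every related pair's affinity above every confusable
        unrelated pair's affinity by at least `margin`."""
        # hinge over all positive/negative score combinations
        diff = margin - pos_scores.unsqueeze(1) + neg_scores.unsqueeze(0)
        return F.relu(diff).mean()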
17. Alternating Phase Projected Gradient Descent with Generative Priors for Solving Compressive Phase Retrieval
The classical problem of phase retrieval arises in various signal acquisition systems. Due to the ill-posed nature of the problem, the solution requires assumptions on the structure of the signal. In the last several years, sparsity and support-based priors have been leveraged successfully to solve this problem. In this work, we propose replacing the sparsity/support priors with generative priors and propose two algorithms to solve the phase retrieval problem. Our proposed algorithms combine ideas from the AltMin approach for non-convex sparse phase retrieval and the projected gradient descent approach for solving linear inverse problems using generative priors. We empirically show that the performance of our method with projected gradient descent is superior to the existing approach for solving phase retrieval under generative priors. We support our method with an analysis of sample complexity with Gaussian measurements.
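The alternating structure can be sketched as follows: estimate the missing phase from the current iterate, take a gradient step on the resulting linear problem, and project onto the range of the prior. In this toy, all names are ours and a fixed low-dimensional subspace stands in for the generative prior (projecting onto a real generator's range would itself require an inner latent-space optimization).

    import numpy as np

    def alt_phase_pgd(A, y, project, x0, steps=100, lr=0.1):
        """Alternating phase estimation + projected gradient descent
        for real-valued phase retrieval from magnitudes y = |Ax|."""
        x = x0.copy()
        for _ in range(steps):
            p = np.sign(A @ x)            # current phase (sign) estimate
            p[p == 0] = 1.0
            grad = A.T @ (A @ x - p * y)  # gradient of 0.5*||Ax - p*y||^2
            x = project(x - lr * grad)    # project onto the prior's range
        return x

    # toy usage: the "prior" is a known subspace spanned by columns of U
    rng = np.random.default_rng(0)
    n, m, k = 64, 256, 8
    U, _ = np.linalg.qr(rng.standard_normal((n, k)))
    project = lambda v: U @ (U.T @ v)
    x_true = U @ rng.standard_normal(k)
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    y = np.abs(A @ x_true)
    x_hat = alt_phase_pgd(A, y, project, x0=project(rng.standard_normal(n)))

As in all phase retrieval, the recovery is only defined up to a global sign in the real case.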
18. Robust Semantic Segmentation By Dense Fusion Network On Blurred VHR Remote Sensing Images
Robust semantic segmentation of VHR remote sensing images from UAV sensors is critical for earth observation, land use, land cover and mapping applications. Several factors, such as shadows, weather disruption and camera shake, make this problem highly challenging, especially when only RGB images are used. In this paper, we propose the use of multi-modal data, including NIR, RGB and DSM, to increase the robustness of segmentation in blurred or partially damaged VHR remote sensing images. With a cascaded dense encoder-decoder network and SELayer-based fusion and assembling techniques, the proposed RobustDenseNet maintains steady performance as image quality decreases, compared with state-of-the-art semantic segmentation models.
19. Novel quantitative indicators of digital ophthalmoscopy image quality
With the advent of smartphone indirect ophthalmoscopy, teleophthalmology - the use of specialist ophthalmology assets at a distance from the patient - has experienced a breakthrough, promising enormous benefits especially for healthcare in distant, inaccessible or ophthalmologically underserved areas, where specialists are either unavailable or too few in number. However, accurate teleophthalmology requires high-quality ophthalmoscopic imagery. This paper considers three feature families - statistical metrics, gradient-based metrics and wavelet transform coefficient derived indicators - as possible metrics to identify unsharp or blurry images. Using standard machine learning techniques, the suitability of these features for image quality assessment is confirmed, albeit on a rather small dataset. With the increased availability and decreasing cost of digital ophthalmoscopy on one hand and the increased prevalence of diabetic retinopathy worldwide on the other, tools that can determine whether an image is likely to be diagnostically suitable can play a significant role in accelerating and streamlining the teleophthalmology process. This paper highlights the need for more research in this area, including the compilation of a diverse database of ophthalmoscopic imagery, annotated with quality markers, to train the point-of-acquisition error detection algorithms of the future.
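To ground the three feature families, here is a small Python sketch computing one illustrative indicator per family on a grayscale image; the exact feature definitions in the paper may differ.

    import numpy as np
    import pywt                      # pip install PyWavelets
    from scipy.ndimage import laplace, sobel

    def quality_features(gray):
        """One example indicator per feature family."""
        g = gray.astype(float)
        feats = {}
        feats['std'] = float(g.std())                     # statistical
        feats['var_laplacian'] = float(laplace(g).var())  # gradient-based
        feats['mean_grad_mag'] = float(
            np.hypot(sobel(g, 0), sobel(g, 1)).mean())    # gradient-based
        cA, (cH, cV, cD) = pywt.dwt2(g, 'haar')           # wavelet-derived
        feats['detail_energy'] = float(np.mean(cH**2 + cV**2 + cD**2))
        return feats

Blurry images tend to score low on all of these (little high-frequency content), which is why such features feed naturally into a standard classifier for sharp/unsharp discrimination.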
20. Stratified Labeling for Surface Consistent Parallax Correction and Occlusion Completion
The light field faithfully records the spatial and angular configurations of the scene, which facilitates a wide range of imaging possibilities. In this work, we propose a Light Field (LF) rendering algorithm which renders high-quality novel LF views far outside the range of angular baselines of the given references. A stratified rendering strategy is adopted which parses the scene contents based on stratified disparity layers and across a varying range of spatial granularity. This stratified methodology helps preserve scene content structures over large perspective shifts, and it provides informative clues for inferring the textures of occluded regions. A Generative Adversarial Network model is adopted for parallax correction and occlusion completion, conditioned on the stratified rendering features. Experiments show that the proposed model provides more reliable novel view rendering quality at large baseline expansion ratios. Over 3dB of quality improvement has been achieved against state-of-the-art LF view rendering algorithms.
21. Synthetic Human Model Dataset for Skeleton Driven Non-rigid Motion Tracking and 3D Reconstruction
We introduce a synthetic dataset for evaluating non-rigid 3D human reconstruction based on conventional RGB-D cameras. The dataset consists of seven motion sequences of a single human model. For each motion sequence, per-frame ground-truth geometry and a ground-truth skeleton are given. The dataset also contains the skinning weights of the human model. More information about the dataset can be found at: this https URL
22. Discovering Visual Patterns in Art Collections with Spatially-consistent Feature Learning
Our goal in this paper is to discover near-duplicate patterns in large collections of artworks. This is harder than standard instance mining due to differences in the artistic media (oil, pastel, drawing, etc.) and imperfections inherent in the copying process. The key technical insight is to adapt a standard deep feature to this task by fine-tuning it on the specific art collection using self-supervised learning. More specifically, spatial consistency between neighbouring feature matches is used as the supervisory fine-tuning signal. The adapted feature leads to more accurate style-invariant matching, and can be used with a standard discovery approach, based on geometric verification, to identify duplicate patterns in the dataset. The approach is evaluated on several different datasets and shows surprisingly good qualitative discovery results. For quantitative evaluation of the method, we annotated 273 near-duplicate details in a dataset of 1587 artworks attributed to Jan Brueghel and his workshop. Beyond artworks, we also demonstrate improved localization on the Oxford5K photo dataset as well as on historical photograph localization on the Large Time Lags Location (LTLL) dataset.
23. Understanding Ancient Coin Images
In recent years, a range of problems within the broad umbrella of automatic, computer vision based analysis of ancient coins has been attracting an increasing amount of attention. Notwithstanding this research effort, the results achieved by the state of the art in the published literature remain poor and fall far short of what is required for any practical purpose. In this paper we present a series of contributions which we believe will benefit the interested community. Firstly, we explain that the approach of visual matching of coins, universally adopted in all existing published papers on the topic, is not of practical interest because the number of ancient coin types exceeds by far the number of those types which have been imaged, be it in digital form (e.g. online) or otherwise (traditional film, in print, etc.). Rather, we argue that the focus should be on understanding the semantic content of coins. Hence, we describe a novel method which uses real-world multimodal input to extract and associate semantic concepts with the correct coin images, and then uses a novel convolutional neural network to learn the appearance of these concepts. Through empirical evidence on a real-world data set of ancient coins, by far the largest to date, we demonstrate highly promising results.
24. IMEXnet: A Forward Stable Deep Neural Network
Deep convolutional neural networks have revolutionized many machine learning and computer vision tasks. Despite their enormous success, remaining key challenges limit their wider use. Pressing challenges include improving the network's robustness to perturbations of the input images and simplifying the design of architectures that generalize. Another problem relates to the limited "field of view" of convolution operators, which means that very deep networks are required to model nonlocal relations in high-resolution image data. We introduce IMEXnet, which addresses these challenges by adapting semi-implicit methods for partial differential equations. Compared to similar explicit networks, such as residual networks (ResNets), our network is more stable. This stability has recently been shown to reduce the sensitivity to small changes in the input features and to improve generalization. The implicit step connects all pixels in the images and therefore addresses the field of view problem, while remaining comparable to standard convolutions in terms of the number of parameters and computational complexity. We also present a new dataset for semantic segmentation and demonstrate the effectiveness of our architecture using the NYU depth dataset.
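Schematically, in our own notation, a ResNet layer takes the explicit step $Y_{j+1} = Y_j + h\,f(Y_j,\theta_j)$, whereas a semi-implicit (IMEX) step adds an implicit linear correction:

$$ Y_{j+1} = (I + h\,L)^{-1}\bigl(Y_j + h\,f(Y_j,\theta_j)\bigr), $$

where solving with $I + h\,L$ both damps unstable modes (forward stability) and couples all pixels in a single step (a global field of view), while the learned nonlinearity $f$ stays explicit and cheap to evaluate. The exact choice of $L$ and the step size $h$ follow the paper; this equation is only meant to convey the splitting.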
25. Clear Skies Ahead: Towards Real-Time Automatic Sky Replacement in Video
Digital videos such as those captured by a smartphone often exhibit exposure inconsistencies, a poorly exposed sky, or simply suffer from an uninteresting or plain-looking sky. Professionals may edit these videos using advanced, time-consuming tools unavailable to most users, to replace the sky with a more expressive or imaginative one. In this work, we propose an algorithm for automatic replacement of the sky region in a video with a different sky, providing non-professional users with a simple yet efficient tool to seamlessly replace the sky. The method is fast, achieving close to real-time performance on mobile devices, and the user's involvement can remain as limited as simply selecting the replacement sky.
26. Characterization of Posidonia Oceanica Seagrass Aerenchyma through Whole Slide Imaging: A Pilot Study
Characterizing the tissue morphology and anatomy of seagrasses is essential to predicting their acoustic behavior. In this pilot study, we use histology techniques and whole slide imaging (WSI) to describe the composition and topology of the aerenchyma of an entire leaf blade in an automatic way, combining the advantages of X-ray microtomography and optical microscopy. Paraffin blocks are prepared in such a way that microtome slices contain an arbitrarily large number of cross sections distributed along the full length of a blade. The sample organization in the paraffin block, coupled with whole slide image analysis, allows high-throughput data extraction and an exhaustive characterization along the whole blade length. The core of the work is a set of image processing algorithms that can distinguish cells and air lacunae (voids) from fiber strands, epidermis, mesophyll and the vascular system. A set of specific features is developed to adequately describe the convexity of cells and voids where standard descriptors fail. The features scrutinize the local curvature of the object borders to allow accurate discrimination between voids and cells through machine learning. The algorithm reconstructs the cell and cell membrane features that are relevant to tissue density, compressibility and rigidity. Size distributions of the different cell types and gas spaces, total biomass and total void volume fraction are then extracted from the high resolution slices to provide a complete characterization of the tissue along the leaf from its base to the apex.
27. GanDef: A GAN based Adversarial Training Defense for Neural Network Classifier
Machine learning models, especially neural network (NN) classifiers, are widely used in many applications, including natural language processing, computer vision and cybersecurity. They provide high accuracy under the assumption of attack-free scenarios. However, this assumption has been defied by the introduction of adversarial examples -- carefully perturbed samples of input that are usually misclassified. Many researchers have tried to develop a defense against adversarial examples; however, we are still far from achieving that goal. In this paper, we design a Generative Adversarial Net (GAN) based adversarial training defense, dubbed GanDef, which utilizes a competition game to regulate feature selection during training. We analytically show that GanDef can train a classifier so that it can defend against adversarial examples. Through extensive evaluation on different white-box adversarial examples, the classifier trained by GanDef shows the same level of test accuracy as those trained by state-of-the-art adversarial training defenses. More importantly, GanDef-Comb, a variant of GanDef, can utilize the discriminator to achieve a dynamic trade-off between correctly classifying original and adversarial examples. As a result, it achieves the highest overall test accuracy when the ratio of adversarial examples exceeds 41.7%.
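The competition game can be written schematically (our notation, not the paper's exact formulation) as a minimax objective in which the classifier $C$ minimizes its classification loss while preventing a discriminator $D$ from telling, from the classifier's features $f_C(x)$, whether the input was adversarial; $s$ is the clean/adversarial source flag and $\lambda$ a trade-off weight:

$$ \min_{C}\ \max_{D}\ \mathbb{E}_{(x,y,s)}\Bigl[\,\mathcal{L}_{\mathrm{ce}}\bigl(C(x),\,y\bigr) \;-\; \lambda\,\mathcal{L}_{\mathrm{ce}}\bigl(D(f_C(x)),\,s\bigr)\Bigr]. $$

At the saddle point, the features that $C$ selects carry no information about whether the input was perturbed, which is the sense in which the game regulates feature selection.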
28. Deep Learning in Medical Image Registration: A Survey
The establishment of image correspondence through robust image registration is critical to many clinical tasks, such as image fusion, organ atlas creation, and tumor growth monitoring, and is a very challenging problem. Since the beginning of the recent deep learning renaissance, the medical imaging research community has developed deep learning based approaches and achieved the state of the art in many applications, including image registration. The rapid adoption of deep learning for image registration applications over the past few years necessitates a comprehensive summary and outlook, which is the main scope of this survey. This requires placing a focus on the different research areas as well as highlighting challenges that practitioners face. This survey therefore outlines the evolution of deep learning based medical image registration in the context of both research challenges and relevant innovations in the past few years. Further, this survey highlights future research directions to show how this field may be moved forward to the next level.