1.Weather Influence and Classification with Automotive Lidar Sensors pdf
Lidar sensors are often used in mobile robots and autonomous vehicles to complement camera, radar and ultrasonic sensors for environment perception. Typically, perception algorithms are trained to detect moving and static objects and to estimate the ground, but intentionally ignore weather effects to reduce false detections. In this work, we present an in-depth analysis of automotive lidar performance under harsh weather conditions, i.e., heavy rain and dense fog. An extensive data set has been recorded for various fog and rain conditions, which is the basis for the conducted in-depth analysis of the point cloud under changing environmental conditions. In addition, we introduce a novel approach to detect and classify rain or fog with lidar sensors only, achieving a mean intersection over union of 97.14% on a data set recorded in controlled environments. The analysis of weather influences on the performance of lidar sensors and the weather detection are important steps towards improving safety levels for autonomous driving in adverse weather conditions by providing reliable information to adapt vehicle behavior.
2.ADA-Tucker: Compressing Deep Neural Networks via Adaptive Dimension Adjustment Tucker Decomposition pdf
Despite the recent success of deep learning models in numerous applications, their widespread use on mobile devices is seriously impeded by storage and computational requirements. In this paper, we propose a novel network compression method called Adaptive Dimension Adjustment Tucker decomposition (ADA-Tucker). With learnable core tensors and transformation matrices, ADA-Tucker performs Tucker decomposition of arbitrary-order tensors. Furthermore, we propose that weight tensors in networks with proper order and balanced dimensions are easier to compress. Therefore, the high flexibility in decomposition choice distinguishes ADA-Tucker from all previous low-rank models. To compress further, we extend the model to Shared Core ADA-Tucker (SCADA-Tucker) by defining a shared core tensor for all layers. Our methods require no overhead for recording indices of non-zero elements. Without loss of accuracy, our methods reduce the storage of LeNet-5 and LeNet-300 by ratios of 691 times and 233 times, respectively, significantly outperforming the state of the art. The effectiveness of our methods is also evaluated on three other benchmarks (CIFAR-10, SVHN, ILSVRC12) and modern deep networks (ResNet, Wide-ResNet).
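To make the decomposition concrete, here is a minimal numpy sketch of a Tucker-format weight: a reshaped weight tensor stored as a small core plus one factor matrix per mode. The shapes and ranks below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Hypothetical shapes: a conv weight reshaped to a balanced 3rd-order tensor,
# stored as a small core tensor G plus one factor matrix per mode.
rng = np.random.default_rng(0)
full_shape = (32, 32, 36)          # reshaped weight tensor (adaptive dims)
ranks = (8, 8, 9)                  # Tucker ranks -- the compression knob

G = rng.standard_normal(ranks)                     # learnable core tensor
U = [rng.standard_normal((d, r))                   # learnable factor matrices
     for d, r in zip(full_shape, ranks)]

# Reconstruct W = G x1 U0 x2 U1 x3 U2 (mode products via einsum).
W = np.einsum('abc,ia,jb,kc->ijk', G, U[0], U[1], U[2])

params_full = np.prod(full_shape)
params_tucker = G.size + sum(u.size for u in U)
print(f'compression ratio: {params_full / params_tucker:.1f}x')
```

Only `G` and the `U` factors are stored and trained; the full weight is reconstructed on the fly, which is why no non-zero-index bookkeeping is needed.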
3.A Weakly Supervised Learning Based Clustering Framework pdf
A weakly supervised learning based clustering framework is proposed in this paper. As the core of this framework, we introduce a novel multiple instance learning task based on a bag-level label called unique class count ($ucc$), which is the number of unique classes among all instances inside the bag. In this task, no annotations on individual instances inside the bag are needed during training of the models. We mathematically prove that a perfect $ucc$ classifier, in principle, can be used to perfectly cluster individual instances inside the bags. In other words, perfect clustering of individual instances is possible even when no annotations on individual instances are given during training. We have constructed a neural network based $ucc$ classifier and experimentally shown that the clustering performance of our framework with our $ucc$ classifier is comparable to that of fully supervised learning models. We have also observed that our $ucc$ classifiers can potentially be used for zero-shot learning as they learn better semantic features than fully supervised models for `unseen classes', which have never been input into the models during training.
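A toy sketch of how this weak supervision is constructed: instance labels are used only once, to compute each bag's $ucc$ target, and are never shown to the model. Pool size, feature dimension and bag size below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_bag(instances, labels, bag_size=16):
    """Sample a bag and its ucc label (number of unique classes inside)."""
    idx = rng.choice(len(instances), size=bag_size, replace=False)
    bag = instances[idx]
    ucc = len(np.unique(labels[idx]))   # the only supervision the model sees
    return bag, ucc

# Hypothetical pool: 1000 instances with 10-dim features from 4 classes.
X = rng.standard_normal((1000, 10))
y = rng.integers(0, 4, size=1000)

bags, uccs = zip(*(make_bag(X, y) for _ in range(5)))
print(uccs)   # e.g. (4, 4, 3, ...) -- bag-level labels only
```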
4.3D Geometric salient patterns analysis on 3D meshes pdf
Pattern analysis is a broad domain with wide applicability in many fields. Texture analysis is one such field, since a texture is defined as a set of repetitive or quasi-repetitive patterns. Despite its importance in analyzing 3D meshes, geometric texture analysis is little studied by the geometry processing community. This paper presents a new, efficient approach for geometric texture analysis on 3D triangular meshes. The proposed method is a scale-aware approach that takes as input a 3D mesh and a user scale. It provides, as a result, a similarity-based clustering of texels into meaningful classes. Experimental results of the proposed algorithm are presented for both real-world and synthetic meshes with various textures. Furthermore, the efficiency of the proposed approach was experimentally demonstrated under mesh simplification and noise addition on the mesh surface. Finally, we present a practical application to the semantic annotation of 3D geometric salient texels.
5.Learning with Average Precision: Training Image Retrieval with a Listwise Loss pdf
Image retrieval can be formulated as a ranking problem where the goal is to order database images by decreasing similarity to the query. Recent deep models for image retrieval have outperformed traditional methods by leveraging ranking-tailored loss functions, but important theoretical and practical problems remain. First, rather than directly optimizing the global ranking, they minimize an upper bound on the essential loss, which does not necessarily result in an optimal mean average precision (mAP). Second, these methods require significant engineering effort to work well, e.g., special pre-training and hard-negative mining. In this paper we propose instead to directly optimize the global mAP by leveraging recent advances in listwise loss formulations. Using a histogram binning approximation, the AP can be differentiated and thus employed for end-to-end learning. Compared to existing losses, the proposed method considers thousands of images simultaneously at each iteration and eliminates the need for ad hoc tricks. It also establishes a new state of the art on many standard retrieval benchmarks. Models and evaluation scripts have been made available at this https URL.
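A minimal PyTorch sketch of the histogram-binning idea: scores are softly assigned to bins with a triangular kernel, making per-bin precision and recall, and hence AP, differentiable. The bin count and normalization details here are assumptions, not the paper's exact formulation.

```python
import torch

def soft_binned_ap(scores, labels, n_bins=20):
    """Differentiable AP for one query via histogram binning (sketch).

    scores: (N,) similarities in [-1, 1]; labels: (N,) 1.0 = relevant.
    """
    centers = torch.linspace(1, -1, n_bins, device=scores.device)
    width = 2.0 / (n_bins - 1)
    # Soft assignment of each score to each bin (triangular kernel).
    delta = (1 - (scores[None, :] - centers[:, None]).abs() / width).clamp(min=0)
    pos = (delta * labels[None, :]).sum(dim=1)      # relevant mass per bin
    all_ = delta.sum(dim=1)                         # total mass per bin
    cum_pos, cum_all = pos.cumsum(0), all_.cumsum(0)
    precision = cum_pos / cum_all.clamp(min=1e-8)
    recall_inc = pos / labels.sum().clamp(min=1e-8)
    return (precision * recall_inc).sum()           # approximate AP

scores = torch.randn(100).tanh().requires_grad_()
labels = (torch.rand(100) < 0.2).float()
(1 - soft_binned_ap(scores, labels)).backward()     # usable as a loss
```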
6.Locate, Size and Count: Accurately Resolving People in Dense Crowds via Detection pdf
We introduce a detection framework for dense crowd counting that eliminates the need for the prevalent density regression paradigm. Typical counting models predict crowd density for an image as opposed to detecting every person. These regression methods, in general, fail to localize persons accurately enough for most applications other than counting. Hence, we adopt an architecture that locates every person in the crowd, sizes the spotted heads with bounding boxes and then counts them. Compared to normal object or face detectors, there exist certain unique challenges in designing such a detection system. Some of them are direct consequences of the huge diversity in dense crowds along with the need to predict boxes contiguously. We solve these issues and develop our LSC-CNN model, which can reliably detect heads of people across sparse to dense crowds. LSC-CNN employs a multi-column architecture with top-down feedback processing to better resolve persons and produce refined predictions at multiple resolutions. Interestingly, the proposed training regime requires only point head annotations, yet can estimate approximate size information of heads. We show that LSC-CNN not only achieves superior localization compared to existing density regressors, but outperforms them in counting as well. The code for our approach is available at this https URL.
7.Improved RPN for Single Targets Detection based on the Anchor Mask Net pdf
Common target detection is usually based on single-frame images, which is vulnerable to interference from similar targets in the image and is not applicable to video. In this paper, an anchor mask is proposed to add prior knowledge for target detection, and an anchor mask net is designed to improve the RPN performance for single-target detection. Tested on VOT2016, the model performs better.
8.A One-step Pruning-recovery Framework for Acceleration of Convolutional Neural Networks pdf
Acceleration of convolutional neural networks has received increasing attention during the past several years. Among various acceleration techniques, filter pruning has an inherent merit by effectively reducing the number of convolution filters. However, most filter pruning methods resort to a tedious and time-consuming layer-by-layer pruning-recovery strategy to avoid a significant drop in accuracy. In this paper, we present an efficient filter pruning framework to solve this problem. Our method accelerates the network in a one-step pruning-recovery manner with a novel optimization objective function, which achieves higher accuracy at much lower cost compared with existing pruning methods. Furthermore, our method allows network compression with global filter pruning. Given a global pruning rate, it can adaptively determine the pruning rate for each single convolutional layer, whereas these rates are often set as hyper-parameters in previous approaches. Evaluated on VGG-16 and ResNet-50 using ImageNet, our approach outperforms several state-of-the-art methods with less accuracy drop under the same and even much fewer floating-point operations (FLOPs).
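For intuition on how a single global rate induces per-layer pruning rates, here is a sketch that uses a plain L1-norm ranking as a stand-in for the paper's learned objective; the 30% rate and VGG-16 choice are illustrative.

```python
import torch
import torchvision

# Global filter ranking by L1 norm (a simple proxy, not the paper's criterion).
model = torchvision.models.vgg16(weights=None)
convs = [(n, m) for n, m in model.named_modules()
         if isinstance(m, torch.nn.Conv2d)]

# One L1 norm per output filter, per layer.
norms = {n: m.weight.detach().abs().sum(dim=(1, 2, 3)) for n, m in convs}
all_norms = torch.cat(list(norms.values()))

global_rate = 0.3                                   # prune 30% of all filters
threshold = all_norms.kthvalue(int(global_rate * len(all_norms))).values

# The per-layer rates fall out of the single global threshold.
for name, n in norms.items():
    layer_rate = (n < threshold).float().mean().item()
    print(f'{name}: prune {layer_rate:.0%} of {len(n)} filters')
```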
9.Bicameral Structuring and Synthetic Imagery for Jointly Predicting Instance Boundaries and Nearby Occlusions from a Single Image pdf
Oriented boundary detection is a challenging task aimed at both delineating category-agnostic object instances and inferring their spatial layout from a single RGB image. State-of-the-art deep convolutional networks for this task rely on two independent streams that predict boundaries and occlusions respectively, although both require similar local and global cues, and occlusions cause boundaries. We therefore propose a fully convolutional bicameral structuring, composed of two cascaded decoders sharing one deep encoder, linked together by skip connections to combine local and global features, for jointly predicting instance boundaries and their unoccluded side. Furthermore, state-of-the-art datasets contain real images with few instances and occlusions mostly due to objects occluding the background, thereby missing meaningful occlusions between instances. To also evaluate the missing scenario of dense piles of objects, we introduce synthetic data (Mikado), which extensibly contains more instances and inter-instance occlusions per image than the PASCAL Instance Occlusion Dataset (PIOD), the COCO Amodal dataset (COCOA), and the Densely Segmented Supermarket Amodal dataset (D2SA). We show that the proposed network design outperforms the two-stream baseline and alternative architectures for oriented boundary detection on both PIOD and Mikado, and the amodal segmentation approach on COCOA as well. Our experiments on D2SA also show that Mikado is plausible in the sense that it enables the learning of performance-enhancing representations transferable to real data, while drastically reducing the need for hand-made annotations for fine-tuning.
10.Locality Preserving Joint Transfer for Domain Adaptation pdf
Domain adaptation aims to leverage knowledge from a well-labeled source domain to a poorly labeled target domain. A majority of existing works transfer knowledge at either the feature level or the sample level. Recent research reveals that both paradigms are essentially important, and optimizing one of them can reinforce the other. Inspired by this, we propose a novel approach to jointly exploit feature adaptation with distribution matching and sample adaptation with landmark selection. During the knowledge transfer, we also take the local consistency between samples into consideration, so that the manifold structure of the samples can be preserved. Finally, we deploy label propagation to predict the categories of new instances. Notably, our approach is suitable for both homogeneous and heterogeneous domain adaptation by learning domain-specific projections. Extensive experiments on five open benchmarks, which include both standard and large-scale datasets, verify that our approach can significantly outperform not only conventional approaches but also end-to-end deep models. The experiments also demonstrate that we can leverage handcrafted features to improve accuracy over deep features through heterogeneous adaptation.
11.Using colorization as a tool for automatic makeup suggestion pdf
Colorization is the process of converting a grayscale image to a full-color image, and there are multiple methods for doing so. Early methods used machine learning algorithms and optimization techniques to suggest possible colors. With advances in deep learning, colorization results have improved consistently alongside improvements in deep learning architectures. The latest development in the field is the emergence of generative adversarial networks (GANs), which are used to generate information rather than just predict or classify. As part of this report, two architectures from recent papers are reproduced, and a novel architecture is suggested for general colorization. Following this, we propose using colorization to generate makeup suggestions automatically for a face. To do this, a dataset consisting of 1000 images has been created. When an image of a person without makeup is given to the model, the model first converts the image to grayscale and then passes it through the suggested GAN model. The output is a generated makeup suggestion. To develop this model, we tweak the general colorization model to deal only with faces.
12.Neural Illumination: Lighting Prediction for Indoor Environments pdf
This paper addresses the task of estimating the light arriving from all directions at a 3D point observed at a selected pixel in an RGB image. This task is challenging because it requires predicting a mapping from a partial scene observation by a camera to a complete illumination map for a selected position, which depends on the 3D location of the selection, the distribution of unobserved light sources, the occlusions caused by scene geometry, etc. Previous methods attempt to learn this complex mapping directly using a single black-box neural network, which often fails to estimate high-frequency lighting details for scenes with complicated 3D geometry. Instead, we propose "Neural Illumination," a new approach that decomposes illumination prediction into several simpler differentiable sub-tasks: 1) geometry estimation, 2) scene completion, and 3) LDR-to-HDR estimation. The advantage of this approach is that the sub-tasks are relatively easy to learn and can be trained with direct supervision, while the whole pipeline is fully differentiable and can be fine-tuned with end-to-end supervision. Experiments show that our approach performs significantly better quantitatively and qualitatively than prior work.
13.A sparse annotation strategy based on attention-guided active learning for 3D medical image segmentation pdf
3D image segmentation is one of the most important and ubiquitous problems in medical image processing. It provides detailed quantitative analysis for accurate disease diagnosis, abnormality detection, and classification. Deep learning algorithms are now widely used in medical image segmentation, and most train models with fully annotated datasets. However, obtaining medical image datasets is very difficult and expensive, and full annotation of 3D medical images is monotonous and time-consuming work. Partially labelling informative slices in 3D images can greatly relieve the manual annotation burden. Sample selection strategies based on active learning have been proposed for 2D images, but few strategies focus on 3D images. In this paper, we propose a sparse annotation strategy based on attention-guided active learning for 3D medical image segmentation. An attention mechanism is used to improve segmentation accuracy and to estimate the segmentation accuracy of each slice. Comparative experiments with three different strategies, using datasets from the developing Human Connectome Project (dHCP), show that our strategy needs only 15% to 20% of annotated slices in the brain extraction task and 30% to 35% in the tissue segmentation task to achieve results comparable to full annotation.
14.Neural Multi-Scale Self-Supervised Registration for Echocardiogram Dense Tracking pdf
Echocardiography has become routinely used in the diagnosis of cardiomyopathy and abnormal cardiac blood flow. However, manually measuring myocardial motion and cardiac blood flow from echocardiograms is time-consuming and error-prone. Computer algorithms that can automatically track and quantify myocardial motion and cardiac blood flow are highly sought after, but have not been very successful due to the noise and high variability of echocardiography. In this work, we propose a neural multi-scale self-supervised registration (NMSR) method for automated dense tracking of myocardium and cardiac blood flow. NMSR incorporates two novel components: 1) a deep neural net that parameterizes the velocity field between two image frames, and 2) sequential multi-scale optimization of the network parameters to account for large variations within the velocity field. Experiments demonstrate that NMSR yields significantly better registration accuracy than state-of-the-art methods, such as advanced normalization tools (ANTs) and VoxelMorph, for both myocardial and cardiac blood flow dense tracking. Our approach promises to provide a fully automated method for fast and accurate analyses of echocardiograms.
15.Boosting CNN beyond Label in Inverse Problems pdf
Convolutional neural networks (CNNs) have been extensively used for inverse problems. However, their prediction error for unseen test data is difficult to estimate a priori, since the networks are trained using only selected data and their architectures are largely considered black boxes. This poses a fundamental challenge for unsupervised learning or improvement beyond the label. In this paper, we show that recent unsupervised learning methods such as Noise2Noise, the Stein's unbiased risk estimator (SURE)-based denoiser, and Noise2Void are closely related to each other in their formulation of an unbiased estimator of the prediction error, but each of them is associated with its own limitations. Based on these observations, we provide a novel boosting estimator for the prediction error. In particular, by employing a combinatorial convolutional frame representation of encoder-decoder CNNs and synergistically combining it with batch normalization, we provide a closed-form formulation for the unbiased estimator of the prediction error that can be minimized for neural network training beyond the label. Experimental results show that the resulting algorithm, which we call Noise2Boosting, provides consistent improvement in various inverse problems under both supervised and unsupervised learning settings.
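For reference, a sketch of the standard Monte Carlo SURE estimator that such methods build on (not the paper's boosted estimator): the divergence term is approximated with a single random probe.

```python
import torch

def mc_sure(denoiser, y, sigma, eps=1e-3):
    """Monte Carlo SURE: unbiased estimate of the MSE to the unseen clean image.

    Classical form: ||f(y) - y||^2/N - sigma^2 + 2*sigma^2/N * div f(y),
    with the divergence estimated by one random probe b (Ramani et al. style).
    """
    n = y.numel()
    fy = denoiser(y)
    b = torch.randn_like(y)
    div = (b * (denoiser(y + eps * b) - fy)).sum() / eps   # MC divergence
    return ((fy - y) ** 2).sum() / n - sigma ** 2 + 2 * sigma ** 2 * div / n

# Toy usage with a linear "denoiser" (shrinkage toward zero).
y = torch.randn(64, 64) * 0.1 + 1.0
print(mc_sure(lambda x: 0.9 * x, y, sigma=0.1).item())
```

Minimizing such an estimate over network parameters is what enables training without clean labels; the paper's contribution is a tighter, boosted variant of this quantity.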
16.DeepView: View Synthesis with Learned Gradient Descent pdf
We present a novel approach to view synthesis using multiplane images (MPIs). Building on recent advances in learned gradient descent, our algorithm generates an MPI from a set of sparse camera viewpoints. The resulting method incorporates occlusion reasoning, improving performance on challenging scene features such as object boundaries, lighting reflections, thin structures, and scenes with high depth complexity. We show that our method achieves high-quality, state-of-the-art results on two datasets: the Kalantari light field dataset, and a new camera array dataset, Spaces, which we make publicly available.
17.Using Discriminative Methods to Learn Fashion Compatibility Across Datasets pdf
Determining whether a pair of garments are compatible with each other is a challenging matching problem. Past works explored various embedding methods for learning such a relationship. This paper introduces discriminative methods for learning compatibility, by formulating the task as a simple binary classification problem. We evaluate our approach using an established dataset of outfits created by non-experts and demonstrate an improvement of ~2.5% on established metrics over the state-of-the-art method. We introduce three new datasets of professionally curated outfits and show the consistent performance of our approach on expert-curated datasets. To facilitate comparison across outfit datasets, we propose a new metric which, unlike previously used metrics, is not biased by the average size of outfits. We also demonstrate that compatibility between two types of items can be queried indirectly, and that such a query strategy yields improvements.
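A sketch of the discriminative formulation: embed both garments, concatenate, and train a binary compatible/incompatible classifier. The encoder and head below are placeholders, not the paper's networks.

```python
import torch

# Placeholder encoder and classification head (sizes are assumptions).
encoder = torch.nn.Sequential(torch.nn.Flatten(),
                              torch.nn.Linear(3 * 64 * 64, 128))
clf = torch.nn.Sequential(torch.nn.Linear(256, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 1))

def compatibility_logit(img_a, img_b):
    """Binary-classification view of compatibility between two garments."""
    return clf(torch.cat([encoder(img_a), encoder(img_b)], dim=1))

a, b = torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64)
loss = torch.nn.functional.binary_cross_entropy_with_logits(
    compatibility_logit(a, b), torch.ones(4, 1))   # 1 = compatible pair
print(loss.item())
```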
18.Content-aware Density Map for Crowd Counting and Density Estimation pdf
Precise knowledge about the size of a crowd, its density, and flow can provide valuable information for safety and security applications, event planning, architectural design, and consumer behavior analysis. Creating a powerful machine learning model for such applications requires a large, highly accurate, and reliable dataset. Unfortunately, the existing crowd counting and density estimation benchmark datasets are not only limited in size but also lack annotations, which are in general too time-consuming to produce. This paper addresses this issue through a content-aware technique that uses a combination of the Chan-Vese segmentation algorithm, a two-dimensional Gaussian filter, and brute-force nearest neighbor search. The results show that by simply replacing the commonly used density map generators with the proposed method, a higher level of accuracy can be achieved using the existing state-of-the-art models.
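A simplified sketch of the adaptive density-map idea, using only the Gaussian-filter and nearest-neighbor ingredients named above; the Chan-Vese segmentation step is omitted, and the `beta` scaling factor is an assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def adaptive_density_map(points, shape, beta=0.3):
    """Nearest-neighbour-adaptive density map (simplified sketch).

    Each head point gets a unit-mass Gaussian whose sigma scales with
    its nearest-neighbour distance (a proxy for local head size).
    """
    density = np.zeros(shape, dtype=np.float32)
    pts = np.asarray(points, dtype=np.float64)
    for i, (r, c) in enumerate(pts):
        d = np.delete(np.hypot(*(pts - pts[i]).T), i)  # brute-force NN search
        sigma = beta * (d.min() if len(d) else 15.0)   # fallback for 1 point
        impulse = np.zeros(shape, dtype=np.float32)
        impulse[int(r), int(c)] = 1.0
        density += gaussian_filter(impulse, sigma)
    return density

dm = adaptive_density_map([(30, 40), (35, 44), (80, 90)], (120, 120))
print(dm.sum())   # ~3.0: the map integrates to the crowd count
```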
19.Pose Guided Fashion Image Synthesis Using Deep Generative Model pdf
Generating a photorealistic image with an intended human pose is a promising yet challenging research topic for many applications such as smart photo editing, movie making, virtual try-on, and fashion display. In this paper, we present a novel deep generative model to transfer an image of a person from a given pose to a new pose while keeping the fashion items consistent. To formulate the framework, we employ one generator and two discriminators for image synthesis. The generator includes an image encoder, a pose encoder and a decoder. The two encoders provide a good representation of visual and geometrical context, which is utilized by the decoder to generate a photorealistic image. Unlike existing pose-guided image generation models, we exploit two discriminators to guide the synthesis process: one discriminator differentiates between generated images and real images (training samples), and the other verifies the consistency of appearance between a target pose and a generated image. We perform end-to-end training of the network to learn the parameters through back-propagation given ground-truth images. The proposed generative model is capable of synthesizing a photorealistic image of a person given a target pose. We have demonstrated our results by conducting rigorous experiments on two data sets, both quantitatively and qualitatively.
20.Hardware Aware Neural Network Architectures using FbNet pdf
We implement a differentiable Neural Architecture Search (NAS) method inspired by FBNet for discovering neural networks that are heavily optimized for a particular target device. The FBNet NAS method discovers a neural network from a given search space by optimizing a loss function that accounts for accuracy and target-device latency. We extend this loss function by adding an energy term. This potentially enhances the ``hardware awareness" and helps us find a neural network architecture that is optimal in terms of accuracy, latency and energy consumption for a given target device (a Raspberry Pi in our case). We name the trained child architecture obtained at the end of the search process Hardware Aware Neural Network Architecture (HANNA). We demonstrate the efficacy of our approach by benchmarking HANNA against two state-of-the-art neural networks designed for mobile/embedded applications, MobileNetv2 and CondenseNet, on the CIFAR-10 dataset. Our results show that HANNA provides a speedup of about 2.5x and 1.7x, and reduces energy consumption by 3.8x and 2x, compared to MobileNetv2 and CondenseNet respectively. HANNA is able to provide such significant speedup and energy efficiency benefits over the state-of-the-art baselines at the cost of a tolerable 4-5% drop in accuracy.
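A sketch of one reading of the extended objective: FBNet scales cross-entropy by a latency term, and an analogous energy term can be multiplied in. The exact functional form and the coefficients below are assumptions, not taken from the abstract.

```python
import torch

def hanna_loss(ce, latency_ms, energy_mj, alpha=0.2, beta=0.6, gamma=0.6):
    """FBNet-style multiplicative objective extended with an energy factor.

    ce: differentiable cross-entropy of the sampled architecture;
    latency_ms / energy_mj: device costs from per-op lookup tables.
    (Form and coefficients are illustrative assumptions.)
    """
    return (ce * alpha
            * torch.log(latency_ms) ** beta
            * torch.log(energy_mj) ** gamma)

ce = torch.tensor(1.2, requires_grad=True)
print(hanna_loss(ce, torch.tensor(25.0), torch.tensor(80.0)))
```

In differentiable NAS the latency and energy of a candidate are themselves differentiable expectations over architecture-choice probabilities, so this scalar loss can be back-propagated to the architecture parameters.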
21.PolSAR Image Classification based on Polarimetric Scattering Coding and Sparse Support Matrix Machine pdf
PolSAR images have an advantage over optical images because they can be acquired independently of cloud cover and solar illumination. PolSAR image classification is an active and valuable topic for the interpretation of PolSAR images. In this paper, a novel PolSAR image classification method is proposed based on polarimetric scattering coding and a sparse support matrix machine. First, we transform the original PolSAR data into a real-valued matrix by polarimetric scattering coding; the result, called the polarimetric scattering matrix, is a sparse matrix. Second, the sparse support matrix machine is used to classify the sparse polarimetric scattering matrix and produce the classification map. The combination of these two steps takes full account of the characteristics of PolSAR data. The experimental results show that the proposed method achieves better results and is an effective classification method.
22.High Speed and High Dynamic Range Video with an Event Camera pdf
Event cameras are novel sensors that report brightness changes in the form of a stream of asynchronous "events" instead of intensity frames. They offer significant advantages with respect to conventional cameras: high temporal resolution, high dynamic range, and no motion blur. While the stream of events encodes in principle the complete visual signal, the reconstruction of an intensity image from a stream of events is an ill-posed problem in practice. Existing reconstruction approaches are based on hand-crafted priors and strong assumptions about the imaging process as well as the statistics of natural images. In this work we propose to learn to reconstruct intensity images from event streams directly from data instead of relying on any hand-crafted priors. We propose a novel recurrent network to reconstruct videos from a stream of events, and train it on a large amount of simulated event data. During training we propose to use a perceptual loss to encourage reconstructions to follow natural image statistics. We further extend our approach to synthesize color images from color event streams. Our network surpasses state-of-the-art reconstruction methods by a large margin in terms of image quality (> 20%), while comfortably running in real-time. We show that the network is able to synthesize high framerate videos (> 5,000 frames per second) of high-speed phenomena (e.g. a bullet hitting an object) and is able to provide high dynamic range reconstructions in challenging lighting conditions. We also demonstrate the effectiveness of our reconstructions as an intermediate representation for event data. We show that off-the-shelf computer vision algorithms can be applied to our reconstructions for tasks such as object classification and visual-inertial odometry and that this strategy consistently outperforms algorithms that were specifically designed for event data.
23.Expressing Visual Relationships via Language pdf
Describing images with text is a fundamental problem in vision-language research. Current studies in this domain mostly focus on single-image captioning. However, in various real applications (e.g., image editing, difference interpretation, and retrieval), generating relational captions for two images can also be very useful. This important problem has not been explored, mostly due to a lack of datasets and effective models. To push forward the research in this direction, we first introduce a new language-guided image editing dataset that contains a large number of real image pairs with corresponding editing instructions. We then propose a new relational speaker model based on an encoder-decoder architecture with static relational attention and sequential multi-head attention. We also extend the model with dynamic relational attention, which calculates visual alignment while decoding. Our models are evaluated on our newly collected dataset and two public datasets consisting of image pairs annotated with relationship sentences. Experimental results, based on both automatic and human evaluation, demonstrate that our model outperforms all baselines and existing methods on all the datasets.
24.Multiclass segmentation as multitask learning for drusen segmentation in retinal optical coherence tomography pdf
Automated drusen segmentation in retinal optical coherence tomography (OCT) scans is relevant for understanding age-related macular degeneration (AMD) risk and progression. This task is usually performed by segmenting the top/bottom anatomical interfaces that define drusen, the outer boundary of the retinal pigment epithelium (OBRPE) and the Bruch's membrane (BM), respectively. In this paper we propose a novel multi-decoder architecture that tackles drusen segmentation as a multitask problem. Instead of training a multiclass model for OBRPE/BM segmentation, we use one decoder per target class and an extra one aiming for the area between the layers. We also introduce connections between each class-specific branch and the additional decoder to increase the regularization effect of this surrogate task. We validated our approach on private/public data sets with 166 early/intermediate AMD Spectralis, and 200 AMD and control Bioptigen OCT volumes, respectively. Our method consistently outperformed several baselines in both layer and drusen segmentation evaluations.
25.Differentiable probabilistic models of scientific imaging with the Fourier slice theorem pdf
Scientific imaging techniques such as optical and electron microscopy and computed tomography (CT) scanning are used to study the 3D structure of an object through 2D observations. These observations are related to the original 3D object through orthogonal integral projections. For common 3D reconstruction algorithms, computational efficiency requires the modeling of the 3D structures to take place in Fourier space by applying the Fourier slice theorem. At present, it is unclear how to differentiate through the projection operator, and hence current learning algorithms cannot rely on gradient-based methods to optimize 3D structure models. In this paper we show how back-propagation through the projection operator in Fourier space can be achieved. We demonstrate the validity of the approach with experiments on 3D reconstruction of proteins. We further extend our approach to learning probabilistic models of 3D objects. This allows us to predict regions of low sampling rates or estimate noise. A higher sample efficiency can be reached by utilizing the learned uncertainties of the 3D structure as an unsupervised estimate of the model fit. Finally, we demonstrate how the reconstruction algorithm can be extended with an amortized inference scheme on unknown attributes such as object pose. Through empirical studies we show that joint inference of the 3D structure and the object pose becomes more difficult when the ground-truth object contains more symmetries. Due to the presence of, for instance, (approximate) rotational symmetries, the pose estimation can easily get stuck in local optima, inhibiting a fine-grained, high-quality estimate of the 3D structure.
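The Fourier slice theorem for orthogonal integral projections can be checked numerically in a few lines: the FFT of a projection equals a central slice of the volume's 3D FFT.

```python
import numpy as np

# Numerical check of the Fourier slice theorem: the 2D FFT of a projection
# along z equals the kz = 0 slice of the 3D FFT of the volume.
rng = np.random.default_rng(0)
vol = rng.standard_normal((32, 32, 32))

proj = vol.sum(axis=2)                 # orthogonal projection along z
lhs = np.fft.fftn(proj)                # 2D FFT of the projection
rhs = np.fft.fftn(vol)[:, :, 0]        # central (kz = 0) slice of 3D FFT

print(np.allclose(lhs, rhs))           # True
```

This identity is what lets reconstruction work in Fourier space; the paper's contribution is making the slicing/projection step differentiable so gradients can flow back to the 3D model.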
26.An Attention-Guided Deep Regression Model for Landmark Detection in Cephalograms pdf
Cephalometric tracing is commonly used in orthodontic diagnosis and treatment planning. In this paper, we propose a deep learning based framework to automatically detect anatomical landmarks in cephalometric X-ray images. We train a deep encoder-decoder for landmark detection, and combine the global landmark configuration with local high-resolution feature responses. The proposed framework is based on a two-stage U-Net, regressing multi-channel heatmaps for landmark detection. In this framework, we embed an attention mechanism with the global-stage heatmaps, guiding the local-stage inference, to regress the local heatmap patches at high resolution. Besides, an Expansive Exploration strategy improves robustness during inference, expanding the search scope without increasing model complexity. We have evaluated our framework on the most widely used public dataset for landmark detection in cephalometric X-ray images. With less computation and manual tuning, our framework achieves state-of-the-art results.
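A minimal sketch of the multi-channel heatmap targets such regression frameworks typically use: one 2D Gaussian per landmark. The sigma, resolution, and 19-landmark count below follow the common cephalometric benchmark setup and should be treated as assumptions.

```python
import numpy as np

def landmark_heatmap(shape, landmark, sigma=3.0):
    """One heatmap channel: a 2D Gaussian peaked at the (row, col) landmark."""
    r = np.arange(shape[0])[:, None]
    c = np.arange(shape[1])[None, :]
    lr, lc = landmark
    return np.exp(-((r - lr) ** 2 + (c - lc) ** 2) / (2 * sigma ** 2))

# e.g. stacked targets for 19 hypothetical landmark locations.
heatmaps = np.stack([landmark_heatmap((256, 256), (100 + 5 * i, 80))
                     for i in range(19)])
print(heatmaps.shape, heatmaps.max())   # (19, 256, 256) 1.0
```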
27.Deep Learning Enhanced Extended Depth-of-Field for Thick Blood-Film Malaria High-Throughput Microscopy pdf
Fast, accurate diagnosis of malaria is still a global health challenge for which automated digital-pathology approaches could provide scalable solutions amenable to deployment in low-to-middle income countries. Here we address the problem of Extended Depth-of-Field (EDoF) in thick blood film microscopy for rapid automated malaria diagnosis. High-magnification oil objectives (100x) with large numerical aperture are usually preferred to resolve the fine structural details that help separate true parasites from distractors. However, such objectives have a very limited depth-of-field, requiring the acquisition of a series of images at different focal planes per field of view (FOV). Current EDoF techniques based on multi-scale decompositions are time-consuming and therefore not suited for high-throughput analysis of specimens. To overcome this challenge, we developed a new deep learning method based on Convolutional Neural Networks (EDoF-CNN) that is able to rapidly perform extended depth-of-field while also enhancing the spatial resolution of the resulting fused image. We evaluated our approach using simulated low-resolution z-stacks from Giemsa-stained thick blood smears from patients presenting with Plasmodium falciparum malaria. The EDoF-CNN speeds up our digital-pathology acquisition platform and significantly improves the quality of the EDoF compared to traditional multi-scale approaches when applied to lower-resolution stacks corresponding to acquisitions with fewer focal planes, large camera pixel binning or lower-magnification objectives (larger FOV). We use the parasite detection accuracy of a deep learning model on the EDoFs as a concrete, task-specific measure of performance of this approach.
28.Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss pdf
Deep learning algorithms can fare poorly when the training dataset suffers from heavy class-imbalance but the testing criterion requires good generalization on less frequent classes. We design two novel methods to improve performance in such scenarios. First, we propose a theoretically-principled label-distribution-aware margin (LDAM) loss motivated by minimizing a margin-based generalization bound. This loss replaces the standard cross-entropy objective during training and can be applied with prior strategies for training with class-imbalance such as re-weighting or re-sampling. Second, we propose a simple, yet effective, training schedule that defers re-weighting until after the initial stage, allowing the model to learn an initial representation while avoiding some of the complications associated with re-weighting or re-sampling. We test our methods on several benchmark vision tasks including the real-world imbalanced dataset iNaturalist 2018. Our experiments show that either of these methods alone can already improve over existing techniques and their combination achieves even better performance gains.
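The LDAM loss itself has a compact form: class $j$ receives a margin proportional to $n_j^{-1/4}$, which is subtracted from its logit before the cross-entropy. A PyTorch sketch follows; the scale constant is a common choice for margin losses, not prescribed by the abstract.

```python
import torch
import torch.nn.functional as F

class LDAMLoss(torch.nn.Module):
    """Label-distribution-aware margin loss (sketch following the paper):
    rarer classes get larger margins, enforced on their own logits.
    """
    def __init__(self, class_counts, max_margin=0.5, scale=30.0):
        super().__init__()
        m = 1.0 / torch.tensor(class_counts, dtype=torch.float32) ** 0.25
        self.margins = m * (max_margin / m.max())   # normalize largest margin
        self.scale = scale

    def forward(self, logits, target):
        margins = self.margins.to(logits.device)[target]        # (B,)
        adjusted = logits.scatter_add(
            1, target[:, None], -margins[:, None])              # z_y -> z_y - m_y
        return F.cross_entropy(self.scale * adjusted, target)

# e.g. a 3-class problem imbalanced 1000 : 100 : 10.
loss_fn = LDAMLoss([1000, 100, 10])
print(loss_fn(torch.randn(8, 3), torch.randint(0, 3, (8,))))
```

The deferred re-weighting schedule from the abstract then amounts to training with this loss unweighted first, and switching on per-class weights only in a later stage.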
29.Active Scene Understanding via Online Semantic Reconstruction pdf
We propose a novel approach to robot-operated active understanding of unknown indoor scenes, based on online RGBD reconstruction with semantic segmentation. In our method, the exploratory robot scanning is both driven by and targeted at the recognition and segmentation of semantic objects in the scene. Our algorithm is built on top of a volumetric depth fusion framework (e.g., KinectFusion) and performs real-time voxel-based semantic labeling over the online reconstructed volume. The robot is guided by an online estimated discrete viewing score field (VSF) parameterized over the 3D space of 2D location and azimuth rotation. The VSF stores, for each grid cell, the score of the corresponding view, which measures how much it reduces the uncertainty (entropy) of both geometric reconstruction and semantic labeling. Based on the VSF, we select the next best view (NBV) as the target for each time step. We then jointly optimize the traverse path and camera trajectory between two adjacent NBVs by maximizing the integral viewing score (information gain) along the path and trajectory. Through extensive evaluation, we show that our method achieves efficient and accurate online scene parsing during exploratory scanning.
30.A Conditional Random Field Model for Context Aware Cloud Detection in Sky Images pdf
A conditional random field (CRF) model for cloud detection in ground based sky images is presented. We show that very high cloud detection accuracy can be achieved by combining a discriminative classifier and a higher order clique potential in a CRF framework. The image is first divided into homogeneous regions using a mean shift clustering algorithm and then a CRF model is defined over these regions. The various parameters involved are estimated using training data and the inference is performed using Iterated Conditional Modes (ICM) algorithm. We demonstrate how taking spatial context into account can boost the accuracy. We present qualitative and quantitative results to prove the superior performance of this framework in comparison with other state of the art methods applied for cloud detection.
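A bare-bones sketch of ICM inference over a region adjacency graph, with a pairwise Potts term standing in for the paper's higher-order clique potential; the unary costs would come from the discriminative classifier.

```python
import numpy as np

def icm(unary, edges, beta=1.0, n_iters=10):
    """Iterated Conditional Modes over a region graph (sketch).

    unary: (R, 2) per-region costs for {clear, cloud};
    edges: (i, j) pairs of adjacent regions; beta weighs smoothness.
    """
    labels = unary.argmin(axis=1)                 # classifier-only init
    nbrs = [[] for _ in range(len(unary))]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    for _ in range(n_iters):
        for i in range(len(unary)):
            cost = unary[i].copy()
            for k in (0, 1):                      # Potts pairwise penalty
                cost[k] += beta * sum(labels[j] != k for j in nbrs[i])
            labels[i] = cost.argmin()             # greedy local update
    return labels

unary = np.array([[0.2, 0.8], [0.6, 0.4], [0.3, 0.7], [0.9, 0.1]])
print(icm(unary, edges=[(0, 1), (1, 2), (1, 3)]))
```

Spatial context enters through the neighbor term: an ambiguous region is pulled toward the label of its neighbors, which is exactly the accuracy boost the abstract describes.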
31.Cardiac Segmentation from LGE MRI Using Deep Neural Network Incorporating Shape and Spatial Priors pdf
Cardiac segmentation from late gadolinium enhancement MRI is an important task in clinics to identify and evaluate the infarction of myocardium. The automatic segmentation is, however, still challenging, due to the heterogeneous intensity distributions and indistinct boundaries in the images. In this paper, we propose a new method, based on deep neural networks (DNN), for fully automatic segmentation. The proposed network, referred to as SRSCN, comprises a shape reconstruction neural network (SRNN) and a spatial constraint network (SCN). SRNN aims to maintain a realistic shape of the resulting segmentation. It can be pre-trained by a set of label images, and then be embedded into a unified loss function as a regularization term. Hence, no manually designed feature is needed. Furthermore, SCN incorporates the spatial information of the 2D slices. It is formulated and trained with the segmentation network via the multi-task learning strategy. We evaluated the proposed method using 45 patients and compared with two state-of-the-art regularization schemes, i.e., the anatomically constrained neural network and the adversarial neural network. The results show that the proposed SRSCN outperformed the conventional schemes, and obtained a Dice score of 0.758 (std = 0.227) for myocardial segmentation, which compares with 0.757 (std = 0.083) from the inter-observer variations.
32.Learning Personalized Attribute Preference via Multi-task AUC Optimization pdf
Traditionally, most existing attribute learning methods are trained based on the consensus of annotations aggregated from a limited number of annotators. However, the consensus might fail in some settings, especially when a wide spectrum of annotators with different interests and comprehension of the attribute words are involved. In this paper, we develop a novel multi-task method to understand and predict personalized attribute annotations. Regarding the attribute preference learning for each annotator as a specific task, we first propose a multi-level task parameter decomposition to capture the evolution from a highly popular opinion of the mass to highly personalized choices that are special for each person. Meanwhile, for personalized learning methods, ranking prediction is much more important than accurate classification. This motivates us to employ an Area Under the ROC Curve (AUC) based loss function to improve our model. On top of the AUC-based loss, we propose an efficient method to evaluate the loss and its gradients. Theoretically, we propose a novel closed-form solution for one of our non-convex subproblems, which leads to provable convergence behaviors. Furthermore, we also provide a generalization bound to guarantee reasonable performance. Finally, empirical analysis consistently speaks to the efficacy of our proposed method.
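For intuition, here is a minimal sketch of a pairwise AUC surrogate of the kind such losses are built on; the paper's exact loss and its closed-form subproblem solution are not reproduced here.

```python
import torch

def auc_surrogate(scores_pos, scores_neg):
    """Squared-hinge pairwise surrogate of 1 - AUC (sketch).

    AUC counts correctly ordered (positive, negative) pairs, so the loss
    penalizes every pair where a negative is not out-scored by margin 1.
    """
    diff = scores_pos[:, None] - scores_neg[None, :]   # all P x N pairs
    return torch.clamp(1 - diff, min=0).pow(2).mean()

pos = (torch.randn(16) + 1).requires_grad_()
neg = torch.randn(64)
loss = auc_surrogate(pos, neg)
loss.backward()
print(loss.item())
```

Optimizing ranking rather than 0/1 accuracy is exactly what matters for preference prediction: only the relative order of a user's liked and disliked items enters the objective.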
33.4D CNN for semantic segmentation of cardiac volumetric sequences pdf
We propose a 4D convolutional neural network (CNN) for the segmentation of retrospective ECG-gated cardiac CT, a series of single-channel volumetric data over time. While only a small subset of volumes in the temporal sequence are annotated, we define a sparse loss function on available labels to allow the network to leverage unlabeled images during training and generate a fully segmented sequence. We investigate the accuracy of the proposed 4D network to predict temporally consistent segmentations and compare with traditional 3D segmentation approaches. We demonstrate the feasibility of the 4D CNN and establish its performance on cardiac 4D CCTA.
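A sketch of the sparse-loss idea: the 4D network processes the full sequence, but only annotated time points contribute to the loss. Shapes below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def sparse_temporal_loss(logits, labels, annotated):
    """Loss over a cardiac sequence with partial temporal annotation (sketch).

    logits: (T, C, D, H, W); labels: (T, D, H, W); annotated: (T,) bool.
    Unlabeled volumes contribute nothing to the loss, but the 4D network
    still sees them through its temporal kernels.
    """
    idx = annotated.nonzero(as_tuple=True)[0]
    return F.cross_entropy(logits[idx], labels[idx])

T, C, D, H, W = 10, 4, 8, 16, 16
logits = torch.randn(T, C, D, H, W, requires_grad=True)
labels = torch.randint(0, C, (T, D, H, W))
mask = torch.zeros(T, dtype=torch.bool)
mask[::5] = True                                  # 2 of 10 volumes labeled
sparse_temporal_loss(logits, labels, mask).backward()
```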
34.The Cells Out of Sample (COOS) dataset and benchmarks for measuring out-of-sample generalization of image classifiers pdf
Understanding if classifiers generalize to out-of-sample datasets is a central problem in machine learning. Microscopy images provide a standardized way to measure the generalization capacity of image classifiers, as we can image the same classes of objects under increasingly divergent, but controlled factors of variation. We created a public dataset of 132,209 images of mouse cells, COOS-7 (Cells Out Of Sample 7-Class). COOS-7 provides a classification setting where four test datasets have increasing degrees of covariate shift: some images are random subsets of the training data, while others are from experiments reproduced months later and imaged by different instruments. We benchmarked a range of classification models using different representations, including transferred neural network features, end-to-end classification with a supervised deep CNN, and features from a self-supervised CNN. While most classifiers perform well on test datasets similar to the training dataset, all classifiers failed to generalize their performance to datasets with greater covariate shifts. These baselines highlight the challenges of covariate shifts in image data, and establish metrics for improving the generalization capacity of image classifiers.
35.An IoT Based Framework For Activity Recognition Using Deep Learning Technique pdf
Activity recognition is the ability to identify and recognize the actions or goals of an agent, where the agent can be any object or entity that performs actions with end goals. The agents can be a single agent performing actions, or a group of agents performing actions or having some interaction. Human activity recognition has gained popularity due to demand in many practical applications such as entertainment, healthcare, simulation and surveillance systems. Vision-based activity recognition has the advantage that it does not require human intervention or physical contact, and sets of networked cameras can be used to track and recognize the activities of agents. Traditional applications for tracking or recognizing human activities made use of wearable devices; however, such devices require physical contact with the person. To overcome such challenges, a vision-based activity recognition system can be used, with a camera to record the video and a processor that performs the recognition. The work is implemented in two stages. In the first stage, an approach to activity recognition is proposed using background subtraction of images followed by 3D convolutional neural networks, and the impact of applying background subtraction prior to the 3D CNN is reported. In the second stage, the work is extended and implemented on a Raspberry Pi, which records a video stream and then recognizes the activity in the video. Thus, a proof of concept for activity recognition on a small IoT-based device is provided, which can enhance the system and extend its applications through increased portability, networking, and other capabilities of the device.
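A sketch of the first stage as described, using OpenCV's MOG2 background subtractor before stacking frames for a 3D CNN; the clip length, frame size, and subtractor settings are assumptions.

```python
import cv2
import numpy as np

# Background subtraction per frame, then a clip is stacked for a 3D CNN.
subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16)

def preprocess_clip(frames, size=(112, 112)):
    """frames: list of BGR images -> (T, H, W) float32 foreground stack."""
    masks = []
    for f in frames:
        fg = subtractor.apply(f)               # 0/127/255 foreground mask
        fg = cv2.resize(fg, size)
        masks.append(fg.astype(np.float32) / 255.0)
    return np.stack(masks)                     # temporal axis for the 3D CNN

clip = [np.random.randint(0, 255, (240, 320, 3), np.uint8) for _ in range(16)]
print(preprocess_clip(clip).shape)             # (16, 112, 112)
```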
36.Visual Navigation by Generating Next Expected Observations pdf
We propose a novel approach to visual navigation in unknown environments where the agent is guided by conceiving the next observations it expects to see after taking the next best action. This is achieved by learning a variational Bayesian model that generates the next expected observations (NEO) conditioned on the current observations of the agent and the target view. Our approach predicts the next best action based on the current observation and NEO. Our generative model is learned through optimizing a variational objective encompassing two key designs. First, the latent distribution is conditioned on current observations and target view, supporting model-based, target-driven navigation. Second, the latent space is modeled with a Mixture of Gaussians conditioned on the current observation and next best action. Our use of a mixture-of-posteriors prior effectively alleviates the issue of an over-regularized latent space, thus facilitating model generalization in novel scenes. Moreover, the NEO generation models the forward dynamics of the agent-environment interaction, which improves the quality of approximate inference and hence benefits data efficiency. We have conducted extensive evaluations on both real-world and synthetic benchmarks, and show that our model outperforms the state-of-the-art RL-based methods significantly in terms of success rate, data efficiency, and cross-scene generalization.
37.Equivariant neural networks and equivarification pdf
We provide a process to modify a neural network to an equivariant one, which we call {\em equivarification}. As an illustration, we build an equivariant neural network for image classification by equivarifying a convolutional neural network.
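A minimal sketch of equivarification for the group of 90-degree image rotations: run the base network on the whole orbit of the input and stack the outputs, so that rotating the input permutes the outputs (averaging instead of stacking would give invariance). The base model here is a placeholder.

```python
import torch

class Equivarified(torch.nn.Module):
    """Wrap any classifier so its output transforms predictably under
    the 4-element rotation group acting on the input image.
    """
    def __init__(self, base):
        super().__init__()
        self.base = base

    def forward(self, x):                       # x: (B, C, H, W)
        outs = [self.base(torch.rot90(x, k, dims=(2, 3))) for k in range(4)]
        return torch.stack(outs, dim=1)         # (B, |G|, num_classes)

base = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
model = Equivarified(base)
print(model(torch.randn(2, 1, 28, 28)).shape)   # torch.Size([2, 4, 10])
```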
38.Enforcing temporal consistency in Deep Learning segmentation of brain MR images pdf
Longitudinal analysis has great potential to reveal developmental trajectories and monitor disease progression in medical imaging. This process relies on consistent and robust joint 4D segmentation. Traditional techniques depend on the similarity of images over time and the use of subject-specific priors to reduce random variation and improve the robustness and sensitivity of the overall longitudinal analysis. This is, however, slow and computationally intensive, as subject-specific templates need to be rebuilt every time. The focus of this work is to accelerate this analysis with deep learning. The proposed approach is based on deep CNNs, incorporates semantic segmentation, and provides a longitudinal relationship for the same subject. The state of the art using 3D patches as inputs to a modified U-Net achieves around ${0.91 \pm 0.5}$ Dice, and using multi-view atlases in CNNs provides about the same results. In this work, different models are explored, each offering better accuracy and faster results while increasing segmentation quality. These methods are evaluated on 135 scans from the EADC-ADNI Harmonized Hippocampus Protocol. The proposed CNN-based segmentation approaches demonstrate how 2D segmentation using prior slices can provide results similar to 3D segmentation while maintaining good continuity in the third dimension and improved speed. Simply using modified 2D sagittal slices provides a better Dice score and longitudinal analysis for a given subject. For the ADNI dataset, the simple U-Net CNN technique yields ${0.84 \pm 0.5}$, while modified CNN techniques on the same input yield ${0.89 \pm 0.5}$. The rate of atrophy and RMS error are calculated for several test cases using various methods and analyzed.
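A sketch of the "2D segmentation using prior slices" input construction: the current slice and its k previous slices become the channels of a 2D network. Both k and the downstream U-Net are assumptions.

```python
import torch

def slice_with_priors(volume, t, k=2):
    """volume: (S, H, W); returns a (1, k+1, H, W) input for slice t,
    stacking the k previous slices (clamped at the volume boundary)
    as extra channels so a 2D net sees local 3D context.
    """
    idx = [max(t - i, 0) for i in range(k, -1, -1)]   # t-k ... t
    return volume[idx].unsqueeze(0)

vol = torch.randn(64, 128, 128)
x = slice_with_priors(vol, t=30)
print(x.shape)   # torch.Size([1, 3, 128, 128]) -> feed to any 2D U-Net
```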
39.Signatures in Shape Analysis: an Efficient Approach to Motion Identification pdf
Signatures provide a succinct description of certain features of paths in a reparametrization-invariant way. We propose a method for classifying shapes based on signatures, and compare it to current approaches based on the SRV transform and dynamic programming.
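For concreteness, a numpy sketch of a depth-2 signature computed by iterated sums; the antisymmetric part of the second level recovers the signed (Lévy) area, one of the reparametrization-invariant features such classifiers use.

```python
import numpy as np

def signature_level2(path):
    """Depth-2 path signature via iterated sums (sketch). For a path
    sampled as (T, d) points, level 1 is the total increment and level 2
    approximates the iterated integrals of X^i against dX^j.
    """
    dx = np.diff(path, axis=0)                   # (T-1, d) increments
    s1 = dx.sum(axis=0)                          # level 1: net displacement
    cum = np.vstack([np.zeros(path.shape[1]),    # X_{t-} - X_0 before each step
                     np.cumsum(dx, axis=0)[:-1]])
    s2 = cum.T @ dx                              # level 2: sum (X_{t-}) dX_t
    return s1, s2

t = np.linspace(0, 2 * np.pi, 200)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)
s1, s2 = signature_level2(circle)
# Closed path: s1 ~ 0; the Levy area (antisymmetric part) ~ pi.
print(s1.round(3), (s2[0, 1] - s2[1, 0]) / 2)
```

Because the sums depend only on the order of the increments, re-sampling the same curve at a different speed leaves these features (in the limit) unchanged, which is the invariance the abstract refers to.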