ArXiv cs.CV -- Mon, 14 Jan 2019

1. A Biologically Inspired Visual Working Memory for Deep Networks [pdf]

The ability to look multiple times through a series of pose-adjusted glimpses is fundamental to human vision. This critical faculty allows us to understand highly complex visual scenes. Short-term memory plays an integral role in aggregating the information obtained from these glimpses and informing our interpretation of the scene. Computational models have attempted to address glimpsing and visual attention but have failed to incorporate the notion of memory. We introduce a novel, biologically inspired visual working memory architecture that we term the Hebb-Rosenblatt memory. We subsequently introduce a fully differentiable Short Term Attentive Working Memory model (STAWM) which uses transformational attention to learn a memory over each image it sees. The state of our Hebb-Rosenblatt memory is embedded in STAWM as the weight space of a layer. By projecting different queries through this layer we can obtain goal-oriented latent representations for tasks including classification and visual reconstruction. Our model obtains highly competitive classification performance on MNIST and CIFAR-10. As demonstrated through the CelebA dataset, to perform reconstruction the model learns to make a sequence of updates to a canvas which constitute a parts-based representation. Classification with the self-supervised representation obtained from MNIST is shown to be in line with state-of-the-art models (none of which use a visual attention mechanism). Finally, we show that STAWM can be trained under the dual constraints of classification and reconstruction to provide an interpretable visual sketchpad which helps open the 'black-box' of deep learning.
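
The abstract does not give the exact Hebb-Rosenblatt update rule, but the idea of a memory stored as the weight space of a layer and read out by projecting queries through it can be sketched with a generic Hebbian outer-product update; all dimensions and the learning rule below are illustrative assumptions, not the paper's method.

```python
import torch

# Generic Hebbian outer-product update: the memory is the weight matrix of a
# layer, grown from co-active pre-/post-synaptic activity over the glimpses.
# (Illustrative stand-in; the paper's Hebb-Rosenblatt rule is not reproduced.)
def hebbian_update(W, pre, post, lr=0.01):
    """W: (d_out, d_in) memory; pre: (d_in,); post: (d_out,)."""
    return W + lr * torch.outer(post, pre)

W = torch.zeros(64, 128)
for _ in range(8):                       # one update per glimpse
    pre, post = torch.randn(128), torch.randn(64)
    W = hebbian_update(W, pre, post)

query = torch.randn(128)                 # task-specific query
latent = W @ query                       # goal-oriented latent representation
```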

2. Individual common dolphin identification via metric embedding learning [pdf]

Photo-identification (photo-id) of dolphin individuals is a commonly used technique in ecological sciences to monitor state and health of individuals, as well as to study the social structure and distribution of a population. Traditional photo-id involves a laborious manual process of matching each dolphin fin photograph captured in the field to a catalogue of known individuals.
We examine this problem in the context of open-set recognition and utilise a triplet loss function to learn a compact representation of fin images in a Euclidean embedding, where the Euclidean distance metric represents fin similarity. We show that this compact representation can be successfully learnt from a fairly small (in the deep learning context) training set and still generalise well to out-of-sample identities (completely new dolphin individuals), with top-1 and top-5 test set (37 individuals) accuracies of $90.5\pm2$ and $93.6\pm1$ percent. In the presence of 1200 distractors, top-1 accuracy dropped by $12\%$; however, top-5 accuracy saw only a $2.8\%$ drop.
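
A minimal PyTorch sketch of the core idea, metric learning with a triplet loss so that Euclidean distance encodes fin similarity; the toy encoder, margin and image size are assumptions, not the authors' setup.

```python
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128))  # toy encoder
triplet = nn.TripletMarginLoss(margin=0.2)                    # Euclidean by default

anchor, positive, negative = (torch.randn(8, 1, 64, 64) for _ in range(3))
loss = triplet(embed(anchor), embed(positive), embed(negative))
loss.backward()

# At test time, identity is assigned by Euclidean nearest neighbours among
# catalogue embeddings (top-1 / top-5 retrieval).
```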

3. An Optical Frontend for a Convolutional Neural Network [pdf]

Large-scale artificial neural networks have demonstrated significantly better performance than other machine learning algorithms. However, the computational complexity of these networks precludes real-time operation. The ability of light to perform convolutions with extremely low power and at high speed makes light a potential candidate for speeding up complicated neural networks. In this paper we report an architecture for an optical frontend to a neural network. We analyzed the performance of the whole hybrid optical-electronic neural network and found that an optical frontend is beneficial when the number of pixels is large and exceeds 1000.

4. Weightless Neural Network with Transfer Learning to Detect Distress in Asphalt [pdf]

The present paper shows a solution to the problem of automatic distress detection, more precisely the detection of holes in paved roads. To do so, the proposed solution uses a weightless neural network known as Wisard to decide whether an image of a road contains any kind of crack. In addition, the proposed architecture also shows how the use of transfer learning improved the overall accuracy of the decision system. As a verification step of the research, an experiment was carried out using images of the streets at the Federal University of Tocantins, Brazil. The developed solution achieves 85.71% accuracy on this dataset, proving superior to state-of-the-art approaches.

5. A General Optimization-based Framework for Global Pose Estimation with Multiple Sensors [pdf]

Accurate state estimation is a fundamental problem for autonomous robots. To achieve locally accurate and globally drift-free state estimation, multiple sensors with complementary properties are usually fused together. Local sensors (camera, IMU, LiDAR, etc.) provide precise pose within a small region, while global sensors (GPS, magnetometer, barometer, etc.) supply noisy but globally drift-free localization in a large-scale environment. In this paper, we propose a sensor fusion framework that fuses local states with global sensors and achieves locally accurate and globally drift-free pose estimation. Local estimations, produced by existing VO/VIO approaches, are fused with global sensors in a pose graph optimization. Within the graph optimization, local estimations are aligned into a global coordinate frame, and the accumulated drift is eliminated. We evaluate the performance of our system on public datasets and in real-world experiments, and compare the results against other state-of-the-art algorithms. We highlight that our system is a general framework that can easily fuse various global sensors in a unified pose graph optimization. Our implementations are open source\footnote{this https URL}.
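
The paper performs the alignment jointly inside a pose-graph optimization; as a simpler illustration of the underlying idea, here is a closed-form rigid alignment (Kabsch/Umeyama style) of a drifting local trajectory to noisy global fixes, with synthetic data standing in for VO and GPS.

```python
import numpy as np

def align_to_global(local_xyz, gps_xyz):
    """Find R, t minimizing sum ||R @ local_i + t - gps_i||^2 (Kabsch)."""
    mu_l, mu_g = local_xyz.mean(0), gps_xyz.mean(0)
    H = (local_xyz - mu_l).T @ (gps_xyz - mu_g)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_g - R @ mu_l
    return R, t

theta = 0.3                                               # simulated yaw offset
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1]])
local = np.cumsum(np.random.randn(100, 3) * 0.1, axis=0)  # drifting odometry
gps = local @ Rz.T + np.random.randn(100, 3) * 0.5        # noisy global fixes
R, t = align_to_global(local, gps)                        # recovers R ~ Rz
```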

6. A General Optimization-based Framework for Local Odometry Estimation with Multiple Sensors [pdf]

Nowadays, robots are equipped with more and more sensors to increase their robustness and autonomy. We have seen various sensor suites on different platforms, such as stereo cameras on ground vehicles, a monocular camera with an IMU (Inertial Measurement Unit) on mobile phones, and stereo cameras with an IMU on aerial robots. Although many algorithms for state estimation have been proposed in the past, they are usually applied to a single sensor or a specific sensor suite; few of them can be employed with multiple sensor choices. In this paper, we propose a general optimization-based framework for odometry estimation that supports multiple sensor sets. Every sensor is treated as a general factor in our framework. Factors which share common state variables are summed together to build the optimization problem. We further demonstrate this generality with visual and inertial sensors, which form three sensor suites (stereo cameras, a monocular camera with an IMU, and stereo cameras with an IMU). We validate the performance of our system on public datasets and through real-world experiments with multiple sensors. Results are compared against other state-of-the-art algorithms. We highlight that our system is a general framework that can easily fuse various sensors in a pose graph optimization. Our implementations are open source\footnote{this https URL}.
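
A minimal numerical sketch of the "every sensor is a factor" formulation: each factor maps the state to a residual, all residuals are stacked, and a single nonlinear least-squares problem is solved. One-dimensional poses and hand-picked measurements keep the example tiny; real factors would involve SE(3) poses, IMU preintegration and robust losses.

```python
import numpy as np
from scipy.optimize import least_squares

def odom_factor(x, i, meas):        # relative-motion constraint between poses
    return (x[i + 1] - x[i]) - meas

def prior_factor(x, i, meas):       # absolute measurement of one pose
    return x[i] - meas

factors = [lambda x: odom_factor(x, 0, 1.0),
           lambda x: odom_factor(x, 1, 1.1),
           lambda x: prior_factor(x, 0, 0.0)]

def stacked_residuals(x):
    return np.concatenate([np.atleast_1d(f(x)) for f in factors])

x0 = np.zeros(3)                    # three 1-D poses, for brevity
sol = least_squares(stacked_residuals, x0)
print(sol.x)                        # fused pose estimates
```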

7. Image Disentanglement and Uncooperative Re-Entanglement for High-Fidelity Image-to-Image Translation [pdf]

Cross-domain image-to-image translation should satisfy two requirements: (1) preserve the information that is common to both domains, and (2) generate convincing images covering variations that appear in the target domain. This is challenging, especially when there are no example translations available as supervision. Adversarial cycle consistency was recently proposed as a solution, with beautiful and creative results, yielding much follow-up work. However, augmented reality applications cannot readily use such techniques to provide users with compelling translations of real scenes, because the translations do not have high-fidelity constraints. In other words, current models are liable to change details that should be preserved: while re-texturing a face, they may alter the face's expression in an unpredictable way. In this paper, we introduce the problem of high-fidelity image-to-image translation, and present a method for solving it. Our main insight is that low-fidelity translations typically escape a cycle-consistency penalty, because the back-translator learns to compensate for the forward-translator's errors. We therefore introduce an optimization technique that prevents the networks from cooperating: simply train each network only when its input data is real. Prior works, in comparison, train each network with a mix of real and generated data. Experimental results show that our method accurately disentangles the factors that separate the domains, and converges to semantics-preserving translations that prior methods miss.
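
The key trick, training each translator only when its own input is real so the back-translator cannot compensate for the forward-translator's errors, can be sketched as below; the single-conv "translators", losses and data are placeholders, and the GAN terms are omitted.

```python
import torch
import torch.nn as nn

# F: A -> B and G: B -> A (toy stand-ins for the translators).
F = nn.Conv2d(3, 3, 3, padding=1)
G = nn.Conv2d(3, 3, 3, padding=1)
opt_F = torch.optim.Adam(F.parameters(), lr=1e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=1e-4)
l1 = nn.L1Loss()

real_a = torch.randn(4, 3, 64, 64)
real_b = torch.randn(4, 3, 64, 64)

# Update F only: its input (real_a) is real, while G receives the fake
# F(real_a), so G is frozen here and cannot learn to undo F's errors.
for p in G.parameters():
    p.requires_grad_(False)
loss_F = l1(G(F(real_a)), real_a)        # cycle loss (GAN terms omitted)
opt_F.zero_grad(); loss_F.backward(); opt_F.step()
for p in G.parameters():
    p.requires_grad_(True)

# Symmetrically, update G only on the cycle that starts from real_b.
for p in F.parameters():
    p.requires_grad_(False)
loss_G = l1(F(G(real_b)), real_b)
opt_G.zero_grad(); loss_G.backward(); opt_G.step()
for p in F.parameters():
    p.requires_grad_(True)
```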

8. Background Subtraction in Real Applications: Challenges, Current Models and Future Directions [pdf]

Computer vision applications based on videos often require the detection of moving objects as their first step. Background subtraction is then applied in order to separate the background and the foreground. Background subtraction is surely among the most investigated fields in computer vision, with a large number of publications, most of which apply mathematical and machine learning models to be more robust to the challenges met in videos. However, the ultimate goal is for the background subtraction methods developed in research to be employed in real applications like traffic surveillance. Looking at the literature, we can remark that there is often a gap between the methods used in real applications and the current methods in fundamental research. In addition, the videos evaluated in large-scale datasets are not exhaustive, in that they cover only a part of the complete spectrum of challenges met in real applications. In this context, we attempt to provide as exhaustive a survey as possible of real applications that use background subtraction, in order to identify the real challenges met in practice and the background models currently used, and to provide future directions. Challenges are investigated in terms of camera, foreground objects and environments. In addition, we identify the background models that are effectively used in these applications in order to find potentially usable recent background models in terms of robustness, time and memory requirements.

9. CSGAN: Cyclic-Synthesized Generative Adversarial Networks for Image-to-Image Transformation [pdf]

The primary motivation of image-to-image transformation is to convert an image of one domain to another domain. Most of the research has focused on the task of image transformation for a set of pre-defined domains; very few works have actually developed a common framework for image-to-image transformation across different domains. With the introduction of Generative Adversarial Networks (GANs) as a general framework for the image generation problem, there has been tremendous growth in the area of image-to-image transformation, with most of the research focusing on a suitable objective function. In this paper, we propose a new Cyclic-Synthesized Generative Adversarial Network (CSGAN) for image-to-image transformation. The proposed CSGAN uses a new objective function (loss) called the Cyclic-Synthesized Loss (CS), computed between the synthesized image of one domain and the cycled image of another domain. The performance of the proposed CSGAN is evaluated on two benchmark image-to-image transformation datasets, the CUHK Face dataset and the CMP Facades dataset. The results are computed using the widely used evaluation metrics MSE, SSIM, PSNR, and LPIPS. The experimental results of the proposed CSGAN approach are compared with the latest state-of-the-art approaches, such as GAN, Pix2Pix, DualGAN, CycleGAN and PS2GAN. The proposed CSGAN technique outperforms all of these methods on the CUHK dataset and exhibits promising, comparable performance on the Facades dataset in terms of both qualitative and quantitative measures. The code is available at this https URL.
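
As we read the abstract, the CS loss penalizes the discrepancy between the synthesized estimate and the cycled estimate of an image; a minimal sketch with toy generators follows (the full CSGAN additionally uses adversarial and cycle-consistency terms, which are omitted here).

```python
import torch
import torch.nn as nn

G_AB = nn.Conv2d(3, 3, 3, padding=1)   # toy A -> B generator
G_BA = nn.Conv2d(3, 3, 3, padding=1)   # toy B -> A generator
l1 = nn.L1Loss()

real_a = torch.randn(2, 3, 64, 64)
real_b = torch.randn(2, 3, 64, 64)

syn_b = G_AB(real_a)                   # synthesized B
cyc_b = G_AB(G_BA(real_b))             # cycled B
syn_a = G_BA(real_b)                   # synthesized A
cyc_a = G_BA(G_AB(real_a))             # cycled A

loss_cs = l1(syn_b, cyc_b) + l1(syn_a, cyc_a)   # added to GAN + cycle terms
```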

10. DIVE: A spatiotemporal progression model of brain pathology in neurodegenerative disorders [pdf]

Here we present DIVE: Data-driven Inference of Vertexwise Evolution. DIVE is an image-based disease progression model with single-vertex resolution, designed to reconstruct long-term patterns of brain pathology from short-term longitudinal data sets. DIVE clusters vertex-wise biomarker measurements on the cortical surface that have similar temporal dynamics across a patient population, and concurrently estimates an average trajectory of vertex measurements in each cluster. DIVE uniquely outputs a parcellation of the cortex into areas with common progression patterns, leading to a new signature for individual diseases. DIVE further estimates the disease stage and progression speed for every visit of every subject, potentially enhancing stratification for clinical trials or management. On simulated data, DIVE can recover ground truth clusters and their underlying trajectory, provided the average trajectories are sufficiently different between clusters. We demonstrate DIVE on data from two cohorts: the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the Dementia Research Centre (DRC), UK, containing patients with Posterior Cortical Atrophy (PCA) as well as typical Alzheimer's disease (tAD). DIVE finds similar spatial patterns of atrophy for tAD subjects in the two independent datasets (ADNI and DRC), and further reveals distinct patterns of pathology in different diseases (tAD vs PCA) and for distinct types of biomarker data: cortical thickness from Magnetic Resonance Imaging (MRI) vs amyloid load from Positron Emission Tomography (PET). Finally, DIVE can be used to estimate a fine-grained spatial distribution of pathology in the brain using any kind of voxelwise or vertexwise measures including Jacobian compression maps, fractional anisotropy (FA) maps from diffusion imaging or other PET measures. DIVE source code is available online: this https URL

11. Feature Fusion for Robust Patch Matching With Compact Binary Descriptors [pdf]

This work addresses the problem of learning compact yet discriminative patch descriptors within a deep learning framework. We observe that features extracted by convolutional layers in the pixel domain are largely complementary to features extracted in a transformed domain. We propose a convolutional network framework for learning binary patch descriptors where pixel domain features are fused with features extracted from the transformed domain. In our framework, while convolutional and transformed features are distinctly extracted, they are fused and provided to a single classifier which thus jointly operates on convolutional and transformed features. We experiment with matching patches from three different datasets, showing that our feature fusion approach outperforms multiple state-of-the-art approaches in terms of accuracy, rate, and complexity.
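
A minimal sketch of the fusion idea: one branch sees the raw pixel patch, another sees a transform-domain view (a 2-D DCT is used here as a plausible choice; the paper's exact transform and binary descriptor head are not reproduced), and a single classifier operates on the concatenated features.

```python
import torch
import torch.nn as nn
from scipy.fft import dctn

conv_branch = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                            nn.AdaptiveAvgPool2d(4), nn.Flatten())
dct_branch = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 128), nn.ReLU())
classifier = nn.Linear(8 * 4 * 4 + 128, 2)     # jointly sees both feature sets

patch = torch.randn(4, 1, 32, 32)              # pixel-domain patches
dct = torch.tensor(dctn(patch.numpy(), axes=(-2, -1)), dtype=torch.float32)

fused = torch.cat([conv_branch(patch), dct_branch(dct)], dim=1)
logits = classifier(fused)                     # match / non-match score
```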

12. Retrieving Similar E-Commerce Images Using Deep Learning [pdf]

In this paper, we propose a deep convolutional neural network for learning image embeddings that capture the notion of visual similarity. We present a deep siamese architecture that, when trained on positive and negative pairs of images, learns an embedding that accurately approximates the ranking of images by visual similarity. We also implement a novel loss calculation method using an angular loss metric based on the problem's requirements. The final embedding of an image is a combined representation of its lower- and top-level embeddings. We use a fractional distance matrix to calculate the distance between the learned embeddings in n-dimensional space. Finally, we compare our architecture with other existing deep architectures and demonstrate the superiority of our solution in terms of image retrieval on four datasets. We also show how our suggested network is better than other traditional deep CNNs used for capturing fine-grained image similarities by learning an optimum embedding.
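
The fractional distance mentioned here is presumably the Minkowski distance with exponent p < 1, which is often better-behaved than Euclidean distance in high-dimensional spaces; a one-function sketch (p = 0.5 is an assumed value):

```python
import numpy as np

def fractional_distance(x, y, p=0.5):
    """Minkowski distance with fractional exponent p < 1."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

a, b = np.random.rand(256), np.random.rand(256)   # two learned embeddings
print(fractional_distance(a, b))
```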

13. FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction [pdf]

The basic principles for designing convolutional neural network (CNN) structures for predicting objects at different levels, e.g., image-level, region-level, and pixel-level, diverge. Generally, network structures designed specifically for image classification are directly used as the default backbone structure for other tasks, including detection and segmentation, but few backbone structures have been designed with the goal of unifying the advantages of networks designed for pixel-level or region-level prediction tasks, which may require very deep features with high resolution. Towards this goal, we design a fish-like network, called FishNet. In FishNet, the information of all resolutions is preserved and refined for the final task. In addition, we observe that existing works still cannot \emph{directly} propagate gradient information from deep layers to shallow layers; our design better handles this problem. Extensive experiments demonstrate the remarkable performance of FishNet. In particular, on ImageNet-1k, FishNet surpasses the accuracy of DenseNet and ResNet with fewer parameters. FishNet was applied as one of the modules in the winning entry of the COCO Detection 2018 challenge. The code is available at this https URL.

14. LGAN: Lung Segmentation in CT Scans Using Generative Adversarial Network [pdf]

Lung segmentation in computerized tomography (CT) images is an important procedure in the diagnosis of various lung diseases. Most current lung segmentation approaches are performed through a series of procedures with manual, empirical parameter adjustments at each step. Pursuing an automatic segmentation method with fewer steps, in this paper we propose a novel deep learning Generative Adversarial Network (GAN) based lung segmentation scheme, which we denote LGAN. The proposed scheme can be generalized to different kinds of neural networks for lung segmentation in CT images and is evaluated on a dataset containing 220 individual CT scans with two metrics: segmentation quality and shape similarity. We also compared our work with current state-of-the-art methods. The results obtained in this study demonstrate that the proposed LGAN scheme can serve as a promising tool for automatic lung segmentation, due to its simplified procedure as well as its good performance.

15. Segmentation of Levator Hiatus Using Multi-Scale Local Region Active Contours and Boundary Shape Similarity Constraint [pdf]

In this paper, a multi-scale framework combining local-region-based active contours with a boundary shape similarity constraint is proposed for the segmentation of the levator hiatus in ultrasound images. In order to obtain more precise initializations and reduce the computational cost, a Gaussian pyramid is used to decompose the image into coarse-to-fine scales. A localized region active contour model is first run on the coarsest-scale image to get a rough contour of the levator hiatus; the segmentation result at the coarse scale is then interpolated into the finer-scale image as the initialization. The boundary shape similarity between different scales is incorporated into the local-region-based active contour model so that the result from the coarse scale can guide the contour evolution at the finer scale. By incorporating the multi-scale scheme and the boundary shape similarity, the proposed method can precisely locate the levator hiatus boundaries despite various ultrasound image artifacts. On a data set of 90 levator hiatus ultrasound images, the efficiency and accuracy of the proposed method are validated by quantitative and qualitative evaluations (TP, FP, Js) and by comparison with two other state-of-the-art active contour segmentation methods (the C-V model and the DRLSE model).
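
A compact coarse-to-fine sketch of the pipeline: segment at the coarsest Gaussian pyramid level, then upsample each mask as the initialization for the next finer level. scikit-image's morphological_chan_vese stands in for the paper's local-region active contour, and the shape-similarity constraint is not reproduced.

```python
import numpy as np
from skimage.transform import pyramid_gaussian, resize
from skimage.segmentation import morphological_chan_vese

image = np.random.rand(256, 256)                     # placeholder ultrasound image
levels = list(pyramid_gaussian(image, max_layer=2))  # fine -> coarse

mask = None
for img in reversed(levels):                         # coarse -> fine
    if mask is None:
        init = 'checkerboard'                        # rough start at coarsest scale
    else:                                            # previous result as initializer
        init = (resize(mask.astype(float), img.shape) > 0.5).astype(np.int8)
    mask = morphological_chan_vese(img, 50, init_level_set=init)
```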

16. Color Recognition for Rubik's Cube Robot [pdf]

In this paper, we propose three methods for color recognition on a Rubik's cube: one offline method and two online methods. Scatter balance & extreme learning machine (SB-ELM), the offline method, is proposed to illustrate the efficiency of training-based methods. We also point out the phenomenon of color drift, which indicates that offline methods are often ineffective and cannot work well under continuously changing conditions. By contrast, dynamic weight label propagation, an online method, is proposed for labeling block colors from the known colors of the cube's center blocks. Furthermore, weak label hierarchic propagation, another online method, is proposed for the case where no color information is known, utilizing only the weak labels of the center blocks for color recognition. We finally design a Rubik's cube robot and construct a dataset to illustrate the efficiency and effectiveness of our online methods and to show, via color drift, the ineffectiveness of the offline method on our dataset.

17. Hand Segmentation and Fingertip Tracking from Depth Camera Images Using Deep Convolutional Neural Network and Multi-task SegNet [pdf]

Hand segmentation and fingertip detection play an indispensable role in hand gesture-based human-machine interaction systems. In this study, we propose a method to discriminate hand components and to locate fingertips in RGB-D images. The system consists of three main steps: hand detection using RGB images, providing regions that are considered promising areas for further processing; hand segmentation; and fingertip detection using the depth image and our modified SegNet, a single lightweight architecture that can process two independent tasks at the same time. The experimental results show that our system is a promising method for hand segmentation and fingertip detection, achieving comparable performance with a model complexity suitable for real-time applications.

18. Analyzing Periodicity and Saliency for Adult Video Detection [pdf]

Content-based adult video detection plays an important role in preventing the spread of pornography. However, existing methods usually rely on a single modality and seldom focus on multi-modality semantics representation. To address this problem, we put forward an approach that analyzes periodicity and saliency for adult video detection. First, periodic patterns and salient regions are respectively analyzed in audio frames and visual frames. Next, the multi-modal co-occurrence semantics is described by combining audio periodicity with visual saliency. Finally, the performance of our approach is evaluated step by step. Experimental results show that our approach clearly outperforms several state-of-the-art methods.
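
One simple way to pick up the audio periodicity the abstract refers to is autocorrelation of an energy envelope; the paper's actual periodicity features are more elaborate, and the envelope below is synthetic.

```python
import numpy as np

def dominant_period(envelope, sr):
    """Return the strongest repetition period (seconds) via autocorrelation."""
    env = envelope - envelope.mean()
    ac = np.correlate(env, env, mode='full')[len(env) - 1:]
    neg = np.where(ac < 0)[0]              # skip the broad zero-lag peak
    start = neg[0] if len(neg) else 1
    lag = start + np.argmax(ac[start:])
    return lag / sr

sr = 100                                    # envelope sample rate (Hz)
t = np.arange(0, 10, 1 / sr)
env = np.sin(2 * np.pi * 2.0 * t) + 0.1 * np.random.randn(len(t))
print(dominant_period(env, sr))             # ~0.5 s for a 2 Hz pattern
```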

19. DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition [pdf]

Motion has been shown to be useful for video understanding, where it is typically represented by optical flow. However, computing flow from video frames is very time-consuming. Recent works directly leverage the motion vectors and residuals readily available in compressed video to represent motion at no cost. While this avoids flow computation, it also hurts accuracy, since the motion vector is noisy and has substantially reduced resolution, making it a less discriminative motion representation. To remedy these issues, we propose a lightweight generator network which reduces noise in motion vectors and captures fine motion details, achieving a more Discriminative Motion Cue (DMC) representation. Since optical flow is a more accurate motion representation, we train the DMC generator to approximate flow using a reconstruction loss and a generative adversarial loss, jointly with the downstream action classification task. Extensive evaluations on three action recognition benchmarks (HMDB-51, UCF-101, and a subset of Kinetics) confirm the effectiveness of our method. Our full system, consisting of the generator and the classifier, is coined DMC-Net; it obtains accuracy close to that of using flow while running two orders of magnitude faster at inference time.
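
The abstract fully specifies the training objective: a classification loss plus a reconstruction loss toward optical flow and an adversarial loss, optimized jointly. A minimal sketch (toy modules, made-up shapes and loss weights, GAN term elided):

```python
import torch
import torch.nn as nn

generator = nn.Conv2d(3, 2, 3, padding=1)      # motion vectors (+residual) -> DMC
classifier = nn.Sequential(nn.Flatten(), nn.Linear(2 * 56 * 56, 51))

mv = torch.randn(4, 3, 56, 56)                 # compressed-domain motion inputs
flow = torch.randn(4, 2, 56, 56)               # optical-flow targets
labels = torch.randint(0, 51, (4,))            # action labels (e.g., HMDB-51)

dmc = generator(mv)
loss_rec = nn.functional.mse_loss(dmc, flow)   # approximate optical flow
loss_cls = nn.functional.cross_entropy(classifier(dmc), labels)
loss_adv = torch.tensor(0.0)                   # adversarial term omitted
loss = loss_cls + 1.0 * loss_rec + 1.0 * loss_adv
loss.backward()
```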

20. Texture Mixer: A Network for Controllable Synthesis and Interpolation of Texture [pdf]

This paper addresses the problem of interpolating visual textures. We formulate the problem of texture interpolation by requiring (1) by-example controllability and (2) realistic and smooth interpolation among an arbitrary number of texture samples. To solve it we propose a neural network trained simultaneously on a reconstruction task and a generation task, which can project texture examples onto a latent space where they can be linearly interpolated and reprojected back onto the image domain, thus ensuring both intuitive control and realistic results. We show several additional applications including texture brushing and texture dissolve, and show our method outperforms a number of baselines according to a comprehensive suite of metrics as well as a user study.

21. Mono3D++: Monocular 3D Vehicle Detection with Two-Scale 3D Hypotheses and Task Priors [pdf]

We present a method to infer 3D pose and shape of vehicles from a single image. To tackle this ill-posed problem, we optimize two-scale projection consistency between the generated 3D hypotheses and their 2D pseudo-measurements. Specifically, we use a morphable wireframe model to generate a fine-scaled representation of vehicle shape and pose. To reduce its sensitivity to 2D landmarks, we jointly model the 3D bounding box as a coarse representation which improves robustness. We also integrate three task priors, including unsupervised monocular depth, a ground plane constraint as well as vehicle shape priors, with forward projection errors into an overall energy function.

22. Characterizing and evaluating adversarial examples for Offline Handwritten Signature Verification [pdf]

The phenomenon of adversarial examples is attracting increasing interest from the machine learning community, due to its significant impact on the security of machine learning systems. Adversarial examples are similar (in a perceptual notion of similarity) to samples from the data distribution, yet "fool" a machine learning classifier. For computer vision applications, these are images with carefully crafted but almost imperceptible changes that are misclassified. In this work, we characterize this phenomenon under an existing taxonomy of threats to biometric systems, in particular identifying new attacks on Offline Handwritten Signature Verification systems. We conducted an extensive set of experiments on four widely used datasets: MCYT-75, CEDAR, GPDS-160 and the Brazilian PUC-PR, considering both a CNN-based system and a system using a handcrafted feature extractor (CLBP). We found that attacks that aim to get a genuine signature rejected are easy to generate, even in a limited-knowledge scenario where the attacker has access to neither the trained classifier nor the signatures used for training. Attacks that get a forgery accepted are harder to produce and often require a higher level of noise, in most cases no longer "imperceptible", unlike previous findings in object recognition. We also evaluated the impact of two countermeasures on the success rate of the attacks and on the amount of noise required to generate successful attacks.
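
For readers unfamiliar with how such attacks are generated, below is the classic fast gradient sign method (FGSM), one standard gradient-based attack of the family studied; the toy verifier and epsilon are assumptions, and the paper evaluates stronger attacks and knowledge scenarios.

```python
import torch
import torch.nn as nn

def fgsm(model, x, y, eps):
    """One-step gradient-sign perturbation toward misclassification."""
    x = x.clone().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()

model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 2))  # toy verifier
x = torch.rand(1, 1, 32, 32)                                # signature image
y = torch.tensor([1])                                       # "genuine" label
x_adv = fgsm(model, x, y, eps=0.05)
```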

23. Unsupervised Moving Object Detection via Contextual Information Separation [pdf]

We propose an adversarial contextual model for detecting moving objects in images. A deep neural network is trained to predict the optical flow in a region using information from everywhere else but that region (context), while another network attempts to make such context as uninformative as possible. The result is a model where hypotheses naturally compete with no need for explicit regularization or hyper-parameter tuning. Although our method requires no supervision whatsoever, it outperforms several methods that are pre-trained on large annotated datasets. Our model can be thought of as a generalization of classical variational generative region-based segmentation, but in a way that avoids explicit regularization or solution of partial differential equations at run-time.
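
The adversarial game described here can be sketched in a few lines: an inpainter predicts flow inside a region using only the surrounding context, while a mask network chooses the region to make that context as uninformative as possible; both single-layer networks below are toy stand-ins.

```python
import torch
import torch.nn as nn

I = nn.Conv2d(2, 2, 3, padding=1)          # context flow -> predicted flow
M = nn.Sequential(nn.Conv2d(2, 1, 3, padding=1), nn.Sigmoid())

flow = torch.randn(4, 2, 64, 64)
mask = M(flow)                              # soft object mask in [0, 1]
pred = I(flow * (1 - mask))                 # inpaint from context only
err = ((pred - flow) ** 2 * mask).mean()    # error inside the masked region

loss_I = err                                # inpainter minimizes the error
loss_M = -err                               # mask network maximizes it
```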

24. RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free [pdf]

Recently, two-stage detectors have surged ahead of single-shot detectors in the accuracy-vs-speed trade-off. Nevertheless, single-shot detectors are immensely popular in embedded vision applications. This paper brings single-shot detectors up to the same level as current two-stage techniques. We do this by improving training for the state-of-the-art single-shot detector, RetinaNet, in three ways: integrating instance mask prediction for the first time, making the loss function adaptive and more stable, and including additional hard examples in training. We call the resulting augmented network RetinaMask. The detection component of RetinaMask has the same computational cost as the original RetinaNet but is more accurate. COCO test-dev results are up to 41.4 mAP for RetinaMask-101 vs 39.1 mAP for RetinaNet-101, while the runtime is the same during evaluation. Adding Group Normalization increases the performance of RetinaMask-101 to 41.7 mAP. Code is at: this https URL

25. CT-GAN: Malicious Tampering of 3D Medical Imagery using Deep Learning [pdf]

In 2018, clinics and hospitals were hit with numerous attacks leading to significant data breaches and interruptions in medical services. An attacker with access to medical records can do much more than hold the data for ransom or sell it on the black market. In this paper, we show how an attacker can use deep learning to add or remove evidence of medical conditions from volumetric (3D) medical scans. An attacker may perform this act in order to stop a political candidate, sabotage research, commit insurance fraud, perform an act of terrorism, or even commit murder. We implement the attack using a 3D conditional GAN and show how the framework (CT-GAN) can be automated. Although the body is complex and 3D medical scans are very large, CT-GAN achieves realistic results and can be executed in milliseconds. To evaluate the attack, we focus on injecting and removing lung cancer from CT scans. We show how three expert radiologists and a state-of-the-art deep learning AI could not differentiate between tampered and non-tampered scans. We also evaluate state-of-the-art countermeasures and propose our own. Finally, we discuss the possible attack vectors on modern radiology networks and demonstrate one of the attack vectors on an active CT scanner.

26. Disease Knowledge Transfer across Neurodegenerative Diseases [pdf]

We introduce Disease Knowledge Transfer (DKT), a novel technique for transferring biomarker information between related neurodegenerative diseases. DKT infers robust multimodal biomarker trajectories in rare neurodegenerative diseases, even when only limited, unimodal data is available, by transferring information from larger multimodal datasets for common neurodegenerative diseases. DKT is a joint-disease generative model of biomarker progressions which exploits biomarker relationships that are shared across diseases. As opposed to current deep learning approaches, DKT is interpretable, which allows us to understand underlying disease mechanisms. Here we demonstrate DKT on Alzheimer's disease (AD) variants and its ability to predict trajectories for multimodal biomarkers in Posterior Cortical Atrophy (PCA), despite the lack of such data from PCA subjects. For this we train DKT on a combined dataset containing subjects with two distinct diseases and amounts of available data: 1) a larger, multimodal typical AD (tAD) dataset from the TADPOLE Challenge, and 2) a smaller, unimodal Posterior Cortical Atrophy (PCA) dataset from the Dementia Research Centre (DRC) UK, for which only a limited number of Magnetic Resonance Imaging (MRI) scans are available. We first show that DKT estimates plausible multimodal trajectories in PCA that agree with previous literature. We further validate DKT in two situations: (1) on synthetic data, showing that it can accurately estimate the ground truth parameters, and (2) on 20 DTI scans from controls and PCA patients, showing that it has favourable predictive performance compared to standard approaches. While we demonstrate DKT on Alzheimer's variants, we note DKT is generalisable to other related neurodegenerative diseases. Source code for DKT is available online: this https URL.

27. How Can We Make GAN Perform Better in Single Medical Image Super-Resolution? A Lesion Focused Multi-Scale Approach [pdf]

Single image super-resolution (SISR) is of great importance as a low-level computer vision task. The fast development of Generative Adversarial Network (GAN) based deep learning architectures has enabled efficient and effective SISR for boosting the spatial resolution of natural images captured by digital cameras. However, SISR for medical images is still a very challenging problem. This is because (1) compared to natural images, medical images generally have lower signal-to-noise ratios, (2) GAN-based models pre-trained on natural images may synthesise unrealistic patterns in medical images, which could affect clinical interpretation and diagnosis, and (3) the vanilla GAN architecture may suffer from unstable training and mode collapse, which can also affect the SISR results. In this paper, we propose a novel lesion-focused SR (LFSR) method, which incorporates a GAN to achieve perceptually realistic SISR results for brain tumour MRI images. More importantly, we test and compare recently developed GAN variations, e.g., Wasserstein GAN (WGAN) and WGAN with Gradient Penalty (WGAN-GP), and propose a novel multi-scale GAN (MS-GAN), to achieve more stable and efficient training and improved perceptual quality of the super-resolved results. Based on both quantitative evaluations and our designed mean opinion score, the proposed LFSR coupled with MS-GAN performs better in terms of both perceptual quality and efficiency.
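
Of the GAN variants compared, WGAN-GP is defined by its gradient penalty, which keeps the critic's gradient norm near 1 on interpolates between real and fake samples; a standard PyTorch sketch follows, with a toy critic standing in for any scalar-output network.

```python
import torch
import torch.nn as nn

def gradient_penalty(critic, real, fake):
    eps = torch.rand(real.size(0), 1, 1, 1)             # per-sample mixing weight
    mix = (eps * real + (1 - eps) * fake).requires_grad_(True)
    score = critic(mix).sum()
    grad, = torch.autograd.grad(score, mix, create_graph=True)
    return ((grad.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

critic = nn.Sequential(nn.Conv2d(1, 1, 3, padding=1), nn.Flatten(),
                       nn.Linear(64 * 64, 1))           # toy critic
real, fake = torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64)
gp = gradient_penalty(critic, real, fake)               # added to the critic loss
```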

28. A Fast Randomized Method to Find Homotopy Classes for Socially-Aware Navigation [pdf]

We introduce and show preliminary results of a fast randomized method that finds a set of K paths lying in distinct homotopy classes. We frame the path planning task as a graph search problem, where the navigation graph is based on a Voronoi diagram. The search is biased by a cost function, derived from the social force model, that is used to generate and select the paths. We compare our method to Yen's algorithm and empirically show that our approach is faster at finding a subset of homotopy classes. Furthermore, our approach computes a set of more diverse paths with respect to the baseline while incurring a negligible loss in path quality.
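
The Yen's-algorithm baseline is easy to reproduce with NetworkX, whose shortest_simple_paths implements it; the grid graph and uniform weights below are placeholders for the paper's Voronoi-based navigation graph with a social-force cost.

```python
import networkx as nx

G = nx.grid_2d_graph(10, 10)                 # toy navigation graph
nx.set_edge_attributes(G, 1.0, "weight")

# Enumerate loopless shortest paths in increasing cost order (Yen's algorithm)
# and keep the first K; note these need not lie in distinct homotopy classes.
K = 3
paths = []
for path in nx.shortest_simple_paths(G, (0, 0), (9, 9), weight="weight"):
    paths.append(path)
    if len(paths) == K:
        break
```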