Tutorials


Monday, April 16, 09:00 - 12:30

T-1 Blind Audio Source Separation on Tensor Representation
T-2 New Estimation Theory Approaches for Cybersecurity for Sensor Systems and the Internet of Things
T-3 From iterative algorithms to deep learning
T-4 Recent Advances in Nonconvex Methods for High-Dimensional Estimation
T-5 Super-resolution imaging in microscopy and ultrasound
T-6 Differential Geometry for Engineers
T-7 Generative Adversarial Network and its Applications on Speech Signal and Natural Language Processing

Monday, April 16, 13:30 - 17:00

T-9 Model-based Speech and Audio Processing
T-10 Algorithms, Circuits and Systems for Neuromorphic and Compressive Sensing Computing
T-11 Natural and augmented listening for VR/AR
T-12 Signals and Systems Meets Machine Learning in Biomedical Image Analysis
T-13 Replacing/Enhancing Iterative Algorithms with Deep Neural Networks
T-15 Change and Anomaly Detection in Signal, Images, and General Data Streams
T-16 Signal recovery from damaged and deleted digital files



Monday, April 16, 09:00 - 12:30


T-1 Blind Audio Source Separation on Tensor Representation

Hiroshi Sawada, Nobutaka Ono, Hirokazu Kameoka, Daichi Kitamura

This tutorial describes several important methods for blind source separation of audio signals in an integrated manner. We start with two basic methods for matrix representation, i.e., independent component analysis (ICA) and nonnegative matrix factorization (NMF). Then we move to their extensions to tensor representation, i.e., independent vector analysis (IVA) and multi-channel NMF (MNMF). Finally, we present independent low-rank matrix analysis (ILRMA), which integrates IVA and MNMF in a clever way. Attendees of this tutorial are expected to learn generative and mixing models of audio signals, optimization techniques, and practical methods for real-world audio source separation.


T-2 New Estimation Theory Approaches for Cybersecurity for Sensor Systems and the Internet of Things

Rick S. Blum

The tutorial will focus on new methods for securing sensor systems and IoT systems using signal processing (estimation and detection theory) ideas. The IoT brings a tremendous increase in the number of sensors and controlled devices connected to the internet. In order to allow rapid reconfiguration and changes in control based on recently sensed data, new security methods are being investigated which can complement existing approaches to provide multiple layers of protection. One such approach, which we have been investigating [1-5], involves using the sensors available in IoT and sensor system applications, along with any possible models of the physical system being monitored (called cyber physical system theory) to find cyber-attacks on these IoT sensor systems. This proposal will describe recent work in this area by our team and by others. The hope is that this could lead to more research by the signal processing community that could contribute to future secure systems.


T-3 From iterative algorithms to deep learning

Ami Wiesel

Deep neural network and machine learning have recently revolutionized all fields of engineering. The most significant progress is probably in image processing and speech, but these trends are penetrating other applications as wireless communication, radar and tomography. The switch from classical signal processing to modern deep learning involves many challenges. The first is technical and simple, yet daunting for experienced researchers. This is the transition to modern numerical toolboxes which are more suitable for learning tasks, e.g., the migration from Matlab to Python. A more fundamental change is that the applications mentioned above all have a well understood physical model that should be exploited in the design. This leads to a switch from model based design to data driven design, and, more importantly the shift to architecture based design. Another significant change is that standard machine learning methods are not suitable for parametric signal processing, and must be re-learned from scratch each time the parameters change. These challenges led researchers to consider hybrid approaches that exploit the benefits of both worlds: the network’s architectures are based on unfolding iterative signal processing algorithms, while deep learning allows more degrees of freedom and more expression power. Iterations are transformed into layers, and algorithms are succeeded by networks (see also the related concept of Recurrent Neural Networks). The resulting architectures allow a single training for multiple parametric models and achieve state of the art accuracy vs complexity tradeoffs. The goal of this tutorial is to introduce these ideas to engineers with a strong background in classical signal processing and zero experience in python, tensorflow and deep learning in general.


T-4 Recent Advances in Nonconvex Methods for High-Dimensional Estimation

Yuejie Chi, Yuxin Chen, Yue M. Lu

In this proposal, we will start by introducing classical algorithms for convex optimization and discussing why they are inefficient for large-scale problems. We will then use two canonical problems --- low-rank matrix estimation and phase retrieval --- to motivate nonconvex formulations for signal estimation, and discuss the hidden convexity in such problems. Further we will discuss gradient-based first-order algorithms for solving these problems, including batch, stochastic, and accelerated gradient descent. We will also talk about the important role of initialization for these problems.


T-5 Super-resolution imaging in microscopy and ultrasound

Yonina C. Eldar, Mordechai Segev, Oren Solomon

For more than a century, the wavelength of light was considered to be a fundamental limit on the spatial resolution of optical imaging. Particularly in light microscopy, this limit, known as Abbe’s diffraction limit, places a fundamental constraint on the ability to image sub-cellular organelles with high resolution. In 2014, the Nobel prize was awarded to the three pioneers of super-resolution fluorescence microscopy, William H. Moerner, Stefan Hell and Eric Betzig for their groundbreaking achievements for imaging beyond the diffraction limit. Microscopy techniques such as STED, PALM and STORM, manage to recover sub-wavelength information, by relying on fluorescence imaging, yielding images with a tenfold increase in the spatial resolution over the diffraction limit. Specifically, PALM and STORM are of great interest to the signal processing community, since they alter the basic acquisition mechanism of widefield microscopy to exploit the inherent structural information of isolated diffraction limited spots in each frame and sequentially localizing them. By accumulating these localizations, a super-resolved image is constructed. These techniques open the doorway to image the dynamics of life in nanometric resolution and have sparked great interest in the broad scientific community. In this tutorial we will present the basic fundamental limitation known as the diffraction limit, and cover the key principles of these ground breaking imaging techniques. We will point out the structural priors which are used for different signal processing techniques in super-resolution microscopy as well as outline current limitations and challenges. Furthermore, we will cover recent advances in high-density superresolution imaging which aims at increasing the temporal resolution of the reconstruction process by exploiting statistical information, sparse recovery and optimization techniques. Thus, signal processing methods have a vital role in the advancement of super resolution techniques. Lastly, we will present additional applications of these ideas to the field of contrast enhanced ultrasound imaging, in which a tenfold increase in the spatial resolution was also recently achieved. This advancement in ultrasound imaging paves the way to imaging of the capillaries and hemodynamic changes in living patients with resolution which have never been seen before. Here again signal processing techniques are crucial to achieve these advancements.


T-6 Differential Geometry for Engineers

Jonathan H. Manton

This tutorial endeavours to introduce differential geometry simply, efficiently, rigorously and engagingly. It is tailored for a signal processing audience; not only will signal processing applications be introduced along the way to demonstrate the relevance of differential geometry, but conversely, the applications themselves will be used to motivate the development of concepts in differential geometry. An integral part of the tutorial is teaching how to do calculations involving manifolds, including how to compute derivatives, tangent spaces, curvature and so forth. At a higher level, the different ways of working with manifolds will be explained (e.g., extrinsic coordinates versus local coordinates), and ways for extending algorithms from Euclidean space to manifolds will be discussed.


T-7 Generative Adversarial Network and its Applications on Speech Signal and Natural Language Processing

Hung-yi Lee,Yu Tsao

Generative adversarial network (GAN) is a new idea for training models, in which a generator and a discriminator compete against each other to improve the generation quality. Recently, GAN has shown amazing results in image generation, and a large amount and a wide variety of new ideas, techniques, and applications have been developed based on it. Although there are only few successful cases, GAN has great potential to be applied to text and speech generations to overcome limitations in the conventional methods.

The tutorial includes two parts. The first part provides a thorough review of GAN. We will first introduce GAN to newcomers and describe why it is powerful in generating objects with sophisticated structures, for example, images, sentences, and speech. Then, we will introduce the approaches that aim to improve the training procedure and the variants of GAN beyond simply generating random objects. The second part of this tutorial will focus on the applications of GAN on speech and natural language. Although most techniques related to GAN are developed on image generation today, GAN can also generate speech. However, speech signals are temporal sequences which have very different nature from images. We will describe how to apply GAN on speech signal processing, including text-to-speech synthesis, voice conversion, speech enhancement, and domain adversarial training on speech-related tasks. The major challenge for applying GAN on natural language is its discrete nature (words are usually represented by one-hot encodings), which makes the original GAN fails. We will review a series of approaches dealing with this problem, and finally demonstrate the applications of GAN on chat-bot, abstractive summarization, and text style transformation.


Monday, April 16, 13:30 - 17:00


T-9 Model-based Speech and Audio Processing

Mads Græsbøll Christensen, Jesper Rindom Jensen, Jesper Kjær Nielsen

Parametric speech and audio models have been around for many years, but have always had their detractors. Two common arguments against such models are that estimating their parameters is often too difficult and that the models do not take the complicated nature of real signals into account. While models indeed often are a simplified mathematical description of a complicated physical process, their main strength lies in that they explicitly encapsulate our assumptions about the problem under study. This allows us to understand and ascertain under which conditions a model-based method can be expected to work, something which is very hard in, e.g., deep learning where the mapping from the input to the output is a black-box. Other advantages of the model-based approach are that we can easily modify a model to take more complex phenomena into account, derive methods that are robust to noise and work well in situations where training data are scarce or very expensive, and get a very good estimation accuracy in applications where the fine details matter (e.g., in the diagnosis of the Parkinson's disease from noisy speech).

The tutorial will cover an introduction to model-based speech and audio processing and a presentation and discussion of a variety of speech and audio models (harmonic model, harmonic chirp model, auto-regressive model, voiced-unvoiced-noise, etc.) that have appeared over the years. The tutorial also covers a brief introduction to commonly used estimation principles such as maximum likelihood, filtering, subspace, and Bayesian methods as well as methods for model/order selection (AIC, MDL, BIC, subspace, etc.). It will be shown how these principles can be used for different speech and audio models. Parameter estimation bounds and their practical implications will also be discussed. Based on recent advances, it will then be demonstrated how model-based speech and audio processing can be used to solve a number of problems and in many different applications. These include hearing aids, diagnosis of illnesses, multi-channel processing, speech analysis and coding, distortion-less noise reduction, echo cancellation, and noise statistics estimation.


T-10 Algorithms, Circuits and Systems for Neuromorphic and Compressive Sensing Computing

Profs. Subhanshu Gupta and Vishal Saxena

This tutorial combines an overview of recent advances in energy-efficient Neuromorphic Computing circuits and systems, and sensor interfaces using compressive sensing (CS) architecture for embedded deep learning applications. Neuromorphic algorithms are gaining wider interest due to the recent emergence of specialized chip architectures and specialized nanoscale devices for ultra-low power implementation of cognitive computing. The tutorial will present system level architectures encompassing machine-learning based circuit-design techniques. Case studies will be presented for Neuromorphic System-on-a-chip (NeuSoC) and CS architectures with applications to statistical learning followed by recent advances in the area of intelligent sensor nodes for Internet-of-Things and next-generation communications. After attending this tutorial, the audience will gain understanding of:

  1. Emerging approaches and recent trends in energy-efficient deep learning using Spiking Neural Networks. Mixed-signal Neuromorphic computing architectures and algorithms for embedded deep learning with brain-like energy efficiency.
  2. Fundamental design principles for ultra-low-power compressed sensing (CS) sensor interfaces and applications to statistical and neuromorphic learning.
  3. Emerging devices behavior to system-level NeuSoC design: device modeling, neural circuit motifs, spike-based learning algorithms, and event-driven simulation of hardware spiking neural networks.


T-11 Natural and augmented listening for VR/AR

Woon-Seng Gan, Jianjun He, Rishabh Ranjan, Rishabh Gupta

This tutorial aims to equip the participants with basic and advanced signal processing techniques that can be used in VR/AR applications to create a natural and augmented listening experience using headsets.
This tutorial is divided into 5 sections and cover following topics:

  • Introduction to spatial audio, fundamentals in natural listening, and emerging audio applications
  • Audio Reproduction in the New VR/AR world: This section will cover audio processing, rendering and auralization for various audio formats, including object-based, channel-based, and ambisonics. Special focus will be given on role of acoustic modelling based on 3D models in VR/AR environment for immersive audio.
  • Binaural Rendering of 3D audio in headsets. Binaural audio has evolved to be the main audio reproduction methods for VR/AR. In this section, we will cover how we can recreate the most natural binaural sound over headphones using various signal processing techniques. Topics covered include: HRTF measurements and individualization, headphones equalization, head tracking, and finally integrating these techniques into 3D audio headphones for VR applications.
  • Augmented reality audio in Headsets: Natural listening in augmented reality scenario will be covered. One of the primary goals of augmented reality audio is the fusion of virtual sources with real sources and environment. Techniques covered are hearing through of headphones, adaptive equalization of headphones, acoustic environment identification and rendering, active noise control with integrated 3D audio for headsets, and assistive listening devices.
  • Summary of the tutorial and point out the future trends of immersive audio in headsets. Demos will be presented.


T-12 Signals and Systems Meets Machine Learning in Biomedical Image Analysis

Aly Farag, Asem Ali and Amal Farag

Signals and systems have progressed for decades in diverse applications involving multidimensional signals. There has been a solid foundation in theory and algorithms for signals and systems which made an integral part in the engineering, science and mathematical curriculum. Inference of patterns description have been studied in remote sensing, underground geological explorations, biomedicine and biometrics, in addition to the domain of communications with all of its exponential and never-ending novelties and achievements. Lately, machine learning has made significant progress in inferring patterns, detection of objects and in decision making. Such progress, while significant and in some application; e.g., facial biometrics and computer-assisted diagnosis (CAD), are game changing, are hard to duplicate and nearly impossible to disseminate among researchers. There is a risk of making scientific discovers judged by their results at the expense of their methodologies. This tutorial will explore the linkage of signals & systems and modern machine learning in several problems that the two approaches have been used separately to tackle them. The fact that successes of machine learning are dependent upon availability of large and discriminative data and modern compute engines, will be demonstrated to serve the traditional signals and systems models as well in various applications of biomedical image analysis facial biometrics. Specifically, the tutorial will cover three domains: a) registration of large dimensional data; b) emotion modeling from facial biometrics; and c) nodule detection and classification in Stage 1a of lung cancer as shown in chest CT. The presenters are expert in signals and systems and machine learning, have conducted several successful projects in these domains, and will share a unique experience covering the rigor of signals & systems and the algorithmic complexity of modern machine learning methods, including convolutional neural networks (CNN), support vector machines (SVM) and several traditional statistical pattern analysis methods. Three modalities of imaging will be used in this tutorial: videos of facial biometrics, CT scans of the human chest from various studies for lung cancer screening, and combination of video and thermal imaging for emotion models in a tele- learning system developed by the presenters. Expected background is familiarity of signal models, statistical inference, image representation, variational calculus and neural networks.


T-13 Replacing/Enhancing Iterative Algorithms with Deep Neural Networks

David Wipf

The iterations of many traditional parameter estimation algorithms or model-based inference routines are comprised of some type of fixed filter cascaded with a thresholding nonlinearity, which collectively resemble a typical neural network layer. Examples include iterative soft/hard-thresholding for sparse estimation or compressive sensing (Gregor and Lecun, 2010; Wang et al., 2016), nonlinear reaction diffusion processes for signal/image restoration (Chen and Pock, 2015), proximal descent schemes for generic structured inverse problems (Sprechmann et al., 2015), and mean-field inference in Markov random fields (Hershey et al., 2014; Zheng et al., 2015). In these examples and many others like them, a lengthy sequence of algorithm iterations can be viewed as a deep network with fixed, model-based layer weights. It is therefore quite natural to examine the degree to which a learned network might act as a viable surrogate for conventional approaches with similar structure in domains where ample training data is available.

The underlying objective of this tutorial is to explore the diverse benefits and ramifications of this `unfolding' viewpoint, drawing on ideas from both the original communities from which such iterative algorithms arose, and the rapidly expanding body of explicit DNN research. For example, the possibility of a reduced computational budget is readily apparent when a ceiling is imposed on the number of learned layers. Indeed very recent results, of both empirical and theoretical flavors, have precisely quantified the degree to which learned iterations/layers can lead to a substantial reduction in overall complexity over a range of inverse problems (Sprechmann et al., 2015; Giryes et al., 2016). But there is also the important companion issue of estimation accuracy. More specifically, it is now well-established that in model-based parameter estimation regimes where exact recovery guarantees exist, strong conditions must be placed on the observation process to ensure that iterative algorithms with fixed filters are not led astray to bad, possibly even useless solutions. Common instances include RIP constraints for compressive sensing (Candes and Tao, 2005) or null-space properties for low-rank matrix recovery from underdetermined systems (Candes and Recht, 2009). However, if we instead learn independently-adaptable weights for each layer, successful recovery is often achievable long after existing iterative algorithms completely fail (Xin et al., 2016; He et al., 2017).

This unfolding perspective also suggests prescriptions for initializing application-specific DNNs, as well as domains where substituting discriminatively learned weights are likely to be generally advantageous. For instance, prior to the emergence of DNN classification systems, a state-of-the-art, highly-influential face recognition method was built using sparse representations formed across a pre-defined dictionary of face images (Wright et al., 2009). The resulting recognition pipeline can be exactly reinterpreted as a deep network with manually-constructed, dictionary-dependent layers. Of course given the present availability of huge face databases, in hindsight it was inevitable that learned layer-wise weights, possibly initialized using the aforementioned hand-crafted structure, could only improve performance, and today nearly all competitive systems rely on such discriminative training. Extrapolating forward it seems undeniable that the magisteria of conventional iterative, model-based approaches will be further supplanted with deep learning-based alternatives. Understanding when and where this is likely to happen, as well as quantifying the potential for improvement, is therefore an extremely useful enterprise as will be explored in this tutorial.


T-15 Change and Anomaly Detection in Signal, Images, and General Data Streams

Giacomo Boracchi

Change and anomaly detection problems are ubiquitous in engineering. The prompt detection of changes and anomalies is often a primary concern, as they provide precious information for understanding the dynamics of a monitored process, and for activating suitable countermeasures. Changes, for instance, might indicate an unforeseen evolution of the process generating the data, or a fault in a machinery. Anomalies are typically considered the most informative samples in a stream, as for instance arrhythmias in ECG tracing or frauds in a stream of credit card transactions. Not surprisingly, detection problems in time series/images/videos have been widely investigated in the signal processing community, in application scenarios that range from quality inspection to health monitoring.

The tutorial presents a rigorous formulation of change and anomaly detection problems that fits many signal/image analysis techniques and applications, including sequential monitoring and detection by classification. The tutorial describes in detail the most important approaches in the literature, following the machine-learning perspective of supervised, semi-supervised and unsupervised monitoring tasks. Particular emphasis will be given to: i) issues raising in multivariate settings, where the popular approach of monitoring the log-likelihood will be demonstrated to lose power when data-dimension increases, and ii) change/anomaly detection methods that use learned models, which are often adopted to handle signals and images. The tutorial also illustrates how advanced learned models, like convolutional sparse representation and structured dictionaries, as well as domain-adaptation techniques, can be used to enhance detection algorithms. Finally, best practices for designing suitable experimental testbed will be discussed.

The tutorial is accompanied by various examples where change/anomaly detection algorithms are applied to solve real world problems. These include ECG monitoring in wearable devices, image analysis to detect defects in industrial manufacturing, and fraud detection in credit card transactions.


T-16 Signal recovery from damaged and deleted digital files

James (Jim) H. Jones, Jr.

Data of interest to the signal processing community are frequently stored as digital files. Under circumstances of inadvertent or deliberate damage or deletion, only parts of the digital files of interest may be available. In this tutorial, participants will examine digital media with partial data files, find and extract fragments of interest, and analyze those fragments for useful contents and reconstruction potential. Data files will include audio, video, sensor, and image recordings in various formats. Attendees will be provided with the necessary software and data files, and the students will be guided through the processes of examination, recovery, and interpretation.



Important Dates

The deadline for submitting proposals has passed.

Proposals due: October 15, 2017

Proposal Decisions: November 30, 2017

Homer Chen
National Taiwan University

E: homer@ntu.edu.tw

Chungyong Lee
Yonsei University

E: cylee@yonsei.ac.kr