Online: Microsoft Teams; link here
An overview of the CVC papers presented at the Conference on Computer Vision and Pattern Recognition (CVPR) 2021, its workshops, and the 3D-DLAD workshop at the IEEE Intelligent Vehicles Symposium (IVS) 2021.
Schedule:
12:30 – 12:45 – “Slimmable compressive autoencoders for practical neural image compression” – F. Yang et al. (presented by Fei Yang)
12:45 – 13:00 – “DANICE: Domain adaptation without forgetting in neural image compression”, CVPR Workshop – Katakol et al. (presented by Luis Herranz)
13:00 – 13:15 – “Continual learning in cross-modal retrieval”, CVPR Workshop – K. Wang et al. (presented by Kai Wang)
13:15 – 13:30 – “Thermal Image Super-Resolution Challenge”, CVPR Workshop – A. Sappa et al.
13:30 – 13:45 – “Monocular Depth Estimation with Virtual-world Supervision” – 3D-DLAD workshop at IVS – A. López
Abstracts:
- “Slimmable compressive autoencoders for practical neural image compression”
Neural image compression leverages deep neural networks to outperform traditional image codecs in rate-distortion performance. However, the resulting models are also heavy, computationally demanding and generally optimized for a single rate, limiting their practical use. Focusing on practical image compression, we propose slimmable compressive autoencoders (SlimCAEs), where rate (R) and distortion (D) are jointly optimized for different capacities. Once trained, encoders and decoders can be executed at different capacities, leading to different rates and complexities. We show that a successful implementation of SlimCAEs requires suitable capacity-specific RD tradeoffs. Our experiments show that SlimCAEs are highly flexible models that provide excellent rate-distortion performance, variable rate, and dynamic adjustment of memory, computational cost and latency, thus addressing the main requirements of practical image compression.
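To make the idea concrete, here is a minimal PyTorch sketch of a slimmable autoencoder: one set of weights evaluated at several channel widths, trained with a capacity-specific rate-distortion tradeoff per width. The widths, lambdas, uniform-noise quantization proxy, and the crude rate term (a stand-in for a real entropy model) are illustrative assumptions, not the paper's actual architecture or values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlimConv(nn.Module):
    """One weight tensor, sliced to the currently active channel widths."""
    def __init__(self, max_in, max_out, stride):
        super().__init__()
        self.weight = nn.Parameter(0.05 * torch.randn(max_out, max_in, 3, 3))
        self.bias = nn.Parameter(torch.zeros(max_out))
        self.stride = stride

    def forward(self, x, in_ch, out_ch):
        return F.conv2d(x, self.weight[:out_ch, :in_ch], self.bias[:out_ch],
                        stride=self.stride, padding=1)

class SlimCAE(nn.Module):
    widths = (48, 96, 192)          # sub-capacities sharing the same weights
    lambdas = (0.002, 0.01, 0.05)   # illustrative capacity-specific tradeoffs

    def __init__(self):
        super().__init__()
        self.enc = SlimConv(3, 192, stride=2)   # image -> latent (downsample)
        self.dec = SlimConv(192, 3, stride=1)   # latent -> image

    def forward(self, x, w):
        y = self.enc(x, 3, w)                   # latent at active width w
        y_hat = y + torch.rand_like(y) - 0.5    # additive-noise quantization proxy
        x_hat = self.dec(F.interpolate(y_hat, scale_factor=2.0), w, 3)
        return y_hat, torch.sigmoid(x_hat)

def joint_rd_loss(model, x):
    """Jointly optimize one rate-distortion objective per capacity."""
    loss = x.new_zeros(())
    for w, lam in zip(model.widths, model.lambdas):
        y_hat, x_hat = model(x, w)
        rate = y_hat.abs().mean()               # stand-in for an entropy model
        loss = loss + rate + lam * F.mse_loss(x_hat, x)
    return loss

model = SlimCAE()
joint_rd_loss(model, torch.rand(2, 3, 64, 64)).backward()
```

After training, running the same model at a smaller width trades rate-distortion performance for lower memory, computation and latency, which is the practical-deployment point of the paper.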
- “DANICE: Domain adaptation without forgetting in neural image compression”
Neural image compression (NIC) is a new coding paradigm where coding capabilities are captured by deep models learned from data. This data-driven nature enables new potential functionalities. In this paper, we study the adaptability of codecs to custom domains of interest. We show that NIC codecs are transferable and that they can be adapted with relatively few target domain images. However, naive adaptation interferes with the solution optimized for the original source domain, resulting in forgetting the original coding capabilities in that domain, and may even break the compatibility with previously encoded bitstreams. Addressing these problems, we propose Codec Adaptation without Forgetting (CAwF), a framework that avoids them by adding a small number of custom parameters, while the source codec remains embedded and unchanged during the adaptation process. Experiments demonstrate its effectiveness and provide useful insights on the characteristics of catastrophic interference in NIC.
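A sketch of the core mechanism as I read the abstract: the source codec is frozen, so earlier bitstreams still decode, and only a small set of added parameters is trained on the target domain. The 1x1 latent adapter and the encode/decode interface below are my assumptions for illustration, not the paper's actual design or API.

```python
import torch
import torch.nn as nn

class AdaptedCodec(nn.Module):
    def __init__(self, source_codec, latent_ch=192):
        super().__init__()
        self.source = source_codec
        for p in self.source.parameters():
            p.requires_grad = False          # source codec stays frozen, so
                                             # old bitstreams remain decodable
        # the only new, trainable parameters: a tiny latent-space adapter
        self.adapter = nn.Conv2d(latent_ch, latent_ch, kernel_size=1)
        nn.init.zeros_(self.adapter.weight)  # start as an identity mapping
        nn.init.zeros_(self.adapter.bias)    # (zero residual correction)

    def encode(self, x):
        y = self.source.encode(x)            # assumed source-codec interface
        return y + self.adapter(y)           # domain-specific correction

    def decode(self, y):
        return self.source.decode(y)         # decoder left untouched

# Adaptation would then optimize only the adapter, e.g.:
#   opt = torch.optim.Adam(codec.adapter.parameters(), lr=1e-4)
```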
- “Continual learning in cross-modal retrieval”
Multimodal representations and continual learning are two areas closely related to human intelligence. The former considers the learning of shared representation spaces where information from different modalities can be compared and integrated (we focus on cross-modal retrieval between language and visual representations). The latter studies how to prevent forgetting a previously learned task when learning a new one. While humans excel in these two aspects, deep neural networks are still quite limited. In this paper, we combine both problems into a continual cross-modal retrieval setting, where we study how the catastrophic interference caused by new tasks impacts the embedding spaces and their cross-modal alignment required for effective retrieval. We propose a general framework that decouples the training, indexing and querying stages. We also identify and study different factors that may lead to forgetting, and propose tools to alleviate it. We find that the indexing stage plays an important role and that simply avoiding reindexing the database with updated embedding networks can lead to significant gains. We evaluate our methods on two image-text retrieval datasets, obtaining significant gains with respect to the fine-tuning baseline.
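A toy numpy sketch of that decoupling, under the assumption that embeddings are computed elsewhere: the database keeps the embeddings it was indexed with, while only new queries use the updated encoder, which is the "avoid reindexing" gain the abstract describes.

```python
import numpy as np

class RetrievalIndex:
    """Database embeddings are frozen at indexing time."""
    def __init__(self):
        self.keys, self.items = [], []

    def add(self, emb, item):
        self.keys.append(emb / np.linalg.norm(emb))
        self.items.append(item)

    def query(self, emb, k=5):
        q = emb / np.linalg.norm(emb)
        sims = np.stack(self.keys) @ q       # cosine similarity to all keys
        return [self.items[i] for i in np.argsort(-sims)[:k]]

# Index with the encoder available at task t; after training on task t+1,
# embed *queries* with the updated encoder but do NOT re-embed the database.
rng = np.random.default_rng(0)
index = RetrievalIndex()
for i in range(100):
    index.add(rng.normal(size=64), f"image_{i}")
print(index.query(rng.normal(size=64), k=3))
```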
- “Thermal Image Super-Resolution Challenge”
This paper presents results from the second Thermal Image Super-Resolution (TISR) challenge organized in the framework of the Perception Beyond the Visible Spectrum (PBVS) 2021 workshop. For this second edition, the same thermal image dataset considered during the first challenge has been used; only mid-resolution (MR) and high-resolution (HR) sets have been considered. The dataset consists of 951 training images and 50 testing images for each resolution. A set of 20 images for each resolution is kept aside for evaluation. The two evaluation methodologies proposed for the first challenge are also used in this second edition. The first evaluation task consists of measuring the PSNR and SSIM between the obtained SR image and the corresponding ground truth (i.e., the HR thermal image downsampled by four). The second evaluation also consists of measuring the PSNR and SSIM, but in this case considers the ×2 SR obtained from the given MR thermal image; this evaluation compares the SR image with the semi-registered HR image, which has been acquired with another camera. The results outperformed those from the first challenge, thus showing an improvement in both evaluation metrics.
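For reference, both evaluation tasks score the same two metrics, which are available in scikit-image; a minimal sketch, assuming 8-bit single-channel thermal images of equal size:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def psnr_ssim(sr, gt):
    """PSNR and SSIM between an SR result and its reference image."""
    sr = np.asarray(sr, dtype=np.uint8)
    gt = np.asarray(gt, dtype=np.uint8)
    psnr = peak_signal_noise_ratio(gt, sr, data_range=255)
    ssim = structural_similarity(gt, sr, data_range=255)
    return psnr, ssim

# Task 1 scores the x4 SR result against its ground-truth HR image; task 2
# scores the x2 SR of an MR image against a semi-registered HR image taken
# with a different camera, using the same two metrics.
```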
- “Monocular Depth Estimation with Virtual-world Supervision”
Depth information is essential for on-board perception in autonomous driving. Monocular depth estimation (MDE) is very appealing since it provides a direct pixelwise correspondence between appearance and depth without further calibration. In this talk, we present MonoDEVS, our MDE approach trained with virtual-world supervision and real-world SfM self-supervision (thus, on monocular sequences).
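A rough sketch of how those two signals could be combined in one training step. All names here, including the view-synthesis helper and the unweighted sum, are my assumptions for illustration; see the MonoDEVS paper for the actual losses and domain-gap handling.

```python
import torch.nn.functional as F

def mde_loss(depth_net, warp_fn,
             virt_img, virt_depth,       # synthetic frame + ground-truth depth
             real_tgt, real_src, pose):  # consecutive real frames + rel. pose
    # Virtual-world supervision: synthetic frames come with exact depth.
    l_sup = F.l1_loss(depth_net(virt_img), virt_depth)
    # Real-world SfM self-supervision: synthesize the target view from a
    # neighboring frame using predicted depth and relative pose, then score
    # the reconstruction photometrically.
    recon = warp_fn(real_src, depth_net(real_tgt), pose)  # assumed helper
    l_self = F.l1_loss(recon, real_tgt)
    return l_sup + l_self   # loss weighting omitted for brevity
```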