GAN-based zero-shot learning: paper roundup

2023-12-28 08:52:58

1 Feature Generating Networks for Zero-Shot Learning

Suffering from the extreme training data imbalance between seen and unseen classes, most existing state-of-the-art approaches fail to achieve satisfactory results for the challenging generalized zero-shot learning task. To circumvent the need for labeled examples of unseen classes, we propose a novel generative adversarial network (GAN) that synthesizes CNN features conditioned on class-level semantic information, offering a shortcut directly from a semantic descriptor of a class to a class-conditional feature distribution. Our proposed approach, pairing a Wasserstein GAN with a classification loss, is able to generate sufficiently discriminative CNN features to train softmax classifiers or any multimodal embedding method. Our experimental results demonstrate a significant boost in accuracy over the state of the art on five challenging datasets (CUB, FLO, SUN, AWA and ImageNet) in both the zero-shot learning and generalized zero-shot learning settings.
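The pairing of a Wasserstein GAN with a classification loss can be illustrated from the generator's side. The following is only a minimal sketch, not the paper's implementation: the function name `generator_loss`, the weight `beta`, and the idea that the class probabilities come from a classifier pretrained on real seen-class features are all assumptions made for illustration, and the critic update with its gradient penalty is left out.

```python
import math

def generator_loss(critic_scores, class_probs, labels, beta=0.01):
    """Generator-side loss: Wasserstein term plus a weighted classification term.

    critic_scores: critic outputs for each synthesized feature (hypothetical values).
    class_probs:   per-sample class probabilities from an assumed pretrained classifier.
    labels:        index of the class each feature was conditioned on.
    beta:          weight of the classification term (assumed hyperparameter).
    """
    n = len(critic_scores)
    # Wasserstein term: the generator wants the critic to score its samples highly.
    wasserstein = -sum(critic_scores) / n
    # Classification term: negative log-likelihood of the conditioning class.
    nll = -sum(math.log(p[y]) for p, y in zip(class_probs, labels)) / n
    return wasserstein + beta * nll

# Two synthesized features, conditioned on classes 0 and 1.
scores = [0.5, 1.5]
probs = [[0.5, 0.5], [0.25, 0.75]]
loss = generator_loss(scores, probs, labels=[0, 1], beta=1.0)
```

Features that the classifier assigns to the wrong class increase the second term, so the generator is pushed towards features that are discriminative as well as realistic.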

2 Adversarial Zero-Shot Learning with Semantic Augmentation

In situations in which labels are expensive or difficult to obtain, deep neural networks for object recognition often struggle to achieve fair performance. Zero-shot learning is dedicated to this problem. It aims to recognize objects of unseen classes by transferring knowledge from seen classes via a shared intermediate representation. Using the manifold structure of seen training samples is widely regarded as important to learn a robust mapping between samples and the intermediate representation, which is crucial for transferring the knowledge. However, their irregular structures, such as the lack of variation of samples for certain classes and highly overlapping clusters of different classes, may result in an inappropriate mapping. Additionally, in a high dimensional mapping space, the hubness problem may arise, in which one of the unseen classes has a high possibility to be assigned to samples of different classes. To mitigate such problems, we use a generative adversarial network to synthesize samples with specified semantics to cover a higher diversity of given classes and interpolated semantics of pairs of classes. We propose a simple yet effective method for applying the augmented semantics to the hinge loss functions to learn a robust mapping. The proposed method was extensively evaluated on small- and large-scale datasets, showing a significant improvement over state-of-the-art methods.
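The two ingredients named above, interpolated semantics of class pairs and a hinge loss over the augmented semantics, might look roughly as follows. This is a sketch under assumptions: the abstract does not give the exact loss, so a standard ranking hinge with dot-product compatibility is used here, and the names `interpolate_semantics` and `hinge_ranking_loss` are hypothetical.

```python
def interpolate_semantics(a, b, alpha):
    """Linear interpolation between two class semantic vectors."""
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]

def dot(u, v):
    """Dot-product compatibility score (an assumed choice)."""
    return sum(x * y for x, y in zip(u, v))

def hinge_ranking_loss(embedded, target, others, margin=1.0):
    """Sum of max(0, margin - s(target) + s(other)) over non-target semantics."""
    pos = dot(embedded, target)
    return sum(max(0.0, margin - pos + dot(embedded, o)) for o in others)

class_a = [1.0, 0.0]
class_b = [0.0, 1.0]
# Augmented semantic vector halfway between the two classes.
midpoint = interpolate_semantics(class_a, class_b, 0.5)
# A sample embedded exactly at class_a, ranked against class_b and the midpoint.
loss = hinge_ranking_loss([1.0, 0.0], class_a, [class_b, midpoint])
```

In this toy case the interpolated vector is the only term that violates the margin, which shows how augmented semantics can add constraints that the original class vectors alone do not impose.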

3 Semantically Aligned Bias Reducing Zero Shot Learning

Zero shot learning (ZSL) aims to recognize unseen classes by exploiting semantic relationships between seen and unseen classes. Two major problems faced by ZSL algorithms are the hubness problem and the bias towards the seen classes. Existing ZSL methods focus on only one of these problems in the conventional and generalized ZSL setting. In this work, we propose a novel approach, Semantically Aligned Bias Reducing (SABR) ZSL, which focuses on solving both problems. It overcomes the hubness problem by learning a latent space that preserves the semantic relationship between the labels while encoding the discriminating information about the classes. Further, we also propose ways to reduce the bias towards the seen classes through a simple cross-validation process in the inductive setting and a novel weak transfer constraint in the transductive setting. Extensive experiments on three benchmark datasets suggest that the proposed model significantly outperforms existing state-of-the-art algorithms by ∼1.5-9% in the conventional ZSL setting and by ∼2-14% in the generalized ZSL for both the inductive and transductive settings.

4 Multi-modal Cycle-consistent Generalized Zero-Shot Learning

In generalized zero shot learning (GZSL), the set of classes is split into seen and unseen classes, where training relies on the semantic features of the seen and unseen classes and the visual representations of only the seen classes, while testing uses the visual representations of the seen and unseen classes. Current methods address GZSL by learning a transformation from the visual to the semantic space, exploring the assumption that the distribution of classes in the semantic and visual spaces is relatively similar. Such methods tend to transform unseen testing visual representations into one of the seen classes' semantic features instead of the semantic features of the correct unseen class, resulting in low-accuracy GZSL classification. Recently, generative adversarial networks (GAN) have been explored to synthesize visual representations of the unseen classes from their semantic features; the synthesized representations of the seen and unseen classes are then used to train the GZSL classifier. This approach has been shown to boost GZSL classification accuracy, but there is one important missing constraint: there is no guarantee that synthetic visual representations can generate back their semantic feature in a multi-modal cycle-consistent manner. This missing constraint can result in synthetic visual representations that do not represent their semantic features well, which means that the use of this constraint can improve GAN-based approaches. In this paper, we propose the use of such a constraint based on a new regularization for the GAN training that forces the generated visual features to reconstruct their original semantic features. Once our model is trained with this multi-modal cycle-consistent semantic compatibility, we can then synthesize more representative visual representations for the seen and, more importantly, for the unseen classes. Our proposed approach shows the best GZSL classification results in the field on several publicly available datasets.
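The cycle-consistency regularization described above can be sketched as a reconstruction penalty: a generated visual feature is mapped back to semantic space and compared with the semantic vector it was conditioned on. The code below is a toy illustration only. The linear `generator` and `regressor`, their weight matrices, and the squared-error form of the penalty are assumptions; the abstract does not specify any of them.

```python
def generator(noise, semantic, w_gen):
    """Toy linear generator: visual feature = W_gen applied to [semantic ; noise]."""
    inp = list(semantic) + list(noise)
    return [sum(w * x for w, x in zip(row, inp)) for row in w_gen]

def regressor(visual, w_reg):
    """Toy linear regressor mapping a visual feature back to semantic space."""
    return [sum(w * x for w, x in zip(row, visual)) for row in w_reg]

def cycle_loss(noise, semantic, w_gen, w_reg):
    """Squared error between the original and the reconstructed semantic vector."""
    fake_visual = generator(noise, semantic, w_gen)
    rebuilt = regressor(fake_visual, w_reg)
    return sum((r - s) ** 2 for r, s in zip(rebuilt, semantic))

semantic = [1.0, 2.0]
noise = [0.0]
# Generator that copies the semantic vector into the visual feature.
w_gen = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

consistent = cycle_loss(noise, semantic, w_gen, identity)
inconsistent = cycle_loss(noise, semantic, w_gen, zeros)
```

In a real model this penalty would be added to the GAN objective with a weighting factor, so the generator is rewarded for features from which the conditioning semantics can be recovered.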

5 Gradient Matching Generative Networks for Zero-Shot Learning

Zero-shot learning (ZSL) is one of the most promising problems where substantial progress can potentially be achieved through unsupervised learning, due to distributional differences between supervised and zero-shot classes. For this reason, several works investigate the incorporation of discriminative domain adaptation techniques into ZSL, which, however, lead to modest improvements in ZSL accuracy. In contrast, we propose a generative model that can naturally learn from unsupervised examples, and synthesize training examples for unseen classes purely based on their class embeddings, and therefore, reduce the zero-shot learning problem into a supervised classification task. The proposed approach consists of two important components: (i) a conditional Generative Adversarial Network that learns to produce samples that mimic the characteristics of unsupervised data examples, and (ii) the Gradient Matching (GM) loss that measures the quality of the gradient signal obtained from the synthesized examples. Using our GM loss formulation, we force the generator to produce examples from which accurate classifiers can be trained. Experimental results on several ZSL benchmark datasets show that our approach leads to significant improvements over the state of the art in generalized zero-shot classification.
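One way to picture a loss that "measures the quality of the gradient signal" is to compare the classifier gradient computed on synthesized examples with the gradient computed on real examples. The sketch below assumes a binary logistic-regression classifier and a cosine-distance comparison. Both are illustrative choices, not details confirmed by the abstract, and the function names are hypothetical.

```python
import math

def logistic_grad(w, xs, ys):
    """Mean gradient of the binary cross-entropy of a linear classifier."""
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        z = sum(wi * xi for wi, xi in zip(w, x))
        err = 1.0 / (1.0 + math.exp(-z)) - y  # sigmoid(z) - label
        for i, xi in enumerate(x):
            grad[i] += err * xi / len(xs)
    return grad

def gradient_matching_loss(w, real, synth, eps=1e-12):
    """1 - cosine similarity between real-data and synthetic-data gradients."""
    g_real = logistic_grad(w, *real)
    g_syn = logistic_grad(w, *synth)
    dot = sum(a * b for a, b in zip(g_real, g_syn))
    norm = math.sqrt(sum(a * a for a in g_real)) * math.sqrt(sum(b * b for b in g_syn))
    return 1.0 - dot / (norm + eps)

w = [0.0, 0.0]
real = ([[1.0, 0.0]], [1])       # one real example with label 1
good_synth = ([[1.0, 0.0]], [1])  # synthetic example that agrees with it
bad_synth = ([[1.0, 0.0]], [0])   # same feature, conflicting label

aligned = gradient_matching_loss(w, real, good_synth)
opposed = gradient_matching_loss(w, real, bad_synth)
```

A synthetic batch whose gradient points the same way as the real-data gradient yields a loss near 0, while a batch that would push the classifier in the opposite direction yields a loss near 2.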

6 EZSL-GAN: EEG-based Zero-Shot Learning approach using a Generative Adversarial Network

Recent studies show that deep neural networks can be effective for learning EEG-based classification networks. In particular, Recurrent Neural Networks (RNN) show competitive performance in learning the sequential information of the EEG signals. However, none of the previous approaches considers recognizing the unknown EEG signals which have never been seen in the training dataset. In this paper, we first propose a new scheme for Zero-Shot EEG signal classification. Our EZSL-GAN has three parts. The first part is an EEG encoder network that generates 128-dimensional EEG features using a Gated Recurrent Unit (GRU). The second part is a Generative Adversarial Network (GAN) that can tackle the problem of recognizing unknown EEG labels with a knowledge base. The third part is a simple classification network to learn unseen EEG signals from the fake EEG features which are generated from the learned Generator. We evaluate our method on the EEG dataset evoked from visual object stimuli of 40 classes. The experimental results show that our EEG encoder achieves an accuracy of 95.89%. Furthermore, our Zero-Shot EEG classification method reached an accuracy of 39.65% for the ten untrained EEG classes. Our experiments demonstrate that unseen EEG labels can be recognized by the knowledge base.

7 SR-GAN: Semantic Rectifying Generative Adversarial Network for Zero-Shot Learning

The existing Zero-Shot learning (ZSL) methods may suffer from the vague class attributes that are highly overlapped for different classes. Unlike these methods that ignore the discrimination among classes, in this paper, we propose to classify unseen images by rectifying the semantic space guided by the visual space. First, we pre-train a Semantic Rectifying Network (SRN) to rectify semantic space with a semantic loss and a rectifying loss. Then, a Semantic Rectifying Generative Adversarial Network (SR-GAN) is built to generate plausible visual features of unseen classes from both semantic feature and rectified semantic feature. To guarantee the effectiveness of rectified semantic features and synthetic visual features, a pre-reconstruction network and a post-reconstruction network are proposed, which keep the consistency between visual feature and semantic feature. Experimental results demonstrate that our approach significantly outperforms the state of the art on four benchmark datasets.

8 Visual Data Synthesis via GAN for Zero-Shot Video Classification

Zero-Shot Learning (ZSL) in video classification is a promising research direction, which aims to tackle the challenge posed by the explosive growth of video categories. Most existing methods exploit seen-to-unseen correlation via learning a projection between visual and semantic spaces. However, such projection-based paradigms cannot fully utilize the discriminative information implied in data distribution, and commonly suffer from the information degradation issue caused by the "heterogeneity gap". In this paper, we propose a visual data synthesis framework via GAN to address these problems. Specifically, both semantic knowledge and visual distribution are leveraged to synthesize video features of unseen categories, and ZSL can be turned into a typical supervised problem with the synthetic features. First, we propose multi-level semantic inference to boost video feature synthesis, which captures the discriminative information implied in joint visual-semantic distribution via feature-level and label-level semantic inference. Second, we propose Matching-aware Mutual Information Correlation to overcome the information degradation issue, which captures seen-to-unseen correlation in matched and mismatched visual-semantic pairs by mutual information, providing the zero-shot synthesis procedure with robust guidance signals. Experimental results on four video datasets demonstrate that our approach can improve the zero-shot video classification performance significantly.

9 VHEGAN: Variational Hetero-Encoder Randomized GAN for Zero-Shot Learning

To extract and relate visual and linguistic concepts from images and textual descriptions for text-based zero-shot learning (ZSL), we develop a variational hetero-encoder (VHE) that decodes text via a deep probabilistic topic model, the variational posterior of whose local latent variables is encoded from an image via a Weibull distribution based inference network. To further improve VHE and add an image generator, we propose VHE randomized generative adversarial net (VHEGAN) that exploits the synergy between VHE and GAN through their shared latent space. After training with a hybrid stochastic-gradient MCMC/variational inference/stochastic gradient descent inference algorithm, VHEGAN can be used in a variety of settings, such as text generation/retrieval conditioning on an image, image generation/retrieval conditioning on a document/image, and generation of text-image pairs. The efficacy of VHEGAN is demonstrated quantitatively with experiments on both conventional and generalized ZSL tasks, and qualitatively on (conditional) image and/or text generation/retrieval.
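A Weibull-based inference network is convenient because a Weibull draw can be written as a deterministic function of uniform noise through the inverse CDF, so the shape and scale produced by an encoder stay inside the computation. The sketch below shows only that sampling step. The function name `weibull_sample` and the fixed shape and scale values are assumptions standing in for an encoder's outputs; nothing here reproduces the VHEGAN model itself.

```python
import math
import random

def weibull_sample(k, lam, u):
    """Map uniform noise u in [0, 1) to a Weibull(shape=k, scale=lam) draw.

    Uses the inverse CDF: x = lam * (-ln(1 - u)) ** (1 / k).
    """
    return lam * (-math.log(1.0 - u)) ** (1.0 / k)

# Hypothetical encoder outputs for one latent variable.
shape, scale = 2.0, 3.0

# With u = 1 - e^{-1}, the inner term equals 1 and the draw equals the scale.
at_scale = weibull_sample(shape, scale, 1.0 - math.exp(-1.0))

# Ordinary use: feed fresh uniform noise for each draw.
rng = random.Random(0)
draws = [weibull_sample(shape, scale, rng.random()) for _ in range(5)]
```

Because the noise `u` is the only source of randomness, the draw varies smoothly with the shape and scale, which is what lets such a latent variable be trained with gradient-based methods.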
