
Weather recognition plays an important role in our daily lives and many computer vision applications. However, recognizing weather conditions from a single image remains challenging and has not been studied thoroughly. Generally, most previous works treat weather recognition as a single-label classification task, namely, determining whether an image belongs to a specific weather class or not. This treatment is not always appropriate, since more than one weather condition may appear simultaneously in a single image. To address this problem, we make the first attempt to view weather recognition as a multi-label classification task, i.e., assigning an image more than one label according to the displayed weather conditions. Specifically, a CNN–RNN based multi-label classification approach is proposed in this paper. The convolutional neural network (CNN) is extended with a channel-wise attention model to extract the most correlated visual features. The recurrent neural network (RNN) further processes the features and excavates the dependencies among weather classes. Finally, the weather labels are predicted step by step. Besides, we construct two datasets for the weather recognition task and explore the relationships among different weather conditions. Experimental results demonstrate the superiority and effectiveness of the proposed approach. The newly constructed datasets will be available at

1. Introduction

 

Weather conditions influence our daily lives and production in many ways [1], such as clothing, traveling, solar technologies and so on. Therefore, acquiring weather conditions automatically is important to a variety of human activities. A possible solution to weather recognition is utilizing various kinds of hardware. However, such hardware equipment is usually expensive and needs professionals to maintain. An alternative scheme is to recognize weather conditions from color images using computer vision techniques [2,3]. Nowadays, surveillance cameras are ubiquitous, which makes the computer vision solution feasible. Apart from the guiding significance to our daily lives, weather recognition is also an important function for many other computer vision applications [4–7], such as image retrieval [8], image restoration [9], and the reliability improvement of outdoor surveillance systems [3]. Robotic vision [10,11] and vehicle assistant driving systems [12,13] can also benefit from the results of weather recognition. Thus, we can draw a simple conclusion that weather recognition from outdoor images has great research significance.

1.1. Motivation and overview

 

Although weather recognition is of remarkable value, only a few studies have been published to tackle this problem. Several previous works [12,14–16] concentrated on recognizing weather conditions from images captured by in-vehicle cameras. Several other papers [1,17,18] exploited weather recognition from single outdoor images. All of these works referred to weather recognition as a single-label classification task (in this paper, a weather label means a weather category), namely, determining whether an image belongs to a specific weather category or not.

However, it is not always appropriate to view weather recognition as a single-label classification problem, for mainly two reasons. The first reason can be summarized as uncertainty, i.e., the class boundaries among some weather categories are essentially ambiguous. As can be seen from Fig. 1, the changes from Fig. 1(a)–(f) demonstrate that there is a series of states between a pure sunny weather (like Fig. 1(a)) and an obvious cloudy weather (as illustrated in Fig. 1(f)). It is hard to determine whether the category is sunny or cloudy when referring to an intermediate weather state like Fig. 1(c), (d) and (e) [2]. Thus, the uncertainty of such boundary samples makes it difficult to determine ground-truth labels even from the perspective of human beings, and few previous works present solutions to this problem. The second drawback of treating weather recognition as a single-label classification task can be summarized as incompleteness, namely, a single weather label may not describe the weather conditions comprehensively for a given image. For example, the visual effect of haze is obvious in Fig. 1(g), (h) and (i). Nevertheless, it can be seen from the comparisons among these three images that Fig. 1(g) seems more sunny while Fig. 1(h) seems more overcast, and Fig. 1(i) seems snowy. Therefore, a haze label alone cannot reveal the differences among these three images.

Motivated by the aforementioned two reasons, we propose to view weather recognition as a multi-label classification problem, i.e., assigning multiple labels to an image according to the displayed weather conditions. Specifically, this is achieved by a CNN–RNN architecture. The intuition lies in two aspects. On one hand, most of the previous works focused on exploiting hand-crafted weather features [1,20], while these features did not achieve the desired results in the weather recognition task. Inspired by the great success of convolutional neural networks (CNNs) in recent years, we utilize a CNN as the weather feature extractor. On the other hand, labels exhibit strong co-occurrence dependencies in the weather domain. For example, snowy and cloudy usually occur together while rainy and sunny almost never co-occur. Inspired by the success of recurrent neural networks (RNNs) in dependency modeling [21,22], we propose to use an RNN to model the dependencies among labels and predict weather labels step by step. In such a way, when predicting subsequent labels, the network can refer to the previous hidden states that incorporate the historical information implicitly.

For weather recognition, different image regions have different importance when predicting labels. As shown in Fig. 2, the blue sky is crucial for judging a sunny day, and snow on the ground is significant for estimating snowy weather. Lu et al. [2] also emphasized that such weather cues are critical. Therefore, it is necessary to make the weather cues discriminative and preserve the spatial information of the image. To achieve this goal, a channel-wise attention model is designed to exploit more discriminative features for the weather recognition task. Besides, we use a convolutional Long Short-Term Memory (LSTM) [23] instead of a vanilla RNN in our CNN–RNN architecture to preserve the spatial information. The convolutional LSTM uses convolution operations in both state-to-state and input-to-state transformations, which captures spatio-temporal information better than the fully connected LSTM (FC-LSTM) [23].

In addition, considering the lack of datasets for the weather recognition task, two new datasets are constructed in this paper. The first consists of about 8K images from seven weather categories and is transformed from an existing transient attribute dataset [19]. The second is built from scratch and contains 10K images from five weather categories.

1.2. Contributions

 

In summary, there are three main contributions of this work:

(1) We propose to treat weather recognition as a multi-label classification task by analyzing the drawbacks of classifying images with a single weather label and the co-occurrence relationships among different weather conditions.

(2) We present a CNN–RNN architecture to tackle the multi-label weather classification task. It is composed of a CNN to extract features, a channel-wise attention model to recalibrate feature responses, and a convolutional LSTM to model the relationships among different weather labels.

(3) We build a new multi-label weather classification dataset and transform an existing transient attribute dataset [19] for the weather recognition task. The datasets will be available on the project website.

1.3. Organization

 

The remainder of this paper is organized as follows. In Section 2, related works on weather recognition are reviewed. In Section 3, we describe the proposed approach in detail. In Section 4, we first present the construction of the new multi-label weather image dataset and the modification of the transient attribute dataset [19]; then, we analyze the experimental results on these two datasets. In Section 5, the conclusion of this paper is drawn.

2. Related work

 

We roughly divide the weather recognition works into two subcategories in this paper. One category focuses on designing hand-crafted weather features, and the other attempts to use CNNs to solve the weather recognition task.

2.1. Weather recognition with hand-crafted features

 

Many vehicle assistant driving systems use weather recognition to improve road safety. For example, they can set speed limits in extreme weather conditions, automatically turn on the wipers on a rainy day, and so forth. Hand-crafted features are popular in these works. Kurihata et al. [12,24] proposed that raindrops are strong cues for the presence of rainy weather and developed a rain feature to detect raindrops on the windshield. Roser et al. [15] defined several regions of interest (ROI) and developed various types of histogram features for rainy weather recognition. Yan et al. [13] utilized a gradient amplitude histogram, an HSV color histogram as well as road information for the classification task among sunny, cloudy and rainy categories. Besides, several methods were proposed specially for fog detection. Hautière et al. [14] used Koschmieder's Law [25] to detect the presence of fog and estimate the visibility distance. Bronte et al. [26] utilized many techniques, including a Sobel based sunny-foggy detector, edge binarization, Hough line detection, vanishing point detection and road/sky segmentation. Gallen et al. [27] focused on night fog detection by detecting the backscattered veil caused by the vehicle ego lights or halos around the street lights. Pavlić et al. [16,28] transformed images into the frequency domain and detected the presence of fog by training Gabor filters of different scales and orientations in the power spectrum. Although the aforementioned approaches have shown good performance, they are usually limited to the in-vehicle perspective and cannot be applied to a wider range of applications.

There are also several studies devoted to weather recognition from common outdoor images. Li et al. [29] proposed a photometric stereo-based approach to estimate the weather condition of a given site. Zhao et al. [9] pointed out that the pixel-wise intensities of dynamic weather conditions (rainy, snowy, etc.) fluctuate over time while those of static weather conditions (sunny, foggy, etc.) stay almost unchanged. They proposed a two-stage classification scheme which first distinguishes between the two conditions and then utilizes several spatio-temporal and chromatic features to further estimate the weather category. In [17], several global features were extracted for weather classification, such as inflection point information, power spectral slope, edge gradient energy, saturation, contrast and image noise. Li et al. [18] also utilized several features from [17], and constructed a decision tree according to the distance between features. Beyond regular global features, [1] proposed multiple weather cues including reflection, shadow and a sky descriptor for two-class weather recognition. They also exploited a collaborative learning strategy in which voters closer to the test image have larger weights. Zhang et al. [20,30] proposed a sunny feature, a rainy feature, a snowy feature and a haze feature individually for each weather class as well as two global features. Furthermore, a multiple kernel learning approach was proposed in [30] to fuse these features. In [31], both spatial appearance and temporal dynamics were investigated on short video clips to recognize several weather types.

Although researchers have elaborately designed many features for weather recognition, these features are usually limited to specific perspectives or weather classes, and cannot be applied to a wider range of applications.

2.2. Weather recognition with CNNs

 

In recent years, convolutional neural networks have shown overwhelming performance in a variety of computer vision tasks, such as image classification [32], object detection [33], semantic segmentation [34], etc. Several excellent CNN architectures have been proposed, including AlexNet [32], VGGNet [35] and ResNet [36], which outperform the traditional approaches to a large extent. Inspired by the great success of CNNs, a few works have attempted to apply CNNs to the weather recognition task. Elhoseiny et al. [3] directly fine-tuned AlexNet [32] on a two-class weather classification dataset released by Lu et al. [1], and achieved a better result. Lu et al. [2] combined hand-crafted weather features with CNN-extracted features, and further improved the classification performance. However, as discussed in [2], there are no closed boundaries among weather classes, and multiple weather conditions may appear simultaneously. Therefore, all the above approaches suffer from information loss when they treat weather recognition as a single-label classification problem. Li et al. [37] proposed to use auxiliary semantic segmentation of weather cues to comprehensively describe the weather conditions. This strategy can alleviate the problem of information loss, but the segmentation mask is not intuitive for humans.

3. Our approach

 

In this paper, to comprehensively describe the weather conditions, we propose to treat weather recognition as a multi-label classification problem. Furthermore, a CNN–RNN model is developed for this task, which formulates the multi-label classification as a step-wise prediction. Fig. 3 demonstrates the architecture of the proposed approach. It is mainly composed of three parts, i.e., the basic CNN, a channel-wise attention model and a convolutional LSTM. The CNN extracts the preliminary features of a given outdoor image. Specifically, the first five groups of convolutional/pooling layers of VGGNet [35] are utilized in this paper. The channel-wise attention model adaptively calculates the channel-wise attention weights and recalibrates the feature responses. The convolutional LSTM uses the visual features and the hidden state to predict weather labels one by one, which implicitly models the co-occurrence dependency among labels by maintaining context information in its internal memory states.

3.1. The convolutional LSTM in the CNN–RNN architecture

 

Recurrent neural networks, especially LSTMs, have recently achieved overwhelming success in sequence modeling tasks, such as image/video captioning [38] and neural machine translation [39]. Without loss of generality, the LSTM can be formulated as follows [40]:

$$
\begin{aligned}
i_t &= \sigma(W_{iw}\,x_t + U_{ih}\,h_{t-1} + b_i),\\
f_t &= \sigma(W_{fw}\,x_t + U_{fh}\,h_{t-1} + b_f),\\
o_t &= \sigma(W_{ow}\,x_t + U_{oh}\,h_{t-1} + b_o),\\
g_t &= \tanh(W_{gw}\,x_t + U_{gh}\,h_{t-1} + b_g),\\
c_t &= f_t \circ c_{t-1} + i_t \circ g_t,\\
h_t &= o_t \circ \tanh(c_t),
\end{aligned}
\tag{1}
$$

where the subscript $t$ indicates the $t$th step of the LSTM, $x_t$ denotes the input data, $h_t$ stands for the hidden state, and $c_t$ is the cell state. $i_t$, $f_t$ and $o_t$ are the input gate, forget gate and output gate of the LSTM, respectively. The $W$s, $U$s and $b$s are weights and biases to be learned. $\sigma$, $\tanh$ and $\circ$ represent the sigmoid function, hyperbolic tangent function and element-wise multiplication, respectively. As shown in Eq. (1), at each step, the data $x_t$ and the previous hidden state $h_{t-1}$ are taken as the input of the current LSTM unit, and the historical information is recorded in the hidden state $h_t$, such that the LSTM can exploit the temporal dependency.
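To make the formulation concrete, the following is a minimal NumPy sketch of one LSTM step following Eq. (1). The function name, parameter layout, toy dimensions and random initialization are ours for illustration; they are not part of the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One step of the standard (fully connected) LSTM, Eq. (1)."""
    W, U, b = params["W"], params["U"], params["b"]
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # input gate
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # forget gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # output gate
    g_t = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])  # candidate state
    c_t = f_t * c_prev + i_t * g_t   # new cell state (element-wise products)
    h_t = o_t * np.tanh(c_t)         # new hidden state
    return h_t, c_t

# Toy usage with hypothetical sizes: input dim D, hidden dim H.
D, H = 8, 4
rng = np.random.default_rng(0)
params = {
    "W": {k: rng.normal(size=(H, D)) * 0.1 for k in "ifog"},
    "U": {k: rng.normal(size=(H, H)) * 0.1 for k in "ifog"},
    "b": {k: np.zeros(H) for k in "ifog"},
}
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(rng.normal(size=D), h, c, params)
```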

Although the standard LSTM has demonstrated its powerful capability in sequence modeling tasks, the spatial information is ignored when processing images [23]. As can be seen from Eq. (1), fully connected transformations are used in the state-to-state and input-to-state transitions. Generally, if the input image data $x_t \in \mathbb{R}^{W \times H \times C}$, it will be flattened to a 1D vector before being input to the LSTM, and this process suffers from the loss of spatial information. To overcome this drawback, the convolutional LSTM [23] is employed in our approach, which can be formulated as follows:

$$
\begin{aligned}
i_t &= \sigma(W_{iw} * x_t + U_{ih} * h_{t-1} + b_i),\\
f_t &= \sigma(W_{fw} * x_t + U_{fh} * h_{t-1} + b_f),\\
o_t &= \sigma(W_{ow} * x_t + U_{oh} * h_{t-1} + b_o),\\
g_t &= \tanh(W_{gw} * x_t + U_{gh} * h_{t-1} + b_g),\\
c_t &= f_t \circ c_{t-1} + i_t \circ g_t,\\
h_t &= o_t \circ \tanh(c_t),
\end{aligned}
\tag{2}
$$

where $*$ denotes the convolution operator and the other symbols are the same as in Eq. (1). It should be noted that the input feature $x_t$, cell state $c_t$, hidden state $h_t$ and gates $i_t$, $f_t$, $o_t$ of the convolutional LSTM are all 3D tensors, and convolution operations are used in the state-to-state and input-to-state transformations. Therefore, the spatial information of the features is preserved. Furthermore, the convolution operation actually has an implicit spatial attention mechanism, since regions corresponding to the target label usually have higher activation responses. In the experiments, we also find that the convolutional LSTM pays attention to several critical regions for weather label prediction, and achieves better results than the common LSTM with or without a spatial attention model.
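For comparison with the sketch of Eq. (1), below is a corresponding sketch of one convolutional LSTM step per Eq. (2). The naive `conv2d_same` helper, the "same" padding choice, the 3 × 3 kernel size and all names are our illustrative assumptions, not the paper's code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_same(x, kernel):
    """'Same'-padded 2D convolution: x is (W, H, C_in),
    kernel is (k, k, C_in, C_out); returns (W, H, C_out).
    Plain loops for clarity, not speed."""
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros(x.shape[:2] + (kernel.shape[3],))
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Contract the (k, k, C_in) patch against the filter bank.
            out[i, j] = np.tensordot(xp[i:i + k, j:j + k, :], kernel, axes=3)
    return out

def convlstm_step(x_t, h_prev, c_prev, params):
    """One convolutional LSTM step, Eq. (2): every gate is a conv map."""
    W, U, b = params["W"], params["U"], params["b"]
    i_t = sigmoid(conv2d_same(x_t, W["i"]) + conv2d_same(h_prev, U["i"]) + b["i"])
    f_t = sigmoid(conv2d_same(x_t, W["f"]) + conv2d_same(h_prev, U["f"]) + b["f"])
    o_t = sigmoid(conv2d_same(x_t, W["o"]) + conv2d_same(h_prev, U["o"]) + b["o"])
    g_t = np.tanh(conv2d_same(x_t, W["g"]) + conv2d_same(h_prev, U["g"]) + b["g"])
    c_t = f_t * c_prev + i_t * g_t   # states stay (W, H, C): no flattening
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Toy usage with a hypothetical 7x7x8 feature map.
rng = np.random.default_rng(0)
Wd, Hd, C = 7, 7, 8
params = {g: {k: rng.normal(size=(3, 3, C, C)) * 0.05 for k in "ifog"}
          for g in ("W", "U")}
params["b"] = {k: np.zeros(C) for k in "ifog"}
h = c = np.zeros((Wd, Hd, C))
h, c = convlstm_step(rng.normal(size=(Wd, Hd, C)), h, c, params)
```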

3.2. Channel-wise attention model in the CNN–RNN architecture

 

Usually, different regions are activated in different channels of the feature map, and different image regions have different importance when estimating various weather conditions. In our CNN–RNN architecture, each step of the convolutional LSTM predicts one weather label. Inspired by Hu et al. [41], we propose a channel-wise attention model for the CNN–RNN architecture to adaptively recalibrate the feature responses when predicting different weather labels. The proposed channel-wise attention model is illustrated in Fig. 4.

As discussed in [41], exploiting global information is a popular method in feature engineering works. To calculate the attention weight of each feature map channel, we adopt a similar strategy, i.e., global average pooling is used to generate channel-wise statistics which can be viewed as a descriptor of the channel-wise global spatial information. Different from [41], in our multi-label weather classification task, we want to adaptively obtain the channel-wise attention weights according to the previously predicted weather label. So we also take into account the channel-wise statistics information encoded in the hidden state of the convolutional LSTM. The two kinds of statistics information are formulated as follows:

$$
a_k = f_a(x_k) = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} x_k(i, j), \tag{3}
$$

$$
d_k = f_a(h_{t-1,k}) = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} h_{t-1,k}(i, j), \tag{4}
$$

where $x_k$ and $h_{t-1,k}$ denote the visual feature and the previous hidden state of the convolutional LSTM at the $k$th channel ($k = 1, 2, \ldots, C$), respectively. $f_a$ represents the global average pooling function, and $a_k$ and $d_k$ denote the statistics information of the visual feature and the hidden state at the $k$th channel. $W$ and $H$ stand for the width and height of the visual features. It should be noted that, in our approach, the visual features and hidden states have the same dimensions.

After the statistics information of the visual features and hidden states is obtained, the channel-wise attention weights are calculated by

$$
z_k = \sigma\big(w_2\,\delta(w_1 [a_k, d_k] + b_1) + b_2\big), \tag{5}
$$

where the $w$s and $b$s are weights and biases to be learned, $\delta$ represents the ReLU [42] function that is utilized to learn the non-linear mapping, $[\,\cdot\,,\cdot\,]$ is the concatenation operation, and $\sigma$ indicates the sigmoid function which normalizes the attention weight to the range 0–1. Finally, the recalibrated features are obtained by rescaling the original features with the attention weights,

$$
\tilde{x}_k = z_k\,x_k, \quad k = 1, 2, \ldots, C. \tag{6}
$$
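A small NumPy sketch of the channel-wise attention computation in Eqs. (3)–(6) follows. The reduction size `r`, the exact weight shapes, and the per-channel rescaling reading of Eq. (6) are our assumptions under the squeeze-and-excitation style of [41]; all names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, h_prev, w1, b1, w2, b2):
    """Channel-wise attention, Eqs. (3)-(6), as a sketch.

    x, h_prev: (W, H, C) visual feature and previous ConvLSTM hidden state.
    w1: (r, 2C), w2: (C, r) for a reduction size r (assumed shapes).
    """
    a = x.mean(axis=(0, 1))       # Eq. (3): global average pooling of x
    d = h_prev.mean(axis=(0, 1))  # Eq. (4): pooling of the hidden state
    s = np.concatenate([a, d])    # [a_k, d_k] statistics, length 2C
    z = sigmoid(w2 @ np.maximum(w1 @ s + b1, 0.0) + b2)  # Eq. (5), ReLU inside
    return x * z[None, None, :]   # Eq. (6): rescale each channel by z_k

# Toy usage with hypothetical sizes.
Wd, Hd, C, r = 7, 7, 512, 64
rng = np.random.default_rng(0)
x, h = rng.normal(size=(Wd, Hd, C)), rng.normal(size=(Wd, Hd, C))
x_rec = channel_attention(x, h,
                          rng.normal(size=(r, 2 * C)) * 0.01, np.zeros(r),
                          rng.normal(size=(C, r)) * 0.01, np.zeros(C))
```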

3.3. Inference

 

In this paper, the weather labels are predicted in a fixed order. Practically, the order of the weather labels is set according to their co-occurrence relationships; details are given in Section 4.2.

In each step of the convolutional LSTM, the 3D hidden state is flattened to a 1D vector, which is then used to predict the weather label:

$$
p_t = \sigma(w_p h_t + b_p), \tag{7}
$$

where $p_t \in [0, 1]$ is the output probability of the $t$th weather label, $h_t$ is the flattened hidden state, and $w_p$ and $b_p$ are the learned weight and bias.

The loss of each prediction step is

$$
loss_t = -\frac{1}{N} \sum_{i=1}^{N} \left[\, p_{i,t} \log \tilde{p}_{i,t} + (1 - p_{i,t}) \log(1 - \tilde{p}_{i,t}) \,\right], \tag{8}
$$

where $N$ denotes the number of training samples, $p_{i,t}$ indicates the ground-truth label of the $i$th sample on the $t$th weather class, and $\tilde{p}_{i,t}$ is the corresponding predicted label. Finally, the total loss is

$$
Loss = \sum_{t=1}^{T} loss_t, \tag{9}
$$

where $T$ represents the number of all weather classes.
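The step-wise prediction and loss of Eqs. (7)–(9) can be sketched as follows. The `eps` term for numerical stability and all names are our additions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_step(h_t, w_p, b_p):
    """Eq. (7): flatten the 3D hidden state and predict one label."""
    return sigmoid(w_p @ h_t.reshape(-1) + b_p)

def step_loss(p_true, p_pred, eps=1e-8):
    """Eq. (8): binary cross-entropy for one weather class over a batch."""
    return -np.mean(p_true * np.log(p_pred + eps)
                    + (1 - p_true) * np.log(1 - p_pred + eps))

# Eq. (9): the total loss sums the per-step losses over all T classes, e.g.
#   total = sum(step_loss(y[:, t], p[:, t]) for t in range(T))
```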

3.4. Training details

The open source library TensorFlow is used to implement the proposed approach. To accelerate convergence, we adopt a two-stage training strategy. In the first stage, the basic CNN of our approach (i.e., the first five groups of convolutional/pooling layers of VGGNet [35]) is trained. Specifically, we transform VGGNet into a multi-label classification framework by replacing the output layer with $T$ neurons ($T$ represents the number of weather classes), and train it with a multi-label sigmoid cross-entropy loss function. The VGGNet model pre-trained on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is used for fine-tuning. In the second stage, we remove the fully connected layers of VGGNet and fix the remaining parameters. Then, the convolutional LSTM and the channel-wise attention model are trained from scratch based on the CNN-extracted features. The Xavier initialization method is employed in this stage. The Adam [43] optimization approach is used to minimize the loss functions in both stages, with the first and second momentum set to 0.9 and 0.999, respectively. To avoid overfitting, dropout [44] is used after the fully connected layers in both stages, and L2 regularization is employed for all weight parameters. We set the dropout ratio and the weight of L2 regularization to 0.5 and 0.0005 during the entire training process. The learning rate is initialized as 0.0001 and drops by a factor of 10 after the loss becomes stable. Besides, we also attempted to fine-tune all parameters after the second training stage, i.e., to unfix the parameters of the basic CNN, but experiments show that this strategy does not bring performance improvements.

Before training, each sample is resized into a 256 × 256 image. Random flips, random crops and random noise are used for data augmentation. We adopt a stochastic mini-batch training strategy: images are randomly shuffled and grouped into mini-batches of size 50 before each training epoch. Table 1 shows the detailed shapes of several critical components of the proposed CNN–RNN architecture; the shapes of all biases can be easily inferred.
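As a rough illustration of the first training stage described above, the snippet below sets up a multi-label VGG head with the stated hyperparameters using the modern tf.keras API. The 224 × 224 input size, the 4096-unit layer width and every name here are our assumptions rather than the original code.

```python
import tensorflow as tf

T = 5  # number of weather classes (five on the new dataset)

# Convolutional base: the conv/pool groups of VGG with ImageNet weights.
base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(224, 224, 3))

l2 = tf.keras.regularizers.l2(5e-4)  # L2 weight of 0.0005, as in the text
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4096, activation="relu", kernel_regularizer=l2),
    tf.keras.layers.Dropout(0.5),    # dropout ratio 0.5 after FC layers
    tf.keras.layers.Dense(T, activation="sigmoid", kernel_regularizer=l2),
])

# Adam with the stated momenta and initial learning rate.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4,
                                       beta_1=0.9, beta_2=0.999),
    loss="binary_crossentropy",      # multi-label sigmoid cross-entropy
)
```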

4. Experiments

 

Since this is the first work to treat weather recognition as a multi-label classification problem, there are no existing datasets for this task. Therefore, to evaluate the proposed approach, we construct two datasets, where one is a modification of the transient attribute dataset [19] and the other is created from scratch. In this section, we first introduce the construction procedure and details of the two datasets. Then, the co-occurrence relationships among weather labels are explored. Finally, the evaluation metrics, comparison approaches and experimental results are presented.

4.1. Dataset description

 

4.1.1. The transient attribute dataset

The first dataset is transformed from an existing transient attribute dataset [19] which was originally built for outdoor scene understanding and editing. Although the transient attribute dataset is not specially designed for weather recognition, it presents many appealing properties. First, images are captured across many outdoor scenes including mountains, cities, towns and urban sceneries. Images in this dataset are of different scales and views, which enhances the diversity across scenes. Second, images are elaborately selected to ensure they exhibit various appearances of the same scene. Moreover, the authors of [19] defined 40 transient attributes for this dataset including weather related attributes (e.g., 'sunny', 'rain', 'fog', etc.). For each image, the weather related attributes are annotated non-exclusively, which is important for our multi-label weather recognition experiments. Several examples of the transient attribute dataset are illustrated in Fig. 5.

For weather recognition, six weather related attributes among all 40 transient attributes are selected, i.e., 'sunny', 'cloudy', 'fog', 'snow', 'moist' and 'rain'; the others are ignored in our experiments. Besides, we find that there exist a few examples in which all weather attribute strengths are very low. Some of them are captured at dawn or dusk, and others do not show obvious features corresponding to any weather category. Therefore, we add an 'other' class to represent those examples where every attribute strength is lower than 0.5. It is noteworthy that a strength lower than 0.5 indicates that the annotation workers do not think the image exhibits the corresponding attribute. In this paper, for the weather recognition task, weather attribute strengths greater than 0.5 are set to 1 and those lower are set to 0. Finally, the dataset contains seven weather classes and 8571 images in total. The detailed statistics of the dataset are displayed in Table 2.

4.1.2. The multi-label weather classification dataset

To further evaluate the proposed approach, we construct a new dataset from scratch, which contains 10,000 images from five weather classes, i.e., 'sunny', 'cloudy', 'foggy', 'rainy' and 'snowy'. All images are elaborately selected from the Internet. Compared to other weather recognition datasets, our dataset has the following advantages. First, most of the existing datasets focus on only two or three weather classes, while our dataset covers all common weather conditions in daily life. Second, the newly constructed dataset contains many different scenes including cities, villages, urban areas and so on, as depicted in Fig. 6. In addition, this dataset also exhibits different scales and views. Third, in our dataset, the weather labels are not mutually exclusive, which provides more weather information.

The annotation of the multi-label weather classification dataset was completed through a crowd-sourced task. The annotation workers were asked to determine weather attribute strengths non-exclusively for a given image; the range of strengths is from 0 to 1, with 0.5 as the demarcation point. A weather attribute strength lower than 0.5 indicates that the image cannot be judged as the corresponding weather condition (even if the image contains the corresponding attribute). In this dataset, an image is annotated by at least five workers, and the average value of each attribute strength is taken as the result. To ensure the effectiveness of the annotation task, we also calculate the variance of each attribute strength for a given image. If the variance is larger than a threshold, the result is re-determined by discussion. Finally, to generate the weather labels, all attribute strengths greater than or equal to 0.5 are set to 1, and the others are set to 0.
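A worked example of this aggregation rule, with made-up strengths from five hypothetical workers; the variance threshold here is illustrative, since the text does not give its value.

```python
import numpy as np

# Hypothetical strengths in [0, 1] from five workers for one image,
# over the classes (sunny, cloudy, foggy, rainy, snowy).
strengths = np.array([
    [0.9, 0.6, 0.1, 0.0, 0.0],   # worker 1
    [0.8, 0.7, 0.2, 0.1, 0.0],   # worker 2
    [1.0, 0.4, 0.0, 0.0, 0.1],   # worker 3
    [0.7, 0.6, 0.1, 0.0, 0.0],   # worker 4
    [0.9, 0.5, 0.3, 0.0, 0.0],   # worker 5
])

mean = strengths.mean(axis=0)                 # average strength per class
needs_review = strengths.var(axis=0) > 0.05   # threshold is our assumption
labels = (mean >= 0.5).astype(int)            # 0.5 is the demarcation point
print(labels)                                 # -> [1 1 0 0 0]
```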

Fig. 7 shows the weather label distribution on the two experiment datasets. The detailed statistics can also be found in Table 2. In both datasets, cloudy is the class with the largest number of samples. This is because cloudy usually co-occurs with other weather conditions. Apart from cloudy, the newly constructed dataset is more balanced than the transient attribute dataset. Besides, it can be observed from Table 2 that over half of the samples have multiple weather labels in both datasets, which also verifies the validity of taking weather recognition as a multi-label classification task.

4.2. Co-occurrence  relationships

 

We have qualitatively argued that more than one weather condition may occur simultaneously in one image. A quantitative analysis of the co-occurrence relationships among different weather conditions is also conducted according to the following equation,

$$
R(i, j) = \frac{\sum_{Q} conc(i, j)}{\sum_{Q} I(i)}, \tag{10}
$$

where both $i$ and $j$ denote a kind of weather condition, $R(i, j)$ measures the co-occurrence relationship between $i$ and $j$, and $Q$ represents all the samples in the dataset. $conc(i, j)$ and $I(i)$ are indicator functions which are defined as follows,

$$
conc(i, j) =
\begin{cases}
1, & Arr(i) \ge 0.5 \,\wedge\, Arr(j) \ge 0.5\\
0, & \text{otherwise}
\end{cases}, \tag{11}
$$

$$
I(i) =
\begin{cases}
1, & Arr(i) \ge 0.5\\
0, & \text{otherwise}
\end{cases}, \tag{12}
$$

where $Arr(i)$ denotes the attribute strength of weather condition $i$, and $\wedge$ represents the conjunction symbol. In summary, Eq. (10) indicates the ratio between the co-occurrence count of the two weather conditions and the occurrence count of weather condition $i$ over all images. Therefore, $\sum_j R(i, j)$ and $\sum_j R(j, i)$ represent the influence and the dependence of label $i$ with respect to the other labels, respectively. To exploit the dependencies when predicting the weather labels, it is natural to predict the most influential label first and the most dependent label last. Based on this, the following score is utilized to rank the weather labels,

$$
r = \frac{\sum_j R(i, j)}{\sum_j R(j, i)}. \tag{13}
$$

Obviously, the label with a higher score $r$ should rank first.

The analytical result is depicted in Fig. 8, from which we can draw the following conclusions. First, in accordance with our intuition, there are strong co-occurrence relationships among some weather conditions, such as rainy and cloudy, or snowy and foggy. The corresponding samples are usually near the category boundary, and in this paper we propose to use combinations of labels to represent them. Second, there are indeed label dependencies in the weather recognition task, and it is necessary to consider them when predicting multiple weather labels. In this paper, the convolutional LSTM is employed to capture the dependencies among different weather labels, and the labels are predicted step by step. According to Eq. (13), the order of weather labels is fixed as moist → cloudy → others → sunny → snowy → foggy → rainy on the transient attribute dataset, and cloudy → sunny → foggy → rainy → snowy on our multi-label weather classification dataset. Practically, we have also tried several other label orders; they achieve comparable performance, and the above two perform best on most occasions.

4.3. Evaluation metrics and comparison approaches

 

Per-class precision and recall are first computed as evaluation metrics. Per-class means that, for a given weather label, the prediction result is true as long as the current label is correctly predicted. Then, the average precision (AP) and average recall (AR) are calculated, which are defined as the average values of the per-class precision and recall, respectively.

Besides, sample-wise evaluation metrics are also adopted, namely the overall precision (OP) and overall recall (OR), which are defined as

$$
OP = \frac{\sum_{n=1}^{N} \sum_{i=1}^{K} f(p_{n,i}, \tilde{p}_{n,i})}{N \cdot K}, \tag{14}
$$

$$
OR = \frac{\sum_{n=1}^{N} \sum_{i=1}^{K} f(p_{n,i}, \tilde{p}_{n,i})}{\sum_{n=1}^{N} \sum_{i=1}^{K} p_{n,i}}, \tag{15}
$$

where $N$ denotes the number of samples in the dataset, $K$ represents the number of weather classes, and $p_{n,i}$ and $\tilde{p}_{n,i}$ indicate the ground-truth label and predicted label of the $n$th sample on the $i$th weather class, respectively. $f(\cdot)$ is an indicator function which is defined as

$$
f(p, \tilde{p}) =
\begin{cases}
1, & p = \tilde{p}\\
0, & \text{otherwise}
\end{cases}. \tag{16}
$$

Finally, the F1 scores (including AF1 and OF1) are computed, which are the harmonic mean of precision and recall.
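A sketch of the overall metrics, implementing the reconstructed Eqs. (14)–(16) literally; the function name and the OF1 harmonic-mean formula follow the text, and the guard against empty positives is our addition.

```python
import numpy as np

def overall_metrics(y_true, y_pred):
    """OP (Eq. (14)), OR (Eq. (15)) and their harmonic mean OF1.

    y_true, y_pred: (N, K) binary arrays; f of Eq. (16) is equality.
    """
    correct = (y_true == y_pred).astype(float)   # f(p, p~) for every entry
    op = correct.mean()                          # Eq. (14): divide by N * K
    orr = correct.sum() / max(y_true.sum(), 1)   # Eq. (15): divide by sum(p)
    of1 = 2 * op * orr / (op + orr)              # harmonic mean
    return op, orr, of1
```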

Since there are no other multi-label weather recognition approaches, we compare with multi-label versions of AlexNet [32] and VGGNet [35]. To verify the effectiveness of the convolutional LSTM and the channel-wise attention model, we also compare with some other CNN–RNN frameworks, including CNN–LSTM, CNN–LSTM with a spatial attention model (CLA), CNN–GRU with a spatial attention model (CGA), and CNN-ConvLSTM without the channel-wise attention model. Besides, two widely used general multi-label approaches are also employed as comparison methods, i.e., ML-KNN [45] and ML-ARAM [46]. ML-KNN is a multi-label lazy learning method that adapts the traditional K-nearest neighbor (KNN) algorithm to the multi-label setting. ML-ARAM extends the Adaptive Resonance Associative Map neural network to multi-label classification tasks. In our experiments, we test these two approaches using the implementations in the popular scikit-multilearn library. For fair comparison, all CNN–RNN frameworks use the same CNN (i.e., VGGNet) as our approach. Features input to ML-KNN and ML-ARAM are also extracted by VGGNet (the last fully connected layer) pre-trained on the two experimental datasets. The proposed approach is referred to as CNN-Att-ConvLSTM.

4.4. Results on the transient attribute dataset

 

For the transient attribute dataset, 1000 images are randomly selected for testing, another 1000 images are selected for validation, and the remaining images are used for training. The experimental results are shown in Table 3, from which we can see that the proposed approach CNN-Att-ConvLSTM achieves the best results on OP, OR and OF1, and comparable results with the state-of-the-art on AP, AR and AF1. CNN–LSTM with the spatial attention model (CLA) also obtains good results, while without the spatial attention model, CNN–LSTM suffers from serious performance degradation. This indicates the importance of some key regions in the weather recognition task. To evaluate the influence of the LSTM in the CNN–RNN framework, we also test CNN–GRU with the spatial attention model (CGA), and find that CGA achieves almost the same results as CLA. CNN-ConvLSTM also obtains results similar to CLA, which demonstrates the effectiveness of the convolutional LSTM in extracting information from key regions. Overall, the proposed approach performs better than the multi-label versions of AlexNet and VGGNet, the general multi-label approaches ML-KNN and ML-ARAM, and the other CNN–RNN methods, which proves its superiority.

Regarding the per-class results, all these methods perform worse on the 'rainy' and 'other' classes. This is because most images in the transient attribute dataset present distant views, and it is difficult to recognize rainy weather from such distant views. In addition, samples of the 'other' class are very rare, and can easily be misclassified as sunny or cloudy in this dataset.

4.5. Results on the multi-label weather classification dataset

 

For the multi-label weather classification dataset, 2000 images are randomly selected for testing, 1000 images for validation, and the remaining images for training. As presented in Table 4, CNN-Att-ConvLSTM performs the best on almost all the evaluation metrics, which demonstrates the effectiveness of the proposed approach again.

To analyze the effectiveness of our approach, some weather recognition examples are presented in Fig. 9, including the classification results, activation maps and attention weights of our approach. The results of VGGNet are used for comparison, since our approach also uses it as the deep feature extractor.

Specifically, our approach works well on the first three images. From the selected activation maps and their attention weights, we can see that our approach attends to the most correlated weather cues when predicting different weather labels, while the results of VGGNet are not as satisfactory. For example, the first image is annotated as sunny and foggy; correspondingly, the blue sky, the bright area and the region of haze have stronger responses in our activation maps, and the attention weights of the corresponding activation maps are relatively high when predicting the different labels. However, the ground is mistakenly activated by VGGNet, which leads to the wrong label, i.e., rainy. Besides, our approach fails on the remaining two images. The fourth image is annotated as sunny and cloudy, which means an intermediate state between sunny and cloudy. However, only the cloud regions are activated, and the sunny label is lost in our approach, mainly because the sunny label is somewhat ambiguous. The fifth image is annotated as cloudy and rainy. However, because the wet ground is not obvious, it is misclassified as cloudy and foggy by our approach. Overall, the results in Fig. 9 indicate that our approach performs well in most cases, but sometimes fails when the annotation is ambiguous and the weather cues are not obvious. This is reasonable since our approach is based only on visual features; better performance might be achieved with other modality information, such as humidity, which can be taken into consideration in our future work.

5. Conclusion

 

Considering that more than one weather condition may occur simultaneously in one image, we first analyze the drawbacks of taking weather recognition as a single-label classification task, and then propose a multi-label classification framework for the weather recognition task. It allows one image to belong to multiple weather categories, which provides a more comprehensive description of the weather conditions. Specifically, it is a CNN–RNN architecture, where the CNN is extended with a channel-wise attention model to extract the most correlated visual features, and a convolutional LSTM is utilized to predict the weather labels step by step while maintaining the spatial information of the visual features. Besides, we build two datasets for the weather recognition task to alleviate the lack of training data. The experimental results have verified the effectiveness of the proposed approach.

In future work, we plan to introduce the distribution prediction task to weather recognition [47–50], which can not only classify an image with multiple labels, but also predict the strength of each weather class, so as to describe the weather conditions more comprehensively. Besides, other modality information, such as humidity and temperature, can also be utilized in the future work.
