Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Original paper: https://arxiv.org/abs/1908.10084

Abstract

STS: Semantic Textual Similarity

BERT's architecture is ill-suited to semantic similarity search and to unsupervised tasks such as clustering.

SBERT: Sentence-BERT

This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy of BERT.

Finding which of the more than 40 million existing Quora questions is most similar to a new question could be modeled as a pair-wise comparison with BERT; however, answering a single query would require over 50 hours.

1 Introduction

By using optimized index structures, finding the most similar Quora question can be reduced from 50 hours to a few milliseconds (Johnson et al., 2017).

Answering a single new Quora question with pair-wise BERT comparisons takes about 50 hours; SBERT brings this down to milliseconds.
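
The pattern this enables: embed the corpus once with SBERT, then answer each query with cheap vector comparisons. Below is a minimal sketch using the sentence-transformers library; the model name, the toy corpus, and the query are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch: embed a corpus once, then score a query against it
# with cosine similarity. Corpus, query, and model name are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bert-base-nli-mean-tokens")

corpus = [
    "How do I reset my password?",
    "What is the capital of France?",
    "How can I change my account password?",
]
corpus_emb = model.encode(corpus)            # (n, dim), computed once offline

query_emb = model.encode(["I forgot my password"])[0]

# Cosine similarity between the query and every corpus sentence.
scores = corpus_emb @ query_emb / (
    np.linalg.norm(corpus_emb, axis=1) * np.linalg.norm(query_emb)
)
print(corpus[int(np.argmax(scores))])        # most similar corpus sentence
```

At Quora scale, the brute-force dot product above would be replaced with an optimized approximate-nearest-neighbor index of the kind described by Johnson et al., 2017.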

3 Model

SBERT adds a pooling operation to the output of BERT / RoBERTa to derive a fixed-sized sentence embedding.

A pooling layer is added so that sentences of any length are mapped to a fixed-size embedding.

We experiment with three pooling strategies: Using the output of the CLS-token, computing the mean of all output vectors (MEAN strategy), and computing a max-over-time of the output vectors (MAX-strategy). The default configuration is MEAN.

Three pooling strategies; see the sketch below.
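
A hedged sketch of the three strategies over BERT's token outputs; the tensor names and shapes are illustrative assumptions, not the paper's code.

```python
# Sketch of the three pooling strategies over BERT token outputs.
# token_embeddings: (batch, seq_len, hidden); attention_mask: (batch, seq_len).
import torch

def pool(token_embeddings, attention_mask, strategy="MEAN"):
    if strategy == "CLS":
        # Output vector of the first ([CLS]) token.
        return token_embeddings[:, 0]
    mask = attention_mask.unsqueeze(-1).float()   # ignore padding positions
    if strategy == "MEAN":
        # Average over all real token vectors (the paper's default).
        return (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
    if strategy == "MAX":
        # Max-over-time: element-wise max across the sequence dimension.
        return token_embeddings.masked_fill(mask == 0, -1e9).max(1).values
    raise ValueError(strategy)
```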

3.1 Training Details

We fine-tune SBERT with a 3-way softmax classifier objective function for one epoch. We used a batch-size of 16, Adam optimizer with learning rate 2e−5, and a linear learning rate warm-up over 10% of the training data. Our default pooling strategy is MEAN.

The training strategies and fine-tuning hyperparameters; a sketch of this setup follows.
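
A sketch of this recipe written against the sentence-transformers API released by the SBERT authors; the placeholder training example stands in for the full SNLI/MultiNLI data, and the exact calls are assumptions about the library, not code from the paper.

```python
# Sketch of the Section 3.1 fine-tuning recipe: classification objective,
# batch size 16, lr 2e-5, linear warm-up over 10% of the training data.
from torch.utils.data import DataLoader
from sentence_transformers import (InputExample, SentenceTransformer,
                                   losses, models)

word_emb = models.Transformer("bert-base-uncased")
pooling = models.Pooling(word_emb.get_word_embedding_dimension(),
                         pooling_mode="mean")          # default MEAN strategy
model = SentenceTransformer(modules=[word_emb, pooling])

nli_examples = [  # placeholder; real training uses SNLI + MultiNLI pairs
    InputExample(texts=["A man eats.", "A person eats."], label=0),
]
train_dataloader = DataLoader(nli_examples, shuffle=True, batch_size=16)

train_loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3,                                      # 3-way softmax classifier
)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,                                          # one epoch
    optimizer_params={"lr": 2e-5},                     # Adam-style, lr 2e-5
    warmup_steps=int(0.1 * len(train_dataloader)),     # warm-up over 10%
)
```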

4.1 Unsupervised STS

Comparison of SBERT with other models on the STS tasks; SRoBERTa brings only a limited improvement over SBERT.

4.2 Supervised STS

4.3 Argument Facet Similarity

AFS: Argument Facet Similarity

STS data is usually descriptive, while AFS data are argumentative excerpts from dialogs. To be considered similar, arguments must not only make similar claims, but also provide similar reasoning.

Judging similarity on AFS data is harder than on STS data.

6 Ablation Study

For the classification objective, concatenating (u, v, |u−v|) as the input to the softmax classifier works best; see the sketch below.
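
A minimal sketch of that concatenation, assuming 768-dimensional sentence embeddings and the 3-way NLI label set; the names are illustrative.

```python
# Classification objective: concatenate u, v, and |u - v|, then classify.
# The softmax itself is applied inside the cross-entropy loss.
import torch
import torch.nn as nn

hidden = 768        # sentence embedding size (assumption: BERT-base)
num_labels = 3      # entailment / contradiction / neutral

classifier = nn.Linear(3 * hidden, num_labels)

def classify(u, v):
    # u, v: (batch, hidden) sentence embeddings from the siamese network
    features = torch.cat([u, v, torch.abs(u - v)], dim=-1)
    return classifier(features)
```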

When trained with the classification objective function on NLI data, the pooling strategy has a rather minor impact. The impact of the concatenation mode is much larger.

When trained with the regression objective function, we observe that the pooling strategy has a large impact.

Which design choice matters most thus depends on the training objective.

7 Computational Efficiency

For improved computation of sentence embeddings, we implemented a smart batching strategy: Sentences with similar lengths are grouped together and are only padded to the longest element in a mini-batch. This drastically reduces computational overhead from padding tokens.

Strategy for reducing computational cost: sentences of similar length are grouped together and padded only to the longest sequence within each mini-batch; see the sketch below.
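
A minimal sketch of the idea, with tokenization simplified to whitespace splitting; the function name and padding token are illustrative assumptions.

```python
# Smart batching sketch: sort by length, slice consecutive mini-batches,
# and pad each batch only to its own longest member.
def smart_batches(sentences, batch_size, pad_token="[PAD]"):
    tokenized = [s.split() for s in sentences]
    order = sorted(range(len(tokenized)), key=lambda i: len(tokenized[i]))
    for start in range(0, len(order), batch_size):
        batch = [tokenized[i] for i in order[start:start + batch_size]]
        max_len = max(len(t) for t in batch)   # pad within this batch only
        yield [t + [pad_token] * (max_len - len(t)) for t in batch]
```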
