(Learning Elasticsearch) Implementing a Lyrics Search Demo, Part 2: Setting up Spring Boot + Spring Data + Jest + Elasticsearch for full-text lyric search

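1. Overview

This article explains how to use Spring Boot to quickly set up a web application and combine Spring Data with Jest to implement full-text search against Alibaba Cloud Elasticsearch.
Main components:
Spring Boot Starter: quickly sets up a Spring MVC environment
Jest: a REST client for Elasticsearch
elasticsearch: full-text search
spring data elasticsearch: Spring Data integration
thymeleaf: web front-end template framework
jquery: JavaScript framework
bootstrap: front-end styling framework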

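2. Project Maven configuration

The project's Maven configuration follows. Pay particular attention to the component versions and to the comments.
Certain version combinations cause various exceptions; the pom below is one combination that has been tested and works.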

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.lewis</groupId>
    <artifactId>esweb</artifactId>
    <version>0.1</version>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <!-- Version 2.0+ is required; otherwise the following error occurs:
        Caused by: java.lang.NoSuchMethodError: org.elasticsearch.common.settings.Settings.settingsBuilder()Lorg/elasticsearch/common/settings/Settings$Builder;
        -->
        <version>2.0.0.M7</version>
    </parent>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <java.version>1.8</java.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <!--
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
        </dependency>
        -->

        <!-- Do not use version 5.3.3; a method of one class cannot be found -->
        <dependency>
            <groupId>io.searchbox</groupId>
            <artifactId>jest</artifactId>
            <version>5.3.2</version>
        </dependency>

        <!-- Version 5.0+ is required; otherwise the class org/elasticsearch/node/NodeValidationException cannot be found -->
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>5.3.3</version>
        </dependency>

        <dependency>
            <groupId>org.springframework.data</groupId>
            <artifactId>spring-data-elasticsearch</artifactId>
            <version>3.0.0.RELEASE</version>
        </dependency>

        <dependency>
            <groupId>com.github.vanroy</groupId>
            <artifactId>spring-boot-starter-data-jest</artifactId>
            <version>3.0.0.RELEASE</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-devtools</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-thymeleaf</artifactId>
        </dependency>

        <!--
        Not needed:
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>
        -->

        <!-- A jar that the Spring Boot Elasticsearch setup is missing; it must be added separately -->
        <dependency>
            <groupId>net.java.dev.jna</groupId>
            <artifactId>jna</artifactId>
            <version>4.5.1</version>
        </dependency>

        <!-- WebJars: manage the front-end JS frameworks in one place -->
        <dependency>
            <groupId>org.webjars</groupId>
            <artifactId>jquery</artifactId>
            <version>3.3.0</version>
        </dependency>
        <dependency>
            <groupId>org.webjars</groupId>
            <artifactId>bootstrap</artifactId>
            <version>4.0.0</version>
        </dependency>

        <!--When using Spring Boot version 1.3 or higher, it will automatically detect the webjars-locator library on the classpath and use it to automatically resolve the version of any WebJar assets for you. In order to enable this feature, you will need to add the webjars-locator library as a dependency of your application in the pom.xml file-->
        <dependency>
            <groupId>org.webjars</groupId>
            <artifactId>webjars-locator</artifactId>
            <version>0.30</version>
        </dependency>

    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <configuration>
                    <fork>true</fork>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

Once created, the project directory structure is as follows:
[Figure: project directory structure]

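3. Spring Starter configuration

  1. Start the application with @SpringBootApplication.
  2. ElasticsearchAutoConfiguration and ElasticsearchDataAutoConfiguration must be excluded; otherwise an exception occurs.
  3. The HighLightJestSearchResultMapper bean is explained further below. It works around Spring Data's lack of support for Elasticsearch highlighting; here it is only registered.
import org.leiws.esweb.repository.HighLightJestSearchResultMapper;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.autoconfigure.data.elasticsearch.ElasticsearchAutoConfiguration;
import org.springframework.boot.autoconfigure.data.elasticsearch.ElasticsearchDataAutoConfiguration;
import org.springframework.context.annotation.Bean;

// @SpringBootApplication already includes @EnableAutoConfiguration,
// so the exclusions from item 2 can be declared on it directly.
@SpringBootApplication(exclude = {ElasticsearchAutoConfiguration.class, ElasticsearchDataAutoConfiguration.class})
public class App {

    public static void main(String[] args) throws Exception {
        SpringApplication.run(App.class, args);
    }

    // Registers the custom result mapper that copies highlight data onto entities.
    @Bean
    public HighLightJestSearchResultMapper highLightJestSearchResultMapper(){
        return new HighLightJestSearchResultMapper();
    }

}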

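3. Entity configuration

a) The Song entity:

Annotating the class with @Document maps it to the corresponding index and type in Elasticsearch.
When an Elasticsearch response is mapped onto this class, only the _source part of each hit is mapped (highlight is handled separately, as described later). That is the part shown in the figure below:
[Figure: the _source part of an Elasticsearch search hit]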

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;

import java.util.List;
import java.util.Map;

@Document(indexName = "songs",type = "sample",shards = 1, replicas = 0, refreshInterval = "-1")
public class Song extends HighLightEntity{

    @Id
    private Long id;

    private String name;
    private String href;
    private String lyric;
    private String singer;
    private String album;

    public Song(Long id, String name, String href, String lyric, String singer, String album, Map<String, List<String>> highlight) {
       // omitted
    }

    public Song() {
    }
    // getters and setters omitted...
}

b) To work around the highlight limitation of Spring Data Elasticsearch, an abstract class, HighLightEntity, is added here. Other entities must extend it.

package org.leiws.esweb.entity;

import java.io.Serializable;
import java.util.List;
import java.util.Map;

public abstract class HighLightEntity implements Serializable{

    private Map<String, List<String>> highlight;

    public Map<String, List<String>> getHighlight() {
        return highlight;
    }

    public void setHighlight(Map<String, List<String>> highlight) {
        this.highlight = highlight;
    }
}
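
To make the shape of the highlight data concrete, here is a minimal sketch in plain Java with no Spring or Elasticsearch dependencies. It assumes that the highlight section of a hit maps a field name to a list of fragments, as the Map<String, List<String>> type above suggests. The class name HighlightDemo and the method joinFragments are hypothetical names used only for illustration; they are not part of the original project.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the original project.
final class HighlightDemo {

    // Joins every fragment of every highlighted field and appends "..."
    // after each fragment, similar to what the Thymeleaf template in
    // section 7 renders for each song.
    static String joinFragments(Map<String, List<String>> highlight) {
        if (highlight == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<String>> field : highlight.entrySet()) {
            for (String fragment : field.getValue()) {
                sb.append(fragment).append("...");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample data made up for this sketch.
        Map<String, List<String>> highlight = new LinkedHashMap<>();
        highlight.put("lyric", Arrays.asList("<em>moon</em> river", "the <em>moon</em>"));
        System.out.println(joinFragments(highlight));
    }
}
```

Because the fragments already contain the highlight tags, any view that displays them has to emit them unescaped, which is why the template uses th:utext.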

4. Repository configuration

package org.leiws.esweb.repository;
import org.leiws.esweb.entity.Song;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;
public interface SongRepository extends ElasticsearchRepository<Song,Long> {
}

5. Service configuration

a) Interface

package org.leiws.esweb.service;

import org.leiws.esweb.entity.Song;
import org.springframework.data.domain.Page;

import java.util.List;

/**
 * The interface Song service.
 */
public interface SongService {

    /**
     * Searches songs by lyric keywords.
     *
     * @param pNum     zero-based page number
     * @param pSize    page size
     * @param keywords the lyric text to search for
     * @return one page of matching songs
     */
    Page<Song> searchSong(Integer pNum, Integer pSize, String keywords);
}

b) Implementation class

This class implements the actual paging and query logic.

package org.leiws.esweb.service.impl;

import com.github.vanroy.springdata.jest.JestElasticsearchTemplate;
import org.apache.log4j.Logger;
import org.elasticsearch.common.lucene.search.function.FiltersFunctionScoreQuery;
import org.elasticsearch.index.query.MatchPhraseQueryBuilder;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.functionscore.FunctionScoreQueryBuilder;
import org.elasticsearch.index.query.functionscore.ScoreFunctionBuilders;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.leiws.esweb.entity.Song;
import org.leiws.esweb.repository.HighLightJestSearchResultMapper;
import org.leiws.esweb.repository.SongRepository;
import org.leiws.esweb.service.SongService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.data.elasticsearch.core.query.SearchQuery;
import org.springframework.stereotype.Service;
import static org.elasticsearch.index.query.QueryBuilders.functionScoreQuery;
import static org.elasticsearch.index.query.QueryBuilders.matchPhraseQuery;

import java.util.List;

@Service
public class SongServiceImpl implements SongService{
    private static final Logger LOGGER = Logger.getLogger(SongServiceImpl.class);
    /* Paging parameters */
    private final static Integer PAGE_SIZE = 12;          // number of results per page
    private final static Integer DEFAULT_PAGE_NUMBER = 0; // default current page number

    /* Search mode */
    private final static String SCORE_MODE_SUM = "sum"; // sum the weighted scores
    private final static Float  MIN_SCORE = 10.0F;      // unrelated documents score 1 by default, so the minimum score is set to 10

    @Autowired
    SongRepository songRepository;

    @Autowired
    JestElasticsearchTemplate jestElasticsearchTemplate;

    @Autowired
    HighLightJestSearchResultMapper jestSearchResultMapper;

    @Override
    public Page<Song> searchSong(Integer pNum, Integer pSize, String keywords) {
        // Validate the paging parameters
        if (pSize == null || pSize <= 0) {
            pSize = PAGE_SIZE;
        }

        if (pNum == null || pNum < DEFAULT_PAGE_NUMBER) {
            pNum = DEFAULT_PAGE_NUMBER;
        }

        LOGGER.info("\n searchSong: searchContent [" + keywords + "] \n ");
        // Build the search query
        SearchQuery searchQuery = getSongSearchQuery(pNum,pSize,keywords);
        LOGGER.info("\n searchSong: searchContent [" + keywords + "] \n DSL  = \n " + searchQuery.getQuery().toString());
//        Page<Song> songPage = songRepository.search(searchQuery);
        Page<Song> songPage = jestElasticsearchTemplate.queryForPage(searchQuery,Song.class,jestSearchResultMapper);
        return songPage;
    }
    /**
     * Builds the search query from the search text.
     *
     * Steps:
     *      - function score query
     *      - phrase match
     *      - set the minimum score
     *      - set the paging parameters
     *
     * @param pNum current page number
     * @param pSize page size
     * @param searchContent search text
     * @return the assembled search query
     */
    private SearchQuery getSongSearchQuery(Integer pNum, Integer pSize,String searchContent) {

        /* How this was written for Elasticsearch 2.4.6
        FunctionScoreQueryBuilder functionScoreQueryBuilder = QueryBuilders.functionScoreQuery()
                .add(QueryBuilders.boolQuery().should(QueryBuilders.matchQuery("lyric", searchContent)),
                        ScoreFunctionBuilders.weightFactorFunction(1000))
                .scoreMode(SCORE_MODE_SUM).setMinScore(MIN_SCORE);
        */


        FunctionScoreQueryBuilder.FilterFunctionBuilder[] functions = {
                new FunctionScoreQueryBuilder.FilterFunctionBuilder(
                        matchPhraseQuery("lyric", searchContent),
                        ScoreFunctionBuilders.weightFactorFunction(1000))
        };
        FunctionScoreQueryBuilder functionScoreQueryBuilder =
                functionScoreQuery(functions).scoreMode(FiltersFunctionScoreQuery.ScoreMode.SUM).setMinScore(MIN_SCORE);

        // Paging parameters
//        Pageable pageable = new PageRequest(pNum, pSize);
        Pageable pageable = PageRequest.of(pNum, pSize);

        // Highlighting
        HighlightBuilder.Field highlightField =  new HighlightBuilder.Field("lyric")
                .preTags(new String[]{"<font color='red'>", "<b>", "<em>"})
                .postTags(new String[]{"</font>", "</b>", "</em>"})
                .fragmentSize(15)
                .numOfFragments(5)

                // highlightQuery must be set explicitly; otherwise, when a FunctionScoreQuery is used, the highlight settings have no effect and the response contains no highlight element
                // From the official docs: Highlight matches for a query other than the search query. This is especially useful if you use a rescore query because those are not taken into account by highlighting by default.
                .highlightQuery(matchPhraseQuery("lyric", searchContent));

        return new NativeSearchQueryBuilder()
                .withPageable(pageable)
    //            .withSourceFilter(new FetchSourceFilter(new String[]{"name","singer","lyric"},new String[]{}))
                .withHighlightFields(highlightField)
                .withQuery(functionScoreQueryBuilder).build();
    }
}
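
The parameter checks at the top of searchSong can be isolated into a small value class, which makes the defaults easy to test. The sketch below is plain Java with no Spring dependency. PageParams and from() are hypothetical names introduced for illustration, and the offset calculation (zero-based page number times page size) is the usual from/size paging convention, stated here as an assumption and not taken from the original project.

```java
// Hypothetical helper mirroring the paging checks in searchSong.
final class PageParams {
    static final int DEFAULT_PAGE_SIZE = 12;
    static final int DEFAULT_PAGE_NUMBER = 0;

    final int pageNumber;
    final int pageSize;

    PageParams(Integer pNum, Integer pSize) {
        // Fall back to the defaults for missing or invalid values.
        this.pageSize = (pSize == null || pSize <= 0) ? DEFAULT_PAGE_SIZE : pSize;
        this.pageNumber = (pNum == null || pNum < DEFAULT_PAGE_NUMBER) ? DEFAULT_PAGE_NUMBER : pNum;
    }

    // Offset of the first hit on this page, assuming zero-based page numbers.
    int from() {
        return pageNumber * pageSize;
    }

    public static void main(String[] args) {
        PageParams defaults = new PageParams(null, null);
        System.out.println(defaults.pageNumber + " / " + defaults.pageSize);

        PageParams third = new PageParams(2, 10);
        System.out.println("offset of first hit: " + third.from());
    }
}
```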

c) Working around Spring Data Elasticsearch's lack of highlight support

A custom JestSearchResultMapper, implemented as follows, solves the problem of missing highlights.

package org.leiws.esweb.repository;

// imports omitted
public class HighLightJestSearchResultMapper extends DefaultJestResultsMapper {

    private EntityMapper entityMapper;
    private MappingContext<? extends ElasticsearchPersistentEntity<?>, ElasticsearchPersistentProperty> mappingContext;

    public HighLightJestSearchResultMapper() {
        this.entityMapper = new DefaultEntityMapper();
        this.mappingContext = new SimpleElasticsearchMappingContext();
    }

    public HighLightJestSearchResultMapper(MappingContext<? extends ElasticsearchPersistentEntity<?>, ElasticsearchPersistentProperty> mappingContext, EntityMapper entityMapper) {
        this.entityMapper = entityMapper;
        this.mappingContext = mappingContext;
    }

    public EntityMapper getEntityMapper() {
        return entityMapper;
    }

    public void setEntityMapper(EntityMapper entityMapper) {
        this.entityMapper = entityMapper;
    }

    @Override
    public <T> AggregatedPage<T> mapResults(SearchResult response, Class<T> clazz) {
        return mapResults(response, clazz, null);
    }

    @Override
    public <T> AggregatedPage<T> mapResults(SearchResult response, Class<T> clazz, List<AbstractAggregationBuilder> aggregations) {
        LinkedList<T> results = new LinkedList<>();
        for (SearchResult.Hit<JsonObject, Void> hit : response.getHits(JsonObject.class)) {
            if (hit != null) {
                T result = mapSource(hit.source, clazz);
                HighLightEntity highLightEntity = (HighLightEntity) result;
                highLightEntity.setHighlight(hit.highlight);
                results.add((T) highLightEntity);
            }
        }

        String scrollId = null;
        if (response instanceof ExtendedSearchResult) {
            scrollId = ((ExtendedSearchResult) response).getScrollId();
        }

        return new AggregatedPageImpl<>(results, response.getTotal(), response.getAggregations(), scrollId);
    }

    private  <T> T mapSource(JsonObject source, Class<T> clazz) {
        String sourceString = source.toString();
        T result = null;
        if (!StringUtils.isEmpty(sourceString)) {
            result = mapEntity(sourceString, clazz);
            setPersistentEntityId(result, source.get(JestResult.ES_METADATA_ID).getAsString(), clazz);
        } else {
            //TODO(Fields results) : Map Fields results
            //result = mapEntity(hit.getFields().values(), clazz);
        }
        return result;
    }

    private <T> T mapEntity(String source, Class<T> clazz) {
        if (isBlank(source)) {
            return null;
        }
        try {
            return entityMapper.mapToObject(source, clazz);
        } catch (IOException e) {
            throw new ElasticsearchException("failed to map source [ " + source + "] to class " + clazz.getSimpleName(), e);
        }
    }
    private <T> void setPersistentEntityId(Object entity, String id, Class<T> clazz) {

        ElasticsearchPersistentEntity<?> persistentEntity = mappingContext.getRequiredPersistentEntity(clazz);
        ElasticsearchPersistentProperty idProperty = persistentEntity.getIdProperty();

        // Only deal with text because ES generated Ids are strings !
        if (idProperty != null) {
            if (idProperty.getType().isAssignableFrom(String.class)) {
                persistentEntity.getPropertyAccessor(entity).setProperty(idProperty, id);
            }
        }
    }
}

Most of the code in the class above comes from DefaultJestResultsMapper.
The key modification is:

    @Override
    public <T> AggregatedPage<T> mapResults(SearchResult response, Class<T> clazz, List<AbstractAggregationBuilder> aggregations) {
        LinkedList<T> results = new LinkedList<>();
        for (SearchResult.Hit<JsonObject, Void> hit : response.getHits(JsonObject.class)) {
            if (hit != null) {
                T result = mapSource(hit.source, clazz);
                HighLightEntity highLightEntity = (HighLightEntity) result;
                highLightEntity.setHighlight(hit.highlight);
                results.add((T) highLightEntity);
            }
        }

        String scrollId = null;
        if (response instanceof ExtendedSearchResult) {
            scrollId = ((ExtendedSearchResult) response).getScrollId();
        }

        return new AggregatedPageImpl<>(results, response.getTotal(), response.getAggregations(), scrollId);
    }
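
The idea behind this modification can be shown without Jest or Spring Data. The following sketch is plain Java under stated assumptions: Hit, Song, and the nested HighLightEntity are simplified stand-ins for the real classes, and mapResults takes an arbitrary function in place of the entity mapper. Unlike the listing above, it skips hits whose source cannot be mapped and only attaches highlights to entities that actually extend the base class; that is a defensive variation, not the original behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Simplified, hypothetical stand-ins for the Jest and Spring Data types.
final class HighlightMappingSketch {

    static abstract class HighLightEntity {
        private Map<String, List<String>> highlight;
        Map<String, List<String>> getHighlight() { return highlight; }
        void setHighlight(Map<String, List<String>> highlight) { this.highlight = highlight; }
    }

    static final class Song extends HighLightEntity {
        final String name;
        Song(String name) { this.name = name; }
    }

    // One search hit: the raw source plus its highlight section.
    static final class Hit {
        final String source;
        final Map<String, List<String>> highlight;
        Hit(String source, Map<String, List<String>> highlight) {
            this.source = source;
            this.highlight = highlight;
        }
    }

    // Maps each hit's source to an entity, then copies the hit's highlight
    // section onto the entity when it extends HighLightEntity.
    static <T> List<T> mapResults(List<Hit> hits, Function<String, T> sourceMapper) {
        List<T> results = new ArrayList<>();
        for (Hit hit : hits) {
            if (hit == null) {
                continue;
            }
            T entity = sourceMapper.apply(hit.source);
            if (entity == null) {
                continue; // source could not be mapped
            }
            if (entity instanceof HighLightEntity) {
                ((HighLightEntity) entity).setHighlight(hit.highlight);
            }
            results.add(entity);
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, List<String>> hl =
                Collections.singletonMap("lyric", Arrays.asList("<em>moon</em> river"));
        List<Hit> hits = Arrays.asList(new Hit("Moon River", hl));
        List<Song> songs = mapResults(hits, Song::new);
        System.out.println(songs.get(0).name + " -> " + songs.get(0).getHighlight());
    }
}
```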

6. Controller

This part is fairly simple: an ordinary Spring controller.

@Controller
@RequestMapping(value = "/search")
public class SearchController {

    @Autowired
    SongService songService;

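    /**
     * Runs the lyric search and renders the result page.
     *
     * @param pNum     zero-based page number
     * @param pSize    page size (optional)
     * @param keywords the lyric text to search for
     * @param map      the model passed to the view
     * @return the name of the view to render
     */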
    @RequestMapping(method = RequestMethod.GET)
    public String songList(@RequestParam(value = "pNum") Integer pNum,
                           @RequestParam(value = "pSize", required = false) Integer pSize,
                           @RequestParam(value = "keywords") String keywords,ModelMap map){
       map.addAttribute("pageSong",songService.searchSong(pNum,pSize,keywords));
       return "songList";
    }
}
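
The controller expects pNum and keywords as request parameters, with pSize optional. As a rough illustration of what a matching request looks like, the sketch below builds the query string in plain Java. SearchUrlSketch and searchUrl are hypothetical names; only the /search path and the parameter names come from the controller above. URL-encoding the keywords matters here because lyric text will often contain spaces or non-ASCII characters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; only the path and parameter names come from the controller.
final class SearchUrlSketch {

    // Builds a relative URL for the search endpoint with URL-encoded keywords.
    static String searchUrl(int pNum, String keywords) {
        try {
            return "/search?pNum=" + pNum + "&keywords=" + URLEncoder.encode(keywords, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(searchUrl(0, "hello world"));
    }
}
```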

7. Front-end Thymeleaf template

Location: resources/templates/songList.html

<!DOCTYPE html>
<html xmlns:th="http://www.thymeleaf.org" lang="en">
<head>
    <meta charset="UTF-8"/>
    <title>Title</title>
    <link rel='stylesheet' href='/webjars/bootstrap/css/bootstrap.min.css'>
    <script src="/webjars/jquery/jquery.min.js"></script>
    <script src="/webjars/bootstrap/js/bootstrap.min.js"></script>
</head>
<body>
<form action="/search" class="px-5 py-3" >
    <div class="input-group">
        <input name="keywords" type="text" class="form-control" placeholder="Lyric search: enter some lyrics" aria-label="Lyric search: enter some lyrics" aria-describedby="basic-addon2">
        <div class="input-group-append">
            <button class="btn btn-outline-secondary" type="submit">Search</button>
        </div>
        <input type="hidden" name="pNum" value="0"/>
    </div>
</form>
<div class="alert alert-light" role="alert">
    Found <span th:text="${pageSong.totalElements}">0</span> results:
</div>
<ul class="list-group">
    <li th:each="song : ${pageSong.content}" class="list-group-item">
        <div class="row">
            <a th:href="${song.href}">
            <h4 scope="row" th:text="${song.name}" ></h4>
            </a>
            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
            <h6 scope="row" th:text="${song.singer}" class="align-bottom" ></h6>
        </div>
        <!--
            <td><a th:href="@{/users/update/{userId}(userId=${user.id})}" th:text="${user.name}"></a></td>
        -->
        <div class="row">
            <span th:each="highlight : ${song.highlight}">
                <span th:each="word : ${highlight.value}">
                    <span th:utext="${word}"></span>...
                </span>
            </span>
        </div>
    </li>
</ul>

<nav aria-label="..." class="">
    <ul class="pagination pagination-sm justify-content-center py-5">
        <li class="page-item ">
            <a class="page-link" href="#">
                <span aria-hidden="true">&laquo;</span>
                <span class="sr-only">Previous</span>
            </a>
        </li>
        <li class="page-item"><a class="page-link" href="#">1</a></li>
        <li class="page-item"><a class="page-link" href="#">2</a></li>
        <li class="page-item"><a class="page-link" href="#">3</a></li>
        <li class="page-item">
            <a class="page-link" href="#">
            <span aria-hidden="true">&raquo;</span>
            <span class="sr-only">Next</span>
            </a>
        </li>
    </ul>
</nav>
</body>
</html>
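
The pagination links in the template above are static placeholders. One possible way to compute which page numbers to show is sketched below in plain Java. This is not from the original project: PageWindowSketch and window are hypothetical names, and the inputs are assumed to come from the Page object's getNumber() and getTotalPages().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for computing the page links to display.
final class PageWindowSketch {

    // Returns up to `width` zero-based page numbers around `current`,
    // clamped to the range [0, totalPages - 1].
    static List<Integer> window(int current, int totalPages, int width) {
        List<Integer> pages = new ArrayList<>();
        if (totalPages <= 0 || width <= 0) {
            return pages;
        }
        int start = Math.max(0, current - width / 2);
        int end = Math.min(totalPages, start + width); // exclusive
        start = Math.max(0, end - width);              // shift left near the last page
        for (int p = start; p < end; p++) {
            pages.add(p);
        }
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(window(0, 10, 3));
        System.out.println(window(5, 10, 3));
        System.out.println(window(9, 10, 3));
    }
}
```

The resulting list could be added to the model and iterated with th:each to replace the hard-coded 1, 2, 3 links.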

8. Alibaba Cloud Elasticsearch connection settings

Configure the following in resources/application.properties:

spring.data.jest.uri=http://1xx.xxx.xxx.xxx:8080
spring.data.jest.username=username
spring.data.jest.password=password
spring.data.jest.maxTotalConnection=50
spring.data.jest.defaultMaxTotalConnectionPerRoute=50
spring.data.jest.readTimeout=5000
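
When checking the connection by hand, for example with a browser plugin or a command-line HTTP client, it can help to know what the credentials look like on the wire. The sketch below assumes the username and password are sent as standard HTTP Basic authentication; that is an assumption about the setup and is not stated in the original. BasicAuthSketch and basicAuthHeader are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; assumes the endpoint uses HTTP Basic authentication.
final class BasicAuthSketch {

    // Builds the value of the Authorization header for the given credentials.
    static String basicAuthHeader(String username, String password) {
        String credentials = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Placeholder credentials matching the properties above.
        System.out.println("Authorization: " + basicAuthHeader("username", "password"));
    }
}
```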

9. Miscellaneous

a) Thymeleaf hot-reload settings, convenient for testing

  1. Add the following to resources/application.properties:
spring.thymeleaf.cache=false
  2. Add the following to pom.xml:
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-devtools</artifactId>
            <optional>true</optional>
        </dependency>
<build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <configuration>
                    <fork>true</fork>
                </configuration>
            </plugin>
        </plugins>
    </build>

  3. You still need to recompile after each change before the modified Thymeleaf template takes effect, because Spring Boot runs from the target directory.

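b) Configure an nginx proxy on an ECS instance so that Alibaba Cloud Elasticsearch can be reached from your local machine over the public network, which is convenient for development

  1. Purchase an ECS instance.
  2. Install nginx on the ECS instance and configure the proxy. The server block looks like this: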
    server {
        listen       8080;
        #listen       [::]:80 default_server;
        server_name  {private IP of this host} {public IP of this host};
        #root         /usr/share/nginx/html;

        # Load configuration files for the default server block.
        #include /etc/nginx/default.d/*.conf;

        location / {
                        proxy_pass http://{Elasticsearch private IP}:9200;
        }
    }

10. Finally, the search result:

[Figure: screenshot of the search result page]

-- Pagination and other page features are not finished yet
