Some features I have been implementing recently require hot updates of Elasticsearch's extension words and stop words, so that search can become more precise. This post records the procedure:
ES version: 7.11.2
IK analyzer version: 7.11.2
Method 1: hot-load extension words and stop words from a remote source
Analyzer download link: https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v7.11.2
Unzip the downloaded analyzer archive into the ~/elasticsearch/plugins/ik directory; if the directory does not exist yet, create it with mkdir -p plugins/ik
[content@localhost ik]$ pwd
/home/content/elasticsearch-7.11.2/plugins/ik
[content@localhost ik]$ unzip elasticsearch-analysis-ik-7.11.2.zip
[content@localhost ik]$ rm -rf elasticsearch-analysis-ik-7.11.2.zip
Edit the IK analyzer configuration file at ~/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml
[content@localhost ik]$ cd config/
[content@localhost config]$ vim IKAnalyzer.cfg.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer extension configuration</comment>
	<!-- Local extension dictionaries, configured by the user -->
	<entry key="ext_dict"></entry>
	<!-- Local extension stop-word dictionaries, configured by the user -->
	<entry key="ext_stopwords"></entry>
	<!-- Remote extension dictionary -->
	<entry key="remote_ext_dict">http://ip:port/ik/keyWord.txt</entry>
	<!-- Remote extension stop-word dictionary -->
	<entry key="remote_ext_stopwords">http://ip:port/ik/stopWord.txt</entry>
</properties>
Create an html test file in the nginx directory:
cd ~/nginx/html
vim index.html    # file content: hello world!
Open the nginx service in a browser; if the page prints hello world!, the nginx service itself is working.
Create the IK dictionary files:
cd ~/nginx/html
mkdir ik
cd ik
vim keyWord.txt
vim stopWord.txt
Put some words into the files (one word per line, saved as UTF-8), save them, then open http://<IP>:<port>/ik/keyWord.txt to verify the contents are served.
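One operational detail worth checking here: IK's monitor thread polls the remote URL periodically and only reloads the dictionary when the response's Last-Modified or ETag header changes, so the endpoint must return at least one of them (nginx does this for static files). The stand-alone sketch below, an illustration rather than plugin code, spins up a throwaway JDK HttpServer in place of nginx and reads the header the way a poll would:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class RemoteDictProbe {
    public static void main(String[] args) throws Exception {
        // Throwaway stand-in for nginx: serve a dictionary file and, like nginx
        // does for static files, include a Last-Modified header.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/ik/keyWord.txt", exchange -> {
            byte[] body = "hotword1\nhotword2\n".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Last-Modified", "Wed, 01 Jan 2020 00:00:00 GMT");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        int port = server.getAddress().getPort();

        // Read the header the same way a poll would: if Last-Modified and ETag
        // are both absent, IK never detects a change and hot reload never fires.
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://127.0.0.1:" + port + "/ik/keyWord.txt").openConnection();
        System.out.println("Last-Modified: " + conn.getHeaderField("Last-Modified"));
        conn.disconnect();
        server.stop(0);
    }
}
```

If `curl -I` against your real nginx URL shows neither header, the file will be served but hot reload will never trigger.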
Restart the elasticsearch service:
ps -ef | grep elastic
kill -9 [ pid ]
sh elasticsearch -d
Method 2: hot-load extension words and stop words from a MySQL database
Download the IK analyzer source code from https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v7.11.2, unzip it, and open it in IDEA.
Create the hot_words and stop_words tables in the MySQL database:
SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;
-- ----------------------------
-- Table structure for hot_words
-- ----------------------------
DROP TABLE IF EXISTS `hot_words`;
CREATE TABLE `hot_words` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`keyword` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`flag` int(255) NULL DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;
-- ----------------------------
-- Records of hot_words
-- ----------------------------
INSERT INTO `hot_words` VALUES (1, '奥巴马', 0);
INSERT INTO `hot_words` VALUES (2, '悟空哥', 0);
SET FOREIGN_KEY_CHECKS = 1;
SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;
-- ----------------------------
-- Table structure for stop_words
-- ----------------------------
DROP TABLE IF EXISTS `stop_words`;
CREATE TABLE `stop_words` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`stopword` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`flag` int(255) NULL DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 4 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;
-- ----------------------------
-- Records of stop_words
-- ----------------------------
INSERT INTO `stop_words` VALUES (1, 'keyword', 0);
INSERT INTO `stop_words` VALUES (2, 'stopword', 0);
INSERT INTO `stop_words` VALUES (3, 'fielddata', 0);
SET FOREIGN_KEY_CHECKS = 1;
Add a jdbc.yml file under the config directory:
jdbc:
  url: jdbc:mysql://172.16.***.***:3306/content?useUnicode=true&autoReconnect=true&failOverReadOnly=false&characterEncoding=utf8&useSSL=false&serverTimezone=UTC
  user: root
  password: XXXX
  keywordSql: SELECT keyword FROM hot_words WHERE flag=0
  stopWordSql: SELECT stopword as stopWord FROM stop_words WHERE flag=0
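Note that despite the .yml name, the plugin code below reads this file with java.util.Properties (prop.load), not a YAML parser. It works because "key: value" is also legal Properties syntax and leading indentation is ignored, so the flat keys url, user, password, keywordSql and stopWordSql resolve directly. A quick stand-alone check of that parsing behavior:

```java
import java.io.StringReader;
import java.util.Properties;

public class JdbcYmlProbe {
    public static void main(String[] args) throws Exception {
        // Same shape as jdbc.yml above (values shortened for the demo).
        String yml =
                "jdbc:\n"
                + "  url: jdbc:mysql://127.0.0.1:3306/content?useSSL=false\n"
                + "  user: root\n"
                + "  password: secret\n"
                + "  keywordSql: SELECT keyword FROM hot_words WHERE flag=0\n";
        Properties prop = new Properties();
        prop.load(new StringReader(yml));
        // Properties stops the key at the first ':' and keeps the rest of the
        // line as the value, so the jdbc: URL with its embedded colons survives.
        System.out.println(prop.getProperty("url"));
        System.out.println(prop.getProperty("user"));
        System.out.println(prop.getProperty("keywordSql"));
    }
}
```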
Add the MySQL connector dependency to the pom file:
<!-- https://mvnrepository.com/artifact/mysql/mysql-connector-java -->
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>8.0.21</version>
</dependency>
Modify the src\main\assemblies\plugin.xml file and add the entry below, so that the MySQL driver jar is packed into the zip archive when Maven builds the plugin:
<include>mysql:mysql-connector-java</include>
Find the org.wltea.analyzer.dic.Dictionary class and add the following code:
static {
    try {
        // Load the MySQL driver into memory. With Connector/J 8.x the class is
        // com.mysql.cj.jdbc.Driver (com.mysql.jdbc.Driver is a deprecated alias).
        Class.forName("com.mysql.cj.jdbc.Driver");
    } catch (Exception e) {
        e.printStackTrace();
    }
}
/**
 * Hot-load the extension dictionary from MySQL.
 */
private void loadExpandKeyWordSql() {
    Connection conn = null;
    Statement stmt = null;
    ResultSet rs = null;
    InputStream inputStream = null;
    try {
        Properties prop = new Properties();
        inputStream = new FileInputStream(PathUtils.get(getDictRoot(), "jdbc.yml").toFile());
        prop.load(inputStream);
        conn = DriverManager.getConnection(
                prop.getProperty("url"),
                prop.getProperty("user"),
                prop.getProperty("password"));
        stmt = conn.createStatement();
        rs = stmt.executeQuery(prop.getProperty("keywordSql"));
        while (rs.next()) {
            String keyword = rs.getString("keyword");
            _MainDict.fillSegment(keyword.trim().toCharArray());
        }
        logger.info("hot-loaded keywords from MySQL");
    } catch (Exception e) {
        logger.error("error", e);
    } finally {
        if (inputStream != null) {
            try {
                inputStream.close();
            } catch (IOException e) {
                logger.error("error", e);
            }
        }
        if (rs != null) {
            try {
                rs.close();
            } catch (SQLException e) {
                logger.error("error", e);
            }
        }
        if (stmt != null) {
            try {
                stmt.close();
            } catch (SQLException e) {
                logger.error("error", e);
            }
        }
        if (conn != null) {
            try {
                conn.close();
            } catch (SQLException e) {
                logger.error("error", e);
            }
        }
    }
}
/**
 * Hot-load the stop-word dictionary from MySQL.
 */
private void loadExpandStopWordSql() {
    Connection conn = null;
    Statement stmt = null;
    ResultSet rs = null;
    InputStream inputStream = null;
    try {
        Properties prop = new Properties();
        inputStream = new FileInputStream(PathUtils.get(getDictRoot(), "jdbc.yml").toFile());
        prop.load(inputStream);
        conn = DriverManager.getConnection(
                prop.getProperty("url"),
                prop.getProperty("user"),
                prop.getProperty("password"));
        stmt = conn.createStatement();
        rs = stmt.executeQuery(prop.getProperty("stopWordSql"));
        while (rs.next()) {
            String stopWord = rs.getString("stopWord");
            _StopWords.fillSegment(stopWord.trim().toCharArray());
        }
        logger.info("hot-loaded stop words from MySQL");
    } catch (Exception e) {
        logger.error("error", e);
    } finally {
        if (inputStream != null) {
            try {
                inputStream.close();
            } catch (IOException e) {
                logger.error("error", e);
            }
        }
        if (rs != null) {
            try {
                rs.close();
            } catch (SQLException e) {
                logger.error("error", e);
            }
        }
        if (stmt != null) {
            try {
                stmt.close();
            } catch (SQLException e) {
                logger.error("error", e);
            }
        }
        if (conn != null) {
            try {
                conn.close();
            } catch (SQLException e) {
                logger.error("error", e);
            }
        }
    }
}
Find the initial() method of the org.wltea.analyzer.dic.Dictionary class and add the following code:
/**
 * Dictionary initialization. IK Analyzer loads its dictionaries through static
 * methods of the Dictionary class, so loading only starts when the class is
 * actually used for the first time, which lengthens the first analysis request.
 * This method lets the application initialize the dictionaries during startup instead.
 */
public static synchronized void initial(Configuration cfg) {
    if (singleton == null) {
        synchronized (Dictionary.class) {
            if (singleton == null) {
                singleton = new Dictionary(cfg);
                singleton.loadMainDict();
                singleton.loadSurnameDict();
                singleton.loadQuantifierDict();
                singleton.loadSuffixDict();
                singleton.loadPrepDict();
                singleton.loadStopWordDict();
                if (cfg.isEnableRemoteDict()) {
                    // Start the monitor threads: 10 s initial delay, then one run every 60 s
                    for (String location : singleton.getRemoteExtDictionarys()) {
                        pool.scheduleAtFixedRate(new Monitor(location), 10, 60, TimeUnit.SECONDS);
                    }
                    for (String location : singleton.getRemoteExtStopWordDictionarys()) {
                        pool.scheduleAtFixedRate(new Monitor(location), 10, 60, TimeUnit.SECONDS);
                    }
                    // Hot-load extension keywords from MySQL
                    pool.scheduleAtFixedRate(() -> Dictionary.getSingleton().loadExpandKeyWordSql(), 10, 120, TimeUnit.SECONDS);
                    // Hot-load stop words from MySQL
                    pool.scheduleAtFixedRate(() -> Dictionary.getSingleton().loadExpandStopWordSql(), 10, 120, TimeUnit.SECONDS);
                }
            }
        }
    }
}
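The pool in initial() is a ScheduledExecutorService: scheduleAtFixedRate(task, 10, 120, TimeUnit.SECONDS) runs the task 10 seconds after startup and every 120 seconds thereafter, and an uncaught exception cancels all subsequent runs — which is why the two load methods catch Exception broadly instead of letting errors escape. A small stand-alone demonstration of the scheduling shape, scaled down to milliseconds:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SchedulerProbe {
    public static void main(String[] args) throws Exception {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
        CountDownLatch ran = new CountDownLatch(3);
        // Same shape as the dictionary reload scheduling, with (10 s delay,
        // 120 s period) scaled down to (10 ms, 50 ms) for the demo.
        pool.scheduleAtFixedRate(ran::countDown, 10, 50, TimeUnit.MILLISECONDS);
        boolean ok = ran.await(2, TimeUnit.SECONDS);
        pool.shutdownNow();
        System.out.println("task ran at least 3 times: " + ok);
    }
}
```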
Find the plugin-security.policy file and grant the permissions the JDBC connection needs:
grant {
    // fill in the IP address and port of the MySQL database
    permission java.net.SocketPermission "127.0.0.1:3306", "connect,resolve";
    permission java.lang.RuntimePermission "setContextClassLoader";
};
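Background on why this grant matters: Elasticsearch runs plugins under the Java SecurityManager, so the driver's socket connect is checked against the plugin's policy. When the grant alone is not picked up, a common plugin pattern is to wrap the call in AccessController.doPrivileged so the permission check stops at the plugin's own protection domain. The helper below is a hypothetical sketch, not code from the IK source; the main method only demonstrates the wrapper, since no MySQL server is assumed:

```java
import java.security.AccessController;
import java.security.PrivilegedExceptionAction;
import java.sql.Connection;
import java.sql.DriverManager;

public class PrivilegedJdbc {
    // Hypothetical helper: run the JDBC connect inside doPrivileged so the
    // SocketPermission granted to this plugin's code base is the one that applies.
    public static Connection connect(String url, String user, String password) throws Exception {
        return AccessController.doPrivileged(
                (PrivilegedExceptionAction<Connection>) () ->
                        DriverManager.getConnection(url, user, password));
    }

    public static void main(String[] args) throws Exception {
        // Without a SecurityManager installed this is a plain pass-through;
        // under Elasticsearch it is what makes the policy grant effective.
        String vm = AccessController.doPrivileged(
                (PrivilegedExceptionAction<String>) () -> System.getProperty("java.vm.name"));
        System.out.println("privileged call ok: " + (vm != null));
    }
}
```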
Open the pom.xml file and correct the analyzer version and name: the analyzer I downloaded is 7.11.2, but the code still declares version 7.4.2, so update it here.
Adjust the org.elasticsearch.plugin.analysis.ik.AnalysisIkPlugin class accordingly.
Package with mvn clean package -Dmaven.test.skip=true; command-line packaging kept failing for me, but packaging from the IDE worked.
After the build, a zip archive appears under target\releases; upload it to the server and unzip it into ~/elasticsearch/plugins/ik-expand:
[content@localhost ik-expand]$ pwd
/home/content/elasticsearch-7.11.2/plugins/ik-expand
[content@localhost ik-expand]$ unzip elasticsearch-analysis-ik-7.11.2.zip
[content@localhost ik-expand]$ rm -rf ./elasticsearch-analysis-ik-7.11.2.zip
[content@localhost ik-expand]$ ls
commons-codec-1.9.jar config httpclient-4.5.2.jar mysql-connector-java-8.0.21.jar plugin-security.policy
commons-logging-1.2.jar elasticsearch-analysis-ik-7.11.2.jar httpcore-4.4.4.jar plugin-descriptor.properties protobuf-java-3.11.4.jar
Restart ES and the configuration is complete.
If the startup error log looks like the following, the plugin-security.policy configuration above did not take effect.
In that case, create /home/es/policy.policy and write the same content into it as in plugin-security.policy:
grant {
// keep the IP and port consistent with the MySQL database
permission java.net.SocketPermission "******:3306", "connect,resolve";
permission java.lang.RuntimePermission "setContextClassLoader";
};
Then add this option to the jvm.options file in the elasticsearch config directory: -Djava.security.policy=/home/es/policy.policy
[content@localhost config]$ pwd
/home/content/elasticsearch-7.11.2/config
[content@localhost config]$ vim jvm.options
References:
https://blog.csdn.net/qq_39140300/article/details/110382612
https://blog.csdn.net/zq199419951001/article/details/89884461
Code:
https://gitee.com/gxd_feiyu/es_ik_expand/tree/master/elasticsearch-analysis-ik-7.11.2