分布式id生成之雪花id

2024-04-03 22:22:16

前言
为什么需要分布式id，因为现在的系统都是集群、分布式的，依赖于数据库的自增id在量大的时候肯定扛不住。网上介绍分布式id的文章很多，现在都是倾向于应用雪花算法生成分布式id，可以一定程度上不依赖redis等中间件。但是看文章的时候发现很多文章有写错的地方，例如：workerId = Netutil.ipv4ToLong(NetUtil.getLocalhoststr());生成的机器id并没有应用，所以我决定写一篇总结性的文章进行答疑。

1.id生成需要注意什么：
特点说明
全局唯一不能出现重复的ID号,既然是唯一标识,这是最基本的要求
趋势递增在MySQL的innoDB引擎中使用的是聚集索引,由于多数RDBMS使用Btree的数据结构来存储索引数据,在主键的选择上面我们应该尽量使用有序的主键保证写入性能
单调递增保证下一个ID大于上一个ID,例如事务版本号、IM增量信息、排序等特殊需求
信息安全如果ID是连续的,恶意用户的扒取工作就非常容易做了,直接按照顺序下载指定URL即可，所以在一些应用场景下,需要ID无规则不规则,让竞争对手不好猜
含时间戳这样就能在开发中快速了解分布式id的生成时间
2.id生成的性能考核：
特点说明
高可用发一个获取分布式ID的请求,服务器就要保证99.999%的情况下给我创建一个唯一分布式ID
低延迟发一个获取分布式ID的请求,服务器就要快,极速
高QPS 假如并发一口气创建分布式ID请求同时杀过来,服务器要顶得住且一下子成功创建10万
3.通用方案的优势和劣势对比：
方案优点缺点备注
UUID 性能非常高:本地生成，没有网络消耗,无序，如果只考虑唯一性,OK 入数据库性能差 mysql的官方建议主键要尽量的越短越好
数据库自增主键适合单机不适合分布式
基于redis生成全局id策略提供原子类INCR和INCRBY操作递增，满足性能需求需要维护redis集群如果有集群的话可以采用
4.重点介绍雪花算法生成：
图片介绍其原理：图片名词解释：

名词解释备注
1位不用。二进制中最高位为1的都是负数，但是我们生成的id一般都使用整数，所以这个最高位固定是0 就是固定的0
41位用来记录时间戳（毫秒） 41位可以表示 2 41 − 1 2^{41}-1 241−1个数字,如果只用来表示正整数（计算机中正数包含0），可以表示的数值范围是：0 至 241 − 1 2{41}-1 241−1，减1是因为可表示的数值范围是从0开始算的，而不是1,也就是说41位可以表示 241 − 1 2{41}-1 241−1个毫秒的值，转化成单位年则是 ( 2 41 − 1 ) / ( 1000 ∗ 60 ∗ 60 ∗ 24 ∗ 365 ) = 69 (2^{41}-1) / (1000 * 60 * 60 * 24 * 365) = 69 (241−1)/(1000∗60∗60∗24∗365)=69年
10位用来记录工作机器id 可以部署在 2 10 = 1024 2^{10} = 1024 210=1024个节点，包括 5位datacenterId 和 5位workerId,5位（bit）可以表示的最大正整数是 2 5 − 1 = 31 2^{5}-1 = 31 25−1=31，即可以用0、1、2、3、…31这32个数字，来表示不同的datecenterId或workerId
12位序列号，用来记录同毫秒内产生的不同id 12位（bit）可以表示的最大正整数是 2 12 − 1 = 4095 2^{12}-1 = 4095 212−1=4095，即可以用0、1、2、3、…4094这4095个数字，来表示同一机器同一时间截（毫秒)内产生的4095个ID序号
说明：java中64bit刚好就是整数long类型，所以SnowFlake雪花算法中的id是long 类型存储。

这样的设计保证：1.生成的id按时间的趋势递增，保证有序，2.整个集群部署服务、分布式部署服务因为参数workerId和datacenterId的设计不会产生重复id 当然雪花id的使用的公认缺点是依赖机器时钟，如果机器时钟回拨，会导致重复ID生成在单机上是递增的。但是这种一般能够从运维角度避免掉。

其他大厂分布式id开源框架介绍：

厂商说明
美团（Leaf） Leaf由美团开发，github地址：https://github.com/Meituan-Dianping/Leaf Leaf同时支持号段模式和snowflake算法模式，可以切换使用.
滴滴（Tinyid） Tinyid由滴滴开发，Github地址：https://github.com/didi/tinyid。 Tinyid是基于号段模式原理实现的与Leaf如出一辙，每个服务获取一个号段（1000,2000]、（2000,3000]、（3000,4000]
百度（uid-generator） uid-generator是由百度技术部开发，项目GitHub地址 https://github.com/baidu/uid-generatoruid-generator是基于Snowflake算法实现的，与原始的snowflake算法不同在于，uid-generator支持自定义时间戳、工作机器ID和序列号等各部分的位数，而且uid-generator中采用用户自定义workId的生成策略。uid-generator需要与数据库配合使用，需要新增一个WORKER_NODE表。当应用启动时会向数据库表中去插入一条数据，插入成功后返回的自增ID就是该机器的workId数据由host，port组成。workId，占用了22个bit位，时间占用了28个bit位，序列化占用了13个bit位，需要注意的是，和原始的snowflake不太一样，时间的单位是秒，而不是毫秒，workId也不一样，而且同一应用每次重启就会消费一个workId。
错误举例：代码：

import cn.hutool.core.lang.Snowflake;import cn.hutool.core.util.Idutil;import lombok.extern.slf4j.Slf4j;import org.springframework.stereotype.Component;import javax.annotation.PostConstruct ;@Slf4j@Componentpublic class IdGeneratorSnowflake
private 1ong workerId = 0; private long datacenterId = 1; private Snowflake snowflake = Idutil.createsnowflake(workerId , datacenterId);
//依赖注入完成后执行该方法，进行一些初始化工作
@PostConstruct
public void init() { try{
workerId = Netutil.ipv4ToLong(NetUtil.getLocalhoststr());
log.info(“当前机器的workerId: {}”,workerId);
}catch (Exception e){
e.printStackTrace();
log.warn(“当前机器的workerId获取失败” ,e); //释放ID
workerId = NetUtil.getLocalhostStr() . hashCode();
}
} //使用默认机房号获取ID
public synchronized 1ong snowflakeId( ) { return snowflake.nextId( );
} //自己制定机房号获取ID
public synchronized 1ong snowflakeId(long workerId, 1ong datacenterId) {
Snowflake snowflake = Idutil.createSnowflake (workerId, datacenterId); return snowflake.nextId( );
} /**
生成的是不带-的字符审，类似于: 73a64edf935d4952a287739a66f96e06
* @return
/
public String simpleUUID() { return IdUtil.simpleUUID();
} /
*生成的UUID是带-的字符串，类似于: b12b6401-6f9c-4351-b2b6-d8afc9ab9272
* @return
*/
public String randomUUID() { return IdUtil. randomUUID();
}
public static void main(String[] args)
System. out . println( new IdGeneratorSnowflake().snowflakeId());
}
}
上面的代码在大部分文章中都能看到，看着也生成workerId了，但是其实是没用到的，所以根本不起作用。并且上面的ip转long虽然是数字的，但是雪花算法里面的workerId是限制在0到31的，所以的话上面的代码示例是错误的，估计误导了一大批。争取的做法：

private static Long getWorkId(){ try {
String hostAddress = Inet4Address.getLocalHost().getHostAddress(); int[] ints = StringUtils.toCodePoints(hostAddress); int sums = 0; for(int b : ints){
sums += b;
} return (long)(sums % 32);
} catch (UnknownHostException e) { // 如果获取失败，则使用随机数备用
return RandomUtils.nextLong(0,31);
}
}
private static Long getDataCenterId(){ int[] ints = StringUtils.toCodePoints(SystemUtils.getHostName()); int sums = 0; for (int i: ints) {
sums += i;
} return (long)(sums % 32);
}
上面两个方法用到了依赖：

org.apache.commons commons-lang3 3.8 最终完整的代码：

package com.my.blog.website.utils;
import org.apache.commons.lang3.RandomUtils;import org.apache.commons.lang3.StringUtils;import org.apache.commons.lang3.SystemUtils;
import java.net.Inet4Address;import java.net.UnknownHostException;
/**

Twitter_Snowflake
SnowFlake的结构如下(每部分用-分开):
0 - 0000000000 0000000000 0000000000 0000000000 0 - 00000 - 00000 - 000000000000
1位标识，由于long基本类型在Java中是带符号的，最高位是符号位，正数是0，负数是1，所以id一般是正数，最高位是0
41位时间截(毫秒级)，注意，41位时间截不是存储当前时间的时间截，而是存储时间截的差值（当前时间截 - 开始时间截)
得到的值），这里的的开始时间截，一般是我们的id生成器开始使用的时间，由我们程序来指定的（如下下面程序IdWorker类的startTime属性）。41位的时间截，可以使用69年，年T = (1L << 41) / (1000L * 60 * 60 * 24 * 365) = 69
10位的数据机器位，可以部署在1024个节点，包括5位datacenterId和5位workerId
12位序列，毫秒内的计数，12位的计数顺序号支持每个节点每毫秒(同一机器，同一时间截)产生4096个ID序号
加起来刚好64位，为一个Long型。
SnowFlake的优点是，整体上按照时间自增排序，并且整个分布式系统内不会产生ID碰撞(由数据中心ID和机器ID作区分)，并且效率较高，经测试，SnowFlake每秒能够产生26万ID左右。
*/public class SnowflakeIdWorker {
// Fields=============
/** 开始时间截 (2015-01-01) /
private final long twepoch = 1489111610226L;
/* 机器id所占的位数 /
private final long workerIdBits = 5L;
/* 数据标识id所占的位数 /
private final long dataCenterIdBits = 5L;
/* 支持的最大机器id，结果是31 (这个移位算法可以很快的计算出几位二进制数所能表示的最大十进制数) /
private final long maxWorkerId = -1L ^ (-1L << workerIdBits);
/* 支持的最大数据标识id，结果是31 /
private final long maxDataCenterId = -1L ^ (-1L << dataCenterIdBits);
/* 序列在id中占的位数 /
private final long sequenceBits = 12L;
/* 机器ID向左移12位 /
private final long workerIdShift = sequenceBits;
/* 数据标识id向左移17位(12+5) /
private final long dataCenterIdShift = sequenceBits + workerIdBits;
/* 时间截向左移22位(5+5+12) /
private final long timestampLeftShift = sequenceBits + workerIdBits + dataCenterIdBits;
/* 生成序列的掩码，这里为4095 (0b111111111111=0xfff=4095) /
private final long sequenceMask = -1L ^ (-1L << sequenceBits);
/* 工作机器ID(0~31) /
private long workerId;
/* 数据中心ID(0~31) /
private long dataCenterId;
/* 毫秒内序列(0~4095) /
private long sequence = 0L;
/* 上次生成ID的时间截 */
private long lastTimestamp = -1L;
private static SnowflakeIdWorker idWorker;
static {
idWorker = new SnowflakeIdWorker(getWorkId(),getDataCenterId());
}
//Constructors=======
/**
- 构造函数
- @param workerId 工作ID (0~31)
- @param dataCenterId 数据中心ID (0~31)
  */
  public SnowflakeIdWorker(long workerId, long dataCenterId) { if (workerId > maxWorkerId || workerId < 0) { throw new IllegalArgumentException(String.format(“workerId can’t be greater than %d or less than 0”, maxWorkerId));
  } if (dataCenterId > maxDataCenterId || dataCenterId < 0) { throw new IllegalArgumentException(String.format(“dataCenterId can’t be greater than %d or less than 0”, maxDataCenterId));
  } this.workerId = workerId; this.dataCenterId = dataCenterId;
  }
  // Methods============
  /**
- 获得下一个ID (该方法是线程安全的)
- @return SnowflakeId
  /
  public synchronized long nextId() { long timestamp = timeGen();
  //如果当前时间小于上一次ID生成的时间戳，说明系统时钟回退过这个时候应当抛出异常
  if (timestamp < lastTimestamp) { throw new RuntimeException( String.format(“Clock moved backwards. Refusing to generate id for %d milliseconds”, lastTimestamp - timestamp));
  }
  //如果是同一时间生成的，则进行毫秒内序列
  if (lastTimestamp == timestamp) {
  sequence = (sequence + 1) & sequenceMask; //毫秒内序列溢出
  if (sequence == 0) { //阻塞到下一个毫秒,获得新的时间戳
  timestamp = tilNextMillis(lastTimestamp);
  }
  } //时间戳改变，毫秒内序列重置
  else {
  sequence = 0L;
  }
  //上次生成ID的时间截
  lastTimestamp = timestamp;
  //移位并通过或运算拼到一起组成64位的ID
  return ((timestamp - twepoch) << timestampLeftShift)
  | (dataCenterId << dataCenterIdShift)
  | (workerId << workerIdShift)
  | sequence;
  }
  /*
- 阻塞到下一个毫秒，直到获得新的时间戳
- @param lastTimestamp 上次生成ID的时间截
- @return 当前时间戳
  /
  protected long tilNextMillis(long lastTimestamp) { long timestamp = timeGen(); while (timestamp <= lastTimestamp) {
  timestamp = timeGen();
  } return timestamp;
  }
  /*
- 返回以毫秒为单位的当前时间
- @return 当前时间(毫秒)
  */
  protected long timeGen() { return System.currentTimeMillis();
  }
  private static Long getWorkId(){ try { String hostAddress = Inet4Address.getLocalHost().getHostAddress(); int[] ints = StringUtils.toCodePoints(hostAddress); int sums = 0; for(int b : ints){
  sums += b;
  } return (long)(sums % 32);
  } catch (UnknownHostException e) { // 如果获取失败，则使用随机数备用
  return RandomUtils.nextLong(0,31);
  }
  }
  private static Long getDataCenterId(){ int[] ints = StringUtils.toCodePoints(SystemUtils.getHostName()); int sums = 0; for (int i: ints) {
  sums += i;
  } return (long)(sums % 32);
  }
/**
- 静态工具类
- @return
  */
  public static synchronized Long generateId(){ long id = idWorker.nextId(); return id;
  }
  //Test===============
  /** 测试 */
  public static void main(String[] args) {
  System.out.println(System.currentTimeMillis()); long startTime = System.nanoTime(); for (int i = 0; i < 50000; i++) { long id = SnowflakeIdWorker.generateId();
  System.out.println(id);
  }
  System.out.println((System.nanoTime()-startTime)/1000000+“ms”);
  }
  }
  当然这也是参考网上的代码，其中个人觉得还是不够完美。因为其中还是会存在小概率的重复workId和dataCenterId 的生成。

4.1最简单的使用分布式雪花id：
引入依赖：

cn.hutool hutool-captcha ${hutool.version}
使用代码：

@Component@Slf4jpublic class SnowflakeConfig { @JsonFormat(shape = JsonFormat.Shape.STRING)// private long workerId = getWorkId();//为终端ID
private long dataCenterId = 1;//数据中心ID
private Snowflake snowflake = IdUtil.createSnowflake(getWorkId(),dataCenterId); public synchronized long snowflakeId(){ return snowflake.nextId();
} public synchronized long snowflakeId(long workerId,long dataCenterId){
Snowflake snowflake = IdUtil.createSnowflake(workerId, dataCenterId); return snowflake.nextId();
} /**
* @Author gj
* @Description 获取机器id
* @Params
* @Return
* @Date 2021/5/26 13:52
/
private static Long getWorkId(){ try {
String hostAddress = Inet4Address.getLocalHost().getHostAddress(); int[] ints = StringUtils.toCodePoints(hostAddress); int sums = 0; for(int b : ints){
sums += b;
} return (long)(sums % 32);
} catch (UnknownHostException e) { // 如果获取失败，则使用随机数备用
return RandomUtils.nextLong(0,31);
}
} /*
* @Author gj
* @Description获取数据id
* @Params
* @Return
* @Date 2021/5/26 13:52
*/
private static Long getDataCenterId(){ int[] ints = StringUtils.toCodePoints(SystemUtils.getHostName()); int sums = 0; for (int i: ints) {
sums += i;
} return (long)(sums % 32);
}
}
单元测试进行注入调用即可：

@Autowired
SnowflakeConfig snowflakeConfig;

@Test    public  void contextLoads() {        for (int i = 0; i < 10; i++) {
        System.out.println(snowflakeConfig.snowflakeId());
    }
}

自己实现雪花算法代码：

/**

@program: simple_tools
@description: 雪花算法代码实现
@author: qiqi
/public class IdWorker { // 时间起始标记点，作为基准，一般取系统的最近时间（一旦确定不能变动）
private final static long twepoch = 1288834974657L; // 机器标识位数
private final static long workerIdBits = 5L; // 数据中心标识位数
private final static long datacenterIdBits = 5L; // 机器ID最大值
private final static long maxWorkerId = -1L ^ (-1L << workerIdBits); // 数据中心ID最大值
private final static long maxDatacenterId = -1L ^ (-1L << datacenterIdBits); // 毫秒内自增位
private final static long sequenceBits = 12L; // 机器ID偏左移12位
private final static long workerIdShift = sequenceBits; // 数据中心ID左移17位
private final static long datacenterIdShift = sequenceBits + workerIdBits; // 时间毫秒左移22位
private final static long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits; private final static long sequenceMask = -1L ^ (-1L << sequenceBits); / 上次生产id时间戳 /
private static long lastTimestamp = -1L; // 0，并发控制
private long sequence = 0L; private final long workerId; // 数据标识id部分
private final long datacenterId; public IdWorker(){ this.datacenterId = getDatacenterId(maxDatacenterId); this.workerId = getMaxWorkerId(datacenterId, maxWorkerId);
} /
- @param workerId
- ```
       工作机器ID
```
- @param datacenterId
- ```
       序列号
```
/
public IdWorker(long workerId, long datacenterId) { if (workerId > maxWorkerId || workerId < 0) { throw new IllegalArgumentException(String.format(“worker Id can’t be greater than %d or less than 0”, maxWorkerId));
} if (datacenterId > maxDatacenterId || datacenterId < 0) { throw new IllegalArgumentException(String.format(“datacenter Id can’t be greater than %d or less than 0”, maxDatacenterId));
} this.workerId = workerId; this.datacenterId = datacenterId;
} /*
- 获取下一个ID
- @return
  /
  public synchronized long nextId() { long timestamp = timeGen(); if (timestamp < lastTimestamp) { throw new RuntimeException(String.format(“Clock moved backwards. Refusing to generate id for %d milliseconds”, lastTimestamp - timestamp));
  } if (lastTimestamp == timestamp) { // 当前毫秒内，则+1
  sequence = (sequence + 1) & sequenceMask; if (sequence == 0) { // 当前毫秒内计数满了，则等待下一秒
  timestamp = tilNextMillis(lastTimestamp);
  }
  } else {
  sequence = 0L;
  }
  lastTimestamp = timestamp; // ID偏移组合生成最终的ID，并返回ID
  long nextId = ((timestamp - twepoch) << timestampLeftShift)
  | (datacenterId << datacenterIdShift)
  | (workerId << workerIdShift) | sequence; return nextId;
  } private long tilNextMillis(final long lastTimestamp) { long timestamp = this.timeGen(); while (timestamp <= lastTimestamp) {
  timestamp = this.timeGen();
  } return timestamp;
  } private long timeGen() { return System.currentTimeMillis();
  } /*
- 获取 maxWorkerId
/
protected static long getMaxWorkerId(long datacenterId, long maxWorkerId) {
StringBuffer mpid = new StringBuffer();
mpid.append(datacenterId);
String name = ManagementFactory.getRuntimeMXBean().getName(); if (!name.isEmpty()) { /
* GET jvmPid
/
mpid.append(name.split("@")[0]);
} /
* MAC + PID 的 hashcode 获取16个低位
/
return (mpid.toString().hashCode() & 0xffff) % (maxWorkerId + 1);
} /*
- 数据标识id部分
*/
protected static long getDatacenterId(long maxDatacenterId) { long id = 0L; try {
InetAddress ip = InetAddress.getLocalHost();
NetworkInterface network = NetworkInterface.getByInetAddress(ip); if (network == null) {
id = 1L;
} else { byte[] mac = network.getHardwareAddress();
id = ((0x000000FF & (long) mac[mac.length - 1])
| (0x0000FF00 & (((long) mac[mac.length - 2]) << 8))) >> 6;
id = id % (maxDatacenterId + 1);
}
} catch (Exception e) {
System.out.println(" getDatacenterId: " + e.getMessage());
} return id;
}
}
结尾
上面介绍的是通过获取机器的地址然后进行特殊的处理获取的机器id进行初始化雪花发号器的，这个重复的概率已经极低，满足大大部分的业务场景，如果仍然追求设计完美的话，只能依赖redis自增生成机器id或者数据库生成机器id的方式。上面就是我关于分布式id雪花相关的总结，欢迎转载、留言、打赏。

码农公寓

相关文章