Protocol Buffers: Study Notes

2024-02-04 19:21:46


Defining messages:

The simplest example:

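// Contents of a.proto
syntax = "proto3";     // must be declared; without it the file is treated as proto2

message SearchRequest {     /* message name */
  // Format: type  field_name = field_number;
  // Field numbers 1-15 take a single byte to encode; 19000-19999 are reserved for the protobuf implementation.
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}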
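
Why field numbers 1-15 are the cheap ones can be checked with a few lines of Python. This is a minimal sketch of the wire-format tag encoding, not the protobuf library itself; the helper names `encode_varint` and `encode_tag` are made up for illustration.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A field's tag is the varint of (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


VARINT = 0  # wire type used by int32, bool, enum, ...

# Tag size in bytes for a few field numbers.
sizes = {n: len(encode_tag(n, VARINT)) for n in (1, 15, 16, 2047, 2048)}
print(sizes)  # {1: 1, 15: 1, 16: 2, 2047: 2, 2048: 3}
```

Because three bits of the tag hold the wire type, only four bits of the first byte are left for the field number, which is why 16 already needs a second byte.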

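Field rules: singular or repeated. singular is the default and is expressed by omitting the label (there is no singular keyword).

Reserved identifiers:
To avoid problems caused by reusing field numbers or field names during upgrades (including JSON serialization issues), use reserved to reserve field numbers and field names.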

message Foo {
  reserved 2, 15, 9 to 11;      // reserved field numbers
  reserved "foo", "bar";        // reserved field names
}

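Scalar value types:

A scalar message field can have one of the following types. The table shows the type specified in the .proto file and the corresponding type in the automatically generated class: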

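.proto Type	Notes	C++ Type	Java Type	Python Type	Go Type	Ruby Type	C# Type	PHP Type
double		double	double	float	float64	Float	double	float
float		float	float	float	float32	Float	float	float
int32	Uses variable-length encoding. Inefficient for negative numbers; if the field is likely to hold negative values, use sint32 instead.	int32	int	int	int32	Fixnum or Bignum (as required)	int	integer
int64	Uses variable-length encoding. Inefficient for negative numbers; if the field is likely to hold negative values, use sint64 instead.	int64	long	int/long	int64	Bignum	long	integer/string
uint32	Uses variable-length encoding.	uint32	int	int/long	uint32	Fixnum or Bignum (as required)	uint	integer
uint64	Uses variable-length encoding.	uint64	long	int/long	uint64	Bignum	ulong	integer/string
sint32	Uses variable-length encoding. Signed integer value. Encodes negative numbers much more efficiently than int32.	int32	int	int	int32	Fixnum or Bignum (as required)	int	integer
sint64	Uses variable-length encoding. Signed integer value. Encodes negative numbers much more efficiently than int64.	int64	long	int/long	int64	Bignum	long	integer/string
fixed32	Always four bytes. More efficient than uint32 if values are often greater than 2^28.	uint32	int	int	uint32	Fixnum or Bignum (as required)	uint	integer
fixed64	Always eight bytes. More efficient than uint64 if values are often greater than 2^56.	uint64	long	int/long	uint64	Bignum	ulong	integer/string
sfixed32	Always four bytes.	int32	int	int	int32	Fixnum or Bignum (as required)	int	integer
sfixed64	Always eight bytes.	int64	long	int/long	int64	Bignum	long	integer/string
bool		bool	boolean	bool	bool	TrueClass/FalseClass	bool	boolean
string	A string must always contain UTF-8 encoded or 7-bit ASCII text.	string	String	str/unicode	string	String (UTF-8)	string	string
bytes	May contain any arbitrary sequence of bytes.	string	ByteString	str (Python 2) / bytes (Python 3)	[]byte	String (ASCII-8BIT)	ByteString	string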
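
The size claims in the Notes column can be reproduced with a small Python sketch of the encodings. It is an illustration under the assumption that int32 values are sign-extended to 64 bits before varint encoding and that sint32 uses ZigZag; the function names are hypothetical, not part of any protobuf API.

```python
import struct


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_int32(value: int) -> bytes:
    # Negative int32 values are sign-extended to 64 bits on the wire.
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def zigzag32(value: int) -> int:
    # Maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... so small magnitudes stay small.
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


def encode_sint32(value: int) -> bytes:
    return encode_varint(zigzag32(value))


def encode_fixed32(value: int) -> bytes:
    return struct.pack("<I", value)  # always 4 bytes, little-endian


print(len(encode_int32(-1)))        # 10
print(len(encode_sint32(-1)))       # 1
print(len(encode_varint(2**28 - 1)))  # 4
print(len(encode_varint(2**28)))    # 5
print(len(encode_fixed32(2**28)))   # 4
```

A varint carries 7 payload bits per byte, so from 2^28 upward a uint32 needs five bytes while fixed32 still needs four.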

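Default values: as in Go, each field defaults to the "zero value" of its type: the empty string for string, empty bytes for bytes, false for bool, 0 for numeric types, and the first defined value (which must be 0) for enums.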
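
One consequence of zero-value defaults is that a proto3 field set to its default is not written to the wire at all. The sketch below hand-encodes the SearchRequest message from the first example to show this; `encode_search_request` is a hypothetical helper written for this note, not generated code.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_search_request(query: str = "", page_number: int = 0,
                          result_per_page: int = 0) -> bytes:
    """Hand-rolled encoder that skips fields holding their default value."""
    out = bytearray()
    if query:  # field 1, wire type 2 (length-delimited)
        data = query.encode("utf-8")
        out += encode_varint((1 << 3) | 2) + encode_varint(len(data)) + data
    if page_number:  # field 2, wire type 0 (varint)
        out += encode_varint((2 << 3) | 0)
        out += encode_varint(page_number & 0xFFFFFFFFFFFFFFFF)
    if result_per_page:  # field 3, wire type 0 (varint)
        out += encode_varint((3 << 3) | 0)
        out += encode_varint(result_per_page & 0xFFFFFFFFFFFFFFFF)
    return bytes(out)


print(encode_search_request())                             # b''
print(encode_search_request(page_number=0))                # b''
print(encode_search_request(query="go", page_number=1))    # b'\n\x02go\x10\x01'
```

The receiver therefore cannot tell "page_number was set to 0" apart from "page_number was never set"; both decode to 0.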

Enum example:

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
  enum Corpus {
    UNIVERSAL = 0;
    WEB = 1;
    IMAGES = 2;
    LOCAL = 3;
    NEWS = 4;
    PRODUCTS = 5;
    VIDEO = 6;
  }
  Corpus corpus = 4;  // type is the Corpus enum, field name is corpus, and 4 is the field number (not an enum value)
}

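// Whether several enum constants may share the same value.
// Value names carry a prefix because enum values live in the scope that
// encloses the enum, so two sibling enums cannot both define UNKNOWN.
enum EnumAllowingAlias {
  option allow_alias = true;  // aliases are only allowed with this option
  EAA_UNKNOWN = 0;
  EAA_STARTED = 1;
  EAA_RUNNING = 1;  // same value as EAA_STARTED
}

enum EnumNotAllowingAlias {
  ENAA_UNKNOWN = 0;
  ENAA_STARTED = 1;
  // ENAA_RUNNING = 1;  // Uncommenting this line will cause a compile error inside Google and a warning message outside.
}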

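Enum values use varint encoding on the wire, which is inefficient for negative numbers, so negative enum values are not recommended.

Using other message types as field types: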

message SearchResponse {
  repeated Result results = 1;  // uses the Result message below as its type
}

message Result {
  string url = 1;
  string title = 2;
  repeated string snippets = 3;
}

You can define message B inside message A and use it within A, or reuse the definition of B from an outside message C:

message SearchResponse {
  message Result {     // Result is defined inside SearchResponse
    string url = 1;
    string title = 2;
    repeated string snippets = 3;
  }
  repeated Result results = 1;    // uses the nested Result definition as the field type
}

// Reuse from outside
message SomeOtherMessage {
  SearchResponse.Result result = 1;  // the parent type must be named
}

Importing (reusing definitions from other .proto files):

import "myproject/other_protos.proto";

// There is also import public:
// new.proto
// All definitions are moved here

// old.proto
// This is the proto that all clients are importing.
// old.proto acts as a forwarding file: importing it also imports new.proto,
// and the dependency is passed on transitively, much like a Unix symlink.
import public "new.proto";
import "other.proto";

// client.proto
import "old.proto";
// You use definitions from old.proto and new.proto, *** but not other.proto ***

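Upgrade advice:

Do not change the field number of any existing field.
If you add new fields, messages serialized by code using the old format can still be parsed by your newly generated code. Keep the default values of these fields in mind so that new code can interact properly with messages produced by old code. Likewise, messages produced by new code can be parsed by old code: old binaries simply ignore the new fields when parsing. Note that early proto3 releases discarded unknown fields during parsing, so a message relayed through old code lost the new fields (unlike proto2, which keeps unknown fields and serializes them along with the message). Since protobuf 3.5, proto3 also preserves unknown fields and includes them in the serialized output.
Fields can be removed as long as their field number is not used again in the updated message type (the original advice speaks of non-required fields; proto3 has no required label). A better approach may be to rename the field, for example by adding an "OBSOLETE_" prefix, or to make the field number reserved, so that future users of the .proto file cannot accidentally reuse it.
int32, uint32, int64, uint64, and bool are all compatible: a field can be changed from one of these types to another without breaking forward or backward compatibility. If a parsed number does not fit the corresponding type, the effect is the same as a cast in C++ (for example, a 64-bit number read as an int32 is truncated to 32 bits).
sint32 and sint64 are compatible with each other but not with the other integer types.
string and bytes are compatible as long as the bytes are valid UTF-8.
Embedded messages are compatible with bytes as long as the bytes contain an encoded version of the message.
fixed32 is compatible with sfixed32, and fixed64 with sfixed64.
Enums are wire-compatible with int32, uint32, int64, and uint64 (values that do not fit are truncated). However, client code may treat them differently after deserialization: for example, unrecognized proto3 enum values are preserved in the message, but how they are represented depends on the language. Int fields always just preserve their value.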
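
Two of these rules, skipping fields an old reader does not know and truncating a 64-bit value read as int32, can be sketched with a tiny hand-written wire-format reader. This is an illustration only: `parse` and `to_int32` are hypothetical helpers, and the reader assumes all known fields are varint-typed.

```python
def decode_varint(buf: bytes, pos: int):
    """Return (value, next_position) for the varint starting at pos."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def to_int32(value: int) -> int:
    """Mimic a C++ cast to int32: keep the low 32 bits, reinterpret the sign."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def parse(buf: bytes, known_int32_fields: set):
    """Split a message into known int32 fields and raw unknown fields."""
    known, unknown = {}, []
    pos = 0
    while pos < len(buf):
        start = pos
        tag, pos = decode_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint
            value, pos = decode_varint(buf, pos)
        elif wire_type == 1:  # 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if field_number in known_int32_fields and wire_type == 0:
            known[field_number] = to_int32(value)
        else:
            unknown.append(buf[start:pos])  # kept so it can be re-serialized
    return known, unknown


# A newer writer sent field 2 = 1 plus a new string field 5 = "x".
message = b"\x10\x01" + b"\x2a\x01x"
print(parse(message, {2, 3}))  # ({2: 1}, [b'*\x01x'])
print(to_int32(2**32 + 5))     # 5
print(to_int32(0xFFFFFFFF))    # -1
```

The wire type in each tag is what lets the old reader step over field 5 without knowing its schema.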

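For the Any and oneof types, see the official documentation.

map:
Syntax:

map<key_type, value_type> map_field = N;  // key_type: any integral or string type (that is, any scalar type except the floating-point types and bytes). value_type: any type except another map. A map field cannot be repeated.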

map<string, Project> projects = 3;  // Project is a message type; the map key is a string
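
On the wire, a map field is encoded like a repeated field of small entry messages whose key is field 1 and whose value is field 2. The Python sketch below encodes a hypothetical `map<string, int32> counts = 3;` (simpler than the Project example above, and not taken from the original) to make that layout visible; the helper names are invented for illustration.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_map_entry(field_number: int, key: str, value: int) -> bytes:
    """Encode one entry as a nested message: key = field 1, value = field 2."""
    key_bytes = key.encode("utf-8")
    entry = (
        encode_varint((1 << 3) | 2) + encode_varint(len(key_bytes)) + key_bytes
        + encode_varint((2 << 3) | 0) + encode_varint(value & 0xFFFFFFFFFFFFFFFF)
    )
    # The entry itself is a length-delimited occurrence of the map field.
    return encode_varint((field_number << 3) | 2) + encode_varint(len(entry)) + entry


def encode_map(field_number: int, mapping: dict) -> bytes:
    """Encode every entry as a repeated occurrence of the same field."""
    return b"".join(
        encode_map_entry(field_number, k, v) for k, v in mapping.items()
    )


print(encode_map(3, {"a": 1}))  # b'\x1a\x05\n\x01a\x10\x01'
```

This layout is also why a map field cannot itself be repeated: it already is a repeated field underneath.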

Packages:

You can add an optional package declaration to a .proto file to prevent name clashes between message types. For example:

package foo.bar;
message Open { ... }

// In another package:
message Foo {
  ...
  foo.bar.Open open = 1;  // the required label was dropped: proto3 does not allow it
  ...
}

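How a package declaration affects the generated code depends on the target language (see the official documentation for other languages). In Go, the package is used as the Go package name unless you explicitly provide a go_package option in the .proto file (recent protoc-gen-go releases expect go_package to be set explicitly).

Defining services:

Example: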
service SearchService {
  rpc Search (SearchRequest) returns (SearchResponse);  // takes a search request and returns a search response
}

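JSON options:

1- Emit fields with default values (by default they are omitted from the JSON output)

2- Ignore unknown fields (by default the parser rejects them)

3- Use the proto field names instead of lowerCamelCase names

4- Emit enum values as integers instead of strings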
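
The effect of these four options can be mimicked with the standard json module. The sketch below is a toy model of the proto3 JSON mapping for the SearchRequest example, assuming a plain dict stands in for the message; `to_json`, `from_json`, and their parameter names are hypothetical and are not the protobuf library's API.

```python
import json

# Enum fields and their value names (taken from the Corpus example).
ENUM_FIELDS = {"corpus": {0: "UNIVERSAL", 1: "WEB", 2: "IMAGES"}}


def snake_to_lower_camel(name: str) -> str:
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def to_json(fields: dict, emit_defaults: bool = False,
            keep_proto_names: bool = False, enums_as_ints: bool = False) -> str:
    out = {}
    for name, value in fields.items():
        if not emit_defaults and not value:
            continue  # option 1 off: zero values are left out
        if name in ENUM_FIELDS and not enums_as_ints:
            value = ENUM_FIELDS[name][value]  # option 4 off: enum as string
        key = name if keep_proto_names else snake_to_lower_camel(name)
        out[key] = value
    return json.dumps(out)


def from_json(text: str, known_names: set, ignore_unknown: bool = False) -> dict:
    data = json.loads(text)
    unknown = set(data) - known_names
    if unknown and not ignore_unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return {k: v for k, v in data.items() if k in known_names}


msg = {"query": "protobuf", "page_number": 0, "result_per_page": 10, "corpus": 1}
print(to_json(msg))
# {"query": "protobuf", "resultPerPage": 10, "corpus": "WEB"}
print(to_json(msg, emit_defaults=True, keep_proto_names=True, enums_as_ints=True))
# {"query": "protobuf", "page_number": 0, "result_per_page": 10, "corpus": 1}
```

The real libraries expose the same switches under language-specific names, so check the documentation of the JSON printer and parser for the language you use.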

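Options:

Options are grouped by scope: file-level (written at the top of the file), message-level, field-level, and so on.

optimize_for (file option): can be set to SPEED, CODE_SIZE, or LITE_RUNTIME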

Compiling:

protoc --proto_path=IMPORT_PATH --java_out=DST_DIR --go_out=DST_DIR path/to/file.proto  // --proto_path can be given several times to search multiple import directories.
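
When the command is assembled from a build script, repeating --proto_path is just a matter of emitting the flag once per directory. The sketch below only builds the argument list; `protoc_command` and the `third_party` directory are hypothetical, and actually running the command assumes protoc (plus the Go plugin for --go_out) is installed.

```python
import shlex


def protoc_command(proto_files, import_paths, outputs):
    """Build a protoc argument list.

    import_paths: directories searched for imports, one --proto_path each.
    outputs: mapping of generator name (e.g. "java", "go") to output directory.
    """
    cmd = ["protoc"]
    cmd += [f"--proto_path={path}" for path in import_paths]
    cmd += [f"--{generator}_out={dst}" for generator, dst in outputs.items()]
    cmd += list(proto_files)
    return cmd


cmd = protoc_command(
    ["path/to/file.proto"],
    ["IMPORT_PATH", "third_party"],          # second directory is illustrative
    {"java": "DST_DIR", "go": "DST_DIR"},
)
print(shlex.join(cmd))
# protoc --proto_path=IMPORT_PATH --proto_path=third_party --java_out=DST_DIR --go_out=DST_DIR path/to/file.proto
```

The list can be passed to subprocess.run(cmd, check=True) once the placeholders are replaced with real paths.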



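Style:
Messages: message names use CamelCase with an initial capital, for example SongServerRequest; field names use lowercase words separated by underscores, for example song_name.
Enums: enum type names use CamelCase with an initial capital; value names use capitals separated by underscores: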

enum Foo {
  FIRST_VALUE = 0;   // proto3 requires the first enum value to be 0
  SECOND_VALUE = 1;
}

Services: if your .proto file defines an RPC service, use CamelCase with an initial capital for both the service name and every rpc method name:

service FooService {
  rpc GetSomething(FooRequest) returns (FooResponse);
}



Sources: the official protobuf documentation and https://blog.csdn.net/feeltouch/article/details/80302860
