Building a Dynamic protobuf Parsing Tool in C++

Pandora · Updated: 2024-05-17

Why this tool is needed

Requirements

Searching for existing solutions

Where is the AST?

Writing the code

Step 1

Step 2

Step 3

Step 4

Summary

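Why this tool is needed

Our database stores protobuf-serialized content, and when troubleshooting we sometimes want to decode it and look at it directly. Online encoders and decoders are easy to find for many encodings, but I could not find one for protobuf, perhaps because the need is uncommon, so I wrote one in C++.

Requirements

Parsing protobuf data requires the proto definition, so the input must include both the serialized data and the proto definition. If the proto contains several messages, we also need to specify which message to parse into. That makes three input parameters in total.

In addition, for ease of use, the tool should not require a complete proto definition: if a nested message type is left undefined, the other fields should still parse.

Searching for existing solutions

A search online turned up similar solutions, but most of them require importing a complete proto file: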

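int DynamicParseFromPBFile(const std::string& file, const std::string& classname,
      const std::string& pb_str) {
  // ...
  // Import the proto file
  ::google::protobuf::compiler::Importer importer(&sourceTree, NULL);
  importer.Import(file);
  // Look up the message type to parse into
  auto descriptor = importer.pool()->FindMessageTypeByName(classname);
  if (descriptor == nullptr) return -1;  // message type not found
  ::google::protobuf::DynamicMessageFactory factory;
  auto message = factory.GetPrototype(descriptor);
  // Create the message object dynamically; unique_ptr frees it on return
  std::unique_ptr<google::protobuf::Message> msg(message->New());
  if (!msg->ParseFromString(pb_str)) return -1;  // malformed input
  // msg now holds the parsed structure
  return 0;
}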

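This does achieve dynamic parsing, but it still does not meet our requirement: we want parsing to work even when the proto is incomplete.

For example: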

message MyMsg {
  optional uint64 id = 1;
  optional OtherMsg other = 2;
}

MyMsg contains a field of type OtherMsg, but no definition of OtherMsg is given, so this approach cannot parse it.

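Where is the AST?

Parsing a proto file must first turn it into an abstract syntax tree (AST). In the AST we can easily modify the proto definition, for example by deleting the other field or changing its type to bytes, after which parsing works.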
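Why changing other to bytes is safe can be seen at the wire level: an embedded message and a bytes field are both encoded as length-delimited data (wire type 2), so the same serialized bytes satisfy either declaration. The following is a minimal sketch, not part of the tool itself, that splits a serialized message into its top-level fields without any schema. WireField, ReadVarint, ScanFields and Demo are names made up for this illustration, the sample bytes are a hand-encoded MyMsg with id = 150 and a two-byte other payload, and only wire types 0 and 2 are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One top-level field as it appears on the wire (illustrative type).
struct WireField {
  uint32_t number;      // field number taken from the tag
  uint32_t wire_type;   // 0 = varint, 2 = length-delimited
  uint64_t varint;      // value when wire_type == 0
  std::string payload;  // raw bytes when wire_type == 2
};

// Reads a base-128 varint starting at pos; returns false on truncated input.
bool ReadVarint(const std::string& data, size_t& pos, uint64_t& value) {
  value = 0;
  for (int shift = 0; shift < 64 && pos < data.size(); shift += 7) {
    uint8_t byte = static_cast<uint8_t>(data[pos++]);
    value |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return true;
  }
  return false;
}

// Splits a serialized message into top-level fields without a schema.
// Only wire types 0 and 2 are handled; any other wire type fails the scan.
bool ScanFields(const std::string& data, std::vector<WireField>& out) {
  size_t pos = 0;
  while (pos < data.size()) {
    uint64_t tag = 0;
    if (!ReadVarint(data, pos, tag)) return false;
    WireField f{static_cast<uint32_t>(tag >> 3),
                static_cast<uint32_t>(tag & 7), 0, ""};
    if (f.wire_type == 0) {
      if (!ReadVarint(data, pos, f.varint)) return false;
    } else if (f.wire_type == 2) {
      uint64_t len = 0;
      if (!ReadVarint(data, pos, len) || len > data.size() - pos) return false;
      f.payload = data.substr(pos, len);
      pos += len;
    } else {
      return false;
    }
    out.push_back(f);
  }
  return true;
}

// Hand-encoded sample: field 1 is the varint 150, field 2 carries two bytes.
void Demo() {
  std::vector<WireField> fields;
  bool ok = ScanFields(std::string("\x08\x96\x01\x12\x02\x08\x01", 7), fields);
  assert(ok && fields.size() == 2);
  // Field 2 has wire type 2 whether it is declared as OtherMsg or as bytes.
  assert(fields[1].number == 2 && fields[1].wire_type == 2);
}
```

Calling Demo() from main runs the checks. The scanner never needs to know what is inside field 2, which is the property the bytes substitution relies on.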

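So where does the AST produced from a proto file live? The only way to find out is to read the source.

After some digging, I came across this code in the FindFileByName method: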

bool SourceTreeDescriptorDatabase::FindFileByName(const std::string& filename,
                                                  FileDescriptorProto* output) {
  // ...
  io::Tokenizer tokenizer(input.get(), &file_error_collector);
  Parser parser;
  // Parse it.
  output->set_name(filename);
  return parser.Parse(&tokenizer, output) && !file_error_collector.had_errors();
}

This code shows that FileDescriptorProto is the AST structure we are looking for. So what kind of structure is it?

In fact, FileDescriptorProto is itself a message defined in proto:

message FileDescriptorProto {
  optional string name = 1;     // file name, relative to root of source tree
  optional string package = 2;  // e.g. "foo", "foo.bar", etc.
  // All top-level definitions in this file.
  repeated DescriptorProto message_type = 4;
  repeated EnumDescriptorProto enum_type = 5;
  repeated ServiceDescriptorProto service = 6;
  repeated FieldDescriptorProto extension = 7;
  // ...
}

Its fields show that it represents an entire proto file, including all the message, enum, and other definitions in that file.

Writing the code

Step 1

Following the source code above, parse the input proto definition into a FileDescriptorProto object:

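// Needs <sstream>, google/protobuf/compiler/parser.h, google/protobuf/io/tokenizer.h,
// google/protobuf/io/zero_copy_stream_impl.h and google/protobuf/descriptor.pb.h;
// assumes using-declarations for std and google::protobuf
// proto input
istringstream ss(proto);
istream* is = &ss;
io::IstreamInputStream input(is);
// Parse into the FileDescriptorProto AST
io::Tokenizer tokenizer(&input, nullptr);
FileDescriptorProto output;
compiler::Parser parser;
if (!parser.Parse(&tokenizer, &output)) {
  err_msg = "parse proto failed";
  return -1;
}
output.set_name("proto");
output.clear_source_code_info();
printf("MSG: proto parsed output: %s\n", output.DebugString().c_str());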

Step 2

Process the FileDescriptorProto object: change every field whose type has no definition to bytes, so that the proto can be built without errors:

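int ConvertUnknownType2Bytes(FileDescriptorProto& file_descriptor_proto) {
  // Collect the names of all message and enum types that have a definition
  set<string> typename_set;
  for (auto const& enumtype : file_descriptor_proto.enum_type()) {
    typename_set.insert(enumtype.name());
  }
  for (auto const& msgtype : file_descriptor_proto.message_type()) {
    typename_set.insert(msgtype.name());
    // Types defined directly inside a message count as well
    for (auto const& subtype : msgtype.nested_type()) {
      typename_set.insert(subtype.name());
    }
    for (auto const& enumtype : msgtype.enum_type()) {
      typename_set.insert(enumtype.name());
    }
  }
  // Walk the fields of every top-level message and check that each type is defined.
  // Names are compared as written in the proto; qualified names are not resolved.
  for (auto& msgtype : *file_descriptor_proto.mutable_message_type()) {
    for (auto& field : *msgtype.mutable_field()) {
      // type_name is empty for scalar types
      if (field.type_name().empty()) continue;
      // Not found in typename_set: turn the field into bytes
      if (typename_set.count(field.type_name()) == 0) {
        field.clear_type_name();
        field.set_type(FieldDescriptorProto_Type_TYPE_BYTES);
      }
    }
  }
  return 0;
}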
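The function above collects nested types one level deep and only rewrites fields of top-level messages. If deeper nesting matters, the same idea can be applied recursively. The sketch below shows the recursion on simplified stand-in structs rather than the real protobuf classes, so that it compiles without the library: FieldDef and MessageDef are hypothetical types that loosely mirror the type_name(), field() and nested_type() accessors used above, and CollectNames, DowngradeUnknown and Demo are names invented for this illustration.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for FieldDescriptorProto (only what the sketch needs).
struct FieldDef {
  std::string name;
  std::string type_name;  // empty for scalar types
  bool is_bytes = false;  // set once the field has been downgraded to bytes
};

// Hypothetical stand-in for DescriptorProto.
struct MessageDef {
  std::string name;
  std::vector<FieldDef> fields;
  std::vector<MessageDef> nested;
};

// Collects the name of this message and of every message nested below it.
void CollectNames(const MessageDef& msg, std::set<std::string>& names) {
  names.insert(msg.name);
  for (const auto& sub : msg.nested) CollectNames(sub, names);
}

// Downgrades every field whose type is not in names, at any nesting depth.
// Returns the number of fields that were changed.
int DowngradeUnknown(MessageDef& msg, const std::set<std::string>& names) {
  int changed = 0;
  for (auto& field : msg.fields) {
    if (!field.type_name.empty() && names.count(field.type_name) == 0) {
      field.type_name.clear();
      field.is_bytes = true;
      ++changed;
    }
  }
  for (auto& sub : msg.nested) changed += DowngradeUnknown(sub, names);
  return changed;
}

// Outer has a field of the defined type Inner; Inner has a field of an
// undefined type, which only a recursive walk reaches.
void Demo() {
  MessageDef inner;
  inner.name = "Inner";
  FieldDef missing;
  missing.name = "x";
  missing.type_name = "Missing";
  inner.fields.push_back(missing);

  MessageDef outer;
  outer.name = "Outer";
  FieldDef known;
  known.name = "in";
  known.type_name = "Inner";
  outer.fields.push_back(known);
  outer.nested.push_back(inner);

  std::set<std::string> names;
  CollectNames(outer, names);
  int changed = DowngradeUnknown(outer, names);
  assert(changed == 1);
  assert(outer.nested[0].fields[0].is_bytes);
}
```

Porting this to the real classes would mean replacing the struct members with the corresponding accessors; like the original function, the sketch compares names exactly as written and does not resolve package-qualified names.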

Step 3

Build a descriptor pool from the modified FileDescriptorProto object and create an object of the requested message type.

// Build the proto and check for errors
SimpleDescriptorDatabase db;
db.Add(output);
DescriptorPool pool(&db);
auto descriptor = pool.FindMessageTypeByName(msg_type_name);
if (descriptor == nullptr) {
  // The proto is invalid or the message type was not found
  err_msg = "parse proto failed. FindMessageTypeByName result is null";
  return -1;
}
// factory is declared before msg, so it outlives the message it creates
DynamicMessageFactory factory;
auto message = factory.GetPrototype(descriptor);
unique_ptr<Message> msg(message->New());

Step 4

Parse the serialized data into msg:

// ParseFromString returns false on malformed input
if (!msg->ParseFromString(serilized_pb)) return -1;
cout << "proto msg: " << msg->ShortDebugString() << endl;

With that, dynamic parsing works, and the unreadable binary data in serilized_pb is printed in a readable form.
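One practical detail the article does not cover is how the serialized bytes reach the tool. If the value is copied out of the database as hexadecimal text (an assumption, since the storage format is not stated), it has to be converted to raw bytes before being handed to ParseFromString. A minimal sketch of such a helper, with HexToBytes and Demo as made-up names:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Converts hex text such as "0896" into raw bytes (hypothetical helper).
// Returns false on odd length or on any non-hex character.
bool HexToBytes(const std::string& hex, std::string& out) {
  if (hex.size() % 2 != 0) return false;
  out.clear();
  out.reserve(hex.size() / 2);
  // Maps one hex digit to its value, or -1 if the character is not hex.
  auto nibble = [](char c) -> int {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
  };
  for (size_t i = 0; i < hex.size(); i += 2) {
    int hi = nibble(hex[i]);
    int lo = nibble(hex[i + 1]);
    if (hi < 0 || lo < 0) return false;
    out.push_back(static_cast<char>((hi << 4) | lo));
  }
  return true;
}

// Decodes a seven-byte sample and checks the second byte.
void Demo() {
  std::string bytes;
  bool ok = HexToBytes("08960112020801", bytes);
  assert(ok && bytes.size() == 7);
  assert(static_cast<unsigned char>(bytes[1]) == 0x96);
}
```

The resulting std::string can hold embedded zero bytes, so it can be passed to ParseFromString unchanged.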

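Summary

To parse data dynamically against an incomplete proto, we first found in the source code how a proto definition is turned into an AST, namely FileDescriptorProto.

Next, we modified the AST object to turn the invalid proto into a valid one.

Finally, we used the modified FileDescriptorProto to construct the required message object and parse the serialized data.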
