Create Protobuf Message Without Compiled Generated Code

Protocol Buffers (protobuf) are a language-neutral, platform-neutral, extensible mechanism for serializing structured data. It’s the primary message format for gRPC, usually serialized and deserialized through native generated code. While generated code is favourable for production software because it makes message parsing fast and easy, there are times when we cannot use it, for example in development tools like Postman. In this post, we will explore how to encode and decode protobuf messages without generated code.

A protobuf message is serialized to a binary format, so unlike JSON it’s almost impossible to write by hand. Fortunately, protoc (the protobuf compiler) provides helpers that convert a message from its text format to the binary format and back.

# encode payload from STDIN and put the result in STDOUT
$ protoc --encode=<package.messageName> <protoFile>

# decode payload from STDIN and put the result in STDOUT
$ protoc --decode=<package.messageName> <protoFile>

// helloworld.proto
syntax = "proto3";

package helloworld;

...

message HelloRequest {
  string name = 1;
}

...

name: "This is example payload"

$ protoc < text_payload --encode=helloworld.HelloRequest helloworld.proto > encoded_payload

$ protoc < encoded_payload --decode=helloworld.HelloRequest helloworld.proto > text_payload
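To get a feel for what protoc writes into encoded_payload, here is a minimal sketch of the wire format for a message with a single string field, written in C++ like the later part of this post. The helper names append_varint and encode_string_field are made up for illustration and are not part of the protobuf library. Real messages with nested or repeated fields need much more than this, which is why we lean on protoc.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Append `value` as a base-128 varint: 7 bits per byte, least significant
// group first, with the high bit set on every byte except the last.
void append_varint(std::string &out, std::uint64_t value) {
  while (value >= 0x80) {
    out.push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<char>(value));
}

// Encode one length-delimited field (wire type 2), which is how a proto3
// `string` such as HelloRequest.name is stored.
std::string encode_string_field(std::uint32_t field_number, const std::string &value) {
  assert(field_number >= 1);  // protobuf field numbers start at 1
  std::string out;
  append_varint(out, (static_cast<std::uint64_t>(field_number) << 3) | 2);  // tag = (number << 3) | wire type
  append_varint(out, value.size());                                         // payload length in bytes
  out += value;                                                             // raw UTF-8 bytes
  return out;
}
```

Under these assumptions, encode_string_field(1, "This is example payload") should give the tag byte 0x0a, the length byte 0x17, and then the 23 bytes of text, which is what a hex dump of encoded_payload should show as well.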

With these two protoc commands, we can encode and decode protobuf messages programmatically without generated code.

Implementation

In my previous post, we created a simple application that sends a gRPC request using libcurl. Let’s improve it so the project no longer needs generated code to encode the proto message.

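char command[255];
// the proto file is passed through the first argument
int n = snprintf(command, sizeof command, "protoc --encode=helloworld.HelloRequest %s > encoded_payload", argv[1]);
if (n < 0 || (size_t) n >= sizeof command) return 1; // the command does not fit in the buffer
FILE *encode_pipe = popen(command, "w");
if (encode_pipe == NULL) return 1;
fputs(argv[2], encode_pipe); // the text format request payload is passed through the second argument
pclose(encode_pipe); // wait until protoc has finished writing encoded_payload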

First, instead of relying on generated code to encode the message, we use popen to create a pipe that executes the protoc command.
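Because the command string is handed to a shell, a proto file path containing spaces or shell metacharacters would break the command or run something unintended. Below is a small sketch of one way to guard against that, assuming the POSIX shell that popen uses. shell_quote and build_encode_command are hypothetical helpers, not part of the original project.

```cpp
#include <cassert>
#include <string>

// Wrap `arg` in single quotes for a POSIX shell. An embedded single quote is
// written as '\'' (close the quote, add an escaped quote, reopen the quote).
std::string shell_quote(const std::string &arg) {
  std::string quoted = "'";
  for (char c : arg) {
    if (c == '\'')
      quoted += "'\\''";
    else
      quoted += c;
  }
  quoted += "'";
  return quoted;
}

// Build the protoc encode command with every caller-supplied part quoted.
std::string build_encode_command(const std::string &message_type,
                                 const std::string &proto_file,
                                 const std::string &output_file) {
  assert(!message_type.empty() && !proto_file.empty() && !output_file.empty());
  return "protoc --encode=" + shell_quote(message_type) + " " +
         shell_quote(proto_file) + " > " + shell_quote(output_file);
}
```

The resulting std::string can be passed to popen through c_str(), which also sidesteps the fixed 255-byte command buffer.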

struct stat st;
stat("encoded_payload", &st); // size of the message protoc wrote
size_t len = st.st_size;
uint8_t *buf = malloc(PREFIX_LENGTH + len);
buf[0] = 0;
... // code to generate prefix payload
FILE *encoded_payload = fopen("encoded_payload", "rb");
fread(buf + PREFIX_LENGTH, 1, len, encoded_payload); // the message goes right after the prefix
fclose(encoded_payload);
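The prefix step is elided above, so the following is only a sketch of what it may look like. It is based on the gRPC over HTTP/2 framing, where each message is preceded by a 1-byte compressed flag and a 4-byte big-endian message length. That adds up to 5 bytes, which I assume is the value of PREFIX_LENGTH. frame_grpc_message and kPrefixLength are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr std::size_t kPrefixLength = 5;  // 1 flag byte + 4 length bytes (assumed PREFIX_LENGTH)

// Prepend the gRPC length prefix to an already serialized protobuf message.
std::vector<std::uint8_t> frame_grpc_message(const std::string &payload) {
  assert(payload.size() <= 0xFFFFFFFFu);  // the length field is only 32 bits wide
  const std::uint32_t len = static_cast<std::uint32_t>(payload.size());

  std::vector<std::uint8_t> frame;
  frame.reserve(kPrefixLength + payload.size());
  frame.push_back(0);                                        // compressed flag: 0 = not compressed
  frame.push_back(static_cast<std::uint8_t>(len >> 24));     // message length, big-endian
  frame.push_back(static_cast<std::uint8_t>(len >> 16));
  frame.push_back(static_cast<std::uint8_t>(len >> 8));
  frame.push_back(static_cast<std::uint8_t>(len));
  frame.insert(frame.end(), payload.begin(), payload.end()); // the message itself
  return frame;
}
```

The returned vector plays the role of buf: its data() and size() are what would go into CURLOPT_POSTFIELDS and CURLOPT_POSTFIELDSIZE.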

curl_global_init(CURL_GLOBAL_ALL);
CURL *curl = curl_easy_init();
...
curl_easy_setopt(curl, CURLOPT_POSTFIELDS, buf);
curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, len + PREFIX_LENGTH);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, argv[1]);

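size_t handle_callback(char *ptr, size_t size, size_t nmemb, void *userdata) {
  size_t total = size * nmemb;
  if (total < PREFIX_LENGTH) return 0; // this code expects the whole frame in a single call

  char command[255];
  snprintf(command, sizeof command, "protoc --decode=helloworld.HelloReply %s > response_decoded", (char *) userdata);
  FILE *decode_pipe = popen(command, "w");
  if (decode_pipe == NULL) return 0;
  // the body is binary and not NUL-terminated, so write it by length instead of using fputs
  fwrite(ptr + PREFIX_LENGTH, 1, total - PREFIX_LENGTH, decode_pipe);
  pclose(decode_pipe);

  return total;
}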

Conversely, we can use the same trick to decode the response. Now, we can test whether our application works.

$ ./curl_grpc helloworld.proto "name: \"inline payload\""
$ cat response_decoded
# msg: "Hello inline payload"

It works! With a simple change, we removed the dependency on generated code. However, there are a few concerns with this approach.
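One concern worth sketching: libcurl may deliver the response body across several callback invocations, while the callback above treats every chunk as a complete frame. A minimal sketch of a safer approach, assuming we first append each chunk to a buffer and only then try to cut a message out of it. take_grpc_message is a hypothetical helper, and it ignores the compressed flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Try to take one complete gRPC frame off the front of `buffer`.
// Returns true and fills `message` when a whole frame is available.
// Returns false and leaves `buffer` untouched when more bytes are needed.
bool take_grpc_message(std::string &buffer, std::string &message) {
  constexpr std::size_t kPrefixLength = 5;  // 1 flag byte + 4 length bytes
  if (buffer.size() < kPrefixLength) return false;

  // Bytes 1..4 hold the message length as a big-endian 32-bit integer.
  const auto *p = reinterpret_cast<const unsigned char *>(buffer.data());
  const std::uint32_t len = (std::uint32_t(p[1]) << 24) | (std::uint32_t(p[2]) << 16) |
                            (std::uint32_t(p[3]) << 8) | std::uint32_t(p[4]);
  if (buffer.size() - kPrefixLength < len) return false;

  assert(kPrefixLength + len <= buffer.size());  // invariant the erase below relies on
  message.assign(buffer, kPrefixLength, len);
  buffer.erase(0, kPrefixLength + len);
  return true;
}
```

In the write callback we would then append ptr with its size * nmemb bytes to the buffer, call take_grpc_message, and decode only when it returns true.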

Fortunately, because protoc supports on-the-fly encoding and its library is open source, we can take a peek at the code and use it directly.

(Kinda) Proper Implementation

First, we need to modify our meson.build to change the project language to C++.

project('curl_grpc', 'cpp',
  version : '0.1',
  default_options : ['warning_level=3'])

cc = meson.get_compiler('cpp')
...
sources = ['curl_grpc.cc']

protobuf = dependency('protobuf',
    version : '>= 2.6.0',
    native : true)

libprotoc = cc.find_library('protoc',
    dirs : protobuf.get_pkgconfig_variable('libdir'))

deps = [
    dependency('libcurl', version: '>=7.50.0'),
    m_dep,
    protobuf,
    libprotoc,
    ]

#include <google/protobuf/util/json_util.h>
  ...
  std::string payload;
  google::protobuf::util::Status status = google::protobuf::util::JsonToBinaryString(type_resolver, "type.googleapis.com/helloworld.HelloRequest", argv[2], &payload);
  if (!status.ok()) {
    std::cout << "fail to parse json to proto message\n";
    return 1;
  }

  size_t len = payload.length();
  std::vector<uint8_t> buf = {0, 0, 0, 0, 0};
  buf.insert(buf.end(), payload.begin(), payload.end());
  ...

For this, we use json_util.h, a library header that provides functions to convert JSON to a protobuf binary message and vice versa. Exactly what we need.

But first, we need to create a type resolver so the JSON parser knows the message types it is converting.

#include <google/protobuf/util/type_resolver.h>
#include <google/protobuf/util/type_resolver_util.h>

  ...
  google::protobuf::util::TypeResolver *type_resolver = google::protobuf::util::NewTypeResolverForDescriptorPool("type.googleapis.com", fd->pool());
  ...

#include <google/protobuf/compiler/importer.h>

  ...
  google::protobuf::compiler::MultiFileErrorCollector *error_collector = new SimpleErrorCollector();
  google::protobuf::compiler::DiskSourceTree *source_tree = new google::protobuf::compiler::DiskSourceTree();
  source_tree->MapPath("", ".");

  google::protobuf::compiler::Importer *importer = new google::protobuf::compiler::Importer(source_tree, error_collector);
  const google::protobuf::FileDescriptor *fd = importer->Import(argv[1]);
  ...

Basically, the descriptor pool is a pool of descriptor data derived from a proto file. We use the importer to build it, so the library resolves any imports the given proto file needs.

We also need to provide an implementation of the error collector. For now, simply printing the errors should be sufficient.

class SimpleErrorCollector : public google::protobuf::compiler::MultiFileErrorCollector {
public:
  void AddError(const std::string & filename, int line, int column, const std::string & message) override {
    std::cout << "error processing " << filename << " on line: " << line << " column: " << column << " message: " << message << "\n";
  }

  void AddWarning(const std::string & filename, int line, int column, const std::string & message) override {
    std::cout << "warning processing " << filename << " on line: " << line << " column: " << column << " message: " << message << "\n";
  }
};

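size_t handle_callback(char *ptr, size_t size, size_t nmemb, void *userdata) {
  size_t total = size * nmemb;
  if (total < PREFIX_LENGTH) return 0; // this code expects the whole frame in a single call

  // the body is binary and not NUL-terminated, so build the string from an explicit length
  std::string payload(ptr + PREFIX_LENGTH, total - PREFIX_LENGTH);
  std::string payload_json;
  google::protobuf::util::TypeResolver *type_resolver = google::protobuf::util::NewTypeResolverForDescriptorPool("type.googleapis.com", (google::protobuf::DescriptorPool *) userdata);
  google::protobuf::util::Status status_second = google::protobuf::util::BinaryToJsonString(type_resolver, "type.googleapis.com/helloworld.HelloReply", payload, &payload_json);
  delete type_resolver; // the caller owns the resolver returned by NewTypeResolverForDescriptorPool

  if (!status_second.ok())
    std::cout << "fail to parse message to json\n";
  else
    std::cout << "response: " << payload_json << "\n";

  return total;
}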

Notice that we’re not recreating the descriptor pool, because the pool is passed to the callback.

  ...
  curl_easy_setopt(curl, CURLOPT_WRITEDATA, fd->pool());
  ...

This way, we don’t need to rebuild the descriptor pool for the same proto file.

$ ./curl_grpc helloworld.proto "{\"name\": \"hello json\"}"
# response: {"msg":"Hello hello json"}

The request and response are now JSON strings, and we no longer need to fork a new shell process each time we encode or decode a protobuf message. Neat!

Closing Thought

This (and the previous) concept was born from my idea to create a lightweight Postman-like gRPC tool. While this proof of concept works, there are things I need to clear up before I can put it in a production application.

Due to my unfamiliarity with C++, I think it’s better to contain the C++ code in a shared library and expose C functions from it. Even then, I still need to learn much more about C++ so I can make a more optimized library than this slop. Also, I need to iron out details of the application, such as supporting streaming messages and more complicated proto imports.

The production-grade application might still be far in the future. But this is a great stepping stone, a solution that I’ve been looking for for some time. So then, until next time!