purescript-protobuf

Purescript library and code generator for Google Protocol Buffers version 3.

This library operates on ArrayBuffer, so it will run both in Node.js and in browser environments.

Code Generation

The shell.nix environment provides

The Purescript toolchain
The protoc compiler
The protoc-gen-purescript executable plugin for protoc on the PATH so that protoc can find it.

$ nix-shell

Purescript Protobuf development environment.
To build purescript-protobuf, run:

    npm install
    spago build

To test purescript-protobuf, run:

    protoc --purescript_out=./test/generated test/*.proto
    spago -x test.dhall build
    spago -x test.dhall test

To generate Purescript .purs files from .proto files, run:

    protoc --purescript_out=path_to_output file.proto

[nix-shell]$

Writing programs with the generated code

None of the modules in this package should be imported directly in our program.

Rather, we'll import the message modules from the generated .purs files, as well as modules for reading and writing ArrayBuffers.

For example, a message in a .proto file declared as

message MyMessage {
  sint32 my_field = 1;
}

will export these four names in the generated .purs modules.

A message record type

type MyMessageR = { my_field :: Maybe Int }

A message data type

newtype MyMessage = MyMessage MyMessageR

A message encoder which works with purescript-arraybuffer-builder

putMyMessage :: forall m. MonadEffect m => MyMessage -> PutM m Unit

A message decoder which works with purescript-parsing-dataview

```purescript
parseMyMessage :: forall m. MonadEffect m => Int -> ParserT DataView m MyMessage
```

The message decoder needs an argument which tells it the length of the message which it’s about to decode, because “the Protocol Buffer wire format is not self-delimiting.”

Then, in our program, our imports will look something like this.

import Generated.Module (MyMessage(..), putMyMessage, parseMyMessage)
import Text.Parsing.Parser (runParserT)
import Data.ArrayBuffer.Builder (execPutM)
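
As a rough sketch of how these imports fit together, the following program encodes a MyMessage and then decodes it again. It assumes the generated module is named Generated.Module (the placeholder used above), and it uses whole and byteLength from Data.ArrayBuffer.DataView to view the encoded bytes. The field value and the logging are illustrative only.

```purescript
module Main where

import Prelude

import Data.ArrayBuffer.Builder (execPutM)
import Data.ArrayBuffer.DataView as DV
import Data.Either (Either(..))
import Data.Maybe (Maybe(..))
import Effect (Effect)
import Effect.Console (log)
-- Generated.Module is a placeholder name for the generated module.
import Generated.Module (MyMessage(..), putMyMessage, parseMyMessage)
import Text.Parsing.Parser (runParserT, parseErrorMessage)

main :: Effect Unit
main = do
  -- Encode the message into a fresh ArrayBuffer.
  arrayBuffer <- execPutM $ putMyMessage $ MyMessage { my_field: Just 42 }
  -- View the whole buffer and pass its byte length to the decoder,
  -- because the wire format is not self-delimiting.
  let bytes = DV.whole arrayBuffer
  result <- runParserT bytes $ parseMyMessage (DV.byteLength bytes)
  case result of
    Left err -> log $ "Decode failed: " <> parseErrorMessage err
    Right (MyMessage { my_field }) -> log $ "Decoded my_field: " <> show my_field
```

When the message is embedded in a larger stream, the length passed to parseMyMessage would come from whatever framing the surrounding protocol uses, rather than from the length of the whole buffer.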

The generated code modules will import modules from this package.

The generated code depends on packages

  , "protobuf"
  , "arraybuffer"
  , "arraybuffer-types"
  , "arraybuffer-builder"
  , "parsing"
  , "parsing-dataview"
  , "uint"
  , "longs"
  , "text-encoding"

which are in package-sets, except for purescript-longs (see spago.dhall in this package for the particulars).

It also depends on the Javascript package long.

Generated message instances

We cannot easily derive common instances like Eq for the generated message types because

The types might be recursive.
The types might contain fields of type ArrayBuffer, which doesn't have those instances.

All of the generated message types have an instance of Generic. This sometimes allows us to use genericEq and genericShow on a generated message, provided that the types of all of its fields have Eq and Show instances.

All of the generated message types have an instance of Newtype.
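
One way to use that instance is to reach the underlying record with unwrap from Data.Newtype. The following is a small sketch, assuming the MyMessage type from the earlier example and the placeholder module name Generated.Module. The helpers sameFields and fieldOrZero are hypothetical and not part of the generated code.

```purescript
module Example.Unwrap where

import Prelude

import Data.Maybe (fromMaybe)
import Data.Newtype (unwrap)
-- Generated.Module is a placeholder name for the generated module.
import Generated.Module (MyMessage)

-- Hypothetical helper: compare two messages by their underlying records.
-- This works here because every field of MyMessageR has an Eq instance.
sameFields :: MyMessage -> MyMessage -> Boolean
sameFields a b = unwrap a == unwrap b

-- Hypothetical helper: read a field, substituting 0 when it is Nothing.
fieldOrZero :: MyMessage -> Int
fieldOrZero msg = fromMaybe 0 (unwrap msg).my_field
```

Comparing the unwrapped records only works for messages whose fields all have Eq instances, so it would not compile for a message with an ArrayBuffer field.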

Examples

The purescript-protobuf repository contains three executable Node.js programs which use code generated by purescript-protobuf. Refer to these for further examples of how to use the generated code.

The protoc compiler plugin. The code generator imports generated code. Trippy, right? This program literally writes itself.
The unit test suite
The Google conformance test program

Interpreting invalid encoding parse failures

When the decoder encounters an invalid encoding in the protobuf input stream, the parse fails.

When Text.Parsing.Parser.ParserT fails it will return a ParseError String (Position {line::Int,column::Int}).

The byte offset at which the parse failure occurred is given by the formula column - 1.

The path to the protobuf definition which failed to parse will be included in the ParseError String and delimited by '/', something like "Message1 / string_field_1 / Invalid UTF8 encoding.".
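
To make the offset formula concrete, here is a minimal sketch of a function which turns a ParseError into a readable message. The name describeFailure is hypothetical; the sketch assumes the ParseError and Position constructors exported by Text.Parsing.Parser and Text.Parsing.Parser.Pos.

```purescript
module Example.ParseFailure where

import Prelude

import Text.Parsing.Parser (ParseError(..))
import Text.Parsing.Parser.Pos (Position(..))

-- Hypothetical helper: render a decode failure with its byte offset.
-- The offset is column - 1, following the formula given above.
describeFailure :: ParseError -> String
describeFailure (ParseError message (Position { column })) =
  "Decode failed at byte offset " <> show (column - 1) <> ": " <> message
```

The message part already contains the '/'-delimited path to the failing definition, so no further formatting of it should be needed.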

Features

We aim to support binary-encoded (not JSON-encoded) proto3. Many proto2-syntax descriptor files will also work, as long as they don't use proto2 features.

We don't support extensions.

The generated optional record fields will use Nothing instead of the default values.

We do not preserve unknown fields.

We do not support services.

Conformance

At the time of this writing, we pass 193 out of 194 of the Google conformance tests for binary-encoded proto3. The one test we fail is the Required.Proto3.ProtobufInput.UnknownVarint.ProtobufOutput test, which checks that unknown fields are preserved. We do not support that (see Features above).

See the conformance/README.md in this repository for details.

Imports

The code generator will use the package statement in the .proto file and the base file name as the Purescript module name for that file.

The Protobuf import statement allows Protobuf messages to have fields consisting of Protobuf messages imported from another file, and qualified by the package name in that file. In order to generate the correct Purescript module name qualifier on the types of imported message fields, the code generator must be able to look up the package name statement in the imported file.

For that reason, we can only use top-level (not nested) message and enum types from an import.

The generated Purescript code will usually have module imports which cause the purs compiler to emit warnings. Sorry.

Performance

The implementation is simple and straightforward. We haven't done any special optimizations. For example, when encoding a protobuf varint, we allocate a list of new one-byte ArrayBuffers and then copy them all into position in the final ArrayBuffer. For another example, when decoding a packed field of numbers, we build a list of the numbers, and then copy them all into the final Array. Also, this whole library is very stack-unsafe. This may all be improved in later versions.

Contributing

Pull requests welcome.