From 9cb10577fcefa3ed004e0bbdc61e6238e8137e3c Mon Sep 17 00:00:00 2001
From: Joshua Haberman <jhaberman@gmail.com>
Date: Tue, 4 Jul 2017 17:02:48 -0700
Subject: First version of a real C codegen for upb.

Also includes an implementation of the conformance tests
to display what the API usage will be like.

There is still a lot to do, and things that are broken (oneofs,
repeated fields, etc), but it's a good start.
---
 DESIGN.md | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)
 create mode 100644 DESIGN.md

(limited to 'DESIGN.md')

diff --git a/DESIGN.md b/DESIGN.md
new file mode 100644
index 0000000..4e6dc04
--- /dev/null
+++ b/DESIGN.md
@@ -0,0 +1,83 @@
+
+μpb Design
+----------
+
+**NOTE:** the design described here is being implemented currently, but is not
+yet complete.  The repo is in heavy transition right now.
+
+μpb has the following design goals:
+
+- C89 compatible.
+- small code size (both for the core library and generated messages).
+- fast performance (hundreds of MB/s).
+- idiomatic for C programs.
+- easy to wrap in high-level languages (Python, Ruby, Lua, etc) with
+  good performance and all standard protobuf features.
+- hands-off about memory management, allowing for easy integration
+  with existing VMs and/or garbage collectors.
+- offers binary ABI compatibility between apps, generated messages, and
+  the core library (doesn't require re-generating messages or recompiling
+  your application when the core library changes).
+- provides all features that users expect from a protobuf library
+  (generated messages in C, reflection, text format, etc.).
+- layered, so the core is small and doesn't require descriptors.
+- tidy about symbol references, so that any messages or features that
+  aren't used by a C program can have their code GC'd by the linker.
+- possible to use protobuf binary format without leaking message/field
+  names into the binary.
+
+μpb accomplishes these goals by keeping a very small core that does not contain
+descriptors.  We need some way of knowing what fields are in each message and
+where they live, but instead of descriptors, we keep a small/lightweight summary
+of the .proto file.  We call this a `upb_msglayout`.  It contains the bare
+minimum of what we need to know to parse and serialize protobuf binary format
+into our internal representation for messages, `upb_msg`.
+
+The core then contains functions to parse/serialize a message, given a `upb_msg*`
+and a `const upb_msglayout*`.
+
+This approach is similar to [nanopb](https://github.com/nanopb/nanopb) which
+also compiles message definitions to a compact, internal representation without
+names.  However nanopb does not aim to be a fully-featured library, and has no
+support for text format, JSON, or descriptors.  μpb is unique in that it has a
+small core similar to nanopb (though not quite as small), but also offers a
+full-featured protobuf library for applications that want reflection, text
+format, JSON format, etc.
+
+Without descriptors, the core doesn't have access to field names, so it cannot
+parse/serialize to protobuf text format or JSON.  Instead this functionality
+lives in separate modules that depend on the module implementing descriptors.
+With the descriptor module we can parse/serialize binary descriptors and
+validate that they follow all the rules of protobuf schemas.
+
+To provide binary compatibility, we version the structs that generated messages
+use to create a `upb_msglayout*`.  The current initializers are
+`upb_msglayout_msginit_v1`, `upb_msglayout_fieldinit_v1`, etc.  Then
+`upb_msglayout*` uses these as its internal representation.  If upb changes its
+internal representation for a `upb_msglayout*`, it will also include code to
+convert the old representation to the new representation.  This will use some
+more memory/CPU at runtime to convert between the two, but apps that statically
+link μpb will never need to worry about this.
+
+TODO
+----
+
+The current state of the repo is quite different than what is described above.
+Here are the major items that need to be implemented.
+
+1. implement the core generic protobuf binary encoder/decoder that uses a
+   `upb_msglayout*`.
+2. remove all mention of handlers, sink, etc. from core into their own module.
+   All of the handlers stuff needs substantial revision, but moving it out of
+   core is the first priority.
+3. move all of the def/refcounted stuff out of core.  The defs also need
+   substantial revision, but moving them out of core is the first priority.
+4. revise our generated code until it is in a state where we feel comfortable
+   committing to API/ABI stability for it.  This may involve moving different
+   parts of the generated code into separate files, like keeping the serialized
+   descriptor in a separate file from the compact msglayout.
+5. revise all of the existing encoders/decoders and handlers.  We probably
+   will want to keep handlers, since they let us decouple encoders/decoders
+   from `upb_msg`, but we need to simplify all of that a LOT.  Likely we will
+   want to make handlers only per-message instead of per-field, except for
+   variable-length fields.
-- 
cgit v1.2.3