Deriving the Documentation of JSON Protocols from their Implementation

Julien Richard-Foy
Daniel Krzywicki

Zengularity

http://julienrf.github.io/lambda-days-2016

February 19, 2016

Motivation

Context

Command example: BuyArticle

Let's consider the following example, a command to buy articles:

case class BuyArticle(
  articleId: String,
  quantity: Int
)

Command JSON codec

To send the command from one service to another, we need a codec to marshal the Scala value to and from JSON.

def encodeBuyArticle(buyArticle: BuyArticle): JsonObject =
  JsonObject(
    "articleId" -> JsonString(buyArticle.articleId),
    "quantity" -> JsonNumber(buyArticle.quantity)
  )

Command JSON codec (2)

On the decoding side, we will probably need to enforce some validation rules (e.g. a positive quantity).

def decodeBuyArticle(json: Json): Either[DecodingError, BuyArticle] =
  json match {
    case JsonObject(fields) =>
      (fields.get("articleId"), fields.get("quantity")) match {
        // JsonNumber wraps a BigDecimal, so check the range before narrowing to Int
        case (Some(JsonString(s)), Some(JsonNumber(n))) if n.isValidInt && n >= 1 =>
          Right(BuyArticle(s, n.toInt))
        case _ => Left(DecodingError("Bad fields"))
      }
    case _ => Left(DecodingError("Not an object"))
  }
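
The two functions above rely on a JSON model that the slides never show. Here is a minimal runnable sketch, assuming a hypothetical ADT (`Json`, `JsonString`, `JsonNumber`, `JsonObject` backed by a `Map`) and a `DecodingError` that carries a single message. The real library types may differ.

```scala
object BuyArticleCodecSketch {
  // Hypothetical minimal JSON model; not necessarily the one used in the talk.
  sealed trait Json
  final case class JsonString(value: String) extends Json
  final case class JsonNumber(value: BigDecimal) extends Json
  final case class JsonObject(fields: Map[String, Json]) extends Json

  final case class DecodingError(message: String)

  final case class BuyArticle(articleId: String, quantity: Int)

  def encodeBuyArticle(buyArticle: BuyArticle): JsonObject =
    JsonObject(Map(
      "articleId" -> JsonString(buyArticle.articleId),
      "quantity"  -> JsonNumber(BigDecimal(buyArticle.quantity))
    ))

  def decodeBuyArticle(json: Json): Either[DecodingError, BuyArticle] =
    json match {
      case JsonObject(fields) =>
        (fields.get("articleId"), fields.get("quantity")) match {
          // Accept only quantities that fit in an Int and are at least 1
          case (Some(JsonString(s)), Some(JsonNumber(n))) if n.isValidInt && n >= 1 =>
            Right(BuyArticle(s, n.toInt))
          case _ => Left(DecodingError("Bad fields"))
        }
      case _ => Left(DecodingError("Not an object"))
    }
}
```

Encoding a command and decoding the result should give back the same command, while a quantity of 0 is rejected on the decoding side.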

Serialized BuyArticle

Using that codec, we'll have the following serialized representation of a BuyArticle:

{
  "articleId": "123",
  "quantity": 10
}

What about documentation?

"Big data is like teenage sex:

everyone talks about it,

nobody really knows how to do it,

everyone thinks everyone else is doing it,

so everyone claims they are doing it..."

Dan Ariely

The same could be said about REST APIs...

BuyArticle schema documentation

Apart from the human-readable HTML documentation, a REST API should expose a machine-readable representation of our schema, e.g.:

{
  "articleId": {
    "type": "string"
  },
  "quantity": {
    "type": "number",
    "verifying": {
      "minValue": 1
    }
  }
}

Code And Documentation Consistency

At any time, we want the implementation and its documentation to stay consistent.

Some start with the documentation (specification) of an endpoint and then implement it (with more or fewer bugs).

Some start with the implementation and write the documentation later (or never).

In both cases, any change to the former must be ported to the latter, which is usually a manual and error-prone process.

Our Approach

Agenda

Codec[In, Out]

There are many Scala JSON libraries out there. For simplicity, we will consider the following example API:

trait Codec[In, Out] {
  def decode(in: In): Either[DecodingError, Out]
  def encode(out: Out): In
}

JSON ⟺ Int Codec

val int: Codec[Json, Int] =
  new Codec[Json, Int] {

    def decode(json: Json) =
      json match {
        case JsonNumber(n) =>
          // BigDecimal#toInt silently truncates, so validate first
          if (n.isValidInt) Right(n.toInt)
          else Left(DecodingError("Not a 32-bit integer"))
        case _ => Left(DecodingError("Not a number"))
      }

    def encode(n: Int) =
      JsonNumber(BigDecimal(n))

  }

Some Other Basic Codecs

val string: Codec[Json, String] = …
def field(name: String): Codec[JsonObject, Json] = …
def minValue(n: Int): Codec[Int, Int] = …
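
The bodies of these three codecs are elided on the slide. One possible way to fill them in is sketched below, assuming a hypothetical `Map`-backed `JsonObject` and a single-message `DecodingError`. The error messages are illustrative only.

```scala
object BasicCodecsSketch {
  // Hypothetical minimal JSON model
  sealed trait Json
  final case class JsonString(value: String) extends Json
  final case class JsonObject(fields: Map[String, Json]) extends Json

  final case class DecodingError(message: String)

  trait Codec[In, Out] {
    def decode(in: In): Either[DecodingError, Out]
    def encode(out: Out): In
  }

  val string: Codec[Json, String] =
    new Codec[Json, String] {
      def decode(json: Json): Either[DecodingError, String] =
        json match {
          case JsonString(s) => Right(s)
          case _             => Left(DecodingError("Not a string"))
        }
      def encode(s: String): Json = JsonString(s)
    }

  // Extracts one field from an object; encoding wraps the value in a one-field object
  def field(name: String): Codec[JsonObject, Json] =
    new Codec[JsonObject, Json] {
      def decode(obj: JsonObject): Either[DecodingError, Json] =
        obj.fields.get(name).toRight(DecodingError(s"Missing field '$name'"))
      def encode(json: Json): JsonObject = JsonObject(Map(name -> json))
    }

  // Pure validation step: decoding checks the bound, encoding is the identity
  def minValue(n: Int): Codec[Int, Int] =
    new Codec[Int, Int] {
      def decode(i: Int): Either[DecodingError, Int] =
        if (i >= n) Right(i) else Left(DecodingError(s"$i is less than $n"))
      def encode(i: Int): Int = i
    }
}
```

Note that `minValue` never touches JSON: it is a codec from `Int` to `Int`, which is what allows it to be chained after any codec producing an `Int`.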

Codec Combinators

def andThen[A, B, C](ab: Codec[A, B], bc: Codec[B, C]): Codec[A, C] =
  new Codec[A, C] {
    def decode(a: A) = ab.decode(a).right.flatMap(bc.decode)
    def encode(c: C) = ab.encode(bc.encode(c))
  }
def imap[A, B, C](ab: Codec[A, B], f: B => C, g: C => B): Codec[A, C] =
  new Codec[A, C] {
    def decode(a: A) = ab.decode(a).right.map(f)
    def encode(c: C) = ab.encode(g(c))
  }

Codec Combinators (2)

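// field(name) is a Codec[JsonObject, Json], so the composed codecs take a JsonObject as input
val articleId: Codec[JsonObject, String] =
  andThen(
    field("articleId"),
    string
  )
val quantity: Codec[JsonObject, Int] =
  andThen(
    field("quantity"),
    andThen(
      int,
      minValue(1)
    )
  )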
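
To see the composition at work, here is a small runnable sketch of the `quantity` codec. It assumes a hypothetical `Map`-backed JSON model, and it types the result as `Codec[JsonObject, Int]` because `field(name)` takes a `JsonObject` as input. It also uses the right-biased `Either` of Scala 2.12 and later.

```scala
object CombinatorsSketch {
  // Hypothetical minimal JSON model
  sealed trait Json
  final case class JsonNumber(value: BigDecimal) extends Json
  final case class JsonObject(fields: Map[String, Json]) extends Json
  final case class DecodingError(message: String)

  trait Codec[In, Out] {
    def decode(in: In): Either[DecodingError, Out]
    def encode(out: Out): In
  }

  def andThen[A, B, C](ab: Codec[A, B], bc: Codec[B, C]): Codec[A, C] =
    new Codec[A, C] {
      def decode(a: A): Either[DecodingError, C] = ab.decode(a).flatMap(bc.decode)
      def encode(c: C): A = ab.encode(bc.encode(c))
    }

  val int: Codec[Json, Int] = new Codec[Json, Int] {
    def decode(json: Json): Either[DecodingError, Int] = json match {
      case JsonNumber(n) if n.isValidInt => Right(n.toInt)
      case JsonNumber(_)                 => Left(DecodingError("Not a 32-bit integer"))
      case _                             => Left(DecodingError("Not a number"))
    }
    def encode(n: Int): Json = JsonNumber(BigDecimal(n))
  }

  def field(name: String): Codec[JsonObject, Json] = new Codec[JsonObject, Json] {
    def decode(obj: JsonObject): Either[DecodingError, Json] =
      obj.fields.get(name).toRight(DecodingError(s"Missing field '$name'"))
    def encode(json: Json): JsonObject = JsonObject(Map(name -> json))
  }

  def minValue(n: Int): Codec[Int, Int] = new Codec[Int, Int] {
    def decode(i: Int): Either[DecodingError, Int] =
      if (i >= n) Right(i) else Left(DecodingError(s"$i is less than $n"))
    def encode(i: Int): Int = i
  }

  // Decoding runs left to right: extract the field, read an Int, check the bound
  val quantity: Codec[JsonObject, Int] =
    andThen(field("quantity"), andThen(int, minValue(1)))
}
```

Decoding stops at the first failing step, and encoding runs the same steps in the opposite order.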

Object Codecs

trait ObjCodec[Out] extends Codec[Json, Out] {
  def encode(out: Out): JsonObject
}
def field[A](name: String, codec: Codec[Json, A]): ObjCodec[A] = …

Object Codec Combinator

We can zip two object codecs:

def zip[A, B](a: ObjCodec[A], b: ObjCodec[B]): ObjCodec[(A, B)] =
  new ObjCodec[(A, B)] {
    def decode(json: Json) =
      (a.decode(json), b.decode(json)) match {
        case (Right(aValue), Right(bValue)) => Right((aValue, bValue))
        case (Left(aError),  Right(_))      => Left(aError)
        case (Right(_),      Left(bError))  => Left(bError)
        case (Left(aError),  Left(bError))  => Left(aError ++ bError)
      }
    def encode(abValue: (A, B)) = {
      val (aValue, bValue) = abValue
      JsonObject(
        a.encode(aValue).fields ++ b.encode(bValue).fields
      )
    }
  }

Object Codec Combinator (2)

Finally, we can write a full codec for our model using the different combinators:

val buyArticle: Codec[Json, BuyArticle] =
  imap(
    zip(
      field("articleId", string),
      field("quantity", andThen(int, minValue(1)))
    ),
    (BuyArticle.apply _).tupled,                  // (String, Int) => BuyArticle
    (b: BuyArticle) => (b.articleId, b.quantity)  // BuyArticle => (String, Int)
  )
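
The pieces above can be assembled into a runnable sketch. The assumptions here are a hypothetical `Map`-backed JSON model, a `DecodingError` holding a list of messages (so that `++` has a meaning), and a single `positiveInt` codec standing in for `andThen(int, minValue(1))` to keep the example short.

```scala
object ObjCodecSketch {
  sealed trait Json
  final case class JsonString(value: String) extends Json
  final case class JsonNumber(value: BigDecimal) extends Json
  final case class JsonObject(fields: Map[String, Json]) extends Json

  // Assumption: an error is a list of messages, which gives `++` a meaning
  final case class DecodingError(messages: List[String]) {
    def ++(that: DecodingError): DecodingError = DecodingError(messages ++ that.messages)
  }
  final case class BuyArticle(articleId: String, quantity: Int)

  trait Codec[In, Out] {
    def decode(in: In): Either[DecodingError, Out]
    def encode(out: Out): In
  }
  trait ObjCodec[Out] extends Codec[Json, Out] {
    def encode(out: Out): JsonObject
  }

  val string: Codec[Json, String] = new Codec[Json, String] {
    def decode(json: Json): Either[DecodingError, String] = json match {
      case JsonString(s) => Right(s)
      case _             => Left(DecodingError(List("Not a string")))
    }
    def encode(s: String): Json = JsonString(s)
  }
  // Stands in for andThen(int, minValue(1))
  val positiveInt: Codec[Json, Int] = new Codec[Json, Int] {
    def decode(json: Json): Either[DecodingError, Int] = json match {
      case JsonNumber(n) if n.isValidInt && n >= 1 => Right(n.toInt)
      case _ => Left(DecodingError(List("Not a positive integer")))
    }
    def encode(n: Int): Json = JsonNumber(BigDecimal(n))
  }

  def field[A](name: String, codec: Codec[Json, A]): ObjCodec[A] = new ObjCodec[A] {
    def decode(json: Json): Either[DecodingError, A] = json match {
      case JsonObject(fields) =>
        fields.get(name)
          .toRight(DecodingError(List(s"Missing field '$name'")))
          .flatMap(codec.decode)
      case _ => Left(DecodingError(List("Not an object")))
    }
    def encode(a: A): JsonObject = JsonObject(Map(name -> codec.encode(a)))
  }

  // Both sides are always decoded, so errors from both fields are reported together
  def zip[A, B](a: ObjCodec[A], b: ObjCodec[B]): ObjCodec[(A, B)] = new ObjCodec[(A, B)] {
    def decode(json: Json): Either[DecodingError, (A, B)] =
      (a.decode(json), b.decode(json)) match {
        case (Right(aValue), Right(bValue)) => Right((aValue, bValue))
        case (Left(aError), Right(_))       => Left(aError)
        case (Right(_), Left(bError))       => Left(bError)
        case (Left(aError), Left(bError))   => Left(aError ++ bError)
      }
    def encode(ab: (A, B)): JsonObject =
      JsonObject(a.encode(ab._1).fields ++ b.encode(ab._2).fields)
  }

  def imap[A, B, C](ab: Codec[A, B], f: B => C, g: C => B): Codec[A, C] = new Codec[A, C] {
    def decode(a: A): Either[DecodingError, C] = ab.decode(a).map(f)
    def encode(c: C): A = ab.encode(g(c))
  }

  val buyArticle: Codec[Json, BuyArticle] =
    imap(
      zip(field("articleId", string), field("quantity", positiveInt)),
      (BuyArticle.apply _).tupled,
      (b: BuyArticle) => (b.articleId, b.quantity)
    )
}
```

Unlike `andThen`, which stops at the first failure, `zip` accumulates errors: an object missing both fields yields both messages.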

Codecs Summary

Documentation

Let's consider the following example of documentation ADT:

sealed trait Doc

case class Scalar(typeName: String, description: Option[String] = None) extends Doc

case class Object(fields: Field*) extends Doc
case class Field(name: String, doc: Doc, description: Option[String])

case class MinValue(n: Int) extends Doc

case class Satisfying(prerequisite: Doc, subsequent: Doc) extends Doc

Documentation (2)

We can define some basic building blocks:

val int = Scalar("number")
val string = Scalar("string")
val date = Scalar("string", Some("A date with format YYYY-MM-DD"))

Documentation (3)

And then combine them to express a model:

val buyArticle: Doc =
  Object(
    Field("articleId", string, Some("Article ID")),
    Field("quantity", Satisfying(int, MinValue(1)), Some("Quantity"))
  )
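
Because `Doc` is plain data, it can be interpreted after the fact, for instance rendered as the machine-readable schema shown earlier. The renderer below is only a sketch: the `render` function and the output layout are assumptions, descriptions are ignored, and the object case is named `Obj` here to avoid shadowing `java.lang.Object`.

```scala
object DocRenderSketch {
  sealed trait Doc
  final case class Scalar(typeName: String, description: Option[String] = None) extends Doc
  final case class Obj(fields: Field*) extends Doc
  final case class Field(name: String, doc: Doc, description: Option[String])
  final case class MinValue(n: Int) extends Doc
  final case class Satisfying(prerequisite: Doc, subsequent: Doc) extends Doc

  private def quote(s: String): String = "\"" + s + "\""

  // Key/value entries describing a doc, before they are wrapped in braces
  private def entries(doc: Doc): List[String] = doc match {
    case Scalar(typeName, _)  => List(quote("type") + ": " + quote(typeName))
    case MinValue(n)          => List(quote("minValue") + ": " + n)
    case Satisfying(pre, sub) =>
      entries(pre) :+ (quote("verifying") + ": " + entries(sub).mkString("{", ", ", "}"))
    case o: Obj               =>
      o.fields.toList.map(f => quote(f.name) + ": " + render(f.doc))
  }

  def render(doc: Doc): String = entries(doc).mkString("{", ", ", "}")

  val int = Scalar("number")
  val string = Scalar("string")

  val buyArticle: Doc =
    Obj(
      Field("articleId", string, Some("Article ID")),
      Field("quantity", Satisfying(int, MinValue(1)), Some("Quantity"))
    )
}
```

Calling `render(buyArticle)` produces a single-line equivalent of the schema from the "BuyArticle schema documentation" slide.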

Ultimate Goal: Unify Codec and Doc

How to keep the following two in sync?

val buyArticle: Codec[Json, BuyArticle] =
  imap(
    zip(
      field("articleId", string),
      field("quantity", andThen(int, minValue(1)))
    ),
    (BuyArticle.apply _).tupled,                  // (String, Int) => BuyArticle
    (b: BuyArticle) => (b.articleId, b.quantity)  // BuyArticle => (String, Int)
  )
val buyArticle: Doc =
  Object(
    Field("articleId", string, Some("Article ID")),
    Field("quantity", Satisfying(int, MinValue(1)), Some("Quantity"))
  )

Protocol

What if we could define them at the same time, using a common model?

trait Protocol[In, Out] {
  def codec: Codec[In, Out]
  def doc: Doc
}

What do we want to express with a Protocol?

Protocol[A, ?] is an invariant functor

def imap[A, B, C](ab: Protocol[A, B], f: B => C, g: C => B): Protocol[A, C] =
  new Protocol[A, C] {
    val codec = ab.codec.imap(f, g)
    val doc = ab.doc
  }
trait InvariantFunctor[F[_]] {
  def imap[A, B](fa: F[A], f: A => B, g: B => A): F[B]
}

Protocol[A, B] is an arrow

def andThen[A, B, C](ab: Protocol[A, B], bc: Protocol[B, C]): Protocol[A, C] =
  new Protocol[A, C] {
    val codec = ab.codec.andThen(bc.codec)
    val doc = Satisfying(ab.doc, bc.doc)
  }
trait Arrow[F[_, _]] {
  def andThen[A, B, C](ab: F[A, B], bc: F[B, C]): F[A, C]
}

Why bother with abstract things such as invariant functors and arrows?

ObjProtocol

trait ObjProtocol[Out] extends Protocol[Json, Out] {
  def codec: ObjCodec[Out]
  def doc: Object
}

ObjProtocol is cartesian

def zip[A, B](pa: ObjProtocol[A], pb: ObjProtocol[B]): ObjProtocol[(A, B)] =
  new ObjProtocol[(A, B)] {
    val codec = pa.codec.zip(pb.codec)
    val doc = Object(pa.doc.fields ++ pb.doc.fields: _*)
  }
trait Cartesian[F[_]] {
  def zip[A, B](fa: F[A], fb: F[B]): F[(A, B)]
}

Is Protocol a monad?

trait Monad[F[_]] {
  def flatMap[A, B](fa: F[A], f: A => F[B]): F[B]
}

No.

That is what makes static reasoning on protocol definitions possible: the shape of a protocol never depends on a decoded value, which is why we can compute a protocol's documentation without running it.
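
A toy illustration of this point, using a deliberately simplified and hypothetical `Proto` type over comma-separated strings rather than the `Protocol` from the slides: `zip` can compute its documentation up front, while a `flatMap` cannot, because the second protocol only exists once a value has been decoded.

```scala
object NotAMonadSketch {
  // Hypothetical toy protocol: a description plus a decoder over plain strings
  final case class Proto[A](doc: String, decode: String => Option[A])

  val int: Proto[Int] =
    Proto("int", (s: String) => scala.util.Try(s.trim.toInt).toOption)

  def fixedLength(n: Int): Proto[String] =
    Proto(s"string of length $n", (s: String) => if (s.length == n) Some(s) else None)

  // Static composition: both docs are known before any input is seen
  def zip[A, B](pa: Proto[A], pb: Proto[B]): Proto[(A, B)] =
    Proto(
      s"${pa.doc},${pb.doc}",
      (in: String) => in.split(",", 2) match {
        case Array(l, r) => for (a <- pa.decode(l); b <- pb.decode(r)) yield (a, b)
        case _           => None
      }
    )

  // Dynamic composition: the second protocol is produced by a decoded value,
  // so there is nothing to document for it ahead of time
  def flatMap[A, B](pa: Proto[A])(f: A => Proto[B]): Proto[B] =
    Proto(
      s"${pa.doc},<depends on the decoded value>",
      (in: String) => in.split(",", 2) match {
        case Array(l, r) => pa.decode(l).flatMap(a => f(a).decode(r))
        case _           => None
      }
    )

  val static: Proto[(Int, String)] = zip(int, fixedLength(3))
  val dynamic: Proto[String] = flatMap(int)(fixedLength)
}
```

Both `static` and `dynamic` can decode `"3,abc"`, but only `static` is able to say what it expects without looking at the input.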

What are the Protocol’s building blocks?

val string: Protocol[Json, String] =
  new Protocol[Json, String] {
    val codec = Codec.string
    val doc = Doc.string
  }
val int: Protocol[Json, Int] = …
def minValue(n: Int): Protocol[Int, Int] = …

What are the Protocol’s building blocks? (2)

def field[A](
  name: String,
  protocol: Protocol[Json, A],
  description: Option[String]
): ObjProtocol[A] =
  new ObjProtocol[A] {
    val codec = Codec.field(name, protocol.codec)
    val doc = Object(Field(name, protocol.doc, description))
  }

Examples

val buyArticle: ObjProtocol[BuyArticle] =
  imap(
    zip(
      field("articleId", string,                    Some("Article ID")),
      field("quantity",  andThen(int, minValue(1)), Some("Quantity"))
    ),
    (BuyArticle.apply _).tupled,
    (b: BuyArticle) => (b.articleId, b.quantity)
  )
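
For reference, here is a self-contained sketch in which this definition compiles and yields both a codec and a doc. Several simplifications are assumptions made for brevity: codec and doc are flattened into a single `Protocol` trait, the JSON model is a hypothetical `Map`-backed ADT, the object doc is named `Obj`, and `scalar` is a helper that does not appear in the slides.

```scala
object ProtocolSketch {
  sealed trait Json
  final case class JsonString(value: String) extends Json
  final case class JsonNumber(value: BigDecimal) extends Json
  final case class JsonObject(fields: Map[String, Json]) extends Json
  final case class DecodingError(messages: List[String])

  sealed trait Doc
  final case class Scalar(typeName: String) extends Doc
  final case class MinValue(n: Int) extends Doc
  final case class Satisfying(prerequisite: Doc, subsequent: Doc) extends Doc
  final case class Field(name: String, doc: Doc, description: Option[String])
  final case class Obj(fields: List[Field]) extends Doc

  final case class BuyArticle(articleId: String, quantity: Int)

  // Assumption: codec and doc are flattened into one trait to keep the sketch short
  trait Protocol[In, Out] {
    def decode(in: In): Either[DecodingError, Out]
    def encode(out: Out): In
    def doc: Doc
  }
  trait ObjProtocol[Out] extends Protocol[Json, Out] {
    def encode(out: Out): JsonObject
    def doc: Obj
  }

  // Hypothetical helper for leaf protocols
  private def scalar[A](typeName: String)(dec: PartialFunction[Json, A])(enc: A => Json): Protocol[Json, A] =
    new Protocol[Json, A] {
      def decode(json: Json): Either[DecodingError, A] =
        dec.lift(json).toRight(DecodingError(List(s"Not a $typeName")))
      def encode(a: A): Json = enc(a)
      val doc: Doc = Scalar(typeName)
    }
  val string: Protocol[Json, String] =
    scalar[String]("string") { case JsonString(s) => s }(s => JsonString(s))
  val int: Protocol[Json, Int] =
    scalar[Int]("number") { case JsonNumber(n) if n.isValidInt => n.toInt }(n => JsonNumber(BigDecimal(n)))

  def minValue(min: Int): Protocol[Int, Int] = new Protocol[Int, Int] {
    def decode(n: Int): Either[DecodingError, Int] =
      if (n >= min) Right(n) else Left(DecodingError(List(s"$n is less than $min")))
    def encode(n: Int): Int = n
    val doc: Doc = MinValue(min)
  }
  def andThen[A, B, C](ab: Protocol[A, B], bc: Protocol[B, C]): Protocol[A, C] =
    new Protocol[A, C] {
      def decode(a: A): Either[DecodingError, C] = ab.decode(a).flatMap(bc.decode)
      def encode(c: C): A = ab.encode(bc.encode(c))
      val doc: Doc = Satisfying(ab.doc, bc.doc)
    }
  def field[A](name: String, p: Protocol[Json, A], description: Option[String]): ObjProtocol[A] =
    new ObjProtocol[A] {
      def decode(json: Json): Either[DecodingError, A] = json match {
        case JsonObject(fields) =>
          fields.get(name).toRight(DecodingError(List(s"Missing field '$name'"))).flatMap(p.decode)
        case _ => Left(DecodingError(List("Not an object")))
      }
      def encode(a: A): JsonObject = JsonObject(Map(name -> p.encode(a)))
      val doc: Obj = Obj(List(Field(name, p.doc, description)))
    }
  def zip[A, B](pa: ObjProtocol[A], pb: ObjProtocol[B]): ObjProtocol[(A, B)] =
    new ObjProtocol[(A, B)] {
      def decode(json: Json): Either[DecodingError, (A, B)] =
        (pa.decode(json), pb.decode(json)) match {
          case (Right(a), Right(b)) => Right((a, b))
          case (Left(e1), Left(e2)) => Left(DecodingError(e1.messages ++ e2.messages))
          case (Left(e), _)         => Left(e)
          case (_, Left(e))         => Left(e)
        }
      def encode(ab: (A, B)): JsonObject =
        JsonObject(pa.encode(ab._1).fields ++ pb.encode(ab._2).fields)
      val doc: Obj = Obj(pa.doc.fields ++ pb.doc.fields)
    }
  def imap[A, B](p: ObjProtocol[A], f: A => B, g: B => A): ObjProtocol[B] =
    new ObjProtocol[B] {
      def decode(json: Json): Either[DecodingError, B] = p.decode(json).map(f)
      def encode(b: B): JsonObject = p.encode(g(b))
      val doc: Obj = p.doc
    }

  val buyArticle: ObjProtocol[BuyArticle] =
    imap(
      zip(
        field("articleId", string, Some("Article ID")),
        field("quantity", andThen(int, minValue(1)), Some("Quantity"))
      ),
      (BuyArticle.apply _).tupled,
      (b: BuyArticle) => (b.articleId, b.quantity)
    )
}
```

A single definition now drives encoding, decoding and documentation, so the field names and the `minValue(1)` constraint cannot drift apart between the codec and the doc.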

Examples (2)

Let's put in some syntactic sugar for arrows:

val buyArticle: ObjProtocol[BuyArticle] =
  imap(
    zip(
      field("articleId", string,              Some("Article ID")),
      field("quantity",  int >>> minValue(1), Some("Quantity"))
    ),
    (BuyArticle.apply _).tupled,
    (b: BuyArticle) => (b.articleId, b.quantity)
  )
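
The `>>>` operator can be provided as an extension method that simply delegates to `andThen`. The sketch below shows the mechanism on a hypothetical toy arrow `Step`, used in place of `Protocol` so that the example stays self-contained. The slides do not show how the actual syntax is implemented.

```scala
object ArrowSyntaxSketch {
  // Hypothetical toy arrow: a description plus a partial transformation
  final case class Step[A, B](doc: String, run: A => Option[B])

  def andThen[A, B, C](ab: Step[A, B], bc: Step[B, C]): Step[A, C] =
    Step(s"${ab.doc} satisfying ${bc.doc}", (a: A) => ab.run(a).flatMap(bc.run))

  // Syntactic sugar: `ab >>> bc` is rewritten to andThen(ab, bc)
  implicit class StepOps[A, B](ab: Step[A, B]) {
    def >>>[C](bc: Step[B, C]): Step[A, C] = andThen(ab, bc)
  }

  val int: Step[String, Int] =
    Step("number", (s: String) => scala.util.Try(s.toInt).toOption)

  def minValue(n: Int): Step[Int, Int] =
    Step(s"minValue($n)", (i: Int) => if (i >= n) Some(i) else None)

  val quantity: Step[String, Int] = int >>> minValue(1)
}
```

The operator adds no expressive power. It only lets a chain of steps read left to right instead of being nested.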

Examples (3)

And some syntactic sugar for cartesian invariant functors:

val buyArticle: ObjProtocol[BuyArticle] = (
  field("articleId", string,              Some("Article ID")) :*:
  field("quantity",  int >>> minValue(1), Some("Quantity"))
).mappedAs[BuyArticle]

Examples (4)

And a builder DSL for fields:

val buyArticle: ObjProtocol[BuyArticle] = (
  ("articleId" as string                meaning "Article ID") :*:
  ("quantity"  as (int >>> minValue(1)) meaning "Quantity")
).mappedAs[BuyArticle]

Usage

val command = BuyArticle(articleId = "123", quantity = 5) // BuyArticle("123", 5)
> buyArticle.decode(buyArticle.encode(command))
  Right(BuyArticle("123", 5))
> buyArticle.decode(Json.parse("""{"other": "wrong"}"""))
  Left(...)
> buyArticle.encode(command).pretty
  "
  {
    "articleId": "123",
    "quantity": 5
  }
  "
> buyArticle.doc.toJson.pretty  
  "
  {
    "articleId": {
      "type": "string"
    },
    "quantity": {
      "type": "number",
      "verifying": {
        "minValue": 1
      }
    }
  }
  "

Increase Our Agility

Deriving A Protocol[A] From Type A

Shapeless

Implement Protocol Instances For Generic ADTs

trait DerivedObjProtocol[A] {
  def protocol: ObjProtocol[A]
}

object DerivedObjProtocol extends SpecificInstances

If there's already a protocol in implicit scope, use it.


trait SpecificInstances extends GenericInstances {
  implicit def fromExistingProtocol[A](implicit
    existingProtocol: ObjProtocol[A]
  ): DerivedObjProtocol[A] = new DerivedObjProtocol[A] {
      def protocol = existingProtocol
    }
}

Otherwise derive one for representations in terms of generic HLists or Coproducts.

trait GenericInstances extends HListInstances with CoproductInstances {
  implicit def fromHListProtocol[A, Repr <: HList](implicit
    gen: Generic.Aux[A, Repr],  // proof that A is isomorphic to a HList
    hlistProtocol: DerivedObjProtocol[Repr]
  ): DerivedObjProtocol[A] = new DerivedObjProtocol[A] {
    def protocol = hlistProtocol.protocol.imap(gen.from, gen.to)
  }

  implicit def fromCoproductProtocol[A, Repr <: Coproduct](implicit
    gen: Generic.Aux[A, Repr],  // proof that A is isomorphic to a Coproduct
    coproductProtocol: DerivedObjProtocol[Repr]
  ): DerivedObjProtocol[A] = new DerivedObjProtocol[A] {
    def protocol = coproductProtocol.protocol.imap(gen.from, gen.to)
  }
}

Implement Protocol Instances For Generic ADTs (2)

Recursively build a protocol for HLists.

trait HListInstances {

  implicit val hnilProtocol: DerivedObjProtocol[HNil] = …

  implicit def hconsProtocol[H, T <: HList](implicit
    hProtocol: DerivedObjProtocol[H],
    tProtocol: DerivedObjProtocol[T]
  ): DerivedObjProtocol[H :: T] = …
}

Implement Protocol Instances For Generic ADTs (3)

Recursively build a protocol for Coproducts.

trait CoproductInstances {
  implicit val cnilProtocol: DerivedObjProtocol[CNil] = …

  implicit def cconsProtocol[L, R <: Coproduct](implicit
    leftProtocol: DerivedObjProtocol[L],
    rightProtocol: DerivedObjProtocol[R]
  ): DerivedObjProtocol[L :+: R] = …
}

Discussion

Going beyond

Using the Protocol trait, we were able to unify Codec and Doc. However, we've also tightly coupled them together.

Free applicatives, object algebras or finally tagless GADTs make it possible to define the model and its interpretation semantics independently of each other.

Revisiting the Abstract Factory

trait AbstractProduct1

trait AbstractProduct2

trait AbstractProduct3

trait AbstractFactory {
  def create1: AbstractProduct1

  def create2: AbstractProduct2

  def assemble(a: AbstractProduct1, b: AbstractProduct2): AbstractProduct3
}

Revisiting the Abstract Factory (2)

object ConcreteProduct1 extends AbstractProduct1

object ConcreteProduct2 extends AbstractProduct2

object ConcreteProduct3 extends AbstractProduct3

object ConcreteFactory extends AbstractFactory {
  def create1 = ConcreteProduct1

  def create2 = ConcreteProduct2

  def assemble(a: AbstractProduct1, b: AbstractProduct2) = ConcreteProduct3
}

Revisiting the Abstract Factory (3)

object Client {
  def orderStuff(factory: AbstractFactory) = {
    val first = factory.create1
    val second = factory.create2
    factory.assemble(first, second)
  }
}
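
Put together, the three slides form the runnable sketch below. The second factory, `LabelFactory`, and its `Labeled` product are hypothetical additions that show the client code staying unchanged when the factory is swapped.

```scala
object AbstractFactorySketch {
  trait AbstractProduct1
  trait AbstractProduct2
  trait AbstractProduct3

  trait AbstractFactory {
    def create1: AbstractProduct1
    def create2: AbstractProduct2
    def assemble(a: AbstractProduct1, b: AbstractProduct2): AbstractProduct3
  }

  object ConcreteProduct1 extends AbstractProduct1
  object ConcreteProduct2 extends AbstractProduct2
  object ConcreteProduct3 extends AbstractProduct3

  object ConcreteFactory extends AbstractFactory {
    def create1: AbstractProduct1 = ConcreteProduct1
    def create2: AbstractProduct2 = ConcreteProduct2
    def assemble(a: AbstractProduct1, b: AbstractProduct2): AbstractProduct3 = ConcreteProduct3
  }

  // Hypothetical second family of products: each one carries a label
  final case class Labeled(label: String)
    extends AbstractProduct1 with AbstractProduct2 with AbstractProduct3

  object LabelFactory extends AbstractFactory {
    def create1: AbstractProduct1 = Labeled("1")
    def create2: AbstractProduct2 = Labeled("2")
    def assemble(a: AbstractProduct1, b: AbstractProduct2): AbstractProduct3 =
      (a, b) match {
        case (Labeled(x), Labeled(y)) => Labeled(s"$x+$y")
        case _                        => Labeled("?")
      }
  }

  object Client {
    // The client only knows the abstract interface
    def orderStuff(factory: AbstractFactory): AbstractProduct3 = {
      val first = factory.create1
      val second = factory.create2
      factory.assemble(first, second)
    }
  }
}
```

The same `orderStuff` program produces a different result depending on the factory it is given, which is the property object algebras build on.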

Object algebras

Object algebras can be thought of as Abstract Factories that define the basic building blocks and the operations used to combine them.

An object algebra is also equivalent to a formal grammar with terminals and productions (after all, the L in DSL stands for Language).

trait ProtocolAlgebra[F[_]] {
  def integer: F[Int]

  def string: F[String]

  def imap[A, B](a: F[A], f: A => B, g: B => A): F[B]

  def zip[A, B](a: F[A], b: F[B]): F[(A, B)]

  ...
}

Given an object algebra, we can define a program. The actual evaluation of that program depends on the concrete algebra we will use.

// The program is parameterized by the algebra, so F stays abstract until an interpreter is chosen
def program[F[_]](algebra: ProtocolAlgebra[F]): F[(Int, String)] =
  algebra.zip(
    algebra.integer,
    algebra.string
  )
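
As a concrete illustration, the sketch below runs one program against two interpreters. Both interpreters (`DescribeAlgebra`, which ignores the type parameter and builds a textual description, and `SampleAlgebra`, which produces a default value) are hypothetical examples, and the algebra is reduced to three members.

```scala
object ObjectAlgebraSketch {
  trait ProtocolAlgebra[F[_]] {
    def integer: F[Int]
    def string: F[String]
    def zip[A, B](a: F[A], b: F[B]): F[(A, B)]
  }

  // The program only talks to the abstract algebra
  def program[F[_]](algebra: ProtocolAlgebra[F]): F[(Int, String)] =
    algebra.zip(algebra.integer, algebra.string)

  // Hypothetical interpreter 1: describe the shape as text
  type Described[A] = String
  object DescribeAlgebra extends ProtocolAlgebra[Described] {
    val integer: Described[Int] = "integer"
    val string: Described[String] = "string"
    def zip[A, B](a: Described[A], b: Described[B]): Described[(A, B)] = s"($a, $b)"
  }

  // Hypothetical interpreter 2: produce a sample value of the described type
  type Sample[A] = A
  object SampleAlgebra extends ProtocolAlgebra[Sample] {
    val integer: Sample[Int] = 0
    val string: Sample[String] = ""
    def zip[A, B](a: Sample[A], b: Sample[B]): Sample[(A, B)] = (a, b)
  }

  val description: String = program[Described](DescribeAlgebra)
  val sample: (Int, String) = program[Sample](SampleAlgebra)
}
```

The type arguments are passed explicitly here for readability. The program itself is written once and never mentions either interpreter.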

Algebra building blocks

We can decompose the different features of our algebra into mixins, which we can later combine.

trait AlgebraTerminals[F[_]] {

  def integer: F[Int]

  def string: F[String]

}
trait InvariantFunctorAlgebra[F[_]] {

  def imap[A, B](a: F[A], f: A => B, g: B => A): F[B]

}
trait ArrowAlgebra[F[_, _]] {

  def andThen[A, B, C](a: F[A, B], b: F[B, C]): F[A, C]

}

Algebra building blocks (2)

We can fully customize the expressive power we need from our algebras. Some implementations may only be possible for a limited expressive power.

type WeakProtocolAlgebra[F[_]] = AlgebraTerminals[F]
  with InvariantFunctorAlgebra[F]

type StrongerAlgebra[A, F[_, _]] = AlgebraTerminals[F[A, ?]]
  with InvariantFunctorAlgebra[F[A, ?]]
  with ArrowAlgebra[F]
  with ...

Implementing algebras

We can then have different specializations and implementations of the algebra.


trait CodecAlgebra[T] extends ProtocolAlgebra[Codec[T, ?]] {
  def imap[A, B](a: Codec[T, A], f: A => B, g: B => A): Codec[T, B] = ...

  ...
}

object JsonCodecAlgebra extends CodecAlgebra[Json] {

  val integer: Codec[Json, Int] = ...
  val string: Codec[Json, String] = ...

  ...
}

object XmlCodecAlgebra extends CodecAlgebra[Xml] {

  val integer: Codec[Xml, Int] = ...
  val string: Codec[Xml, String] = ...

  ...
}

Implementing algebras (2)

type TypedDoc[A] = Doc

object DocAlgebra extends ProtocolAlgebra[TypedDoc] {

  val integer: TypedDoc[Int] = ...
  val string: TypedDoc[String] = ...

  def imap[A, B](a: TypedDoc[A], f: A => B, g: B => A): TypedDoc[B] = ...

  ...
}

Programming a Codec in terms of an algebra:

Let's now define our model in terms of an algebra:

def buyArticle[F[_]](algebra: ProtocolAlgebra[F]): F[BuyArticle] = {
  import algebra._
  imap(
    zip(
      field("articleId", string,                        Some("Article ID")),
      field("quantity",  andThen(integer, minValue(1)), Some("Quantity"))
    ),
    (BuyArticle.apply _).tupled,
    (b: BuyArticle) => (b.articleId, b.quantity)
  )
}

Or with some syntactic sugar:

def buyArticle[F[_]](implicit algebra: ProtocolAlgebra[F]): F[BuyArticle] = (
  ("articleId" as[String]           meaning "Article ID") :*:
  ("quantity"  as[Int](minValue(1)) meaning "Quantity")
).mappedAs[BuyArticle]

Algebra usage:

We can now reuse the model for several use cases, and easily introduce new ones.

val codec: Codec[Json, BuyArticle] = buyArticle(JsonCodecAlgebra)
val doc: TypedDoc[BuyArticle] = buyArticle(DocAlgebra)


...

val graph: GraphVisualization[BuyArticle] = buyArticle(GraphAlgebra)
val bsonCodec: Codec[Bson, BuyArticle] = buyArticle(BsonAlgebra)
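
A reduced but runnable version of this idea is sketched below. It assumes that `field` is a member of the algebra (the slides elide the full member list), drops `andThen` and `minValue`, and uses two deliberately simple hypothetical interpreters: `DocAlgebra`, which lists field names and types, and `EncoderAlgebra`, which renders JSON-like text and stands in for a real codec.

```scala
object TaglessBuyArticleSketch {
  final case class BuyArticle(articleId: String, quantity: Int)

  // Assumption: `field` belongs to the algebra
  trait ProtocolAlgebra[F[_]] {
    def integer: F[Int]
    def string: F[String]
    def field[A](name: String, value: F[A]): F[A]
    def imap[A, B](a: F[A], f: A => B, g: B => A): F[B]
    def zip[A, B](a: F[A], b: F[B]): F[(A, B)]
  }

  // The model, written once against the abstract algebra
  def buyArticle[F[_]](algebra: ProtocolAlgebra[F]): F[BuyArticle] = {
    import algebra._
    imap(
      zip(field("articleId", string), field("quantity", integer)),
      (BuyArticle.apply _).tupled,
      (b: BuyArticle) => (b.articleId, b.quantity)
    )
  }

  // Interpreter 1: a flat description; the type parameter is only a phantom
  type TypedDoc[A] = List[String]
  object DocAlgebra extends ProtocolAlgebra[TypedDoc] {
    val integer: TypedDoc[Int] = List("number")
    val string: TypedDoc[String] = List("string")
    def field[A](name: String, value: TypedDoc[A]): TypedDoc[A] = value.map(t => s"$name: $t")
    def imap[A, B](a: TypedDoc[A], f: A => B, g: B => A): TypedDoc[B] = a
    def zip[A, B](a: TypedDoc[A], b: TypedDoc[B]): TypedDoc[(A, B)] = a ++ b
  }

  // Interpreter 2: renders the members of a JSON object as text (no decoding)
  type Encoder[A] = A => String
  object EncoderAlgebra extends ProtocolAlgebra[Encoder] {
    val integer: Encoder[Int] = n => n.toString
    val string: Encoder[String] = s => "\"" + s + "\""
    def field[A](name: String, value: Encoder[A]): Encoder[A] =
      a => "\"" + name + "\": " + value(a)
    def imap[A, B](a: Encoder[A], f: A => B, g: B => A): Encoder[B] = b => a(g(b))
    def zip[A, B](a: Encoder[A], b: Encoder[B]): Encoder[(A, B)] = {
      case (x, y) => a(x) + ", " + b(y)
    }
  }

  val doc: List[String] = buyArticle[TypedDoc](DocAlgebra)
  val encode: BuyArticle => String = buyArticle[Encoder](EncoderAlgebra)
}
```

Adding a new use case means adding a new interpreter. The `buyArticle` definition itself is left untouched.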

Conclusion

Questions?

Thanks for your attention.