JSON vs YAML: Which Data Format to Feed Your Web System?

Large systems commonly load definitions from data files that are closely tied to business operations. This article compares JSON and YAML, two of the most common data formats fed into web systems.

Introduction

JSON stands for JavaScript Object Notation. Originating in JavaScript, the prevailing programming language of the web, JSON has long been a de facto standard for API endpoints and became an ISO standard (ISO/IEC 21778:2017) in November 2017.

YAML, a recursive acronym for “YAML Ain’t Markup Language,” is a human-friendly data serialization standard whose current major version, YAML 1.2, was released in 2009. It is the de facto standard that sysadmins and DevOps engineers use for system configuration.

YAML is functionally a superset of JSON, and many services, such as AWS CloudFormation, officially support both formats.

Compatibility

Your web system may not be built on a single technology stack, and this is where JSON’s compatibility gives it a clear advantage. For example, Python’s standard library ships with a JSON module but no YAML parser. Even where a Python environment is too heavy, there are still excellent tools for JSON, such as jq, a JSON processor written in C with zero runtime dependencies.

Because of this broad support, a number of standardized extensions have been built on top of JSON for various purposes. JSON Schema (currently an IETF draft) uses JSON to describe and validate whether a piece of JSON data is correctly constructed, with all required fields and types. JWT (JSON Web Token, RFC 7519) encodes two JSON objects, a header and a set of claims, and signs them so that the token can be verified.
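
To make the JWT structure concrete, here is a minimal sketch using only Python's standard library. It assumes the HS256 (HMAC-SHA256) algorithm; the function names and the demo secret are hypothetical, and the sketch skips checks a real library performs, such as validating the `alg` header or expiry claims.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, the encoding JWT segments use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(header: dict, payload: dict, secret: bytes) -> str:
    """Serialize the two JSON objects and append an HMAC-SHA256 signature."""
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    digest = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(digest)


def verify_hs256(token: str, secret: bytes) -> dict:
    """Recompute the signature and return the payload only if it matches."""
    signing_input, _, signature = token.rpartition(".")
    digest = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(signature, b64url(digest)):
        raise ValueError("signature mismatch")
    payload_b64 = signing_input.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


token = sign_hs256({"alg": "HS256", "typ": "JWT"}, {"sub": "user-42"}, b"demo-secret")
claims = verify_hs256(token, b"demo-secret")
```

For anything beyond illustration, a maintained JWT library is the safer choice.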

Winner: JSON

Functionality

JSON is based on the simplest native data types in JavaScript: number, string, boolean, array, object, and null. These types, and the way they are written, cannot be extended.
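
As a quick illustration, the sketch below parses one value of each JSON type with Python's standard `json` module and records the Python type it maps to. The sample record is made up for the example. The last step shows that a value outside these types, such as infinity, is rejected when strict compliance is requested.

```python
import json

text = '{"id": 7, "ratio": 0.5, "name": "api", "active": true, "tags": ["a", "b"], "owner": null}'
record = json.loads(text)

# Map every key to the name of the Python type the parser produced.
types = {key: type(value).__name__ for key, value in record.items()}

# JSON has no infinity literal; allow_nan=False makes the encoder enforce that.
try:
    json.dumps({"limit": float("inf")}, allow_nan=False)
    infinity_rejected = False
except ValueError:
    infinity_rejected = True
```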

YAML, on the other hand, has all the bells and whistles. As a superset of JSON, it additionally offers syntax for infinity values, binary values, hexadecimal integers, sets, ordered maps, tags, and so on. One particularly useful feature is the anchor and alias pair, which lets you reuse part of a document multiple times without repeating yourself.
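
A small sketch of these extras, assuming the third-party PyYAML package is installed. The configuration keys and host names are invented for the example. It also uses the `<<` merge key, a YAML 1.1 convention that PyYAML supports, to merge an aliased mapping into another one.

```python
import yaml

document = """
defaults: &defaults        # anchor: name this mapping
  timeout: 30
  retries: 3
staging:
  <<: *defaults            # alias: reuse the anchored mapping
  host: staging.example.com
production:
  <<: *defaults
  host: www.example.com
  retries: 5               # explicit keys override merged ones
limits:
  max_size: 0x1F           # hexadecimal integer
  ceiling: .inf            # infinity
"""

config = yaml.safe_load(document)
```

After loading, `config["staging"]` carries the `timeout` and `retries` values from `defaults` even though they were written only once.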

Unlike JSON, YAML is not tied to a specific programming language, and many of its extras serve as implementation-specific features. Tags are one example: Python has no native notion of them, so it is entirely up to the application to define what they mean. AWS CloudFormation reads tags as intrinsic functions. For example:

MyAndCondition: !And
  - !Equals ["sg-mysggroup", !Ref "ASecurityGroup"]
  - !Condition SomeOtherCondition

is read as an abbreviation of:

MyAndCondition:
  Fn::And:
    - Fn::Equals:
        - sg-mysggroup
        - Ref: ASecurityGroup
    - Condition: SomeOtherCondition
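
To show how an application might give tags such a meaning, here is a rough sketch with PyYAML (assumed to be installed). It is not how CloudFormation itself is implemented. `ShortFormLoader` and `expand_short_form` are hypothetical names, and the rule that `Ref` and `Condition` keep their bare names while other tags gain an `Fn::` prefix is an assumption based on CloudFormation's documented long forms.

```python
import yaml


class ShortFormLoader(yaml.SafeLoader):
    """Hypothetical loader that expands CloudFormation-style short-form tags."""


def expand_short_form(loader, tag_suffix, node):
    # Build the tagged value according to the kind of node the tag sits on.
    if isinstance(node, yaml.ScalarNode):
        value = loader.construct_scalar(node)
    elif isinstance(node, yaml.SequenceNode):
        value = loader.construct_sequence(node, deep=True)
    else:
        value = loader.construct_mapping(node, deep=True)
    # Assumption: Ref and Condition stay unprefixed; every other tag gets "Fn::".
    key = tag_suffix if tag_suffix in ("Ref", "Condition") else "Fn::" + tag_suffix
    return {key: value}


# Register one handler for every tag that starts with "!".
ShortFormLoader.add_multi_constructor("!", expand_short_form)

short_form = """
MyAndCondition: !And
  - !Equals ["sg-mysggroup", !Ref "ASecurityGroup"]
  - !Condition SomeOtherCondition
"""
expanded = yaml.load(short_form, Loader=ShortFormLoader)
```

The same document loaded with a plain `yaml.safe_load` would fail, because the default loader has no constructor for these tags.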

In a different context, you may want to import other YAML files into your main YAML file. PyYAML lets you define such extensions yourself. Here’s an example using the open-source package py-yaml-builder:

Resources: !include
  - service/database.yaml
  - service/s3.yaml
  - service/lambda.yaml
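
For a sense of what such an extension involves, the sketch below registers an `!include` tag directly with PyYAML (assumed to be installed). It does not use or reproduce py-yaml-builder's actual API. `IncludeLoader` and `construct_include` are hypothetical names, and merging the included mappings into a single dictionary is one possible design choice. The demo files are created in a temporary directory so the example is self-contained.

```python
import os
import tempfile

import yaml


class IncludeLoader(yaml.SafeLoader):
    """Hypothetical loader that remembers the directory of the file it reads."""

    def __init__(self, stream):
        # File objects expose .name; fall back to the working directory otherwise.
        self.base_dir = os.path.dirname(getattr(stream, "name", "")) or os.getcwd()
        super().__init__(stream)


def construct_include(loader, node):
    """Load every listed file and merge the resulting mappings into one dict."""
    merged = {}
    for relative_path in loader.construct_sequence(node):
        with open(os.path.join(loader.base_dir, relative_path)) as handle:
            merged.update(yaml.load(handle, Loader=IncludeLoader))
    return merged


IncludeLoader.add_constructor("!include", construct_include)

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "service"))
    with open(os.path.join(tmp, "service", "database.yaml"), "w") as f:
        f.write("Database:\n  Type: AWS::RDS::DBInstance\n")
    with open(os.path.join(tmp, "service", "s3.yaml"), "w") as f:
        f.write("Bucket:\n  Type: AWS::S3::Bucket\n")
    with open(os.path.join(tmp, "main.yaml"), "w") as f:
        f.write("Resources: !include\n  - service/database.yaml\n  - service/s3.yaml\n")
    with open(os.path.join(tmp, "main.yaml")) as f:
        template = yaml.load(f, Loader=IncludeLoader)
```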

From this perspective, YAML opens up endless possibilities for your custom needs.

Winner: YAML

Readability

YAML can contain comments, allowing developers to include explanatory notes within the data structure. This makes YAML files more readable: people can see the purpose and context of the data right next to it, while parsers simply ignore the comments. In contrast, JSON has no native comment syntax, which makes it less expressive for documentation within the file itself. While developers often resort to external documentation for JSON, YAML’s inline comments provide a more convenient and compact way to annotate the data.
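
The difference is easy to observe from code. The sketch below, which assumes PyYAML is installed and uses a made-up `port` setting, loads a commented YAML document and then tries the same annotation in JSON with the standard `json` module.

```python
import json

import yaml

yaml_text = """
# Port the public load balancer listens on
port: 8080
"""

json_text = """
{
  // Port the public load balancer listens on
  "port": 8080
}
"""

# The YAML parser skips the comment and returns only the data.
settings = yaml.safe_load(yaml_text)

# The JSON parser treats the comment as a syntax error.
try:
    json.loads(json_text)
    json_accepts_comment = True
except json.JSONDecodeError:
    json_accepts_comment = False
```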

However, YAML tags bring a readability pitfall of their own. In the CloudFormation template language, the short form of !ImportValue cannot be used when it contains the short form of !Sub.

# Does not work
!ImportValue
  !Sub '${NetworkStack}-SubnetID'

Instead, the full function name must be used, for example:

Fn::ImportValue:
  !Sub "${NetworkStack}-SubnetID"

As the old saying goes, with great power comes great responsibility: YAML’s powerful syntax may well become overhead for your specific project.

Winner: tie

Published by

Ling YANG

Lead consultant at Studio theYANG, an independent web software consulting studio from Montreal, Canada focused on maintenance and support of Python and Linux systems.
