[Dao Data Science] An Introduction to JSON

This post gives a short introduction to JSON (JavaScript Object Notation), one of the most widely used data formats in data science and web development, inspired by a recently published ebook [1]. Its main purpose is to present JSON’s advantages, disadvantages and future, along with some related terms.

JSON, HTTP, REST and IoT

We start by introducing some popular terms in data science.

  • JavaScript Object Notation (JSON) is an open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs. It is the primary data format used for asynchronous browser/server communication (AJAX), largely replacing XML.
  • The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web.
  • Representational State Transfer (REST) is a software architecture style for building scalable web services. REST applies a coordinated set of constraints to the design of components in a distributed hypermedia system, which can lead to a higher-performing and more maintainable architecture.
  • The Internet of Things (IoT) is a network of objects (such as sensors and actuators) that can capture data autonomously and self-configure intelligently based on physical-world events, allowing these systems to become active participants in various public, commercial, scientific, and personal processes.

These terms relate to each other roughly as follows: most internet companies, especially in social media and data science, provide web-based REST(ful) APIs as core or supplementary services; most REST APIs use JSON as their data format over HTTP; and REST is central to accessing and modifying IoT resources in a uniform way.

The currently popular REST model uses URIs (Uniform Resource Identifiers) to identify objects (e.g. “/user/4321”), HTTP verbs to specify an action (e.g. “GET” or “POST”), and JSON to represent the object. To fetch an object, a client may send the HTTP request “GET /user/4321”. The server may respond with an HTTP 200 status and a body containing the relevant data in JSON format. REST is a good model for IoT: each device can easily make its state information available, and can standardize on one way to create, read, update, and delete that data.
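The request/response exchange described above can be sketched in a few lines of Python. This is only an in-memory illustration with no real network involved: the `USERS` table, the `handle_request` function and the user record are all hypothetical names and data invented for this example, not part of any real API.

```python
import json
from typing import Tuple

# Hypothetical in-memory store standing in for a real server's database.
USERS = {4321: {"id": 4321, "name": "Alice", "active": True}}


def handle_request(method: str, path: str) -> Tuple[int, str]:
    """Return (HTTP status code, JSON body) for a tiny subset of REST."""
    parts = path.strip("/").split("/")
    if len(parts) != 2 or parts[0] != "user":
        return 404, json.dumps({"error": "not found"})
    try:
        user_id = int(parts[1])
    except ValueError:
        return 404, json.dumps({"error": "not found"})
    if method != "GET":
        # Only reading is sketched here; POST, PUT and DELETE are left out.
        return 405, json.dumps({"error": "method not allowed"})
    user = USERS.get(user_id)
    if user is None:
        return 404, json.dumps({"error": "not found"})
    return 200, json.dumps(user)


# The URI names the object, the verb names the action, JSON carries the state.
status, body = handle_request("GET", "/user/4321")
print(status, body)
```

A real service would sit behind an HTTP server and would also handle the verbs for creating, updating and deleting objects, but the division of labour between URI, verb and JSON body stays the same.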

JSON’s Advantages

When Douglas Crockford, a JavaScript legend, introduced the JSON format, he wanted to specify a format that eased data interchange between web applications and JavaScript-based clients. As a lightweight alternative to XML, JSON quickly gained traction among web developers and later reached a more general audience.

Several features of JSON make it a great candidate for general purpose data interchange.

  • JSON is a text-based data format that is easy for humans to read.
  • JSON needs only a very simple schema to format data, i.e. as long as a JSON document is well-formed, it is valid.
  • JSON supports a minimal and straightforward set of data types: strings, numbers, booleans, objects, arrays, and a null value.
  • JSON data is written in a subset of JavaScript syntax, which makes it both human-readable and easy to parse. One would be hard pressed to find a popular programming language that does not have at least one JSON parser.
  • JSON has several advantages over XML, the formerly dominant data format on the web: it is cleaner, smaller, simpler (in syntax) and faster (to parse).

The following example demonstrates JSON’s data types and structure:

{
  "array": [1, 2, 3],
  "boolean": true,
  "null": null,
  "number": 123,
  "object": {
    "a": "b",
    "c": "d",
    "e": "f"
  },
  "string": "Hello World"
}
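To see how these types map into a programming language, here is a minimal sketch using Python’s standard `json` module. The document is the example above, written with the straight double quotes that JSON requires. The helper `is_well_formed` is a hypothetical name introduced only for this illustration.

```python
import json

text = """
{"array": [1, 2, 3], "boolean": true, "null": null, "number": 123,
 "object": {"a": "b", "c": "d", "e": "f"}, "string": "Hello World"}
"""

# Parsing turns each JSON value into the corresponding native Python type.
data = json.loads(text)
for key, value in data.items():
    print(key, "->", type(value).__name__)


def is_well_formed(candidate: str) -> bool:
    """Return True if the text parses as JSON, False otherwise."""
    try:
        json.loads(candidate)
    except json.JSONDecodeError:
        return False
    return True


print(is_well_formed('{"a": 1}'))   # double-quoted key: accepted
print(is_well_formed("{'a': 1}"))   # single quotes are not JSON: rejected
```

Because there is no schema to check against, well-formedness is the only test the parser applies. Whether the fields make sense for the application is left entirely to the code that consumes them.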

JSON is promoted as a low-overhead alternative to XML; both formats have widespread support for creating, reading and decoding data in the real-world situations where they are commonly used. Other data formats that provide similar functions include OGDL, YAML and CSV.

Apart from the strong support for JSON processing in almost all mainstream programming languages, there are some excellent standalone JSON processors, parsers and viewers. We recommend three of them:

  • jq: a command-line JSON processor;
  • JSONView: a browser extension that helps you view JSON documents in the browser;
  • JSON Editor Online: a web-based tool to view, edit, and format JSON. It shows your data side by side in a clear, editable tree view and in a code editor.

JSON’s Disadvantages

Although JSON is already a general-purpose data format for REST and web-based applications, it has drawbacks, and some use cases cast doubt on its suitability for the future IoT systems that make up the smart device landscape. According to [1], smart and complex IoT devices typically need to optimize along the following lines:

  • Keep traffic between and within networks small and fast, which requires messages to be as small as possible.
  • Minimize the amount of raw computation for network encoding and decoding.
  • Use only small amounts of memory and storage.

Devices may run with less than a megabyte of memory or storage, and often run on small batteries. For power consumption reasons, they may only be on the WiFi network for a few seconds at a time, and sometimes only a few times a day. Efficiency is becoming more of a key factor, especially when devices are connected together through networking. JSON is not the best candidate for meeting these requirements.

  • In terms of space efficiency, JSON is not the best encoding: all JSON data is expressed as text, often with a lot of added white space.
  • As the saying goes, “things turn into their opposites when pushed to the extreme”: the simplicity of the JSON format introduces complexity in implementations. JSON’s simple types struggle to match the types typically used in IoT programming, such as the various numeric types (single-precision floats, double-precision floats, etc.). The problem is compounded by JSON’s simple schema, i.e. an array can mix values of any type, and there are no constraints on how the fields of an object are used.
  • Interpreting a JSON data structure can also be a problem, because the fields of an object are essentially unordered (only array elements have a defined order). Strategies for efficient field-level processing therefore generally don’t work well with JSON, which means you have to parse large amounts of data and hold the results in memory.
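The space overhead is easy to measure. The sketch below encodes one made-up sensor reading three ways: pretty-printed JSON, compact JSON, and a fixed binary record built with Python’s standard `struct` module. The reading itself and the binary layout (a 16-bit id, a 32-bit float and an 8-bit humidity value) are assumptions chosen for illustration, not a layout taken from [1].

```python
import json
import struct

# Hypothetical sensor reading, invented for this comparison.
reading = {"sensor_id": 7, "temperature": 21.5, "humidity": 40}

# JSON with indentation, then with all optional white space removed.
pretty = json.dumps(reading, indent=4).encode("utf-8")
compact = json.dumps(reading, separators=(",", ":")).encode("utf-8")

# Assumed fixed layout, little-endian without padding:
# uint16 sensor id, float32 temperature, uint8 humidity.
RECORD = struct.Struct("<HfB")
packed = RECORD.pack(
    reading["sensor_id"], reading["temperature"], reading["humidity"]
)

print(len(pretty), len(compact), len(packed))

# Decoding the binary record needs the same layout on the receiving side.
sensor_id, temperature, humidity = RECORD.unpack(packed)
print(sensor_id, temperature, humidity)
```

The binary record takes 7 bytes because the field names and punctuation are not transmitted at all, and each number has an explicit width. The price is that both sides must agree on the layout in advance, whereas the JSON text describes itself but cannot say whether a number is meant to be a 32-bit or a 64-bit float.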

In practice, JSON is clearly not a good data format for some use cases. Do we have alternatives?

JSON’s Future

The author of [1] claimed that “the future of JSON is BINARY”. I agree, but would add that “the future will be JSON+BINARY”. That is, JSON is and will remain the best data format for some IoT devices and web-based applications, while binary data formats will play a more important role for devices and systems where JSON struggles. Binary encodings are better suited to constrained devices that need space efficiency, such as heaters and temperature controllers.

Some alternatives to JSON and XML are listed below:

  • The Concise Binary Object Representation (CBOR) is a data format whose design goals include the possibility of extremely small code size, fairly small message size, and extensibility without the need for version negotiation.
  • The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between a variety of programming languages.
  • Protocol Buffers (Protobuf) are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data, designed to be smaller, faster, and simpler than XML. You define how you want your data to be structured once, then use specially generated source code to easily write and read your structured data to and from a variety of data streams, using a variety of languages.

CBOR is self-describing, and the encoding is focused on producing small message sizes between devices. Both Apache Thrift and Protobuf provide binary encodings of data and have the advantage of automatically enforced schemas.
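To make “self-describing” and “small message sizes” concrete, here is a hand-rolled sketch of a CBOR encoder in Python. It is not a real library and covers only a small subset of the format (non-negative integers below 65536, text strings, arrays and maps); the function names `encode_head` and `encode` are hypothetical. Each item starts with a byte whose top three bits give its major type, so a decoder needs no external schema.

```python
import json


def encode_head(major: int, value: int) -> bytes:
    """Encode the CBOR initial byte plus argument, for values below 2**16."""
    if value < 24:
        return bytes([(major << 5) | value])       # argument fits in the byte
    if value < 256:
        return bytes([(major << 5) | 24, value])   # one extra byte follows
    if value < 65536:
        return bytes([(major << 5) | 25]) + value.to_bytes(2, "big")
    raise ValueError("value too large for this sketch")


def encode(item) -> bytes:
    """Encode non-negative ints, strings, lists and dicts as CBOR."""
    if isinstance(item, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(item, int) and item >= 0:
        return encode_head(0, item)                # major type 0: unsigned int
    if isinstance(item, str):
        raw = item.encode("utf-8")
        return encode_head(3, len(raw)) + raw      # major type 3: text string
    if isinstance(item, list):                     # major type 4: array
        return encode_head(4, len(item)) + b"".join(encode(x) for x in item)
    if isinstance(item, dict):                     # major type 5: map
        body = b"".join(encode(k) + encode(v) for k, v in item.items())
        return encode_head(5, len(item)) + body
    raise TypeError(f"unsupported type: {type(item).__name__}")


message = {"a": 1, "b": [2, 3]}
cbor_bytes = encode(message)
json_bytes = json.dumps(message, separators=(",", ":")).encode("utf-8")
print(cbor_bytes.hex(), len(cbor_bytes), len(json_bytes))
```

For this small message the CBOR encoding takes 9 bytes against 17 bytes for compact JSON. Thrift and Protobuf go a step further by moving the field names and types into a schema shared by both sides, which is what allows the schema to be enforced automatically.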

I believe that JSON is, and will remain, the most popular data format in data science and web development until more powerful cross-platform data formats emerge.

References

1. ‘DZone Guide to Internet of Things’, ebook, https://dzone.com/servlet/storage/?file=162677.

2. CBOR.io.

3. The Apache Thrift framework.

4. Protocol Buffers, Google Developers.
