Goodbye XPT files, hello Dataset-JSON!

Ever since I started working on SEND, there has been frustration with the requirement to use the SAS transport (XPT) file as the file format. Over the years, CDISC, PHUSE and FDA have all tried to push forward with alternatives. Does anyone remember experimenting with Dataset-XML?

Last year CDISC published Dataset-JSON v1.1. This is a standard for representing CDISC datasets in JSON (JavaScript Object Notation), which is a lightweight data-interchange format that is easy for machines to parse and generate and is commonly used by systems to transmit data. It claims to be human-readable, but that is mainly true in comparison with XPT, which is a binary format that needs specialist tools to read and write. JSON files can be opened in something as simple as a text editor, but I’d say they are only readable by humans like you or me as long as we really know what we are looking at. Still, these files are far more usable than the restrictive XPT files.

On the back of this, just within the past few weeks, the FDA has issued a Federal Register Notice (FRN) asking for feedback from industry on FDA’s potential adoption of Dataset-JSON v1.1. They are specifically asking for feedback regarding the impact on tools. Although not stated explicitly by the FRN, I think it would be reasonable to assume that the agency is keen on Dataset-JSON and would prefer this over the XPT files we have been tied to for years.

Working for the leading vendor and supplier of SEND tools and services, I know that Dataset-JSON v1.1 is obviously going to impact my world. So, I’m really pleased to see the agency is asking this question. There have been various noises around Dataset-JSON for a while now, and so, like any responsible vendor, we have been keeping a close eye on the development of the standard.

There have also been free open-source tools and utilities developed to help the industry experiment with Dataset-JSON and prepare for its adoption. All of this has meant that at Instem, we are in a really good place to be able to implement support for Dataset-JSON. So, at least for us and our customers, tools will be available in good time for any FDA requirement for this file format.

If you are interested in getting into the nitty-gritty of the technical details, it’s worth mentioning that the Dataset-JSON v1.1 standard actually supports two different file formats:

  • .json
  • .ndjson

The difference is that .ndjson is Newline-Delimited JSON (also known as JSON Lines) and is much better suited to streaming data.
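
For anyone who wants to see that difference in practice, here is a minimal Python sketch of reading each flavour. It is an illustration under stated assumptions rather than a reference implementation: the attribute names ("name", "columns", "rows") and the idea that the .ndjson file carries its metadata on the first line reflect my reading of Dataset-JSON v1.1 and should be checked against the published specification. The sample values and the helper names read_json and iter_ndjson are made up for this example.

```python
import io
import json

# Toy data for illustration only. The attribute names ("name", "columns",
# "rows") are assumptions about the Dataset-JSON v1.1 layout; check them
# against the published specification before relying on them.
COLUMNS = [{"name": "USUBJID"}, {"name": "BWSTRESN"}]
ROWS = [["STUDY1-001", 251.3], ["STUDY1-002", 248.9]]

# .json: a single document holding the metadata and every row.
JSON_TEXT = json.dumps({"name": "BW", "columns": COLUMNS, "rows": ROWS})

# .ndjson: metadata on the first line, then one row per line.
NDJSON_TEXT = "\n".join(
    [json.dumps({"name": "BW", "columns": COLUMNS})]
    + [json.dumps(row) for row in ROWS]
)


def read_json(stream):
    """Parse a whole .json file; every row is held in memory at once."""
    doc = json.load(stream)
    names = [col["name"] for col in doc["columns"]]
    return [dict(zip(names, row)) for row in doc["rows"]]


def iter_ndjson(stream):
    """Yield records from a .ndjson file one line at a time."""
    header = json.loads(stream.readline())
    names = [col["name"] for col in header["columns"]]
    for line in stream:
        if line.strip():  # tolerate a trailing blank line
            yield dict(zip(names, json.loads(line)))


if __name__ == "__main__":
    for record in iter_ndjson(io.StringIO(NDJSON_TEXT)):
        print(record)
```

Both readers produce the same records. The practical difference is that the .json reader has to parse the entire document before it can hand back the first row, while the .ndjson reader can start processing as soon as the first data line arrives, which is why that flavour suits streaming and very large datasets.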

Would all tools need to support both file formats, or would FDA simply require one and not the other? What if a different regulator or consortium adopts SEND but favours the other flavour of Dataset-JSON?

Those are the sorts of questions that worry me. However, my immediate focus is ensuring that all of our tools and services are fully compliant with Dataset-JSON v1.1 well ahead of its adoption by regulators.

If you would like to provide feedback to FDA, the details are in the Federal Register Notice.


‘til next time,
-Marc

Marc Ellison

Marc Ellison is the Director of SEND Solutions at Instem and has been a CDISC volunteer for 12 years. He has 3 decades of experience creating nonclinical software and working with researchers on how to best collect and organize their data. Marc refers to himself as a “SEND nerd” and is truly passionate about the concepts, debates, and evolutions around the SEND standard. Being a strong advocate for the importance of SEND in accelerating research, Marc launched his own educational blog at Instem called “Sensible SEND” to help educate and prepare researchers with cutting-edge details and explanations about the ever-developing process.
