In my previous blog post we discussed the feedback the FDA received regarding the new CDISC standard Dataset-JSON v1.1 when they issued their Federal Register Notice. One of the standout points was the difference of opinion concerning the readiness of tools. So, in today’s post, I thought we’d dive a little deeper into this topic.
CDISC hackathon tools
The first set of tools that I became aware of came out of various CDISC hackathons. The first hackathon, back in 2022, produced a range of tools, including ones covering file conversion to and from XPT. The second hackathon was more focused and aimed to create a draft API specification for Dataset-JSON. The third hackathon focused on developing Dataset-JSON Viewers. Seven strong Viewer applications were submitted, with the winning entry, VDE Dataset Viewer by Dmitry Kolosov, praised for its usability, performance, and advanced features. I personally used the VDE Dataset Viewer to test the updates to Submit™ for generating Dataset-JSON v1.1.
While these are great resources for industry, speaking with my nonclinical blinkers on, they have limited use, as they are open-source, unsupported, unvalidated tools on GitHub. In the nonclinical world, we prefer fully commercially available, supported, validated tools from trusted vendors in the space.
When JSON is not just JSON
As anyone who has started working with Dataset-JSON v1.1 will tell you, it’s not actually a single new file format. The standard calls out the use of both JSON and NDJSON as file types. JSON (JavaScript Object Notation) is a standard format for structured data using objects and arrays, while NDJSON (Newline-Delimited JSON) is a format where each line is a separate, valid JSON object—ideal for streaming large datasets. Neither CDISC nor FDA has indicated a preference between these file formats, and it’s unclear if commercially available tools would support both or would function better with a particular one. In addition, CDISC recently published the CDISC Compressed Dataset-JSON Specification v1.0, which supports another file type: Compressed JSON. Compressed JSON refers to JSON data that has been reduced in size using compression algorithms, making it faster to transmit and store. XML4Pharma has produced an open-source tool called Dataset-JSON Converter, which allows for easy conversion between JSON, NDJSON, and Compressed JSON.
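To make the difference between the file types concrete, here is a minimal Python sketch. It is only an illustration: the field names and the tiny two-row dataset are hypothetical and are not taken from the Dataset-JSON v1.1 schema, and gzip stands in for "some compression algorithm" rather than whatever the CDISC Compressed Dataset-JSON Specification actually prescribes.

```python
import gzip
import json

# Hypothetical, simplified payload. These field names are illustrative only
# and are NOT the Dataset-JSON v1.1 schema.
metadata = {"name": "DM", "columns": ["USUBJID", "SEX"]}
rows = [["STUDY1-001", "M"], ["STUDY1-002", "F"]]


def to_json(metadata, rows):
    """Serialize everything as a single JSON document."""
    return json.dumps({**metadata, "rows": rows})


def to_ndjson(metadata, rows):
    """Serialize as NDJSON: metadata on the first line, then one row per line."""
    lines = [json.dumps(metadata)] + [json.dumps(row) for row in rows]
    return "\n".join(lines) + "\n"


def read_json(text):
    """Parse the whole document in one go, then split metadata from rows."""
    doc = json.loads(text)
    parsed_rows = doc.pop("rows")
    return doc, parsed_rows


def iter_ndjson(lines):
    """Parse one line at a time, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def read_ndjson(text):
    """Read the metadata line first, then collect the remaining rows."""
    records = iter_ndjson(text.splitlines())
    parsed_metadata = next(records)
    return parsed_metadata, list(records)


def compress_text(text):
    """Generic gzip compression (an assumption, not the CDISC compressed format)."""
    return gzip.compress(text.encode("utf-8"))


def decompress_text(blob):
    """Reverse compress_text."""
    return gzip.decompress(blob).decode("utf-8")
```

The practical difference shows up in the readers: `read_json` has to parse the entire document before any row is available, whereas `iter_ndjson` can hand rows over one line at a time, which is why NDJSON tends to suit very large datasets.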
Submit™: beta testing
We recently made a beta test version of Submit™ v5.3 available to our customers who were interested in taking part in a test program. This version of Submit includes functionality for generating SEND in Dataset-JSON v1.1 from:
- Provantis
- SEND XPT
- Any Excel, CSV, or similar flat files of data
- Or any combination of the above
It also supports generating SEND XPT from Dataset-JSON for backward compatibility with systems not yet supporting Dataset-JSON.
All existing functionality, such as applying Controlled Terminology and pre-defined Edit Rules, is fully supported for Dataset-JSON v1.1.
The customers that took part in the beta test were impressed with how seamlessly we had incorporated this new output standard. They were grateful for the opportunity to experiment with Dataset-JSON while the FDA was requesting feedback. They were appreciative of how quickly we had been able to respond to the Federal Register Notice and get software to them.
Several of our customers ran the output through Pinnacle 21 using version 4.1.0. This version was intended to have some compatibility with Dataset-JSON; however, it was released over six months before Dataset-JSON v1.1 was finalized, and our customers found that it was not yet ready for Dataset-JSON v1.1. I’m sure this is something that will be rectified in the future.
Submit™: beyond beta testing
After the successful beta test for generating Dataset-JSON files, we have now included full support for Dataset-JSON across:
- SENDView – our tool for viewing and QA reviewing SEND datasets
- DefineNow – our tool for automating the generation of the define.xml file based on data found in the SEND datasets
- GuidePro – our tool for automating the generation of the nSDRG document based on data found in the SEND datasets
This functionality will be deployed for our customers to use in production in 2026.
Conclusion
There is a variety of tools available, from open-source solutions created at hackathons to validated and supported offerings from longstanding industry vendors. Our biggest unknown right now is the timing and strategy that the FDA will use to require Dataset-JSON v1.1, but I believe our nonclinical industry is in a good position for however and whenever that happens.
‘til next time
Marc