JSON is a popular format for data exchange between APIs and some modern databases. It’s also used as a way to archive audit logs in many systems. Because JSON can represent hierarchical relationships of data fields, many people consider it to be superior to the CSV format — although it’s certainly not yet universal.
I learned recently that newline-delimited JSON, also called JSONL or JSON Lines, is growing in popularity. In a JSONL file, each line of text represents a valid JSON object — building up to a series of records. But there is no hierarchical relationship among these lines, so when taken as a whole the JSONL file is not valid JSON. That is, a JSON parser can process each line individually, but it cannot process the file all at once.
In SAS, you can use PROC JSON to create valid JSON files. And you can use the JSON libname engine to parse valid JSON files. But neither of these can create or parse JSONL files directly. Here's a simple example of a JSONL file: each line, enclosed in braces, is valid JSON on its own, but if you paste the entire body into a validation tool like JSONLint, the parsing fails.
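Suppose the records come from the sashelp.class data set (the same data source used in the code later in this post). The JSONL file might look like this:

{"Name":"Alfred","Sex":"M","Age":14,"Height":69,"Weight":112.5}
{"Name":"Alice","Sex":"F","Age":13,"Height":56.5,"Weight":84}
{"Name":"Barbara","Sex":"F","Age":13,"Height":65.3,"Weight":98}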
If we need these records to be true JSON, we need a hierarchy. That means setting off the rows with additional braces and brackets and separating them with commas.
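Continuing with the same illustrative records, a valid JSON version would wrap them in an array:

[
  {"Name":"Alfred","Sex":"M","Age":14,"Height":69,"Weight":112.5},
  {"Name":"Alice","Sex":"F","Age":13,"Height":56.5,"Weight":84},
  {"Name":"Barbara","Sex":"F","Age":13,"Height":65.3,"Weight":98}
]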
Creating JSONL with PROC JSON and DATA step
In a recent SAS Support Communities thread, a SAS user was struggling to use PROC JSON and a SAS data set to create a JSONL file for use with the Amazon Redshift database. PROC JSON can’t create the finished file directly, but we can use PROC JSON to create the individual JSON object records. Our solution looks like this:
- Use PROC JSON to export each record of the source data set to its own small JSON file. A DATA step with CALL EXECUTE can generate these PROC JSON steps for us.
- Use another DATA step to post-process the collection of JSON files, trimming and appending them into a final JSONL file.
Here’s the code we used. You need to change only the output file name and the source SAS data set.
/* Build a JSONL (newline-delimited JSON) file */
/* from the records in a SAS data set          */
filename final "c:\temp\final.jsonl" ;
%let datasource = sashelp.class;

/* Create a new subfolder in WORK to hold */
/* temp JSON files, avoiding conflicts    */
options dlcreatedir;
%let workpath = %sysfunc(getoption(WORK))/json;
libname json "&workpath.";
libname json clear;

/* Will create and run a separate PROC JSON step */
/* for each record. This might take a while      */
/* for very large data.                          */
/* Each iteration will create a new JSON file    */
data _null_;
  set &datasource.;
  call execute(catt('filename out "',"&workpath./out",_n_,'.json";'));
  call execute('proc json out=out nosastags ;');
  call execute("export &datasource.(obs="||_n_||" firstobs="||_n_||");");
  call execute('run;');
run;

/* This will concatenate the collection of JSON files */
/* into a single JSONL file                           */
data _null_;
  file final encoding='utf-8' termstr=cr;
  infile "&workpath./out*.json";
  input;
  /* trim the start and end [ ] characters */
  final = substr(_infile_,2,length(_infile_)-2);
  put final;
run;
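To see why the final DATA step trims the first and last characters: with a single observation and the NOSASTAGS option, each temporary file that PROC JSON writes should contain a one-element JSON array on a single line, something like this (values illustrative):

[{"Name":"Alfred","Sex":"M","Age":14,"Height":69,"Weight":112.5}]

Removing the outer [ and ] with SUBSTR leaves the bare object, which becomes one line of the final JSONL file.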
From what I’ve read, it’s a common practice to compress JSONL files with gzip for storage or faster transfers. That’s a simple step to add to our example, because SAS 9.4 Maintenance 5 added GZIP support to the FILENAME ZIP access method. To create a gzipped final result, change the first FILENAME statement to something like:
filename final ZIP "c:\temp\final.jsonl.gz" GZIP;
The JSONL format is new to me and I haven’t needed to use it in any of my applications. If you use JSONL in your work, I’d love to hear your feedback about whether this approach would create the types of files you need.