Split/Slice large JSON using jq

Question

I would like to slice a huge JSON file (~20 GB) into smaller chunks based on array size (10000, 50000, etc.). Currently I run jq in a loop, incrementing the x/y values to get the desired output, but performance is very slow: each iteration takes 8-20 seconds to complete, depending on the size of the file …

Accepted Answer

In this response, which calls jq just once, I'm going to assume your computer has enough memory to read the entire JSON. I'll also assume you want to create separate files for each slice, and that you want the JSON to be pretty-printed in each file.

Assuming a chunk size of 2, and that the output files are to be named using the template part-N.json, you could write:

    < input.json jq -r --argjson size 2 '
      del(.add) as $object
      | (.add|_nwise($size) | ("\t", $object + {add:.} ))
    ' | awk '
          /^\t/ {fn++; next}
          { print >> ("part-" fn ".json") }
    '

The trick being used here is that jq's output never contains a line starting with a literal tab character: tabs inside JSON strings are always escaped as \t, and jq indents with spaces by default. A line that begins with a raw tab can therefore only be the "\t" separator that the filter emits before each chunk, and awk uses it as the signal to move on to the next output file.
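To make the behaviour of the pipeline above concrete, here is a small end-to-end sketch. The sample input is hypothetical (the question's actual data is not shown); it only assumes a top-level object whose add key holds the array to slice, which is what the filter expects. The awk part is a slight variation of my own, not the original: it keeps the file name in a variable and calls close() on each finished file, which should avoid running into the open-file limit when a large input yields thousands of chunks.

```shell
# Work in a scratch directory so repeated runs do not mix output files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: one extra key plus a five-element "add" array.
printf '%s\n' '{"commit": true, "add": [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}, {"id": 5}]}' > input.json

# jq emits a raw tab line before each chunk; awk starts a new file on every tab line.
< input.json jq -r --argjson size 2 '
  del(.add) as $object
  | (.add | _nwise($size) | ("\t", $object + {add: .}))
' | awk '
  /^\t/ { if (out) close(out); fn++; out = "part-" fn ".json"; next }
  { print > out }
'

ls part-*.json
```

With a chunk size of 2 and five array elements, this should leave three files: part-1.json and part-2.json with two elements each, and part-3.json with the remaining one. Each file repeats the non-array keys (here commit) alongside its slice of add. For the sizes mentioned in the question, change --argjson size 2 to 10000 or 50000.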

Answer