I am looking for a way to use grep on a Linux server to find duplicate JSON records. Is it possible to have a grep that searches for duplicate ids in the example below?
So the grep would return: 01
{ "book": [ { "id": "01", "language": "Java", "edition": "third", "author": "Herbert Schildt" }, { "id": "02", "language": "Java", "edition": "third", "author": "Herbert Schildt" }, { "id": "03", "language": "Java", "edition": "third", "author": "Herbert Schildt" }, { "id": "01", "language": "Java", "edition": "third", "author": "Herbert Schildt" }, { "id": "04", "language": "C++", "edition": "second", "author": "E.Balagurusamy" } ] }
Answer
OK, discarding any whitespace from the JSON strings, I can offer this if awk is acceptable, with hutch being the formatted chunk of JSON above in a file.
I use tr to remove any whitespace and use , as the field separator in awk; I iterate over the elements of the resulting single long line with a for loop, do some pattern matching in awk to isolate the id fields, and increment an array entry for each matched id. At the end of processing I iterate over the array and print the ids that have more than one match.
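Unrolled for readability, that logic looks like this. This is just a sketch of the one-liner shown further down, not a different method; note that gensub() is a GNU awk extension, so gawk is assumed:

$ tr -d '[:space:]' <hutch | awk -F, '
{
    # walk every comma-separated field of the single squashed line
    for (i = 1; i <= NF; i++) {
        # only fields carrying an "id" key are of interest
        if ($i ~ /"id":/) {
            # keep just the digits of the id and bump its counter
            a[gensub(/^.*"id":"([0-9]+)"$/, "\\1", "1", $i)]++
        }
    }
}
END {
    # print every id that was seen more than once
    for (i in a)
        if (a[i] > 1)
            print i
}'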
Here is your data:
$ cat hutch
{
  "book": [
    { "id": "01", "language": "Java", "edition": "third", "author": "Herbert Schildt" },
    { "id": "02", "language": "Java", "edition": "third", "author": "Herbert Schildt" },
    { "id": "03", "language": "Java", "edition": "third", "author": "Herbert Schildt" },
    { "id": "01", "language": "Java", "edition": "third", "author": "Herbert Schildt" },
    { "id": "04", "language": "C++", "edition": "second", "author": "E.Balagurusamy" }
  ]
}
And here is the finding of the dupes:
$ tr -d '[:space:]' <hutch | awk -F, '{for(i=1;i<=NF;i++){if($i~/"id":/){a[gensub(/^.*"id":"([0-9]+)"$/, "\\1","1",$i)]++}}}END{for(i in a){if(a[i]>1){print i}}}'
01
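And since the question explicitly asked about grep: a grep-based pipeline can surface the duplicate ids as well. This is a sketch that assumes grep with -o support (GNU grep has it) and that each "id": "NN" pair sits on a single line of the formatted file:

$ grep -o '"id": *"[0-9]*"' hutch | sort | uniq -d | cut -d'"' -f4
01

grep -o prints each matching "id": "NN" fragment on its own line, sort groups identical fragments together, uniq -d keeps only those that occur more than once, and cut peels the digits out from between the quotes.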