
Using grep to find duplicate IDs within a JSON file

I am looking for a way to use grep on a Linux server to find duplicate JSON records. Is it possible to use grep to search for duplicate IDs in the example below?

For this example, the grep should return: 01

{
 "book": [

  {
     "id": "01",
     "language": "Java",
     "edition": "third",
     "author": "Herbert Schildt"
  },
  {
     "id": "02",
     "language": "Java",
     "edition": "third",
     "author": "Herbert Schildt"
  },
  {
     "id": "03",
     "language": "Java",
     "edition": "third",
     "author": "Herbert Schildt"
  },
  {
     "id": "01",
     "language": "Java",
     "edition": "third",
     "author": "Herbert Schildt"
  },

  {
     "id": "04",
     "language": "C++",
     "edition": "second",
     "author": "E.Balagurusamy"
  }

 ]
}


Answer

If awk is acceptable and you don't mind discarding all whitespace from the JSON (including the spaces inside string values), I can offer the following. Here, hutch is a file containing the formatted JSON from the question.

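I use tr to delete all whitespace, which collapses the file into one long line, and tell awk to use , as the field separator. A for loop then walks over the fields of that line, a pattern match picks out the "id" fields, and each ID extracted from them increments a counter in an array. At the end of processing, I loop over the array and print every ID that was seen more than once. Note that gensub() is a GNU awk (gawk) extension, so the command needs gawk rather than mawk or BusyBox awk.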
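The steps above can be written out as a commented script. This is a sketch rather than the answer's exact command: it swaps GNU awk's gensub() for two plain sub() calls, so it should also run under mawk or BusyBox awk. It keeps the same assumption that no value contains a comma, and it wraps the pipeline in a hypothetical function, dup_ids_awk, which is run here against a trimmed copy of the sample data.

```shell
#!/bin/sh
# dup_ids_awk FILE (hypothetical helper): the tr + awk approach, using sub()
# instead of gensub() so it does not depend on GNU awk.
dup_ids_awk() {
  # Delete all whitespace so the document becomes one long line, then let awk
  # split that line on commas.
  tr -d '[:space:]' < "$1" | awk -F, '
  {
    for (i = 1; i <= NF; i++) {
      # Only fields holding an "id" key are of interest.
      if ($i ~ /"id":"/) {
        id = $i
        sub(/^.*"id":"/, "", id)  # remove everything up to the value
        sub(/".*$/, "", id)       # remove the closing quote and the rest
        count[id]++
      }
    }
  }
  END {
    # Print every id that was seen more than once.
    for (k in count)
      if (count[k] > 1)
        print k
  }'
}

# Trimmed copy of the question's data: same ids, fewer fields.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
{
 "book": [
  { "id": "01", "language": "Java" },
  { "id": "02", "language": "Java" },
  { "id": "03", "language": "Java" },
  { "id": "01", "language": "Java" },
  { "id": "04", "language": "C++" }
 ]
}
EOF

dup_ids_awk "$sample"
```

Against the real file you would run dup_ids_awk hutch. When several IDs are duplicated, the order in which they are printed is unspecified, because awk's for (k in count) loop does not guarantee an order.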

Here is your data:

$ cat hutch 
{
 "book": [

  {
     "id": "01",
     "language": "Java",
     "edition": "third",
     "author": "Herbert Schildt"
  },
  {
     "id": "02",
     "language": "Java",
     "edition": "third",
     "author": "Herbert Schildt"
  },
  {
     "id": "03",
     "language": "Java",
     "edition": "third",
     "author": "Herbert Schildt"
  },
  {
     "id": "01",
     "language": "Java",
     "edition": "third",
     "author": "Herbert Schildt"
  },

  {
     "id": "04",
     "language": "C++",
     "edition": "second",
     "author": "E.Balagurusamy"
  }

 ]
}

And here is how to find the duplicates:

$ tr -d '[:space:]' <hutch | awk -F, '{for(i=1;i<=NF;i++){if($i~/"id":/){a[gensub(/^.*"id":"([0-9]+)"$/, "\\1", "1", $i)]++}}}END{for(i in a){if(a[i]>1){print i}}}'
01
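Since the question asked about grep specifically: grep alone has no notion of duplicates, but combined with sed, sort and uniq -d it can produce the same result. The following is a minimal sketch, assuming each "id": "value" pair sits on a single line (as in the pretty-printed sample) and that an "id" key at any nesting depth should be counted. The function name find_dup_ids and the temporary sample file are illustrative only. grep -o is not in POSIX, but GNU and BSD grep both support it.

```shell
#!/bin/sh
# find_dup_ids FILE (hypothetical helper): print each "id" value that occurs
# more than once. Assumes every "id": "value" pair sits on a single line.
find_dup_ids() {
  # grep -o prints only the matched "id": "value" text, one match per line;
  # sed keeps just the value; uniq -d needs sorted input and prints each
  # repeated line once.
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" |
    sed -E 's/.*:[[:space:]]*"([^"]*)"$/\1/' |
    sort |
    uniq -d
}

# Trimmed copy of the question's data: same ids, fewer fields.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
{
 "book": [
  { "id": "01", "language": "Java" },
  { "id": "02", "language": "Java" },
  { "id": "03", "language": "Java" },
  { "id": "01", "language": "Java" },
  { "id": "04", "language": "C++" }
 ]
}
EOF

find_dup_ids "$sample"
```

Unlike the tr-based command, this leaves whitespace inside values untouched and does not rely on splitting at commas, but it will miss an "id" pair that has been broken across lines.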
User contributions licensed under: CC BY-SA