How to grep for a pattern in the files in tar archive without filling up disk space

I have a very large tar archive (~5 GB).

I want to grep for a pattern in all of the files in the archive, and also print the name of each file that contains the pattern, but I do not want to fill up my disk by extracting the archive.

Is there any way I can do that?

I tried these, but they do not give me the names of the files that contain the pattern, just the matching lines:

tar -O -xf test.tar.gz | grep 'this'
tar -xf test.tar.gz --to-command='grep awesome'

Also where is this feature of tar documented? tar xf test.tar $FILE

Answer

Here’s my take on this:

while IFS= read -r filename; do tar -xOf file.tar "$filename" | grep 'pattern' | sed "s|^|$filename:|"; done < <(tar -tf file.tar | grep -v '/$')

Broken out for explanation:

  • while IFS= read -r filename; do – it’s a loop, reading one member name per line (IFS= and -r keep whitespace and backslashes in the name intact)…
  • tar -xOf file.tar "$filename" – this extracts each file to standard output rather than to disk…
  • | grep 'pattern' – here’s where you put your pattern…
  • | sed "s|^|$filename:|"; – prepend the filename, so this looks like grep. Salt to taste.
  • done < <(tar -tf file.tar | grep -v '/$') – end the loop, and get the list of files (minus directories) to feed to your while read.
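
To try the loop without touching a real archive, here is a minimal sketch that builds a throwaway tar file first. The directory layout and file names (src/a.txt, src/b.txt, demo.tar) are made up for illustration, and the loop is written with a plain pipe instead of process substitution so that it should also run under a POSIX sh:

```shell
# Build a tiny throwaway archive (hypothetical file names) to try the loop on.
workdir=$(mktemp -d)
mkdir "$workdir/src"
printf 'alpha\nthis is a match\n' > "$workdir/src/a.txt"
printf 'nothing here\n'           > "$workdir/src/b.txt"
tar -cf "$workdir/demo.tar" -C "$workdir" src

# List the members (skipping directories), stream each one to stdout,
# grep it, and prefix every matching line with the member name.
result=$(
  tar -tf "$workdir/demo.tar" | grep -v '/$' |
  while IFS= read -r filename; do
    tar -xOf "$workdir/demo.tar" "$filename" | grep 'this' | sed "s|^|$filename:|"
  done
)
printf '%s\n' "$result"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Only src/a.txt contains the pattern, so the single line printed should be that file's matching line with its member name in front.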

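One proviso: this breaks if a filename contains a pipe character (|), because | is the delimiter in the sed expression. An & or a backslash in a filename gets mangled too, since sed treats both specially in the replacement text.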
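
One way around that proviso, sketched here under the assumption that your grep supports --label (GNU grep does), is to let grep print the name itself: -H forces a file-name prefix and --label sets the name used for standard input, so the filename never passes through sed. The awkward file name below is invented purely to exercise the problem case:

```shell
# Throwaway archive with a deliberately awkward (hypothetical) member name.
workdir=$(mktemp -d)
mkdir "$workdir/src"
printf 'find this line\n' > "$workdir/src/odd|name&.txt"
tar -cf "$workdir/demo.tar" -C "$workdir" src

result=$(
  tar -tf "$workdir/demo.tar" | grep -v '/$' |
  while IFS= read -r filename; do
    # --label names stdin in grep's output, so no sed substitution is needed
    # and characters such as | and & in the name are printed untouched.
    tar -xOf "$workdir/demo.tar" "$filename" | grep -H --label="$filename" 'this'
  done
)
printf '%s\n' "$result"

# Clean up the scratch directory.
rm -rf "$workdir"
```

The output format is the same name:line pairing as before; the difference is only in who adds the prefix.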

Hmm. In fact, this makes a nice little bash function, which you can append to your .bashrc file:

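targrep() {

  local taropt=""
  local pattern="$1"
  local archive filename

  if [[ $# -lt 2 ]]; then
    echo "Usage: targrep pattern file ..." >&2
    return 1
  fi

  # Drop the pattern so that "$@" holds only the archive names.
  shift

  for archive in "$@"; do

    if [[ ! -f "$archive" ]]; then
      echo "targrep: $archive: No such file" >&2
      continue
    fi

    # Pick a decompression flag from the file extension.
    case "$archive" in
      *.tar.gz) taropt="-z" ;;
      *) taropt="" ;;
    esac

    # Stream each member to stdout and prefix matching lines with its name.
    # $taropt is left unquoted on purpose so an empty value expands to nothing.
    while IFS= read -r filename; do
      tar $taropt -xOf "$archive" "$filename" \
        | grep -- "$pattern" \
        | sed "s|^|$filename:|"
    done < <(tar $taropt -tf "$archive" | grep -v '/$')

  done
}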
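
As for the --to-command attempt in the question: GNU tar sets the environment variable TAR_FILENAME for the command it runs, so grep can be told the member name directly and tar only has to read the archive once. The following is a sketch that assumes GNU tar and a grep with --label; other tar implementations may not offer --to-command at all. The archive and file names are made up for the demo:

```shell
# Throwaway gzip-compressed archive containing two hypothetical files.
workdir=$(mktemp -d)
mkdir "$workdir/src"
printf 'this is awesome\n' > "$workdir/src/a.txt"
printf 'nothing here\n'    > "$workdir/src/b.txt"
tar -czf "$workdir/test.tar.gz" -C "$workdir/src" a.txt b.txt

# GNU tar pipes each file's contents to the command and exports
# TAR_FILENAME with the member name. The single quotes keep our own shell
# from expanding the variable; the trailing "; true" hides grep's non-zero
# exit status for members that contain no match.
result=$(tar -xzf "$workdir/test.tar.gz" \
  --to-command='grep -H --label="$TAR_FILENAME" awesome; true')
printf '%s\n' "$result"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Compared with the loop, this avoids re-reading the archive once per member, which matters for a compressed archive of several gigabytes.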
User contributions licensed under: CC BY-SA