I have a very large tar archive (about 5 GB).
I want to grep for a pattern in all of the files in the archive, and also print the name of each file that contains the pattern, but I do not want to fill up my disk by extracting the archive.
Is there any way I can do that?
I tried these, but they only give me the matching lines, not the names of the files that contain the pattern:
tar -O -xf test.tar.gz | grep 'this'
tar -xf test.tar.gz --to-command='grep awesome'
Also, where is this feature of tar (extracting a single named file from the archive) documented?
tar xf test.tar $FILE
Answer
Here’s my take on this:
while read filename; do tar -xOf file.tar "$filename" | grep 'pattern' | sed "s|^|$filename:|"; done < <(tar -tf file.tar | grep -v '/$')
Broken out for explanation:
while read filename; do
- it's a loop…
tar -xOf file.tar "$filename"
- this extracts each file to standard output…
| grep 'pattern'
- here's where you put your pattern…
| sed "s|^|$filename:|";
- prepend the filename, so this looks like grep. Salt to taste.
done < <(tar -tf file.tar | grep -v '/$')
- end the loop, and get the list of files (minus directories) to feed to your while read.
One proviso: this breaks if you have OR bars (|) in your filenames, because | is the delimiter in the sed expression.
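One possible way around that proviso is to let grep add the prefix itself instead of piping through sed. The sketch below assumes a grep that supports `--label` (GNU grep does); `targrep_label` and the scratch file names are hypothetical, chosen only for the demo. It pipes the file listing into the loop rather than using process substitution, which does the same job here.

```shell
#!/usr/bin/env bash
# Sketch: same idea as the loop above, but grep prints the "name:" prefix,
# so a | in a filename no longer collides with a sed delimiter.
# Assumes grep supports --label (GNU grep does).
targrep_label() {
  local pattern="$1" archive="$2" filename
  tar -tf "$archive" | grep -v '/$' | while IFS= read -r filename; do
    # -H forces the name prefix; --label supplies the name used for stdin.
    # "|| true" ignores grep's exit status 1 for members with no match.
    tar -xOf "$archive" "$filename" \
      | grep -H --label="$filename" -- "$pattern" || true
  done
}

# Demo on a scratch archive that contains a file with a | in its name.
demo_dir=$(mktemp -d)
printf 'this is awesome\nnothing here\n' > "$demo_dir/a|b.txt"
printf 'no match\n' > "$demo_dir/plain.txt"
tar -cf "$demo_dir/demo.tar" -C "$demo_dir" 'a|b.txt' plain.txt
targrep_label awesome "$demo_dir/demo.tar"
```

Like the original loop, this runs tar once per archive member, so it can be slow on archives with many files.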
Hmm. In fact, this makes a nice little bash function, which you can append to your .bashrc file:
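targrep() {
  local taropt="" pattern="$1" archive filename
  if [[ ! -f "$2" ]]; then
    echo "Usage: targrep pattern file ..." >&2
    return 1
  fi
  shift   # drop the pattern; the remaining arguments are archives
  for archive in "$@"; do
    if [[ ! -f "$archive" ]]; then
      echo "targrep: $archive: No such file" >&2
      continue
    fi
    case "$archive" in
      *.tar.gz) taropt="-z" ;;
      *) taropt="" ;;
    esac
    while IFS= read -r filename; do
      # extract one member to stdout, grep it, and prefix matches with its name
      tar $taropt -xOf "$archive" "$filename" \
        | grep -- "$pattern" \
        | sed "s|^|$filename:|"
    done < <(tar $taropt -tf "$archive" | grep -v '/$')
  done
}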
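For what it's worth, the `--to-command` attempt from the question can probably be made to print file names too. The sketch below assumes GNU tar, which runs the command once per regular file and sets `TAR_FILENAME` in its environment, and a grep with `--label`. `targrep_tocommand` and `TARGREP_PATTERN` are hypothetical names used only here; the pattern is passed through the environment to avoid nested quoting.

```shell
#!/usr/bin/env bash
# Sketch: single pass over the archive using GNU tar's --to-command.
# Assumes GNU tar (--to-command, TAR_FILENAME) and a grep with --label.
targrep_tocommand() {
  # TARGREP_PATTERN is a hypothetical variable; the child command inherits it.
  # "|| true" hides grep's exit status 1 on members without a match,
  # which tar would otherwise report as a failed child.
  TARGREP_PATTERN="$1" tar -xf "$2" \
    --to-command='grep -H --label="$TAR_FILENAME" -- "$TARGREP_PATTERN" || true'
}

# Demo on a small scratch archive.
demo_dir=$(mktemp -d)
printf 'this is awesome\n' > "$demo_dir/one.txt"
printf 'nothing to see\n' > "$demo_dir/two.txt"
tar -czf "$demo_dir/test.tar.gz" -C "$demo_dir" one.txt two.txt
targrep_tocommand awesome "$demo_dir/test.tar.gz"
```

Because tar reads the archive only once here, this should scale better than the per-file loop on a multi-gigabyte archive, at the cost of depending on GNU-specific options.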