My system depends on only having one file (PDF, DOCX) per subdirectory. There are thousands and thousands of subdirectories. Due to a permission error, in some of them, I have ended up with more than one file. In these instances, I only want to keep the one most recently modified file.
I was able to export a list of directories that contain more than one file successfully:
find . -type f -printf '%h\n' | sort | uniq -d >test.txt
So I end up with a nice list of all those directories that I need to look at. But it’s rather long.
I was also able to automate the deletion of everything but the most recently modified file in a directory:
ls -t | tail -n +2 | xargs -d '\n' rm -f
That does remove all files but the most recently modified one.
The problem I am running into is that the second command only works within that directory. I have not figured out a way to apply it recursively to all directories.
I have attempted:
find /data/test/CONTAINER/SANDBOX -type f -exec sh -c 'ls -t | tail -n +2 | xargs -d '\n' rm -f ' {} \;
but that just yielded xargs: argument line too long
I have tried to adjust the xargs parameters, but I am sure there must be a better way to do this. Perhaps a shell script that reads test.txt (the list of folders), cds into each one, and runs command two there (a rough sketch of what I mean is below)? Or simply a way to recursively apply command two to all subfolders, regardless of how many files each contains?
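Something like this is what I had in mind (just an untested sketch, assuming test.txt contains one directory path per line and none of the paths contain newlines):

#!/bin/bash
# Untested sketch: read each directory from test.txt and remove everything
# except the most recently modified file in it.
while read -r dir; do
    (
        cd "$dir" || exit            # skip directories we cannot enter
        ls -t | tail -n +2 | xargs -d '\n' rm -f
    )
done < test.txt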
The last thing I was wondering is whether the third command I tried runs from the main directory, where I have hundreds of thousands of directories; that would explain why the argument line could be too long. However, adding -mindepth 2 didn't change a thing.
Thank you
Answer
I think the following script should do the trick for you.
#!/bin/bash

# Top-level directory to clean up
DIR_TO_FIND="/path/to/dir"

# Visit every directory below it and delete all but the most recently modified file
find "$DIR_TO_FIND" -type d | while read -r DIR; do
    cd "$DIR" || continue
    ls -t | tail -n +2 | xargs -d '\n' rm -f
    cd "$DIR_TO_FIND"
done
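If you want to sanity-check it first, here is a dry-run variant of the loop (my addition, not part of the script above) that only prints what would be removed; the -r flag is a GNU xargs extension that skips running the command in directories where there is nothing to delete:

find "$DIR_TO_FIND" -type d | while read -r DIR; do
    cd "$DIR" || continue
    # Print candidates instead of deleting them
    ls -t | tail -n +2 | xargs -d '\n' -r -I{} echo "would remove: $DIR/{}"
done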