How to get filenames and md5sum from google bucket in csv format with gsutil ls

Question

I am trying to get all files from Google Cloud Storage together with their md5sum, all as CSV. Condition: run it from bash and use only Linux commands.

When I run gsutil ls, it returns YAML as a stream. What I'd like to see instead is CSV output with the filename and its md5sum.

Accepted Answer

With Docker:

    docker run --name gcloud-gsutil-vladimir --rm --volumes-from gcloud-config -i google/cloud-sdk:latest gsutil ls -L -r gs://some-bucket/subfolder/** | egrep "gs.*@@.*jpg|md5.*" | tr -d '\n' | tr -s '=' '\n' | sed 's/Hash (md5)://' | sed 's/$/==/g' | sed 's/: //' | tr -s ' ' ','

Or use gsutil directly if installed:

    gsutil ls -L -r gs://some-bucket/subfolder/** | egrep "gs.*@@.*jpg|md5.*" | tr -d '\n' | tr -s '=' '\n' | sed 's/Hash (md5)://' | sed 's/$/==/g' | sed 's/: //' | tr -s ' ' ','

Steps:

1. Run gsutil and pipe it to egrep to keep only the lines with the filename and the md5sum.
2. Remove all newline characters from the stream with tr -d '\n'.
3. Lean on the '==' that ends each md5sum to put back the newlines we actually need, with tr -s '=' '\n'.
4. Optionally remove other things such as "Hash (md5):".
5. Use sed to restore the removed "==" at the end of each line: sed 's/$/==/g'.
6. Remove ': ' (the colon and space in ".jpg: ").
7. Finally, replace all spaces with a comma with tr -s ' ' ','.

This is the one-liner that I've been looking for. It works, but it could probably be achieved with fewer steps and fewer tools. I know this can be done with Python, Perl and whatnot, but I would be happy to see other "one-liner" approaches.
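As one possible shorter alternative to the pipeline above, here is a minimal sketch that does the same job in a single awk call. It assumes the shape of `gsutil ls -L` output implied by the answer: each object starts with a line like `gs://bucket/name:` and is followed by an indented `Hash (md5):` line whose last field is the hash. The function name `to_csv` is hypothetical, and unlike the original it does not filter on `.jpg`.

```shell
# to_csv: hypothetical helper, not part of gsutil.
# Reads `gsutil ls -L` style text on stdin and prints "name,md5" lines.
to_csv() {
  awk '
    # Object header line, e.g. "gs://bucket/file.jpg:" -> remember the name
    /^gs:\/\/.*:$/ { name = $0; sub(/:$/, "", name) }
    # Hash line -> emit one CSV row; the hash is the last whitespace-separated field
    /Hash \(md5\):/ { print name "," $NF }
  '
}
```

Assuming gsutil is installed, it could be used as `gsutil ls -L -r 'gs://some-bucket/subfolder/**' | to_csv`. Because the name is taken from the whole header line, object names containing spaces stay intact, which the `tr -s ' ' ','` step would otherwise break.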

