How to get filenames and md5sum from google bucket in csv format with gsutil ls

I am trying to list all files in a Google Cloud Storage bucket, together with their md5sums, as CSV.

Condition: it has to run from bash and use only Linux commands.

When I run this:

JavaScript

It returns yaml as a stream:

JavaScript

What I’d like to see is this:

JavaScript

Answer

With Docker:

JavaScript

Or use gsutil directly if installed:

JavaScript

Steps:

  1. Run gsutil and pipe its output to egrep to keep only the lines containing the filename and the md5sum
  2. Remove all newline characters from the stream with tr -d '\n'
  3. Rely on every md5sum ending in ‘==’ to put back the newlines we actually need, with tr -s '=' '\n'
  4. Optionally remove other text such as “Hash (md5):”
  5. Use sed to restore the removed “==” at the end of each line: sed 's/$/==/g'
  6. Remove ‘: ‘ (the colon and the space after it, as in “.jpg: “)
  7. Finally, replace each run of spaces with a single comma with tr -s ' ' ','
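
The steps above can be sketched as a small pipeline. This is a reconstruction from the step list, not the original one-liner: the grep pattern, the function names, the bucket name and the sample hashes are all assumptions, and the sample text only imitates the shape of `gsutil ls -L` output (an object line ending in a colon, followed by indented metadata lines).

```shell
#!/usr/bin/env bash

# Hypothetical sample shaped like `gsutil ls -L gs://my-bucket` output.
sample_listing() {
  cat <<'EOF'
gs://my-bucket/a.jpg:
    Creation time:          Mon, 01 Jan 2024 00:00:00 GMT
    Hash (crc32c):          AAAAAA==
    Hash (md5):             1B2M2Y8AsgTpgAmY7PhCfg==
gs://my-bucket/b.jpg:
    Creation time:          Mon, 01 Jan 2024 00:00:00 GMT
    Hash (crc32c):          BBBBBB==
    Hash (md5):             XrY7u+Ae7tCTyyK7j1rNww==
EOF
}

# Steps 1-7 from the list above, reading the listing on stdin.
# Assumes object names contain no spaces and no '=' characters.
listing_to_csv() {
  grep -E '^gs://|Hash \(md5\)' |  # 1. keep filename and md5 lines only
    tr -d '\n' |                   # 2. join everything into one line
    tr -s '=' '\n' |               # 3. the trailing '==' becomes one newline
    sed -e 's/Hash (md5)://' \
        -e 's/$/==/' \
        -e 's/: //' |              # 4-6. drop label, restore '==', drop ': '
    tr -s ' ' ','                  # 7. runs of spaces become one comma
}

sample_listing | listing_to_csv
```

With a real bucket, the input would come from something like `gsutil ls -L gs://my-bucket | listing_to_csv` instead of the sample function. The trick in step 3 works because a base64-encoded md5 always ends in `==`.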

This is the one-liner I was looking for. It works, but it could probably be done in fewer steps and with fewer tools.

I know this can be achieved with Python, Perl and whatnot, but I would be happy to see other “one-liner” approaches.
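
One possible shorter variant, offered only as a sketch, is a single awk call that remembers the most recent object line and prints it when the md5 line arrives. It assumes the same `gsutil ls -L` layout as above; the function name, the bucket name and the sample hash are made up for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical sample shaped like `gsutil ls -L` output.
sample_listing() {
  cat <<'EOF'
gs://my-bucket/a.jpg:
    Creation time:          Mon, 01 Jan 2024 00:00:00 GMT
    Hash (md5):             1B2M2Y8AsgTpgAmY7PhCfg==
EOF
}

# Remember the object name (minus its trailing colon), then print
# "name,md5" when the matching "Hash (md5):" line is seen.
listing_to_csv_awk() {
  awk '
    /^gs:\/\//     { sub(/:$/, ""); name = $0; next }
    /Hash \(md5\)/ { print name "," $NF }
  '
}

sample_listing | listing_to_csv_awk
```

Because the name is taken from the whole line rather than split on spaces, this version should also cope with object names that contain spaces, which the tr-based pipeline does not.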