How to parse custom Linux stdout into JSON with Python?

I’m a beginner in Python and I’m trying to write a script that takes a file of captured Unix stdout and converts it into a JSON file. Every line of the file has the following format:

inodeNumber fileSize ownerName pathToFile

The path can contain whitespace and backslash characters, for example:

236342512 200 George usr/temp/a path/random1.txt

My problem is that if I use the split(" ") method and store the pieces in a dictionary, the whitespace in the file path splits the path into more than one key-value pair. I have thought of encoding the line first, but that won’t solve the problem, because the space in the path would be encoded too.

The JSON format I am trying to get is as follows:

{
   "files": [{
       "inodeNumber": "236342512",
       "fileSize": "200",
       "ownerName": "George",
       "pathToFile": "usr/temp/a path/random1.txt"
    },
    {...}]
}

Also, is the best way to convert this custom Unix stdout file into JSON to store each attribute as a key-value pair in a Python dictionary, then create a JSON object and dump it into a file? I will be working with very large files (over 1 GB each), so performance needs to be taken into consideration too.

Thanks in advance!

Answer

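I’d use this method to parse the line, as it does not assume anything about the path: the first three space-separated fields are taken as they are, and everything after them is treated as the path.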

s = r'236342512 200 George usr/temp/a path/random1.txt'

def parseLine(s):
    # Split on the first three spaces only, so any spaces
    # inside the path stay part of the fourth field.
    inode, size, owner, path = s.split(' ', 3)
    D = {}
    D['inodeNumber'] = inode
    D['fileSize'] = size
    D['ownerName'] = owner
    D['pathToFile'] = path
    return D

print(parseLine(s))

For your example, it gives this output:

{'inodeNumber': '236342512', 'fileSize': '200', 'ownerName': 'George', 'pathToFile': 'usr/temp/a path/random1.txt'}
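
For the large-file part of the question, one option is to write the JSON incrementally rather than building one huge dictionary and dumping it at the end. The following is a minimal sketch, not benchmarked. The names parse_line and convert are made up for illustration, the second sample line is invented test data, and it assumes every non-blank line has at least four space-separated fields.

```python
import io
import json


def parse_line(line):
    # Split on the first three spaces only; the remainder is the path.
    inode, size, owner, path = line.rstrip("\n").split(" ", 3)
    return {
        "inodeNumber": inode,
        "fileSize": size,
        "ownerName": owner,
        "pathToFile": path,
    }


def convert(src, dst):
    """Stream records from src to dst as {"files": [...]}, one line at a time."""
    dst.write('{"files": [')
    first = True
    for line in src:
        if not line.strip():
            continue  # skip blank lines
        if not first:
            dst.write(",")  # separator between array elements
        dst.write("\n  " + json.dumps(parse_line(line)))
        first = False
    dst.write("\n]}\n")


# Small in-memory demonstration (the second line is invented sample data).
src = io.StringIO(
    "236342512 200 George usr/temp/a path/random1.txt\n"
    "17 4096 Ann var/log/app.log\n"
)
dst = io.StringIO()
convert(src, dst)
result = json.loads(dst.getvalue())
```

With real data you would pass two open file objects to convert instead of the StringIO buffers. Only one line is held in memory at a time, so memory use should stay roughly constant regardless of the input size.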
User contributions licensed under: CC BY-SA