Losing stdout data in python

I’m trying to write a Python script that runs a bash script on a remote machine via ssh and then parses its output. The bash script writes a lot of data (about 5 megabytes of text / 50k lines) to stdout, and here is the problem: I get all the data in only ~10% of cases. In the other 90% I get about 97% of what I expect, and it always seems to be cut off at the end. This is what my script looks like:

import subprocess
import re
import sys
import paramiko

def run_ssh_command(ip, port, username, password, command):
    ssh = paramiko.SSHClient()    
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())                                                   
    ssh.connect(ip, port, username, password)                                                                   
    stdin, stdout, stderr = ssh.exec_command(command)                                                           
    output = ''                                                                                                 
    while not stdout.channel.exit_status_ready():                                                               
        solo_line = ''                                                                                          
        # Print stdout data when available                                                                      
        if stdout.channel.recv_ready():                                                                         
            # Retrieve up to 2048 bytes
            solo_line = stdout.channel.recv(2048)
            output += solo_line                                                                                 
    ssh.close()                                                                                                 
    return output                                                                                  

result = run_ssh_command(server_ip, server_port, login, password, 'cat /var/log/somefile')
print "result size: ", len(result)                                                                                    

I’m pretty sure the problem is an overflow of some internal buffer, but which one, and how do I fix it?

Thank you very much for any tip!

Answer

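When stdout.channel.exit_status_ready() starts returning True, there may still be a lot of data that you have not received yet, either waiting to be sent from the remote side or already buffered locally. But your loop stops as soon as the exit status is ready, so whatever has not been read by then is lost.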
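
To make the race easier to see, here is a small simulation that needs no SSH connection. FakeChannel is a hypothetical stand-in written for this illustration, not part of paramiko; it only mimics the three channel methods the question's loop calls, and it is written for Python 3, where the data is bytes. The exit status is made to become ready after three reads while two chunks are still undelivered:

```python
class FakeChannel:
    """Hypothetical stand-in for a paramiko channel (illustration only)."""

    def __init__(self, chunks, exit_after_reads):
        self._chunks = list(chunks)
        self._reads = 0
        self._exit_after_reads = exit_after_reads

    def exit_status_ready(self):
        # The remote command "finishes" while chunks are still undelivered.
        return self._reads >= self._exit_after_reads

    def recv_ready(self):
        return bool(self._chunks)

    def recv(self, nbytes):
        # nbytes is ignored here; each prepared chunk is already small enough.
        self._reads += 1
        return self._chunks.pop(0) if self._chunks else b""


def read_until_exit(channel):
    """Same shape as the loop in the question: stop once the exit status is ready."""
    output = b""
    while not channel.exit_status_ready():
        if channel.recv_ready():
            output += channel.recv(2048)
    return output


chunks = [b"a" * 2048] * 5
truncated = read_until_exit(FakeChannel(chunks, exit_after_reads=3))
total = sum(len(c) for c in chunks)
print(len(truncated), "of", total, "bytes read")  # 6144 of 10240 bytes read
```

With a real connection the timing is not deterministic like this, which would fit the observation that the full output only arrives some of the time.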

Instead of checking the exit status, you could keep calling recv(2048) until it returns an empty string, which means that no more data is coming:

output = ''
next_chunk = True
while next_chunk:
    next_chunk = stdout.channel.recv(2048)
    output += next_chunk
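
The same idea as a reusable helper, sketched for Python 3, where recv() returns bytes rather than str. The names read_all and ScriptedChannel are made up for this example; ScriptedChannel is only a test double so the snippet runs without a server. Collecting the chunks in a list and joining them once avoids repeated concatenation on a multi-megabyte output:

```python
def read_all(channel, chunk_size=2048):
    """Read from channel until recv() returns an empty result (end of stream)."""
    parts = []
    while True:
        chunk = channel.recv(chunk_size)
        if not chunk:
            # Empty result: the stream is closed, nothing more will arrive.
            break
        parts.append(chunk)
    return b"".join(parts)


class ScriptedChannel:
    """Hypothetical test double: hands out prepared chunks, then b''."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def recv(self, nbytes):
        return self._chunks.pop(0) if self._chunks else b""


data = read_all(ScriptedChannel([b"x" * 2048, b"y" * 2048, b"z" * 10]))
print(len(data))  # 4106
```

With paramiko you would presumably call read_all(stdout.channel), and if you also need the exit code, stdout.channel.recv_exit_status() can be called after the output has been drained.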

But really you probably just want:

output = stdout.read()