How to exclude print output of a Python script from a variable in a shell script

I have a Python script called spark.py. It is invoked from a shell script on Linux.

spark.py looks like this:

#!/usr/bin/env python

import sys
import os

if len(sys.argv) != 3:
    print "Invalid number of args......"
    print "Usage: spark-submit spark.py <table> <hivedb>"
    sys.exit(1)

table=sys.argv[1]
hivedb=sys.argv[2]

from pyspark import SparkContext, SparkConf
conf = SparkConf()
sc = SparkContext(conf=conf)
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
from datetime import datetime


# NOTE: df is assumed to be created earlier; that part of the script is omitted here
df.registerTempTable('mytempTable')

date=datetime.now().strftime('%Y-%m-%d %H:%M:%S')


try:
    sqlContext.sql("create table {}.`{}` as select * from mytempTable".format(hivedb,table))
except Exception as e:
    status = 'fail'
    error_message = e
else:  # Executes only if no Exception.
    status = 'success'
    error_message = 'No error'
print error_message

print ("{},{},{},{},{}".format(hivedb,table,date,status,error_message))

sc.stop()

if status != 'success':
    sys.exit(1)

shell.sh looks like this:

#!/bin/bash

source /home/$USER/source.sh

[ $# -ne 2 ] && { echo "Usage : $0 <table> <hivedb>"; exit 1; }
table=$1
hivedb=$2


TIMESTAMP=$(date "+%Y-%m-%d")
success_logs=/home/$USER/logs/${TIMESTAMP}.success_log
failed_logs=/home/$USER/logs/${TIMESTAMP}.fail_log
touch "$success_logs" "$failed_logs"

# Function to log the status of the job creation
function log_status
{
    status=$1
    message=$2
    if [ "$status" -ne 0 ]; then
        echo "$message" | tee -a "${failed_logs}"
    else
        echo "$message" | tee -a "${success_logs}"
    fi
}

result=$(spark-submit --name "Spark" --master "yarn-client" /home/$USER/spark.py "${table}" "${hivedb}")
g_STATUS=$?
log_status "$g_STATUS" "$result"

In this shell script I collect the output of spark.py in a variable. When I do so, none of the print output from spark.py shows up in the Linux console logs.

How can I get all of the print output to appear in the Linux console logs?
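This is the expected behavior of command substitution: $( ... ) captures the child process's stdout instead of letting it reach the terminal. A minimal bash illustration, separate from the scripts above:

out=$(echo "hello")   # nothing appears on the terminal; stdout is captured
echo "$out"           # prints: hello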

Also, in my spark.py script I have:

print error_message

print ("{},{},{},{},{}".format(hivedb,table,date,status,error_message))

How can I exclude the output of print error_message from what gets collected into the variable in shell.sh?


Answer

A simple way would be to add echo "$result" to your shell script after the capture; the output is missing from the console only because command substitution diverts the script's stdout into the variable. You could also revise the sub command to add a tee at the end: result=$( ... | tee /dev/stderr ). Because $( ... ) captures only stdout, the copy written to stderr still appears on the console.
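For example, applied to the spark-submit line from shell.sh above, the second suggestion would be (a sketch, assuming the shell is bash):

set -o pipefail   # so a failure inside the pipe is not masked by tee's exit status
result=$(spark-submit --name "Spark" --master "yarn-client" /home/$USER/spark.py "${table}" "${hivedb}" | tee /dev/stderr)
g_STATUS=$?       # non-zero if spark-submit failed, thanks to pipefail

This works because $( ... ) captures only stdout, while the copy tee writes to stderr goes straight to the console. The same distinction answers the second question: if spark.py writes the unwanted line to stderr instead of stdout (Python 2: print >> sys.stderr, error_message), it still shows on the console but stays out of $result.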

User contributions licensed under: CC BY-SA