linux - Get logs of a function separately each time it is iterated in Python
I have the PySpark script below. The script loops through the table names in an input file and executes the code for each one.
Now I want to collect the logs separately each time the function mysql_spark is iterated.
For example:
Input file:

    table1
    table2
    table3

Now when I execute the PySpark script, I get the logs for all 3 tables in a single file.
What I want is 3 separate log files, one for each table.
PySpark script:

    #!/usr/bin/env python
    import sys
    from pyspark import SparkContext, SparkConf
    from pyspark.sql import HiveContext

    conf = SparkConf()
    sc = SparkContext(conf=conf)
    sqlcontext = HiveContext(sc)

    # condition to specify the exact number of arguments in the spark-submit command line
    if len(sys.argv) != 5:
        print "invalid number of args......"
        print "usage: spark-submit import.py arguments"
        exit()

    args_file = sys.argv[1]
    hivedb = sys.argv[2]
    mysqldb = sys.argv[3]
    mysqltable = sys.argv[4]

    def mysql_spark(table, hivedb, mysqldb, mysqltable):
        print "*********************************************************table = {} ***************************".format(table)
        df = sqlcontext.table("{}.{}".format(mysqldb, mysqltable))
        df.registerTempTable("mytemptable")
        sqlcontext.sql("create table {}.{} as select * from mytemptable".format(hivedb, table))

    input = sc.textFile('/user/xxxxxxxx/mysql_spark/%s' % args_file).collect()

    for table in input:
        mysql_spark(table, hivedb, mysqldb, mysqltable)

    sc.stop()

Shell script that invokes the PySpark script:

    #!/bin/bash

    source /home/$USER/mysql_spark/source.sh
    [ $# -ne 1 ] && { echo "usage : $0 table "; exit 1; }

    args_file=$1

    timestamp=`date "+%Y-%m-%d"`
    touch /home/$USER/logs/${timestamp}.success_log
    touch /home/$USER/logs/${timestamp}.fail_log
    success_logs=/home/$USER/logs/${timestamp}.success_log
    failed_logs=/home/$USER/logs/${timestamp}.fail_log

    # function to record the status of the job creation
    function log_status
    {
        status=$1
        message=$2
        if [ "$status" -ne 0 ]; then
            echo "`date +\"%Y-%m-%d %H:%M:%S\"` [error] $message [status] $status : failed" | tee -a "${failed_logs}"
            exit 1
        else
            echo "`date +\"%Y-%m-%d %H:%M:%S\"` [info] $message [status] $status : success" | tee -a "${success_logs}"
        fi
    }

    spark-submit --name "${args_file}" --master "yarn-client" /home/$USER/mysql_spark/mysql_spark.py ${args_file} ${hivedb} ${mysqldb} ${mysqltable}

    g_status=$?
    log_status $g_status "spark job ${args_file} execution"

Sample log file:

    connection spark
    ***************************table = table 1 ********************************
    created dataframe
    created table
    delete temp directory
    ***************************table = table 2 ********************************
    created dataframe
    created table
    delete temp directory
    ***************************table = table 3 ********************************
    created dataframe
    created table
    delete temp directory

Expected output:

table1.logfile

    connection spark
    ***************************table = table 1 ********************************
    created dataframe
    created table
    delete temp directory

table2.logfile

    ***************************table = table 2 ********************************
    created dataframe
    created table
    delete temp directory

table3.logfile

    ***************************table = table 3 ********************************
    created dataframe
    created table
    delete temp directory
    shutdown sparkcontext

How can I achieve this?
Is it possible to do so?
You can create a new file and write the data to it on each iteration.
This is a simple example:

    lis = ['table1', 'table2']
    for table in lis:
        logfile = open(str(table) + ".logfile", 'w')
        logfile.write(str(table))
        logfile.close()

In your code, if you implement the same concept and pass the file object to the mysql_spark function on every iteration, it should work.

    for table in input:
        logfile = open(str(table) + ".logfile", 'w')
        mysql_spark(table, hivedb, mysqldb, mysqltable, logfile)
        logfile.close()
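
For this to work, mysql_spark also has to accept the extra logfile argument and write its messages to it instead of printing them. Here is a rough sketch of how that could look. The Spark calls are replaced by placeholder messages so the snippet runs without a cluster, and run_all and log_dir are hypothetical names introduced only for this illustration, not part of the original script.

```python
import os
import tempfile


def mysql_spark(table, hivedb, mysqldb, mysqltable, logfile):
    # Write progress messages to this table's log file instead of printing them.
    logfile.write("table = {}\n".format(table))
    # Placeholder for the Spark work from the question (sqlcontext.table,
    # registerTempTable, sqlcontext.sql); left out so the sketch runs anywhere.
    logfile.write("created dataframe\n")
    logfile.write("created table\n")


def run_all(tables, hivedb, mysqldb, mysqltable, log_dir):
    """Process every table, writing one <table>.logfile per table in log_dir."""
    paths = []
    for table in tables:
        path = os.path.join(log_dir, "{}.logfile".format(table))
        # 'with' closes the file even if mysql_spark raises an exception.
        with open(path, "w") as logfile:
            mysql_spark(table, hivedb, mysqldb, mysqltable, logfile)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Hypothetical demo values; a temporary directory stands in for the log folder.
    log_dir = tempfile.mkdtemp()
    for p in run_all(["table1", "table2", "table3"],
                     "hivedb", "mysqldb", "mysqltable", log_dir):
        print(p)
```

Only the messages written through logfile end up in the per-table file. Anything that is still sent through print goes to the single combined log as before.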