linux - Get logs of a function separately each time it is iterated in python -

January 15, 2012

I have the PySpark script below. In this script I loop through an input file of table names and execute the code for each one.

Now I want to collect the logs separately each time the function mysql_spark is iterated.

For example:

Input file:

table1
table2
table3

Currently, when I execute the PySpark script, the logs for all 3 tables end up in a single file.

What I want is 3 separate log files, one for each table.

PySpark script:

#!/usr/bin/env python
import sys
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext

conf = SparkConf()
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

# Require the exact number of arguments on the spark-submit command line
if len(sys.argv) != 5:
    print("invalid number of args......")
    print("usage: spark-submit import.py arguments")
    sys.exit(1)

args_file = sys.argv[1]
hivedb = sys.argv[2]
mysqldb = sys.argv[3]
mysqltable = sys.argv[4]

def mysql_spark(table, hivedb, mysqldb, mysqltable):
    print("*********************************************************table = {} ***************************".format(table))
    df = sqlContext.table("{}.{}".format(mysqldb, mysqltable))
    df.registerTempTable("mytemptable")
    sqlContext.sql("create table {}.{} as select * from mytemptable".format(hivedb, table))

# Read the table names, one per line, from the input file
input = sc.textFile('/user/xxxxxxxx/mysql_spark/%s' % args_file).collect()

for table in input:
    mysql_spark(table, hivedb, mysqldb, mysqltable)

sc.stop()

The shell script that invokes the PySpark script:

#!/bin/bash

source /home/$USER/mysql_spark/source.sh
[ $# -ne 1 ] && { echo "usage : $0 table "; exit 1; }

args_file=$1

timestamp=`date "+%Y-%m-%d"`
touch /home/$USER/logs/${timestamp}.success_log
touch /home/$USER/logs/${timestamp}.fail_log
success_logs=/home/$USER/logs/${timestamp}.success_log
failed_logs=/home/$USER/logs/${timestamp}.fail_log

# Log the status of the job and exit on failure
function log_status
{
    status=$1
    message=$2
    if [ "$status" -ne 0 ]; then
        echo "`date +\"%Y-%m-%d %H:%M:%S\"` [error] $message [status] $status : failed" | tee -a "${failed_logs}"
        exit 1
    else
        echo "`date +\"%Y-%m-%d %H:%M:%S\"` [info] $message [status] $status : success" | tee -a "${success_logs}"
    fi
}

spark-submit --name "${args_file}" --master "yarn-client" /home/$USER/mysql_spark/mysql_spark.py ${args_file} ${hivedb} ${mysqldb} ${mysqltable}

g_status=$?
log_status $g_status "spark job ${args_file} execution"

Sample log file:

connection spark
***************************table = table 1 ********************************
created dataframe
created table
delete temp directory
***************************table = table 2 ********************************
created dataframe
created table
delete temp directory
***************************table = table 3 ********************************
created dataframe
created table
delete temp directory

Expected output:

table1.logfile

connection spark
***************************table = table 1 ********************************
created dataframe
created table
delete temp directory

table2.logfile

***************************table = table 2 ********************************
created dataframe
created table
delete temp directory

table3.logfile

***************************table = table 3 ********************************
created dataframe
created table
delete temp directory
shutdown sparkcontext

How can I achieve this?

Is it possible to do so?

You can create a new file and write the data to it on each iteration.

This is a simple example:

lis = ['table1', 'table2']

for table in lis:
    logfile = open(str(table) + ".logfile", 'w')
    logfile.write(str(table))
    logfile.close()

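If you apply the same concept in your code and pass the file object to the mysql_spark function on every iteration, it should work. The function then needs an extra parameter for the file object, and it has to write its messages to that file instead of printing them.

for table in input:
    logfile = open(str(table) + ".logfile", 'w')
    mysql_spark(table, hivedb, mysqldb, mysqltable, logfile)
    logfile.close()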
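To make the idea concrete, here is a minimal sketch of how mysql_spark might accept and use the file object. It is an illustration under assumptions, not the original poster's final code: the Spark calls from the question are replaced by a stand-in log line so the example runs without a cluster, and the helper run_all, the log_dir argument and the sample database names are hypothetical.

```python
import os
import tempfile


def mysql_spark(table, hivedb, mysqldb, mysqltable, logfile):
    """Process one table, writing progress messages to `logfile`."""
    logfile.write("*** table = {} ***\n".format(table))
    # Stand-in for the Spark calls in the question (sqlContext.table,
    # registerTempTable, sqlContext.sql); left out so the sketch runs anywhere.
    logfile.write("created table {}.{} from {}.{}\n".format(
        hivedb, table, mysqldb, mysqltable))


def run_all(tables, log_dir, hivedb, mysqldb, mysqltable):
    """Hypothetical helper: call mysql_spark once per table, one log file each."""
    paths = []
    for table in tables:
        path = os.path.join(log_dir, "{}.logfile".format(table))
        # `with` closes the file even if mysql_spark raises an exception.
        with open(path, "w") as logfile:
            mysql_spark(table, hivedb, mysqldb, mysqltable, logfile)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Assumed sample values; a temporary directory keeps the demo self-contained.
    log_dir = tempfile.mkdtemp()
    for p in run_all(["table1", "table2", "table3"], log_dir,
                     "hive_db", "mysql_db", "mysql_table"):
        print(p)
```

Note that this only captures messages the Python code writes itself. Output produced by Spark's own JVM-side logging will most likely still go wherever spark-submit sends it, so separating that per table would need a different approach.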
