python - How to add new column to dataframe in pyspark



I'm trying this, but get a long error:

df = df.withColumn('newcolumnname', someother_df['time'])

and it doesn't work. Doing this:

df = df.withColumn('newcolumnname', someother_df.select('time'))

gives me the error: AssertionError: col should be Column

You seem to be combining 2 dataframes without common keys. (As for the error: someother_df.select('time') returns a DataFrame, not a Column, which is why withColumn raises that AssertionError.) The below code should work for you.

import pyspark.sql.functions as func

df1 = sc.parallelize([('1234', '13'), ('6789', '68')]).toDF(['col1', 'col2'])
df2 = sc.parallelize([('7777', '66'), ('8888', '22')]).toDF(['col3', 'col4'])

# since there is no common column between these 2 dataframes, add a row_index so they can be joined
df1 = df1.withColumn('row_index', func.monotonically_increasing_id())
df2 = df2.withColumn('row_index', func.monotonically_increasing_id())

# 'col3' from the second dataframe (i.e. df2) is added to the first dataframe (i.e. df1)
df1 = df1.join(df2['row_index', 'col3'], on=['row_index']).drop('row_index')
df1.show()
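One caveat, not from the original answer: monotonically_increasing_id() only guarantees unique, increasing values, and the actual values depend on partitioning, so the two dataframes are only sure to get matching row_index values when each sits in a single partition (as these toy examples do). A more robust variant, sketched below under that assumption, converts the id into a consecutive row number with a window function:

import pyspark.sql.functions as func
from pyspark.sql import Window

# ordering by the increasing id preserves the existing row order;
# row_number() then yields consecutive 1..N values in both dataframes
w = Window.orderBy(func.monotonically_increasing_id())
df1 = df1.withColumn('row_index', func.row_number().over(w))
df2 = df2.withColumn('row_index', func.row_number().over(w))

df1 = df1.join(df2['row_index', 'col3'], on=['row_index']).drop('row_index')
df1.show()

Note that Spark warns when a window has no partition specification, since it moves all rows to one partition; for small dataframes like these that is acceptable.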


Don't forget to let me know if it solved your problem :)


