python - How to add a new column to a dataframe in pyspark
I'm trying this, but I get a long error:
df = df.withColumn('newcolumnname', someother_df['time'])
and it doesn't work. Doing this:
df = df.withColumn('newcolumnname', someother_df.select('time'))
gives me the error: AssertionError: col should be Column
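(For context: withColumn expects its second argument to be a Column expression, and that expression must be resolvable against the dataframe it is called on. select() returns a DataFrame, not a Column, which is what trips the assertion above. A minimal sketch, assuming a dataframe df that already has a 'time' column:)

from pyspark.sql import functions as F

# Works: the Column expression is built from df's own columns
df = df.withColumn('time_copy', F.col('time'))

# Fails with "AssertionError: col should be Column":
# select() returns a DataFrame, not a Column
# df = df.withColumn('newcolumnname', someother_df.select('time'))

# Also fails (with a long analysis error): a Column taken from a
# different dataframe cannot be resolved against df
# df = df.withColumn('newcolumnname', someother_df['time'])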
You seem to be combining two dataframes without a common key. The code below should work for you.
import pyspark.sql.functions as func

df1 = sc.parallelize([('1234', '13'), ('6789', '68')]).toDF(['col1', 'col2'])
df2 = sc.parallelize([('7777', '66'), ('8888', '22')]).toDF(['col3', 'col4'])

# Since there is no common column between these 2 dataframes, add a
# row_index so they can be joined
df1 = df1.withColumn('row_index', func.monotonically_increasing_id())
df2 = df2.withColumn('row_index', func.monotonically_increasing_id())

# 'col3' from the second dataframe (i.e. df2) is added to the first
# dataframe (i.e. df1)
df1 = df1.join(df2["row_index", "col3"], on=["row_index"]).drop("row_index")
df1.show()
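One caveat worth noting: monotonically_increasing_id() guarantees unique, increasing IDs but not consecutive ones, and the generated values depend on how each dataframe is partitioned, so the two row_index columns only line up when the partition layouts happen to match. A more robust variant is to rank over a window so each dataframe gets a consecutive 1, 2, 3, ... index; a minimal sketch building on the code above (note that an un-partitioned window pulls all rows onto a single partition, which is fine for small data):

from pyspark.sql import Window

# row_number() yields consecutive 1-based indices, so the two
# dataframes align regardless of partitioning
w = Window.orderBy(func.monotonically_increasing_id())
df1 = df1.withColumn('row_index', func.row_number().over(w))
df2 = df2.withColumn('row_index', func.row_number().over(w))

df1 = df1.join(df2["row_index", "col3"], on=["row_index"]).drop("row_index")
df1.show()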
Don't forget to let me know if it solved your problem :)