Multiple OR operation in parallel in Scala/Spark
def notNullCheck(df: DataFrame, columns: Column*): DataFrame = df.filter(columns(0).isNotNull || columns(1).isNotNull)
How can I generalize the above method if I have 20 columns? I want to avoid writing the same condition 20 times.
Thanks
Assuming columns is of type List[Column]:
val columns = List(col("a"), col("b"))
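you can build the combined condition with a fold:
val conditions = columns.foldLeft(lit(false))((z, c) => z || c.isNotNull)
The seed has to be lit(false): with lit(true) the OR chain is always true and the filter keeps every row. The accumulator z is already a Boolean condition, so only c needs isNotNull.
The expression behind this:
org.apache.spark.sql.Column = ((false OR (a IS NOT NULL)) OR (b IS NOT NULL))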
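Put together, the generalized method from the question could look like the sketch below. It is only a sketch under some assumptions: it uses map plus reduce instead of the fold (equivalent here, but it drops the lit(false) seed), the object name NotNullCheck, the local SparkSession, and the sample columns a and b are made up for illustration, and it assumes Spark SQL is on the classpath.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object NotNullCheck {
  // Keep the rows in which at least one of the given columns is non-null.
  def notNullCheck(df: DataFrame, columns: Column*): DataFrame = {
    // reduce has no seed value, so an empty column list must be rejected.
    require(columns.nonEmpty, "at least one column is required")
    df.filter(columns.map(_.isNotNull).reduce(_ || _))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical local session, only for trying the method out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("not-null-check")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data: the last row is null in every column.
    val df = Seq[(Option[Int], Option[String])](
      (Some(1), Some("x")),
      (None, Some("y")),
      (Some(3), None),
      (None, None)
    ).toDF("a", "b")

    // Pass every column of the DataFrame without listing them by hand.
    val kept = notNullCheck(df, df.columns.map(col): _*)
    kept.show()

    spark.stop()
  }
}
```

With the sample data, only the row that is null in both a and b should be dropped. Passing df.columns.map(col): _* is what avoids writing the condition 20 times; a hand-picked subset such as notNullCheck(df, col("a"), col("b")) works the same way.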