Multiple OR operations in parallel in Scala/Spark


def notNullCheck(df: DataFrame, columns: Column*): DataFrame = { df.filter(columns(0).isNotNull || columns(1).isNotNull) }

How can I generalize the above method if I have 20 columns? I want to avoid writing the same condition 20 times.

Thanks.

Assuming columns is of type List[Column]:

val columns = List(col("a"), col("b"))

you can build the combined condition with a fold:

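val conditions = columns.foldLeft(lit(false))((z, c) => z || c.isNotNull)

The fold starts from lit(false) because false is the identity for OR; starting from lit(true) would make the whole condition always true. The expression behind this is:

org.apache.spark.sql.Column = ((false OR (a IS NOT NULL)) OR (b IS NOT NULL))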
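
Putting the fold back into the original method, a minimal sketch could look like the following. The local SparkSession, the sample data, and the names `anyNotNull` and `kept` are assumptions made only for illustration; they are not part of the original question.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

// Keep rows where at least one of the given columns is non-null.
// Folding from lit(false) means an empty column list keeps no rows.
def notNullCheck(df: DataFrame, columns: Column*): DataFrame = {
  val anyNotNull = columns.map(_.isNotNull).foldLeft(lit(false))(_ || _)
  df.filter(anyNotNull)
}

// Hypothetical local session and sample data, for illustration only.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("notNullCheck")
  .getOrCreate()
import spark.implicits._

val df = Seq[(Option[Int], Option[String])](
  (Some(1), None),
  (None, None),
  (None, Some("x"))
).toDF("a", "b")

// Pass every column of the DataFrame as a vararg.
val kept = notNullCheck(df, df.columns.map(col): _*)
```

Here only the row where both a and b are null is dropped. With 20 columns the call stays the same, since the condition is built from whatever list of columns is passed in.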
