How to Use UDFs in Spark Without Registering Them
Here, we will demonstrate the use of a UDF with a small example.
Use Case: We need to add a prefix or suffix to the value of an existing column of a DataFrame/Dataset, writing the result into a new column.
// Code snippet: creating a UDF in Spark without registering it (Scala)

import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

val rowKeyGenerator = udf((n: String) => {
  val r = scala.util.Random
  val randomNB = r.nextInt(100).toString
  val deviceNew = randomNB.concat(n)
  deviceNew
}, StringType)

// "Name" is a column of type String in the source DataFrame.
val ds2 = dfFromFile.withColumn("NameNewValue", rowKeyGenerator(col("Name")))
ds2.show()
Note: We can also change the return type from String to any other supported type, as per individual requirements. While developing a UDF, make sure to handle null values, as they are a common cause of errors. UDFs are a black box to the Spark engine, whereas functions that take a Column argument and return a Column are not, so the optimizer cannot see inside them. It is therefore always recommended to prefer Spark's native APIs/expressions over UDFs where performance matters.