How to Use UDFs in Spark Without Registering Them
Here, we will demonstrate the use of a UDF with a small example.
Use Case: We need to add a prefix or suffix to the value of an existing column of a DataFrame/Dataset, writing the result into a new column.
// Code snippet: creating a UDF in Spark without registering it (Scala)

import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

val rowKeyGenerator = udf((n: String) => {
  val r = scala.util.Random
  val randomNB = r.nextInt(100).toString
  val deviceNew = randomNB.concat(n)
  deviceNew
}, StringType)

// "Name" is a column of type String in the source DataFrame.
val ds2 = dfFromFile.withColumn("NameNewValue", rowKeyGenerator(col("Name")))
ds2.show()
Note: We can also change the return type from String to any other supported type, as per individual requirements. While developing a UDF, make sure to handle null values, as they are a common cause of errors. UDFs are a black box to the Spark engine, whereas functions that take a Column argument and return a Column are not, so the optimizer cannot see inside them. It is therefore always recommended to prefer Spark's native APIs/expressions over UDFs where performance matters.