
Faster model inference pipelines with improved binary file data source and scalar iterator pandas UDF (Public Preview)

Machine learning tasks, especially in the image and video domain, often have to operate on a large number of files. In Databricks Runtime 5.4, we made the binary file data source available to help ETL arbitrary files, such as images, into Spark tables. In Databricks Runtime 5.5, we have added an option, recursiveFileLookup, to load files recursively from nested input directories.
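For instance, a nested directory of image files can be loaded in a single read. In the minimal sketch below, the input path and DataFrame name are illustrative, and an active SparkSession named spark is assumed:

```python
# Read every file under the input directory, including nested subdirectories,
# into a DataFrame with path, modificationTime, length, and content columns.
images_df = (
    spark.read.format("binaryFile")
    .option("recursiveFileLookup", "true")  # descend into nested directories
    .load("/mnt/training/images")           # hypothetical input path
)
images_df.printSchema()
```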

The binary file data source enables you to run model inference tasks in parallel from Spark tables using a scalar pandas UDF. However, you might have to initialize the model for every record batch, which introduces overhead.
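A scalar pandas UDF for inference might look like the following sketch, where load_model and model.predict are hypothetical placeholders for your own model code and images_df is the DataFrame read above:

```python
import pandas as pd
from pyspark.sql.functions import col, pandas_udf, PandasUDFType

@pandas_udf("string", PandasUDFType.SCALAR)
def predict(content):
    # Hypothetical helpers: load_model() and model.predict() stand in for your
    # framework-specific code. The model is re-initialized for every record
    # batch handed to the UDF, which is the overhead described above.
    model = load_model()
    return pd.Series([model.predict(raw) for raw in content])

predictions_df = images_df.select(
    col("path"),
    predict(col("content")).alias("prediction"),
)
```

The scalar iterator pandas UDF referenced in the title is intended to remove this per-batch initialization by letting you load the model once and reuse it across the batches processed by a task.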

This is also useful in common Python streaming workloads, for example, writing streaming aggregates in update mode using MERGE and foreachBatch.
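A sketch of that pattern is shown below; the streaming DataFrame stream_df, the target Delta table aggregates, and the column names are all hypothetical:

```python
# Upsert each micro-batch of streaming aggregates into a Delta table with MERGE.
def upsert_counts(micro_batch_df, batch_id):
    micro_batch_df.createOrReplaceTempView("updates")
    # Run MERGE on the session bound to the micro-batch DataFrame so the
    # temporary view is visible to the SQL statement.
    micro_batch_df._jdf.sparkSession().sql("""
        MERGE INTO aggregates t
        USING updates s
        ON t.key = s.key
        WHEN MATCHED THEN UPDATE SET t.cnt = s.cnt
        WHEN NOT MATCHED THEN INSERT (key, cnt) VALUES (s.key, s.cnt)
    """)

(stream_df
    .groupBy("key").count().withColumnRenamed("count", "cnt")
    .writeStream
    .outputMode("update")        # emit updated aggregate rows each trigger
    .foreachBatch(upsert_counts) # apply the MERGE once per micro-batch
    .start())
```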
