Write a Python UDF for Snowflake with BlackDiamond Studio
by Rodrigo Meléndez, on Sep 13, 2022 3:00:00 AM
Last week, we published a blog post on how to write a Scala UDF for Snowflake. This week? We've taken our talents to Python. We'll take the same function (IS_PALINDROME) that we wrote last week, and quickly deploy it as a UDF in Snowflake written in Python.
As a quick summary of the last post:
- BlackDiamond Studio gives you a complete developer environment that's all ready and setup to run any code that can be sent to Snowflake.
- In Snowflake, you can write custom User Defined Functions (UDFs) in any of the following languages:
- SQL
- JavaScript
- Java
- Python
- Scala
- Our IS_PALINDROME function is going to determine if a string is a palindrome. (Not sure what a palindrome is?) This is a function that does not exist in Snowflake.
We can use the following code to define this function in Python:
def is_palindrome(text): return text == text[::-1]
Creating a New BlackDiamond Studio Project
We should run this code to test if it works the way we want it to. If you were going to test this code as a UDF in Snowflake locally, you would need to install the snowflake-snowpark-python library using conda or pip. You would also need to setup a connection with your Snowflake account to create the UDF in that account using the above shown Python code. However, since we're going to use BlackDiamond Studio, the Python environment is already setup and we can quickly connect our Snowflake account.
To walk through a full tutorial on how to create your own project in BlackDiamond Studio, you can visit our documentation site. Or re-visit the blog post from last week that shows you how to write a Scala UDF. The only change from the previous week's post in the "Creating a New BlackDiamond Studio Project" section will be to choose a "Basic Python Template" instead of a Scala Template:
Running the Python Code in BlackDiamond Studio
Now that the project is created and we can open our IDE, we can add a new file called as palindrome.py to our existing Python project in the same folder where the main.py file is located. Let's create a new file in the folder workspace with the Python code shown above (and repeated here):
def is_palindrome(text): return text == text[::-1]
We can test that this code works. Let's modify the main.py file. Replace the existing code with the following:
from palindrome import is_palindrome print("Hello, World!") print(is_palindrome("stats")) print(is_palindrome("statistics"))
Now run the code and check if it works. To run the code in BlackDiamond Studio, there are many options. For now, let's open the “Run and Debug” tab located on the activity bar on the left side. Then click on the blue text that says “create a launch.json file”:
A debug configuration will show up, so let’s choose the Python File option.
A JSON file called launch.json will appear in the file explorer. Now change the value in the program property located in line 11 (that before was called as “${file}”) to “main.py”:
Finally, let's click on the "run" box as shown in red by the following image:
When you run the code, you should get an output in the terminal that looks like the following:
Hello, World! True False
This confirms that the code is working correctly, since the palindrome "stats" was identified as a palindrome and "statistics" was not.
Creating the Python UDF
Next, let's create the UDF in our Snowflake account. You can right-click on any .py file (for instance, main.py) and click on “BlackDiamond: Export Python UDF”:
This will create a template inside the folder called target located in target/UDF-Export-Snowflake-Python.sql:
You can select the generated UDF template file and see the output format:
Now you can replace the placeholders in the generated UDF template file with the actual values that you want to use. It's important to note that it is not necessary to replace all the placeholders. Some placeholders will be filled in automatically when you specified a valid Snowflake connection with your account information. In this example, section 3.1 of the template can be deleted because this entire section will not be needed.
The following placeholders will be replaced automatically when we have an active Snowflake connection:
- <database-name>: The name of the database in which you will create the UDF. This database should already exist on your account.
- <schema-name>: The name of the schema in which you will create the UDF. This schema should already exist on the previously specified database.
The following placeholders will not be replaced automatically when we have an active Snowflake connection:
- <stage-name>: The name of the stage to which your Python code will be uploaded.
- <function-name>: The name of the function you wrote. In this case, it would be “is_palindrome”.
- <function-parameters>: The names and data types of the parameters for the function. In this case, this value should be “text STRING”, since there is only one parameter of type “String” and its name is "text".
- <function-return-type>: The return type for the function. In this case it would be “BOOLEAN”, the equivalent Snowflake data type for Python's “Boolean” type.
The following image represents the last part of the template, with the placeholders replaced by the actual values:
When you are finished replacing the placeholders with the actual values, you can right-click on the template file and choose “BlackDiamond: Deploy”, which will cause the SQL script to be run with the credentials you provided while setting up the project:
When the script finishes running, the UDF will be created in the database and schema you specified in your Snowflake account. You can test it by running the following commands in the Snowflake console:
SELECT DATABASE_TEST.PYTHON_TEST.is_palindrome('stats'); -- Results in TRUE SELECT DATABASE_TEST.PYTHON_TEST.is_palindrome('hello'); -- Results in FALSE SELECT DATABASE_TEST.PYTHON_TEST.is_palindrome('statistics'); -- Results in FALSE
You can also run these same commands from BlackDiamond Studio, by writing that code in an SQL file, selecting each statement separately, right clicking, and choosing “Run Selected Query”.
Conclusion
Congratulations are in order! You have successfully written your own UDF using the Python programming language. For more information about this topic, you can read Snowflake's official documentation about the creation of UDFs.