If you're an avid reader of this blog, you may have noticed that in a previous blog post we discussed how to load data into Snowflake using their Python Connector.
That post mentioned that we could create the table we're loading into and navigate Snowflake's account structure at the same time we load the data, but it didn't explain how.
Well, by popular demand, this blog post was born to discuss exactly that. This will be the third such blog post to deal with the topic of loading data into Snowflake. (The first part can be found here, and the second part is linked to above. Feel free to social distance your way on over there.) Are there other blogs and documentation out there that chronicle this journey of loading data into Snowflake? Yes. Lots. Tons, even.
So why flood the limitless space of the web with more of this? Sometimes, it's helpful to take direction from someone who has already been where you think you need to go. (Here's a great case study on that topic.)
Before we get to the code, let's make sure it's clear how Snowflake divides and conquers when it comes to setting things up. Start with the account level: your organization's portal into Snowflake. The account determines the name of your environment (company.snowflake.com) and how you will access your data.
Are there different types of accounts? Sort of; it depends on the version of Snowflake that you have. From the account level, the next level down is the user. You have a specific username and password assigned to you as a user, but there are many different roles that Snowflake uses to define access to the data. Your username may have more than one role attached to it, so it helps to specify exactly which role you're using, even if you're confident the default role is correct.
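If you're ever unsure which role a session is actually using, a quick sketch like this will tell you (it assumes a cursor named cur like the one created in the script below):

# Check the active role for the current session.
cur.execute("SELECT CURRENT_ROLE()")
print(cur.fetchone()[0])
# And switch explicitly rather than trusting the default.
cur.execute("USE ROLE SYSADMIN")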
So, you've got an account and a login. Believe it or not, that's really all you need to get started using the Python connector. You can define the warehouse, database, schema, and table you want to load data into by writing some very simple SQL commands and adding the IF NOT EXISTS suffix to your CREATE statements. If you know any SQL already, this should be pretty straightforward... especially if you read last week's post. There is good documentation on Snowflake's website that covers commands such as CREATE WAREHOUSE, CREATE DATABASE, CREATE SCHEMA, and CREATE TABLE (Snowflake uses the DATABASE.SCHEMA.TABLE convention), so it should be pretty simple to execute a SQL command using Snowflake's Python connector (again, as was covered in a previous iteration of this blog).
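As a minimal sketch of that DATABASE.SCHEMA.TABLE convention (every name here is a placeholder, and cur is a cursor like the one in the full script below):

# Each object only gets created if it isn't already there.
cur.execute("CREATE DATABASE IF NOT EXISTS DATABASE_NAME")
cur.execute("CREATE SCHEMA IF NOT EXISTS DATABASE_NAME.SCHEMA_NAME")
cur.execute("""CREATE TABLE IF NOT EXISTS
    DATABASE_NAME.SCHEMA_NAME.TABLE_NAME (A_COLUMN STRING)""")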
Additionally, Snowflake's CREATE TABLE (and most any other CREATE statement) allows you to write it as CREATE OR REPLACE TABLE which can be helpful if you're not sure if the table you're trying to create already exists. Of course, if you want to append data to the table, then you probably don't want to replace it.
You can also CREATE anything with an IF NOT EXISTS suffix. This will only create the object if it does not exist. Really helpful when constructing a script to run anytime.
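For a quick side-by-side of the two variants (table and column names are placeholders):

# This drops any existing table of the same name and starts fresh,
# wiping whatever data it held...
cur.execute("CREATE OR REPLACE TABLE TABLE_NAME (A_COLUMN STRING)")
# ...while this leaves an existing table (and its data) untouched.
cur.execute("CREATE TABLE IF NOT EXISTS TABLE_NAME (A_COLUMN STRING)")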
The only thing you can't really do from the Python connection is create an account. You'll need an account and password from Snowflake before you can connect through Python. Luckily, Snowflake will give you a free trial if you don't already have an account. Though if you're ready to load data and set Python up to automate it for you, you most likely have an account already.
So let's take a look at what this all looks like. The data-loading piece of the code will be identical to one of the previous blog posts, but the definitions at the top will be a bit different. Note that each of these "setup" SQL commands we pass to Snowflake has to be executed one at a time.
Try to run more than one at once, and the Python connector will give you this lovely error message: "Multiple SQL statements in a single API call are not supported; use one API call per statement instead." Also remember that there are numerous properties you can adjust or specify for each of the CREATE commands; refer to the documentation linked above.
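One way to keep the one-statement-per-call rule painless is to put your setup statements in a list and loop over them; here's a sketch (all names are placeholders) of how that might look:

# Each cur.execute() call sends exactly one statement, as required.
setup_statements = [
    "USE ROLE SYSADMIN",
    "CREATE WAREHOUSE IF NOT EXISTS WAREHOUSE_NAME WITH WAREHOUSE_SIZE = XSMALL",
    "USE WAREHOUSE WAREHOUSE_NAME",
    "CREATE DATABASE IF NOT EXISTS DATABASE_NAME",
]
for statement in setup_statements:
    cur.execute(statement)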
Here's a script that walks you through it. Note that the placeholder values (YOUR_USERNAME, WAREHOUSE_NAME, and so on) will have to be replaced with your own information.
"""
@author: Mobilize.Net
@title: Script for creating a table in Snowflake, and
uploading a single large file using the snowflake connector and a pandas dataframe.
"""
# Importing the required packages for all your data framing needs.
import pandas as pd
# The Snowflake Connector library.
import snowflake.connector as snow
from snowflake.connector.pandas_tools import write_pandas
## Phase I: Setup and Connect
# The connector...
conn = snow.connect(user="YOUR_USERNAME",
                    password="YOUR_PASSWORD",
                    account="YOUR_ACCOUNT")
# (FYI, if you don't want to hard code your password into a script, there are
# some other options for security.)
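# (One common pattern, as a sketch: read the password from an environment
# variable instead of hard-coding it. This assumes you've set
# SNOWFLAKE_PASSWORD in your shell beforehand:
#   import os
#   password = os.environ["SNOWFLAKE_PASSWORD"]
# )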
# Create a cursor object.
cur = conn.cursor()
# Now let's define all of the setup commands that we want to pass
# to Snowflake. Remember there are properties for each of these that you
# can alter by checking the documentation and defining those settings.
# Starting with the Role.
sql = "USE ROLE SYSADMIN"
cur.execute(sql)
# And moving on to define and select the warehouse we want to use.
# We do want to specify a size with the warehouse, but feel free
# to change the warehouse size.
sql = """CREATE WAREHOUSE IF NOT EXISTS WAREHOUSE_NAME
WITH WAREHOUSE_SIZE = XSMALL"""
cur.execute(sql)
# And then select it.
sql = "USE WAREHOUSE WAREHOUSE_NAME"
cur.execute(sql)
# Create the desired database if it doesn't already exist.
sql = "CREATE DATABASE IF NOT EXISTS DATABASE_NAME"
cur.execute(sql)
# And then use it.
sql = "USE DATABASE DATABASE_NAME"
cur.execute(sql)
# Do the same with the Schema.
sql = "CREATE SCHEMA IF NOT EXISTS SCHEMA_NAME"
cur.execute(sql)
# And then use it.
sql = "USE SCHEMA SCHEMA_NAME"
cur.execute(sql)
# And finally, the table. Note that CREATE TABLE needs a column list;
# the names and types below are placeholders to replace with your own.
sql = """CREATE TABLE IF NOT EXISTS TABLE_NAME
(A_COLUMN STRING, B_COLUMN INT)"""
cur.execute(sql)
## Phase II: Upload from the Exported Data File.
# Let's import a new dataframe so that we can test this.
original = r"C:\Users\you\awesome_coding\file.csv" # <- Replace with your path.
delimiter = "," # Replace if you're using a different delimiter.
# Get it as a pandas dataframe.
total = pd.read_csv(original, sep = delimiter)
# Drop any columns you may not need (optional).
# total.drop(columns = ['A_ColumnName',
#                       'B_ColumnName'],
#            inplace = True)
# Rename the columns in the dataframe if they don't match your existing table.
# This is optional, but ESSENTIAL if you already have created the table format
# in Snowflake.
# total.rename(columns={"A_ColumnName": "A_COLUMN",
#                       "B_ColumnName": "B_COLUMN"},
#              inplace=True)
# Actually write to the table in Snowflake.
write_pandas(conn, total, "TABLE_NAME")
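# (At the time of writing, write_pandas returns a tuple of
# (success, num_chunks, num_rows, output), so a quick sanity check
# could look like:
#   success, nchunks, nrows, _ = write_pandas(conn, total, "TABLE_NAME")
#   print(f"Loaded {nrows} rows in {nchunks} chunks.")
# )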
# (Optionally, you can check to see if what you loaded is identical
# to what you have in your pandas dataframe. Perhaps... a topic for a future
# blog post.)
## Phase III: Turn off the warehouse.
# Execute a statement that will turn the warehouse off.
sql = "ALTER WAREHOUSE WAREHOUSE_NAME SUSPEND"
cur.execute(sql)
# Close your cursor and your connection.
cur.close()
conn.close()
# And that's it. Much easier than using the load data utility, but maybe
# not as user friendly.
For this script to work, all you need is an account, a delimited file (like the CSV above) that you're looking to upload, and the ability to replace all of those placeholder strings shown above. As usual, is this solution the most elegant? Probably not, but hopefully it helps you understand the process better.
And if you're just getting started with a migration to Snowflake, check out SnowConvert, a tool that can help you start your own migration from Teradata or Oracle. You can try SnowConvert for Teradata in assessment mode for free to gauge the scope of your migration.
Let us know if there's anything else we can illuminate for you on the road to the cloud.
Stay safe. Keep coding.