So you want to run Spark with Snowflake?

by John Browne, on Oct 24, 2022 8:38:34 AM


Spark is a big deal in big data and so is Snowflake, but what if you want to run your Spark SQL queries with a prebuilt connection to a Snowflake environment? 

Well, it turns out there's a hard way and an easy way. 

Before we look at the easy way, let's see what it's compared to. We'll use Spark Python (PySpark) for this example.

The hard way to run Spark with Snowflake

Quick refresher: what is Spark and why do we care about it if we are using Snowflake?

Apache Spark is an open-source engine for running really big workloads on data, spreading the work across clusters of machines to ramp up power and performance. Running Spark queries on data in Snowflake is attractive because companies are quickly finding that keeping their data in Snowflake is cost-effective and scales easily.

Here are the steps to get your Spark cluster to work with Snowflake:

  1. Figure out which version of the Snowflake Spark Connector (SSC) you need, depending on which versions of Spark and Scala you are running.
  2. Download and install the jar files for both the Snowflake connector and the JDBC driver on your Spark cluster.
  3. Make sure both jar files are on your Spark classpath, setting the relevant environment variables if needed.
  4. Install the PySpark script in your Spark distribution.
  5. Write and run Python code to import all the necessary libraries to establish the correct context.
  6. Create a Python dictionary that includes all the necessary parameters Snowflake requires for a connection.
  7. Read a Snowflake table into a Spark DataFrame to verify the connection is live. Don't forget to set the dbtable option on the read.

Sounds simple, right?
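For comparison, here is roughly what the hard way looks like in PySpark. This is a minimal sketch, not an official recipe: the helper functions, the SF_* environment variable names, the "MY_TABLE" default, and the version placeholders are all hypothetical, and you would need to fill in connector and JDBC driver versions that match your Spark and Scala versions. The option names (sfURL, sfUser, and so on), the net.snowflake.spark.snowflake source name, and dbtable come from the Snowflake Spark Connector. Instead of copying jars onto the classpath by hand (steps 2 and 3), the sketch asks Spark to fetch them through the spark.jars.packages setting.

```python
import os
from typing import Dict, Mapping, Optional

# Data source name registered by the Snowflake Spark Connector.
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

# Hypothetical placeholders: replace them with a connector release built for
# your Spark and Scala versions and a compatible JDBC driver version.
SCALA_VERSION = "2.12"
CONNECTOR_VERSION = "<connector-version>"
JDBC_VERSION = "<jdbc-version>"


def snowflake_packages(scala: str, connector: str, jdbc: str) -> str:
    """Steps 1-3: Maven coordinates for the connector and the JDBC driver."""
    return ",".join([
        f"net.snowflake:spark-snowflake_{scala}:{connector}",
        f"net.snowflake:snowflake-jdbc:{jdbc}",
    ])


def snowflake_options(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Step 6: connection parameters, read from hypothetical SF_* variables."""
    env = os.environ if env is None else env
    return {
        "sfURL": env["SF_URL"],  # e.g. <account_identifier>.snowflakecomputing.com
        "sfUser": env["SF_USER"],
        "sfPassword": env["SF_PASSWORD"],
        "sfDatabase": env["SF_DATABASE"],
        "sfSchema": env["SF_SCHEMA"],
        "sfWarehouse": env["SF_WAREHOUSE"],
    }


def read_table(spark, options: Mapping[str, str], table: str):
    """Step 7: load one Snowflake table; `dbtable` names the table to read."""
    return (
        spark.read.format(SNOWFLAKE_SOURCE_NAME)
        .options(**options)
        .option("dbtable", table)
        .load()
    )


if __name__ == "__main__" and "SF_URL" in os.environ:
    # Step 5: import PySpark only when we actually connect.
    from pyspark.sql import SparkSession

    # Ask Spark to download the jars instead of placing them on the
    # classpath by hand; this only takes effect when the session (and its
    # JVM) is created here.
    spark = (
        SparkSession.builder.appName("snowflake-connection-check")
        .config("spark.jars.packages",
                snowflake_packages(SCALA_VERSION, CONNECTOR_VERSION, JDBC_VERSION))
        .getOrCreate()
    )
    # "MY_TABLE" is a hypothetical default table name.
    df = read_table(spark, snowflake_options(), os.environ.get("SF_TABLE", "MY_TABLE"))
    df.show(5)  # prints the first rows if the connection is live
```

Keeping the connection parameters in plain functions means you can check the options dictionary without a cluster; the Spark part only runs when SF_URL is set.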

The easy way to use Spark and Snowflake

Get BlackDiamond Studio and here's all you'll need to know:

  1. Your Snowflake login info.

That's it. BlackDiamond Studio will do the rest, using a prebuilt template and an instant git repo we'll set up for you. All the connection stuff is built right in, so you can spend your time getting work done instead of troubleshooting configurations.

Want to try out Snowflake and Spark?

Try BlackDiamond Studio today. The fastest, easiest way to use Snowflake.
