Anonymization/masking of data in SnowPark

by Mauricio Rojas, on Sep 28, 2022 11:41:16 AM

 

Data governance is a strategy for handling data within an organization. It is a set of rules, policies, standards, practices, etc. whose main purpose is to ensure data has high quality and integrity, is safely stored and there are no ambiguities in the meaning of common terms.

Applying this strategy is a long process, engaging the whole organization, especially IT and data-consuming departments. Certain data governance tools help apply these hypothetical plans in real life.

But.... it is something that Snowflake really takes into consideration, and there are some features to can leverage for this purpose.

Snowflake’s approach to access control combines aspects from both of the following models:

  • Discretionary Access Control (DAC): Each object has an owner, who can in turn grant access to that object.
  • Role-based Access Control (RBAC): Access privileges are assigned to roles, which are in turn assigned to users.

But you can also secure your data at different levels:

  • Row Level Access Policies: with these policies, the data at rest is not masked, but at query runtime, Snowflake will  generate the query output for the user, and the query output only contains rows based on the policy definition evaluating to TRUE

    For example:
    And it can be set on tables or views:
    -- table
    create table sales (
      customer   varchar,
      product    varchar,
      spend      decimal(20, 2),
      sale_date  date,
      region     varchar
    )
    with row access policy rap_sales_manager_regions1 on (region);
    
    -- view
    create view sales_v with row access policy rap_sales_manager_regions1 on (region) as select * from sales;
    
  • Column Level Access Security: Also known as Dynamic Data Masking. In this case sensitive data in plain text is loaded into Snowflake, and it is dynamically masked/encrypted at the time of query for unauthorized users. You can learn a lot from this blog post by Vikas Jain
    Policies

For example with Dynamic Data Masking you can ensure under which circumstances data is actually shown.

Masking-Policy-Example

You can apply a masking policy directly on a column, or more recently with new feature like object_tagging, you can associate masking policies to a tag, so they will apply to object that have that tag applied to them.

How does this data-masking work?

I think is good to understand a little of how this works. In general Snowflake transparently rewrites the query at runtime wherever it finds a column with a masking policy attached. For authorized users, query results return sensitive data in plain text, while for unauthorized users sensitive data is masked, encrypted, or tokenized, as can be seen on the table from Vikas's article.

 

Your policy is not restricted to work on just one column. It can use several values a.k.a conditional data masking:

-- Conditional Masking

create masking policy email_visibility as
(email varchar, visibility string) returns varchar ->
  case
    when current_role() = 'ADMIN' then email
    when visibility = 'Public' then email
    else '***MASKED***'
  end;

And to apply it you can use an statement like:

alter table if exists customer_info 
modify column email 
set masking policy
 email_visibility using (email,visibility);

 

This is just a brief introduction. However it shows how sensitive information can be easily taken in Snowflake and protected with great ease. This can also apply to data shares eliminating the need to copy data.

 

Topics:Snowflakesnowparkdata-governance

Comments

Subscribe to Mobilize.Net Blog

More...

More...
FREE CODE ASSESSMENT TOOL