Azure Data Factory and SQL Column Level Encryption

Azure Data Factory and SQL Column Level Encryption

Failed to decrypt a column encryption key using key store provider: 'AZURE_KEY_VAULT'

·

6 min read

In the past, we've had to deal with Column Level Encryption in Azure SQL Server. I got looped into investigating a similar error in a test environment that is blocking the promotion of a feature.

Failure happened on 'Source' side. ErrorCode=SqlOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A database operation failed with the following error: 'Failed to decrypt column 'SomeColumn'. Failed to decrypt a column encryption key using key store provider: 'AZURE_KEY_VAULT'. Verify the properties of the column encryption key and its column master key in your database. The last 10 bytes of the encrypted column encryption key are: 'F1-5C-6C-9F-6E-09-6A-0F-7F-C1'. One or more errors occurred.',Source=,''Type=System.Data.SqlClient.SqlException,Message=Failed to decrypt column 'SomeColumn'. Failed to decrypt a column encryption key using key store provider: 'AZURE_KEY_VAULT'. Verify the properties of the column encryption key and its column master key in your database. The last 10 bytes of the encrypted column encryption key are: 'F1-5C-6C-9F-6E-09-6A-0F-7F-C1'. One or more errors occurred.,Source=.Net SqlClient Data Provider,SqlErrorNumber=0,Class=11,ErrorCode=-2146232060,State=0,Errors=[{Class=11,Number=0,State=0,Message=Failed to decrypt column 'SomeColumn'.,},{Class=11,Number=0,State=0,Message=Failed to decrypt a column encryption key using key store provider: 'AZURE_KEY_VAULT'. Verify the properties of the column encryption key and its column master key in your database. The last 10 bytes of the encrypted column encryption key are: 'F1-5C-6C-9F-6E-09-6A-0F-7F-C1'.,},{Class=11,Number=0,State=0,Message=One or more errors occurred.,},],''Type=Microsoft.Azure.KeyVault.Models.KeyVaultErrorException,Message=Operation returned an invalid status code 'Forbidden',Source=Microsoft.Azure.KeyVault,'

Background

This test environment infrastructure has not been updated in quite some time. The checked infrastructure code was not up to date or consistent with the production environment. The environments consist of an Azure Data Factory instance that performs a copy between two Azure SQL Server databases with some ETL.

Troubleshooting

Verify Decryption

This is what the column values would look like encrypted:

I need to confirm I could decrypt the column value myself using SSMS and the Column Encryption Setting=Enabled in the SQL connection string or the Enable Always Encrypted (column encryption) option.

I had no problems seeing the decrypted value through SQL Server Management Studio.

Column Master Key

I needed to confirm which KeyVault stores the Column Master Key (CMK). You can do this through the UI through Security -> Always Encrypted Keys -> Column Master Keys -> Right click -> Script Column Master Key as:

OR

SELECT * FROM sys.column_master_keys ORDER BY [name]

We care about the key_path column which should link to a keyvault resource in my case. Note that the secrets such as connection strings used by ADF are a separate KeyVault from the one containing the Column Master Key.

I'm assuming the other CMKs are out of date but looked at the key_path to confirm that they are pointing to the same keyvault. We know the functionality works since I was able to decrypt through SSMS earlier.

ADF Identity Verify Access/Permissions

I had to make sure that the identity used by ADF had the proper permissions to the KeyVault. Specifically Decrypt, Unwrap Key, Verify permissions under Cryptographic Operations and at least Get and List under Key Management Operations.

The ADF identity was not in the KeyVault policies. I added the identity and the permissions to it...

Same error...

Boo

SQL Roles and Permissions

I was able to decrypt the column values. My login is associated with a user created using:

CREATE LOGIN [your_add_login] FROM EXTERNAL PROVIDER;

Where you your_add_login is something like user@sometenant.com. The login is then associated with a user:

CREATE USER [your_user_or_group] FROM LOGIN [your_add_login];

Note: The user in this case could be an Azure Active Directory group.

The user is then granted specific permissions to view the master key and encryption key.

GRANT VIEW ANY COLUMN ENCRYPTION KEY DEFINITION TO [your_user]
GRANT VIEW ANY COLUMN MASTER KEY DEFINITION TO [your_user]

However, I created a specific role with these permissions:

CREATE ROLE [db_privilegeddatareader]
GRANT VIEW ANY COLUMN ENCRYPTION KEY DEFINITION TO [db_privilegeddatareader]
GRANT VIEW ANY COLUMN MASTER KEY DEFINITION TO [db_privilegeddatareader]

Then add the group or user to the role:

ALTER ROLE db_datawriter ADD MEMBER [your_user_or_group]
ALTER ROLE db_privilegeddatareader ADD MEMBER [your_user_or_group]

Older versions of SQL would use sp_addrolemember:

EXEC sp_addrolemember 'db_datawriter', [your_user_or_group]
EXEC sp_addrolemember 'db_privilegeddatareader', [your_user_or_group]

The db_datawriter role should allow writing to the encrypted column.

The ADF identity did not have a user (it does not need a login since the SQL connection string stored in KeyVault used an existing SQL login). I added the user and added the user to the db_privilegeddatareader role.

Same error.

Integration Runtime

The Always Encrypted integration with ADF seems to require an ODBC Driver for the connection string. I saw Linked Services using these ODBC secrets, but the Production environment did not use these Linked Services for this pipeline. I updated the integration runtime on the VM through ADF. The VM itself also needed Windows Updates.

Still did not fix the pipeline step failure.

Use Service Principal

There is a Service Principal (SPN) used and it is used in production. I added the non-prod SPN and gave it the same access as the ADF identity in SQL and the KeyVault containing the Column Master Key.

Same error.

Firewall/Network

After banging my head. I began looking at the error message again.

Type=Microsoft.Azure.KeyVault.Models.KeyVaultErrorException,Message=Operation returned an invalid status code 'Forbidden',Source=Microsoft.Azure.KeyVault,

This led me to re-examine the firewall around KeyVault that stored the CMK. Since the pipeline was partially successful, this wasn't an SQL firewall issue. The firewall was enabled and IPs had been added. I needed to check what possible IP addresses Azure Data Factory used. I didn't see a list of possible outbound and inbound IP addresses, so I disabled the firewall and kicked off the pipeline again.

Success!!!

Woohoo….. – Unwind Your Mind

It's a firewall issue. I started to compare the configuration with the equivalent production KeyVault. That's when I noticed that production used a vnet with subnets. There is one vnet that contains two subnets. The subnet that includes the IP Address used by ADF VM was the one I added KeyVault firewall.

Failure - Same error.

WTF

Looking closer at the subnet it has no service endpoints configured. The one in production has SQL, Storage, and KeyVault service endpoints. Ok, I'll just add them.

Oscars 2022: Meme war erupts on Twitter after Smith slaps comedian Chris  Rock for mocking his wife

Nope, I don't have access nor any of my privileged accounts. The other subnet contains IP addresses that didn't correspond to ADF, but had the correct service endpoints. Using this subnet wouldn't work. I have a workaround for now and we'll get access to a privileged account that can add those service endpoints.

Overall, this took me several days of debugging since I tried and retried steps which led to...

Takeaways

  • Tech debt sucks. Especially for those who inherit the product/service.

  • Don't roll your own infrastructure deployment code that deviates too much from existing mechanisms (bicep/arm templates, Azure CLI).

  • Trace the problem from first-principles and targeted area and work your way out.

  • Take notes as you troubleshoot/debug. Note what has been done, commands, etc.

  • Pay special attention to the error message. In some UI, it's better to copy the entire message and add it to your notes.

  • Try adding Diagnostic Logging for the resources you may think are causing the issue.

References