Azure Data Factory and SQL Column Level Encryption
Failed to decrypt a column encryption key using key store provider: 'AZURE_KEY_VAULT'
In the past, we've had to deal with Column Level Encryption in Azure SQL Server. I got looped into investigating a similar error in a test environment that is blocking the promotion of a feature.
Failure happened on 'Source' side. ErrorCode=SqlOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A database operation failed with the following error: 'Failed to decrypt column 'SomeColumn'. Failed to decrypt a column encryption key using key store provider: 'AZURE_KEY_VAULT'. Verify the properties of the column encryption key and its column master key in your database. The last 10 bytes of the encrypted column encryption key are: 'F1-5C-6C-9F-6E-09-6A-0F-7F-C1'. One or more errors occurred.',Source=,''Type=System.Data.SqlClient.SqlException,Message=Failed to decrypt column 'SomeColumn'. Failed to decrypt a column encryption key using key store provider: 'AZURE_KEY_VAULT'. Verify the properties of the column encryption key and its column master key in your database. The last 10 bytes of the encrypted column encryption key are: 'F1-5C-6C-9F-6E-09-6A-0F-7F-C1'. One or more errors occurred.,Source=.Net SqlClient Data Provider,SqlErrorNumber=0,Class=11,ErrorCode=-2146232060,State=0,Errors=[{Class=11,Number=0,State=0,Message=Failed to decrypt column 'SomeColumn'.,},{Class=11,Number=0,State=0,Message=Failed to decrypt a column encryption key using key store provider: 'AZURE_KEY_VAULT'. Verify the properties of the column encryption key and its column master key in your database. The last 10 bytes of the encrypted column encryption key are: 'F1-5C-6C-9F-6E-09-6A-0F-7F-C1'.,},{Class=11,Number=0,State=0,Message=One or more errors occurred.,},],''Type=Microsoft.Azure.KeyVault.Models.KeyVaultErrorException,Message=Operation returned an invalid status code 'Forbidden',Source=Microsoft.Azure.KeyVault,'
Background
This test environment infrastructure has not been updated in quite some time. The checked infrastructure code was not up to date or consistent with the production environment. The environments consist of an Azure Data Factory instance that performs a copy between two Azure SQL Server databases with some ETL.
Troubleshooting
Verify Decryption
This is what the column values would look like encrypted:
I need to confirm I could decrypt the column value myself using SSMS and the Column Encryption Setting=Enabled
in the SQL connection string or the Enable Always Encrypted (column encryption)
option.
I had no problems seeing the decrypted value through SQL Server Management Studio.
Column Master Key
I needed to confirm which KeyVault stores the Column Master Key (CMK). You can do this through the UI through Security -> Always Encrypted Keys -> Column Master Keys -> Right click -> Script Column Master Key as
:
OR
SELECT * FROM sys.column_master_keys ORDER BY [name]
We care about the key_path
column which should link to a keyvault
resource in my case. Note that the secrets such as connection strings used by ADF are a separate KeyVault from the one containing the Column Master Key.
I'm assuming the other CMKs
are out of date but looked at the key_path
to confirm that they are pointing to the same keyvault
. We know the functionality works since I was able to decrypt through SSMS earlier.
ADF Identity Verify Access/Permissions
I had to make sure that the identity used by ADF had the proper permissions to the KeyVault. Specifically Decrypt
, Unwrap Key
, Verify
permissions under Cryptographic Operations and at least Get
and List
under Key Management Operations.
The ADF identity was not in the KeyVault policies. I added the identity and the permissions to it...
Same error...
Boo
SQL Roles and Permissions
I was able to decrypt the column values. My login
is associated with a user
created using:
CREATE LOGIN [your_add_login] FROM EXTERNAL PROVIDER;
Where you your_add_login
is something like user@sometenant.com
. The login is then associated with a user:
CREATE USER [your_user_or_group] FROM LOGIN [your_add_login];
Note: The user in this case could be an Azure Active Directory group.
The user is then granted specific permissions to view the master key and encryption key.
GRANT VIEW ANY COLUMN ENCRYPTION KEY DEFINITION TO [your_user]
GRANT VIEW ANY COLUMN MASTER KEY DEFINITION TO [your_user]
However, I created a specific role with these permissions:
CREATE ROLE [db_privilegeddatareader]
GRANT VIEW ANY COLUMN ENCRYPTION KEY DEFINITION TO [db_privilegeddatareader]
GRANT VIEW ANY COLUMN MASTER KEY DEFINITION TO [db_privilegeddatareader]
Then add the group or user to the role:
ALTER ROLE db_datawriter ADD MEMBER [your_user_or_group]
ALTER ROLE db_privilegeddatareader ADD MEMBER [your_user_or_group]
Older versions of SQL would use sp_addrolemember
:
EXEC sp_addrolemember 'db_datawriter', [your_user_or_group]
EXEC sp_addrolemember 'db_privilegeddatareader', [your_user_or_group]
The db_datawriter
role should allow writing to the encrypted column.
The ADF identity did not have a user (it does not need a login since the SQL connection string stored in KeyVault used an existing SQL login). I added the user
and added the user to the db_privilegeddatareader
role.
Same error.
Integration Runtime
The Always Encrypted
integration with ADF seems to require an ODBC Driver for the connection string. I saw Linked Services using these ODBC secrets, but the Production environment did not use these Linked Services for this pipeline. I updated the integration runtime on the VM through ADF. The VM itself also needed Windows Updates.
Still did not fix the pipeline step failure.
Use Service Principal
There is a Service Principal (SPN) used and it is used in production. I added the non-prod SPN and gave it the same access as the ADF identity in SQL and the KeyVault containing the Column Master Key.
Same error.
Firewall/Network
After banging my head. I began looking at the error message again.
Type=Microsoft.Azure.KeyVault.Models.KeyVaultErrorException,Message=Operation returned an invalid status code 'Forbidden',Source=Microsoft.Azure.KeyVault,
This led me to re-examine the firewall around KeyVault that stored the CMK. Since the pipeline was partially successful, this wasn't an SQL firewall issue. The firewall was enabled and IPs had been added. I needed to check what possible IP addresses Azure Data Factory used. I didn't see a list of possible outbound and inbound IP addresses, so I disabled the firewall and kicked off the pipeline again.
Success!!!
It's a firewall issue. I started to compare the configuration with the equivalent production KeyVault. That's when I noticed that production used a vnet
with subnets
. There is one vnet
that contains two subnets
. The subnet
that includes the IP Address used by ADF VM was the one I added KeyVault firewall.
Failure - Same error.
WTF
Looking closer at the subnet
it has no service endpoints configured. The one in production has SQL
, Storage
, and KeyVault
service endpoints. Ok, I'll just add them.
Nope, I don't have access nor any of my privileged accounts. The other subnet
contains IP addresses that didn't correspond to ADF, but had the correct service endpoints. Using this subnet
wouldn't work. I have a workaround for now and we'll get access to a privileged account that can add those service endpoints.
Overall, this took me several days of debugging since I tried and retried steps which led to...
Takeaways
Tech debt sucks. Especially for those who inherit the product/service.
Don't roll your own infrastructure deployment code that deviates too much from existing mechanisms (bicep/arm templates, Azure CLI).
Trace the problem from first-principles and targeted area and work your way out.
Take notes as you troubleshoot/debug. Note what has been done, commands, etc.
Pay special attention to the error message. In some UI, it's better to copy the entire message and add it to your notes.
Try adding
Diagnostic Logging
for the resources you may think are causing the issue.