Ad attribution is a critical part of your business. It tells you how you acquire customers.
When you attribute ads, as an ad buyer, you compare a list of your customers against a list of your advertiser's impressions and count the matches. In the typical approach, the ad buyer sends raw or hashed customer data to the advertiser. This can violate privacy laws, and it creates a new store of sensitive data at a third party. That increases the attack surface and relinquishes both parties' control over the sensitive data.
This raises the question: how can you compare two datasets without sharing the private data? I'm using ad attribution as an example, but the same approach applies whenever you need to compare two lists of user data.
Tokenization is the act of replacing sensitive PII with a form-matching token. It lets you unlock the power of your data without sacrificing privacy or security. For example, you might tokenize an email address like firstname.lastname@example.org into a unique, random string of characters that still looks like an email address: email@example.com. The token is not PII, but you can analyze, explore, or use it just as easily and effectively as the PII itself.
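To make the idea concrete, here is a minimal sketch of form-matching email tokenization in Python. The function name and token shape are illustrative; a real token vault would also persist the token-to-original mapping so the value can be resolved later.

```python
import secrets
import string

def tokenize_email(email: str) -> str:
    """Return a random, form-matching token for an email address.

    Illustrative sketch only: a real token vault also stores the
    token -> original mapping so the email can be resolved later.
    """
    alphabet = string.ascii_lowercase + string.digits
    local = "".join(secrets.choice(alphabet) for _ in range(12))
    domain = "".join(secrets.choice(alphabet) for _ in range(10))
    return f"{local}@{domain}.example"

print(tokenize_email("firstname.lastname@example.org"))
```

Because the token is drawn from a random source rather than derived from the input, nothing about the original email can be recovered from the token alone.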
Tokenization is different from hashing in a few key ways:

- A hash is derived mathematically from the input; a token is random, so there is no function an attacker can invert.
- Hashes of low-entropy data like emails can be brute-forced by hashing candidate values and comparing; a token carries no information about the original value.
- Tokens can be form-matching (an email token still looks like an email); a hash is a fixed-length digest.
- Tokens can be resolved back to the original value through the vault, under access control; a hash is strictly one-way.
Suppose we have a list of paying customers' emails and our advertiser has a list of emails that were served impressions. Our goal is to find and count the matches without sharing the emails. A match might be defined simply, like two identical emails, or more precisely, like “two matching emails where the ad was served up to 2 weeks before the purchase.”
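On plaintext data, that stricter definition is just a predicate. A sketch (the function and parameter names are mine):

```python
from datetime import datetime, timedelta

def is_match(purchase_email: str, purchased_at: datetime,
             impression_email: str, served_at: datetime,
             window: timedelta = timedelta(weeks=2)) -> bool:
    """True if the emails match and the ad was served at most
    `window` before the purchase."""
    return (purchase_email == impression_email
            and timedelta(0) <= purchased_at - served_at <= window)

# Ad served 3 days before the purchase: a match.
print(is_match("a@x.com", datetime(2024, 5, 10),
               "a@x.com", datetime(2024, 5, 7)))  # True
```

The whole problem is evaluating this predicate when neither party is allowed to see the other's emails.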
So how do you do it safely?
First, both parties hash and tokenize their datasets using the same token generation policy. They pick a policy that always produces the same token for the same input (within their pre-defined match window), and they can build their matching/attribution logic into that policy. For example, they might choose a policy that only generates matching tokens if the ad was served up to 2 weeks before the purchase. If one party uploads a hash with no counterpart on the other side, it simply yields a unique token that the other party never produces.
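One way to sketch such a policy (my own illustration, not any particular vendor's implementation) is a keyed, deterministic derivation with weekly time buckets: the purchase side emits a token for each week in its look-back window, and a match occurs only when an impression token falls inside that set. Bucketing makes the 2-week window approximate, and a production token vault would issue random, vault-stored tokens rather than key-derived ones; the HMAC here only illustrates "same input, same token."

```python
import hmac
import hashlib
from datetime import datetime, timezone

# Assumption: both parties agree on a shared policy key out of band.
POLICY_KEY = b"agreed-upon-policy-key"

WEEK = 7 * 24 * 3600

def week_bucket(ts: datetime) -> int:
    """Coarse weekly bucket for a timestamp."""
    return int(ts.timestamp()) // WEEK

def derive_token(email: str, bucket: int) -> str:
    """Deterministic: the same (email, bucket) always yields the same token."""
    msg = f"{email}|{bucket}".encode()
    return hmac.new(POLICY_KEY, msg, hashlib.sha256).hexdigest()

def impression_token(email: str, served_at: datetime) -> str:
    return derive_token(email, week_bucket(served_at))

def purchase_tokens(email: str, purchased_at: datetime) -> set:
    # Cover roughly the 2 weeks before the purchase.
    b = week_bucket(purchased_at)
    return {derive_token(email, w) for w in (b - 2, b - 1, b)}
```

With this policy, an impression and a purchase for the same email share a token only when the impression's week falls inside the purchase's look-back buckets; an impression outside the window, or for a different email, produces tokens the other party never generates.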
Depending on user disclosures, they may configure their Token Vault in one of two ways:

- allow matched tokens to be resolved back into the original data, or
- restrict both parties to analysis over the tokens themselves, with no detokenization.
If they choose the first option, both parties can create a new table containing just the matched tokens for further analysis, or resolve those tokens back into the original data to see which customers matched.
Critically, in either case, neither party can see the other party's data. They get a shared ID space without sharing sensitive IDs.
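Counting matches then reduces to a set intersection over the uploaded tokens (the token values below are placeholder strings):

```python
# Tokens each party uploaded to the shared ID space.
# The values here are illustrative placeholders, not real tokens.
buyer_tokens = {"tok_a1", "tok_b2", "tok_c3", "tok_d4"}
advertiser_tokens = {"tok_b2", "tok_d4", "tok_e5"}

# Neither side ever sees the other's emails -- only tokens.
matched = buyer_tokens & advertiser_tokens
print(len(matched))  # 2 attributed customers
```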
In this example, we have successfully attributed this provider's ads. We've adhered to zero-trust design principles, minimized data sprawl, and maintained compliance with data regulations (depending on disclosures).
On top of that:
... and all while maintaining engineering velocity!
If you want to use tokenization to simplify your ad attribution, send us a message at firstname.lastname@example.org .