Will Valori

Preserving privacy during ad attribution

Ad attribution is business-critical: it tells you how you acquire customers. But typical approaches rely on sharing sensitive data, increasing security risks and potentially violating privacy laws. How can you compare two data sets without sharing the private data?

The Challenge

Ad attribution is a critical part of your business. It tells you how you acquire customers.

When you attribute ads as an ad buyer, you compare a list of your customers with a list of your advertiser’s impressions and count the matches. In the typical approach, the ad buyer sends raw or hashed customer data to the advertiser. This can violate privacy laws, and it creates a new store of sensitive data held by a third party. That increases the attack surface and costs both parties control over the sensitive data.

This raises the question: how can you compare two data sets without sharing the private data? I’m using ad attribution as an example, but the same approach applies whenever you need to compare two lists of user data.

The Answer: tokenization

Tokenization is the act of replacing sensitive PII with a form-matching token. It lets you unlock the power of your data without sacrificing privacy or security. For example, you might tokenize an email address like will@userclouds.com into a unique, random string of characters that still looks like an email address: easfhrtyu@radfshtyui.com. The token is not PII, but you can analyze, explore, or use it just as easily and effectively as the PII itself.
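To make this concrete, here is a minimal Python sketch of a toy token vault. It is purely illustrative and not the UserClouds API: the TokenVault class, its methods, and the email-shaped token format are assumptions made up for this example. A real vault would also enforce access policies on resolution; the point here is only that the token carries no information about the email it replaces.

    # Toy token vault (illustrative only, not a real product API).
    # It swaps an email for a random, email-shaped token and remembers the
    # mapping so the token can later be resolved back to the original PII.
    import secrets
    import string

    class TokenVault:
        def __init__(self):
            self._token_to_pii = {}  # token -> original value, kept in the vault

        def _random_email_shaped(self) -> str:
            letters = string.ascii_lowercase
            local = "".join(secrets.choice(letters) for _ in range(9))
            domain = "".join(secrets.choice(letters) for _ in range(10))
            return f"{local}@{domain}.com"

        def tokenize(self, email: str) -> str:
            token = self._random_email_shaped()  # random, not derived from the email
            self._token_to_pii[token] = email
            return token

        def resolve(self, token: str) -> str:
            return self._token_to_pii[token]     # only possible with vault access

    vault = TokenVault()
    token = vault.tokenize("will@userclouds.com")
    print(token)                 # e.g. easfhrtyu@radfshtyui.com
    print(vault.resolve(token))  # will@userclouds.com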

Tokenization is different from hashing in a few key ways:

  1. The token is just a random string - it’s not algorithmically derived from the PII
  2. It’s single-use, so it can’t be matched across datasets
  3. It can’t be resolved back into the original PII without access to the token vault
  4. In most cases, it’s considered privacy preserving
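The first two differences are easy to see in code. This short snippet continues the toy TokenVault sketch above and is, again, just an illustration:

    # Hashing is deterministic and derived from the PII; tokenization is not.
    import hashlib

    email = "will@userclouds.com"

    h1 = hashlib.sha256(email.encode()).hexdigest()
    h2 = hashlib.sha256(email.encode()).hexdigest()
    print(h1 == h2)  # True: the same email always produces the same digest,
                     # so hashes can be matched (or dictionary-attacked) across datasets

    vault = TokenVault()         # the toy vault class from the sketch above
    t1 = vault.tokenize(email)
    t2 = vault.tokenize(email)
    print(t1 == t2)  # False: each token is freshly generated and single-use,
                     # and can only be resolved with access to the vault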

Example: email ads

Suppose we have a list of paying customer emails and our advertiser has a list of email impressions. Our goal is to find and count the matches without sharing emails. A match might be defined simply, like two matching emails, or it might be defined more precisely, like “two matching emails where the ad was served up to 2 weeks before the purchase.”

So how do you do it safely?

First, both parties hash and tokenize their datasets using the same token generation policy. They pick a policy that always produces the same token for the same input (within their pre-defined match window), and they can build their matching/attribution logic into that policy. For example, they might choose a policy that only generates matching tokens if the ad was served up to 2 weeks before the purchase. If a party uploads a hash that doesn’t match, it receives a unique token that the other party has no counterpart for.
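Here is a rough sketch of what such a generation policy could look like, assuming a single shared vault that both parties submit hashed records to. The MatchingVault class, its submit method, and the 14-day window are hypothetical names chosen for this example, not a real API.

    # Toy shared generation policy: the same hashed email gets the same token
    # for both parties, but only when the impression falls within the 2-week
    # window before the purchase. Otherwise each submission gets a fresh token.
    import hashlib
    import secrets
    from datetime import datetime, timedelta

    MATCH_WINDOW = timedelta(days=14)

    def hash_email(email: str) -> str:
        return hashlib.sha256(email.strip().lower().encode()).hexdigest()

    class MatchingVault:
        def __init__(self):
            # hashed email -> {party: (timestamp, token)}
            self._records = {}

        def submit(self, party: str, email_hash: str, ts: datetime) -> str:
            other = "advertiser" if party == "buyer" else "buyer"
            entry = self._records.setdefault(email_hash, {})

            if other in entry:
                other_ts, other_token = entry[other]
                impression = ts if party == "advertiser" else other_ts
                purchase = ts if party == "buyer" else other_ts
                # Attribution logic baked into the policy: the ad must have been
                # served up to 2 weeks before the purchase to count as a match.
                if timedelta(0) <= purchase - impression <= MATCH_WINDOW:
                    entry[party] = (ts, other_token)  # same token -> a match
                    return other_token

            token = secrets.token_hex(16)             # unique token -> no match
            entry[party] = (ts, token)
            return token

    vault = MatchingVault()
    buyer_token = vault.submit("buyer", hash_email("Will@UserClouds.com"),
                               datetime(2024, 3, 10))
    ad_token = vault.submit("advertiser", hash_email("will@userclouds.com"),
                            datetime(2024, 3, 1))
    print(buyer_token == ad_token)  # True: the ad ran 9 days before the purchase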

Depending on user disclosures, they may configure their Token Vault in one of two ways:

  1. Both parties can see all matching tokens, but not non-matching tokens
  2. Both parties can only see the total number of matching tokens

If they choose the first option, both parties can create a new table with just the matched tokens to perform extra analysis on the tokenized dataset, or resolve the tokens back into the original data and see which customers matched.

Critically, in either case, neither party can see the other party’s data. This means they get a shared ID space, without sharing sensitive IDs.
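As a sketch of those two configurations, here are two hypothetical read helpers built on the toy MatchingVault above. In a real deployment the vault itself would enforce these visibility rules; this is just to show what each option exposes.

    # Option 1: expose only the tokens that matched (both parties hold them).
    def matching_tokens(vault: MatchingVault) -> set:
        matched = set()
        for entry in vault._records.values():
            if "buyer" in entry and "advertiser" in entry:
                buyer_token = entry["buyer"][1]
                if buyer_token == entry["advertiser"][1]:
                    matched.add(buyer_token)
        return matched

    # Option 2: expose only the total number of matches.
    def match_count(vault: MatchingVault) -> int:
        return len(matching_tokens(vault))

    print(matching_tokens(vault))  # the shared ID space: matched tokens only
    print(match_count(vault))      # or just the count, if that's all that's disclosed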

Outcomes

In this example, we have successfully attributed this provider's ads. We've adhered to zero-trust design principles, minimized data sprawl, and maintained compliance with data regulations (depending on disclosures).

On top of that:

  • No raw PII has been exchanged
  • Neither party knows the size of the other’s dataset
  • Single-use tokens cannot be used or matched outside of this analysis
  • An easy mutual deletion method has been created
  • Depending on disclosures, compliance with privacy laws like GDPR & CCPA has been maintained
  • Each party's attack surface has been minimized

... and all while maintaining engineering velocity!

If you want to use tokenization to simplify your ad attribution, send us a message at info@userclouds.com.

Reach out today