Vladimir Fedorov

The Four Pillars of Private-By-Design Architecture

Maintaining privacy and compliance is hard, but the right software architecture can make it simple. Four software principles can strengthen your data security and privacy posture - all while accelerating your engineering velocity.

Maintaining privacy, compliance and data security is nearly impossible when your infrastructure doesn't help. Over time, more and more copies of sensitive data get created. Regulations change, new vendor relationships form, and system architecture evolves. Maintenance costs explode, mistakes become inevitable and remedial human approval flows begin to suffocate the organization.

Luckily, the right software architecture can vastly simplify protecting sensitive data. Applying four foundational principles will strengthen your privacy posture, while reducing costs and streamlining operations.

So how do you know if your system could be more efficient and robust? Typically, we see companies realize huge benefits if they already see one or more of these frictions:

  • The company is losing track of where its sensitive data lives
  • Serving data download and deletion requests requires coordination between multiple teams
  • Whenever the rules change, engineers have to update policy code in multiple places, systems and languages
  • After rule changes, the team does not always feel confident that they have 'caught' every instance of policy code
  • Extensive, human privacy review processes have evolved to maintain privacy
  • These review processes are slowing down feature releases and data analyses
  • Privacy and compliance work is affecting engineering velocity

If you have observed any of these signals in your organization, adopting these principles will strengthen your privacy posture, accelerate your engineering velocity and reduce costs.

1. Keep your sensitive data in one data store (or as few stores as possible)

The more copies of sensitive data you have, the more code you need to protect them, and the more vulnerable you become to security flaws. Every extra copy of sensitive data requires consent governance, usage reporting and a mechanism for download & deletion requests.

In typical infrastructure, sensitive data is replicated throughout the system. Each copy then requires bespoke protection & policy evaluation.

The foundation of any simple, effective privacy system is minimizing data sprawl. Aim to keep as much of your sensitive data as possible in a single Data Vault. This will help you minimize the number of copies of data that need to be tracked. If data residency laws require you to keep data separately, use a distributed set of data vaults with a single CRUD API.
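As a rough illustration of a distributed set of vaults behind a single CRUD API, here is a minimal in-memory sketch. The class names (RegionalVault, DistributedVault), the region labels and the record shape are all hypothetical, chosen for this example only; a real vault would add encryption, authentication and durable storage.

```python
from typing import Dict, Optional


class RegionalVault:
    """One physical store holding raw sensitive records for one region."""

    def __init__(self, region: str) -> None:
        self.region = region
        self.records: Dict[str, dict] = {}


class DistributedVault:
    """Single CRUD entry point in front of several regional vaults."""

    def __init__(self, regions) -> None:
        self._vaults = {r: RegionalVault(r) for r in regions}
        self._home_region: Dict[str, str] = {}  # user_id -> region

    def create(self, user_id: str, region: str, record: dict) -> None:
        if region not in self._vaults:
            raise ValueError(f"no vault configured for region {region!r}")
        if user_id in self._home_region:
            raise KeyError(f"{user_id!r} already exists")
        self._home_region[user_id] = region
        self._vaults[region].records[user_id] = dict(record)

    def read(self, user_id: str) -> Optional[dict]:
        region = self._home_region.get(user_id)
        if region is None:
            return None
        # Return a copy so callers cannot mutate the stored record.
        return dict(self._vaults[region].records[user_id])

    def update(self, user_id: str, changes: dict) -> None:
        region = self._home_region[user_id]
        self._vaults[region].records[user_id].update(changes)

    def delete(self, user_id: str) -> bool:
        # One call removes the only copy, wherever it physically lives.
        region = self._home_region.pop(user_id, None)
        if region is None:
            return False
        del self._vaults[region].records[user_id]
        return True


vault = DistributedVault(["eu", "us"])
vault.create("user-1", "eu", {"email": "stevejobs@apple.com"})
```

Because callers only ever see the four CRUD methods, a deletion request is served by one call regardless of which regional store holds the record.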

To move towards centralized storage of sensitive data, aim to minimize the length of time any piece of your system has access to raw data. Sensitive data should only be accessed for a particular purpose, and should be discarded/deleted as soon as that purpose is performed.

2. Whenever you move or replicate your data, tokenize it first

Of course, data isn’t meant to be stored - it’s meant to be used! Whenever you need to make a copy of sensitive data for a new data set or third party system, do not take the raw data out of the vault directly. Generate a substitute token that can be used in place of the data without compromising privacy. Then use that instead.

You can think of the token as a reference to the data instead of the data value itself. By working with a reference, you avoid having to move the associated policy and data use restrictions everywhere the data value propagates.

Sensitive data is stored in a secure vault (1), and a reference token propagates in its place (2). Trusted applications & employees can resolve the token by sending it to the vault with relevant context (3). If the access policy is met, the vault returns the raw data (4).

How do you create a token that is useful, but doesn’t compromise privacy? The goal is to generate a random value, but preserve the minimum characteristics of the raw data for your intended use case.

For example, to create a token for an email that will flow through your systems without causing validation errors, use a generation policy that retains the structure of an email. E.g. replace stevejobs@apple.com with x6ah11nsp@mu76e1d2.com.
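One possible generation policy for the email case is sketched below. The function name tokenize_email, the fixed part lengths and the choice to keep the top-level domain are assumptions made for this illustration, not a prescribed algorithm. The token is purely random, so it cannot be reversed without the vault's lookup table.

```python
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits


def _random_part(length: int) -> str:
    """Random lowercase/digit string from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def tokenize_email(email: str, local_len: int = 9, domain_len: int = 8) -> str:
    """Return a random token that still looks like an email address.

    Fixed lengths are used so the token does not leak the length of the
    original address. Keeping the top-level domain is an assumption of
    this sketch; drop it if even that is too revealing for your use case.
    """
    local, at, domain = email.partition("@")
    if not (local and at and "." in domain):
        raise ValueError("not a well-formed email address")
    tld = domain.rsplit(".", 1)[1]
    return f"{_random_part(local_len)}@{_random_part(domain_len)}.{tld}"


token = tokenize_email("stevejobs@apple.com")
```

The resulting token has the same shape as the example above, so it should pass typical email format validation while carrying no information about the original local part or domain name.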

If done correctly, the token is not considered personally identifiable (even if the data it references is) - until the moment it is resolved for the actual value. This means it can move freely throughout systems, without weakening data security or compromising privacy. It can be used outside of the user’s home country without breaking data residency laws. It can be used for analysis on employee laptops without raising the potential severity of account takeover or employee abuse. And it can be replicated in third party data centers without increasing your exposure to data breach. At any point, you can delete the link between the token and raw value, rendering the token useless.

For best-practice security, you can issue unique, distinct tokens for each particular use of the same sensitive data value. This prevents matching of tokens across different data sets.

Store the tokens alongside the data in the vault and associate each token with an access policy that governs when the token can be resolved back into the raw data.
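The bookkeeping described above (distinct tokens per use, each stored in the vault with an access policy, and a link that can be deleted at any time) might look like the following sketch. TokenVault and its method names are hypothetical, and the policy is simplified to a callable that receives a context dictionary.

```python
import secrets


class TokenVault:
    """In-memory sketch: raw values, their tokens and policies in one place."""

    def __init__(self):
        self._values = {}    # value_id -> raw value
        self._tokens = {}    # token -> (value_id, use, policy_name)
        self._policies = {}  # policy_name -> callable(context) -> bool

    def add_policy(self, name, rule):
        self._policies[name] = rule

    def store(self, raw_value):
        value_id = secrets.token_hex(8)
        self._values[value_id] = raw_value
        return value_id

    def issue_token(self, value_id, use, policy_name):
        """Issue a fresh token for one particular use of a stored value."""
        if value_id not in self._values:
            raise KeyError("unknown value")
        if policy_name not in self._policies:
            raise KeyError("unknown policy")
        token = "tok_" + secrets.token_urlsafe(16)
        self._tokens[token] = (value_id, use, policy_name)
        return token

    def resolve(self, token, context):
        entry = self._tokens.get(token)
        if entry is None:
            return None  # unknown token, or its link was deleted
        value_id, _use, policy_name = entry
        if not self._policies[policy_name](context):
            raise PermissionError("access policy not satisfied")
        return self._values[value_id]

    def revoke(self, token):
        """Delete the link between token and value; the token becomes useless."""
        self._tokens.pop(token, None)


vault = TokenVault()
vault.add_policy("support_only", lambda ctx: ctx.get("purpose") == "support")
value_id = vault.store("stevejobs@apple.com")
analytics_token = vault.issue_token(value_id, "analytics", "support_only")
crm_token = vault.issue_token(value_id, "crm", "support_only")
```

The two tokens refer to the same email address but share no characters by design, so the analytics and CRM data sets cannot be joined on them.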

3. Control raw data access centrally

Of course, sometimes, tokens aren't sufficient. Your email marketing partner can’t send emails to a tokenized email address. When a teammate, partner or application needs the original sensitive data, let them request to exchange the token for the sensitive data in the Vault.

For each token, define an access policy that governs the circumstances in which the token can be exchanged for the raw data. A well-formed access policy will receive context about who is requesting the raw data, when, where and for what purpose. It will evaluate this context against the user’s consents, local laws and the company’s security settings, to determine whether access is permitted. 

For example, an access policy might enforce that a token can only be resolved by a full-time employee, on a trusted IP address, for the purpose of fraud and integrity work. In the case of a data breach, an attacker would not be able to resolve tokens protected by this policy. They would only see the dataset of tokens, not the sensitive data itself. Furthermore, if you had issued distinct, single-use tokens for each particular use of the sensitive data, the tokens could not be matched across compromised datasets.
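The example policy above could be expressed roughly as follows. The AccessContext fields, the role and purpose strings and the 10.0.0.0/8 "trusted" network are placeholders invented for this sketch; substitute whatever your identity and network systems actually provide.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessContext:
    """Who is asking, from where, and why (hypothetical fields)."""
    requester_role: str
    ip_address: str
    purpose: str


# Assumed trusted range for illustration only.
TRUSTED_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def fraud_team_policy(ctx: AccessContext) -> bool:
    """Full-time employee, on a trusted IP, for fraud and integrity work."""
    ip = ipaddress.ip_address(ctx.ip_address)
    return (
        ctx.requester_role == "full_time_employee"
        and any(ip in network for network in TRUSTED_NETWORKS)
        and ctx.purpose == "fraud_and_integrity"
    )


def resolve(token: str, ctx: AccessContext, token_table: dict) -> str:
    """Exchange a token for raw data only if the policy is met."""
    if not fraud_team_policy(ctx):
        raise PermissionError("access policy not satisfied")
    return token_table[token]


table = {"x6ah11nsp@mu76e1d2.com": "stevejobs@apple.com"}
insider = AccessContext("full_time_employee", "10.1.2.3", "fraud_and_integrity")
attacker = AccessContext("unknown", "203.0.113.7", "fraud_and_integrity")
```

With these inputs, resolve succeeds for the insider context and raises PermissionError for the attacker context, who is left holding only the token.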

Of course, if an employee or application had kept a copy of the raw data, this data would be vulnerable to data breach. Raw data should be discarded/deleted immediately after use.

The key is to apply, store and maintain these access policies centrally. Your systems will evolve, your customers will edit their consents and your privacy regulations will fragment. The more copies, languages and systems involved in your access policies, the harder it will be to update and maintain your code. A simple, effective privacy system stores and applies all access policies in one place, and minimizes the number of policies in the systems.

If done correctly, you will be able to update your access policies and apply the changes across your entire enterprise efficiently. You’ll even be able to change the access policies on tokens after the token has been shared. For example, if your legal team's interpretation of acceptable use of a customer's age changes, you can update all your go-forward policies accordingly.
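One way to make policies changeable after tokens have been shared is to have each token reference a policy by name and keep the policy body in a central store. The sketch below assumes that design; PolicyStore, the policy name and the purpose labels are made up for the example.

```python
class PolicyStore:
    """Central store: policy name -> set of purposes currently permitted."""

    def __init__(self):
        self._allowed_purposes = {}

    def set_policy(self, name, allowed_purposes):
        self._allowed_purposes[name] = frozenset(allowed_purposes)

    def is_allowed(self, name, purpose):
        return purpose in self._allowed_purposes.get(name, frozenset())


store = PolicyStore()
store.set_policy("customer_age", {"age_verification", "ad_targeting"})

# Tokens carry only the policy *name*, so they never need to be reissued.
token_policy = {"tok_age_123": "customer_age"}


def can_resolve(token, purpose):
    name = token_policy.get(token)
    return name is not None and store.is_allowed(name, purpose)


before = can_resolve("tok_age_123", "ad_targeting")

# Legal narrows the acceptable use of age: one central update, no redeploys.
store.set_policy("customer_age", {"age_verification"})

after = can_resolve("tok_age_123", "ad_targeting")
```

Here the already-distributed token tok_age_123 stops resolving for ad targeting as soon as the central policy is updated, while age verification continues to work.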

Log access policy resolutions so you have a clear record of why, when and by whom tokens are resolved. This will give you clearer insight into your data sprawl and usage.
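A resolution log can be very simple. The sketch below assumes an in-memory list and hypothetical field names; in practice the entries would go to an append-only, tamper-evident store. Note that it records the token, never the raw value.

```python
from collections import Counter
from datetime import datetime, timezone


class ResolutionLog:
    """Records every attempt to resolve a token, allowed or denied."""

    def __init__(self):
        self.entries = []

    def record(self, token, requester, purpose, allowed):
        entry = {
            "token": token,          # the token only, never the raw data
            "requester": requester,  # by whom
            "purpose": purpose,      # why
            "allowed": allowed,
            "at": datetime.now(timezone.utc).isoformat(),  # when
        }
        self.entries.append(entry)
        return entry

    def usage_by_purpose(self):
        """Count successful resolutions per purpose."""
        return Counter(e["purpose"] for e in self.entries if e["allowed"])


log = ResolutionLog()
log.record("tok_1", "alice", "fraud_and_integrity", True)
log.record("tok_2", "bob", "marketing", False)
log.record("tok_3", "alice", "fraud_and_integrity", True)
usage = log.usage_by_purpose()
```

Aggregations like usage_by_purpose are what turn the raw log into the usage insight described above.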

4. Use tools that help your security, privacy & compliance teams thrive with fewer engineering resources

The only constant in privacy is change. Regulations change, systems change and customer consents change. As far as possible, you should aim for a system that can change with minimal involvement from your engineering team. This removes your engineering team as a bottleneck for privacy issues, and removes privacy issues as a bottleneck for product development.

Use an interface that gives your security, privacy and compliance teams control over data use without involving engineering. A strong interface will allow them to apply access policies and serve deletion requests without coding.

An effective policy management tool will integrate with existing company processes (like security/privacy programs). It will allow simple composition and parametrization of policies to blunt the explosion in the number of policies in the central store.
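To illustrate what composition and parametrization can mean in practice, here is a small sketch in which a handful of parametrized building blocks are combined into larger policies. The helper names (purpose_is, role_is, all_of, any_of) and the context keys are hypothetical.

```python
def purpose_is(*purposes):
    """Parametrized rule: the stated purpose must be one of `purposes`."""
    allowed = frozenset(purposes)
    return lambda ctx: ctx.get("purpose") in allowed


def role_is(*roles):
    """Parametrized rule: the requester's role must be one of `roles`."""
    allowed = frozenset(roles)
    return lambda ctx: ctx.get("role") in allowed


def all_of(*rules):
    """Composite rule that passes only if every sub-rule passes."""
    return lambda ctx: all(rule(ctx) for rule in rules)


def any_of(*rules):
    """Composite rule that passes if at least one sub-rule passes."""
    return lambda ctx: any(rule(ctx) for rule in rules)


# Three policies built from two reusable building blocks.
fraud_access = all_of(role_is("full_time_employee"),
                      purpose_is("fraud_and_integrity"))
support_access = all_of(role_is("full_time_employee", "contractor"),
                        purpose_is("customer_support"))
email_access = any_of(fraud_access, support_access)
```

Because each new policy is a combination of existing parts rather than a fresh block of code, the central store grows with the number of distinct rules, not with the number of tokens or teams.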

Getting Started

Now you know what private-by-design architecture looks like - where do you start?

UserClouds is a private-by-design identity system that makes privacy and data security simple. It was built by privacy infrastructure leaders at companies like Amazon and Facebook, so you can get world-class privacy technology, without the cost of designing and building it from scratch.

It serves all your identity needs like authentication, authorization and user data storage, but it’s modular, so you can plug any part of it into your existing systems and up-level your security & compliance step-by-step.

For more info, get in touch at info@userclouds.com.
