Data Policies

Data policies help you comply with regulations such as GDPR, CCPA or HIPAA. Policies operate in a secure layer of the architecture and do not alter your raw Kafka data or your applications.

Policy concepts

Lenses follows the National Institute of Standards and Technology (NIST) standards. Policies let you apply masking to specific fields in your Kafka data across all Lenses channels (UI/CLI/API/SQL).

  • Redaction Policy: Whether to protect messages at a field level.
  • Category: Category of sensitivity in the data.
  • Impact: The business impact levels concerning the data.
  • Fields: Definition of fields that the data policy will apply to.

The Policies option is available if the Data Policies permission is set.

Create a policy

To create a policy:

1. Navigate to Policies
2. Click the New Policy button

No Data Policies for sensitive data

No data policies are enabled by default. You can import the automatically recommended data policies or create a new one.

New Data Policy

In the above example, we are:

  1. Creating a new Credit Card Numbers data policy
  2. Selecting LAST-4 as the redaction policy
  3. Setting Financial Data as the data category
  4. Setting HIGH as the impact to the business
  5. Applying this policy to all datasets (Kafka topics or Elasticsearch indexes) whose names begin with user_. Note that if no datasets are specified, the policy will apply to all datasets.
  6. Applying this policy to the field credit_card in the datasets matching the above.
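
The dataset rule in step 5 can be sketched in a few lines of Python. This is only an illustrative in-memory model: the DataPolicy class, its attribute names and the applies_to_dataset method are hypothetical and are not part of the Lenses API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataPolicy:
    # Hypothetical model of the policy form; names are illustrative only.
    name: str
    redaction: str
    category: str
    impact: str
    fields: List[str]
    dataset_prefixes: List[str] = field(default_factory=list)

    def applies_to_dataset(self, dataset: str) -> bool:
        # No dataset rule means the policy applies to every dataset.
        if not self.dataset_prefixes:
            return True
        return any(dataset.startswith(p) for p in self.dataset_prefixes)


policy = DataPolicy(
    name="Credit Card Numbers",
    redaction="LAST-4",
    category="Financial Data",
    impact="HIGH",
    fields=["credit_card"],
    dataset_prefixes=["user_"],
)
```

With this model, a topic named user_payments would be covered by the policy while a topic named orders would not.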

Data policies in action

Once the above has been created, Lenses will automatically identify ALL Kafka topics and Elasticsearch indexes that contain credit card info.

New Data Policy

Any data on Kafka, whether serialized as Avro, JSON, XML or even ProtoBuf that contains credit_card information will automatically be detected.

Apart from identifying all the sensitive data at a field level, Lenses will also protect the data for you.

Kafka PII Data Policy

That means that anyone accessing data via Lenses (UI/CLI/Python) can work with production data while respecting the sensitivity of the underlying data.

Mask data

Lenses applies masking whenever you request access to data. The available redaction policies are:

  • None: Track sensitive data, but do not protect it.
  • Last-4: Display the last 4 characters of the value.
  • First-4: Display the first 4 characters of the value.
  • Initials: Display the first letter of each word.
  • Email: Mask the email address, showing the domain name.
  • All: Mask the entire value.
  • Number-to-negative-one: Replace a numeric value with -1. Note that this only affects numeric types; it has no effect on strings that contain numbers.
  • Number-to-zero: Replace a numeric value with 0. Note that this only affects numeric types; it has no effect on strings that contain numbers.
  • Number-to-null: Replace a numeric value with null. Note that this only affects numeric types; it has no effect on strings that contain numbers.
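
To make the behaviour of each redaction concrete, here is a rough Python sketch. It is not how Lenses implements masking. The mask character (*), the choice to keep the length of the masked part, the way Initials and Email pad the hidden characters, and all function names are assumptions made for illustration.

```python
MASK = "*"  # assumed mask character; the source does not specify one


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def last_4(value):
    text = str(value)
    return MASK * max(len(text) - 4, 0) + text[-4:]


def first_4(value):
    text = str(value)
    return text[:4] + MASK * max(len(text) - 4, 0)


def initials(value):
    # Keep the first letter of each word and mask the rest (assumed format).
    return " ".join(w[0] + MASK * (len(w) - 1) for w in str(value).split())


def email(value):
    # Mask the local part and keep the domain visible.
    local, sep, domain = str(value).partition("@")
    if not sep:
        return MASK * len(local)
    return MASK * len(local) + "@" + domain


def mask_all(value):
    return MASK * len(str(value))


REDACTIONS = {
    "None": lambda v: v,
    "Last-4": last_4,
    "First-4": first_4,
    "Initials": initials,
    "Email": email,
    "All": mask_all,
    # Numeric redactions leave non-numeric values (including strings) as-is.
    "Number-to-negative-one": lambda v: -1 if _is_number(v) else v,
    "Number-to-zero": lambda v: 0 if _is_number(v) else v,
    "Number-to-null": lambda v: None if _is_number(v) else v,
}


def redact(policy, value):
    return REDACTIONS[policy](value)
```

For example, redact("Number-to-zero", 42) replaces the value, while redact("Number-to-zero", "42") returns the string unchanged because it is not a numeric type.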

Advanced field specification

In the case of nested data, it is possible to specify nested fields using the . character. For example, if your users Kafka topic has a field called details which in turn contains a field called name, it is possible to specify the field details.name so that only that particular field is masked, rather than every field called name.

Note that, for a Kafka topic, there may be both a key and a value, and the policy will apply to each of these if they contain the corresponding field.

In the event of two policies matching a given field, the more specific one will be applied, e.g. if there is a policy for name with a redaction of First-4 and a policy for users.details.name with a redaction of Initials, the latter will be applied. Wildcards (see below) and dataset rules do not affect this.
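
The precedence rule can be sketched as follows. The source does not define how specificity is measured, so this sketch assumes that a policy with more dot-separated segments is more specific, and that a policy field matches any path ending with the same segments. Both function names are hypothetical.

```python
def matches(policy_field, path):
    # A policy field matches when its segments equal the trailing
    # segments of the full field path (assumed matching rule).
    wanted = policy_field.split(".")
    actual = path.split(".")
    return len(wanted) <= len(actual) and actual[-len(wanted):] == wanted


def most_specific(policies, path):
    # policies maps a policy field to its redaction name.
    candidates = [f for f in policies if matches(f, path)]
    if not candidates:
        return None
    # Assumption: more segments means more specific.
    best = max(candidates, key=lambda f: len(f.split(".")))
    return policies[best]


policies = {"name": "First-4", "users.details.name": "Initials"}
```

Under these assumptions, most_specific(policies, "users.details.name") selects Initials, while a path such as orders.name falls back to First-4.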

Note that masking is only performed on nodes without children. Continuing with the example above, details.name can be masked, but if we attempt to apply a data policy to details, it will have no effect, as it has child properties.
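
The leaf-only rule can be illustrated with a small Python sketch over a nested dictionary. This is a simplified model, not the Lenses implementation: the mask_leaf helper is hypothetical, the path is resolved from the root of the record, and the redact argument stands in for whichever redaction the policy selects.

```python
def mask_leaf(record, path, redact):
    """Return a copy of record with the leaf at the dotted path redacted."""

    def walk(node, remaining):
        key, rest = remaining[0], remaining[1:]
        if not isinstance(node, dict) or key not in node:
            return node  # path does not exist here; nothing to mask
        updated = dict(node)  # shallow copy so the input is not mutated
        child = node[key]
        if rest:
            updated[key] = walk(child, rest)
        elif not isinstance(child, (dict, list)):
            updated[key] = redact(child)  # only nodes without children
        return updated

    return walk(record, path.split("."))


record = {"details": {"name": "Jane Doe", "age": 30}}
masked = mask_leaf(record, "details.name", lambda v: "*" * len(v))
unchanged = mask_leaf(record, "details", lambda v: "*" * len(str(v)))
```

Here masked has details.name replaced, whereas unchanged is equal to the original record because details has child properties.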

It is also possible to specify wildcards using the * character, so that d*s.name will match both details.name and deliveries.name. As . is considered to be a field separator, a wildcard will not match against it. So u*s.name will match users.name but will not match users.details.name.
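
One way to model this wildcard behaviour is to translate the pattern into a regular expression in which * matches any run of characters except the . separator. The sketch below is an assumption about the semantics described above, not the Lenses implementation, and the function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    # Escape the literal parts and let each * match anything except ".".
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("[^.]*".join(literal_parts))


def field_matches(pattern, path):
    # The whole path must match, so a wildcard cannot absorb extra segments.
    return wildcard_to_regex(pattern).fullmatch(path) is not None
```

With this translation, field_matches("d*s.name", "deliveries.name") holds, while field_matches("u*s.name", "users.details.name") does not, because * cannot cross the field separator.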

Manage policies

Once you set up a data policy, you can view all policies and see where sensitive data exists in your data platform.

List Lenses Data Policies

Provided your account has the relevant access level, you can click on a data policy and edit or remove it.

Edit or remove Data Policies

If you choose to edit a data policy, you can change its configuration.

Edit Lenses Data Policy

Policy associated resources

Datasets

Lenses automatically identifies all data resources whose messages contain any sensitive payload field. This happens across all data formats (JSON, Avro, XML etc.), whether the field lives at the top level of the key or value or in a nested structure.

Flows / Connectors

Lenses identifies all flows and connectors that are consuming or producing such sensitive data so that you can track their usage across multiple data systems.

Flows / SQL Processors

Lenses identifies all streaming SQL processors that produce or consume sensitive data.

Flows / Custom Applications

Lenses identifies all your microservices and applications that are producing or consuming sensitive data.