Back to basics – where in the world is my data and how do I protect it?

29 August 2023
by Linda Sheehan

As the hybrid world of remote and in-person working has been normalised, some businesses are hosting events and conferences using 3D and digital avatars to provide a more interactive space to meet with colleagues and entertain clients. The concept of the metaverse is not formally defined but is generally accepted to include this type of virtual reality. It is not yet clear whether the metaverse is a passing phase or the future way of doing business.

A business working in the metaverse, or starting its journey with collaborative tools, creates digital information at an exponential rate. As we discussed in our recent article, there is great value in a business getting its electronic house in order and digitising subject matter expertise, for example by curating training sets for generative AI as an investment in the future.

Conversely, as a business expands its digital world and footprint into a spider web of complexity, meeting the regulatory requirements to organise and label data becomes increasingly difficult. With multiple business communication channels in use, businesses are no longer simply checking project folders, emails and hardcopy files in offices to respond to information requests. It becomes imperative to know where information resides in order to respond to regulatory and compliance requirements timeously.

Here are some of the key scenarios when knowing where your data resides becomes time critical:

  • Dawn raids – by their very nature, these arrive on an unexpected morning, and a business may need to advise regulators where information potentially relevant to their investigation resides in order to prevent an overinclusive raid of its premises and infrastructure.
  • Cyber breach – criminals may lock a business out of its own data, leaving it unable to assess what data has been stolen and making it very difficult to comply with data privacy rules, such as notifying the individuals or entities that may have been impacted by the breach.
  • Data subject access request – there is a very short timeline to respond to a request from a data subject to provide all information held about them.
  • Litigation – failure to preserve in place information that may become relevant to the litigation can result in adverse inferences being drawn. A successful negotiation or court outcome often hinges on factual evidence, so you may not be in the best position to negotiate or defend your case if you can’t find, or weren’t aware of, the piece of evidence that supports it.
  • Forensic investigation – once a bad actor gets wind of an investigation, they will do everything in their power to delete any evidence of wrongdoing.

The next challenge for a business is identifying personal information within its data and deciding whether that information requires anonymisation or redaction. This is a complex topic and depends on the use case for the data set. Typically, data anonymisation is used when a business wants to derive value from its data for business analysis purposes, and data redaction is used when a business is handing over data to a third party in response to a regulatory or compliance requirement.

In general, data anonymisation hides personal information in data sets to protect businesses from non-compliance with data privacy rules when transferring data cross-border or using a data set for purposes other than those for which it was originally processed, for example, curating an AI training set or aggregating information about its employees. There are many different ways to hide information, including encrypting the personal information or de-identifying it through pattern replacement and term shuffling. Anonymisation may be done automatically on live data, on a specific subset of data or on a cloned copy of a data set.
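To make the de-identification idea concrete, here is a minimal sketch in Python. It is not a description of any particular product: the field names (`name`, `salary`), the key and the helper functions are all hypothetical, and it uses only two simple techniques, a keyed hash to replace names with stable tokens and shuffling to break the link between a row and its value. Strictly speaking, keyed hashing is pseudonymisation rather than full anonymisation, because whoever holds the key can recompute the tokens.

```python
import hashlib
import hmac
import random

# Hypothetical secret key; in practice it would be stored and managed securely.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonymise(value, key=SECRET_KEY):
    """Replace a value with a keyed hash so the same input always maps to the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened token for readability


def shuffle_column(records, field, seed=None):
    """Return copies of the records with one field's values shuffled across rows."""
    values = [record[field] for record in records]
    random.Random(seed).shuffle(values)
    return [{**record, field: value} for record, value in zip(records, values)]


def anonymise(records, seed=None):
    """Hash the 'name' field and shuffle the 'salary' field (assumed field names)."""
    hashed = [{**record, "name": pseudonymise(record["name"])} for record in records]
    return shuffle_column(hashed, "salary", seed=seed)


if __name__ == "__main__":
    staff = [
        {"name": "Alice", "salary": 100},
        {"name": "Bob", "salary": 200},
        {"name": "Carol", "salary": 300},
    ]
    for row in anonymise(staff, seed=1):
        print(row)
```

Because the functions return new records instead of modifying the input, the same approach works on a cloned copy of a data set while the live data stays untouched. Aggregate figures such as the total salary bill remain accurate after shuffling, which is what makes the technique useful for analysis.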

Data redaction, often confused with data anonymisation, is the process of completely removing personal information from a data set. The old-fashioned redaction process involved a human drawing a line over words on a physical document with a black marker pen. This was very time consuming and fraught with the risk of inaccuracy or of not colouring within the lines properly. Modern processes incorporate automated identification and redaction of information through Natural Language Processing (NLP) and Named Entity Recognition (NER). Simply put, AI is trained to find and mark the personal information and then apply a digital version of the black line.
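The "digital black line" can be illustrated with a short Python sketch. Note the substitution: instead of a trained NER model, it uses two simple regular expressions for email addresses and phone numbers, and the pattern set, the `redact` function and the sample text are all hypothetical. It is a sketch of the find-and-mask step under those assumptions, not a production redaction tool.

```python
import re

# Illustrative patterns only; a real tool would rely on a trained NER model
# and a far broader set of personal information types.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

MASK = "█"  # the digital equivalent of the black marker pen


def redact(text):
    """Mask every pattern match and return (redacted_text, findings)."""
    findings = []

    def make_replacer(label):
        def _replace(match):
            # Record what was found, then cover it with a mask of equal length.
            findings.append((label, match.group()))
            return MASK * len(match.group())
        return _replace

    for label, pattern in PATTERNS.items():
        text = pattern.sub(make_replacer(label), text)
    return text, findings


if __name__ == "__main__":
    sample = "Contact Jane at jane.doe@example.com or +27 21 555 0100."
    redacted, found = redact(sample)
    print(redacted)
    print(found)
```

The list of findings doubles as an audit trail that a human reviewer can check before documents are handed over. The name "Jane" in the sample is not masked, because names do not follow a fixed pattern; recognising them from context is the gap that NER is designed to fill.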

A common theme across these use cases is the need to identify personal information within large volumes of data:

  • A cyber breach often results in a review of the potentially leaked documents. The key focus is to find and extract personal information in order to meet stringent notification requirements and to assist the business in identifying and assessing the risk to the business.
  • A key part of responding to a data subject access request (DSAR) is redacting privileged and sensitive information before the information is handed over.
  • In litigation, parties are required to exchange the documents they hold that are relevant to the dispute as part of the discovery process, typically using an eDiscovery document review platform. It is important to redact privileged communications between a business and its lawyers, for example, an email that partially contains legal advice to the business and falls under a category of legal privilege.
  • In mergers and acquisitions, the seller shares data with the buyer(s), typically in a virtual data room. It is important to strike a balance between providing the information necessary to get the best deal and meeting any legal requirements to protect information. Typically, the sensitive information redacted is personally identifiable information and business confidential information.

AI tools can scan and process vast amounts of data, looking for the precise types of data that trigger notification obligations and sensitive data protection requirements. intelligENS has built custom AI-powered models tailored specifically to the unique African market. These models cover nuances such as the requirement to scan for business personal information in South Africa and the fact that a lot of information is not fully electronic.

Speak to our team if you would like to know more about legal and compliance readiness.

Linda Sheehan
Head of intelligENS