Shining a light on dark data

The business world is flooded with data. It’s estimated that the aggregate amount of data, which doubles in size every two years, measures 4.4 zettabytes (a zettabyte is a trillion gigabytes) and is likely to reach a massive 44 zettabytes by 2020.

The internet is at the heart of this data tsunami. Every minute, internet users share more than 2.5 million pieces of content on Facebook, tweet more than 350,000 times and send more than 204 million text messages.

Dangers can arise with this volume of data if it’s not controlled in the right way; endpoint security and IT management are the two pillars of content control that companies need to build their digital foundations on.

About the author

Brian Remmington is the CTO of Alfresco.

Data assets that a company cannot shed light on are known as ‘dark data’, and they can present real problems around data compliance, legal issues, productivity and storage costs. Gartner defines dark data as “the information assets organisations collect, process and store during regular business activities, but generally fail to use for other purposes”. So how can you control dark data and ensure it doesn’t damage your business?

Protecting data and preventing sprawl 

Content sprawl occurs when there isn’t a plan to address outdated content or to ensure that data is stored in the right way. There are several reasons this happens. Companies may have acquired other companies with existing technologies, or there may be groups of employees within an organisation who believe that the corporate systems don’t work for them. They feel like they have special technological needs and so they need different systems in place. 

They may start using online storage tools such as Dropbox that IT has not sanctioned. They might then realise that they can’t give their colleagues access to files, so they create a public folder. If that person leaves the company, those files are still out there to be accessed by anyone. This leaves a gap in a company’s security safety net, making the rest of the company’s systems vulnerable to potential hacks or data breaches.

Even storing information on shared drives such as Microsoft OneDrive, Google Drive or Dropbox can be problematic. Most organisations need to manage multiple areas where files are being stored. The problem is that there’s no single source of information, finding what you want is difficult, and it’s too easy to share information with too many people, including those outside of the organisation, with no traceability.

Mistakes often arise from employees using different versions of updated documents with file names such as “version ‘x’” or “final version”. When information is stored in multiple areas, it creates inefficiencies in the organisation, such as time wasted searching, decisions made using outdated information, or information being recreated because it’s been lost in cyberspace.

As soon as employees develop different preferences for systems, or don’t have up-to-date content across the board, companies have a problem on their hands. This doesn’t just affect productivity and the bottom line; it also affects information security by leaving data vulnerable to being hacked or stolen.

Content sprawl can be a financial burden 

Things can get complicated for a company if litigation is involved. For example, if a company is taken to court and needs to produce all its evidence, but the information it needs is spread across umpteen different systems, then it has to audit all those systems to find everything about a specific case, whether it’s IP infringement, an HR disciplinary or any other equally serious matter.

In these situations, auditors have to search through all the content systems to find everything on that topic and put it on hold. This kind of work is often outsourced because it’s hard for companies to find the resources internally, so it can cost them a fortune. 

Having all files and servers controlled and secure is important to protect the company from added litigation issues as well as saving time and money. Rather than being retrospective, if an audit is designed to provide concrete opinions in relation to prospects and risks, auditors are likely to be more reliant than ever on receiving extensive and reliable information from directors and management. An increased audit scope means an even greater risk of litigation, which may well increase the 'audit burden' and increase the costs involved.

Organising the dark corners of data 

To truly conquer the problem of dark data, organisations need to carry out an audit to check existing business systems and understand what they’re being used for and why. Is there a real need for them, or is it because someone’s just decided to set up their own system? Look for things like ‘ROT’ - Redundant, Outdated or Trivial content. This is content that companies can just get rid of and don’t need to maintain.
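As a rough illustration of what a first pass at a ROT audit might look like, the Python sketch below walks a folder and flags candidates in each category. It is only a sketch under stated assumptions: the function name `find_rot`, the three-year "outdated" threshold and the "empty file means trivial" rule are all hypothetical choices made for this example, not an industry standard, and a real audit would apply the organisation's own policy.

```python
import hashlib
import time
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical thresholds; tune them to your own policy.
OUTDATED_AFTER_DAYS = 3 * 365
TRIVIAL_BELOW_BYTES = 1  # i.e. empty files


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_rot(root: Path, now: Optional[float] = None) -> Dict[str, List[Path]]:
    """Group files under root into redundant, outdated and trivial candidates."""
    now = time.time() if now is None else now
    cutoff = now - OUTDATED_AFTER_DAYS * 86400
    by_digest = defaultdict(list)
    report: Dict[str, List[Path]] = {"redundant": [], "outdated": [], "trivial": []}

    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        stat = path.stat()
        if stat.st_size < TRIVIAL_BELOW_BYTES:
            report["trivial"].append(path)
            continue  # empty files all hash the same, so keep them out of the duplicate check
        if stat.st_mtime < cutoff:
            report["outdated"].append(path)
        by_digest[file_digest(path)].append(path)

    # Every copy after the first with identical content is a redundancy candidate.
    for copies in by_digest.values():
        report["redundant"].extend(copies[1:])
    return report
```

Calling `find_rot(Path("/some/shared/folder"))` returns lists of candidates only. Nothing is deleted, which leaves the final decision with the people who own the content.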

Using network monitoring to see what’s being accessed provides information on what systems are being used. Assess all the ways information is leaked, lost or placed in other areas. If an employee is using a tool that is not IT-sanctioned, then it may create an extra dimension of danger. Employees may also use insecure servers which can lead to information being stolen.  
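To make the monitoring idea concrete, here is a minimal Python sketch that tallies requests to hosts that are not on an approved list. Everything in it is assumed for illustration: the `(user, host)` record shape, the `SANCTIONED_HOSTS` allowlist and the example hostnames are hypothetical, and real proxy or firewall logs would need their own parsing step first.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical allowlist of hosts the IT team has approved.
SANCTIONED_HOSTS = {"files.example-corp.com", "intranet.example-corp.com"}


def unsanctioned_usage(records: Iterable[Tuple[str, str]]) -> Counter:
    """Count requests per host for hosts that are not on the allowlist."""
    counts: Counter = Counter()
    for _user, host in records:
        host = host.lower().strip()  # normalise so "Dropbox.com" matches "dropbox.com"
        if host not in SANCTIONED_HOSTS:
            counts[host] += 1
    return counts


# Made-up sample records standing in for parsed log lines.
log = [
    ("alice", "files.example-corp.com"),
    ("bob", "dropbox.com"),
    ("carol", "Dropbox.com"),
    ("bob", "drive.google.com"),
]
for host, hits in unsanctioned_usage(log).most_common():
    print(f"{host}: {hits}")
```

A tally like this shows which unsanctioned tools are actually in use, which is a starting point for asking why employees felt they needed them.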

Dark platforms are part of the great unknown and that’s where IT and compliance teams lose control of valuable corporate information. Companies need to reinforce their security perimeters to make sure that all data and files are only accessible to employees as and when needed to do their job.

Putting a spotlight on dark data

One trend which will continue to develop next year is companies wising up to the need to safely manage their content with a central system to reduce data being lost or stolen. But the problem with the growing amount of data will always be managing it, given that nobody ever deletes anything. People store things such as old presentations and contracts in certain places when they’re important, but they won’t revisit them later to assess whether they’re still useful or not.

Creating retention schedules as part of information governance capability helps identify what needs to be kept, why, and for how long. Putting information lifecycle procedures in place can also help to manage and secure data. Rather than keeping data on expensive hard drives, companies can migrate it to cheaper storage known as cold storage. This enables employees to still access it if necessary but can dramatically reduce the cost.
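One way to picture a retention schedule is as a simple lookup table that records what is kept, why, and for how long, plus a rule for when content moves to cold storage. The Python sketch below is an assumption-laden illustration: the `RetentionRule` class, the `lifecycle_action` function, the content types and every time period are hypothetical, and real periods would come from legal and compliance teams.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict


@dataclass(frozen=True)
class RetentionRule:
    """Why a content type is kept, when it goes cold and when it may be deleted."""
    reason: str
    cold_after: timedelta
    delete_after: timedelta


# Hypothetical schedule for illustration only.
SCHEDULE: Dict[str, RetentionRule] = {
    "contract": RetentionRule("legal obligation", timedelta(days=365), timedelta(days=7 * 365)),
    "presentation": RetentionRule("reference only", timedelta(days=180), timedelta(days=2 * 365)),
}


def lifecycle_action(content_type: str, last_used: date, today: date) -> str:
    """Return 'keep', 'move_to_cold_storage', 'delete' or 'review'."""
    rule = SCHEDULE.get(content_type)
    if rule is None:
        return "review"  # unclassified content needs a human decision
    age = today - last_used
    if age >= rule.delete_after:
        return "delete"
    if age >= rule.cold_after:
        return "move_to_cold_storage"
    return "keep"
```

For example, with these made-up periods, `lifecycle_action("contract", date(2021, 1, 1), date(2024, 1, 1))` would suggest moving the contract to cold storage, while anything without a rule is routed to a person for review instead of being deleted automatically.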

Creating an in-depth framework to organise and store content helps reduce the risks of content sprawl and security breaches. It also helps companies tackle the beast of dark data and lets employers and employees alike focus on their work. Removing the worry around data enables growth and can ultimately help businesses reach their goals.

Brian Remmington

Brian is the Chief Technology Officer (CTO) at Alfresco. He is a Software Architect, Mentor, and Leader with extensive commercial experience in developing solutions to difficult business problems using leading-edge software technologies. His expertise lies in quickly understanding complex technical concepts and standards, and working out how they can be applied pragmatically in the design of enterprise-scale software systems.