Thursday, June 4, 2015

Protecting Data

Recently it was reported that the Federal Office of Personnel Management was hacked by the Chinese in December of 2014. Data belonging to around 4 million current and former federal employees may have been compromised.(1) So how do we protect this data?

Maybe the better question is: why do we store this data? If much of this data were not stored online in databases, files, and on a network, then even if the systems were compromised the data would not be there to take. How many of these "former" employees of the Federal government have been gone for decades or longer? How long should their records be kept? More importantly, how long should their records be kept "online"?

Two words: "retention policies".

Oh, and here is another important word: "liability".

With storage getting cheaper every day, everyone seems content to just store information forever. Who cares if it is 20 years old? We "may" need it someday.

It is fair to say there is some data that needs to be kept for long periods of time. There is even some data that should never be removed. For example, medical records should be kept for the lifetime of that individual and then some. I doubt you want to delete pictures of your kids to save space. What about your family tree, videos of your wedding, or your music collection? This data needs to be online and easy to access at any time, as well as backed up. (PLEASE, everyone, back up your data!)

Retention policies need to be set up for all classes of data: documents, pictures, video, databases, logs, etc. When it comes to data, content is the key to how long it should be kept. There are laws on how long some data is to be kept. Beyond legal requirements, policies should be set based on the usefulness of the data. As an example, 7 years is a fairly typical period for most individuals and businesses to hold financial data.
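To make the idea concrete, here is a minimal sketch of a retention policy expressed as a lookup table, one entry per class of data. Only the 7-year figure for financial data comes from this post; the class names, the other periods, and the function names are hypothetical placeholders, not recommendations.

```python
from datetime import date

# Hypothetical retention periods per class of data, in years.
# None means "keep indefinitely". Only the 7-year financial figure
# comes from the post; every other value is a placeholder assumption.
RETENTION_YEARS = {
    "financial": 7,
    "logs": 1,               # assumption
    "medical": None,         # kept for the individual's lifetime and beyond
    "family_photos": None,   # never deleted
}


def expiry_date(created, years):
    """Return the date on which a record passes its retention period."""
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # Created on Feb 29 and the target year is not a leap year.
        return created.replace(year=created.year + years, day=28)


def is_expired(data_class, created, today):
    """True if a record of this class is older than its retention period."""
    years = RETENTION_YEARS[data_class]
    if years is None:
        return False  # this class is kept indefinitely
    return today >= expiry_date(created, years)
```

A record flagged by `is_expired` is only a candidate for removal. Any legal hold or other requirement still has to be checked before the data is actually deleted.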

Keeping data too long is a liability to the holder of that data. If that data is breached or improperly used, the company, government agency, or individual holding onto that data can be liable. Not to mention it is never fun to have to tell millions of customers that their private information is in the hands of hackers soon to be on the black market for thieves all around the world.

Here are some solutions to managing data without deleting it:
Set up a tiered system for storage

This means having multiple levels of storage, such as:
- Online
- Near-line
- Archived

Let me explain these in broad technical terms.

Online storage is just what it says: the data is online and quickly and easily accessible. Excellent security should be in place to protect this data. Online storage has many tiers of its own based on how fast that data is needed (performance). I will not get into the technicalities of tiered storage systems; just look up "tiered storage models". This is the most expensive tier, as the data changes a lot, requires fast storage systems and connections, and must be backed up on a regular basis.

Near-line storage is not directly accessible online by users but is easily put online if needed. Typically this would be onsite but on systems that may not be directly connected to the servers or network. Data administrators would need to bring this data online when needed. The advantage is that the data is significantly less susceptible to hacking or rogue employees. Automated as well as manual processes would follow retention policies to move data from online to near-line storage.

Archived storage is the lowest tier and would typically be stored offsite. This data would be kept for legal reasons and would be the slowest to bring online, as well as the most work. It would likely be stored on media that is not easily modified, like tapes or optical discs. This protects the integrity of the data. Archived data would be nearly impossible for hackers or rogue employees to compromise, as physical access would be needed to get to it. Archived storage is also the cheapest to maintain.
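The automated process that moves data between the three tiers can be sketched roughly as follows. This is only an illustration: the thresholds and the `choose_tier` function are hypothetical, and using "days since last access" as the trigger is my assumption. A real policy could just as well key on the age or the class of the data.

```python
from datetime import date

# Hypothetical thresholds, in days since the data was last accessed.
NEAR_LINE_AFTER_DAYS = 365       # assumption: idle for a year
ARCHIVE_AFTER_DAYS = 3 * 365     # assumption: idle for about three years


def choose_tier(last_accessed, today):
    """Pick the storage tier a piece of data belongs in."""
    idle_days = (today - last_accessed).days
    if idle_days >= ARCHIVE_AFTER_DAYS:
        return "archived"    # offsite, slowest and cheapest
    if idle_days >= NEAR_LINE_AFTER_DAYS:
        return "near-line"   # onsite, but not reachable from the network
    return "online"          # fast, expensive, needs the most protection
```

A scheduled job could run this over a file index and hand the results to the data administrators, who would do the actual move to near-line or archived media.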

Every organization should already have policies in place to deal with data retention, as well as tiers of storage to comply with these policies. Organizations that have policies in place and follow them are much less likely to have data compromised, and if they do, the amount of damage is minimized. Also, if data is compromised, your organization can show that reasonable efforts were made to secure it. Hopefully this will work in the organization's favor in its defense, versus a finding of negligence.

Another benefit is deniability. If there is data, then there is evidence. If there is no data, there is no evidence. Plausible deniability. I am not saying that there "is" wrongdoing, but keeping all your data around is just asking for trouble. Organizations must remember to follow the laws on the minimum amount of time data must be kept. Beyond that, get rid of the data if you have no justifiable reason to keep it.

Cost is the main reason to get rid of data. Keeping data is expensive. Users don't think of this, nor do most executives or bureaucrats. Data has to be stored on storage systems. In large network environments these storage systems are much more expensive than a desktop hard drive. Large storage systems can easily cost more than a luxury car and go up from there. Next, you need to back that data up. So now you need an entirely separate storage system (normally cheaper than the primary) to keep a copy of all that data. But wait, there's more... When you back up data there is more than one copy. There are hundreds of copies, with version history and deleted data too.
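A rough back-of-the-envelope sketch shows how quickly the copies add up. Everything here is a simplifying assumption of mine: one full backup, plus a number of retained versions that each store only the fraction of data that changed. The function name and parameters are hypothetical, and real backup systems vary widely.

```python
def total_stored_tb(primary_tb, versions_kept, change_rate):
    """Estimate total capacity needed, in TB, for primary data plus backups.

    primary_tb    -- size of the live data on the primary storage system
    versions_kept -- number of backup versions retained (assumption)
    change_rate   -- fraction of the data that changes per version (assumption)
    """
    full_backup = primary_tb                                 # one complete copy
    incrementals = primary_tb * change_rate * versions_kept  # version history
    return primary_tb + full_backup + incrementals
```

Under these assumptions the total is always at least double the primary data, and it keeps growing with every version that is retained. Deleting data that is no longer needed shrinks every term at once.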

Finally, there is the space, power, cooling, and management of all the data systems and backups. Countless hours of IT employees' time go into keeping everything up and running so users can keep making yet more data.

