Edward Snowden’s job responsibilities at the NSA included accessing a file-sharing section of the agency’s intranet and manually moving especially sensitive documents to a more secure location. This is how Lonny Anderson, the NSA’s chief technology officer, explained to NPR last month the wide-ranging access Snowden had to classified documents. The vast trove of documents on the NSA’s intranet was there to help the various intelligence agencies collaborate or “connect the dots,” which they were accused of being reluctant to do before 9/11.
The tension between improved collaboration and data protection is a hot issue not just for the NSA. It may well be the defining challenge of the age of big data. Enterprises in both the private and public sectors feel pressured to take advantage of the rapid growth of digital data. Rather than drown in it, they open it up to anyone inside or outside their walls who can do something useful with the data. But shared data equals both increased value and increased risk.
“As every business process gets digitized, your business cannot function without collaboration,” David Gibson, vice president of marketing at Varonis, told me recently. “Data you can’t share is a frozen asset. If nobody can access it, it has no value, but if too many people have access, it turns from an asset to liability. Secure collaboration, where all the right people have access, all data use is monitored and abuse is flagged, is where we have to get to.”
Varonis is currently helping about 2000 customers to get there and, according to Reuters, it is headed for an IPO by the end of this year. [Update: Varonis filed for an IPO on October 22, 2013] The company addresses the painful by-products of the fundamental division of labor in our digital world: Most digital data is created by individuals but the responsibility and liability for most of it is in the hands of organizations. And increasingly, the data accessed and used by members of one organization is data that was originally created by someone in another enterprise (perhaps a customer, supplier or partner), as digital collaboration today crosses organizational boundaries.
This rising complexity of data ownership, rights, and responsibilities is further magnified by the widespread lack of data accounting and accountability. Varonis’ surveys found that organizations simply don’t know where the data is stored, who is responsible for it, who can access it, or who uses it. Says Gibson: “When distributed systems like Windows and UNIX started to proliferate in the mid-1990s, there weren’t tools that allowed you to track your data, certainly not at today’s scale.” Today, these distributed systems encompass multiple computing devices, including mobile phones, in the hands of all employees, generating mostly “unstructured” data, i.e., data such as emails, presentations, and video files, which is much more difficult to track and audit than “structured” data in traditional databases.
According to a study by the Ponemon Institute, 84% of organizations acknowledge that their users have access to data for which they have no business need. One reason for this sorry state of affairs, says Gibson, is that “manually administering access rights to this data is nearly impossible—there is far too much data and it is growing too rapidly.” He told me of one customer who had four full-time people in the data center answering requests for data access and figuring out who should grant the permission. “We were able to automate this entire process,” says Gibson.
Figuring out access rights is just one of the many issues plaguing data management today. Not knowing where the data is leads to unnecessary and costly duplication. Employees are given access rights to sensitive data when they work on a temporary project but when the project is over, the rights are not revoked. The use of cloud-based storage applications is increasing and employees upload corporate data to their private folders and then forget to delete it. And data has legs—sometimes leaving the organization with an employee going to work somewhere else.
More often than not, the exposure of sensitive information to unauthorized insiders and outsiders is simply inadvertent. Gibson told me about a casino where the first line of the Varonis risk management report revealed that 15 million credit card numbers were stored in a folder that was open to everybody in the company. The second line pointed to a folder with 12 million credit card numbers. The security personnel participating in the meeting rushed to fix the problem, which was simply an oversight.
Varonis has methodically addressed these data management issues since 2005 by giving IT staff and their business counterparts a comprehensive view and analysis of the where, who, what, and why of their data. This map of all the data in the enterprise allows IT and the business to make joint decisions about ownership and permissions. Varonis adds to this its secret sauce of big data analytics applied to metadata, the data about the data.
It turns out that the metadata in its raw form “could easily dwarf the data itself,” says Gibson. The analytics allows Varonis to take out all the duplicate information, bringing the metadata down to a manageable size. It then applies machine learning and other types of algorithms to alert the owner of the data to anomalies, such as access by people whose profile doesn’t look like the profile of most other people accessing the data. It’s very much like credit card companies using big data analytics to alert you to suspicious behavior in your account, except that Varonis adds a self-service functionality. “In one month,” Gibson said of one Varonis customer, “the data owners made thousands of revocations of access on their own.”
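To make the idea of profile-based alerts concrete, here is a minimal sketch in Python. It is not Varonis’ algorithm, which the article does not describe in detail; it simply assumes that each user’s “profile” is a department label and that an access is suspicious when fewer than a chosen share of a folder’s users have that label. The function name, the `min_share` threshold, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def flag_unusual_accessors(access_events, user_profile, min_share=0.2):
    """Return {folder: sorted users whose profile is rare among that folder's users}.

    access_events: iterable of (user, folder) pairs taken from access metadata.
    user_profile:  dict mapping each user to a profile label (assumed here to be
                   a department name).
    min_share:     hypothetical threshold; a user is flagged when the share of
                   the folder's users with the same profile is below it.
    """
    # Collapse repeated accesses: only distinct users per folder matter here.
    users_by_folder = defaultdict(set)
    for user, folder in access_events:
        users_by_folder[folder].add(user)

    flagged = {}
    for folder, users in users_by_folder.items():
        profile_counts = Counter(user_profile[u] for u in users)
        total = len(users)
        unusual = sorted(
            u for u in users
            if profile_counts[user_profile[u]] / total < min_share
        )
        if unusual:
            flagged[folder] = unusual
    return flagged


# Made-up sample data for illustration only.
profiles = {
    "ana": "finance", "ben": "finance", "cy": "finance",
    "dee": "finance", "eli": "finance",
    "fay": "marketing", "gus": "marketing",
}
events = [
    ("ana", "/finance/payroll"), ("ben", "/finance/payroll"),
    ("cy", "/finance/payroll"), ("dee", "/finance/payroll"),
    ("eli", "/finance/payroll"), ("fay", "/finance/payroll"),
    ("ana", "/finance/payroll"),  # a repeat access, counted once
    ("fay", "/marketing/plans"), ("gus", "/marketing/plans"),
]

alerts = flag_unusual_accessors(events, profiles)
for folder, users in alerts.items():
    print(f"Alert for owner of {folder}: review access by {', '.join(users)}")
```

In this sample, the one marketing user among five finance users of the payroll folder is flagged for the data owner’s review, while the marketing folder raises no alert. A real system would presumably draw on far richer metadata than a single department label.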
The analysis of the metadata also helps Varonis’ customers classify the data in various ways, which is especially helpful when they need to flag data that is subject to regulation. When Varonis saw its customers using the metadata analysis for additional data management tasks such as preparing for data migration (e.g., from one data center to another), archival, or deletion, “we decided to build a data transport engine,” says Gibson. “It automates end-to-end the process of data migration, archiving, and retention.”
Recently Varonis added another product to its portfolio, DatAnywhere, providing cloud-like access to all the data that’s stored in the enterprise. Says Gibson: “Our mission with the first release of DatAnywhere is to remove the temptation to use Dropbox and Box. We are giving our customers an experience that is comparable or better—they don’t care where the data is stored. The enterprise has complete control and the end user can access the data the way they want to.”
Like other successful startups, Varonis is addressing a previously unidentified need in the marketplace that can be described (and simplified) with a mathematical formulation, similar to “Metcalfe’s Law”: The level of risk associated with data is proportional to the square of the number of people sharing it. In other words, the more you share data, the greater the risk, and the risk grows faster than the number of people.
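As a rough illustration of what a square law implies, the short Python sketch below computes the relative risk for a few group sizes. The constant `k` and both function names are assumptions made for this example; the formulation above is a simplification, not a measured model. The count of distinct pairs of people is included only to suggest why a square law is plausible, in the same spirit as Metcalfe’s Law.

```python
def relative_risk(num_people, k=1.0):
    """Risk under the square-law simplification: k * n**2 (k is an assumed constant)."""
    if num_people < 0:
        raise ValueError("num_people must be non-negative")
    return k * num_people ** 2


def sharing_pairs(num_people):
    """Number of distinct pairs among n people, n * (n - 1) / 2, which grows roughly as n**2."""
    return num_people * (num_people - 1) // 2


for n in (10, 20, 40):
    print(n, relative_risk(n), sharing_pairs(n))
```

Under this simplification, doubling the number of people who share a folder from 10 to 20 quadruples the relative risk rather than doubling it.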
But there is a corollary to the law of digital risk: The more you share data, the more value you get from it. In most enterprises today, however, the sharing of data is not accompanied by adequate protection. Says Gibson: “Nobody knows who grants access to what data. You are reviewing the keys on everybody’s key ring but you have no idea what doors they unlock.” He believes this is changing: “Now people realize they need a data-centric approach. When enough people realize that this really valuable asset is essentially in the dark without metadata and it’s unmanageable in its current form, we are poised for a watershed moment. You will not only be doing the day-to-day data management more efficiently, but you will also get the value out of the data with better and more secure collaboration.”
[Originally published on Forbes.com]