Summary:

Cloudera wants to get more big companies on board the Hadoop train, and new features for setting access rules are the distribution vendor’s latest tactic.

Hadoop distribution vendors have been working to secure the open-source ecosystem for big data applications, and have announced plans for further work in that direction. In line with that trend, Cloudera on Wednesday turned on new capabilities for determining who’s allowed to do what with programs such as Hive, the tool for SQL-like queries, and Impala, Cloudera’s interactive query engine.

Administrators will now be able to set policies on which databases, schemas and columns certain users are able to view. They can also define access by category of employee, and lower-level employees can be given rights to adjust authorization permissions, freeing up time for top admins. Beyond read access, there are also fine-grained settings for choosing who can modify data.
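To make the idea concrete, here is a minimal Python sketch of the kind of role-based policy described above. It is an illustration only, not Sentry's actual API or policy format: the `Privilege` class, the `is_allowed` function, and every role, group, database, table and column name are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical model for illustration only; not Sentry's API or file format.
@dataclass(frozen=True)
class Privilege:
    database: str
    table: str = "*"        # "*" matches any table in the database
    column: str = "*"       # "*" matches any column in the table
    action: str = "select"  # "select" (read) or "insert" (modify)

    def allows(self, database, table, column, action):
        """Return True if this privilege covers the requested access."""
        return (
            self.database == database
            and self.table in ("*", table)
            and self.column in ("*", column)
            and self.action == action
        )

# Roles bundle privileges (assumed names throughout).
ROLE_PRIVILEGES = {
    # Analysts may read two specific columns and nothing else.
    "analyst": [
        Privilege("sales", "orders", "amount", "select"),
        Privilege("sales", "orders", "region", "select"),
    ],
    # ETL jobs may read and modify everything in the sales database.
    "etl": [
        Privilege("sales", action="select"),
        Privilege("sales", action="insert"),
    ],
}

# Categories of employees are mapped to roles.
GROUP_ROLES = {"marketing": ["analyst"], "data_eng": ["etl"]}

def is_allowed(group, database, table, column, action):
    """Return True if any role held by the group grants the action."""
    return any(
        priv.allows(database, table, column, action)
        for role in GROUP_ROLES.get(group, [])
        for priv in ROLE_PRIVILEGES.get(role, [])
    )
```

In this sketch, anything not explicitly granted is denied, so a sensitive column stays hidden from a group simply because no privilege names it.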

When Hive was developed at Facebook, the key concern was preventing well-meaning users from accidentally deleting data, said Justin Erickson, Cloudera’s director of product management. Malicious users were not really top of mind. Now it’s possible to provide access to only the span of data that an administrator is comfortable sharing. This way, unwanted manipulation or deletion can be prevented by not surfacing the data in the first place.

These features and others, which fall under the Cloudera label Sentry, are now available for Hive as a plug-in for the Cloudera distribution of Hadoop, version 4.3, and as a part of Impala 1.1. Later, Sentry will become a part of the standard-issue CDH distribution from Cloudera.

Sentry could help Cloudera pitch Hadoop as more enterprise-ready than ever. More employees can be allowed access to files once cells that contain extra-sensitive data are made inaccessible to them. More generally, companies might be more willing to take Hadoop past the science-experiment phase and bring in production data if they know they can comply with regulations such as the Health Insurance Portability and Accountability Act (HIPAA).

Feature image courtesy of Shutterstock user Arman Zhenikeyev.
