Rethinking the enterprise data archive for big data analytics and regulatory compliance

Table of Contents

  1. Summary
  2. Situational analysis: traditional big data analytics and archiving
  3. Business challenges of analytics big data archiving
  4. Technical challenges of analytics big data archiving
  5. Regulatory compliance and audit challenges
  6. Real-world benefits of big data archiving and analytics
  7. Solution spotlight: real-world benefits of big data analytics and archiving
  8. Future trends in big data analytics and archiving
  9. Conclusion and key takeaways
  10. About Ashar Baig

1. Summary

With enterprise data growing rapidly and with business and regulatory demands requiring continuous data access, organizations need a well-thought-out approach for keeping years of history online, with the ability to scale easily as they grow. Enterprise data is growing at a rate of 40 percent to 60 percent per year and is projected to grow 50-fold, from under one zettabyte in 2010 to 40 zettabytes by 2020. A big data archive and analytics solution must scale with the needs of the organization if it is to archive and analyze large volumes of new and historical data.
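As a rough consistency check on these figures, the 50-fold projection over the ten years from 2010 to 2020 can be converted into an implied compound annual growth rate and compared with the quoted range of 40 to 60 percent. The following is a minimal Python sketch. The function names are illustrative only, and the calculation assumes steady compound growth, which real data volumes will only approximate.

```python
def implied_annual_growth(fold_increase: float, years: int) -> float:
    """Compound annual growth rate implied by a total fold increase."""
    return fold_increase ** (1 / years) - 1


def fold_increase(annual_rate: float, years: int) -> float:
    """Total fold increase produced by a steady annual growth rate."""
    return (1 + annual_rate) ** years


# 50-fold growth between 2010 and 2020 (ten years), as stated in the text.
rate = implied_annual_growth(50, 10)
print(f"Implied annual growth: {rate:.1%}")

# Fold increase over ten years at each end of the quoted 40-60 percent range.
for annual_rate in (0.40, 0.60):
    print(f"{annual_rate:.0%} per year -> {fold_increase(annual_rate, 10):.0f}-fold")
```

Under that assumption, the implied rate comes to roughly 48 percent per year, which falls inside the quoted range.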

In the past, historical data was archived mainly for regulatory compliance purposes. Today businesses often analyze their historical data against current data sets so they can gain a competitive advantage and a better understanding of their customers while also generating incremental revenue.

If businesses are to extract value from years of history and corporate memory, they must store data in a fully accessible database or data store with standards-based access methods, so that they don’t need to maintain a separate set of skills and tools. For some organizations, combining current and historical data sets is the best way to give organizational stakeholders query access to both production data warehouses and data archives.

However, data warehousing on tier-1 storage can be a costly proposition for an enterprise, not only because of the cost of storage hardware but also because of the cost of software, i.e., database management systems. Additionally, there are the human costs of defining internal business processes, analytic models, and types of analysis, and, of course, the costs of integrating multiple source systems.

The financial scrutiny CIOs exert on organizational IT expenditures magnifies any cost inefficiencies. Today’s flat or decreasing IT budgets call for a more cost-effective data-archive approach that intelligently moves data from expensive tier-1 storage to inexpensive tier-3 or even tier-4 storage. This shift can lower costs and support long-term data retention while still providing fast and granular access to the data repositories.

Finally, every C-level executive must be mindful of data governance and the security of critical organizational data assets. Big data analytics platforms like Hadoop assume a flat security posture, but a big data archiving and analytics solution must abide by stringent data security and regulatory compliance requirements. Not doing so can result in fines, penalties, and bad PR.

This research report explores today’s big data archive, in which large organizations implement analytical and compliance solutions for the purpose of:

  • Moving large, historical data from tier 1 or tier 2 to cheaper storage for improved efficiency and future scale
  • Making data available, usable, and queryable to organizational stakeholders for easy lookups, analysis, and revenue-generating endeavors
  • Providing robust data security and data retention capabilities that facilitate regulatory compliance and data governance audits

Note: In this paper, big data refers to any data set that exceeds 100 terabytes (TB) and grows by more than 50 percent annually. An organization that manages 50 TB of data in a proprietary enterprise data warehouse (EDW), and that has invested up to approximately $100,000 per terabyte (approximately $5 million in total) to install, build, manage, and maintain it, has an expensive environment but doesn’t necessarily have “big data.”
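The definition and the cost arithmetic in this note can be expressed as a short Python sketch. The class and field names are hypothetical, and the 20 percent growth rate assigned to the example EDW is an assumption made purely for illustration, since the note does not state one. The size and cost figures are those given in the note.

```python
from dataclasses import dataclass

# Thresholds taken from the note's working definition of big data.
BIG_DATA_MIN_SIZE_TB = 100.0
BIG_DATA_MIN_ANNUAL_GROWTH = 0.50


@dataclass(frozen=True)
class DataEnvironment:
    """Hypothetical description of a data environment."""
    size_tb: float
    annual_growth: float      # 0.50 means 50 percent per year
    cost_per_tb_usd: float

    @property
    def total_cost_usd(self) -> float:
        return self.size_tb * self.cost_per_tb_usd

    @property
    def is_big_data(self) -> bool:
        # Both conditions must hold: size above 100 TB and growth above 50 percent.
        return (self.size_tb > BIG_DATA_MIN_SIZE_TB
                and self.annual_growth > BIG_DATA_MIN_ANNUAL_GROWTH)


# The 50 TB EDW from the note; the growth rate is an assumed placeholder.
edw = DataEnvironment(size_tb=50, annual_growth=0.20, cost_per_tb_usd=100_000)
print(f"Total cost: ${edw.total_cost_usd:,.0f}")
print(f"Big data by this paper's definition: {edw.is_big_data}")
```

Because the size threshold alone is not met, the 50 TB environment falls outside the definition whatever its growth rate, even though its total cost reaches $5 million.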