Pentaho is moving its open-source business intelligence capabilities to the Apache license to make them more compatible with big data technologies. Pentaho’s Kettle extract, transform, load (ETL) technology was previously available under the LGPL, the GNU Lesser General Public License.
Apache Hadoop, as its name implies, is offered under the Apache license, as are most of the NoSQL databases used to process large volumes of structured and unstructured data from multiple sources. ETL tools are used when data needs to be pulled out of (extracted from) a source repository; cleaned up or put into the required format (transformed); and then put into (loaded into) the application that will manipulate it.
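To make the three ETL stages concrete, here is a minimal sketch in Python. It does not use Kettle or any Pentaho API. The sample data, the `payments` table and the function names are all hypothetical, chosen only to show the extract, transform and load steps in order.

```python
import csv
import io
import sqlite3

# Hypothetical source data; names and values are made up for illustration.
RAW_CSV = """name,amount
 alice ,10.5
BOB,3
,7
"""

def extract(text):
    """Extract: read rows out of the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with no name, normalize names, parse amounts."""
    cleaned = []
    for row in rows:
        name = row["name"].strip()
        if not name:
            continue  # skip records that cannot be attributed to anyone
        cleaned.append((name.title(), float(row["amount"])))
    return cleaned

def load(records, conn):
    """Load: write the cleaned records into the target store."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", records)
    conn.commit()

def run_pipeline(conn):
    """Run extract, transform and load in sequence, then read back the result."""
    load(transform(extract(RAW_CSV)), conn)
    return conn.execute(
        "SELECT name, amount FROM payments ORDER BY name"
    ).fetchall()

if __name__ == "__main__":
    print(run_pipeline(sqlite3.connect(":memory:")))
```

A real ETL engine adds connectors, scheduling and error handling on top of this pattern, but the division of labor between the three stages is the same.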
“We want to get our Kettle ETL engine embedded in big data solutions and this is a good way to do that,” said Doug Moran, co-founder and product manager for big data at Pentaho.
Apache projects will not allow LGPL code to be mixed with their code, Moran said. “We partner with different Hadoop distributions, and they really strongly recommended — well they pretty much told us — to do this,” Moran said.
Generally, the difference between the Apache and LGPL licenses is that under the Apache model, a developer can put Apache software into a product and distribute that product under any other open-source license, as long as the embedded Apache-licensed code is unadulterated, meaning the developer hasn’t “diluted” the rights. With the GPL and LGPL licenses, a developer cannot distribute a derivative work under a less restrictive license, Moran explained.
Kettle 4.3, available under the Apache License Version 2.0, can ingest, output, manipulate and report on data from Apache Cassandra, Hadoop HDFS, Hadoop MapReduce, Apache Hive, Apache HBase, MongoDB and Hadapt’s Adaptive Analytical Platform, Pentaho said.
Pentaho’s BI tools compete with offerings from Talend, QlikView, and Tableau. The license change takes effect with the new Pentaho Kettle 4.3 release.