Greenplum
{{Short description|American data technology company}}
{{Infobox company
| name = Greenplum
| logo = greenplumlogotype.jpg
| type = Product of Broadcom
| location = Palo Alto, California
| industry = Big data technologies
| products = Database management system software
}}
{{Infobox software
| title = Greenplum Database
| latest release date = {{Start date and age|2023|09|28}}
| latest release version = 7.0.0
| developer = Broadcom
| website = {{URL |greenplum.org}}
| license =
| genre = Database management system
| operating system = Linux
}}
Greenplum is a big data technology based on massively parallel processing (MPP) architecture and the Postgres open source database technology. The technology was created around 2005 by a company of the same name headquartered in San Mateo, California. Greenplum was acquired by EMC Corporation in July 2010.
{{cite news |date=July 6, 2010 |title=EMC to Acquire Greenplum |work=Press release |publisher=EMC Corporation |url=http://www.emc.com/about/news/press/2010/20100706-01.htm |access-date=March 15, 2017}}
Starting in 2012, its database management system software became known as the Pivotal Greenplum Database, sold through Pivotal Software. Pivotal open sourced the core engine, and development continued through the Greenplum Database open source community and Pivotal.
In 2020, Pivotal was acquired by VMware.{{Cite web |last=Haranas |first=Mark |title=5 Things You Need To Know About VMware's Acquisition Of Pivotal {{!}} CRN |url=https://www.crn.com/slide-shows/virtualization/5-things-you-need-to-know-about-vmware-s-acquisition-of-pivotal |access-date=2024-10-02 |website=www.crn.com}} VMware continued to sponsor the Greenplum Database open source community and commercialized the technology under the brand name VMware Tanzu Greenplum. In November 2023, VMware was acquired by Broadcom.{{Cite news |date=2023-11-23 |title=Chipmaker Broadcom completes $69bn deal to buy VMware |url=https://www.bbc.com/news/business-67505114 |access-date=2024-06-05 |language=en-GB}}
In May 2024, Tanzu by Broadcom decided to close the source of the Greenplum Database project. All future releases of Greenplum Database will be closed source and released as part of the VMware Tanzu Data Suite.
Company
Greenplum, the company, was founded in September 2003 by Scott Yara and Luke Lonergan. It was a merger of two smaller companies: Metapa (founded in August 2000 near Los Angeles){{Cite web |title= Form D: Notice of Sale of Securities |publisher= US SEC |date= July 30, 2003 |url= https://www.sec.gov/Archives/edgar/vprr/0302/03028135.pdf |access-date= March 15, 2017 }} and Didera in Fairfax, Virginia.{{Cite news |title= Metapa Buys Didera |author= Maureen O'Gara |date= September 26, 2003 |work= Linux Business News |url= http://www0.cloudcomputingexpo.com/node/35438 |access-date= March 15, 2017 }}
Investors included SoundView Ventures, Hudson Ventures and Royal Wulff Ventures. A total of {{US$|20 million|link=yes}} in funding was announced at the merger.{{Cite news |title= Metapa Acquires Didera and Closes Additional Funding; Industry Pioneers in High-Performance Computing Combine to Create Breakthrough Linux Database Clustering Solution for Decision Support |work= Press release |date= September 23, 2003 |url= http://www.businesswire.com/news/home/20030923005198/en/Metapa-Acquires-Didera-Closes-Additional-Funding-Industry }} Greenplum, based in San Mateo, California, released its database management system software based on PostgreSQL in April 2005 calling it Bizgres.{{Cite web |title= Bizgres project launched |date= April 17, 2005 |work= PostgreSQL developer's web site |url= https://www.postgresql.org/about/news/308/ |access-date= March 15, 2017 }} Rounds of venture capital of about {{US$|15 million}} each were invested in March 2006 and February 2007.{{Cite web |title= Greenplum Takes $27 Million Series C |date= January 21, 2008 |author= Duncan Riley |work= Tech Crunch |url= https://techcrunch.com/2008/01/21/greenplum-takes-27-million-series-c/ |access-date= March 15, 2017 }}
In July 2006 a partnership with Sun Microsystems was announced.{{Cite web |title= Sun/Greenplum |date= June 26, 2007 |work= Business Intelligence Best Practices |author=Colin White |author2=Richard Hackathorn |url= http://www.bi-bestpractices.com/view-articles/4640 |access-date= March 15, 2017 }} Sun, which had also acquired MySQL AB, participated in a {{US$|27 million}} investment round in January 2008, led by Meritech Capital Partners. The Bizgres project included a few other members and was supported until about 2008, by which time the product, like the company, was simply called "Greenplum".{{Cite web|title=History |work=Old Bizgres.org web site |url=http://www.bizgres.org/?page=13 |archive-date=December 22, 2008 |archive-url=https://web.archive.org/web/20081222001346/http://www.bizgres.org/?page=13 |access-date=March 15, 2017 }}{{Cite news |title= Greenplum Updates Open-Source Based Database |work= Information Week |date= February 22, 2008 |url= http://www.informationweek.com/software/information-management/greenplum-updates-open-source-based-database/d/d-id/1064910? |access-date= March 15, 2017 }} The Sun Fire X4500 was a reference architecture used by the majority of customers until a transition was made to Linux around that time. Greenplum was acquired by EMC Corporation in July 2010, becoming the foundation of EMC's big data software division.
Although EMC did not disclose the value, it was estimated at {{US$|300 million}}.{{Cite news |title= Big Data = Big Money: EMC Buys Greenplum |author= Om Malik |work= GigaOm |date= July 6, 2010 |url= https://gigaom.com/2010/07/06/emc-buys-greenplum/ |access-date= March 15, 2017 |archive-date= October 20, 2016 |archive-url= https://web.archive.org/web/20161020222153/https://gigaom.com/2010/07/06/emc-buys-greenplum/ |url-status= dead }}{{Cite news |title= Microsoft, Sun, And SAP Surprising Winners In Greenplum Sale |date= July 7, 2010 |work= Forbes |author= Alexander Haislip |url= https://www.forbes.com/sites/velocity/2010/07/07/microsoft-oracle-and-sap-surprising-winners-in-greenplum-sale/amp/ |access-date= March 15, 2017}} Greenplum's products at the time of acquisition were the Greenplum Database, Chorus (a management tool), and Data Science Labs. Greenplum had customers in vertical markets including eBay.{{cite web |url = http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/| title = ebay's two enormous data warehouses |date= April 30, 2009 |work= DBMS2 blog |publisher= Monash Research |access-date= March 15, 2017 }} It became part of Pivotal Software in 2012.{{Cite news |title= EMC wants to be the Linux of big data: Opens up Chorus tool, borgs agile coders Pivotal Labs |author= Timothy Prickett Morgan |date= March 20, 2012 |work= The Register |url= https://www.theregister.co.uk/2012/03/20/emc_openchorus_pivotal_labs_acquisition/ |access-date= March 15, 2017 }}
A variant called Hawq, which uses Apache Hadoop to store data in the Hadoop file system, was announced in 2013.{{Cite web |title= When should I use Greenplum Database versus HAWQ? |work= Pivotal Guru web site |date= January 31, 2014 |url= https://www.pivotalguru.com/?p=642 |access-date= March 15, 2017 }}{{Cite news |title= EMC morphs Hadoop elephant into SQL database Hawq |author= Timothy Prickett Morgan |date= February 25, 2013 |work= The Register |url= https://www.theregister.co.uk/2013/02/25/emc_pivotal_hd_hadoop_hawq_database/ |access-date= March 15, 2017 }} In 2015 the GreenplumDB and Hawq open source software projects were announced.{{Cite magazine |title= Pivotal Doubles Down on Open Source in a Sign of Changing Software World |author= Cade Metz |magazine= Wired |date= February 17, 2015 |url= https://www.wired.com/2015/02/sign-changing-software-world-pivotal-will-open-source-big-data-tools/ |access-date= March 15, 2017 }}
Technology
The Greenplum database product uses massively parallel processing (MPP) techniques. Each computer cluster consists of a master node, a standby master node, and segment nodes.{{Cite news |title= EMC gets fat and flashy with Greenplum appliances: Take that, Teradata, Exadata, Netezza |author= Timothy Prickett Morgan |date= April 6, 2011 |work= The Register |url= https://www.theregister.co.uk/2011/04/06/emc_greenplum_upgrade/ |access-date= March 18, 2017 }} All of the data resides on the segment nodes, and the catalog information is stored on the master nodes. Segment nodes run one or more segments, which are modified PostgreSQL database instances, each assigned a content identifier. For each table, the data is divided among the segment nodes based on the distribution column keys specified by the user in the data definition language. For each segment content identifier there is both a primary segment and a mirror segment, which do not run on the same physical host. When a query enters the master node, it is parsed, planned and dispatched to all of the segments, which execute the query plan and either return the requested data or insert the result of the query into a database table. The Structured Query Language, version SQL:2003, is used to present queries to the system. Transaction semantics comply with the constraints known as ACID.{{Cite book |title= Getting Started with Greenplum for Big Data Analytics |author= Sunila Gollapudi |publisher= Packt Publishing |date= 2013 |isbn= 978-1-78217-705-0 }}
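The distribution and dispatch scheme described above can be illustrated with a minimal Python sketch. It is not Greenplum's implementation: the CRC32 hash, the segment count of four, and the names <code>segment_for</code>, <code>distribute</code> and <code>scatter_gather_count</code> are assumptions chosen only for illustration. The sketch maps each row to a segment content identifier by hashing its distribution key, then mimics a dispatched count in which every segment works on its own rows and the master combines the partial results.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # assumed number of segment content identifiers


def segment_for(key_values, num_segments=NUM_SEGMENTS):
    """Map a row's distribution-key values to a content identifier.

    CRC32 is a stand-in hash chosen for this sketch (an assumption).
    """
    encoded = "|".join(str(v) for v in key_values).encode("utf-8")
    return zlib.crc32(encoded) % num_segments


def distribute(rows, key_columns, num_segments=NUM_SEGMENTS):
    """Group rows (dicts) by the segment that would store them."""
    placement = defaultdict(list)
    for row in rows:
        content_id = segment_for([row[c] for c in key_columns], num_segments)
        placement[content_id].append(row)
    return dict(placement)


def scatter_gather_count(placement):
    """Mimic a dispatched row count: segments count locally, the master sums."""
    partial_counts = [len(segment_rows) for segment_rows in placement.values()]
    return sum(partial_counts)


# Hypothetical table of 1000 orders, distributed on the order_id column.
rows = [{"order_id": i, "customer": f"c{i % 7}"} for i in range(1000)]
placement = distribute(rows, key_columns=["order_id"])
total = scatter_gather_count(placement)
```

Because the segment is a pure function of the key values, rows with equal distribution keys always land on the same segment, which is what allows each segment to execute its part of a query plan on local data.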
Competitors include other MPP database management systems provided by major vendors such as Teradata, Amazon Redshift, Microsoft Azure, Alibaba [https://www.alibabacloud.com/product/hybriddb-postgresql?spm=a3c0i.7958991.1097638.dnavproductsd11.3e5d1dfb7zvU7F AnalyticDB] and, in the past, IBM Netezza.{{Cite web |title= System Properties Comparison Amazon Redshift vs. Greenplum vs. Microsoft Azure SQL Database vs. Teradata Aster |work= DB-engines |url= http://db-engines.com/en/system/Amazon+Redshift%3BGreenplum%3BMicrosoft+Azure+SQL+Database%3BTeradata+Aster |access-date= March 18, 2017 }} Additional competition comes from smaller vendors, from column-oriented databases such as HP Vertica and Exasol, and from data warehousing vendors with non-MPP architectures, such as Oracle Exadata, IBM Db2 and SAP HANA.
Greenplum Version 7
In September 2023, Greenplum Database Version 7 was released.{{Cite web | title= VMware Greenplum 7.x Release Notes | date= 2 October 2023 | url= https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/relnotes-release-notes.html }} Version 7 is based on PostgreSQL version 12.12.
Greenplum Version 6
In September 2019, Greenplum Database Version 6 was released. Version 6 is based on PostgreSQL version 9.4 and features large gains in OLTP performance.{{Cite web | title= Greenplum 6 OLTP Benchmarks | date= 15 May 2019 | url= https://greenplum.org/oltp-workload-performance-improvement-in-greenplum-6 }} Greenplum 6 was reviewed by several media sources, which noted its alignment with open source Postgres{{Cite web |title= Pivotal's Greenplum database is about to finally align with the open source project. What will that mean for the platform?|website= ZDNet |url= https://www.zdnet.com/article/greenplum-6-ventures-outside-the-analytic-box/ }} and its OLTP performance.{{Cite web |title= Substantial rev of the open source, MPP data warehouse offers high concurrency, embedded analytics, and data science capabilities |date= 7 November 2019 |url= https://www.infoworld.com/article/3452517/greenplum-6-review-jack-of-all-trades-master-of-some.html }}
Greenplum Version 5
In September 2017, Greenplum Database Version 5 was released. Version 5 is the first iteration of the Greenplum project strategy of merging later PostgreSQL versions back into Greenplum; it is based on PostgreSQL version 8.3, up from version 8.2 in the previous release.{{Cite web |title= Pivotal Greenplum is alive and kicking|work= ZDNet|url= https://www.zdnet.com/article/pivotal-greenplum-is-alive-and-kicking/|access-date= September 14, 2017 }} Version 5 also introduced the general availability of the GPORCA optimizer,{{Cite web |title= Orca: A Modular Query Optimizer Architecture for Big Data|work= ZDNet|url= http://15721.courses.cs.cmu.edu/spring2016/papers/p337-soliman.pdf |access-date= April 14, 2016 }} a cost-based SQL optimizer designed for big data.