Very large database

{{Short description|Database that contains a very large amount of data}}

{{About|databases which are very large|the VLDB conference|International Conference on Very Large Data Bases}}

A very large database, (originally written very large data base) or VLDB, is a database that contains a very large amount of data, so much that it can require specialized architectural, management, processing and maintenance methodologies.

Definition

The vague adjectives of very and large allow for a broad and subjective interpretation, but attempts at defining a metric and threshold have been made. Early metrics were the size of the database in a canonical form via database normalization or the time for a full database operation like a backup. Technology improvements have continually changed what is considered very large.

One definition has suggested that a database has become a VLDB when it is "too large to be maintained within the window of opportunity… the time when the database is quiet".

Sizes of a VLDB database

There is no absolute amount of data that can be cited. For example, one cannot say that any database with more than 1 TB of data is considered a VLDB. This absolute amount of data has varied over time as computer processing, storage and backup methods have become better able to handle larger amounts of data. That said, VLDB issues may start to appear when 1 TB is approached, and are more than likely to have appeared as 30 TB or so is exceeded.

VLDB challenges

Key areas where a VLDB may present challenges include configuration, storage, performance, maintenance, administration, availability and server resources.{{rp|11}}

=Configuration=

Careful configuration of databases that lie in the VLDB realm is necessary to alleviate or reduce issues raised by VLDB databases.{{rp|36–53}}

=Administration=

The complexities of managing a VLDB can increase exponentially for the database administrator as database size increases.

=Availability and maintenance=

When dealing with VLDB operations relating to maintenance and recovery such as database reorganizations and file copies which were quite practical on a non-VLDB take very significant amounts of time and resources for a VLDB database. In particular it typically infeasible to meet a typical recovery time objective (RTO), the maximum expected time a database is expected to be unavailable due to interruption, by methods which involve copying files from disk or other storage archives. To overcome these issues techniques such as clustering, cloned/replicated/standby databases, file-snapshots, storage snapshots or a backup manager may help achieve the RTO and availability, although individual methods may have limitations, caveats, license, and infrastructure requirements while some may risk data loss and not meet the recovery point objective (RPO). For many systems only geographically remote solutions may be acceptable.

==Backup and recovery==

Best practice is for backup and recovery to be architectured in terms of the overall availability and business continuity solution.

=Performance=

Given the same infrastructure there may typically be a decrease in performance, that is increase in response time as database size increases. Some accesses will simply have more data to process (scan) which will take proportionally longer (linear time); while the indexes used to access data may grow slightly in height requiring perhaps an extra storage access to reach the data (sub-linear time). Other effects can be caching becoming less efficient because proportionally less data can be cached and while some indexes such as the B+ automatically sustain well with growth others such as a hash table may need to be rebuilt.

Should an increase in database size cause the number of accessors of the database to increase then more server and network resources may be consumed, and the risk of contention will increase. Some solutions to regaining performance include partitioning, clustering, possibly with sharding, or use of a database machine.{{rp|390}}

==Partitioning==

Partitioning may be able assist the performance of bulk operations on a VLDB including backup and recovery., bulk movements due to information lifecycle management (ILM),{{rp|3}}{{rp|105–118}} reducing contention{{rp|327–329}} as well as allowing optimization of some query processing.{{rp|215–230}}

=Storage=

In order to satisfy needs of a VLDB the database storage needs to have low access latency and contention, high throughput, and high availability.

=Server resources=

The increasing size of a VLDB may put pressure on server and network resources and a bottleneck may appear that may require infrastructure investment to resolve.

Relationship to big data

VLDB is not the same as big data, but the storage aspect of big data may involve a VLDB database. That said some of the storage solutions supporting big data were designed from the start to support large volumes of data, so database administrators may not encounter VLDB issues that older versions of traditional RDBMS's might encounter.

References

{{reflist|2|refs=

{{cite book|title=Microsoft SQL Server 2000|first1=Ray|last1=Rankins|first2=Paul|last2=Jensen|first3=Paul|last3=Bertucci|chapter=21|id=Administering Very Large SQL Server Databases|edition=2nd|publisher=SAMS|isbn=978-0672324673|date=18 December 2002|chapter-url-access=registration|chapter-url=https://archive.org/details/microsoftsqlserv00rayr|url-access=registration|url=https://archive.org/details/microsoftsqlserv00rayr}}

{{cite web|title=Oracle Database Online Documentation 11g Release 1 (11.1) / Database Administration Database Concepts|url=https://docs.oracle.com/cd/B28359_01/server.111/b28318/partconc.htm#CNCPT011

|id=18 Very Large Databases (VLDB)|website=oracle|access-date=3 October 2018|url-access=limited}}

{{cite web|url=https://docs.oracle.com/en/database/oracle/oracle-database/18/vldbg/partition-intro.html

|url-status=live|archive-url=https://web.archive.org/web/20181003205734/https://docs.oracle.com/en/database/oracle/oracle-database/18/vldbg/partition-intro.html|archive-date=3 October 2018|df=dmy-all}}

{{cite web|website=Technopedia|url=https://www.techopedia.com/definition/14731/very-large-database-vldb|access-date=3 October 2018|title=Very Large Database (VLDB)

|url-status=live|archive-url=https://web.archive.org/web/20180704224849/https://www.techopedia.com/definition/14731/very-large-database-vldb|archive-date=4 July 2018|df=dmy-all}}

Gaines, R. S. and R. Gammill. Very Large Data Bases: An Emerging Research Area, Informal working paper, RAND Corporation

{{Cite book|url=https://books.google.com/books?id=3JYgAAAAMAAJ|title=Data Processing Magazine|date=1964|page=18,58|publisher=North American Publishing Company|language=en|url-access=limited}}

{{Cite book|url=https://books.google.com/books?id=KUgNGCJB4agC|title=Encyclopedia of Computer Science and Technology: Volume 14 - Very Large Data Base Systems to Zero-Memory and Markov Information Source|last=Sidley|first=Edgar H.|date=1 April 1980|pages=1–18|publisher=CRC Press|isbn=9780824722142|language=en}}

{{Cite journal|last1=Gerritsen|first1=Rob|last2=Morgan|first2=Howard|last3=Zisman|first3=Michael|title=On some metrics for databases or what is a very large database?|journal=ACM SIGMOD Record|volume=9|issue=1|pages=50–74|doi=10.1145/984382.984393|date=June 1977|s2cid=6359244|issn=0163-5808|doi-access=free}}

{{cite web|url=http://cdn2.hubspot.net/hubfs/214442/Actifio_For_Very_Large_Databases_White_Paper.pdf|title=The Very Large Database Problem - How to Backup & Recover 30–100 TB Databases|publisher=actifio|url-status=live|archive-url=https://web.archive.org/web/20180219182335/http://cdn2.hubspot.net/hubfs/214442/Actifio_For_Very_Large_Databases_White_Paper.pdf|archive-date=19 February 2018|df=dmy-all}}

{{cite web|title=Tuning & Applying Best Practices On Very Large Databases (VLDB)|url=http://www.aioug.org/sangam14/images/Sangam14/Presentations/201461_Hussain_ppt.pdf|date=2014|publisher=AIOUG|location=Sangam|url-status=live|archive-url=https://web.archive.org/web/20181004205048/http://www.aioug.org/sangam14/images/Sangam14/Presentations/201461_Hussain_ppt.pdf|archive-date=4 October 2018|first=Syed Jaffer|last=Hussain|df=dmy-all}}

{{cite web|url=http://sqlturbo.com/top-10-must-do-items-for-your-sql-server-very-large-database/|title=Top 10 Must-Do Items for your SQL Server Very Large Database|first=Warner|last=Chaves|website=SQLTURBO|date=7 January 2015|access-date=5 October 2018|url-status=live|archive-url=https://web.archive.org/web/20171213085742/http://sqlturbo.com/top-10-must-do-items-for-your-sql-server-very-large-database/|archive-date=13 December 2017|df=dmy-all}}

{{cite web|title=SQL Server VLDB in Azure: DBA Tasks Made Simple|url=https://blogs.msdn.microsoft.com/sqlcat/2018/01/22/sql-server-vldb-in-azure-dba-tasks-made-simple/|website=MSDN|access-date=6 October 2018|date=22 January 2018|first=Dimitri|last=Furman|url-status=live|archive-url=https://web.archive.org/web/20181006072244/https://blogs.msdn.microsoft.com/sqlcat/2018/01/22/sql-server-vldb-in-azure-dba-tasks-made-simple/|archive-date=6 October 2018|df=dmy-all|editor1=Rajesh Setlem |editor2=Mike Weiner |editor3=Xiaochen Wu}}

{{cite web|title=Specialized Requirements for Relational Data Warehouse Servers|date=21 June 1996|website=Red Brick Systems, Inc.

|url=http://www.redbrick.com/rbs-g/whitepapers/tenreq_wp.html

|url-status=dead|archive-url=https://web.archive.org/web/19971010114605/http://www.redbrick.com/rbs-g/whitepapers/tenreq_wp.html|archive-date=10 October 1997|df=dmy-all}}

{{cite web|url=https://mwidlake.wordpress.com/2009/09/18/what-is-a-vldb/|title=What is a VLDB?|date=18 September 2009|website=mwidlake|access-date=7 October 2018|first=Marin|last=Widlake|url-status=live

|archive-url=https://web.archive.org/web/20181006114729/https://mwidlake.wordpress.com/2009/09/18/what-is-a-vldb/|archive-date=6 October 2018|df=dmy-all}}

{{cite web|url=https://www.oracle.com/technetwork/database/availability/rman-fra-snapshot-322251.html|title=Snapshots Are NOT Backups

|archive-url=https://web.archive.org/web/20180907091910/https://www.oracle.com/technetwork/database/availability/rman-fra-snapshot-322251.html|archive-date=7 September 2018

|url-access=limited|df=dmy-all}}

{{cite web|url=https://www.ibm.com/support/knowledgecenter/en/SSEPGG_9.5.0/com.ibm.db2.luw.admin.ha.doc/doc/t0006423.html

|title=Using a split mirror as a backup image|website=IBM Knowledge Center|access-date=10 October 2018|url-access=limited

|url-status=live|archive-url=https://archive.today/20180109160158/https://www.ibm.com/support/knowledgecenter/en/SSEPGG_9.5.0/com.ibm.db2.luw.admin.ha.doc/doc/t0006423.html|archive-date=9 January 2018|df=dmy-all}}

{{cite web|url=http://www.ovaistariq.net/733/understanding-btree-indexes-and-how-they-impact-performance/

|title=Understanding B+tree Indexes and how they Impact Performance|first=Ovais|last=Tariq|website=ovaistariq.net

|url-status=live|archive-url=https://web.archive.org/web/20180207203602/http://www.ovaistariq.net/733/understanding-btree-indexes-and-how-they-impact-performance/

|archive-date=7 February 2018|df=dmy-all|access-date=10 October 2018|date=2011-07-14}}

{{cite web|url=https://www.pcmag.com/encyclopedia/term/40879/database-machine|title=Encyclopedia|id=Definition of: database machine

|url-status=live|archive-url=https://web.archive.org/web/20160704205410/http://www.pcmag.com/encyclopedia/term/40879/database-machine|archive-date=4 July 2016|df=dmy-all

|access-date=10 October 2018}}

{{cite web|url=http://www.dba-oracle.com/t_backup_vldb.htm|title=Oracle Backup VLDB tips|website=Burleson Consulting|date=26 March 2015|access-date=11 October 2016|first=Donald|last=Burleson

|url-status=live|archive-url=https://web.archive.org/web/20170630223240/http://www.dba-oracle.com/t_backup_vldb.htm|archive-date=30 June 2017|df=dmy-all}}

{{cite web|url=https://www.isaca.org/Journal/archives/2012/Volume-1/Pages/Database-Backup-and-Recovery-Best-Practices.aspx

|title=Database Backup and Recovery Best Practices|first1=Ali Navid|last1=Akhtar|first2=Jeff|last2=Buchholtz|first3=Michael|last3=Ryan|first4=Kumar|last4= Setty|date=2012|access-date=12 October 2012|url-status=live|archive-url=https://web.archive.org/web/20180629131442/https://www.isaca.org/Journal/archives/2012/Volume-1/Pages/Database-Backup-and-Recovery-Best-Practices.aspx|archive-date=29 June 2018|df=dmy-all}}

{{cite web|url=https://dev.mysql.com/doc/mysql-ha-scalability/en/ha-overview.html|access-date=12 October 2018

|title=Chapter 1 High Availability and Scalability|url-status=live|df=dmy-all|website=dev.mysql

|archive-url=https://web.archive.org/web/20161215030829/https://dev.mysql.com/doc/mysql-ha-scalability/en/ha-overview.html|archive-date=15 December 2016}}

{{cite speech|title=Get the best out of Oracle Partitioning - A practical guide and reference|first1=Thomas|last=Teske|others=Hermann Bär

|archive-url=https://web.archive.org/web/20181012172456/https://indico.cern.ch/event/697301/attachments/1598206/2532649/Partitioning_guide_v18.pdf

|archive-date=12 October 2018|url-status=live|df=dmy-all}}

{{cite web|url=https://www.oracle.com/technetwork/database/options/partitioning/partitioning-wp-12c-1896137.pdf

|title=Oracle Partitioning in Oracle Database 12c Release 2 Extreme Data Management and Performance for every System|date=March 2017|access-date=17 October 2018|website=Oracle

|archive-url=https://web.archive.org/web/20171215074909/https://www.oracle.com/technetwork/database/options/partitioning/partitioning-wp-12c-1896137.pdf

|url-access=limited|url-status=live|archive-date=15 December 2017|df=dmy-all}}

{{cite conference|conference=7th International Conference on Cloud Computing and Services

|title=High Availability and Performance of Database in the Cloud - Traditional Master-slave Replication versus Modern Cluster-based Solutions

|url=https://www.researchgate.net/publication/317299391

|archive-url=https://web.archive.org/web/20181017152557/https://www.researchgate.net/publication/317299391_High_Availability_and_Performance_of_Database_in_the_Cloud_-_Traditional_Master-slave_Replication_versus_Modern_Cluster-based_Solutions|archive-date=17 October 2018|url-status=live|df=dmy-all|doi-access=free|hdl=10642/6140|hdl-access=free}}

{{cite web|title=Server Hardware Performance Considerations|date=1 August 2018|access-date=17 October 2018

|url=https://docs.microsoft.com/en-us/windows-server/administration/performance-tuning/hardware/|website=Microsoft IT Pro Center

|archive-url=https://web.archive.org/web/20181017175544/https://docs.microsoft.com/en-us/windows-server/administration/performance-tuning/hardware/

|archive-date=17 October 2018|url-status=live|df=dmy-all}}

{{cite book|title=IBM System Storage Business Continuity: Part 2 Solutions Guide|chapter=Chapter 1. Three Business solution segments defined

|publisher=IBM Redbooks|date=March 2007}}

{{cite web|title=Cluster design considerations|access-date=17 October 2017|website=Crouchbase

|url=https://developer.couchbase.com/documentation/server/3.x/admin/Concepts/bp-clusterDesign.html

|archive-url=https://web.archive.org/web/20181017195247/https://developer.couchbase.com/documentation/server/3.x/admin/Concepts/bp-clusterDesign.html

|url-status=live|archive-date=17 October 2018|df=dmy-all}}

{{cite web|title=Cross Datacenter Replication (XDCR)|access-date=17 October 2017|website=Crouchbase

|url=https://developer.couchbase.com/documentation/server/3.x/admin/XDCR/xdcr-intro.html

|archive-url=https://web.archive.org/web/20181017195516/https://developer.couchbase.com/documentation/server/3.x/admin/XDCR/xdcr-intro.html