Apache IoTDB
{{Short description|Open Source Database Project}}
{{Technical|date=April 2023}}
{{Infobox software
| title = Apache IoTDB
| logo = File:Apache IoTDB Logo.svg
| developer = Apache Software Foundation
| latest release version = 1.1.0
| latest release date = 3 April 2023
| repo = {{URL|https://github.com/apache/iotdb}}
| programming language = Java
| platform = Cross-platform
| genre = {{ubl |distributed |real-time |time-series |column-oriented data store }}
| license = Apache License 2.0
| website = {{URL|https://iotdb.apache.org/}}
}}
Apache IoTDB is an open-source, column-oriented time-series database (TSDB) management system written in Java.{{Cite web |last=Sally |date=23 September 2020 |title=The Apache Software Foundation Announces Apache® IoTDB™ as A Top-Level Project |url=https://news.apache.org/foundation/entry/the-apache-software-foundation-announces68 |access-date=18 November 2022 |website=The Apache Software Foundation Blog}} It is available in both edge and cloud versions and combines an optimized columnar file format for efficient time-series storage with a database engine that offers a high ingestion rate, low-latency queries and support for data analysis. It is optimized for time-series operations such as aggregation queries, downsampling and sub-sequence similarity search. The name IoTDB stands for "Internet of Things (IoT) Database", reflecting its design as an IoT-native TSDB that addresses the pain points of typical IoT scenarios: massive data generation, high-frequency sampling, out-of-order data, specialized analytics requirements, high storage and operation-and-maintenance costs, and the low computational power of IoT devices.{{Cite journal |last1=Wang |first1=Chen |last2=Huang |first2=Xiangdong |last3=Qiao |first3=Jialin |last4=Jiang |first4=Tian |last5=Rui |first5=Lei |last6=Zhang |first6=Jinrui |last7=Kang |first7=Rong |last8=Feinauer |first8=Julian |last9=McGrail |first9=Keven A. |last10=Wang |first10=Peng |last11=Yuan |first11=Jun |last12=Wang |first12=Jianmin |last13=Sun |first13=Jiaguang |title=Apache IoTDB |journal=Proceedings of the VLDB Endowment |date=August 2020 |url=https://www.vldb.org/pvldb/vol13/p2901-wang.pdf |volume=13 |issue=12 |pages=2901–2904 |doi=10.14778/3415478.3415504|s2cid=221352039 }}
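Downsampling, one of the operations named above, can be illustrated with a minimal Java sketch that averages the points falling into fixed-width time windows. It is a generic illustration under stated assumptions, not IoTDB's implementation or query syntax; the class name `DownsampleSketch`, the method `downsample` and the sample data are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

public class DownsampleSketch {

    /**
     * Averages values in fixed-width time windows.
     * Returns a sorted map from window start time to the mean of that window.
     */
    public static TreeMap<Long, Double> downsample(long[] timestamps, double[] values, long windowMillis) {
        if (timestamps.length != values.length) {
            throw new IllegalArgumentException("timestamps and values must have the same length");
        }
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        // Window start -> {running sum, point count}.
        TreeMap<Long, double[]> accumulators = new TreeMap<>();
        for (int i = 0; i < timestamps.length; i++) {
            long windowStart = Math.floorDiv(timestamps[i], windowMillis) * windowMillis;
            double[] acc = accumulators.computeIfAbsent(windowStart, k -> new double[2]);
            acc[0] += values[i];
            acc[1] += 1;
        }
        TreeMap<Long, Double> result = new TreeMap<>();
        for (Map.Entry<Long, double[]> e : accumulators.entrySet()) {
            result.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] timestamps = {0L, 400L, 900L, 1000L, 1500L};
        double[] values = {1.0, 2.0, 3.0, 10.0, 20.0};
        // Five raw points are reduced to one averaged point per 1000 ms window.
        System.out.println(downsample(timestamps, values, 1000L)); // {0=2.0, 1000=15.0}
    }
}
```

The input does not need to be sorted, because each point is assigned to its window independently of arrival order.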
History
Apache IoTDB was initiated by Prof. Jianmin Wang's team in the School of Software at Tsinghua University. In 2011, the team chose open-source NoSQL technology instead of Oracle for a project managing large volumes of machine data, and found NoSQL insufficient for industrial internet of things (IIoT) scenarios. The team began developing its own data management system and, in March 2016, formally proposed TsFile,{{Cite web |last=Hou |first=Haonan |date=14 March 2022 |title=TsFile Format |url=https://cwiki.apache.org/confluence/display/IOTDB/TsFile+Format |access-date=18 November 2022 |website=ASF Confluence}} an optimized, compact columnar file format for time-series data. The source code was then published on GitHub.
In June 2016, the team began developing IoTDB on top of TsFile as an IIoT database supporting real-time reads and writes as well as analysis.
In November 2018, IoTDB entered the incubator of the Apache Software Foundation (ASF).{{Cite web |title=Apache IoTDB Project Incubation Status |url=https://incubator.apache.org/projects/iotdb.html |access-date=18 November 2022 |website=Apache Incubator}}
On September 16, 2020, the ASF officially issued a resolution promoting Apache IoTDB to a Top-Level Project (TLP), following a public discussion and vote by the community and a show-of-hands vote by the board.{{Cite web |last=online |first=heise |title=Apache Software Foundation erhebt IoTDB zum Top-Level-Projekt |url=https://www.heise.de/news/Apache-Software-Foundation-erhebt-IoTDB-zum-Top-Level-Projekt-4910004.html |access-date=2022-12-13 |website=Developer |date=23 September 2020 |language=de-DE}}
Architecture
File:Structure of Apache IoTDB.png
Apache IoTDB follows a client-server architecture consisting of the IoTDB engine (server) and a set of components known as the IoTDB suite (client). The suite provides functions such as data collection, writing, storage, querying, visualization and analysis. Data collected by sensors is continuously persisted on the server, where it can be queried natively or shipped to other open-source platforms for analysis. IoTDB also provides an "Edge-Cloud Cooperation" mode, in which the Sync Tool synchronizes collected data from one IoTDB instance to another at a user-configured interval.{{Cite web |date= |title=IoTDB User Guide: System Architecture |url=https://iotdb.apache.org/UserGuide/Master/IoTDB-Introduction/Architecture.html |access-date=18 November 2022 |website=Apache IoTDB}}{{Cite web |title=Apache IoTDB |url=https://dbdb.io/db/iotdb |access-date=18 November 2022 |website=Database of Databases|date=27 June 2022 }}
Users can use JDBC to write time-series data to a local or remote IoTDB instance. This data may be system state data (such as server load, CPU and memory usage), message queue data, time-series data from applications, or time-series data from other databases. The data can be written directly to TsFile, either locally or on the Hadoop Distributed File System (HDFS).
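A minimal Java sketch of such a JDBC write is given here. It assumes that the IoTDB JDBC driver is on the classpath and that a server is reachable at a URL such as `jdbc:iotdb://127.0.0.1:6667/` with the user `root`. The series path `root.demo.device1`, the measurement names and the helper class `JdbcWriteSketch` are hypothetical names chosen for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class JdbcWriteSketch {

    /**
     * Builds an INSERT statement for one timestamped row of numeric measurements.
     * Illustration only: inputs are concatenated into SQL, so they must be trusted.
     */
    public static String buildInsert(String devicePath, long timestamp, Map<String, Number> measurements) {
        if (measurements.isEmpty()) {
            throw new IllegalArgumentException("at least one measurement is required");
        }
        StringJoiner names = new StringJoiner(",", "(timestamp,", ")");
        StringJoiner values = new StringJoiner(",", "(" + timestamp + ",", ")");
        for (Map.Entry<String, Number> e : measurements.entrySet()) {
            names.add(e.getKey());
            values.add(String.valueOf(e.getValue()));
        }
        return "INSERT INTO " + devicePath + names + " VALUES" + values;
    }

    /** Sends one statement over JDBC; needs a running server and the driver on the classpath. */
    public static void write(String url, String user, String password, String sql) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement()) {
            stmt.execute(sql);
        }
    }

    public static void main(String[] args) throws SQLException {
        Map<String, Number> row = new LinkedHashMap<>();
        row.put("temperature", 25.5);
        row.put("humidity", 60);
        String sql = buildInsert("root.demo.device1", 1L, row);
        System.out.println(sql);
        // Only connect when a JDBC URL is passed, e.g. jdbc:iotdb://127.0.0.1:6667/
        // The credentials below are assumed defaults and may differ per installation.
        if (args.length == 1) {
            write(args[0], "root", "root", sql);
        }
    }
}
```

Run without arguments, the program only prints the generated statement, so the statement-building part can be checked without a server.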
TsFile is a columnar storage file format developed for accessing, compressing and storing time-series data in Apache IoTDB. Its structure is based on the LSM-tree, which reduces the computational resources required and improves the performance of Apache IoTDB.{{Cite journal |last1=Xiao |first1=Jinzhao |last2=Huang |first2=Yuxiang |last3=Hu |first3=Changyu |last4=Song |first4=Shaoxu |last5=Huang |first5=Xiangdong |last6=Wang |first6=Jianmin |date=2022-09-07 |title=Time series data encoding for efficient storage: a comparative analysis in Apache IoTDB |url=https://doi.org/10.14778/3547305.3547319 |journal=Proceedings of the VLDB Endowment |volume=15 |issue=10 |pages=2148–2160 |doi=10.14778/3547305.3547319 |s2cid=252135944 |issn=2150-8097|url-access=subscription }}
TsFile can also be written to HDFS, which enables data processing tasks such as anomaly detection and machine learning on the Hadoop or Spark data processing platforms.
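The general idea of an LSM-tree-style columnar layout can be sketched in Java as follows. This is a simplified illustration, not TsFile's actual on-disk format: writes for a single series are buffered in a sorted in-memory table, which absorbs out-of-order arrivals, and are then flushed as an immutable chunk that stores timestamps and values in separate arrays. Compaction of flushed chunks, which a real LSM-tree also performs, is omitted. The names `LsmColumnSketch`, `Chunk` and `flush` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LsmColumnSketch {

    /** An immutable, time-ordered chunk with one array per column. */
    public static final class Chunk {
        public final long[] timestamps;
        public final double[] values;

        Chunk(long[] timestamps, double[] values) {
            this.timestamps = timestamps;
            this.values = values;
        }
    }

    // Sorted in-memory buffer; a later write to the same timestamp overwrites the earlier one.
    private final TreeMap<Long, Double> memTable = new TreeMap<>();
    private final List<Chunk> flushedChunks = new ArrayList<>();
    private final int flushThreshold;

    public LsmColumnSketch(int flushThreshold) {
        if (flushThreshold < 1) {
            throw new IllegalArgumentException("flushThreshold must be at least 1");
        }
        this.flushThreshold = flushThreshold;
    }

    /** Buffers one point and flushes once the buffer reaches the threshold. */
    public void write(long timestamp, double value) {
        memTable.put(timestamp, value);
        if (memTable.size() >= flushThreshold) {
            flush();
        }
    }

    /** Copies the sorted buffer into column arrays and starts a fresh buffer. */
    public void flush() {
        if (memTable.isEmpty()) {
            return;
        }
        long[] ts = new long[memTable.size()];
        double[] vs = new double[memTable.size()];
        int i = 0;
        for (Map.Entry<Long, Double> e : memTable.entrySet()) {
            ts[i] = e.getKey();
            vs[i] = e.getValue();
            i++;
        }
        flushedChunks.add(new Chunk(ts, vs));
        memTable.clear();
    }

    public List<Chunk> chunks() {
        return flushedChunks;
    }

    public static void main(String[] args) {
        LsmColumnSketch series = new LsmColumnSketch(3);
        series.write(30L, 21.5); // arrives first, but is the latest point
        series.write(10L, 20.0);
        series.write(20L, 20.7); // third point triggers the flush
        Chunk chunk = series.chunks().get(0);
        System.out.println(Arrays.toString(chunk.timestamps)); // [10, 20, 30]
    }
}
```

Keeping each column in its own array is what lets a reader scan only the timestamps or only the values, and lets similar values sit next to each other for encoding.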
For the data written to HDFS or local TsFile, users can use TsFile-Hadoop-Connector or TsFile-Spark-Connector to allow Hadoop or Spark to process data. The results of the analysis can be written back to TsFile in the same way. Also, IoTDB and TsFile provide client tools to meet the various needs of users in writing and viewing data in SQL form, script form and graphical form.{{Cite book |last1=Huang |first1=Xiangdong |last2=Wang |first2=Jianmin |last3=Wong |first3=Raymond |last4=Zhang |first4=Jinrui |last5=Wang |first5=Chen |title=Proceedings of the 25th ACM International on Conference on Information and Knowledge Management |chapter=PISA |date=2016-10-24 |chapter-url=https://doi.org/10.1145/2983323.2983775 |series=CIKM '16 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=979–988 |doi=10.1145/2983323.2983775 |isbn=978-1-4503-4073-1|s2cid=12456810 }}{{Cite book |last1=Kang |first1=Rong |last2=Wang |first2=Chen |last3=Wang |first3=Peng |last4=Ding |first4=Yuting |last5=Wang |first5=Jianmin |title=Web and Big Data |chapter=Matching Consecutive Subpatterns over Streaming Time Series |date=2018 |editor-last=Cai |editor-first=Yi |editor2-last=Ishikawa |editor2-first=Yoshiharu |editor3-last=Xu |editor3-first=Jianliang |chapter-url=https://link.springer.com/chapter/10.1007/978-3-319-96893-3_8 |series=Lecture Notes in Computer Science |volume=10988 |language=en |location=Cham |publisher=Springer International Publishing |pages=90–105 |doi=10.1007/978-3-319-96893-3_8 |arxiv=1805.06757 |isbn=978-3-319-96893-3|s2cid=21687305 }}{{Cite book |last1=Wu |first1=Jiaye |last2=Wang |first2=Peng |last3=Pan |first3=Ningting |last4=Wang |first4=Chen |last5=Wang |first5=Wei |last6=Wang |first6=Jianmin |title=2019 IEEE 35th International Conference on Data Engineering (ICDE) |chapter=KV-Match: A Subsequence Matching Approach Supporting Normalization and Time Warping |date=2019 |pages=866–877 |doi=10.1109/ICDE.2019.00082|arxiv=1710.00560 
|isbn=978-1-5386-7474-1 |s2cid=46926461 }}{{Cite journal |last1=Mao |first1=Dongfang |last2=Li |first2=Tianan |last3=Huang |first3=Xiangdong |last4=Yuan |first4=Jun |last5=Xu |first5=Yi |last6=Wang |first6=Jianmin |date=27 April 2020 |title=The design of Apache IoTDB distributed framework |url=https://www.researchgate.net/publication/341048663 |journal=National Database Conference |volume=50 |issue=5 |pages=621–636 |doi=10.1360/SSI-2019-0189|s2cid=219053248 |doi-access=free }}{{Cite journal |last1=Qiao |first1=Jialin |last2=Huang |first2=Xiangdong |last3=Wang |first3=Jianmin |last4=Wong |first4=Raymond K. |date=2020-01-01 |title=Dual-PISA: An index for aggregation operations on time series data |url=https://www.sciencedirect.com/science/article/pii/S0306437918305489 |journal=Information Systems |language=en |volume=87 |pages=101427 |doi=10.1016/j.is.2019.101427 |s2cid=201127537 |issn=0306-4379|url-access=subscription }}
Features
= Flexible and cross-platform deployment =
IoTDB is designed to fit three deployment scenarios: 1) file-based storage or an embedded time-series database on edge appliances such as the Raspberry Pi, 2) a standalone TSDB on an industrial PC and 3) a distributed TSDB or a Hadoop cluster with TsFile. IoTDB provides a one-click installation tool for the cloud, a terminal tool that can be used as soon as it is decompressed, and a bridging tool between cloud platforms and terminal tools (the Data Synchronization Tool).
= Low storage cost =
IoTDB achieves a high on-disk compression ratio, so the same amount of data can be stored at a lower hardware cost.{{Cite news |title=IoTDB User Guide: Features |url=https://iotdb.apache.org/UserGuide/Master/IoTDB-Introduction/Features.html |access-date=18 November 2022 |newspaper=Iotdb Website}}
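As a rough illustration of why regularly sampled time series compress well, the following Java sketch applies plain delta encoding to a timestamp column. Delta encoding is a generic technique chosen here for illustration and is not a description of the encodings IoTDB actually uses; the class name `DeltaEncodingSketch` and the sample data are hypothetical.

```java
import java.util.Arrays;

public class DeltaEncodingSketch {

    /** Replaces each timestamp by its difference from the previous one (the first is kept as is). */
    public static long[] encode(long[] timestamps) {
        long[] deltas = new long[timestamps.length];
        long previous = 0L;
        for (int i = 0; i < timestamps.length; i++) {
            deltas[i] = timestamps[i] - previous;
            previous = timestamps[i];
        }
        return deltas;
    }

    /** Restores the original timestamps by accumulating the deltas. */
    public static long[] decode(long[] deltas) {
        long[] timestamps = new long[deltas.length];
        long running = 0L;
        for (int i = 0; i < deltas.length; i++) {
            running += deltas[i];
            timestamps[i] = running;
        }
        return timestamps;
    }

    public static void main(String[] args) {
        // One sample per second, expressed in epoch milliseconds.
        long[] timestamps = {1700000000000L, 1700000001000L, 1700000002000L, 1700000003000L};
        long[] deltas = encode(timestamps);
        System.out.println(Arrays.toString(deltas)); // [1700000000000, 1000, 1000, 1000]
        System.out.println(Arrays.equals(decode(deltas), timestamps)); // true
    }
}
```

After the first entry, every stored number is small and repeated, which a subsequent bit-packing or run-length step can represent in far fewer bits than the raw 64-bit timestamps.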
= Efficient directory structure =
= High-throughput read and write =
IoTDB supports data access for millions of strongly connected low-power devices, as well as high-speed reads and writes for intelligent networked devices and for mixtures of both device types. It supports an ingestion rate of up to 30 million data points per second on a single node.{{Cite web |last=vogler |title=Automation Gateway with Apache IoTDB… {{!}} RocWorks |url=https://www.rocworks.at/wordpress/?p=1062 |access-date=2022-12-13 |language=en-US}}
= Rich query semantics =
= Easy to get started =
= Intense integration with open source ecosystem =
IoTDB integrates with analysis ecosystems such as Hadoop and Spark, and with the Grafana visualization tool.{{Cite web |title=Apache IoTDB Dashboard v0.13.1 |url=https://grafana.com/grafana/dashboards/16132-apache-iotdb-dashboard/ |access-date=2022-12-13 |website=Grafana Labs |language=en}}
Licensing
Apache IoTDB is released under the Apache License 2.0, a permissive free software license written by the Apache Software Foundation. The license allows users to modify and redistribute the code, provided that the redistributed work includes a copy of the license and the notices it requires.{{Cite web |title=Apache License, version 2.0 |url=https://www.apache.org/licenses/LICENSE-2.0 |access-date=18 November 2022 |website=The Apache Software Foundation}}