The proliferation of Internet of Things (IoT) devices and the increasing use of sensors to monitor industrial systems and collect data for scientific experiments have made the efficient storage and analysis of sensor-generated data a critical issue. Sensor-generated data are routinely represented as a sequence of values over time, commonly referred to as a time-series. Time-series can be finite or unbounded and are ordered by time. The volume of generated data is determined by the sampling rate of the sensor. Typically, we expect a small and regular sampling interval, resulting in large volumes of data at high velocity. Consequently, significant challenges have emerged with regard to storing and querying time-series data.
The unprecedented scale of sensor data generated today renders general-purpose Database Management Systems (DBMSs) inappropriate for time-series management. The core functionality of a typical DBMS provides services such as updates and transactions that cause unnecessary overhead when working with time-series. The limitations of DBMSs, the particular features of time-series, and the ever-increasing need for storing and analyzing data collected by sensors have led to the emergence of Time Series Management Systems (TSMSs) in both academic and industrial research.
We envisage the design and implementation of an innovative system especially designed for the management of time-series data. In correspondence with the challenges of storing and querying time-series, we will focus on the following objectives:
- Horizontal scalability: We will build our distributed storage layer on top of a distributed file-system or a distributed database that will partition the time-series data to numerous nodes and allow for horizontal scalability.
- Optimized storage: We will perform a rigorous study of compression techniques (lossless/lossy) and mathematical models that can approximate time-series segments (lossy). Then, we will capitalize on these findings by implementing a storage-optimization layer offering important savings and significantly reducing the total I/O operations of our TSMS.
- Stream processing: Our TSMS should always be available to take writes. We expect that the write rate will exceed the read rate by a couple of orders of magnitude. The ingested data should become available for querying shortly after they are produced. We also aim to provide online computation of aggregates.
- Indexing: We will design, implement and assess the impact of indices on query processing time. Indices on both time and value of time-series data will be considered.
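As an illustration of the online computation of aggregates mentioned above, the following is a minimal sketch that keeps count, mean, variance, minimum and maximum up to date in constant time per ingested point, using Welford's update. The `RunningStats` class and the sample readings are hypothetical and are not part of the DeLorean system.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Aggregates updated in O(1) per ingested point (hypothetical helper)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, value: float) -> None:
        # Welford's update: numerically stable running mean and variance.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def variance(self) -> float:
        # Population variance; reported as 0.0 for fewer than two points.
        return self.m2 / self.count if self.count > 1 else 0.0


# Made-up sensor readings, ingested one at a time.
stats = RunningStats()
for reading in [21.5, 21.7, 21.6, 22.0]:
    stats.update(reading)
```

Because each update touches only a handful of scalars, aggregates of this kind can be maintained on the write path without buffering the raw points.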
Paper accepted for publication in VLDB 2022
Our work entitled "Chimp: Efficient Lossless Floating Point Compression for Time Series Databases" has been accepted for publication in VLDB 2022, which will take place in Sydney, Australia!
3rd Research Workshop of the School of Information Sciences and Technology
A poster with preliminary results of our project was featured at the 3rd Research Workshop of the School of Information Sciences and Technology - AUEB. Our poster is available here: Poster
Call for prospective PhD student applications
In the context of the DeLorean project, we are seeking a prospective PhD student to work with our team. The call is open to high-potential students who wish to work on data management, data science and data engineering. The focus will be on timeseries management, and the student is expected to pursue some of the following research directions:
- Optimized storage through compression: We are interested in lossless, lossy and multivariate compression schemes, as well as approaches that combine symbolic representation with ML models.
- Window aggregations for TSMS: Aggregations are central operations in timeseries management. We will propose approaches that are suitable for contemporary timeseries representations to efficiently derive summaries and perform tasks such as incremental sliding window aggregation.
- ML-based TSMS knob tuning: We will investigate ML-based tuning methods that can outperform human DBAs by automatically tuning a TSMS for a variety of workloads.
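To make the idea of incremental sliding window aggregation concrete, here is a minimal sketch of a count-based sliding maximum that reuses work across overlapping windows by keeping a deque of candidate values. This is a classic textbook technique shown only for illustration, not an approach proposed by the project; the `sliding_max` function name is hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator


def sliding_max(values: Iterable[float], window: int) -> Iterator[float]:
    """Yield the maximum of each full window of `window` consecutive values."""
    if window <= 0:
        raise ValueError("window must be positive")
    # (index, value) pairs whose values are strictly decreasing front to back.
    candidates = deque()
    for i, v in enumerate(values):
        # Older candidates that are not larger than v can never be a maximum again.
        while candidates and candidates[-1][1] <= v:
            candidates.pop()
        candidates.append((i, v))
        # Evict the front candidate once it has slid out of the window.
        if candidates[0][0] <= i - window:
            candidates.popleft()
        if i >= window - 1:
            yield candidates[0][1]


maxima = list(sliding_max([1, 3, 2, 5, 4, 1], window=3))
```

Each value enters and leaves the deque at most once, so the amortized cost per point is constant regardless of the window size, in contrast to recomputing the maximum over every window from scratch.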
Panagiotis Liakos, Principal Investigator
Katia Papakonstantinopoulou, Advisory Board Member
Yannis Kotidis, Advisory Board Member
Xenophon Kitsios, PhD Candidate
Master Thesis Topics
Multivariate compression for Timeseries Databases
The student will perform a rigorous study of multivariate compression techniques and devise an approach that will be implemented as an extension of an open-source timeseries database, such as InfluxDB. We are interested in many data types (integer, floating point, string), and the approach should be fast enough for the high ingestion rates required in the context of timeseries databases.
Lossy compression for Timeseries Databases
Several approaches providing lossy compression for timeseries have been proposed, including wavelets, PCA, SAX, PMC-Mean and Swing. However, they are either too complex or offer low accuracy guarantees. The goal of the thesis is the design and implementation of a novel lossy compression approach tailored to the needs of timeseries databases.
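For readers unfamiliar with the existing approaches, the following is a simplified sketch in the spirit of PMC-Mean: consecutive points are grouped into a segment for as long as every point stays within a user-chosen error bound of the segment mean, and each segment is then stored as a single (length, mean) pair. The function name, the `epsilon` parameter and the sample data are assumptions made for illustration; this is not the novel approach the thesis aims to design.

```python
import math


def pmc_mean_segments(values, epsilon):
    """Return (length, mean) segments; every point lies within epsilon of its segment mean."""
    segments = []
    count, total, lo, hi = 0, 0.0, math.inf, -math.inf
    for v in values:
        n, s = count + 1, total + v
        new_lo, new_hi = min(lo, v), max(hi, v)
        mean = s / n
        if count > 0 and (new_hi - mean > epsilon or mean - new_lo > epsilon):
            # v does not fit: close the current segment and start a new one at v.
            segments.append((count, total / count))
            count, total, lo, hi = 1, v, v, v
        else:
            count, total, lo, hi = n, s, new_lo, new_hi
    if count > 0:
        segments.append((count, total / count))
    return segments


def reconstruct(segments):
    """Expand segments back into an approximate series."""
    return [mean for length, mean in segments for _ in range(length)]


series = [10.0, 10.2, 9.9, 15.0, 15.1]  # made-up readings
segments = pmc_mean_segments(series, epsilon=0.5)
approx = reconstruct(segments)
```

The sketch trades accuracy for space in the simplest possible way: the error bound holds for every individual point, but trends inside a segment are lost.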
In-memory Timeseries Management System
Design and implementation of an in-memory timeseries management system providing compression capabilities for IoT devices with low power consumption.
Sliding Window Aggregations for Timeseries Databases
The thesis will focus on proposing and implementing optimizations for sliding window aggregation queries as an extension of an open-source timeseries database, such as InfluxDB.
Timeseries Approximation through Sampling
Obtaining a sample of a relational database table is a well-known problem, and earlier approaches include techniques such as the creation of materialized sample views. The goal of this thesis is to design and implement techniques for approximating a timeseries through sampling in the context of a timeseries database management system.
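As one possible starting point, the sketch below draws a uniform random sample of fixed size from a timeseries in a single pass using reservoir sampling (Algorithm R), which suits unbounded streams because it never needs to know the series length in advance. The `reservoir_sample` function, the (timestamp, value) tuple layout and the synthetic data are assumptions made for illustration; the techniques to be developed in the thesis are not specified here.

```python
import random


def reservoir_sample(points, k, seed=None):
    """Return k points chosen uniformly at random from `points`, in time order."""
    rng = random.Random(seed)
    reservoir = []
    for i, point in enumerate(points):
        if i < k:
            # Fill the reservoir with the first k points.
            reservoir.append(point)
        else:
            # Keep the new point with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = point
    # Restore chronological order by timestamp.
    reservoir.sort(key=lambda p: p[0])
    return reservoir


# Synthetic (timestamp, value) pairs.
series = [(t, float(t % 7)) for t in range(1000)]
sample = reservoir_sample(series, k=10, seed=42)
```

A uniform sample ignores the temporal structure of the data, so approaches tailored to timeseries may prefer to weight recent points more heavily or to preserve local extrema.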
DeLorean has received funding from the Hellenic Foundation for Research and Innovation (HFRI) and the General Secretariat for Research and Innovation (GSRI), under grant agreement No 779. It will be hosted at the Department of Informatics of the Athens University of Economics and Business, Greece.
76 Patission Str., 104.34, Athens
+30 210 727 5165