How Cross-Path Filtering and Time Alignment Work in Apache IoTDB
In time-series databases like Apache IoTDB, understanding how queries align and filter data across different paths is crucial for writing accurate queries. A common point of confusion arises when a WHERE predicate on one timeseries path unexpectedly filters out aggregate calculations on another. Let's look at why this happens and how IoTDB handles cross-path filtering under the hood.
The Scenario
Consider a schema with two different paths under root.sg1.cross_filter: a state indicator (line7.run_state) and a temperature sensor (line8.temperature).
CREATE TIMESERIES root.sg1.cross_filter.line7.run_state WITH DATATYPE=TEXT, ENCODING=PLAIN;
CREATE TIMESERIES root.sg1.cross_filter.line8.temperature WITH DATATYPE=DOUBLE, ENCODING=PLAIN;
We insert the following sample data points:
INSERT INTO root.sg1.cross_filter.line7(timestamp, run_state) VALUES (1000, 'OK');
INSERT INTO root.sg1.cross_filter.line7(timestamp, run_state) VALUES (2000, 'DOWN');
INSERT INTO root.sg1.cross_filter.line7(timestamp, run_state) VALUES (3000, 'OK');
INSERT INTO root.sg1.cross_filter.line8(timestamp, temperature) VALUES (1000, 10.0);
INSERT INTO root.sg1.cross_filter.line8(timestamp, temperature) VALUES (2000, 100.0);
INSERT INTO root.sg1.cross_filter.line8(timestamp, temperature) VALUES (3000, 30.0);
INSERT INTO root.sg1.cross_filter.line8(timestamp, temperature) VALUES (4000, 40.0);
Now we run a query that calculates the average temperature only when the system state is 'OK':
SELECT avg(line8.temperature)
FROM root.sg1.cross_filter
WHERE line7.run_state = 'OK';
The result is 20.0, the average of 10.0 (t=1000) and 30.0 (t=3000). The 100.0 reading at t=2000 is excluded as expected, but so is the 40.0 reading at t=4000, where line7.run_state has no data point at all.
Why Does the WHERE Predicate Filter the Aggregate?
In Apache IoTDB's tree model, when you query multiple timeseries paths in a single statement, the database performs implicit time alignment (similar to a full outer join on the timestamp axis).
Here is how the query engine processes the data step-by-step:
1. Alignment on the Shared Time Axis
First, IoTDB aligns all involved paths by timestamp, filling in missing points with null:
| Timestamp | line7.run_state | line8.temperature |
|---|---|---|
| 1000 | 'OK' | 10.0 |
| 2000 | 'DOWN' | 100.0 |
| 3000 | 'OK' | 30.0 |
| 4000 | null | 40.0 |
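The alignment step can be modeled in a few lines of Python. This is only an illustrative sketch of the outer-join-on-timestamp idea, not IoTDB's actual implementation; the function name `align_by_time` and the dictionary layout are assumptions made for this example, and Python's `None` stands in for null.

```python
def align_by_time(series):
    """Outer-join several {timestamp: value} maps on the timestamp axis.

    Hypothetical helper for illustration; missing points become None.
    """
    # The shared time axis is the union of all timestamps seen in any series.
    timestamps = sorted(set().union(*(s.keys() for s in series.values())))
    return [
        {"time": t, **{name: s.get(t) for name, s in series.items()}}
        for t in timestamps
    ]


# The sample data from the scenario above.
run_state = {1000: "OK", 2000: "DOWN", 3000: "OK"}
temperature = {1000: 10.0, 2000: 100.0, 3000: 30.0, 4000: 40.0}

aligned = align_by_time({"run_state": run_state, "temperature": temperature})
for row in aligned:
    print(row)
```

The last row printed has `run_state` set to `None`, which corresponds to the null cell at t=4000 in the table.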
2. Applying the WHERE Filter
Next, the WHERE clause is evaluated for each aligned row. The condition is line7.run_state = 'OK':
- At t=1000: 'OK' = 'OK' is True. (Kept)
- At t=2000: 'DOWN' = 'OK' is False. (Filtered out)
- At t=3000: 'OK' = 'OK' is True. (Kept)
- At t=4000: null = 'OK' evaluates to Unknown/False. (Filtered out)
Because there is no data point for line7.run_state at t=4000, its value in the aligned row is null. In SQL three-valued logic, a comparison with null yields unknown rather than true, and WHERE keeps only rows whose predicate is true, so the row at t=4000 is excluded.
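The filtering rule can be sketched the same way. The snippet below is a simplified model of three-valued comparison, not IoTDB code: `sql_equals` is a hypothetical helper that returns `None` for "unknown", and the `aligned` list is the aligned table from step 1 written out by hand.

```python
def sql_equals(left, right):
    """Three-valued equality: True, False, or None (unknown) if a side is null."""
    if left is None or right is None:
        return None
    return left == right


# Aligned rows from step 1; None models a missing data point.
aligned = [
    {"time": 1000, "run_state": "OK", "temperature": 10.0},
    {"time": 2000, "run_state": "DOWN", "temperature": 100.0},
    {"time": 3000, "run_state": "OK", "temperature": 30.0},
    {"time": 4000, "run_state": None, "temperature": 40.0},
]

# A WHERE clause keeps a row only when the predicate is exactly True,
# so both False (t=2000) and unknown (t=4000) rows are dropped.
kept = [row for row in aligned if sql_equals(row["run_state"], "OK") is True]
kept_times = [row["time"] for row in kept]
print(kept_times)
```

Only the rows at t=1000 and t=3000 survive, which is why the 40.0 reading never reaches the aggregate.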
3. Calculating the Aggregate
Finally, the aggregation function avg() is applied only to the remaining rows (t=1000 and t=3000):
(10.0 + 30.0) / 2 = 20.0
Conclusion
In Apache IoTDB, the WHERE predicate filters the aligned rows along the shared time axis before the aggregate is computed. If a path referenced in the WHERE clause has no value at a given timestamp, its value in that row is treated as null, the comparison does not evaluate to true, and the row is excluded. This keeps the analysis time-correlated across different timeseries paths: an aggregate only sees timestamps at which the filter condition is known to hold.