Understanding Time Alignment and Aggregation in Apache IoTDB

In time-series databases like Apache IoTDB, writing accurate queries requires understanding how data from different paths is aligned and filtered. A common point of confusion arises when a WHERE predicate on one timeseries path unexpectedly changes the result of an aggregate computed over another path. Let's look at why this happens and how IoTDB handles cross-path filtering under the hood.

The Scenario

Consider a schema with two different paths under root.sg1.cross_filter: a state indicator (line7.run_state) and a temperature sensor (line8.temperature).

CREATE TIMESERIES root.sg1.cross_filter.line7.run_state WITH DATATYPE=TEXT, ENCODING=PLAIN;
CREATE TIMESERIES root.sg1.cross_filter.line8.temperature WITH DATATYPE=DOUBLE, ENCODING=PLAIN;

We insert the following sample data points:

INSERT INTO root.sg1.cross_filter.line7(timestamp, run_state) VALUES (1000, 'OK');
INSERT INTO root.sg1.cross_filter.line7(timestamp, run_state) VALUES (2000, 'DOWN');
INSERT INTO root.sg1.cross_filter.line7(timestamp, run_state) VALUES (3000, 'OK');

INSERT INTO root.sg1.cross_filter.line8(timestamp, temperature) VALUES (1000, 10.0);
INSERT INTO root.sg1.cross_filter.line8(timestamp, temperature) VALUES (2000, 100.0);
INSERT INTO root.sg1.cross_filter.line8(timestamp, temperature) VALUES (3000, 30.0);
INSERT INTO root.sg1.cross_filter.line8(timestamp, temperature) VALUES (4000, 40.0);

If we run a query to calculate the average temperature only when the system state is 'OK':

SELECT avg(line8.temperature)
FROM root.sg1.cross_filter
WHERE line7.run_state = 'OK';

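The result returned is 20.0, the average of 10.0 (t=1000) and 30.0 (t=3000). The value 100.0 at t=2000 is excluded because the state there is 'DOWN'. More surprisingly, the value 40.0 at t=4000 is also excluded, even though the predicate never mentions line8.temperature.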

Why Does the WHERE Predicate Filter the Aggregate?

In Apache IoTDB's tree model, when you query multiple timeseries paths in a single statement, the database performs implicit time alignment (similar to a full outer join on the timestamp axis).

Here is how the query engine processes the data step-by-step:

1. Alignment on the Shared Time Axis

First, IoTDB aligns all involved paths by timestamp, filling in missing points with null:

Timestamp   line7.run_state   line8.temperature
1000        'OK'              10.0
2000        'DOWN'            100.0
3000        'OK'              30.0
4000        null              40.0
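The alignment step can be sketched in Python as an outer join on the timestamp. This is only an illustrative model of the behavior described above, not IoTDB's actual implementation; the dictionaries and the align helper are hypothetical names introduced for the example.

```python
# Hypothetical in-memory copies of the two series from the example.
run_state = {1000: "OK", 2000: "DOWN", 3000: "OK"}
temperature = {1000: 10.0, 2000: 100.0, 3000: 30.0, 4000: 40.0}


def align(*series):
    """Outer-join the series on timestamp; a missing point becomes None."""
    # Union of all timestamps that appear in any of the series.
    timestamps = sorted(set().union(*series))
    return [(ts, *(s.get(ts) for s in series)) for ts in timestamps]


aligned_rows = align(run_state, temperature)
for row in aligned_rows:
    print(row)
# The last row printed is (4000, None, 40.0): run_state has no point there.
```

Because the join is on the union of timestamps, t=4000 still produces a row even though only one of the two paths has data at that time.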

2. Applying the WHERE Filter

Next, the WHERE clause is evaluated for each aligned row. The condition is line7.run_state = 'OK':

  • At t=1000: 'OK' = 'OK' is True. (Kept)
  • At t=2000: 'DOWN' = 'OK' is False. (Filtered out)
  • At t=3000: 'OK' = 'OK' is True. (Kept)
  • At t=4000: null = 'OK' evaluates to Unknown/False. (Filtered out)

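Because there is no data for line7.run_state at t=4000, its value in the aligned row is null. In SQL three-valued logic, any comparison with null yields unknown rather than true or false, and a WHERE clause keeps only rows for which the predicate evaluates to true. The row at t=4000 is therefore excluded.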
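The filtering rule can be modeled with a small Python sketch of three-valued equality. It assumes the null-comparison semantics described above; sql_equals and where_keeps are hypothetical helper names, and None stands in for both null and unknown.

```python
from typing import Optional

# Aligned rows from step 1: (timestamp, run_state, temperature).
aligned_rows = [
    (1000, "OK", 10.0),
    (2000, "DOWN", 100.0),
    (3000, "OK", 30.0),
    (4000, None, 40.0),
]


def sql_equals(left, right) -> Optional[bool]:
    """SQL-style equality: the result is unknown (None) if either side is null."""
    if left is None or right is None:
        return None
    return left == right


def where_keeps(predicate_result: Optional[bool]) -> bool:
    """A WHERE clause keeps a row only when the predicate is strictly true."""
    return predicate_result is True


kept_timestamps = [
    ts for ts, state, _ in aligned_rows if where_keeps(sql_equals(state, "OK"))
]
print(kept_timestamps)  # [1000, 3000]
```

Both a false result (t=2000) and an unknown result (t=4000) lead to the same outcome: the row is dropped.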

3. Calculating the Aggregate

Finally, the aggregation function avg() is applied only to the remaining rows (t=1000 and t=3000):

(10.0 + 30.0) / 2 = 20.0
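As a check on the arithmetic, the following Python sketch applies the filter and the average to the aligned rows, and compares the result with the average that would be obtained if t=4000 were also included. It is a simplified model under the assumptions above; avg_temperature_where_state is a hypothetical helper, not an IoTDB function.

```python
from statistics import mean

# Aligned rows: (timestamp, run_state, temperature); None models null.
aligned_rows = [
    (1000, "OK", 10.0),
    (2000, "DOWN", 100.0),
    (3000, "OK", 30.0),
    (4000, None, 40.0),
]


def avg_temperature_where_state(rows, wanted_state):
    """Average temperature over rows whose state equals wanted_state.

    Rows with a null state never match; null temperatures are skipped.
    """
    kept = [
        temp
        for _, state, temp in rows
        if state is not None and state == wanted_state and temp is not None
    ]
    return mean(kept) if kept else None


filtered_avg = avg_temperature_where_state(aligned_rows, "OK")
print(filtered_avg)  # 20.0

# For comparison only: the average if the t=4000 row were wrongly kept.
with_4000 = mean([10.0, 30.0, 40.0])
print(round(with_4000, 2))  # 26.67
```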

Conclusion

In Apache IoTDB, the WHERE predicate is evaluated against rows aligned on the shared time axis, before the aggregate is computed. If a path referenced in the WHERE clause has no value at a given timestamp, its value at that timestamp is treated as null, the comparison does not evaluate to true, and the row is excluded. Aggregates over other paths therefore see only the timestamps at which the predicate holds, which keeps the analysis time-correlated across different timeseries paths.