Is your feature request related to a problem or challenge?
Summary
Support collect_list and collect_set as window functions in DataFusion.
These are commonly used in Spark and other query engines to collect values within a window frame and enable use cases such as rolling lists, session analysis, and sequence-based analytics.
Example
SELECT
    user_id,
    ts,
    collect_list(event) OVER (
        PARTITION BY user_id
        ORDER BY ts
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS events
FROM t;

SELECT
    user_id,
    ts,
    collect_set(event) OVER (
        PARTITION BY user_id
        ORDER BY ts
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS unique_events
FROM t;
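To make the expected output concrete, here is a minimal Python sketch of what the two queries above would return for a small input. It is a reference model only, not DataFusion code: the function name `running_collect` and the sample rows are hypothetical, null handling is left out, and keeping `collect_set` in first-seen order is an assumption made for readability (engines generally do not guarantee an element order for it).

```python
from collections import defaultdict


def running_collect(rows):
    """Model collect_list / collect_set over
    PARTITION BY user_id ORDER BY ts
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

    rows: iterable of (user_id, ts, event) tuples.
    Returns a list of (user_id, ts, events, unique_events) tuples.
    """
    # PARTITION BY user_id
    partitions = defaultdict(list)
    for user_id, ts, event in rows:
        partitions[user_id].append((ts, event))

    out = []
    for user_id in sorted(partitions):
        events = []         # running collect_list
        unique_events = []  # running collect_set (first-seen order: an assumption)
        # ORDER BY ts (stable sort, so ties keep their input order)
        for ts, event in sorted(partitions[user_id], key=lambda r: r[0]):
            events.append(event)
            if event not in unique_events:
                unique_events.append(event)
            # Copy the lists so each output row keeps its own snapshot.
            out.append((user_id, ts, list(events), list(unique_events)))
    return out


sample = [
    ("u1", 1, "login"),
    ("u1", 2, "click"),
    ("u1", 3, "click"),
    ("u2", 1, "login"),
]
for row in running_collect(sample):
    print(row)
```

Each output row carries the list built from all rows of the same user up to and including the current one, which is the "rolling list" behavior described in the summary.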
Motivation
Supporting these functions would improve Spark compatibility and align DataFusion with other query engines that support aggregate window functions, including DuckDB (list), Trino/Presto (array_agg), PostgreSQL (array_agg), BigQuery (ARRAY_AGG), and Snowflake.
Acceptance Criteria
- Support collect_list(...) OVER (...)
- Support collect_set(...) OVER (...)
- Support standard window frames (ROWS and RANGE) where applicable
- Add SQL and DataFrame tests covering common window specifications
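One case the tests may want to cover is the difference between ROWS and RANGE frames when the ORDER BY key has ties. The sketch below models standard SQL frame semantics for a frame ending at CURRENT ROW within a single partition; it is illustrative Python rather than DataFusion code, and the function name `collect_list_running` and the sample data are hypothetical.

```python
def collect_list_running(rows, mode="ROWS"):
    """Model collect_list over ORDER BY ts with the frame
    <mode> BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

    rows: list of (ts, event) tuples from a single partition.
    Returns one list per input row, in ts order.
    """
    rows = sorted(rows, key=lambda r: r[0])
    out = []
    for i, (ts, _) in enumerate(rows):
        end = i + 1  # ROWS: the frame ends at the current physical row
        if mode == "RANGE":
            # RANGE: the frame also includes every peer row with the same ts
            while end < len(rows) and rows[end][0] == ts:
                end += 1
        elif mode != "ROWS":
            raise ValueError(f"unsupported frame mode: {mode}")
        out.append([event for _, event in rows[:end]])
    return out


data = [(1, "a"), (2, "b"), (2, "c"), (3, "d")]
print(collect_list_running(data, "ROWS"))
print(collect_list_running(data, "RANGE"))
```

With ROWS, the two rows at `ts = 2` see different lists; with RANGE, both see the same list because they are peers under the ordering.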
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response