Fix silent error-swallowing in Firebolt scripts; drop bogus `t3a.small` results by thevar1able · Pull Request #970 · ClickHouse/ClickBench

thevar1able · 2026-07-01T17:55:03Z

Resolves #945

Summary

firebolt/query and firebolt/load (and their -parquet/-parquet-partitioned
copies) checked for failure with jq -e '.errors' ..., but jq -e also exits
non-zero on malformed or empty input — which the old code treated identically
to "no .errors key present," i.e. success. A crashed/overloaded engine that
comes back with an empty 200 body or a non-JSON body could silently be
recorded as a fast, error-free query with no timing on stderr, or (for load)
surface only as an opaque curl: (22) ... error 500 with the actual reason
discarded.

query: now checks HTTP status, non-empty body, and JSON-validity
explicitly before checking .errors, and validates .statistics.elapsed
is actually numeric before declaring success.
load: same idea via a run_sql helper — prints the real response body
on any non-200/invalid-JSON/.errors response instead of swallowing it.

Validated that both cases fail to load dataset on t3a.small:

Load time: 0.185
bench: data-size after load is '211108' (<5 GB)
bench: ClickBench's hits dataset doesn't fit in <5 GB on any
bench: system in the catalog; treating this as a partial load
bench: (likely an OOM kill mid-COPY).
Disk usage after: 35281371136
System: firebolt-parquet
Machine: t3a.small
Total time: 233

Load time: 0.986
bench: data-size after load is '170112' (<5 GB)
bench: ClickBench's hits dataset doesn't fit in <5 GB on any
bench: system in the catalog; treating this as a partial load
bench: (likely an OOM kill mid-COPY).
Disk usage after: 35239878656
System: firebolt-parquet-partitioned
Machine: t3a.small
Total time: 144

rschu1ze

I added updated measurements for firebolt - @thevar1able triggered them in the automation yesterday. These look reasonable, i.e. the updated error handling at least didn't break measurements.

BUT: The the automatic run for firebolt did not run on t3a.small, and it also did not run for firebolt-parquet or on firebolt-parquet-partitioned, so we don't have proof yet that the new error handling really works.

@thevar1able Could you start these extra runs^^ and push results to this PR? Besides that, LGTM.

thevar1able · 2026-07-02T13:59:33Z

@rschu1ze none of those jobs can run on t3a.small, see PR description. They fail on data ingest.

Also the measurements from other tasks are probably not so meaningful, I got them before learning that I need to run the script with branch= var, i.e. they used main branch, without error handling fix.

rschu1ze · 2026-07-02T14:36:19Z

@rschu1ze none of those jobs can run on t3a.small, see PR description. They fail on data ingest.

Ah, I needed to re-read the PR msg. Also, I found the failing run on t3a.small (here).

Also the measurements from other tasks are probably not so meaningful, I got them before learning that I need to run the script with branch= var, i.e. they used main branch, without error handling fix.

That's still fine. New results will be generated on the next run of the automation.

thevar1able added 3 commits July 1, 2026 19:53

Check for jq exit code

b9062e0

Check for exit code during data load

12bacac

Remove invalid run records

b147d18

thevar1able changed the title ~~Fix silent error-swallowing in Firebolt scripts; drop bogus t3a.small results~~ Fix silent error-swallowing in Firebolt scripts; drop bogus t3a.small results Jul 1, 2026

thevar1able requested a review from rschu1ze July 1, 2026 17:55

Update measurements

e6c488e

rschu1ze approved these changes Jul 2, 2026

View reviewed changes

rschu1ze merged commit b21efd5 into main Jul 2, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix silent error-swallowing in Firebolt scripts; drop bogus `t3a.small` results#970

Fix silent error-swallowing in Firebolt scripts; drop bogus `t3a.small` results#970
rschu1ze merged 4 commits into
mainfrom
fix-firebolt-error-handling

thevar1able commented Jul 1, 2026

Uh oh!

rschu1ze left a comment

Uh oh!

thevar1able commented Jul 2, 2026 •

edited

Loading

Uh oh!

rschu1ze commented Jul 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

thevar1able commented Jul 1, 2026

Summary

Uh oh!

rschu1ze left a comment

Choose a reason for hiding this comment

Uh oh!

thevar1able commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rschu1ze commented Jul 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

thevar1able commented Jul 2, 2026 •

edited

Loading