Skip to content

Fix silent error-swallowing in Firebolt scripts; drop bogus t3a.small results#970

Merged
rschu1ze merged 4 commits into
mainfrom
fix-firebolt-error-handling
Jul 2, 2026
Merged

Fix silent error-swallowing in Firebolt scripts; drop bogus t3a.small results#970
rschu1ze merged 4 commits into
mainfrom
fix-firebolt-error-handling

Conversation

@thevar1able

Copy link
Copy Markdown
Member

Resolves #945

Summary

firebolt/query and firebolt/load (and their -parquet/-parquet-partitioned
copies) checked for failure with jq -e '.errors' ..., but jq -e also exits
non-zero on malformed or empty input — which the old code treated identically
to "no .errors key present," i.e. success. A crashed/overloaded engine that
comes back with an empty 200 body or a non-JSON body could silently be
recorded as a fast, error-free query with no timing on stderr, or (for load)
surface only as an opaque curl: (22) ... error 500 with the actual reason
discarded.

  • query: now checks HTTP status, non-empty body, and JSON-validity
    explicitly before checking .errors, and validates .statistics.elapsed
    is actually numeric before declaring success.
  • load: same idea via a run_sql helper — prints the real response body
    on any non-200/invalid-JSON/.errors response instead of swallowing it.

Validated that both cases fail to load dataset on t3a.small:

Load time: 0.185
bench: data-size after load is '211108' (<5 GB)
bench: ClickBench's hits dataset doesn't fit in <5 GB on any
bench: system in the catalog; treating this as a partial load
bench: (likely an OOM kill mid-COPY).
Disk usage after: 35281371136
System: firebolt-parquet
Machine: t3a.small
Total time: 233
Load time: 0.986
bench: data-size after load is '170112' (<5 GB)
bench: ClickBench's hits dataset doesn't fit in <5 GB on any
bench: system in the catalog; treating this as a partial load
bench: (likely an OOM kill mid-COPY).
Disk usage after: 35239878656
System: firebolt-parquet-partitioned
Machine: t3a.small
Total time: 144

@thevar1able thevar1able changed the title Fix silent error-swallowing in Firebolt scripts; drop bogus t3a.small results Fix silent error-swallowing in Firebolt scripts; drop bogus t3a.small results Jul 1, 2026
@thevar1able thevar1able requested a review from rschu1ze July 1, 2026 17:55

@rschu1ze rschu1ze left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added updated measurements for firebolt - @thevar1able triggered them in the automation yesterday. These look reasonable, i.e. the updated error handling at least didn't break measurements.

BUT: The the automatic run for firebolt did not run on t3a.small, and it also did not run for firebolt-parquet or on firebolt-parquet-partitioned, so we don't have proof yet that the new error handling really works.

@thevar1able Could you start these extra runs^^ and push results to this PR? Besides that, LGTM.

@thevar1able

thevar1able commented Jul 2, 2026

Copy link
Copy Markdown
Member Author

@rschu1ze none of those jobs can run on t3a.small, see PR description. They fail on data ingest.

Also the measurements from other tasks are probably not so meaningful, I got them before learning that I need to run the script with branch= var, i.e. they used main branch, without error handling fix.

@rschu1ze

rschu1ze commented Jul 2, 2026

Copy link
Copy Markdown
Member

@rschu1ze none of those jobs can run on t3a.small, see PR description. They fail on data ingest.

Ah, I needed to re-read the PR msg. Also, I found the failing run on t3a.small (here).

Also the measurements from other tasks are probably not so meaningful, I got them before learning that I need to run the script with branch= var, i.e. they used main branch, without error handling fix.

That's still fine. New results will be generated on the next run of the automation.

@rschu1ze rschu1ze merged commit b21efd5 into main Jul 2, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Numbers that are unfortunately too good to be true

2 participants