Versions benchmark rework by alexey-milovidov · Pull Request #968 · ClickHouse/ClickBench

alexey-milovidov · 2026-07-01T06:39:07Z

No description provided.

Rebuild the ClickHouse Versions Benchmark infrastructure from scratch around Docker images so every historical and current version can be run identically, replacing the old apt-based scripts.

- list-versions.sh: select versions from the authoritative version_date.tsv (all 1.1.x + latest patch per YY.MM, 151 versions), resolve each to a yandex/clickhouse image, package, or unavailable (image-aware, handles 3- vs 4-component tag mismatches).
- prepare-data/: build canonical Native data files for hits, SSB (SF100), mgbench (logs1/2/3) and NYC taxi using only oldest-compatible types (Nullable kept only where the queries need IS NULL); stored zstd-6 and streamed via `zstd -dc | clickhouse-client` at load time.
- create/: per-version DDL (legacy MergeTree(date,(key),8192) for the earliest 1.1.x, modern PARTITION BY/ORDER BY otherwise) with column schemas under create/schema/.
- run-version.sh / run-all.sh: provider abstraction (image + package-in-ubuntu fallback), IPv4 listen override and a matching-version sidecar client image to repair the oldest server images (back to 1.1.54019), plain INSERT ... FORMAT Native loading, 75-query set timed per dataset.

Validated full-scale on 1.1.54019 (oldest) and 1.1.54378; data files are gitignored.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add build-from-source/ to compile and run the ClickHouse versions that were never published as a Docker image or package: the bare-number early tags 53973..54011 and the 1.1.x releases 54165/54318/54335/54336/54358/54362/54370.

- Dockerfile.ubuntu1604: build a tag in its contemporary environment (Ubuntu 16.04) into a runnable clickhouse-built:<v> image. Handles the era quirks: compiler escalates with date (gcc-5 -> 6 -> 7, the later two from the ubuntu-toolchain-r PPA via ARG GCC), strip the hardcoded -Werror, tolerant submodule init (contrib/zookeeper's upstream is gone -> cmake falls back to system libzookeeper-mt-dev), IPv4 listen, a clickhouse multi-call shim and the pre-created data dirs the 2016 server needs.
- build.sh / build-all.sh: build one or many (JOBS concurrent; a single make -j$(nproc) doesn't saturate the cores on these small codebases).
- versions.txt: the build list with tag, date and required GCC per version.
- list-versions.sh: route these versions to their clickhouse-built:<v> image and order all 189 versions chronologically; nothing is "unavailable" anymore.
- run-all.sh: load PARALLEL versions concurrently, then benchmark sequentially.
- run-version.sh: LOAD_DATASETS lets a run skip a dataset's load (e.g. the huge taxi table) while its queries still run.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

create.sh parsed bare build-number versions (e.g. "53982", the pre-1.1 early-release tags) as major=53982 >= 18 and emitted modern PARTITION BY / ORDER BY syntax, which the 2016 servers reject — so every table create failed and those versions produced all-null results. Treat a bare numeric version as an early build (custom partitioning landed at build 54310; all bare tags predate it), so they correctly get the legacy MergeTree(date,(key),8192) engine. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
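
The version check described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual create.sh: the function name `engine_style` is hypothetical, and treating 1.1.x releases below build 54310 as legacy is an assumption inferred from the note that custom partitioning landed at that build.

```shell
#!/bin/sh
# Hypothetical helper (not taken from create.sh): decide which MergeTree DDL
# style a version needs. Prints "legacy" or "modern".
engine_style() {
    case "$1" in
        *[!0-9.]*|'')
            # Anything that is not made of digits and dots is rejected.
            echo "unknown"; return 1 ;;
        *.*)
            # Dotted version: 1.1.54378, 20.4.9.110, ...
            major=${1%%.*}
            if [ "$major" -eq 1 ]; then
                build=${1##*.}   # last component of 1.1.<build>
                # Assumption: custom partitioning is usable from build 54310 on.
                if [ "$build" -ge 54310 ]; then echo "modern"; else echo "legacy"; fi
            else
                echo "modern"
            fi ;;
        *)
            # Bare build number such as 53982: always an early build.
            echo "legacy" ;;
    esac
}
```

The point of the ordering is that a bare number never reaches the "major version" comparison, which is where the original bug came from.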

Mirror the main ClickBench cloud flow for the Versions Benchmark: benchmark one version per fresh VM and send the result to the sink. - cloud-init.sh.in: install Docker, download the prepared Native files from s3://clickhouse-public-datasets/versions-benchmark/, build the image from source when the version has none (clickhouse-built:* via build-from-source), run run-version.sh, POST the result JSON (enriched with machine + kind) and the log to sink.data on play.clickhouse.com, then terminate. - run-benchmark.sh: resolve a version's image/tag/gcc and launch a VM (terminate-on-shutdown, capacity-retry), as the main launcher does. - run-all-benchmarks.sh: one VM per runnable version. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Match the main ClickBench download style (resumable, giga-scale progress). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

If clickhouse dies during a query (e.g. OOM-killed), the container exits but its data layer survives. Detect the dead server (SELECT 1 fails), revive it with docker start (relaunch the daemon for the package provider), and retry the query up to CRASH_RETRIES (default 2). This keeps one heavy query from nulling out every subsequent query for that version. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
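
A rough sketch of the revive-and-retry loop, under stated assumptions: the container name, the helper names `server_alive`, `revive_server` and `run_with_revive`, and the exact client invocation are illustrative and may differ from run-version.sh (the oldest images may need a different client entry point).

```shell
#!/bin/sh
CRASH_RETRIES=${CRASH_RETRIES:-2}          # default named in the commit message
CONTAINER=${CONTAINER:-clickhouse-bench}   # hypothetical container name

# Liveness probe: SELECT 1 must succeed. stdin is /dev/null so the probe
# cannot consume input that belongs to the caller.
server_alive() {
    docker exec "$CONTAINER" clickhouse-client --query "SELECT 1" \
        </dev/null >/dev/null 2>&1
}

# Bring a stopped container back; its data layer survived the crash.
revive_server() {
    docker start "$CONTAINER" >/dev/null 2>&1
}

# Run "$@". If it fails while the server is still alive, report the failure.
# If the server is dead, revive it and retry, at most CRASH_RETRIES times.
run_with_revive() {
    attempt=0
    while :; do
        "$@" && return 0
        server_alive && return 1
        [ "$attempt" -ge "$CRASH_RETRIES" ] && return 1
        attempt=$((attempt + 1))
        revive_server
    done
}
```

With the default of 2, a query is attempted at most three times before it is given up on, so one crash does not null every later query.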

e.g. '26.6' resolves to '26.6.1.1193' (one patch per YY.MM is kept), and the launcher canonicalises to the full version. Exact versions and bare tags still match directly; an ambiguous prefix lists the candidates. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

A prefix now picks the newest matching version instead of erroring on ambiguity: 24 -> 24.12.x, 1.1 -> the latest 1.1.x, 26.6 -> 26.6.1.1193. Exact versions and bare tags still match directly. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
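
One way to express "exact match first, otherwise the newest version under the prefix" is sketched below. The function name `resolve_version` is hypothetical, the version list is assumed to arrive one per line on stdin, and the ordering relies on `sort -V` (a GNU coreutils extension), which may not be how the launcher actually does it.

```shell
#!/bin/sh
# Hypothetical resolver: reads candidate versions (one per line) on stdin and
# prints the version selected by $1. Returns non-zero when nothing matches.
resolve_version() {
    list=$(cat)
    # Exact versions and bare tags match directly.
    if printf '%s\n' "$list" | grep -Fxq -- "$1"; then
        printf '%s\n' "$1"
        return 0
    fi
    # Otherwise treat $1 as a dotted prefix ("24" matches 24.x, never 240.x)
    # and keep the newest match in version order.
    match=$(printf '%s\n' "$list" |
        awk -v p="$1." 'index($0, p) == 1' |
        sort -V | tail -n 1)
    [ -n "$match" ] && printf '%s\n' "$match"
}
```

A call such as `list-versions.sh | resolve_version 26.6` would be the intended shape, assuming the listing prints one version per line.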

Builds an EBS volume holding the prepared Native files (sized just-enough), snapshots it (labelled versions-data, tagged Name=clickbench-versions-data) and deletes the working volume. Standalone and not wired into the launcher: a snapshot-backed volume lazy-loads from S3, so for one-shot VMs it is not faster than the plain S3 download unless Fast Snapshot Restore or volume reuse is used. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

A snapshot-backed volume lazy-loads from S3, so it isn't faster than the plain S3 download for one-shot VMs; the snapshot approach is not used. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…loop stdin

- load_data now echoes each CREATE (with DDL), the INSERT ... FORMAT Native and source file, and "loaded <table>: N rows in Ns", so the cloud-init log shows what's happening during ingest.
- Client invocations set HOME=/tmp (old images' clickhouse user has HOME /nonexistent -> history-file error) and TZ=UTC, and the sidecar client mounts the host /usr/share/zoneinfo (some old client images ship no tzdata and fail at startup with "Could not determine local time zone").
- Fix a stdin-drain regression from the crash-retry: the per-query `docker exec/run -i` client (and the SELECT 1 liveness probe) consumed the query file the benchmark loop reads on stdin, truncating each version to ~60/75 queries. Read queries on FD 3 and give the probe </dev/null.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
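
The stdin-drain problem and its fix can be reproduced in isolation. In this sketch `greedy_client` is a hypothetical stand-in for a `docker exec -i` client: like any child started with an open stdin, it reads whatever its stdin offers. The function names are illustrative and not taken from run-version.sh.

```shell
#!/bin/sh
# Hypothetical stand-in for a client started with stdin attached.
greedy_client() { cat >/dev/null; }

# Buggy shape: the loop and the client share stdin, so the first client call
# swallows the rest of the query file.
count_buggy() {
    n=0
    while IFS= read -r query; do
        greedy_client "$query"
        n=$((n + 1))
    done <"$1"
    echo "$n"
}

# Fixed shape: queries are read on FD 3, and the liveness probe gets /dev/null.
count_fixed() {
    n=0
    while IFS= read -r query <&3; do
        greedy_client "$query"               # inherits the script's stdin, not the query file
        greedy_client "SELECT 1" </dev/null  # probe: detached from any input
        n=$((n + 1))
    done 3<"$1"
    echo "$n"
}
```

Given a three-line query file, `count_buggy file` stops after the first query, while `count_fixed file </dev/null` visits all three. The redirect on the call matters when trying this interactively, because the stand-in client would otherwise wait on the terminal.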

Echo 'qN [dataset]: [t1, t2, t3]' to the log as each query finishes, and cat results/<version>.json at the end so the full result is visible in the run output / cloud-init log (and thus received via the sink), not just written to a file. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Pipe the compressed file through pv before zstd -> INSERT, so loads report a periodic progress bar (percentage, rate, ETA) based on the known file size. Falls back to cat when pv is absent; pv added to the cloud-init apt install. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
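
A minimal sketch of the pv-with-fallback idea, assuming the helper name `stream_with_progress` (hypothetical) and illustrative file and table names in the usage comment; the real loader may pass extra pv options.

```shell
#!/bin/sh
# Stream a file to stdout. When pv is installed it reports progress on stderr
# (it knows the file size, so it can show percentage and ETA); otherwise
# plain cat is used and the data is identical.
stream_with_progress() {
    if command -v pv >/dev/null 2>&1; then
        pv "$1"
    else
        cat "$1"
    fi
}

# Hypothetical load step (file and table names are illustrative):
#   stream_with_progress hits.native.zst | zstd -dc |
#       clickhouse-client --query "INSERT INTO hits FORMAT Native"
```

Because pv sits before `zstd -dc`, the progress is measured against the compressed size, which is the only size known up front.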

Fetch the actual git commit date for every from-source version (the bare tags 53973..54011 had bogus 2016-01-01 fallbacks from an earlier rate-limited fetch) and record it in versions.txt; list-versions.sh now reports that commit date for built versions instead of the version_date.tsv release date. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Print 'table <TAB> bytes' from system.parts (database 'default'), falling back for old versions without it to du -sLb on the data dir (/var/lib/clickhouse or /opt/clickhouse), following symlinks. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Cold-cache query reads and ingest are disk-bound, so use gp3 (default 1000 MB/s / 16000 IOPS, both overridable via throughput=/iops=) instead of gp2 whose throughput is tied to size. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Build the list of dataset files and fetch them concurrently (xargs -P 8 wget --continue --progress=dot:giga) instead of one at a time. Missing files (e.g. ssb/taxi not yet uploaded) fail their own wget and are skipped. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Use aria2c to split each file into parallel byte-range segments (-x16 -s16) and run several files at once (-j4), so the huge taxi file isn't one slow stream and small files don't wait behind it. Falls back to parallel wget if aria2c is absent; aria2 added to the cloud-init install. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

aria2 -j (multiple files at once) cross-contaminated per-file sizes (it used hits's length for ssb's byte ranges), so only hits downloaded and ssb/mgbench were aborted -> skipped at load time. Run one aria2c per file via xargs -P instead: each still uses 16 parallel segments, files download concurrently, and --allow-overwrite re-fetches a non-resumable pre-existing file rather than 416. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Expand the Versions Benchmark to 9 datasets / 344 queries, each loaded into its own database so same-named tables never collide (e.g. TPC-H and TPC-DS both have `customer`): - TPC-H (SF40) and TPC-DS (SF32): official schemas/queries from the ClickHouse repo, Decimal->Float64, NULL->type defaults, synth_date for the legacy engine. - Coffee Shop (fact_sales_500m, minus the unused order_line_id column) from the published Iceberg tables. - ontime (12 used columns) and UK price-paid, from the docs' saved copies. - Join Order Benchmark (21 IMDB tables, 113 queries) with a CSV re-encoder. - Narrow taxi to the 5 columns its queries use. Also: per-dataset databases in run-version.sh/create.sh; default 6 tries (1 cold + 5 hot); dataset-qualified column schemas. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… timeout

- run-version.sh builds clickhouse-built:* images on demand (ensure_built_image) when absent, using the recipe from versions.txt (build.sh) or monthly.tsv (Dockerfile.reconstruct); nothing is pulled from a registry.
- Load all datasets in parallel (one background job per dataset / database).
- Per-query timeout (QUERY_TIMEOUT, default 100s): a query that exceeds it, or crashes the server, records null and skips its remaining tries (the server is revived after a crash so later queries still run).
- ontime: sort the dump (Year, Month, FlightDate, ...) so INSERT blocks are date-contiguous and don't exceed max_partitions_per_insert_block (~450 months).
- Reconstruct the build system for pre-2016-03 snapshots that lack one: Dockerfile.reconstruct + reconstruct.sh transplant the 2016-03 donor's build system + contrib, glob renamed sources, stub QuickLZ/MongoDB, generate re2_st, add an isnan shim; build-monthly.sh sweeps monthly.tsv. Strip the never-public add_subdirectory(private) from the 2016-06..08 tags in Dockerfile.ubuntu1604.
- cloud-init: install docker-buildx; drop the now-redundant explicit build step.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
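
The per-query timeout rule might look roughly like this. It is a sketch under assumptions: `run_tries` and `TRIES` are hypothetical names, timing uses GNU `date +%s.%N`, the command passed in must be an external program (coreutils `timeout` cannot run shell functions), and filling the skipped tries with null is one reading of "skips its remaining tries".

```shell
#!/bin/sh
QUERY_TIMEOUT=${QUERY_TIMEOUT:-100}   # seconds; default from the commit message
TRIES=${TRIES:-6}                     # assumed name; 1 cold + 5 hot tries

# Run "$@" up to TRIES times and print a JSON-style array of wall-clock times.
# After the first timeout or failure, the remaining tries are not executed
# and are reported as null.
run_tries() {
    out="" failed=0 i=1
    while [ "$i" -le "$TRIES" ]; do
        t=null
        if [ "$failed" -eq 0 ]; then
            start=$(date +%s.%N)
            if timeout "$QUERY_TIMEOUT" "$@" </dev/null >/dev/null 2>&1; then
                end=$(date +%s.%N)
                t=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')
            else
                failed=1
            fi
        fi
        out="${out:+$out, }$t"
        i=$((i + 1))
    done
    printf '[%s]\n' "$out"
}
```

Skipping the remaining tries is what keeps a single pathological query from costing six full timeouts per version.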

…2 builds - run-benchmark.sh: default volume 500 -> 1000 GB; parallel loading of all datasets peaks disk usage above the on-disk total (NOT_ENOUGH_SPACE otherwise). - reconstruct.sh: build the pre-2016-02 era with the old libstdc++ ABI (_GLIBCXX_USE_CXX11_ABI=0) — that era used the refcounted (COW) std::string, which the struct sizing assumes (Field's DBMS_TOTAL_FIELD_SIZE=32). Also: generic prune of donor-listed sources absent in an older target (encoding-safe), disable utils/, strip add_subdirectory(private), vendor Poco/Ext/ScopedTry.h. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

An aborted/incomplete INSERT (crash, OOM, disk full, interrupted stream) can leave a partially-loaded table. Drop it so the dataset's queries report null instead of timing against incomplete data. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add two subobjects to results/<version>.json: - "load_time": {dataset: sum of its tables' load times in seconds} — accumulated per table during the (parallel, possibly separate) load phase into a stats file and summed per dataset at bench time. - "data_size": {dataset: on-disk bytes} — per-database sum(bytes_on_disk) from system.parts (each dataset is its own database), with a data-directory du fallback for old versions lacking that column. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2016-01 and older predate a big refactor and were built on trusty/gcc-5 with the old libstdc++ ABI (refcounted std::string). reconstruct.sh + Dockerfile.reconstruct now build them end-to-end (verified: 2016-01 builds server+client and boots, SELECT version() -> 0.0.53400):

- Base is now ubuntu:14.04 so gcc-5 defaults to the old ABI and the system boost is old-ABI too (16.04's new-ABI boost broke the client's boost::program_options).
- Vendor the never-public Yandex libs from the donor: statdaemons embedded dictionaries (via DB/Dictionaries/Embedded) and the daemon base (a thin BaseDaemon compat carrying the used API + --config-file handling, avoiding the donor's newer zkutil/graphite deps); stub statdaemons/Interests.h.
- Glob the whole dbms library (excluding the Server/Client executables + ODBC driver) so renamed/moved sources compile regardless of the donor's file lists.
- Force-include <numeric>/<random> (not transitively available on this toolchain); build re2 then the server and client with single-target makes (avoid a recursive-make race on shared static libs); os.walk for Python 3.4.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…s, robust revive Improvements to the versions-benchmark runner driven by inspecting the live results in sink.data: - Log the reason for every null: on a query failure run-version.sh now emits the server's error text (unsupported syntax/function), a timeout notice, or a crash notice — once per query, tagged with its "qN [dataset]" label. Errors were previously discarded silently. - Per-query minimum supported version (queries/<ds>.minver, aligned to <ds>.sql): "0" runs everywhere, a version is the first release known to run the query, "26.7" (future) means never seen to succeed. Below a query's minimum the runner records null without running it, saving time and avoiding crashes that would null later queries. Annotations computed from the 32 full runs. - Reorder QUERY_ORDER so the heavy, crash-/timeout-prone datasets (tpch, tpcds, job) run last: a late crash can no longer null earlier datasets as collateral (as happened to taxi on 20.4). - Best-effort revive_server: also relaunch the daemon in-place when the server process dies but the container stays up (previously a no-op for image providers), fall back to docker restart, use a longer timeout, and log container state/logs so a persistent failure is diagnosable. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
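
The minimum-version gate could be sketched as below. The helper names `version_ge` and `should_run` are hypothetical, and the comparison relies on GNU `sort -V`; bare build numbers such as 53982 would need special-casing that this sketch does not attempt.

```shell
#!/bin/sh
# True when dotted version $1 is greater than or equal to $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Decide whether a query whose .minver entry is $2 should run on server
# version $1. A minver of "0" runs everywhere; a future version such as
# "26.7" is never satisfied by any released version.
should_run() {
    [ "$2" = "0" ] && return 0
    version_ge "$1" "$2"
}
```

When `should_run` fails, the runner described above records null for the query without executing it.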

…brary wall) Extend the from-source reconstruction below 2015-12. At the 2015-11 -> 2015-12 boundary a coordinated refactor inlined several never-public external Yandex libraries and reshaped core containers; reproduce that so 2015-11 builds+boots (0.0.53350): - stats/*: append the templated intHash32<salt>+IntHash32 to the era's Hash.h (where 2015-12 moved it) and forward <stats/IntHash.h> there; overlay the UniquesHashSet / ReservoirSampler{,Deterministic} algorithms from a "last known-good month" PATCH_REF and forward the old <stats/...> paths to them. - ReservoirSampler: retarget its PODArray<Allocator<...>> buffer to std::vector to avoid back-porting the templated allocator through every container. - Strip illegal virt-specifiers from member-function templates (a modern gcc-5 error the era's compiler tolerated). - Extend the statdaemons compat auto-map to DB/Core (Exception.h etc. lived there in the oldest trees), make the MongoDB stub era-agnostic. - Dockerfile.reconstruct: PATCH_REF/PATCH_FILES overlay mechanism for the few gcc-hostile files fixed by 2015-12 (SummingSortedBlockInputStream, the stats algorithms). monthly-built.tsv records 2015-11 alongside 2015-12/2016-01/2016-02. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…add earlyoom Reconstruction — break the 2015-10 -> 2015-11 external-statdaemons boundary so 2015-10 builds and boots (0.0.53340). Before 2015-11 the codebase pulled the bulk of its infrastructure from the never-public external "statdaemons"/"stats" Yandex libraries; they were inlined in-repo at that boundary. Reproduce that: - Vendor DB::Exception + StackTrace + ErrnoException (the class itself was external here; DB/Core/Exception.h only #included it and added free functions). - Back-port the statdaemons infra from PATCH_REF (Stopwatch, ConfigProcessor, Pool{,WithFailover}Base, OptimizedRegularExpression(+.inl), HTMLForm, AIO, the HyperLogLog trio, NetException, Increment+CounterInFile, SimpleCache, threadpool, and the uniq/quantile algorithms), forwarding the old <statdaemons/*> (+ ext/*.hpp) include paths to them. - strconvert compat (escape.h escaped_for_like + hash64.h; MySQL dicts/OLAP are never exercised), Yandex/Revision.h stub, DB/Common/Exception.h and DB/Core/FieldVisitors.h forwards, ConnectionPoolWithFailover::getMany adapted to the newer 2-arg base (distributed plumbing, compile-only), BaseDaemon::sleep. - Overlay is now split so it can't downgrade newer, self-contained months: PATCH_FILES overwrites (only the gcc-hostile SummingSorted), PATCH_FILL adds the infra only when absent. monthly-built.tsv records 2015-10. Runner — skip loading a dataset when the version supports none of its queries: dataset_supported() checks the per-query .minver annotations; if every query is below the version's minimum (e.g. coffeeshop on 20.4), the dataset isn't loaded at all (its queries are recorded null anyway), saving load time and disk. Cloud-init — install and enable earlyoom (as the main ClickBench cloud-init does): loading all datasets in parallel (and heavy joins on old, less memory-efficient versions) can exhaust RAM; without earlyoom the kernel thrashes and the VM gets stuck. 
earlyoom kills the offender early; the server crash is then recovered by revive_server (or the query is recorded null). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… when launching run-benchmark.sh already retried run-instances on capacity/quota errors (as the main ClickBench launcher does), but a version sweep fires dozens of launches back-to-back and hits EC2 API throttling (RequestLimitExceeded / Throttling) far more than the single-shot main launcher. A throttled call was treated as a hard error and skipped that version — a likely cause of the large contiguous gaps in the runs (e.g. whole 23.x / 24.x missing). Add RequestLimitExceeded / Throttling to the retry set and factor the backoff into an aws_retry helper used for run-instances AND the pre-launch describe-instance-types / describe-images calls (a throttled describe would otherwise blank arch/ami and make the launch fail non-retryably). Genuine config errors still fail fast. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
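
A retry wrapper in the spirit of aws_retry might look like this. It is a sketch, not the helper from run-benchmark.sh: `MAX_ATTEMPTS` and `BACKOFF_START` are hypothetical knobs, and `InsufficientInstanceCapacity` stands in for the capacity/quota class, which the commit message does not spell out.

```shell
#!/bin/sh
# Error codes worth retrying. RequestLimitExceeded and Throttling are named in
# the commit message; InsufficientInstanceCapacity is an assumed example of
# the capacity errors that were already retried.
RETRYABLE='RequestLimitExceeded|Throttling|InsufficientInstanceCapacity'
MAX_ATTEMPTS=${MAX_ATTEMPTS:-8}      # hypothetical limit
BACKOFF_START=${BACKOFF_START:-5}    # hypothetical; seconds, doubled per retry

# Run "$@"; retry with exponential backoff only when stderr names a
# retryable error. Anything else (a genuine config error) fails at once.
aws_retry() {
    attempt=1 delay=$BACKOFF_START
    errfile=$(mktemp)
    while :; do
        if "$@" 2>"$errfile"; then
            rm -f "$errfile"
            return 0
        fi
        cat "$errfile" >&2
        if [ "$attempt" -ge "$MAX_ATTEMPTS" ] || ! grep -Eq "$RETRYABLE" "$errfile"; then
            rm -f "$errfile"
            return 1
        fi
        sleep "$delay"
        attempt=$((attempt + 1)) delay=$((delay * 2))
    done
}
```

Wrapping the describe calls as well as run-instances matters because a throttled describe yields an empty value rather than an error the launcher would notice.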

- Remove the pre-rework apt-based harness (versions/scripts, versions/unified_scripts) and the old 75-query result JSONs, all superseded by the Docker-based pipeline. - Add fetch-results.sh: pull the latest Versions Benchmark result for each version from the sink (sink.data, kind=versions-benchmark), enrich with the release date (list-versions.sh / monthly.tsv), and write results/<version>.json. Uses $CONNECTION_PARAMS like the repo's collect-results.sh. - Regenerate results/ from the sink: 179 versions, each with per-dataset load_time and data_size, release_date, and the 344-query result array. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The old smart-pointer wrapper's operator bool() returned a boost::shared_ptr, relying on the implicit safe-bool conversion; trusty's boost makes operator bool explicit, so make the conversion explicit. Guarded on the file and the exact return statement, so it is a no-op once StoragePtr became std::shared_ptr. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…, headers, toggles Rework the Versions Benchmark page for the new 10-dataset / 344-query layout: - Split the data out of index.html into data.generated.js (as the main ClickBench page does). generate-results.sh now emits 'const datasets' (name + queries per dataset, in result[] order) and 'const data' (release-ordered per-version results) to that file instead of editing index.html in place. - index.html is now static markup + render logic that loads data.generated.js. - Add per-dataset load time (summed) and on-disk data size, shown in each dataset's header row per version, plus a Total row summing over the enabled datasets. - Group the detailed table into per-dataset sections with a header/separator row before each dataset (with a checkbox to toggle that dataset's queries). - Add a Dataset selector row to switch each dataset on/off; a disabled dataset is hidden from the table and excluded from the relative-time summary. - Generalise the hot metric to the 6-try runs and sort versions by release date. - fetch-results.sh: only take runs in the current 344-query format (skips 9 stale pre-rework sink rows) and write compact result files. Regenerated results/ (170 versions) and data.generated.js from the sink. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- Load time and data size are now regular colorized table rows (per dataset, plus a Total that sums load time / totals data size over the enabled datasets), matching the main ClickBench page, instead of being crammed into the dataset header cells. - Dataset separator rows carry a checkbox in column 1, so the Total checkbox and all per-dataset checkboxes line up vertically. - Chart and table are always ordered by version release date, latest first (not by result). Port the main page's horizon (multi-scale) bar chart, and show the release date over the bar (as on the main page) instead of a hover balloon. - Calendar versions display as major.minor only (20.4.9.110 -> 20.4); the 1.1.54xxx scheme, dates and bare revisions stay full. - An absent result now counts as 2x that version's own worst query time. - Remove the annotation paragraphs at the bottom of the page. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… 18.10.3 18.10.3's result was anomalous: data_size was null and hits/ssb/taxi/coffeeshop never loaded. Its run log shows the container stopped mid-load ('cannot exec in a stopped container') while five multi-GB tables (hits, taxi, tpch.orders, tpcds.catalog_sales, ssb.lineorder_flat) were streaming in parallel — earlyoom OOM-killed clickhouse-server, and being the container's PID 1 that stopped the whole container, failing every in-flight load. Fix in run-version.sh: - Load at most LOAD_PARALLEL (default 4) datasets concurrently instead of all ~10, so peak memory stays bounded on the older, less memory-efficient versions. - Add a revive-and-retry pass: reload any dataset that is not fully present, one at a time, reviving the server first if it died. load_one_dataset now skips already-loaded tables, so the retry only redoes what is missing. Remove the anomalous results/18.10.3.json and regenerate data.generated.js (169 versions); 18.10.3 can be re-run with the fixed loader. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
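
Bounding the load concurrency can be sketched in plain POSIX sh as follows. This is a simplification: it loads in batches of LOAD_PARALLEL and waits for each batch, whereas the real loader may keep a rolling window. The `load_one_dataset` body here is a hypothetical stand-in that only prints a line.

```shell
#!/bin/sh
LOAD_PARALLEL=${LOAD_PARALLEL:-4}    # default named in the commit message

# Hypothetical stand-in for the real per-dataset loader.
load_one_dataset() { echo "loaded $1"; }

# Load the given datasets with at most LOAD_PARALLEL running at once.
load_all() {
    running=0
    for ds in "$@"; do
        load_one_dataset "$ds" &
        running=$((running + 1))
        if [ "$running" -ge "$LOAD_PARALLEL" ]; then
            wait          # POSIX sh has no `wait -n`, so wait for the batch
            running=0
        fi
    done
    wait                  # collect the final, possibly partial, batch
}
```

A follow-up pass that reloads whatever is still missing, one dataset at a time, would sit after `load_all` in the flow the commit describes.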

The metric selector click handlers set selectors.metric and re-rendered the table but never refreshed the selector's active state, so 'Cold Run' / 'Hot Run' did not visually toggle. Call updateSelectors() from the handlers. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Both have empty load_time and 4096-byte (empty-directory) data sizes -- nothing loaded, same failure class as 18.10.3. Remove their results and regenerate data.generated.js (167 versions); they can be re-run with the fixed loader. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- The Total load time / data size now show '—' for any version that is missing an enabled dataset (a skipped/failed load), since a partial sum understates it and is not comparable across versions. Per-dataset rows already show '—' for their own missing dataset. - Left-align the dataset name in the table's dataset-header rows (the details table cells are otherwise right-aligned). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ClickBench - Compact URL: encode the selectors as the shortest unique subsequence of each key, listing only the smaller of the selected/unselected side (encodeState/decodeState + helpers). The fragment drops from ~6 KB of base64 JSON to a few dozen characters. - Delete button: a ✕ on each summary row (shown on hover) removes that version from the selection, as on the main page. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- If a dataset has no load time it was not loaded, so its data size is treated as null (both in the page and in run-version.sh, which now only reports sizes for datasets that recorded a load time -- avoids reporting an empty/partial directory). - Make the details header (version + date) sticky to the top and the two left columns (checkbox + query number) sticky to the left while scrolling the wide table. - Port the color-scale legend under the summary heading and enable tooltips on touch devices (:active), both from the main ClickBench page. (The 'skip queries failing on every selected version' guard was already present in renderSummary.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…e + header polish - Metric selector: add Load Time and Data Size. The summary then ranks versions by total load time / data size over the enabled datasets (a version missing any enabled dataset shows '—'); the column heading noun updates accordingly. - Calendar versions (18.x+) now use the date of the FIRST release in their YY.MM line for display and sorting, not the latest patch's date (fetch-results.sh derives it from version_date.tsv; existing result files updated, data.generated.js regenerated). - Sticky: the header row pins to the very top and the two left columns (checkbox + query number) to the very left of the page — body top/left padding removed so they sit flush with no margin; column 2's offset is set to column 1's measured width. - Theme: follow the OS prefers-color-scheme when there is no stored choice, and set the attribute at bootstrap without an early render. - Left-align the dataset name in separator rows; show the header date smaller, non-bold and fainter. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… column 1 Restructure the dataset/Total separator rows into three cells: checkbox (sticky column 1), dataset name (sticky column 2), and a non-sticky band spanning the version columns. The name now stays pinned to the left like the query numbers, and the Total and per-dataset checkboxes sit in the same flush-left column 1 as the per-query checkboxes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add WHERE active to the system.parts sum so outdated/inactive parts still on disk (not yet cleaned up) are not counted in a dataset's reported data size. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
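
For reference, the shape of the size query this commit describes is sketched below. The helper name `data_size_query` is hypothetical, and the exact column list and filters in run-version.sh may differ; the sketch only combines the pieces named in the commit messages (sum(bytes_on_disk), system.parts, one database per dataset, WHERE active).

```shell
#!/bin/sh
# Print a data-size query for one dataset database, counting active parts only.
data_size_query() {
    printf "SELECT sum(bytes_on_disk) FROM system.parts WHERE active AND database = '%s'\n" "$1"
}

# Hypothetical usage:
#   clickhouse-client --query "$(data_size_query tpch)"
```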

…taset ones The per-query checkboxes inherited the right-aligned #details td default while the separator (dataset/Total) checkboxes were left-aligned, so they sat at different positions. Left-align column 1 for all rows so every checkbox shares the same left position. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ckbox column - Add a small left margin to the top header, the selectors, and the Detailed Comparison heading so they are not glued to the page edge, while the tables' sticky columns stay flush. - Shrink the checkbox column (tighter padding, smaller width); column 2 follows via the measured --qn-left offset. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

.tooltip-query had margin-left:-3rem, which pushed the balloon off the left of the page now that the query column sits flush near the edge. Anchor it at the cell's left (no negative margin, max-width so it extends rightward) and move the arrow to the left. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

max-width let the balloon shrink to the narrow query column, wrapping the text over many lines. Use an explicit width (50rem) anchored at the cell's left so it stays wide and extends rightward. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The query-number cell is a sticky cell (its own stacking context), so its tooltip sat below the sticky header/cells. Lift the whole cell (z-index 1000 on hover) and the tooltip so the balloon appears above everything. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Only add the vendored tryLogCurrentException(Poco::Logger*) overload when the tree already has some tryLogCurrentException to forward to. The pre-2014 snapshots have none at all, so the overload's body would call a non-existent function and recurse onto itself (const char* -> Poco::Logger* type error); they also have no ErrorHandlers.h that needs it. Guarded, so it's unchanged on 2015-era trees. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…shims) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Two more pre-2014 shims (guarded, no-ops on newer trees): - Add WriteBufferFromFileDescriptor::sync() (flush via next()) when the class lacks it; the back-ported CounterInFile.h calls sync() and this era only has next(). - Generalize the UniquesHashSet template-arg rewrite to bare local declarations (e.g. 'UniquesHashSet tmp_set;'), not just the typedef. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Strip the pre-2014 'namespace Yandex' qualifier generally in the DateLUT migration (libmysqlxx uses Yandex::DateLUTSingleton / Yandex::DateLUT::Values / Yandex::DayNum_t / Yandex::VisitID_t); the donor moved all of these to global scope. Broaden the pass to files that reference Yandex:: even without DateLUT. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…shims) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Three pre-2014 shims for the legacy Server/OLAP + Embedded-dictionary code (guarded, no-ops on newer trees): - Drop the donor Server/CMakeLists.txt INCLUDEs of tools/init.d/CMakeLists.init and tools/logrotate/CMakeLists.logrotate when those fragments are absent (they are not .cmake/CMakeLists.txt so the overlay doesn't bring them; packaging-only). - Add DB::toString to WriteHelpers.h when absent (back-ported RegionsHierarchy.h uses it); same template the donor defines, via WriteBufferFromString + writeText. - Fill the reconstructed statdaemons/Interests.h with the interest-category bit flags the legacy OLAPAttributesMetadata.h masks against (OLAP HTTP interface is dead code for the benchmark, so distinct flag values suffice). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add a Yandex/optimization.h shim (likely/unlikely branch-prediction macros) under the common/ prefix; pre-2013 code (libpocoext/ThreadNumber.cpp) includes it and the donor moved these macros to common/likely.h. Guarded, no-op where present. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

A version sweep launches many gp3-backed VMs back-to-back and can trip the account's aggregate EBS storage quota (VolumeLimitExceeded); that quota frees as earlier benchmarks terminate and their volumes are deleted, so add it to the retry set. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Break the pre-2013-05 daemon/legacy-header wall (all guarded, no-ops on newer trees): - Map <Yandex/daemon.h> to BaseDaemon and drop the obsolete libcommon/src/daemon.cpp. - Vendor Poco/Ext/scopedTry.h (Poco::ScopedTry try-lock) used by StorageMergeTree. - Vendor mysqlxx/PoolWithFailover.h as a thin wrapper over the era's mysqlxx::Pool (the failover pool arrived 2013-05; the Embedded dicts that include it never run in the benchmark). - Rename the DateLUT day-number conversions safeFromDayNum/safeToDayNum -> fromDayNum/toDayNum (the donor dropped the 'safe' prefix; they are bounds-checked). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Make the back-ported Embedded RegionsHierarchy.h include DB/IO/WriteHelpers.h directly (it uses DB::toString but only included ReadHelpers.h, relying on WriteHelpers arriving transitively -- which fails in this era). Guarded, no-op elsewhere. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…shims) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

alexey-milovidov and others added 30 commits June 29, 2026 15:29

versions: default the benchmark VM to c7a.4xlarge

3f2121d

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

versions: download data with wget --continue --progress=dot:giga

b9ba0e5

Match the main ClickBench download style (resumable, giga-scale progress). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

versions: drop prepare-ebs-snapshot.sh

e5b4637

A snapshot-backed volume lazy-loads from S3, so it isn't faster than the plain S3 download for one-shot VMs; the snapshot approach is not used. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

alexey-milovidov and others added 30 commits July 2, 2026 05:16

versions: count only active parts in the data-size query

b79cd6a

Add WHERE active to the system.parts sum so outdated/inactive parts still on disk (not yet cleaned up) are not counted in a dataset's reported data size. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

versions: number hits queries from 0, other datasets from 1

d572e2c

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

versions: reconstruct 2013-10 from source (builds clean with current …

1096658

…shims) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

versions: reconstruct 2013-07 from source (builds clean with current …

d164eb5

…shims) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

versions: reconstruct 2013-02 from source (builds clean with current …

f8820f5

…shims) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Versions benchmark rework #968
alexey-milovidov wants to merge 83 commits into main from versions-benchmark-rework
