[#44] Normalize refs table to deduplicate property strings#80
Conversation
Major version bump because the analyze schema will get some changes.
The refs table repeated the property_path/property_type strings on every row, making it by far the largest part of the analyze database. Store the strings once in new property_names/property_types lookup tables and keep only integer ids in refs. A refs_view rejoins them to expose the original strings. On a large customer project this reduces the database from 515 MB to 351 MB (-32%) with negligible impact on analyze time. Also: - Add PRAGMA user_version = 1; find-refs now rejects older databases with a clear "unsupported schema version" message instead of an obscure SQL error. - Update the material/monoscript views and find-refs queries to read the strings via refs_view (or resolve ids once). - Fix view_material_texture_refs, which used double-quoted string literals that SQLite parsed as identifiers (pre-existing, failed on old databases too).
There was a problem hiding this comment.
Pull request overview
Normalizes the analyzer SQLite schema to significantly reduce database size by deduplicating refs.property_path and refs.property_type strings into lookup tables, while preserving query ergonomics via a refs_view. Updates find-refs, tests, and docs to use the new view and introduces schema versioning to gate incompatible databases.
Changes:
- Normalized
refsby introducingproperty_names/property_typeslookup tables and arefs_viewto preserve the pre-change shape for consumers. - Updated analyze writer to populate lookup tables once per unique string and store integer ids in
refs. - Updated
find-refs, SQL views, tests, and documentation; addedPRAGMA user_version = 1and rejection of older schemas infind-refs.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| UnityDataTool/UnityDataTool.csproj | Bumps tool version to 2.0.0 to reflect the schema/API change. |
| UnityDataTool.Tests/FindRefsTests.cs | Adds coverage for refs_view, lookup-table consistency, and old-schema rejection behavior. |
| UnityDataTool.Tests/BuildReportTests.cs | Updates BuildReport reference assertions to query refs_view for string columns. |
| ReferenceFinder/ReferenceFinderTool.cs | Switches find-refs to read from refs_view, resolves m_Script path id once, and rejects unsupported schema versions. |
| Documentation/contentlayout.md | Fixes Unity manual link target for build report history page. |
| Documentation/analyzer.md | Documents the normalized refs schema, lookup tables, and preferred querying via refs_view. |
| Documentation/analyze-examples.md | Updates example query to use refs_view instead of refs. |
| Analyzer/SQLite/Writers/SerializedFileSQLiteWriter.cs | Adds id providers + one-time inserts to populate property_names / property_types and writes ids into refs. |
| Analyzer/SQLite/Commands/SerializedFile/AddReference.cs | Updates parameter types for refs.property_path / property_type to integers. |
| Analyzer/SQLite/Commands/SerializedFile/AddPropertyType.cs | Adds SQLite command for inserting rows into property_types. |
| Analyzer/SQLite/Commands/SerializedFile/AddPropertyName.cs | Adds SQLite command for inserting rows into property_names. |
| Analyzer/Resources/MonoScript.sql | Updates script_object_view to join via refs_view. |
| Analyzer/Resources/Init.sql | Adds lookup tables, refs_view, updates dependent views to use refs_view, fixes SQLite string literal quoting, and sets user_version = 1. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| m_GetRefsCommand = db.CreateCommand(); | ||
| m_GetRefsCommand.CommandText = @"SELECT object, property_path, EXISTS (SELECT * FROM assets a WHERE a.object = r.object) FROM refs r WHERE referenced_object = @id"; | ||
| m_GetRefsCommand.CommandText = @"SELECT object, property_path, EXISTS (SELECT * FROM assets a WHERE a.object = r.object) FROM refs_view r WHERE referenced_object = @id"; | ||
| m_GetRefsCommand.Parameters.Add("@id", SqliteType.Integer); |
There was a problem hiding this comment.
Addressed at the source in 2cc7e92 rather than with IFNULL: refs_view now uses INNER JOIN, so property_path can never be NULL (every refs row's ids are always populated and their lookup rows written in the same transaction). That makes GetString(1) here unconditionally safe. I avoided IFNULL deliberately — it would silently return '' for a genuinely corrupt/orphaned row instead of failing loudly, which would be worse for a read-only analysis DB.
The refs property_path/property_type ids are always populated and their lookup rows written in the same transaction, so the joins always match. INNER JOIN expresses that invariant and removes the theoretical NULL that LEFT JOIN implied, making find-refs' property_path read unconditionally safe.
Summary
Fixes #44.
On large builds the
refstable dominates the analyze database — it repeated theproperty_pathandproperty_typestrings on every one of its (often millions of) rows. This normalizes those strings into two lookup tables sorefsstores only integer ids, and exposes arefs_viewthat rejoins them so existing string-based queries barely change.On a large customer project (5.8M refs rows) this reduces the database from 515 MB to 351 MB (-32%) with negligible impact on analyze time.
This changes the
refstable schema, so any tooling or SQL that readsrefs.property_path/refs.property_typedirectly needs a small update. Those two columns are no longer text — they now hold integer ids into the newproperty_names/property_typestables.Recommended fix: query
refs_viewinstead ofrefs. The view has the same column names and the same text values the oldrefstable had (object,referenced_object,property_path,property_type), so in most cases it's a one-word change:If you need to join
refsto other tables yourself (e.g. for performance), resolve the strings through the lookup tables:Queries that only use
object/referenced_object(e.g. dependency traversal) are unaffected.Databases now carry a schema version:
PRAGMA user_versionreturns1for this format and0for databases produced by older builds. Tools can check it to fail fast (asfind-refsnow does) rather than hitting confusing errors. There is no in-place migration — regenerate the database by re-runninganalyze.Changes
Schema (
Analyzer/Resources/)property_namesandproperty_typeslookup tables (id, name).refs.property_path/property_typechange fromTEXTtoINTEGERids into those tables.refs_viewreproduces the pre-change shape (object,referenced_object,property_path,property_typeas text) so consumers read the strings without joining by hand.PRAGMA user_version = 1— the first formal schema version (older databases report 0).view_material_shader_refs,view_material_texture_refsandscript_object_viewnow read viarefs_view.Populating during analyze
SerializedFileSQLiteWriterassigns string ids with twoIdProviders and writes each lookup row once (newAddPropertyName/AddPropertyTypecommands, following theAddTypepattern).find-refs
refs_view; resolves them_Scriptpath to its id once per run rather than per object.Bonus fix
view_material_texture_refsused double-quoted string literals ("Texture","Material") that SQLite parses as identifiers, so the view errored. This was pre-existing (it failed on old databases too); fixed to single-quoted literals while rewriting the view.Testing
dotnet test— full suite green (UnityFileSystem 425, Analyzer 62, UnityDataTool 234).BuildReportTests→refs_view).FindRefsTestscoverage:refs_viewround-trip, lookup-table consistency, and the unsupported-schema-version rejection.find-refsrenders property paths correctly (e.g.m_SavedProperties.m_TexEnvs[2].second.m_Texture), and runningfind-refsagainst a pre-change database exits 1 with the friendly message.