
Coframe: A Grammar-Layer Substrate for AI-Native Analytics

A practitioner's introduction to a new approach to analytical data processing.

Companion to the Coframe Core Manual. Version 1.0.

Changes from v0.5.7 (v1.0 publication pass): Closing contact-information paragraph updated to reference concrete contact channels (the project website at coframe.tech and an email address). Older changelog entries pruned and preserved in git history.


1. Why this exists

If you're building analytical infrastructure for an organization in 2026, you're navigating a specific kind of pressure. The data team has built a warehouse, a transformation pipeline, a semantic layer, dashboards. Now leadership wants AI agents — Claude, GPT, internal tools — answering analytical questions in natural language. The agents need to be right.

The semantic layer was supposed to solve this. Define your metrics once, give the AI a governed surface, and the agent returns trustworthy answers. In practice, the picture is more complicated. Metrics defined for human dashboards don't always work cleanly for agents. Agents ask questions human analysts wouldn't, at grains the metrics weren't designed for. Cross-grain navigation — the natural agent question "show me peak weekly revenue per region last quarter" — runs into joins, fan traps, and pre-aggregation inconsistencies. The agent produces SQL that runs but is silently wrong, or it produces SQL that fails, or you give up and route the question to a human.

The hard cases are the ones where SQL itself is structurally weak. SQL bundles three concerns into a single language: what data to read (FROM, JOIN), what to compute (SELECT, aggregations), and how to compute it correctly (grouping keys, join cardinalities, NULL handling). An agent — or a human analyst not steeped in your warehouse's specific structure — has to get all three right simultaneously. Small mistakes produce silently wrong results.

Coframe is a substrate that handles the structural layer differently. The framework separates the grammar of analytical data processing — the structural reasoning about anchors, derivations, family relationships among metrics — from the semantics — what the metrics mean to the business, what they're called, what conventions a team uses. Once the grammar layer is governed structurally, agents (and humans) can express analytical intent without making structural decisions, and the framework verifies the result is correctly produced.

This article is a practitioner's introduction. It assumes you're familiar with relational databases, SQL, and the general shape of analytics tooling — dimensional modeling, semantic layers, BI tools. Where existing tools come up, they're context, not comparison.

The article walks through:

  • What "grammar layer" means and why it's different from a semantic layer (§2).
  • How Coframe represents a metric's derivation history through DNA (§3).
  • The family vocabulary that emerges from this representation: family-names, siblings, cousins (§4).
  • How queries resolve correctly against the framework's structural reasoning (§5).
  • How AI agents fit naturally into this architecture (§6).
  • Skeptical questions you're likely to have (§7).
  • Coframe Core and Coframe Pro distinctions, and what's available now (§8).
  • Path to action (§9).

The article isn't a specification. The reference manual covers that.


2. The grammar layer

Coframe's central thesis is that analytical data processing has two layers, usually conflated, that should be governed separately:

  • A grammar layer: structural reasoning about how data is organized. What entities does a column observe? How is it anchored? How does it relate to other columns through aggregation, derivation, and dimension hierarchy?

  • A semantic layer: meaning. What does this column represent in the business? What's the definition of "revenue" — gross, net, recognized, billed? Which date is "the order date" — placement, ship, fulfillment? What conventions does this team use?

The grammar layer governs structural correctness. The semantic layer governs meaning. Existing tools — dbt's MetricFlow, Cube, LookML, AtScale — bundle them: a metric definition encodes both "this is what we mean by revenue" and "this is the structural plumbing that produces it." The bundling is workable but has costs you've probably encountered: defining the same metric across multiple grains, multiple physical models, multiple semantic contexts. Each definition repeats structural logic. Adding a new metric requires re-stating the structural plumbing every time.

Coframe separates the layers. The grammar layer is governed by structural metadata declared once per Analytics Collection (AC). The semantic layer is your domain — what an AC's metrics mean to the business is fully your choice; the framework holds no opinions there.

A note on Coframe's scope. Coframe's commitment is to tabular-output queries over input-flexible backends. Frame-QL's results are rectangular (frames are row-sets), but the source data need not be rectangular. Any backend that can host data series with name+entity declarations and respond to operators — relational engines, columnar stores, key-value stores, document stores — can host a Coframe AC. The reference implementations in v1.0 are Polars and DuckDB; the data-API protocol admits other backend types. Coframe's longevity is tied to the durability of tabular-output analytical consumption — how humans and most current AI agents consume analytical results — not to any specific input data shape.

Function-derived structure extends cross-grain navigation

A specific structural property worth naming, since it shapes what queries are askable. Coframe's FD-DAG and family genealogy admit function-derived edges and metrics as first-class participants alongside data-attested ones. A query like SUM(revenue) BY MONTH_OF(day) resolves through the framework's structural reasoning even when no month column is materialized in any schema, because the FD-edge day → month is established by the operator catalog. Similarly, dimension transformations like BUCKET(price, 10) or SUBSTR(product_code, 0, 2) produce groupings that participate in the framework's anchor-reach reasoning. Cross-grain navigation extends to function-derived groupings — the reasoning surface scales with what the data admits given the operator catalog, not just with what's been pre-materialized as columns. (This is why Coframe is a grammar layer, not a query engine: grammar specifies what's well-formed; expressivity follows.)
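For intuition, a function-derived grouping can be sketched in a few lines of Python (hypothetical data, stdlib only): the month grouping exists because a function supplies the day-to-month edge, not because a month column is materialized anywhere.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fine-grained data: no month column exists.
rows = [
    {"day": date(2026, 1, 5),  "revenue": 100},
    {"day": date(2026, 1, 20), "revenue": 50},
    {"day": date(2026, 2, 3),  "revenue": 70},
]

def month_of(d: date) -> str:
    """Operator-catalog-style FD-edge: maps each day to its month."""
    return f"{d.year}-{d.month:02d}"

# SUM(revenue) BY MONTH_OF(day): group by the function's output.
totals: dict[str, int] = defaultdict(int)
for row in rows:
    totals[month_of(row["day"])] += row["revenue"]

print(dict(totals))  # {'2026-01': 150, '2026-02': 70}
```

The point of the sketch is only the shape of the reasoning: the grouping key is computed, and the framework treats it like any other anchor component.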

What this separation looks like

A small example. Your warehouse has a transactions table at transaction grain, a weekly_summary table at (store, week) grain, and a monthly_summary table at (store, month) grain. All three contain a column representing revenue; the names happen to be amount, total_revenue, and monthly_revenue respectively.

In a semantic layer, you'd typically write three metric definitions, one per fact table. Each says "revenue here means this aggregation of these columns under these conditions." Each is independently maintained.

In Coframe, you declare:

  • The revenue family exists in this AC.
  • It's an additive metric: applying SUM at coarser anchors preserves the metric's identity.
  • It appears in the transactions schema at [transaction], in weekly_summary at [store, week], in monthly_summary at [store, month].
  • The latter two are derived from the first via SUM at their respective grains.
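
Sketched as plain data, the declarations might look like this (the field names and shape here are illustrative, not Coframe's actual authoring syntax; DNA entries record the predecessor's family-name, anchor, and operator as described in §3):

```python
# Hypothetical authoring sketch for the revenue family.
revenue_family = {
    "family": "revenue",
    "ip_reducer": "SUM",   # additive: SUM at coarser anchors preserves identity
    "columns": [
        {"schema": "transactions",    "column": "amount",
         "anchor": ["transaction"],    "dna": "self"},   # observational root
        {"schema": "weekly_summary",  "column": "total_revenue",
         "anchor": ["store", "week"],  "dna": ("revenue", ["transaction"], "SUM")},
        {"schema": "monthly_summary", "column": "monthly_revenue",
         "anchor": ["store", "month"], "dna": ("revenue", ["transaction"], "SUM")},
    ],
}
```

Note what is absent: no per-table metric definition, no join logic, no business meaning. Only structure.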

The framework now knows how to navigate. An agent asking "total revenue per region last quarter" gets resolved automatically: the framework picks whichever schema's grain is closest to (region, quarter), aggregates upward via the FD-DAG, and returns the answer. It can pick transactions (full granularity, more rows, slower) or weekly_summary (pre-aggregated, faster) — and it produces the same answer either way. That's a structural guarantee called Multi-Table Invariance, and it follows from the framework's grammar-layer commitments.

What you didn't have to do: write three metric definitions. Specify which physical table to use. Worry about which join produces the right cardinality. The grammar layer handles all of that.

What you also didn't have to do: declare what revenue means. The framework doesn't impose a definition. Your AC's revenue family is whatever you defined it to be — gross, net, with returns, without — and the framework reasons about its structural relationships without engaging the semantic content.

Names are entirely your choice

A consequence worth pulling out: the framework treats column names as opaque labels. The framework has two operations on names — equality comparison (do two columns share a family?) and naming-function output verification (when an AC declares a naming function, does this column's name match what the function produces?). The framework does not parse names, decompose them, recognize prefixes or suffixes, or extract any structure from them.

This means: an AC author can name columns in English, in domain-specific terminology, in internal codenames, in another natural language entirely, in abstract identifiers — and the framework reasons identically. The framework imposes no naming aesthetic. ACs migrating from existing systems can adopt the warehouse's existing column names directly; no rename is forced.

Naming practice — the engineer's choice of names plus the optional declaration of a naming function — is the AC author's foundational commitment. The framework verifies internal consistency; it does not constrain what consistent looks like.

And what to expose is also your choice

Naming is one half of the AC author's curatorial authority. The other half is selection. The AC author chooses which backend columns to include in the AC. Backend columns not selected are outside the AC scope — invisible to queries, outside the framework's reasoning.

Real warehouse tables have hundreds of columns: business columns alongside ETL bookkeeping, audit timestamps, denormalized convenience fields, status flags, internal codes. An AC over such a table typically includes a handful — exactly the columns the AC's analytical purpose requires. The rest stays outside scope.

This selectivity is what makes Coframe authoring tractable on real warehouses. It also gives the AC its structural focus: the AC's scope is exactly what the author committed to, not a broad reflection of the backend's full inventory.

The same backend data may support multiple ACs with different scopes. A finance AC over the transactions table includes revenue, cost, customer, region. A marketing AC over the same table includes customer_segment, campaign_id, attribution. Each AC's scope serves its analytical purpose; neither needs to expose everything.

The AC author's three foundational choices — selection (what to include), naming (what to call it), and structural commitments (how it behaves) — together constitute the AC scope. The framework's reasoning operates within the scope; columns outside scope are outside the framework's authority.


3. DNA: representing how a metric came to be

The grammar layer's representation of a metric's derivation is structurally simple. Each column in the AC carries a DNA field: a snapshot of the column's predecessor — the metric it was derived from.

DNA records three things about the predecessor:

  • The predecessor's family-name.
  • The predecessor's anchor (the entities the predecessor observes).
  • The operator that produced the predecessor itself.

For a column that's observationally rooted (no derivation in the AC), DNA is self-referential — the column points to itself. For a column derived from another, DNA points back one step. Walking DNA backward through chains of predecessors eventually reaches a root.

This gives every column a recoverable derivation history: from the column, walk DNA to the root, and you trace the operations that produced it.
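A minimal sketch of the walk (hypothetical keys and a deliberately flattened DNA layout, keeping only the predecessor link):

```python
# name -> predecessor; a self-referential entry is a root.
columns = {
    "amount@transactions":     "amount@transactions",
    "total_revenue@weekly":    "amount@transactions",
    "monthly_revenue@monthly": "total_revenue@weekly",
}

def root_of(key: str) -> str:
    """Walk DNA backward until the predecessor is the column itself."""
    while columns[key] != key:
        key = columns[key]
    return key

print(root_of("monthly_revenue@monthly"))  # amount@transactions
```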

How predecessor and successor relate

Each step in DNA links a predecessor metric to a successor metric through an operator. The framework's structural commitment is about how the two are related — specifically, how the successor's name and anchor connect to the predecessor's. Three rules govern this:

1. Anchor-independence of names. The successor's anchor doesn't enter the successor's name. Two successors with the same predecessor-and-operator but different anchors share a name. This is what makes "revenue at transaction grain" and "revenue at (region, year) grain" both belong to the same conceptual revenue.

2. Identity-preservation by the ip_reducer. When the operator is the predecessor's identity-preserving reducer (its ip_reducer), the successor's name equals the predecessor's name. Different operators produce different names. SUM applied to revenue (where SUM is revenue's ip_reducer) keeps the result as revenue. MAX applied to revenue produces something different — typically called peak_revenue or whatever you choose.

3. Operator-determined naming for non-identity-preserving operations. Different operators produce different names per the AC's declared naming function. The framework doesn't dictate the function — you adopt the operator catalog's defaults, override per operator, declare custom, or skip naming-function verification entirely. The framework verifies whatever you declare.

These rules together make naming a structural commitment that the framework can check, without dictating what the names look like.
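The three rules can be sketched as a single function, assuming a hypothetical operator-prefix default for the naming function (the real catalog defaults, and any AC's overrides, may differ):

```python
def successor_name(pred_name: str, op: str, ip_reducer: str) -> str:
    # Rule 1: the successor's anchor is not an input here at all.
    # Rule 2: the ip_reducer preserves the predecessor's name.
    if op == ip_reducer:
        return pred_name
    # Rule 3: other operators rename per the declared naming function
    # (an illustrative operator-prefix default is used below).
    return f"{op.lower()}_{pred_name}"

assert successor_name("revenue", "SUM", "SUM") == "revenue"      # sibling
assert successor_name("revenue", "MAX", "SUM") == "max_revenue"  # new family
```

Whatever function an AC declares, the framework's role is the same: verify that each successor's name matches the function's output.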

Roots and partition-invariance

A root column carries an operator field whose value is the family's ip_reducer. For families whose ip_reducer is partition-invariant — operators like SUM, MAX, MIN, COUNT, BOOL_AND, BOOL_OR — cross-anchor navigation works. The framework can apply the ip_reducer at coarser anchors and produce siblings of the root.

For families whose root operator is not partition-invariant — AVG, MEDIAN, COUNT_DISTINCT, STDEV, etc. — the family has no ip_reducer. Such families are anchor-locked: their columns exist at specific anchors but cannot be derived to other anchors via name-preserving aggregation. A column like mean_revenue derived from revenue via AVG cannot be navigated to a coarser anchor and still be mean_revenue (the AVG of AVGs isn't an AVG without weighting); the framework refuses queries that would require this.

This isn't a limitation of Coframe specifically — it's the algebra of partitioning. SUM distributes; AVG doesn't. Coframe makes this distinction visible at the framework level. Operators' partition-invariance is declared in the operator catalog; AC authors don't think about it directly, but the framework's behavior follows it.
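The underlying algebra fits in a few lines (stdlib only): summing per-partition sums recovers the total, while averaging per-partition averages does not recover the overall average without weights.

```python
from statistics import mean

data = [10, 20, 30, 40, 50]
parts = [[10, 20], [30, 40, 50]]   # one arbitrary partition of the data

assert sum(sum(p) for p in parts) == sum(data)      # SUM: partition-invariant
assert mean(mean(p) for p in parts) != mean(data)   # AVG: not

print(mean([mean(p) for p in parts]), mean(data))   # 27.5 vs 30
```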


4. Family vocabulary

The DNA representation, plus the naming rules from §3, give the AC's metric columns a structural organization the framework can reason over.

Families

A family is the set of columns sharing a name. The framework partitions the AC's metric columns by family-name; every metric column belongs to exactly one family. Two columns sharing a name are claimed by the engineer to be in the same family; the framework verifies.

Family-roots

Within a family, each column has a family-root: the earliest ancestor in the DNA chain that shares the column's name. To find the family-root, walk DNA backward as long as the predecessor's name matches; the family-root is the last column reached.

A column whose own DNA is self-referential is its own family-root — a primitive observation in the AC.

Siblings and cousins

Two columns in the same family relate to each other in one of two ways:

  • Siblings: same family-name, same family-root. They represent the same conceptual metric at different anchors. The framework can navigate between them via the family's ip_reducer; queries against either produce equivalent results when the family has an ip_reducer.

  • Cousins: same family-name, different family-roots. They share a name but are observationally independent metrics. The shared name is a claim that they're conceptually related; the different family-roots indicate they're not structurally derivable from a common observation.

Siblings are what makes cross-anchor navigation work. Cousins are what makes ambiguity surface. When a query references a family-name and the AC contains cousins, the framework refuses the query as dubious and asks the engineer to disambiguate.
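The classification can be sketched directly, over hypothetical column records carrying a family-name and a family-root:

```python
def relate(a: tuple[str, str], b: tuple[str, str]) -> str:
    """Each record is (family_name, family_root); names are illustrative."""
    name_a, root_a = a
    name_b, root_b = b
    if name_a != name_b:
        return "unrelated"   # different families entirely
    return "siblings" if root_a == root_b else "cousins"

tx_rev = ("revenue", "transactions.amount")
wk_rev = ("revenue", "transactions.amount")  # derived from the same root
eu_rev = ("revenue", "eu_ledger.amount")     # independently observed

assert relate(tx_rev, wk_rev) == "siblings"  # navigable
assert relate(tx_rev, eu_rev) == "cousins"   # query refused as dubious
```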

Why this matters

The sibling/cousin distinction is the structural truth that previous semantic layers gestured at without naming. Two revenue columns at different grains in different schemas — are they the same metric? It depends on whether they're siblings (yes, navigable) or cousins (no, independent). The framework computes this directly from DNA.

This produces sharper guarantees than semantic layers can offer. A semantic layer that says "use this metric definition for revenue" silently treats all columns named revenue equivalently. Coframe's framework either confirms the columns are siblings (sharing a structural lineage to a common root) or surfaces them as cousins requiring disambiguation. Errors that would silently produce wrong results in a semantic layer become explicit refusals in Coframe.


5. How queries resolve

The framework's query language is Frame-QL. Queries reference columns by their family-names — not by physical column names in backend tables. The framework handles the structural mapping.

A query like:

SELECT region, year, SUM(revenue) AS total_revenue
WHERE region = 'west'
BY (region, year)
ORDER BY year

doesn't say which schema to use, which join produces (region, year), or what aggregation maps revenue from its source anchor to (region, year). It says: "I want total revenue for the west region by year." The framework figures out the rest.

The four-rule filter

For each column term in a query, the framework's resolver runs a four-rule filter to identify schemas that can serve the term:

  1. Family membership: the schema contains a column with the queried family-name.
  2. Anchor reach: the schema's anchor reaches the query's target anchor via the FD-DAG.
  3. Coverage: the schema's value-sets cover the query's required values.
  4. Family-root agreement: among schemas passing the first three rules, do they share the same family-root? If yes, they're siblings — the framework picks any (per cost-based heuristics) and proceeds. If no, they're cousins — the query is refused as dubious.

The four-rule filter is the structural mechanism that resolves the question "which schema serves this query?" The answer is grounded in the AC's metric genealogy and verified integrity conditions.
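A hedged sketch of the filter over hypothetical schema records (the field names are illustrative, not Coframe's API):

```python
def resolve(schemas, family, target_anchor, required_values):
    survivors = [
        s for s in schemas
        if family in s["families"]                       # 1. family membership
        and target_anchor in s["reaches"]                # 2. anchor reach
        and required_values <= s["covered_values"]       # 3. coverage
    ]
    roots = {s["families"][family] for s in survivors}   # family-roots
    if len(roots) > 1:                                   # 4. root agreement
        raise ValueError(f"dubious: cousins {sorted(roots)}")
    return survivors   # siblings: pick any, per cost heuristics

schemas = [
    {"name": "transactions",
     "families": {"revenue": "transactions.amount"},
     "reaches": {("region", "year")}, "covered_values": {"west", "east"}},
    {"name": "weekly_summary",
     "families": {"revenue": "transactions.amount"},
     "reaches": {("region", "year")}, "covered_values": {"west", "east"}},
]
survivors = resolve(schemas, "revenue", ("region", "year"), {"west"})
print([s["name"] for s in survivors])  # both survive; siblings, either serves
```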

Multi-Table Invariance

When multiple schemas survive the four-rule filter as siblings, the framework picks any per cost-based heuristics. The Multi-Table Invariance theorem guarantees that all siblings produce equivalent results: the framework's choice is operationally free, correctness is preserved.
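MTI in miniature, on hypothetical data: aggregating the fine-grained table to region, and aggregating its pre-aggregated weekly sibling to region, give identical totals.

```python
from collections import defaultdict

detail = [  # (region, week, revenue) at fine grain
    ("west", 1, 100), ("west", 1, 50), ("west", 2, 70), ("east", 1, 30),
]

# Build the coarser sibling via the family's ip_reducer (SUM).
weekly = defaultdict(int)
for region, week, rev in detail:
    weekly[(region, week)] += rev

by_region_detail = defaultdict(int)
for region, _, rev in detail:
    by_region_detail[region] += rev

by_region_weekly = defaultdict(int)
for (region, _), rev in weekly.items():
    by_region_weekly[region] += rev

assert by_region_detail == by_region_weekly  # either schema serves the query
```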

MTI is what makes "the framework picks the right schema automatically" structurally trustworthy. In Coframe Core's default configuration, MTI is an unconditional guarantee within scope: it rests on the integrity conditions DQ verifies (FD-DAG attestation, cross-schema value-mapping consistency, coverage map honoring) plus per-DNA-edge value attestation, which verifies cross-schema metric coherence per attestable edge during DQ Phase 3. Engineers can opt out of attestation per AC; in opt-out mode, MTI becomes conditional on the engineer's commitment to ETL-side coherence, and query results carry an explicit coherence-asserted-not-verified annotation propagating the rigor posture to consumers.

The AC's rigor posture is summarized as a verification level — A, AA, or AAA — surfaced on the verification status and propagated to query results. AAA means MTI is unconditional within scope (every metric coherence commitment verified, regardless of how — by data-attestation, by construction through operator semantics, or by transparent toleration with rationale); AA means dimensional structure is verified but cross-schema metric coherence is not yet fully grounded; A means structural well-formedness only, with no data examined and no function evaluation required. AI agents and BI tools branch on the level when assessing result trust. The levels are described in §7 below.

Dubious queries

When the four-rule filter produces survivors with different family-roots — cousins — the framework refuses the query and surfaces a diagnostic naming the cousins. The engineer disambiguates via:

  • A qualified reference (transactions.revenue instead of bare revenue).
  • An explicit FROM clause restricting to specific schemas.
  • A BY-clause anchor specifying the resolution path.

This is how the framework avoids silently producing one answer when multiple are possible. Most analytical errors in production aren't crashes; they're queries that produce a number that looks plausible but uses the wrong column variant. Coframe refuses such queries.

What you don't write

Notice what's absent from Frame-QL queries:

  • No JOIN clauses. Cross-schema reach is automatic.
  • No GROUP BY. The BY clause specifies the output grain; aggregation is automatic per the grain.
  • No physical column names. The AC's family-names are what queries reference.
  • No subqueries (except WITH-chained intermediate frames, which are session-local).

What Frame-QL queries express is analytical intent at a specific grain. The framework handles the rest.


6. AI agents in this architecture

The framework's structural commitments make it a natural substrate for AI agents.

Why text-to-SQL is fragile

Modern LLMs are dramatically better at text-to-SQL than the 2023 generation; accuracy can approach 100% for queries that fall cleanly within a well-modeled semantic layer. But the failure modes are still there: queries the LLM gets syntactically right but semantically wrong, joins it constructs based on guesses about cardinality, aggregations that look reasonable but produce silently incorrect results.

The reasons are structural. SQL bundles three concerns: what to read (FROM, JOIN), what to compute (SELECT, aggregations), and how to compute correctly (GROUP BY, NULL handling, cardinality reasoning). LLMs are good at expressing analytical intent in natural language and good at translating intent to a result specification. They're weakest at the precise structural reasoning that SQL demands — the kind of reasoning where small mistakes silently produce wrong results.

What Frame-QL gives an agent

Frame-QL lets an agent express analytical intent — what columns to compute, at what grain, with what filtering and ordering — without making structural decisions about joins, cardinality, or NULL handling. The framework's resolver makes those decisions per the AC's structural commitments. The result: queries the agent constructs are either resolvable (and produce correct answers) or dubious (and refused with a structured diagnostic).

When an agent constructs SELECT region, SUM(revenue) AS total_revenue, COUNT_DISTINCT(customer) AS customer_count WHERE year = 2026 BY region, it's making decisions in its strong domain — what metrics, at what grain, with what filtering. It's not making decisions in its weak domain — which tables to join, what grouping keys produce the right cardinality. The query is grounded in the AC's vocabulary; the framework's machinery ensures correct execution.

Family vocabulary as agent surface

The AC's family-vocabulary structure helps LLMs reason in a way a flat metric list doesn't. A family is a concept the agent can hold; a sibling-set within a family is a concept the agent can navigate; a cousin disambiguation is a concept the agent can ask about. When an agent asks "what's our revenue this quarter?", it looks up the revenue family, sees what anchors are observable, identifies the right sibling for "this quarter," and constructs the query. With a flat metric list, the agent would have to reconstruct these relationships from documentation; Coframe gives them to the agent directly.

The AC scope is also what the agent sees. If the AC includes 30 columns from a backend table of 300, the agent navigates the 30. If the AC names them in business vocabulary, the agent uses business vocabulary. If the AC excludes PII or sensitive operational data, the agent has no path to those columns. ACs serve as deliberate exposure boundaries — different teams' ACs over the same backend can have different scopes, and the framework structurally enforces what the AC author committed to expose. This is meaningfully different from text-to-SQL approaches that hand the LLM the full warehouse and ask it to infer what's relevant.

The MCP server

Coframe ships with an MCP server exposing ACs to LLM clients (Claude, GPT, custom agents). The server supports two modes: direct mode where the LLM constructs Frame-QL using exposed AC metadata, and dialogue mode where the LLM submits a natural-language utterance and the server translates to Frame-QL before executing. Both are supported; deployments choose per their architecture. (For the full capability surface and protocol details, see the Manual's MCP chapter.)

What this looks like operationally

Imagine your team's AI assistant. A user asks: "what was peak weekly revenue in the West region last quarter, and which stores had any failed transactions?"

Without Coframe, the assistant has to figure out which tables hold what, construct a multi-table SQL query, get the joins right, get the aggregation right. It might succeed; it might produce a number that looks right but is wrong; it might fail. You don't know in advance.

With Coframe, the assistant queries the MCP server's family list. It identifies revenue (additive), transaction_failed (boolean OR-aggregation). It constructs a Frame-QL query with the right grains and filters. The framework's resolver picks the right schemas, navigates the FD-DAG, and produces results. If the AC has cousins or ambiguity, the framework refuses; the assistant asks the user for clarification or adjusts. The result is either correct or explicitly disambiguated; there's no third "looks right but isn't" outcome.

This is the architectural alignment: the structural commitments that make Coframe rigorous for human-authored ACs are exactly the structural commitments that make agent-mediated analytics trustworthy.


7. Skeptical questions

You've seen enough analytics tooling to be skeptical of new entrants. This section addresses the questions you're probably asking.

"Isn't this just another semantic layer?"

There's overlap with semantic layers; the architecture is different. A semantic layer encodes named metrics with operational logic, attaching each metric to a physical model. Coframe's grammar layer encodes structural metadata (family-names, DNA, FD-DAG, ColumnSpecs) without bundling business logic into metric definitions.

The practical difference: semantic-layer metrics are defined per metric per logical model. Coframe's structural declarations are defined per column once, with cross-grain navigation, cross-schema substitutability, and integrity checking falling out of the structure rather than requiring per-metric configuration.

Where they're similar: both let business users (and now agents) query without knowing physical schemas. Where they differ: semantic layers expose curated metric menus through BI tools; Coframe exposes a query language directly to analysts and agents, with the family vocabulary as the unit of analytical thought.

A specific case: cousins. A semantic layer defining "revenue" doesn't know whether two warehouses' revenue columns are conceptually equivalent. Coframe distinguishes siblings (structurally equivalent) from cousins (independently observed metrics that happen to share a name) and refuses cousin queries as dubious. This catches a class of analytical errors semantic layers don't.

"What does authoring an AC cost?"

For a small AC focused on one analytical purpose: a few hours of focused work, with AI-assisted tooling (the MCP server) substantially helping with proposal drafts.

For a comprehensive enterprise AC: weeks rather than days. The cost depends heavily on the warehouse's existing structure: clean, well-documented data with clear FD-DAGs, bounded missingness, and consistent dimension hierarchies is quick to author against. Messy data with inconsistent FDs, opaque missingness, and time-varying attribute values (what data warehousing calls "slowly changing dimensions") requires more deliberate work.

The DQ process surfaces what's there. Engineers respond to violations and advisories iteratively. The cost is partly authoring (declaring structural commitments) and partly remediation (fixing data or declaring scope where the data doesn't match commitments).

The ratio matters: AC authoring cost vs. ongoing query cost. Once the AC is in place, queries cost zero engineer-time (analysts and agents query directly). For an organization with 10+ analytical questions per week, the payback on AC authoring is fast.

"Why column-level rather than table-level?"

Tables are organizational artifacts. They reflect how data was loaded, partitioned, denormalized for performance. They change as systems evolve. Tying analytical correctness to table layout means correctness changes when table layout changes.

Columns are conceptual artifacts. The family revenue is the same conceptual thing whether it appears in transactions, weekly_summary, or a new pre-aggregation tomorrow. Column-level governance means analytical correctness is decoupled from physical layout. Tables can come and go; the AC's family declarations stay stable. New ColumnSpecs in new schemas, declared with appropriate DNA, integrate into existing families seamlessly.

"What about ratios and percentages?"

Ratios are common; they have a structural subtlety. SUM(revenue) / SUM(cost) at any grain is a different question from MEAN(revenue / cost) over rows. Conflating them is a frequent error.
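The two questions diverge on even tiny data (stdlib only):

```python
from statistics import mean

rows = [(100, 50), (10, 40)]   # hypothetical (revenue, cost) per row

ratio_of_sums = sum(r for r, _ in rows) / sum(c for _, c in rows)
mean_of_ratios = mean(r / c for r, c in rows)

print(ratio_of_sums, mean_of_ratios)  # 110/90 ≈ 1.22 vs (2.0 + 0.25)/2 = 1.125
assert ratio_of_sums != mean_of_ratios
```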

Coframe supports ratios in two forms:

  • Ad-hoc: written in a query directly: SUM(revenue) / COUNT_DISTINCT(customer) AS arpu.
  • Singletons: registered ratio columns in the AC, with names the AC author chooses, computed via MAP_DIV(c1, c2) or similar multi-input operators.

Singletons are first-class columns at their declared anchor; they can be referenced in queries by name. They don't participate in family-genealogy reasoning beyond their own definition (they're leaves in the metric genealogy).

For staged computations (revenue as a percentage of regional total), Coframe's WITH-block construct handles them naturally. The pattern requires explicit aggregation stages but expresses cleanly.

"How does this play with my existing dbt / Looker / Tableau / Snowflake setup?"

Coframe sits in a different layer than these tools.

  • dbt + Coframe: dbt produces the tables Coframe reads. The AC's schemas point to dbt-managed tables. Standard pattern.
  • Coframe alongside Looker: queries that fit Looker's strengths (curated dashboards) stay in Looker; queries that fit Coframe's strengths (direct ad-hoc analytical querying by analysts and agents) go through Coframe. They serve different audiences.
  • Tableau on top of Coframe: Tableau visualizes Coframe query results via a connector.
  • Snowflake / BigQuery: Coframe Core's DuckDB and Polars backends handle local data; Coframe Pro supports arbitrary backends including Snowflake and BigQuery via the data-API protocol.

Coframe doesn't displace existing infrastructure. It adds a query surface that didn't exist before — direct querying by analysts and agents, against a structured AC that handles physical complexity.

"What about correctness errors that semantic layers also address?"

Many analytical errors — fan traps, chasm traps, double-counting under denormalization, aggregating at the wrong grain — show up in any analytics workflow. Coframe addresses them at the framework level: declare the structural facts (FD-DAG edges, column anchors, combination laws), and the framework derives correctness. Errors are caught at AC-load time (integrity violations) or at query parse time (dubious queries, type mismatches), not at query result time.

This is structurally stronger than per-metric configuration in semantic layers. The metric-level approach is correct when configured correctly but wrong when configured incorrectly. Coframe's column-level structural governance makes correctness conditions a property of the AC, not of each metric definition.

Where Coframe is comparable to semantic layers: both rely on the AC author / metric-definer to declare structural facts correctly. Garbage in, garbage out. The difference is in how the declarations are organized and how systematic the framework's reasoning is over them.
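The fan trap mentioned above is easy to reproduce in a few lines of plain Python (illustrative only, no Coframe API): joining an order-level cost onto line items fans each order out once per item, and a naive sum over the joined rows silently double-counts.

```python
# Illustrative fan trap: shipping_cost lives at order grain, but a naive
# SUM over the order-to-items join counts it once per line item.
orders = {"o1": 10.0, "o2": 5.0}                  # order -> shipping_cost
items = [("o1", "a"), ("o1", "b"), ("o2", "c")]   # (order, item)

# The join fans o1 out into two rows, each carrying its shipping_cost.
joined = [(order, item, orders[order]) for order, item in items]

naive_total = sum(cost for _, _, cost in joined)  # double-counts o1
correct_total = sum(orders.values())              # sum at the correct grain

print(naive_total, correct_total)  # 25.0 15.0
```

The query runs and returns a plausible number, which is why this class of error is "silently wrong" rather than a failure: the grain mismatch is invisible in the result.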

"What about data quality?"

Coframe's correctness reasoning is grammatical: it ensures queries are structurally well-formed and produce results consistent with declared structural facts. It doesn't ensure the data values themselves are correct. That's data quality, a separate concern.

Coframe's DQ process flags some data-quality-like issues via quasi-metadata: declared FD-DAG edges that don't hold against the data, cross-schema integrity violations, coverage gaps. These surface at AC-load time as integrity errors. Whether to fix the data or adjust declarations is the engineer's call.

One specific DQ-adjacent concern is part of Coframe's verification by default: per-DNA-edge value attestation. When a metric exists in two schemas — say revenue at transaction grain and revenue at (store, month) grain in a pre-aggregated summary — Coframe's DQ Phase 3 verifies that aggregating the finer-grained sibling produces values matching the coarser sibling, within tolerance, on shared keys. This catches the most common silent-correctness failure in real warehouses: pre-aggregation drift from late-arriving data, partial ETL failures, or manual corrections that didn't propagate. The verification is enabled by default in Coframe Core; engineers can opt out per AC, in which case query results carry a coherence-asserted-not-verified annotation that propagates the rigor posture to consumers.
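The logic of that check can be sketched as follows. This is a hypothetical illustration of what per-DNA-edge attestation verifies, not Coframe's implementation; the `attest_edge` helper and its parameter names are invented for the sketch.

```python
# Hypothetical sketch of per-DNA-edge value attestation: roll the
# finer-grained sibling up to the coarser grain and compare values on
# shared keys within a tolerance.
from collections import defaultdict

def attest_edge(fine_rows, coarse, key_fn, value_fn, tol=1e-6):
    """fine_rows: fine-grain records; coarse: {key: value} at coarse grain."""
    rolled = defaultdict(float)
    for row in fine_rows:
        rolled[key_fn(row)] += value_fn(row)
    # Compare only keys present in both siblings; an empty dict means
    # the edge attests clean.
    return {k: (rolled[k], coarse[k])
            for k in rolled.keys() & coarse.keys()
            if abs(rolled[k] - coarse[k]) > tol}

transactions = [
    {"store": "s1", "month": "2026-01", "revenue": 40.0},
    {"store": "s1", "month": "2026-01", "revenue": 60.0},
    {"store": "s2", "month": "2026-01", "revenue": 80.0},
]
summary = {("s1", "2026-01"): 100.0, ("s2", "2026-01"): 75.0}  # s2 drifted

drift = attest_edge(transactions, summary,
                    key_fn=lambda r: (r["store"], r["month"]),
                    value_fn=lambda r: r["revenue"])
print(drift)  # {('s2', '2026-01'): (80.0, 75.0)}
```

In this toy data the summary table's s2 value no longer matches the transactions that feed it, which is exactly the pre-aggregation-drift failure mode the text describes.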

Coframe characterizes each AC's verification status as one of three ordinal levels — A, AA, or AAA:

  • Level A — Structural well-formedness. The AC's metadata is internally consistent. Declarations are mutually coherent. No data has been examined and no function evaluation required. Free to author; useful as a baseline that any well-formed AC achieves automatically.

  • Level AA — Verified structural integrity. Level A plus every dimensional structural commitment is verified — declared functional dependencies hold (whether attested against data in referential tables or established by deterministic operator catalog functions like MONTH_OF or BUCKET), schema scopes match observed data, cross-schema dimension and attribute mappings agree. Most existing semantic-layer products effectively claim AA when they verify FK relationships and value mappings, though typically without articulating that some structural commitments can be verified by construction (operator semantics) rather than by data inspection.

  • Level AAA — Verified cross-schema metric coherence. Level AA plus every metric coherence commitment is verified. For data-stored metric siblings (e.g., revenue at transaction grain and revenue in a pre-aggregated weekly summary), per-DNA-edge value attestation verifies they agree on shared keys within tolerance. For function-derived metrics computed at query time via Frame-QL expressions (e.g., profit = SUM(revenue) - SUM(cost)), coherence is verified by construction through operator catalog semantics — no per-edge attestation is needed because nothing data-side could disagree with a deductive consequence of the operator's definition. Pre-aggregation drift is verified absent on data-attested edges. The Multi-Table Invariance theorem is unconditional within scope at AAA. This is the rigor level Coframe Core's defaults are designed to make achievable.

The level is reported in the verification status, propagated to query results via MCP's coherence_posture field, and visible to AI-agent and BI-tool consumers that branch on result trust. The levels are informational in v1.0 (documented and reported, with v1.x stabilizing the taxonomy after field experience). The intent is not to gate adoption — any AC that loads cleanly is at least Level A, most working ACs reach AA easily, and AAA is achievable for ACs whose structural commitments are either verifiable by construction (function-derived metrics) or whose engineers commit to ETL-side coherence (data-stored siblings) — but to make the rigor posture legible. A consumer asking "how much can I trust this AC's cross-schema results?" gets an ordinal answer that reflects what's verified, regardless of which mechanism verified it.
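A consumer branching on the reported level might look like the following sketch. The coherence_posture field name comes from the text above; the result shape and the `trust_policy` helper are assumptions invented for illustration, not Coframe's MCP API.

```python
# Hypothetical consumer-side policy: treat the AC verification level as
# an ordinal and branch on whether it meets a minimum threshold.
LEVELS = {"A": 0, "AA": 1, "AAA": 2}

def trust_policy(result, minimum="AA"):
    """Return 'use' if the result's level meets the threshold, else 'caveat'."""
    level = result["coherence_posture"]  # "A", "AA", or "AAA"
    if LEVELS[level] >= LEVELS[minimum]:
        return "use"       # safe for downstream use under this policy
    return "caveat"        # surface to the user with a trust warning

print(trust_policy({"coherence_posture": "AAA"}))  # use
print(trust_policy({"coherence_posture": "A"}))    # caveat
```

The ordinal encoding is the point: a consumer doesn't need to know which mechanism (data attestation or verification by construction) produced the level, only where it sits on the scale.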

Beyond what attestation covers, data quality remains a separate concern handled by data quality tooling (Great Expectations, dbt tests, custom monitoring). Coframe doesn't replace these; it integrates with them via the data pipeline that loads source schemas.

"What about slowly-changing dimensions?"

Coframe's structural commitments — the FD-DAG, the column trichotomy, the family genealogy — are timeless: they assume the structural facts hold uniformly across the data's history. What's traditionally called "slowly-changing dimensions" in data-warehousing literature is, in Coframe's vocabulary, more precisely a Slowly Changing Attribute (SCA): the entity (the customer, the store, the product) is identity-stable, but an attribute of it (the customer's segment, the store's region, the product's category) varies over time. The FD store → region may not hold across all of history because what's actually changing is the region attribute attached to a stable store entity.

Coframe Core handles attribute time-variance through ETL-side flattening: present the AC with surrogate-keyed history (e.g., store_at_date as the grain dimension, with store_at_date → region as a clean FD). Alternatively, model time-variance as event data — region-change events anchored at event-time — which Coframe Core supports natively. Both approaches require some upstream pipeline work but keep the AC's structural commitments clean within Core's scope.
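The ETL-side flattening can be sketched in plain Python. The `flatten` helper and the store@date surrogate-key format are invented for illustration; the point is only that the output table carries store_at_date → region as a clean FD.

```python
# Illustrative ETL-side flattening for a slowly changing attribute:
# expand a store's region-change history into surrogate-keyed rows
# where store_at_date -> region holds as a clean FD.
from datetime import date, timedelta

history = {  # store -> [(effective_from, region), ...] sorted by date
    "s1": [(date(2026, 1, 1), "east"), (date(2026, 1, 3), "west")],
}

def flatten(history, start, end):
    rows = []
    for store, changes in history.items():
        d = start
        while d <= end:
            # Region in effect on day d: the latest change at or before d.
            region = next(r for f, r in reversed(changes) if f <= d)
            rows.append((f"{store}@{d.isoformat()}", region))
            d += timedelta(days=1)
    return rows

rows = flatten(history, date(2026, 1, 1), date(2026, 1, 4))
print(rows)
# [('s1@2026-01-01', 'east'), ('s1@2026-01-02', 'east'),
#  ('s1@2026-01-03', 'west'), ('s1@2026-01-04', 'west')]
```

Each surrogate key maps to exactly one region, so the AC can declare the FD without qualification; the time-variance lives entirely in the upstream pipeline.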

Coframe Pro provides first-class SCA support: attributes can be declared with multi-entity anchoring E(a, S) = {d, t} where one entity is a slow-granularity time dimension. This treats time-varying attributes as a structural concern rather than a modeling workaround — the framework's existing FD-DAG, four-rule filter, and MTI machinery handle the multi-key joins natively. SCA is on Coframe Pro's roadmap; it's not in Coframe Core's scope.

"What's the lock-in risk?"

The AC declarations are YAML/JSON. Frame-QL is open. The data-API protocol is specified. Coframe Core is open-source.

You can read the AC catalog, modify it, version it in source control, audit it. Frame-QL queries are declarative and portable in principle. Backends are pluggable via the data-API protocol.

Coframe Pro is commercial, but it's a separate codebase that reads Coframe Core's AC format. If you start with Coframe Core and don't need Coframe Pro's advanced features, you stay on Coframe Core indefinitely. If you do need them, the upgrade path is well-defined.

The lock-in risks practitioners worry about — proprietary metadata formats, vendor-specific query languages, undocumented internals — are addressed: open formats, open language specification, open Coframe Core implementation, fully documented upgrade path.


8. Coframe Core and Coframe Pro

Coframe ships in two editions: Coframe Core (open-source, in active development) and Coframe Pro (commercial, in parallel development). They share foundational principles — the grammar layer thesis, the (E, M) paired declaration, the column trichotomy, the FD-DAG, the four-rule filter, the structural-rigor posture, the family vocabulary, MTI. Coframe Core's surface area is a strict subset of Coframe Pro's; Coframe Core ACs are valid Coframe Pro ACs.

What Coframe Core provides

Coframe Core is the open-source edition, focused on querying analytical content with low authoring and adoption friction. It preserves the structural-rigor posture in its full thesis form — binary correctness, integrity-condition verification, dubious-query rejection, principled missing-value handling — while omitting capabilities that practitioners with bounded analytical needs don't require.

Coframe Core provides:

  • The full grammar layer with foundational principles, ColumnSpec, FD-DAG, integrity conditions, MTI, dubious-query mechanism.
  • A defined operator catalog with deterministic missing-value treatment and partition_invariant flags.
  • Comprehensive function catalog (numeric, string, date/time, boolean operations).
  • Frame-QL with supported rungs (0, 1, 2, 6, 7, 9), WITH-chained queries, lightweight registered ratios, HAVING, ORDER BY, LIMIT.
  • Multiple ACs over the same data (independent, no federation).
  • Backends: coframe-polars and coframe-duckdb, both production-quality. Each backend ships its own AI-assisted authoring toolchain for AC bootstrapping.
  • coframe-mcp server for LLM integration, including a server-side natural-language-to-Frame-QL dialogue layer.
  • DQ verification machinery (schema.init → verification → AC), including per-DNA-edge value attestation by default — making MTI an unconditional guarantee within scope rather than a conditional one resting on an unverified lemma.
  • AC Verification Levels (A, AA, AAA) reported on every AC and propagated to query results, so consumers (analysts, BI tools, AI agents) see the AC's rigor posture at a glance.

The structural-rigor posture is preserved: Coframe Core doesn't compromise on correctness within its scope. The omissions are convenience-oriented and capability-oriented, not rigor-oriented.

What Coframe Pro adds

Coframe Pro extends Coframe Core with capabilities engineers need for sophisticated analytical work:

  • Custom operator registration: engineers declare operators with their own semantics, partition-invariance properties, and missing-value treatment.
  • Broadcast as a first-class operator type: enabling persistent broadcast columns rather than only query-time broadcast.
  • Slowly Changing Attributes (SCA): time-varying attribute values modeled as a structural concern via multi-entity anchoring with a slow-time-grain component, rather than handled through ETL flattening.
  • Generalized functional grammar layer: lifting the empirical/deductive verification duality (data-attestation alongside verification-by-construction) from Coframe Core's special-case handling to the framework's primary architectural framing, with user-defined deterministic functions as first-class structural objects.
  • Recursive (self-referential) hierarchies: first-class support for employee-manager organizational hierarchies, parent-part bills-of-materials, message-thread reply structures, and similar self-referential patterns, with recursive query primitives in Frame-QL.
  • Cross-AC federation: queries spanning multiple ACs, with explicit reconciliation rules.
  • Multi-backend support: schemas in an AC can source from different engines; the framework coordinates cross-backend operations.
  • Configurable strictness: AC-level default with query-level override; three levels (Strict / Standard / Lenient) for engineer-controlled deviation under declared-assumption discipline.
  • Sensitivity analysis machinery: bounded estimates rather than point estimates for analytically questionable queries.
  • Persistent re-ingestion of Frame-QL outputs: query results become AC schemas for subsequent queries.
  • Attestation extensions: federated-edge attestation across multi-backend ACs; attestation-driven sensitivity analysis for partially-attestable genealogies; incremental attestation; richer sampling strategies for very large fact tables. (The base attestation capability — verified cross-schema metric coherence — is in Coframe Core.)
  • Sophisticated AI-assisted authoring tooling: advanced multi-pass refinement, schema-evolution detection, complex AC-construction workflows.

These are real features for real needs, and they're commercial. The boundary is principled: Coframe Core has the grammar layer in its full thesis form; Coframe Pro adds extensibility, advanced capabilities, federation, and engineer-controlled override.

The upgrade path

The upgrade path is additive. Coframe Pro reads Coframe Core's AC format directly. Existing ACs and queries continue to work; Coframe Pro adds capabilities on top. If you're on Coframe Core and find the limits, you upgrade to Coframe Pro. If Coframe Core suffices, you stay there indefinitely.

Both editions present a unified coframe import namespace. The packages are distributed through different channels: Coframe Core is open-source, distributed via PyPI as coframe-core, coframe-connect, coframe-polars, coframe-duckdb, and coframe-mcp. Coframe Pro is commercially licensed, distributed through commercial channels, with its own package family. Upgrading is a configuration change (which packages you install); ACs and queries remain unchanged.

What's available now

Coframe is in active development. The Coframe Core Manual is published as a complete specification. The Coframe Core codebase is targeted for an open-source v1.0 release in 2026, with ongoing community development thereafter.

Coframe Pro is in commercial development on a parallel track. Initial customers will be invited to early access; v1.x is targeted for after Coframe Core's v1.0 release.

The platform is organized into five Coframe Core packages: coframe-core (the framework engine, including Frame-QL parsing and the dialogue layer), coframe-connect (the backend interface package), coframe-polars and coframe-duckdb (execution backends, each shipping its own authoring toolchain), and coframe-mcp (the MCP server). They are at varying stages of implementation. See the project's status page for current progress.

If you're evaluating Coframe for production use, the realistic timeline is: read the Coframe Core Manual now, follow the project, plan for a Coframe Core deployment when v1.0 ships, evaluate Coframe Pro when it's available for your use cases.

If you're interested in shaping the direction — early access, design feedback, contributing to Coframe Core — that's actively welcomed. The project is at the stage where practitioner input meaningfully shapes priorities.


9. Path to action

If Coframe's architecture resonates, here are paths to engagement.

Read the Coframe Core Manual. The manual is the authoritative specification of Coframe Core. It's structured for both reading (Foundations chapter) and reference (subsequent chapters). For a practitioner evaluating the platform, the Foundations chapter and the AC Authoring Workflow chapter give you enough to assess the architecture seriously.

Try the early implementations. As the open-source codebase progresses, early implementations of coframe-polars and coframe-duckdb will become available for experimentation. You can author a small AC, write some Frame-QL queries, and see how the architecture behaves on real (or synthetic) data. The MCP server is available for LLM integration experiments.

Build a small AC for an analytical scope you own. The fastest way to evaluate Coframe is to build an AC for a specific analytical purpose you currently support — a department's analytics, a product team's metrics, a recurring reporting need. A few hours to author with the backend's authoring CLI; a few hours of evaluation queries. You'll quickly know whether the architecture fits your situation.

Engage with the project. Issues, discussions, and contributions are welcomed. The architecture has been developed with substantial design effort, but every implementation choice benefits from practitioner feedback. If you have specific requirements, edge cases, or use patterns Coframe should accommodate, say so.

Consider where Coframe fits in your data architecture. Coframe doesn't replace your transformation pipeline (dbt, etc.) or your visualization tools. It's a query layer that didn't exist before — direct querying by analysts and AI agents, against a structured AC that handles physical complexity. The integration points are with your existing pipeline (which produces the tables Coframe reads) and with your existing analytical surface (which can include Coframe-driven analyst querying alongside, or replacing portions of, your current setup).

The platform is designed for serious data engineering practitioners building analytical tooling for their organizations. If that's you, and if the architecture's central commitments — column-level governance, the grammar/semantics separation, the family vocabulary, AC as a deliberate authoring artifact, AI-agent-native query surface — feel right, Coframe is worth your engagement.


The Coframe project is led by reeeneeee. The Coframe Core Manual, project repository, and progress updates are at coframe.tech. Inquiries about commercial use, early access to Coframe Pro, or design feedback are welcomed at hello@coframe.tech.