CGS & CML Reference
Authoring vs determinism
Writing domain.yaml is not a deterministic process. OpenAPI (and friends) do not uniquely define a CGS: entity boundaries, capability grouping, relations, parameter roles, abstract embed-only types, and which values: keys exist (including whether unrelated fields share one value_ref vs split into distinct slots) are semantic choices. Tools may assist (e.g. an LLM reading the spec), but there is no canonical auto-generator in-repo and no guarantee that two valid domain models for the same API are equivalent.
After the YAML is written, validation and compilation are deterministic: schema checks, CML parse/compile, and runtime request shaping are mechanical consequences of what you authored.
Obsolete or unsupported (do not teach)
| Topic | Status |
|---|---|
| domain_projection_fields | Removed: use domain_projection_examples, optional primary_read, and explicit ordered provides: on the primary Get (see Entities). |
| output.type: none | Removed: actions need provides: and/or output: { type: side_effect, description: … }. |
| CGS as .json | Not loaded: load_schema rejects JSON paths; use a directory with domain.yaml + mappings.yaml, a combined authoring .yaml, or .cgs.yaml interchange. |
| apis/<api>/eval/coverage.yaml exclude: | Not implemented: only required_extra exists in plasm-eval coverage overrides. |
| string + string_semantics: blob | Legacy: the loader normalizes this to blob in the resolved CGS and clears blob string semantics. Prefer a values: row with type: blob. |
| Inline field_type: / type: on entity fields: or on parameters: rows | Removed from split domain.yaml authoring: wire shapes live only under top-level values:; slots use value_ref:. Exception: input_schema.input_type.fields remain full InputFieldSchema rows (value_ref + field_type mirrors); they must agree with values[value_ref] (see Value domains). |
How CGS, CML, and runtime fit together
| Layer | Artifact / crate | Role in list queries |
|---|---|---|
| CGS | domain.yaml | Declares entities and capability kinds (query, get, …). Whether an entity has both query and get determines whether hydration is eligible, not how HTTP works. |
| CML | mappings.yaml | Compiles each capability to HTTP/GraphQL. Optional composable pagination: (PaginationConfig: params, location, …) on query mappings drives multi-request pagination; list decode shape lives in response: / decoder config. |
| Runtime | plasm-runtime, plasm-agent | Evaluates CML, loops pages when CLI --limit/--all (or internal caps) ask for more, decodes rows, merges into GraphCache. LLM execute uses opaque page(pg#) continuations (session-scoped) instead of exposing raw API pagination fields. Then (by default) runs concurrent GET per row to upgrade summary → complete when CGS has a get for that entity. |
Pagination wiring is a CML concern; opaque LLM paging handles are minted by plasm-agent execute sessions. Hydration is a runtime policy gated by CGS capability pairs (query + get).
CGS (Capability Graph Schema): domain.yaml
The CGS is the semantic domain model. It declares what entities exist, how they relate, and what operations are available. It contains no HTTP details.
CRITICAL: Versioning is mandatory
- Every apis/<api>/domain.yaml must declare top-level version: <n> where n > 0.
- Version defaulting is forbidden; omitted/zero versions are invalid for authoring and plugin packaging.
- Increment version whenever domain semantics change (entities, fields, relations, capability signatures, parameter typing/roles, auth contract, output/provides behavior).
- Keep version unchanged only for non-semantic text edits (comments/prose) that do not affect runtime behavior, prompts, compile/decode, or dispatch.
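The versioning rules above reduce to a small pre-flight check. The sketch below is illustrative only: check_domain_version is a hypothetical helper (not part of plasm), and it assumes domain.yaml has already been parsed into a Python mapping.

```python
def check_domain_version(domain: dict) -> int:
    """Return the declared version, or raise if it is missing or not > 0.

    Hypothetical helper for illustration; `domain` is the parsed domain.yaml mapping.
    """
    version = domain.get("version")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(version, bool) or not isinstance(version, int):
        raise ValueError("domain.yaml must declare an integer top-level version")
    if version <= 0:
        raise ValueError("version must be > 0; omitted/zero versions are invalid")
    return version
```

A caller would run this before any other schema check, so that an unversioned catalog fails early instead of silently defaulting.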
Value domains (values:) and value_ref
Split domain.yaml declares a catalog-local registry of named semantic slots under top-level values: (stable keys, usually snake_case). Each row carries the wire type: and gloss-related keys—the same vocabulary as the former inline field_type / param type—but the key is a semantic identity for this catalog, not “dedupe by primitive wire shape alone”:
- type: one of string, integer, number, boolean, select, multi_select, date, array, entity_ref, blob, uuid.
- Type-specific keys on the value row: target (entity_ref), allowed_values (select/multi_select; multi_select must be non-empty), value_format (date), string_semantics (string), items: { value_ref: <key> } (array; the element shape is another values row).
Entity fields: and capabilities.*.parameters: list entries declare only how that slot uses a shape:
- value_ref: <key> is required and must exist in values:.
- required, description, path, derive apply on fields (and parameter-specific keys: role, description on parameters).
- Presentation / attachment hints (agent_presentation, mime_type_hint, attachment_media) live on the field slot when they apply (not duplicated on every reuse of the same value key).
Semantic slots (authoring judgement): A values: key is not “the type string” or “the type integer” in the abstract—it is a catalog-local semantic identity: what DOMAIN gloss, string_semantics, description, and validation say that value means in this API. Two different columns can share the same on-wire JSON type (string, RFC3339 date, …) yet must remain different keys when their meaning differs (e.g. owner vs repo vs html_url). Sharing one key across multiple value_ref sites is the same class of decision as relation cardinality or whether two endpoints are one capability: there is no deterministic rule from the wire alone—authors choose when two sites are intentionally the same domain value space (one enum, one id space, one taxonomy, aligned gloss). Prefer distinct keys per field/param by default; merge only when that identity story is obvious and descriptions stay compatible.
Sharing values keys: Only point multiple slots at the same values key when they are intentionally the same domain concept (e.g. one shared enum, or the same entity_ref target meaning the same id space) and gloss text is compatible. Never merge unrelated strings, integers, or dates solely because the wire type matches — use distinct keys per slot (nv_<entity>_<field>, nv_<capability>_<param>) so description / string_semantics stay truthful.
input_schema (create / update / action body): YAML uses full InputFieldSchema interchange: each object field has name, value_ref, field_type: (singleton map), plus mirrors (value_format, allowed_values, array_items, string_semantics, …). Those mirrors must match CGS::values[value_ref] — CGS::validate / registry denormalization rejects drift. Prefer defining the shape once under values: and copying the mirrored keys from that row.
Combined .cgs.yaml interchange may still show denormalized field_type on entity fields for serde round-trips; authoring new split domains should use values: + value_ref.
description on values: rows: Optional prose for tooling and DOMAIN gloss. The loader maps DomainNamedValue.description into NamedValueSchema.description. For entity fields, if the field slot’s description is empty, field_schema_from_domain_field uses the named value’s description as FieldSchema.description; a non-empty slot description overrides. For parameters, the same precedence applies via input_field_schema_from_domain_parameter. Prefer one canonical gloss on the values: row when a value domain is dedicated to a single slot; use the slot only when you need a one-off override. Do not dedupe unrelated primitives into one values key just because the wire type matches — conflicting glosses are a sign you split keys incorrectly.
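The description precedence described above can be sketched as a two-step fallback. This is a hypothetical illustration, not the body of field_schema_from_domain_field; it assumes "empty" means None or an empty string.

```python
from typing import Optional


def effective_description(slot_description: Optional[str],
                          value_description: Optional[str]) -> Optional[str]:
    """A non-empty slot description wins; otherwise use the values: row gloss."""
    if slot_description:  # assumption: None and "" both count as empty
        return slot_description
    return value_description or None
```

With this rule, a values: row dedicated to one slot can carry the canonical gloss, and the slot stays silent unless it needs a one-off override.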
Entities
An entity is a typed domain object with a primary key, fields, and relations.
values:
  <value_key>:
    type: <scalar type>                # same vocabulary as Field Types below
    target: <EntityName>               # when type is entity_ref
    allowed_values: [...]              # select / multi_select (multi_select: non-empty)
    value_format: <scalar or { temporal: ... }>  # required when type is date
    string_semantics: <...>            # on string rows: prompts / summaries
    items:
      value_ref: <element_value_key>   # when type is array
entities:
  <EntityName>:                        # PascalCase
    id_field: <field_name>             # logical primary key for refs / CLI; must exist in fields unless id_from is set
    id_from: <path>                    # optional: when list/detail JSON rows have no top-level id, take identity from nested keys
    fields:
      <field_name>:
        value_ref: <value_key>
        required: <bool>               # default false
        path: ...                      # optional wire path (see below)
        derive: ...                    # optional
        description: "..."             # optional
    relations:
      <relation_name>:
        target: <EntityName>           # must be a defined entity
        cardinality: one|many
    domain_projection_examples: false  # optional: default true; false = omit `[field,…]` projection list on the DOMAIN entity heading
    primary_read: <get_capability_id>  # optional: which Get’s ordered `provides` drives projection teaching; default = primary Get (see plasm-core)
DOMAIN projection teaching (default on): For each entity with a primary Get and non-empty ordered F from CGS::domain_projection_heading_fields in plasm-oss/crates/plasm-core/src/schema.rs (same as provides / default field order), the prompt renderer puts F in a single bracket on the entity heading line after ;;, before the description: Entity ;; [f1,f2,…,fN] - …. This applies even when DOMAIN teaches fetch as a zero-arity method (Entity.m#()) instead of Entity($). Expressions still use Entity(…)[subset] for actual reads. The Valid expressions preamble states that any non-empty subset of those fields is valid for trimming payloads; DOMAIN does not enumerate every prefix. F comes from that Get’s explicit provides: list (order preserved); if provides is empty, F defaults to id_field first, then remaining fields lexicographically. Set domain_projection_examples: false to suppress heading brackets (replaces the old empty domain_projection_fields: []). Optional primary_read: names the Get capability id when you must override which Get defines F (otherwise the same primary Get as the CLI manifest). This is prompt teaching only; runtime decode still uses per-capability provides / effective_provides.
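The way F is chosen and rendered can be sketched as follows. The function names are hypothetical and this is not the code in plasm-core; it only mirrors the stated rule (explicit provides order wins, otherwise id_field first and the rest lexicographically).

```python
def projection_fields(id_field, entity_fields, provides):
    """Return the ordered field list F used on the DOMAIN entity heading."""
    if provides:
        return list(provides)  # explicit provides: order is preserved
    rest = sorted(f for f in entity_fields if f != id_field)
    return [id_field] + rest   # fallback: id first, then lexicographic


def heading(entity, fields, description):
    """Render the heading shape `Entity ;; [f1,f2,…,fN] - description`."""
    return f"{entity} ;; [{','.join(fields)}] - {description}"
```

For an entity with fields url, id, archived and no explicit provides, this yields the order id, archived, url.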
from_parent_get pitfall: The JSON path must match the parent GET response for that relation. Array-of-ref shapes differ by API (e.g. PokéAPI Pokémon moves[].move vs Type moves[] as bare {name,url}). Copying one entity’s materialize.path to another without checking the wire JSON yields empty relations at decode time.
Cardinality one + nested child: When the child ref is not top-level {relation_name}.name (e.g. under meta.ailment on a move), declare materialize: { kind: from_parent_get, path: [...] } on that one relation. Only from_parent_get is allowed on cardinality one; query-scoped materialization remains for many relations.
id_from (optional): sequence of JSON object keys from the row object to a scalar string or number used as the stable id (e.g. a canonical URL). YAML may be a list [location_area, url] or a dotted string location_area.url. When id_from is present and non-empty, you do not need a fields entry named id_field solely for decoding; the runtime injects id_field into decoded rows from this path when missing.
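A minimal sketch of the id_from behavior, under stated assumptions: the helper names are hypothetical, both YAML forms (list or dotted string) are normalized to a key list, and a row whose path is absent is returned unchanged (the source does not say what the runtime does in that case).

```python
from typing import Any, Sequence, Union


def normalize_id_from(id_from: Union[str, Sequence[str]]) -> list:
    """Accept either a dotted string or a list of object keys."""
    if isinstance(id_from, str):
        return id_from.split(".")
    return list(id_from)


def inject_id(row: dict, id_field: str, id_from: Union[str, Sequence[str]]) -> dict:
    """Return a copy of `row` with `id_field` filled from the nested path when missing."""
    out = dict(row)
    if id_field in out:
        return out  # an existing id is never overwritten
    current: Any = row
    for key in normalize_id_from(id_from):
        if not isinstance(current, dict) or key not in current:
            return out  # assumption: leave the row unchanged when the path is absent
        current = current[key]
    out[id_field] = current
    return out
```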
Constraints:
- id_field must name a field in fields, or id_from must be a non-empty path as above
- Every relation target must be a defined entity (no dangling refs)
- Entity names are case-sensitive and must be unique
path and derive (wire response shaping)
By default, each field is read from a top-level JSON key matching the field name on the decoded row. Override the location with path on the field slot (next to value_ref) in domain.yaml (loads as FieldSchema.wire_path): either a dotted string (owner.login) or a YAML list of object keys ([payload, headers]).
derive runs on the extracted JSON value before optional scalar Transform steps. Rules (FieldDeriveRule, type tag, snake_case):
| type | Input shape | Behavior |
|---|---|---|
| segments_after_prefix | JSON string | Strip a URL prefix, split on /, take part_index (GitHub Issue repository_url → owner / repo). |
| name_value_array_lookup | JSON array of objects | Find the first element where match_key_field equals equals (defaults: match_key_field = name, value_field = value). Optional case_insensitive ASCII fold on that comparison (RFC 5322 header names). Return value_field from that object; if no match, the field decodes as null. Fits Gmail payload.headers, AWS-style [{ "Key": "…", "Value": "…" }] tags when match_key_field / value_field are set to Key / Value, and similar EAV-lite arrays. |
| object_key_lookup | JSON object | Return obj[key]; optional case_insensitive resolution of the key string against object keys. |
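The three derive rules can be approximated in a few lines each. This is a behavioral sketch, not the FieldDeriveRule implementation: the prefix argument name is an assumption (the table does not name that key), and returning None stands in for "decodes as null".

```python
from typing import Any, Optional


def ascii_fold(text: str) -> str:
    """Lowercase only ASCII letters A-Z, leaving other characters untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def segments_after_prefix(value: str, prefix: str, part_index: int) -> Optional[str]:
    """Strip `prefix`, split the remainder on '/', and return the part at `part_index`."""
    if not value.startswith(prefix):
        return None
    parts = value[len(prefix):].split("/")
    return parts[part_index] if 0 <= part_index < len(parts) else None


def name_value_array_lookup(items: list, equals: str,
                            match_key_field: str = "name",
                            value_field: str = "value",
                            case_insensitive: bool = False) -> Any:
    """Return `value_field` of the first object whose key field matches `equals`."""
    fold = ascii_fold if case_insensitive else (lambda s: s)
    for item in items:
        key = item.get(match_key_field) if isinstance(item, dict) else None
        if isinstance(key, str) and fold(key) == fold(equals):
            return item.get(value_field)
    return None  # no match: the field decodes as null


def object_key_lookup(obj: dict, key: str, case_insensitive: bool = False) -> Any:
    """Return obj[key], optionally resolving the key case-insensitively."""
    if key in obj:
        return obj[key]
    if case_insensitive:
        for candidate, value in obj.items():
            if ascii_fold(candidate) == ascii_fold(key):
                return value
    return None
```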
provides vs full row decode: HTTP GET responses are decoded using all entity fields that have path/derive wiring. Capability provides controls summary-vs-complete detection for list/search (CGS::effective_provides) and DOMAIN projection teaching; it does not strip extra decoded fields from the cached entity row by itself. If you need two different agent-facing projections over the same HTTP operation, duplicating the capability with different provides only narrows the documentation unless the runtime adds explicit field filtering.
Other apis/* adopters: prefer name_value_array_lookup wherever a vendor exposes metadata as an array of small records (tags, headers, key/value rows). This repository’s Gmail CGS uses it for message headers; AWS/GCP-style tag arrays are a natural next candidate when those surfaces are modeled.
CML HTTP body (beyond object / var): the CML expression enum includes vendor-specific builders. Example: gmail_rfc5322_send_body evaluates to the JSON body for Gmail users.messages.send (raw plus optional threadId) from CML env keys from, to, subject, plainBody, and optional threadId, inReplyTo, references — see apis/gmail/mappings.yaml message_send_simple and plasm-oss/crates/plasm-cml/src/gmail_send_body.rs. gmail_rfc5322_reply_send_body is the same wire shape but derives defaults from preflight parent_* keys (see invoke_preflight on CapabilitySchema and Gmail message_reply).
description on entities and capabilities: Optional but recommended when it helps agents and humans. Write short domain prose—what the thing is or what the operation means to a user or integrator. Do not paste HTTP methods, URL paths, OpenAPI operation ids, bare http:// / https:// links, or “maps to …” wiring notes; those belong in mappings.yaml (comments may name products and hosts in words, not pasted URIs) or external API docs. The same rule applies to output.description for side_effect actions: state the domain effect (e.g. “message moves to Trash”), not the transport shape (“PATCH, empty body”, “returns 204”). Exception: auth.token_url and similar machine OAuth/OpenID fields may contain a provider token URL string—those are runtime wiring, not human DOMAIN copy.
Field Types
In split domain.yaml, the type: column below is the keyword you put on a values: row. Entity fields and capability parameters resolve that type via value_ref. Runtime FieldType / operator tables are unchanged.
| Type | YAML value | CLI parser | Operators | Description |
|---|---|---|---|---|
| String | string | string | =, !=, contains, exists | Free text |
| UUID | uuid | string | =, !=, contains, exists | Canonical UUID primary keys; wire values are strings. Use for stable opaque ids (e.g. Linear id). No string_semantics (that is for string only). |
| Integer | integer | i64 | =, !=, >, <, >=, <=, exists | 64-bit integer |
| Number | number | f64 | =, !=, >, <, >=, <=, exists | Floating point |
| Boolean | boolean | --flag/--no-flag | =, !=, exists | True/false |
| Select | select | PossibleValuesParser | =, !=, in, exists | Single enum. Requires allowed_values. |
| MultiSelect | multi_select | repeatable string | contains, in, exists | Multiple enum. Requires allowed_values. |
| Date | date | string or integer (see value_format) | =, !=, contains, exists | Requires value_format: rfc3339, iso8601_date, unix_ms, or unix_sec. Predicate / expression inputs are normalized to that wire shape (forgiving parse; UTC for full datetimes). Display of API responses is not rewritten via value_format. Prompt DOMAIN lines use a generic …=datetime hint, not the wire-format token. |
| Array | array | repeatable string | contains, in, exists | Homogeneous list. Requires nested items: describing each element (see below). |
| EntityRef | entity_ref | string | =, !=, exists | Foreign key to another entity. Requires target: EntityName. ID values may be string or number at runtime. |
| Blob | blob | string (opaque / base64 / attachment-shaped JSON) | =, !=, exists | Opaque binary or base64-heavy payloads (file bytes, RFC822 raw, attachment contentBytes, GitHub Contents content, …). Do not use string_semantics on blob fields (omit the key). Legacy string + string_semantics: blob loads as blob. |
Blob / binary (values: row type: blob)
Use a values: entry with type: blob when the wire value is not human prose (base64/base64url, opaque octets, or the reserved attachment object), including:
- Entity fields populated from APIs that return base64 attachment bodies, binary-safe strings, or a JSON object with reserved __plasm_attachment metadata (uri, mime_type / media_type, optional bytes_base64).
- Capability parameters with the same shape (e.g. Gmail raw, GitHub Contents content as base64 in JSON).
Do not use blob for HTML/markdown message bodies meant to be read as text (keep type: string on the value row with string_semantics: markdown or document as appropriate).
Authoring knobs (entity field slots — alongside value_ref):
| Key | Applies when resolved type is | Notes |
|---|---|---|
| mime_type_hint | string or blob | Hint for MCP/HTTP tabular summaries when the cell is reference-only or split (see below). Example: application/octet-stream, message/rfc822 (describe in description if you cannot set a single hint). |
| attachment_media | blob only | Optional coarse class: generic, image, audio, video, document (for prompts/tooling; wire shape unchanged). |
| agent_presentation | string or blob | Optional override; blob defaults to reference-only summaries (same as non-short strings) when unset. |
Execute summaries (table / TSV): for CGS blob entity fields, the agent formatter emits two columns, {field}_ref and {field}_mime, so URI (or (in artifact)) and MIME stay split. Non-blob columns that hold a full __plasm_attachment object still use a single cell uri (mime) or (in artifact) (mime) when the payload is bytes-only.
HTTP runtime: on 2xx responses whose body is not JSON, the default transport may coerce the body into a JSON object
{ "__plasm_attachment": { "bytes_base64": "…", "mime_type": "<Content-Type or application/octet-stream>" } }
unless the body looks like HTML/XML (error path preserved). Design decoders / provides so this shape can land on a blob field when APIs return raw octets.
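The coercion described above can be sketched like this. Everything here is an assumption for illustration: the function names are hypothetical, the HTML/XML check is a stand-in heuristic (the source does not specify how the transport detects markup), and standard base64 is assumed for bytes_base64.

```python
import base64
from typing import Optional


def looks_like_markup(body: bytes, content_type: str) -> bool:
    """Hypothetical heuristic; the real HTML/XML detection is not specified here."""
    lowered = content_type.lower()
    return "html" in lowered or "xml" in lowered or body.lstrip().startswith(b"<")


def coerce_attachment(body: bytes, content_type: Optional[str]) -> Optional[dict]:
    """Wrap a non-JSON 2xx body in the reserved __plasm_attachment object."""
    mime = content_type or "application/octet-stream"
    if looks_like_markup(body, mime):
        return None  # leave HTML/XML bodies to the error path
    return {
        "__plasm_attachment": {
            "bytes_base64": base64.b64encode(body).decode("ascii"),
            "mime_type": mime,
        }
    }
```

A decoder that targets a blob field can then treat API-provided attachment objects and coerced raw bodies the same way.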
Fixtures (interchange CGS): see fixtures/schemas/test_schema.cgs.yaml — entity BlobAsset declares two blob fields: payload (attachment_media: generic, octet-stream hint) and icon_png (attachment_media: image, PNG hint), plus a minimal blob_asset_get capability so the catalog validates. fixtures/schemas/capability_with_input.cgs.yaml includes an optional artifact blob field on update_account input for interchange coverage.
Array element typing (items: under values:)
Every values: row with type: array must include items: { value_ref: <key> } where <key> names another values: row for the element shape. Array slots (entity fields or parameters) carry only a value_ref: that points at the array row.
values:
  url_string:
    type: string
    string_semantics: short
  photo_urls:
    type: array
    items:
      value_ref: url_string
  label_id_string:
    type: string
    string_semantics: short
  label_ids:
    type: array
    items:
      value_ref: label_id_string
  user_ref:
    type: entity_ref
    target: User
  assignee_ids:
    type: array
    items:
      value_ref: user_ref
  flag_enum:
    type: select
    allowed_values: [a, b]
  flags_arr:
    type: array
    items:
      value_ref: flag_enum
  instant_rfc3339:
    type: date
    value_format: rfc3339
  dates_arr:
    type: array
    items:
      value_ref: instant_rfc3339
entities:
  Pet:
    fields:
      photoUrls:
        value_ref: photo_urls
        required: true
capabilities:
  message_modify_labels:
    parameters:
      - name: labelIds
        value_ref: label_ids
        required: false
      - name: assignee_ids
        value_ref: assignee_ids
        required: false
      - name: flags
        value_ref: flags_arr
        required: false
      - name: dates
        value_ref: dates_arr
        required: false
Loader constraints: the element values: row must not be type: array or multi_select. For element type: select, allowed_values on that row is required and non-empty. For element type: date, value_format belongs on the element value row. Do not put allowed_values on the array row unless you intend a different shape (arrays of enums use an element select row as above).
multi_select: on the values: row itself, allowed_values is required and must be non-empty (this is not the same as array of select).
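The loader constraints for array rows can be expressed as a small validation pass over the parsed values: mapping. This is a sketch with a hypothetical function name and message wording, not the loader's actual code.

```python
def check_array_rows(values: dict) -> list:
    """Return a list of human-readable problems found on type: array rows."""
    errors = []
    for key, row in values.items():
        if row.get("type") != "array":
            continue
        ref = (row.get("items") or {}).get("value_ref")
        if ref is None:
            errors.append(f"{key}: array row needs items.value_ref")
            continue
        element = values.get(ref)
        if element is None:
            errors.append(f"{key}: items.value_ref '{ref}' is not a values: key")
            continue
        element_type = element.get("type")
        if element_type in ("array", "multi_select"):
            errors.append(f"{key}: element row '{ref}' must not be {element_type}")
        if element_type == "select" and not element.get("allowed_values"):
            errors.append(f"{key}: select element '{ref}' needs non-empty allowed_values")
        if element_type == "date" and "value_format" not in element:
            errors.append(f"{key}: date element '{ref}' needs value_format")
    return errors
```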
CLI Flag Generation
Query subcommands: flags are generated from the capability's parameters: — one flag per declared parameter, typed by the resolved shape from value_ref (same vocabulary as entity fields). No parameters declared = no filter flags (just pagination controls from the CML pagination block). Entity fields do not generate query flags.
| Resolved values[].type | Flag generated | Parser |
|---|---|---|
| string / uuid / date / entity_ref / blob | --param | string |
| integer | --param | i64 |
| number | --param | f64 |
| boolean | --param (flag, no value) | SetTrue |
| select | --param with [possible values: ...] | PossibleValuesParser |
| multi_select / array | --param repeatable | string (Append) |
Relation subcommands: flags are generated from the target entity's query capability parameters: — same rules as query subcommands. If the target has no query capability or no parameters, the relation subcommand shows only pagination flags (if available).
Create / update / action subcommands: flags come from the capability's input_schema (same vocabulary, different purpose — these are write inputs, not query filters).
EntityRef Composition (CLI auto-derived subcommands)
Beyond query filter flags, entity_ref fields drive three additional CLI features:
| Feature | CLI example | How it works |
|---|---|---|
| FK navigation | order 5 pet-id | Subcommand per EntityRef field (when target has Get cap). Resolves to full target entity. |
| Reverse traversal | pet 10 orders | Auto-derived when a query capability on another entity has an entity_ref parameter targeting this entity. Injects petId=10 as predicate. |
| Cross-entity filter | order query with predicate pet.status=available | Dot-path predicates decomposed: push-left (query foreign first, inject IDs) or pull-right (client-side N+1 filter). |
Naming conventions:
- FK navigation: petId → subcommand pet-id (camelCase → kebab-case)
- Reverse traversal: target entity Order → subcommand orders (pluralized lowercase)
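The two naming conventions can be sketched as string transforms. These helpers are hypothetical, and the pluralization is a naive "append s" assumption; the source only shows the Order → orders case and does not describe how irregular plurals are handled.

```python
import re


def fk_subcommand(field_name: str) -> str:
    """camelCase -> kebab-case, e.g. petId -> pet-id."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", field_name).lower()


def reverse_subcommand(entity_name: str) -> str:
    """Lowercase the entity name and append 's' (naive pluralization, an assumption)."""
    return entity_name.lower() + "s"
```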
Authoring for reverse traversal: Ensure the query capability parameter’s value_ref points at a values: row with type: entity_ref whose target matches the entity field’s entity_ref target. The CGS validator checks alignment between those slots.
Capabilities
A capability declares an operation available on an entity.
capabilities:
  <entity>_<operation>:        # unique name, conventionally entity_verb
    kind: <kind>               # see Capability Kinds below
    entity: <EntityName>       # must be a defined entity
    parameters:                # optional, for capabilities with typed params
      - name: <param>
        value_ref: <value_key>
        required: <bool>
        description: <string>  # optional; short human meaning for agents and DOMAIN gloss (see below)
        role: <role>           # optional semantic role: see Parameter Roles below
Wire shape for each parameter is values[value_ref] — same scalar vocabulary as entity field value rows (type, target, allowed_values, items, value_format, string_semantics, …).
description on capability parameters: Optional. When the prompt uses a symbolic PromptRenderMode (compact or tsv, via --symbol-tuning compact|tsv on plasm-mcp / plasm-repl / plasm-eval — not a legacy symbol_tuning: true flag), each parameter gets a p# gloss line in DOMAIN (compact: line before first use; tsv: folded into the teaching table). The gloss shows the parameter type and, after a middle dot (·), either this description (trimmed, possibly truncated) or, if omitted, the wire name from YAML. Use the same style as entity field descriptions: short domain prose, not HTTP or mapping trivia.
Parameter Roles
The optional role: annotation declares the semantic purpose of a parameter. This helps agents and LLM tooling understand how the parameter affects results, beyond just its data type.
| role: | Semantics | Examples |
|---|---|---|
| filter | Equality/range predicate on entity field values (default) | status, archived, due_date_gt |
| search | Free-text relevance query; the server ranks results | q, query, search |
| sort | Sort field selector | order_by, sort_by |
| sort_direction | Ascending/descending companion to sort | sort, direction |
| response_control | Payload shape/detail control; does not filter results | embed, fields, inc, exc |
| scope | Parent-entity pivot wired into the URL path (always entity_ref, required) | team_id, space_id |
role: is informational metadata — it does not change how the parameter is transmitted over HTTP. Transmission is controlled entirely by the CML query: or path: block in mappings.yaml.
Foreign key fields (entity_ref)
Use entity_ref when a field stores another entity’s primary key. Declare the referenced entity in target. The CGS validates that target names a defined entity.
For query capabilities, if a parameter has the same name as an entity field and both are entity_ref, their target values must match. That ties the HTTP/query parameter to the domain FK and enables static reverse-traversal lookup: CGS::find_reverse_traversal_caps("Pet") returns every query capability whose parameters include EntityRef(Pet).
CML does not change: variables (e.g. team_id) are still bound from the compiled environment. Typing is enforced in the CGS only.
Example (two-sided pattern):
values:
  scalar_i64:
    type: integer
  pet_entity_ref:
    type: entity_ref
    target: Pet
entities:
  Order:
    id_field: id
    fields:
      id:
        value_ref: scalar_i64
        required: true
      petId:
        value_ref: pet_entity_ref
        required: false
capabilities:
  order_findByPetId:
    kind: query
    entity: Order
    parameters:
      - name: petId
        value_ref: pet_entity_ref
        required: true
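To make the reverse-traversal lookup concrete, here is a sketch of the selection rule over parsed YAML mappings. reverse_traversal_caps is a hypothetical stand-in, not CGS::find_reverse_traversal_caps itself; it assumes only kind: query capabilities participate and that parameter types resolve through values[value_ref].

```python
def reverse_traversal_caps(values: dict, capabilities: dict, entity: str) -> list:
    """Names of query capabilities with an entity_ref parameter targeting `entity`."""
    hits = []
    for name, cap in capabilities.items():
        if cap.get("kind") != "query":
            continue  # other kinds (including search) are skipped
        for param in cap.get("parameters", []):
            row = values.get(param.get("value_ref"), {})
            if row.get("type") == "entity_ref" and row.get("target") == entity:
                hits.append(name)
                break
    return hits
```

Applied to the example above with entity "Pet", the result would contain order_findByPetId, which is what lets the CLI derive a pet <id> orders subcommand.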
Capability Kinds
| Kind | Semantics | CLI position | Requires ID |
|---|---|---|---|
| query | Filter/list a collection by field predicates | entity query --flags | No |
| search | Full-text relevance search; primary input is a q/query/search param | entity search --flags | No |
| get | Fetch single by ID/key | entity <id> (implicit) | Yes |
| create | Create new entity | entity create --flags | No |
| update | Modify existing entity | entity <id> update --flags | Yes |
| delete | Remove entity | entity <id> delete | Yes |
| action | Any other operation | entity <id> actionName --flags | Yes |
Action output: provides: vs output.side_effect
kind: action must declare how the response is modeled:
- Entity projection: a non-empty provides: lists which entity fields the HTTP response populates (same rules as other kinds that return entity-shaped JSON).
- No projection: the call is effectful (something changes) but the response is empty, opaque, or not mapped onto entity fields. Declare output with type: side_effect and a non-empty description: string that states what changes in the domain (not generic “updates resource”, and not HTTP status or path trivia).
There is no output.type: none in the schema: it invited silent, incomplete modeling. Side-effect actions must always say what they do.
capabilities:
  message_trash:
    description: Move a message to TRASH
    kind: action
    entity: Message
    output:
      type: side_effect
      description: "Moves the message to Trash; the response carries no fields mapped onto this entity."
  page_get_markdown:
    kind: action
    entity: Page
    provides: [id, markdown, truncated]  # projection path: no side_effect block needed
Validation: CGS validate rejects (a) action with neither provides nor output, and (b) side_effect with missing or whitespace-only description.
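The two rejection cases can be sketched as a check over one parsed capability mapping. The function name and messages are hypothetical; this is not the CGS validator, only an illustration of rules (a) and (b).

```python
def check_action(cap: dict) -> list:
    """Return validation problems for a kind: action capability (empty list if fine)."""
    if cap.get("kind") != "action":
        return []
    provides = cap.get("provides") or []
    output = cap.get("output")
    if not provides and output is None:
        return ["action declares neither provides: nor output:"]
    errors = []
    if output is not None and output.get("type") == "side_effect":
        # Whitespace-only descriptions count as missing.
        if not (output.get("description") or "").strip():
            errors.append("side_effect output needs a non-empty description")
    return errors
```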
query vs search: Use query when the API filters by field equality/range predicates (status=available, archived=true). Use search when the primary input is a free-text relevance query (q=pikachu) and results are ranked, not field-filtered. Search capabilities are excluded from reverse-traversal FK lookups (find_reverse_traversal_caps). CLI verb is search not query.
Multiple query capabilities per entity (primary vs named)
An entity can have multiple kind: query (or kind: search) capabilities. The CLI automatically determines which one gets the query/search verb and which get named subcommands:
| Capability shape | CLI position | Example |
|---|---|---|
| No required params (or only optional filters) | entity query --flags (primary) | spell query --level 1 |
| Required params but no role: scope | First one: entity query --flags (primary); others: entity cap-name --flags | pet query --status available (primary), pet findbytags --tags fluffy (named) |
| Required role: scope param | Always named: entity cap-name --scope_param value | spell class-spells --class_index wizard --level 1 |
Detection is automatic from parameter roles — no extra annotation needed. If a query capability has a required role: scope parameter, it always gets a named subcommand. Among non-scoped caps, the parameterless one (or the first with required params) becomes primary.
Validation rule: At most one parameterless (no required params) query/search per entity.
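The detection rule can be sketched for the capabilities of one entity and one kind (query or search). This is an interpretation of the prose, not the CLI's code: the function name is hypothetical, and "first" is assumed to mean declaration order.

```python
def classify_queries(caps: list) -> dict:
    """Map capability name -> 'primary' or 'named'.

    `caps` is a list of (name, parameters) tuples in declaration order.
    """
    def required(params):
        return [p for p in params if p.get("required")]

    def scoped(params):
        return any(p.get("role") == "scope" for p in required(params))

    result = {name: "named" for name, _ in caps}
    unscoped = [(name, params) for name, params in caps if not scoped(params)]
    parameterless = [name for name, params in unscoped if not required(params)]
    if parameterless:
        result[parameterless[0]] = "primary"  # the validator allows at most one
    elif unscoped:
        result[unscoped[0][0]] = "primary"    # first cap with required params
    return result
```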
Required Parameters
When a capability has a parameter with required: true, the CLI enforces it:
values:
  pet_status:
    type: select
    allowed_values: [available, pending, sold]
capabilities:
  pet_findByStatus:
    kind: query
    entity: Pet
    parameters:
      - name: status
        value_ref: pet_status
        required: true  # CLI will reject if --status not provided
CLI behavior:
$ pet query
error: the following required arguments were not provided: --status <status>
$ pet query --status INVALID
error: invalid value 'INVALID' [possible values: available, pending, sold]
$ pet query --status available
(executes)
Relations and Navigation
Relations create navigation subcommands. The target entity's query capability parameters: become filter flags.
# In domain.yaml:
values:
  tag_name:
    type: string
    string_semantics: short
entities:
  Pet:
    relations:
      tags:
        target: Tag
        cardinality: many
  Tag:
    fields:
      name:
        value_ref: tag_name
CLI behavior:
$ pet 10 tags # navigate Pet→Tag relation
$ pet 10 tags --name Fluffy # navigate + filter by Tag's fields
Scoped many-relations (materialize: query_scoped / query_scoped_bindings)
When a REST API uses a sub-resource URL pattern (/parent/{parent_id}/children) or a scoped list query, declare materialize on the many relation so chain traversal knows which target capability parameters to fill from the parent row.
Single scope parameter (query_scoped) — capability names the exact target query / search capability; param is its scope field; the value comes from the parent entity’s id_field (same behavior the runtime historically called “via_param”):
entities:
  Page:
    relations:
      blocks:
        target: Block
        cardinality: many
        materialize:
          kind: query_scoped
          capability: block_children_query
          param: block_id
Multiple scope parameters (query_scoped_bindings) — same required capability, plus map each target capability parameter name to a parent entity field name:
entities:
  Calendar:
    relations:
      events:
        target: Event
        cardinality: many
        materialize:
          kind: query_scoped_bindings
          capability: event_list
          bindings:
            calendarId: id
CLI behavior: scope arguments are not separate flags — they are filled from the parent entity id / fields:
$ page "abc-123" blocks
$ calendar "cal-1" events
REPL / expression syntax:
Page("abc-123").blocks
Page~"Agentium".blocks
Multiline / structured string values in predicates and method arguments use a bash-inspired tagged <<TAG heredoc (not Rust r# strings): <<TAG\n … \nTAG\n with TAG alone on a closing line (trimmed), or TAG) / TAG, / TAG} glued on that line. DOMAIN prompts echo this when string_semantics is not plain short.
Compound entity_ref scope parameters (one param that unpacks to several path/query slots, e.g. repository identity) use runtime scope splat and optional scope_aggregate_key_policy on the capability — distinct from query_scoped_bindings (several named params bound to parent fields).
Multiple projections of the same entity — provides: and auto-resolution¶
When multiple API endpoints return disjoint field subsets of the same logical resource (same id, different fields), model them as one entity with required: false on projection-only fields. Declare provides: on each capability to enable auto-resolution.
values:
nv_page_url:
type: string
string_semantics: short
nv_page_created_time:
type: date
value_format: rfc3339
nv_page_in_trash:
type: boolean
nv_page_markdown:
type: string
string_semantics: markdown
nv_page_truncated:
type: boolean
entities:
Page:
id_field: id
fields:
url:
value_ref: nv_page_url
created_time:
value_ref: nv_page_created_time
in_trash:
value_ref: nv_page_in_trash
markdown:
value_ref: nv_page_markdown
required: false
truncated:
value_ref: nv_page_truncated
required: false
capabilities:
page_get:
kind: get
entity: Page
provides: [id, url, public_url, created_time, last_edited_time, in_trash, archived]
page_get_markdown:
kind: action
entity: Page
provides: [id, markdown, truncated]
provides: declares which entity fields a capability populates in its response. The runtime builds a reverse index (field → capability) and uses it to auto-invoke the correct capability when a projected field is absent from cache.
Auto-resolution in action:
plasm> Page("abc")[markdown]
# "markdown" absent from cache → auto-invokes page_get_markdown("abc")
# additive merge → Page:abc now has markdown + all metadata fields
{"markdown": "# Full page content..."}
provides: defaults when omitted (backward-compatible):
- get / query / search → provides all entity fields (optimistic)
- create / update / delete / action → provides nothing (declare explicitly)
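The reverse index and the defaults listed above can be sketched as follows. This is an illustration only: `effective_provides` and `build_field_providers` are hypothetical names, the index key shape is an assumption, and how the runtime chooses among several providers of the same field is not covered here.

```python
# Kinds whose omitted provides: defaults to "all entity fields".
DEFAULT_ALL_KINDS = {"get", "query", "search"}


def effective_provides(cap: dict, entity_fields: list) -> list:
    """Apply the documented defaults when provides: is omitted."""
    if "provides" in cap:
        return list(cap["provides"])
    if cap["kind"] in DEFAULT_ALL_KINDS:
        return list(entity_fields)  # optimistic default
    return []  # create / update / delete / action


def build_field_providers(capabilities: dict, entities: dict) -> dict:
    """Reverse index: (entity, field) -> capability names that populate it."""
    index = {}
    for name, cap in capabilities.items():
        fields = entities[cap["entity"]]
        for field in effective_provides(cap, fields):
            index.setdefault((cap["entity"], field), []).append(name)
    return index


# Abbreviated version of the Page example above.
entities = {"Page": ["id", "url", "created_time", "in_trash", "markdown", "truncated"]}
capabilities = {
    "page_get": {
        "kind": "get",
        "entity": "Page",
        "provides": ["id", "url", "created_time", "in_trash"],
    },
    "page_get_markdown": {
        "kind": "action",
        "entity": "Page",
        "provides": ["id", "markdown", "truncated"],
    },
}

index = build_field_providers(capabilities, entities)
print(index[("Page", "markdown")])  # ['page_get_markdown']
print(index[("Page", "id")])        # ['page_get', 'page_get_markdown']
```

With the index in place, a cache miss on `markdown` points directly at `page_get_markdown`. An undeclared `kind: query` would list every field under the optimistic default, which is one reason explicit declarations give a more accurate index.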
Recommendation for kind: get: Declare an explicit ordered provides: listing every scalar field the detail response materializes (same names as entities.<Entity>.fields), with id_field first and the rest in the same order as in the entity block. That keeps decode / field_providers accurate and fixes the DOMAIN heading projection list (Entity ;; [f1,…,fN] - …) to that order instead of id_field + lexicographic fallback.
For action, if you rely on the default empty provides, you must add output: { type: side_effect, description: "…" } (see Action output above). Other kinds do not require output when provides is empty unless you add structured output for documentation.
Three-way capability contract — full field-level provenance:
| Annotation | Direction | Meaning |
|---|---|---|
| parameters: | input | What the API endpoint accepts |
| provides: | output | Which entity fields the response populates |
| mutates: | write set | Which entity fields this capability changes (roadmap) |
Recognition: path /resource/{id} + /resource/{id}/suffix; both return same id; disjoint fields.
CML (Capability Mapping Language) — mappings.yaml¶
CML defines how each capability translates to an HTTP request (or GraphQL over HTTP when transport: graphql). It is a declarative template language with no loops: the only conditional is if, with exists, equals, or bool conditions (see below), and evaluation is total.
Structure¶
Each capability name from domain.yaml gets one entry:
<capability_name>:
method: GET|POST|PUT|PATCH|DELETE
path: <path_segments>
query: <cml_expr> # optional
body: <cml_expr> # optional
headers: <cml_expr> # optional
pagination: <pagination_block> # optional; query capabilities only
Pagination (CML) — mappings.yaml only¶
Pagination is transparent in the domain model: domain.yaml still uses kind: query for list capabilities. HTTP pagination is declared only in CML so the execution engine can merge page parameters, decode the configured items path, and loop until completion.
When a mapping includes pagination, plasm-agent adds built-in CLI flags (not from entity fields). --limit and --all are always present; starting-position flags are derived from the pagination.params map (counter / fixed / from_response keys and location) so --help only lists what applies to that capability.
LLM / HTTP MCP execute: paginated queries return one upstream page by default. When more pages exist, the host mints an opaque session handle (pg1, pg2, …) and surfaces has_more plus a compact page(pgN) follow-up (and _meta.plasm.paging when MCP meta is enabled). Models continue with page(pgN) or page(pgN, limit=50); transport-specific param names stay out of the prompt.
| Flag | Effect (CLI) |
|---|---|
| --limit N | Return at most N entities total (may issue multiple upstream requests). |
| --all | Fetch until the API reports no next page (runtime safety cap: 10_000 pages). |
Default when neither is set: first page only (LLM execute matches this unless the model issues page(pgN) continuations).
Pagination block schema (composable PaginationConfig)¶
Rust ground truth: the PaginationConfig type, authored in mappings.yaml under pagination:.
pagination:
location: query # query | body | link_header | block_range
body_merge_path: [variables, o, paginate] # optional; when location: body
response_prefix: [data, issues, pageInfo] # optional; scope for stop_when / from_response
stop_when:
field: hasNextPage
eq: false
params:
offset:
counter: 0
step: 20
limit:
fixed: 20
after:
from_response: endCursor
Decode shape for list bodies remains on the mapping’s response: / decoder (items, items_path, …) — not inside pagination:.
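How the params kinds (counter, fixed, from_response), response_prefix, and stop_when could drive a page loop is sketched below. It is an approximation under assumptions, not the engine's code: `paginate`, `dig`, and the `fetch` callback are hypothetical names, and stopping on an empty page is an extra guard added for the sketch.

```python
def dig(obj, path):
    """Follow a list of keys into nested dicts."""
    for key in path:
        obj = obj[key]
    return obj


def paginate(fetch, config, items_path, limit=None, fetch_all=False, max_pages=10_000):
    params = config["params"]
    counters = {k: v["counter"] for k, v in params.items() if "counter" in v}
    cursors = {}
    out = []
    for _ in range(max_pages):
        # Build this page's parameters from the three param kinds.
        page_params = {}
        for name, spec in params.items():
            if "fixed" in spec:
                page_params[name] = spec["fixed"]
            elif "counter" in spec:
                page_params[name] = counters[name]
            elif name in cursors:
                page_params[name] = cursors[name]
        response = fetch(page_params)
        items = dig(response, items_path)
        out.extend(items)
        if not items:
            return out  # assumption: an empty page ends the loop
        if limit is not None and len(out) >= limit:
            return out[:limit]
        if limit is None and not fetch_all:
            return out  # default: first page only
        scope = dig(response, config.get("response_prefix", []))
        stop = config.get("stop_when")
        if stop and scope.get(stop["field"]) == stop["eq"]:
            return out
        # Advance counters and cursors for the next request.
        for name, spec in params.items():
            if "counter" in spec:
                counters[name] += spec["step"]
            elif "from_response" in spec:
                cursors[name] = scope.get(spec["from_response"])
    return out


def page(nodes, has_next, cursor):
    info = {"hasNextPage": has_next, "endCursor": cursor}
    return {"data": {"issues": {"nodes": nodes, "pageInfo": info}}}


# Canned upstream responses keyed by the "after" cursor (test data only).
PAGES = {None: page([1, 2], True, "c1"), "c1": page([3, 4], True, "c2"), "c2": page([5], False, None)}


def fake_fetch(page_params):
    return PAGES[page_params.get("after")]


config = {
    "location": "body",
    "response_prefix": ["data", "issues", "pageInfo"],
    "stop_when": {"field": "hasNextPage", "eq": False},
    "params": {"first": {"fixed": 2}, "after": {"from_response": "endCursor"}},
}
items_path = ["data", "issues", "nodes"]

print(paginate(fake_fetch, config, items_path))                  # [1, 2]
print(paginate(fake_fetch, config, items_path, limit=3))         # [1, 2, 3]
print(paginate(fake_fetch, config, items_path, fetch_all=True))  # [1, 2, 3, 4, 5]
```

The three calls correspond to the default, `--limit 3`, and `--all` behaviors described earlier.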
location (summary)¶
| location | Role |
|---|---|
| query (default) | Merge params into the query string. |
| body | Merge params under body_merge_path (or top-level JSON body). |
| link_header | Next page from Link: …; rel="next" (Live mode; replay caveats without stored headers). |
| block_range | EVM log ranges (from_block / to_block). |
Inference heuristics (LLM / authoring)¶
| OpenAPI / response signal | Likely pagination.params / location shape |
|---|---|
| Query params offset + limit | Counters + fixed limit, location: query |
| Query param page (no offset) | page counter + optional per_page / size fixed |
| Params cursor, start_cursor, after | from_response continuation fields |
| Params starting_after / ending_before | Keyset-style after / before params |
| Schema Paginated* with count, next, previous, results | Offset/page + response_prefix if nested |
| has_more + data | Often stop_when + from_response on nested pageInfo |
| next_cursor + results | Cursor param + from_response |
| No list pagination parameters | omit pagination |
GraphQL (transport: graphql)¶
GraphQL list capabilities use the same composable pagination: shape as HTTP (see apis/graphqlzero, apis/linear, etc.):
- location: typically body, with variables merged under body_merge_path (e.g. [variables] or [variables, o, paginate]).
- params: maps keys merged at that path, e.g. Relay first / after with { from_response: endCursor }.
- response_prefix: optional path from the root JSON response for stop_when and from_response (e.g. [data, issues, pageInfo]).
- --limit / --all: same CLI behavior as HTTP when pagination is present.
CML object fields: Value::Null keys are omitted at eval time. In eval_cml, when building a type: object, any key whose sub-expression evaluates to Value::Null is not inserted into the parent object. So the common optional pattern type: if / condition: exists / else_expr: { type: const, value: null } produces no key for missing inputs—well-typed omit semantics in the compiled Value, not only on the wire. Nested objects are evaluated recursively with the same rule.
HTTP JSON body: null keys are still stripped before POST (strip_null_fields in plasm-oss/crates/plasm-runtime/src/http_transport.rs) as a safety net for any remaining null in nested JSON (e.g. from non-object paths). Together, CML omit + transport strip match typical partial GraphQL mutation inputs (omit field = leave unchanged). Mappings YAML may still show else: null; that evaluates to null, then the key is dropped at object construction.
Explicit JSON null to clear a field: A key whose value must be a literal null in JSON (e.g. to clear an optional assignee) cannot currently be expressed through a CML object, because a Value::Null entry is omitted. A future extension could add a dedicated CML/Value form for explicit null. Replay is consistent with the omit rule: RequestFingerprint hashes the compiled CompiledRequest body, which after object-omit no longer carries null entries for omitted optionals, so the fingerprint matches what goes on the wire.
Query result hydration (runtime + plasm-agent)¶
This is not part of CML or domain.yaml. After a query succeeds, if the CGS defines a get capability on the same entity as the query’s entity, the runtime defaults to:
- Merging decoded list rows into GraphCache as completeness: summary.
- For each returned Ref, issuing the get mapping (concurrent, up to ExecutionConfig::hydrate_concurrency, default 5) unless the cache already holds complete for that ref.
- Merging GET responses as complete and returning entities in query result order.
Opt out (list-shaped output only):
- plasm-agent: --summary on entity query … and on relation subcommands that dispatch a target QueryExpr. The flag exists only when the queried entity has a get capability.
- IR / programmatic: QueryExpr.hydrate = Some(false) for one query; or ExecutionConfig.hydrate = false for the whole engine.
When hydration does not run: the entity has query but no get mapping (nothing to upgrade with).
Interaction with pagination: pagination collects the ordered list of refs first; hydration runs after pages are merged (same concurrency and skip rules per row).
Cache semantics: CachedEntity.completeness is summary or complete. Merge never overwrites complete with summary. execute_get returns a cache hit only for complete rows; summary forces a GET so pet 10 after pet query still deepens the payload.
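The merge and cache-hit rules above can be modeled in a few lines. This is a sketch, not the runtime's GraphCache: the class below is a hypothetical stand-in, and the choice to let a summary row only fill in missing fields on a complete row is an assumption about how "never overwrites" applies to field values.

```python
class GraphCacheSketch:
    """Toy model of summary/complete merge semantics (illustrative only)."""

    def __init__(self):
        self.rows = {}  # ref -> {"fields": dict, "completeness": str}

    def merge(self, ref, fields, completeness):
        row = self.rows.get(ref)
        if row is None:
            self.rows[ref] = {"fields": dict(fields), "completeness": completeness}
        elif row["completeness"] == "complete" and completeness == "summary":
            # A summary never downgrades a complete row; it only fills gaps.
            for key, value in fields.items():
                row["fields"].setdefault(key, value)
        else:
            row["fields"].update(fields)
            row["completeness"] = completeness

    def get_hit(self, ref):
        """Return cached fields only for complete rows, else None."""
        row = self.rows.get(ref)
        if row is not None and row["completeness"] == "complete":
            return row["fields"]
        return None


cache = GraphCacheSketch()
cache.merge("Pet:10", {"id": 10, "name": "Rex"}, "summary")
print(cache.get_hit("Pet:10"))  # None: a summary row forces a GET

cache.merge("Pet:10", {"id": 10, "name": "Rex", "status": "available"}, "complete")
cache.merge("Pet:10", {"name": "stale"}, "summary")
print(cache.get_hit("Pet:10"))  # {'id': 10, 'name': 'Rex', 'status': 'available'}
```

The first lookup misses even though the row exists, which mirrors why `pet 10` after `pet query` still issues a GET.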
Path Segments¶
An ordered list of literal strings and variable references:
path:
- type: literal
value: pet # → /pet
- type: var
name: id # → /pet/{id}
- type: literal
value: uploadImage # → /pet/{id}/uploadImage
CML Expressions¶
Variable reference¶
type: var
name: <variable_name>
Constant¶
type: const
value: <any_value>
Object (key-value pairs)¶
type: object
fields:
- - key_name
- type: var
name: value_var
- - another_key
- type: const
value: fixed_value
Conditional (if)¶
type: if
condition:
type: exists
var: <variable_name>
then_expr: <cml_expr>
else_expr: <cml_expr>
Conditions (CmlCond in plasm-cml): exists (variable bound), equals (compare two expressions), bool (truthy eval of a sub-expression). Prefer exists for optional query params; use equals / bool when the API needs explicit sentinels or flags.
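The expression forms shown so far (var, const, object, and if with an exists condition) can be modeled with a short recursive evaluator. This is a sketch for intuition, not the plasm-cml implementation: the Python function `eval_cml` below only borrows the name, unbound variables evaluating to None is an assumption, and equals / bool are left out because their YAML field names are not shown in this section.

```python
def eval_cml(expr, env):
    """Evaluate a CML-like expression tree against an environment dict."""
    kind = expr["type"]
    if kind == "const":
        return expr["value"]
    if kind == "var":
        return env.get(expr["name"])  # assumption: unbound -> None
    if kind == "object":
        out = {}
        # fields is a list of [key, sub_expression] pairs, as in the YAML.
        for key, sub in expr["fields"]:
            value = eval_cml(sub, env)
            if value is not None:  # null-valued keys are omitted
                out[key] = value
        return out
    if kind == "if":
        cond = expr["condition"]
        if cond["type"] != "exists":
            raise NotImplementedError("only 'exists' is modeled in this sketch")
        branch = "then_expr" if cond["var"] in env else "else_expr"
        return eval_cml(expr[branch], env)
    raise ValueError(f"unknown expression type: {kind}")


body = {
    "type": "object",
    "fields": [
        ["name", {"type": "var", "name": "name"}],
        ["assignee", {
            "type": "if",
            "condition": {"type": "exists", "var": "assignee"},
            "then_expr": {"type": "var", "name": "assignee"},
            "else_expr": {"type": "const", "value": None},
        }],
    ],
}

print(eval_cml(body, {"name": "Fido"}))                    # {'name': 'Fido'}
print(eval_cml(body, {"name": "Fido", "assignee": "ann"}))  # {'name': 'Fido', 'assignee': 'ann'}
```

The first call shows the optional pattern: the else branch yields null, so the `assignee` key never enters the object.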
Array join (CSV / pipe serialisation)¶
Join an array variable into a single delimited string. Use when the API expects a comma-separated or pipe-separated list rather than repeated query keys.
type: join
sep: "," # separator (use "|" for pipe-delimited)
expr:
type: var
name: genres # must resolve to Value::Array
In the query: block:
# Emits ?genres=1,2,3 (CSV)
query:
type: object
fields:
- - genres
- type: join
sep: ","
expr: { type: var, name: genres }
# Emits ?ids=1|2|3 (pipe)
- - ids
- type: join
sep: "|"
expr: { type: var, name: ids }
Repeated-key arrays (?embed=a&embed=b): Use a plain Var without join. The HTTP execution layer automatically expands Value::Array query param values into repeated key=value pairs:
# Emits ?embed=cast&embed=episodes
- - embed
- type: var
name: embed
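The difference between a joined value and a repeated-key array is easy to see once the query is serialized. The sketch below uses Python's standard `urllib.parse.urlencode` as a stand-in for the HTTP execution layer; `join_values` and `to_query_string` are hypothetical helpers, and keeping `,` and `|` unescaped is an assumption made so the output matches the comments in the YAML above.

```python
from urllib.parse import urlencode


def join_values(values, sep):
    """Model of the join expression: array -> single delimited string."""
    return sep.join(str(v) for v in values)


def to_query_string(params):
    # doseq=True expands list values into repeated key=value pairs.
    return urlencode(params, doseq=True, safe=",|")


query = {
    "genres": join_values([1, 2, 3], ","),  # joined: one key, CSV value
    "ids": join_values([1, 2, 3], "|"),     # joined: one key, pipe value
    "embed": ["cast", "episodes"],          # plain array: repeated keys
}

print(to_query_string(query))
# genres=1,2,3&ids=1|2|3&embed=cast&embed=episodes
```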
Variable Resolution¶
The execution engine populates the CML environment before template evaluation:
| Operation | Variables set |
|---|---|
| Query | filter (compiled BackendFilter), each predicate field=value pair (e.g. status="available"), projection |
| Get | id, plus all path var names from the CML template set to the ID value |
| Create | input (Value::Object from CLI flags) |
| Delete | id, plus all path var names |
| Update/Action | id, path var names, input |
This means: if the spec uses {petId} in the path, the CML template should use name: id (normalized) OR name: petId (the engine sets both).
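The Get row of the table can be sketched as a small environment builder. This is illustrative only: `build_get_env` is a hypothetical name, and the segment dicts mirror the Path Segments YAML shape.

```python
def build_get_env(path_segments, entity_id):
    """Set id, plus every path variable name in the template, to the ID value."""
    env = {"id": entity_id}
    for segment in path_segments:
        if segment["type"] == "var":
            env[segment["name"]] = entity_id
    return env


# A template that kept the spec's {petId} name instead of the normalized id.
path = [
    {"type": "literal", "value": "pet"},
    {"type": "var", "name": "petId"},
]

print(build_get_env(path, "10"))  # {'id': '10', 'petId': '10'}
```

Either variable name resolves to the same value, so a template written against the spec's name and one written against the normalized name produce the same path.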
Compilation: CML → HTTP Request¶
CML template + environment variables
↓ eval_path_segment() per segment
→ URL path string
↓ eval_cml() on query expr
→ URL query parameters
↓ eval_cml() on body expr (when body_format is json or form_urlencoded)
→ JSON or scalar-map request body
↓ eval_cml() on each multipart.parts[].content (when body_format is multipart)
→ compiled multipart parts (null parts omitted)
↓ assemble
→ CompiledRequest { method, path, query, body, body_format, multipart, headers }
The compiled request is deterministic: same template + same env = same HTTP request. This enables blake3 fingerprinting for record/replay.
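The determinism claim can be demonstrated with a reduced model of compilation. Several things here are assumptions for the sake of a runnable sketch: the query mapping is simplified to key → variable name instead of a full CML expression, the function names are hypothetical, and SHA-256 from the standard library stands in for blake3, which is not part of Python's standard library.

```python
import hashlib
import json


def eval_path(segments, env):
    """Join literal and variable segments into a URL path."""
    parts = []
    for segment in segments:
        if segment["type"] == "literal":
            parts.append(segment["value"])
        else:
            parts.append(str(env[segment["name"]]))
    return "/" + "/".join(parts)


def compile_request(mapping, env):
    """Reduced CompiledRequest: method, path, and query only."""
    query = {k: env[v] for k, v in mapping.get("query", {}).items() if v in env}
    return {
        "method": mapping["method"],
        "path": eval_path(mapping["path"], env),
        "query": query,
    }


def fingerprint(request):
    """Hash a canonical JSON encoding (SHA-256 as a stand-in for blake3)."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


mapping = {
    "method": "GET",
    "path": [
        {"type": "literal", "value": "pet"},
        {"type": "literal", "value": "findByStatus"},
    ],
    "query": {"status": "status"},  # simplified: query key -> env variable
}

first = compile_request(mapping, {"status": "available"})
second = compile_request(mapping, {"status": "available"})
other = compile_request(mapping, {"status": "sold"})

print(first["path"])                             # /pet/findByStatus
print(fingerprint(first) == fingerprint(second))  # True
print(fingerprint(first) == fingerprint(other))   # False
```

Sorting keys before hashing matters: without a canonical encoding, two equal requests could serialize differently and break replay lookups.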
Example: Full Mapping¶
OpenAPI endpoint:
GET /pet/findByStatus?status=available
CML mapping:
pet_findByStatus:
method: GET
path:
- type: literal
value: pet
- type: literal
value: findByStatus
query:
type: object
fields:
- - status
- type: var
name: status
When the user runs pet query --status available:
1. Predicate: status = "available"
2. CML env: {status: "available"}
3. Path: /pet/findByStatus
4. Query: ?status=available
5. HTTP: GET /pet/findByStatus?status=available
Example: Path Variable¶
OpenAPI endpoint:
DELETE /pet/{petId}
CML mapping:
pet_delete:
method: DELETE
path:
- type: literal
value: pet
- type: var
name: id
When the user runs pet 10 delete:
1. Entity ID: "10"
2. CML env: {id: "10", petId: "10"} (engine sets both)
3. Path: /pet/10
4. HTTP: DELETE /pet/10
Example: Request Body¶
OpenAPI endpoint:
POST /pet (body: Pet schema)
CML mapping:
pet_create:
method: POST
path:
- type: literal
value: pet
body:
type: var
name: input
When the user runs pet create --name Fido --status available:
1. Input: {name: "Fido", status: "available"}
2. CML env: {input: {name: "Fido", status: "available"}}
3. Body: {"name": "Fido", "status": "available"}
4. HTTP: POST /pet with JSON body
Request body formats (body_format)¶
Default is json: body: is evaluated to a Plasm Value and sent as application/json (nulls stripped on the wire).
form_urlencoded: body: must evaluate to a flat object of string/number/bool fields; the runtime sends application/x-www-form-urlencoded.
multipart: do not set body:. Instead set multipart: with a parts: list. Each part has:
- name: form field name (required).
- file_name: optional Content-Disposition filename (typical for file parts).
- content_type: optional MIME for the part (JSON object/array parts default to application/json when omitted).
- content: a CML expression evaluated like body: fields. If it evaluates to null, the part is omitted (optional metadata).
File bytes: evaluate content to an attachment-shaped JSON object with reserved __plasm_attachment and non-empty bytes_base64 (same shape as decoded HTTP binary and CGS blob fields). URI-only attachments are rejected for outbound multipart. In domain.yaml, model the slot as type: blob when you want strict typing; type: json is also accepted for attachment-shaped values (e.g. to keep DOMAIN prompts minimal in small demo catalogs).
Example (OpenAPI-style upload + optional string field):
body_format: multipart
multipart:
parts:
- name: additionalMetadata
content:
type: if
condition: { type: exists, var: additionalMetadata }
then_expr: { type: var, name: additionalMetadata }
else_expr: { type: const, value: null }
- name: file
file_name: upload.png
content:
type: var
name: file
Rust ground truth: HttpBodyFormat, MultipartBodySpec, wire build in http_transport.rs.
Authentication¶
Declare authentication once at the top level of domain.yaml under the auth: key. For public HTTP APIs (no outbound credentials), use scheme: none so tooling can tell intentional “no auth” from a missing block. Omitting auth entirely is still accepted for backward compatibility but is ambiguous for UX (tool-model cannot distinguish “public” from “not yet modeled”). Credential-bearing schemes read secrets at request time from environment variables or hosted KV via SecretProvider. No secrets are stored in schema files.
Constraint: auth: { scheme: none } cannot be combined with a top-level oauth: block (OAuth implies delegated auth).
Supported schemes¶
| Scheme | YAML scheme: value | Injected as | Env var fields |
|---|---|---|---|
| No outbound credentials (public API) | none | (nothing) | (none) |
| Static API key in a header | api_key_header | <header>: <value> | header, env |
| Static API key in query param | api_key_query | ?<param>=<value> | param, env |
| Bearer token | bearer_token | Authorization: Bearer <token> | env |
| OAuth 2.0 client credentials | oauth2_client_credentials | Authorization: Bearer <token> (token cached + auto-refreshed) | token_url, client_id_env, client_secret_env, scopes (optional) |
Examples¶
# Public / open HTTP API (e.g. PokéAPI, D&D 5e)
auth:
scheme: none
# API key sent as a query parameter (e.g. RAWG)
auth:
scheme: api_key_query
param: key # query param name
env: RAWG_API_KEY # name of the env var holding the secret
# API key sent as a query parameter with a different param name (e.g. OMDb)
auth:
scheme: api_key_query
param: apikey
env: OMDB_API_KEY
# API key sent as a query parameter for NYTimes
auth:
scheme: api_key_query
param: api-key
env: NYTIMES_API_KEY
# Bearer token (e.g. ClickUp personal API token, Notion, Tavily)
auth:
scheme: bearer_token
env: CLICKUP_API_TOKEN
# Static API key in a named header
auth:
scheme: api_key_header
header: X-Api-Key
env: MY_SERVICE_API_KEY
# OAuth 2.0 client credentials (e.g. Spotify)
auth:
scheme: oauth2_client_credentials
token_url: https://accounts.spotify.com/api/token
client_id_env: SPOTIFY_CLIENT_ID
client_secret_env: SPOTIFY_CLIENT_SECRET
scopes:
- user-read-private # optional; omit if not needed
How auth injection works¶
Auth is injected before CML-declared headers: so that per-capability mappings can override credentials if ever needed. Pagination continuation requests (Link header follow-ups) receive the same credentials automatically.
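The ordering described above (auth first, CML headers second) can be sketched as a plain dictionary merge. This is a model of the idea rather than the runtime's code: `auth_parts` and `build_headers` are hypothetical names, the secret is passed in directly instead of being read through SecretProvider, and oauth2_client_credentials is omitted.

```python
def auth_parts(auth, secret):
    """Return (headers, query) contributed by a static auth scheme."""
    scheme = auth["scheme"]
    if scheme == "none":
        return {}, {}
    if scheme == "api_key_header":
        return {auth["header"]: secret}, {}
    if scheme == "api_key_query":
        return {}, {auth["param"]: secret}
    if scheme == "bearer_token":
        return {"Authorization": f"Bearer {secret}"}, {}
    raise ValueError(f"scheme not modeled in this sketch: {scheme}")


def build_headers(auth, secret, cml_headers):
    headers, _query = auth_parts(auth, secret)
    # CML-declared headers are applied last, so a mapping can override auth.
    headers.update(cml_headers)
    return headers


bearer = {"scheme": "bearer_token", "env": "CLICKUP_API_TOKEN"}
print(build_headers(bearer, "tok", {}))
# {'Authorization': 'Bearer tok'}
print(build_headers(bearer, "tok", {"Authorization": "Bearer override"}))
# {'Authorization': 'Bearer override'}
```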
For oauth2_client_credentials, the runtime:
1. Checks a per-AuthResolver in-memory cache (tokio::sync::RwLock<Option<CachedToken>>).
2. If the cached token is still valid (with a 30-second safety margin), uses it directly.
3. Otherwise exchanges client_id + client_secret for a fresh token via POST token_url, caches it, then proceeds.
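The three steps above amount to a cache with an early-expiry margin. A minimal single-threaded sketch follows; `TokenCache`, the `fetch_token` callback, and the injectable clock are hypothetical names chosen for the example, and the real resolver guards its cache with an async lock, which this sketch does not model.

```python
import time


class TokenCache:
    SAFETY_MARGIN = 30.0  # seconds before expiry at which the token is refreshed

    def __init__(self, fetch_token, clock=time.monotonic):
        # fetch_token() -> (access_token, expires_in_seconds); it stands in
        # for the POST to token_url with client_id + client_secret.
        self._fetch = fetch_token
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is not None and now < self._expires_at - self.SAFETY_MARGIN:
            return self._token  # still valid, with margin to spare
        token, expires_in = self._fetch()
        self._token = token
        self._expires_at = now + expires_in
        return token


# Usage with a fake clock and a counting fetcher (test doubles only).
now = [0.0]
calls = []


def fake_fetch():
    calls.append(1)
    return f"token-{len(calls)}", 3600


cache = TokenCache(fake_fetch, clock=lambda: now[0])
print(cache.get())  # token-1 (fetched)
now[0] = 3500.0
print(cache.get())  # token-1 (cached: 3500 < 3600 - 30)
now[0] = 3580.0
print(cache.get())  # token-2 (inside the 30-second margin, refreshed)
```

The margin means a token is never sent in the last 30 seconds of its lifetime, which avoids races where it expires between the cache check and the upstream request.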
Runtime extension¶
The SecretProvider trait in plasm-runtime::auth is dyn-compatible. To use a secret store other than env vars, implement SecretProvider and pass it to AuthResolver::new(scheme, Arc::new(my_provider)).
Execution Pipeline¶
CLI args
→ clap parses typed flags (rejects invalid values/types/missing required)
→ dispatch builds Expr (Query/Get/Create/Delete/Invoke)
→ type_check_expr validates against CGS
→ normalize predicate (flatten, DeMorgan, dedup)
→ compile predicate to BackendFilter
→ populate CML environment
→ eval CML template → CompiledRequest
→ execute HTTP (live/replay/hybrid)
→ normalize response (bare array → {results: [...]})
→ decode response via schema-driven decoder (fields from CGS entity)
→ merge decoded entities into graph cache (stable Ref identity)
→ after query, optional concurrent GET per row when entity has get (unless --summary / QueryExpr.hydrate == Some(false))
→ format output (json/table/compact)
Per compiled capability, the same CGS + CML + input yields the same primary HTTP request (fingerprint-based replay). Pagination and hydration add further requests whose count depends on result size, cache state, and flags — each follow-up request is still compiled and replayed like any other GET.