RTK v1 — Relational Tag Kernel (Development Mode)
Plain Text · Safe ASCII · JSON Lines inside .txt · No emojis

0.0 Purpose
Build the static tagging kit for one agent during development. Inputs are sanitized Kernel (Top-5 levers) and sanitized Shell (Fuel A–I). Outputs are three plain-text JSONL artifacts the deployed agent will later use:
• agent_tag_list.txt (≈12,000 tags)
• agent_edges.txt (≈75,000 edges)
• agent_mini_edges.txt (≈15,000 edges; curated backbone)
Counts may scale by domain breadth (narrow ↓, broad ↑).

0.1 Principles
• Kernel-first fidelity; Shell for operational breadth.
• Proportional tag allocation (baseline): Kernel ≈25%; ENG+VIEW ≈40%; CONCEPT+curated FRAG ≈35%; QA_FLAG/XWMAP contribute 0 unless recurring risk category emerges.
• Discrete edge strengths only: very_low | low | avg | high | very_high.
• Deterministic, reproducible, safe ASCII; JSONL; LF newlines.
• Dev-time only; runtime tagging is governed by CIT; memory I/O by CMAP.

0.2 Minimal Config Defaults (override per domain)
MAX_BYTES_PER_FILE: 8_000_000
MAX_LINES_PER_FILE: 50_000
SOFT_WARN_BYTES: 4_000_000
SOFT_WARN_LINES: 25_000
ANOMALY_PCT_PT: 5
DEGREE_P95_MULTIPLIER: 1.25
MIN_RETENTION_DAYS: 540
ARCHIVE_ROOT: /rtk_archive
ARCHIVE_VALIDATE_ON_START: true
CI_ENFORCE_ARCHIVE: false (dev), true (release)
RETENTION_ENFORCER_CRON: 0 2 * * *
INVALID_BYTE_ALERT_THRESHOLD: 3
Required Shell set (ASCII .txt): Spine, FRAG, CONCEPT, EDGE, BRIDGE, ENG, XWMAP, VIEW_SCENARIO, QA_FLAG, IMPL.

— ORDER OF OPERATIONS (ALPHA–NUMERIC; SMART HALTS) —

A.0.0 Session Start & Inputs (LOCK)
A.1.0 Operator uploads sanitized Kernel and Shell once (plain .txt).
A.2.0 Preflight: if ARCHIVE_VALIDATE_ON_START=true, create archive_path = ARCHIVE_ROOT/YYYYMMDD_HHMMSS_run_id; verify write access; capture env_fingerprint; start manifest.json.
A.2.1 Record file names, sizes, and line counts; apply SOFT_WARN_* and MAX_* thresholds. Hard-limit rejections are logged.
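The threshold step in A.2.1 could be sketched roughly as follows. This is only an illustration: the function name classify_input and the result labels are hypothetical, and the choice to treat the limits as inclusive (a file exactly at MAX_* is still accepted) is an assumption, since the text above does not say.

```python
# Defaults copied from section 0.2; override per domain.
MAX_BYTES_PER_FILE = 8_000_000
MAX_LINES_PER_FILE = 50_000
SOFT_WARN_BYTES = 4_000_000
SOFT_WARN_LINES = 25_000


def classify_input(size_bytes: int, line_count: int) -> str:
    """Return 'reject', 'warn', or 'ok' for one uploaded file.

    Assumption: limits are inclusive, so only values strictly above
    a threshold trigger the warning or the rejection.
    """
    if size_bytes > MAX_BYTES_PER_FILE or line_count > MAX_LINES_PER_FILE:
        return "reject"  # hard limit exceeded; caller logs the rejection
    if size_bytes > SOFT_WARN_BYTES or line_count > SOFT_WARN_LINES:
        return "warn"    # soft limit exceeded; file is still ingested
    return "ok"
```

A caller would presumably record the returned label next to the file name, size, and line count so the manifest summary can show it at the first SMART HALT.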
A.2.2 Write ingestion_report.jsonl immediately (one JSON per file) under archive_path/.
A.3.0 SMART HALT — Show manifest summary and archive_path. “CONTINUE to normalize; AGAIN to reprint manifest.”

B.0.0 Normalization & Working Index
B.1.0 Stream-parse inputs; enforce ASCII 32–126; LF only; reject BOM, tabs, NBSP, smart quotes, zero-width; record invalid_bytes offsets per file.
B.1.1 If repeated invalid byte offsets across runs ≥ INVALID_BYTE_ALERT_THRESHOLD, append governance event “repeated_invalid_bytes” to governance_event_log.jsonl (archive_path/).
B.2.0 Build working index: sections, IDs, lever map, metrics, mechanisms, rules, flows, outcomes.
B.3.0 No content dedup across files (handled upstream); normalize only names/IDs.
B.4.0 SMART HALT — Report normalization stats (counts per file, rejected items). “CONTINUE to seed candidates; AGAIN to fix and re-run.”

C.0.0 Candidate Tag Harvest (Over-generate Pool)
C.1.0 Kernel-first harvest: mechanisms, subcapabilities, metrics, guardrails, counterfactuals, microtests, SOP families, decisions, artifacts, risks, evidence patterns.
C.2.0 Shell harvest priority: ENG + VIEW → CONCEPT → curated FRAG (high-signal only).
C.3.0 Naming: snake_case; ASCII; short/specific; no verbs unless families (decision_*, sop_*, microtest_*). Create aliases where obvious; record parents/related.
C.4.0 Over-gen pool target before pruning: narrow 7–9k; broad 9–12k.
C.5.0 Determinism: stable iteration order seeded by run_id; no nondeterministic sampling.
C.7.0 SMART HALT — Emit harvest stats by source (counts + examples). “CONTINUE to prune; AGAIN to expand thin sources.”

D.0.0 Pruning, Dedupe, Allocation, Coverage
D.1.0 Alias registry: normalize and enforce injective mapping; alias != canonical; aliases unique within and across families.
D.1.1 On collision: SMART HALT; print collision table; write event to governance_event_log.jsonl; require operator resolution before proceeding.
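The alias rules in D.1.0 and D.1.1 might be checked along these lines. This is a minimal sketch under stated assumptions: the registry is taken to be a plain dict from canonical tag name to a list of aliases, and the function name and reason labels are hypothetical rather than part of RTK.

```python
from collections import defaultdict


def find_alias_collisions(registry):
    """Return a sorted collision table for an alias registry.

    registry: dict mapping canonical tag name -> list of aliases
    (assumed shape). Each row is (alias, reason, canonicals).
    """
    canonicals = set(registry)
    owners = defaultdict(list)
    # Iterate in sorted order so the table is deterministic.
    for canonical in sorted(registry):
        for alias in registry[canonical]:
            owners[alias].append(canonical)

    collisions = []
    for alias in sorted(owners):
        holders = owners[alias]
        if alias in canonicals:
            # Violates "alias != canonical".
            collisions.append((alias, "alias_equals_canonical", holders))
        if len(holders) > 1:
            # Violates injectivity: one alias claimed more than once.
            collisions.append((alias, "alias_not_unique", holders))
    return collisions
```

An empty return value would correspond to collisions=0; any rows would be printed at the SMART HALT and logged before the operator resolves them.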
D.2.0 Enforce proportional allocation of final tag set (baseline in 0.1).
D.3.0 Coverage gates (must pass):
• Each lever has ≥1 lever_* and ≥3 subcap_* tags.
• Each lever has ≥1 metric_* and ≥1 guardrail_* tied to it.
• ENG has ≥1 rule, ≥1 diag, ≥1 flow, ≥1 outcome represented in tags.
• VIEW contributes ≥3 role/entry tags mapped to lever contexts.
D.4.0 Prepare internal “Ready-12000” tag list (not emitted yet).
D.5.0 Determinism: sort Ready-12000 by tag_id.
D.6.0 SMART HALT — Print coverage dashboard (counts per family, gaps). “CONTINUE to build edges; AGAIN to adjust allocation.”

E.0.0 Edge Synthesis (agent_edges.txt)
E.1.0 Shell EDGE mapping: map to tag names; default strength=avg; escalate to high when corroborated by ≥2 independent signals (e.g., EDGE + KMECH, or EDGE + ENG/VIEW), or when EDGE marks explicit “strong” evidence; demote to low if ambiguous.
E.2.0 Kernel ↔ ENG/VIEW: add when a lever mechanism grounds rules/flows/outcomes; default strength=avg; escalate to high if mechanism-explicit and independently referenced in Shell; keep rationale short.
E.3.0 Conceptual cohesion (CONCEPT↔CONCEPT): frequent co-occurrence without causality; strength=low or avg; rationale ≤8 words.
E.4.0 Curated FRAG hints: only on recurring cross-section patterns; strength=very_low or low.
E.5.0 Hygiene: no self-loops; dedupe (source, relation, target); endpoints must exist in Ready-12000.
E.6.0 Totals: target ≈75,000 edges; prefer depth on causal chains over breadth of weak ties.
E.6.1 Rationale grammar (mandatory): each edge rationale begins with “SRC:” and one token in {EDGE|KMECH|ENG|VIEW|FRAGr}, then a concise reason (ASCII, ≤64 chars).
E.7.0 SMART HALT — Anomaly scan: compute strength distribution and degree hubs; flag nodes > P95 * DEGREE_P95_MULTIPLIER and tier-share deviations > ANOMALY_PCT_PT percentage points.
• Persist anomalies.jsonl under archive_path/ with SHA256 anomaly_id.
• For each anomaly, require disposition: {acknowledged|downgraded|removed} plus kanban_id and closure_evidence_ref.
• If CI_ENFORCE_ARCHIVE=true and any disposition=pending, block emission.

F.0.0 Backbone Extraction (agent_mini_edges.txt)
F.1.0 Extract shortest useful paths linking lever_* → ENG outcomes via rules/flows, plus essential Concept hubs.
F.2.0 Chains 2–5 hops; prefer high/avg; admit low only if it completes a critical path.
F.3.0 Target ≈15,000 edges; ensure every lever has ≥3 distinct paths to outcomes; adjust by domain breadth.
F.4.0 SMART HALT — Backbone coverage (paths per lever, orphans). “CONTINUE to QA; AGAIN to adjust density.”

G.0.0 Structural QA & CI Hooks
G.1.0 Enforce output line templates and ASCII rules (see K.*).
G.2.0 Validate: JSONL parses; allowed categories/relations/strengths; unique canonical names; fixed key order; no tabs/CRLF/NBSP/smart quotes/zero-width; no trailing spaces; rationale “SRC:” grammar passes. Log lint to archive_path/ci/rationale_lint_report.txt.
G.3.0 Re-assert alias registry health; collisions=0.
G.4.0 Recheck allocations within ±2% and coverage gates.
G.5.0 SMART HALT — On failure, list issues; “AGAIN to attempt auto-fixes (rename/alias/trim) or OPERATOR FIX then CONTINUE.”
G.6.0 CI hooks (write under archive_path/ci/):
• rtk_rationale_lint → rationale_lint_report.txt
• rtk_edges_enum_check → edges_enum_report.txt
• rtk_alias_edge_check → alias_edge_report.txt
• archive_integrity_check (calls govctl index-verify) → archive_integrity_report.txt
• retention_guard --dry-run → retention_guard_report.txt
• Summarize pass/fail in ci/summary.json; if CI_ENFORCE_ARCHIVE=true and any check fails, block emission.

H.0.0 Emission Protocol (chunked streaming + determinism)
H.1.0 Canonical sort before emission: tags by tag_id; edges by (source, relation, target).
H.2.0 Emit agent_tag_list.txt in parts of 250–400 lines per message. Begin: “PART X/N — agent_tag_list.txt”. End each with “[MORE_AVAILABLE: yes|no]”.
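The PART/MORE_AVAILABLE framing described in H.2.0 could be sketched as below. The helper name emit_parts and its part_size parameter are hypothetical; the sketch also assumes the final part is allowed to fall short of 250 lines when the total does not divide evenly, which the text does not state.

```python
def emit_parts(filename, lines, part_size=300):
    """Yield one message string per part for an already-sorted list of lines.

    Assumption: part_size must sit in the 250-400 window from H.2.0;
    only the last part may be shorter.
    """
    if not 250 <= part_size <= 400:
        raise ValueError("part_size outside the 250-400 window")
    # Ceiling division; an empty file still produces a single part.
    total = max(1, -(-len(lines) // part_size))
    for i in range(total):
        chunk = lines[i * part_size:(i + 1) * part_size]
        more = "yes" if i + 1 < total else "no"
        # \u2014 is the dash character used in the H.2.0 header text.
        header = f"PART {i + 1}/{total} \u2014 {filename}"
        yield "\n".join([header, *chunk, f"[MORE_AVAILABLE: {more}]"])
```

Because the generator yields one part at a time, the operator's CONTINUE command maps naturally onto pulling the next item from it.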
H.3.0 Emit agent_edges.txt in 400–600 lines per message, same PART/MORE_AVAILABLE convention.
H.4.0 Emit agent_mini_edges.txt likewise.
H.5.0 Archive set: write index.json listing filenames, byte sizes, sha256 checksums; write ci/summary.json with archive_integrity_ok; persist manifest.json, ingestion_report.jsonl, anomalies.jsonl (if any), governance_event_log.jsonl, review_meta.json (if present), review_minutes.md (if present), kanban_snapshot.json (if present).
H.6.0 Retention policy: ensure MIN_RETENTION_DAYS backed by RETENTION_ENFORCER_CRON; store retention_guard_report.txt; include access policy note in manifest.json.
H.7.0 Final line after last part: “FILES READY: agent_tag_list.txt | agent_edges.txt | agent_mini_edges.txt”. SMART HALT — Await Operator confirmation.

I.0.0 Operator Controls
I.1.0 CONTINUE — advance to next alpha block or next emission part when [MORE_AVAILABLE: yes].
I.2.0 AGAIN — re-run current block (e.g., rebuild edges or lint) without losing prior inputs.
I.3.0 RTK never requests re-uploads within the same session; it reuses the locked working index.

J.0.0 Artifacts (Deliverables under archive_path/)
J.1.0 agent_tag_list.txt — ≈12,000 tag objects (JSONL).
J.2.0 agent_edges.txt — ≈75,000 edge objects (JSONL).
J.3.0 agent_mini_edges.txt — ≈15,000 edge objects (JSONL).
J.4.0 index.json, manifest.json, ingestion_report.jsonl, anomalies.jsonl (if any), governance_event_log.jsonl, review_meta.json (if any), review_minutes.md (if any), kanban_snapshot.json (if any), ci/summary.json, ci/rationale_lint_report.txt, ci/edges_enum_report.txt, ci/alias_edge_report.txt, ci/archive_integrity_report.txt, ci/retention_guard_report.txt.

K.0.0 Output Line Templates & ASCII Rules (binding)
K.1.0 Encoding: UTF-8; safe ASCII 32–126; LF only. No emojis, smart quotes, NBSP, BOM, zero-width; no tabs; no trailing spaces.
K.2.0 JSONL only; one object per line; fixed key order; numbers as JSON numbers; booleans true/false; missing fields use null.
K.3.0 agent_tag_list.txt object keys (exact order): tag_id, name, category, aliases, parents, related, unit, notes
Example:
{"tag_id":"t_000001","name":"lever_offer_strength","category":"lever","aliases":["offer_quality","value_density"],"parents":["domain_growth"],"related":["mechanism_believability","pricing_anchor"],"unit":null,"notes":"improving perceived ROI of the core offer"}
K.4.0 agent_edges*.txt object keys (exact order): source, relation, target, strength, rationale
Example:
{"source":"lever_offer_strength","relation":"causes","target":"metric_ctr","strength":"high","rationale":"SRC:EDGE value clarity lifts CTR"}
K.5.0 Allowed values:
• category ∈ {domain|lever|subcap|mechanism|metric|guardrail|counterfactual|microtest|sop|decision|artifact|risk|evidence|concept}
• relation ∈ {causes|enables|prerequisite_of|conflicts_with|part_of|composed_of|proxy_of}
• strength ∈ {very_low|low|avg|high|very_high}

L.0.0 Operational Governance Cadence
L.1.0 Cadence: weekly (or biweekly if low change) governance review triggered after each emission and on any CI/anomaly failure.
L.2.0 Required participants: operator, carpenter, QA.
L.3.0 Inputs: archive_path/index.json, manifest.json, ci/summary.json, anomalies.jsonl, governance_event_log.jsonl, alias-violation tables (if any).
L.4.0 Required outputs into archive_path/: review_meta.json (attendance, duration, source_run_id), review_minutes.md (notes, decisions), kanban_snapshot.json (IDs and statuses).
L.5.0 Anomaly closure: each anomaly must map to kanban_id with closure_evidence_ref; unresolved anomalies escalate to next review.
L.6.0 Acceptance: governance event “review_completed” appended to governance_event_log.jsonl; if CI_ENFORCE_ARCHIVE=true and unresolved anomalies exist, release is blocked.

M.0.0 Archive & Telemetry Tooling (command behavior, ASCII outputs)
M.1.0 govctl record-review → writes/updates review_meta.json and review_minutes.md under archive_path/.
M.2.0 govctl anomaly-close --anomaly_id --kanban_id --evidence → updates anomalies.jsonl entry, appends governance_event_log.jsonl with “anomaly_evidence_check” and evidence hash.
M.3.0 govctl index-verify → verifies index.json against files; writes ci/archive_integrity_report.txt; sets archive_integrity_ok in ci/summary.json.
M.4.0 retention_guard (scheduler) → daily dry-run report ci/retention_guard_report.txt; weekly roll-up appended to governance_event_log.jsonl.
M.5.0 All commands produce plain ASCII files only; no prompts; exit nonzero on failure.

Notes
• RTK is development-only; runtime tagging footers, FRAG anchoring, layered recall, and heat-map bias are enforced by CIT with CMAP.
• FRAG recall is always anchored at runtime to Kernel/Operational tags; RTK ensures tag/edge coverage to support this.
• Allocation and counts are baselines; adjust at D.6.0 and E.7.0 via AGAIN with reasons captured in governance_event_log.jsonl.
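As a closing illustration, the per-line edge rules from E.5.0, E.6.1, K.1.0, K.4.0, and K.5.0 could be combined into one lint function roughly like this. It is a sketch under assumptions, not the actual rtk_rationale_lint or rtk_edges_enum_check hook: the function name and problem labels are hypothetical, the 64-character limit is assumed to apply to the reason text after the SRC token, the token and reason are assumed to be separated by a single space, and every edge field is assumed to be a string.

```python
import json
import re

EDGE_KEYS = ["source", "relation", "target", "strength", "rationale"]
RELATIONS = {"causes", "enables", "prerequisite_of", "conflicts_with",
             "part_of", "composed_of", "proxy_of"}
STRENGTHS = {"very_low", "low", "avg", "high", "very_high"}
# Token list copied verbatim from E.6.1; reason is 1 to 64 characters.
RATIONALE_RE = re.compile(r"SRC:(EDGE|KMECH|ENG|VIEW|FRAGr) \S.{0,63}")


def lint_edge_line(line, known_tags):
    """Return a list of problem labels for one edge line (empty if clean)."""
    problems = []
    if any(not 32 <= ord(ch) <= 126 for ch in line):
        problems.append("non_ascii_or_control")
    if line.endswith(" "):
        problems.append("trailing_space")
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        return problems + ["invalid_json"]
    # json.loads keeps key order, so a list comparison checks the template.
    if not isinstance(obj, dict) or list(obj) != EDGE_KEYS:
        return problems + ["bad_keys_or_order"]
    if not all(isinstance(v, str) for v in obj.values()):
        return problems + ["non_string_value"]
    if obj["relation"] not in RELATIONS:
        problems.append("bad_relation")
    if obj["strength"] not in STRENGTHS:
        problems.append("bad_strength")
    if obj["source"] == obj["target"]:
        problems.append("self_loop")
    if obj["source"] not in known_tags or obj["target"] not in known_tags:
        problems.append("unknown_endpoint")
    if not RATIONALE_RE.fullmatch(obj["rationale"]):
        problems.append("bad_rationale")
    return problems
```

Run against the K.4.0 example line with both endpoint names present in known_tags, this sketch reports no problems; duplicate (source, relation, target) triples would still need a separate pass over the whole file.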