v02 - Google Compatibility Schema With No Validation Flags

Dual-Compatibility Dataset Hub Schema ➤ No Validation Flags 

This article shows a safe pattern for storing custom AI properties for LLMs while keeping the page markup schema.org-compliant for Google. Wrap custom values in additionalProperty, knowsAbout, subjectOf, or about, and maintain a separate machine-readable JSON endpoint for richer AI-only metadata.

Layer                 | File                    | Purpose
Schema.org compliant  | Head JSON-LD            | Google-safe + visible
Experimental AI layer | /ai/llm-extensions.json | Full experimental data
Dataset core          | /ai/ catalog + datasets | Structured, validated
Awareness map         | /ai/llm.json            | Connects everything together

Why do this?

  • Google expects schema.org properties and will flag unknown properties (warnings only; not penalties).
  • LLMs and AI crawlers benefit from extra context (topic clusters, internal scores, custom IDs).
  • The dual pattern keeps the validator happy while preserving the richer graph for AI consumption.

Pattern Overview (3 parts)

  1. Primary page JSON-LD — Strict schema.org properties only (what Google sees).
  2. Wrapped custom fields — Put experimental keys into additionalProperty, knowsAbout, or subjectOf so they are valid by schema rules.
  3. AI-only endpoint — Host a separate JSON file (e.g. /ai/llm-extensions.json) that contains full experimental fields for LLM crawlers and is linked from your llm.json or catalog.
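
The three parts above can be produced from a single source record. The following is a minimal Python sketch, not a definitive implementation. It assumes a hand-maintained allowlist (SCHEMA_KEYS) and a helper named split_record; both are hypothetical names introduced for illustration, and the allowlist is deliberately incomplete.

```python
import json

# Hypothetical, hand-maintained allowlist of keys treated as schema.org-safe.
SCHEMA_KEYS = {"@context", "@type", "@id", "name", "url"}


def split_record(record: dict) -> tuple[dict, list[dict], dict]:
    """Split one source record into the three layers of the pattern."""
    # Part 1: keys that stay in the primary page JSON-LD.
    page = {k: v for k, v in record.items() if k in SCHEMA_KEYS}
    # Everything else is treated as experimental.
    experimental = {k: v for k, v in record.items() if k not in SCHEMA_KEYS}
    # Part 2: scalar experimental values become PropertyValue name/value pairs.
    wrapped = [
        {"@type": "PropertyValue", "name": k, "value": v}
        for k, v in experimental.items()
        if isinstance(v, (str, int, float))
    ]
    # Part 3: the full experimental payload goes to the AI-only endpoint.
    return page, wrapped, experimental


record = {
    "@type": "WebSite",
    "name": "Real SEO™ Life",
    "url": "https://www.realseolife.com/",
    "topicCluster": "Digital Karma Federation",
    "digitalKarmaScore": {"version": "v1.2"},
}
page, wrapped, extensions = split_record(record)
print(json.dumps(wrapped, indent=2))
```

Nested values such as digitalKarmaScore are left out of the wrapped list on purpose: they do not fit a single name/value pair and belong in the AI-only file.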

Example — BEFORE (the simple custom property that triggers warnings)


The key "topicCluster" is not a recognized schema.org property, so this block triggers a schema warning:
{
  "@type": "WebSite",
  "name": "Real SEO™ Life",
  "topicCluster": "Digital Karma Federation",
  "url": "https://www.realseolife.com/"
}
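
A check like the one the validator performs can be approximated locally before publishing. This is a rough sketch under stated assumptions: KNOWN_WEBSITE_KEYS is a hypothetical, hand-written allowlist (not an official list), and unknown_keys is an illustrative helper that only inspects top-level keys.

```python
# Hypothetical allowlist for a WebSite node; extend it from the schema.org type page.
KNOWN_WEBSITE_KEYS = {
    "@context", "@type", "@id", "name", "url",
    "about", "subjectOf", "hasPart", "mainEntityOfPage",
}


def unknown_keys(node: dict, known: set[str]) -> list[str]:
    """Return the top-level keys of node that are not in the allowlist, sorted."""
    return sorted(k for k in node if k not in known)


before = {
    "@type": "WebSite",
    "name": "Real SEO™ Life",
    "topicCluster": "Digital Karma Federation",
    "url": "https://www.realseolife.com/",
}
print(unknown_keys(before, KNOWN_WEBSITE_KEYS))  # ['topicCluster']
```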

Example — AFTER (dual-compatible pattern)

Place this in your page head JSON-LD. It uses only schema.org properties and keeps the experimental information in two wrappers: knowsAbout carries the topic cluster as a Thing, and additionalProperty (on mainEntityOfPage) stores named values as PropertyValue pairs. Keep annotations out of the deployed block: JSON does not allow // comments, and a single comment makes the whole JSON-LD script unparseable.


{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "@id": "https://www.realseolife.com/#site",
  "name": "Real SEO™ Life",
  "url": "https://www.realseolife.com/",
  "knowsAbout": [
    {
      "@type": "Thing",
      "name": "Digital Karma Federation",
      "description": "Topic cluster: Digital Karma (used for AI routing)."
    }
  ],
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://www.realseolife.com/",
    "additionalProperty": [
      {
        "@type": "PropertyValue",
        "name": "digitalKarmaScoreVersion",
        "value": "v1.2"
      },
      {
        "@type": "PropertyValue",
        "name": "topicCluster",
        "value": "Digital Karma Federation"
      }
    ]
  }
}

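Why this works: knowsAbout and additionalProperty (with PropertyValue) are properties that schema.org defines, so a validator no longer sees an unknown key such as topicCluster, yet AI crawlers can still read the descriptive values and term names. One caveat: schema.org defines knowsAbout for Person and Organization, and additionalProperty for types such as Place and Product, so a strict validator may still report them on WebSite or WebPage. The about and subjectOf properties are defined more broadly and are the safer choice if that happens.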
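
The head block has to be strict JSON, so one low-risk way to produce it is to serialize it from a data structure instead of editing it by hand. The sketch below assumes a hypothetical helper named render_jsonld; the site dictionary is a shortened version of the example above.

```python
import json


def render_jsonld(data: dict) -> str:
    """Serialize a dict as a JSON-LD script tag for the page head."""
    body = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so a string value cannot close the script element early.
    # "\/" is a valid JSON escape, so parsers still read it as "/".
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


site = {
    "@context": "https://schema.org",
    "@type": "WebSite",
    "name": "Real SEO™ Life",
    "url": "https://www.realseolife.com/",
}
tag = render_jsonld(site)
print(tag)
```

Because the output comes from json.dumps, it cannot contain comments or trailing commas, which are the usual causes of a JSON-LD block being ignored.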


AI-only endpoint (recommended)

For maximum flexibility, maintain a separate JSON file that contains your richer experimental properties. Link to it from your canonical llm.json or catalog.json.


/ai/llm-extensions.json  (served as application/json)
{
  "@context": "https://schema.org",
  "@id": "https://www.realseolife.com/ai/llm-extensions.json",
  "@type": "CreativeWork",
  "llmExtensions": {
    "topicCluster": "Digital Karma Federation",
    "digitalKarmaScore": {
       "version": "v1.2",
       "weights": {
         "domainRating": 1.5,
         "socialScore": 1.1
       }
    },
    "entityIds": {
       "RealSEOWebsite": "REALSEO001"
    }
  },
  "dateModified": "2025-11-02"
}

Then add this small pointer to your main llm.json or homepage JSON-LD so crawlers can find it:


"hasPart": [
  { "@id": "https://www.realseolife.com/ai/llm-extensions.json" },
  { "@id": "https://www.realseolife.com/ai/digital-karma-dataset.json" }
]
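
If llm.json is generated by a build step, the pointer can be added programmatically. This is a small sketch only: add_parts is a hypothetical helper, and the shape of llm_doc (a dictionary with an optional hasPart list) is an assumption based on the fragment above.

```python
def add_parts(doc: dict, urls: list[str]) -> dict:
    """Return a copy of doc whose hasPart lists each URL once, as {"@id": url}."""
    parts = list(doc.get("hasPart", []))
    seen = {p.get("@id") for p in parts if isinstance(p, dict)}
    for url in urls:
        if url not in seen:
            parts.append({"@id": url})
            seen.add(url)
    return {**doc, "hasPart": parts}


llm_doc = {"@context": "https://schema.org", "@type": "CreativeWork"}
urls = [
    "https://www.realseolife.com/ai/llm-extensions.json",
    "https://www.realseolife.com/ai/digital-karma-dataset.json",
]
llm_doc = add_parts(llm_doc, urls)
llm_doc = add_parts(llm_doc, urls)  # running it again adds no duplicates
print(len(llm_doc["hasPart"]))  # 2
```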

Validation & Testing

  1. Run the Rich Results Test on the homepage URL and, for property-level checks, the Schema Markup Validator (validator.schema.org): expect no "unknown property" warnings for the main page JSON-LD.
  2. Check your AI endpoint directly with curl -I https://www.realseolife.com/ai/llm-extensions.json; it should return 200 OK and Content-Type: application/json.
  3. Confirm llm.json references the extensions file and the catalog. This forms a discoverable graph for LLM crawlers.
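
Steps 2 and 3 can be scripted. The following sketch uses only the Python standard library; the function names (head, endpoint_ok, references) are hypothetical, and it assumes llm.json links to the extensions file through a hasPart list as in the pointer example.

```python
import urllib.request


def head(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Send a HEAD request and return (status, Content-Type header).

    urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.headers.get("Content-Type", "")


def endpoint_ok(status: int, content_type: str) -> bool:
    """True for 200 plus a JSON media type; parameters such as charset are ignored."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return status == 200 and media_type == "application/json"


def references(doc: dict, url: str) -> bool:
    """True if doc's hasPart list contains an entry whose @id equals url."""
    return any(
        isinstance(p, dict) and p.get("@id") == url
        for p in doc.get("hasPart", [])
    )


# Example (needs network access, so it is left commented out):
# status, ctype = head("https://www.realseolife.com/ai/llm-extensions.json")
# print(endpoint_ok(status, ctype))
```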

Best Practices / Rules of Thumb

  • Do not hide malicious or deceptive content. This is about compatibility and future-proofing, not cloaking.
  • Keep human-readable descriptions next to any experimental fields — they help LLMs understand context (and sometimes Google too).
  • Prefer valid containers: additionalProperty, knowsAbout, subjectOf, mainEntityOfPage, about.
  • Use the AI-only endpoint for large or rapidly changing structures (scores, weights, vectors metadata).
  • Document changes in the file (e.g., dateModified and inline comments in your source files) so audits are easy.

Quick checklist before you push