Real SEO™ Life — AI Dataset Hub Setup (Step-by-Step)

Goal: Make /ai/ a crawlable, AI-readable hub using Dataset, DataCatalog, and an LLM awareness file, then link it from the homepage JSON-LD.

What Each File Does

  • digital-karma-dataset.json — The scoring framework (signals, metrics) for Digital Karma.
  • entities-relationships-dataset.json — Canonical entities and how they relate across the network.
  • offers-services-dataset.json — Your offers, tiers, and bundles in machine-readable form.
  • catalog.json — The index that lists and describes every dataset in /ai/.
  • llm.json — The “AI awareness” file that points LLM crawlers to the catalog and key datasets.

Pro tip: When you add a new dataset, make two changes: set isPartOf in the new file so it points at catalog.json, and append the new dataset to the dataset array in catalog.json.
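The two-step update in the pro tip can be sketched in Python. This is a minimal illustration, not part of the original setup: the register_dataset helper and the example-dataset.json file name are hypothetical, and the dictionaries stand in for the parsed JSON files.

```python
import copy

CATALOG_ID = "https://www.realseolife.com/ai/catalog.json"


def register_dataset(catalog: dict, dataset: dict) -> tuple[dict, dict]:
    """Return updated copies of (catalog, dataset) with both links in place."""
    catalog = copy.deepcopy(catalog)
    dataset = copy.deepcopy(dataset)

    # 1) The dataset file points back at the catalog.
    dataset["isPartOf"] = {"@type": "DataCatalog", "@id": catalog["@id"]}

    # 2) The catalog lists the dataset exactly once (no duplicate @id entries).
    listed = {entry["@id"] for entry in catalog.get("dataset", [])}
    if dataset["@id"] not in listed:
        catalog.setdefault("dataset", []).append(
            {"@type": "Dataset", "@id": dataset["@id"], "name": dataset["name"]}
        )
    return catalog, dataset


catalog = {"@type": "DataCatalog", "@id": CATALOG_ID, "dataset": []}
new_dataset = {
    "@type": "Dataset",
    # Hypothetical file name, used only for illustration.
    "@id": "https://www.realseolife.com/ai/example-dataset.json",
    "name": "Example Dataset",
}
catalog, new_dataset = register_dataset(catalog, new_dataset)
```

Running the helper a second time with the same dataset leaves the catalog unchanged, which keeps repeated edits from creating duplicate entries.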


Folder & File Map

/home/webserver005/public_html/realseolife.com/ai/
  ├─ digital-karma-dataset.json
  ├─ entities-relationships-dataset.json
  ├─ offers-services-dataset.json
  ├─ catalog.json
  └─ llm.json
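Public URLs:

  https://www.realseolife.com/ai/digital-karma-dataset.json
  https://www.realseolife.com/ai/entities-relationships-dataset.json
  https://www.realseolife.com/ai/offers-services-dataset.json
  https://www.realseolife.com/ai/catalog.json
  https://www.realseolife.com/ai/llm.json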

1) Create the three Dataset files

Each Dataset includes the two fields Google requires (name, description), a distribution with a contentUrl, and an isPartOf link back to the catalog.

Digital Karma Dataset

{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "@id": "https://www.realseolife.com/ai/digital-karma-dataset.json",
  "name": "Digital Karma Dataset",
  "description": "Structured dataset describing the Digital Karma™ scoring framework...",
  "creator": { "@type": "Organization", "name": "Real SEO™ Life", "url": "https://www.realseolife.com/" },
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "isPartOf": { "@type": "DataCatalog", "@id": "https://www.realseolife.com/ai/catalog.json" },
  "distribution": [{ "@type": "DataDownload", "encodingFormat": "application/json",
    "contentUrl": "https://www.realseolife.com/ai/digital-karma-dataset.json" }],
  "keywords": ["Digital Karma","SEO Dataset","AI Marketing","Real SEO"],
  "dateModified": "2025-11-02"
}

Entities & Relationships Dataset

{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "@id": "https://www.realseolife.com/ai/entities-relationships-dataset.json",
  "name": "Entities & Relationships Dataset",
  "description": "Dataset mapping canonical entities, attributes, and relationships...",
  "creator": { "@type": "Organization", "name": "Real SEO™ Life", "url": "https://www.realseolife.com/" },
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "isPartOf": { "@type": "DataCatalog", "@id": "https://www.realseolife.com/ai/catalog.json" },
  "distribution": [{ "@type": "DataDownload", "encodingFormat": "application/json",
    "contentUrl": "https://www.realseolife.com/ai/entities-relationships-dataset.json" }],
  "dateModified": "2025-11-02"
}

Offers & Services Dataset

{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "@id": "https://www.realseolife.com/ai/offers-services-dataset.json",
  "name": "Offers & Services Dataset",
  "description": "Dataset outlining Real SEO™ Life’s offers, bundles, and service structures...",
  "creator": { "@type": "Organization", "name": "Real SEO™ Life", "url": "https://www.realseolife.com/" },
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "isPartOf": { "@type": "DataCatalog", "@id": "https://www.realseolife.com/ai/catalog.json" },
  "distribution": [{ "@type": "DataDownload", "encodingFormat": "application/json",
    "contentUrl": "https://www.realseolife.com/ai/offers-services-dataset.json" }],
  "dateModified": "2025-11-02"
}
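Before uploading, the three files can be checked for the fields described in this step. The sketch below is one possible check, assuming each file has been parsed into a dictionary (for example with json.load); the dataset_problems function is a hypothetical helper, not part of any library.

```python
CATALOG_ID = "https://www.realseolife.com/ai/catalog.json"


def dataset_problems(doc: dict) -> list[str]:
    """Return human-readable problems found in one Dataset document."""
    problems = []
    if doc.get("@type") != "Dataset":
        problems.append("@type is not Dataset")
    for field in ("name", "description"):
        if not doc.get(field):
            problems.append(f"missing {field}")
    # distribution may be a single object or a list of DataDownload objects.
    dist = doc.get("distribution") or []
    if isinstance(dist, dict):
        dist = [dist]
    if not any(d.get("contentUrl") for d in dist):
        problems.append("missing distribution.contentUrl")
    # Assumes isPartOf is written as an object with an @id, as in this guide.
    part_of = doc.get("isPartOf")
    if not isinstance(part_of, dict) or part_of.get("@id") != CATALOG_ID:
        problems.append("isPartOf does not point at the catalog")
    return problems


# Trimmed copy of the Digital Karma Dataset from this step.
sample = {
    "@type": "Dataset",
    "name": "Digital Karma Dataset",
    "description": "Structured dataset describing the Digital Karma™ scoring framework...",
    "isPartOf": {"@type": "DataCatalog", "@id": CATALOG_ID},
    "distribution": [{
        "@type": "DataDownload",
        "contentUrl": "https://www.realseolife.com/ai/digital-karma-dataset.json",
    }],
}
print(dataset_problems(sample))  # prints []
```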

2) Create the master DataCatalog

/ai/catalog.json lists the datasets and acts as the source of truth.

{
  "@context": "https://schema.org",
  "@type": "DataCatalog",
  "@id": "https://www.realseolife.com/ai/catalog.json",
  "name": "Real SEO™ Life Data Catalog",
  "description": "Central registry listing all machine-readable datasets...",
  "creator": { "@type": "Organization", "name": "Real SEO™ Life", "url": "https://www.realseolife.com/" },
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "dateModified": "2025-11-02",
  "dataset": [
    { "@type": "Dataset", "@id": "https://www.realseolife.com/ai/digital-karma-dataset.json",
      "name": "Digital Karma Dataset" },
    { "@type": "Dataset", "@id": "https://www.realseolife.com/ai/entities-relationships-dataset.json",
      "name": "Entities & Relationships Dataset" },
    { "@type": "Dataset", "@id": "https://www.realseolife.com/ai/offers-services-dataset.json",
      "name": "Offers & Services Dataset" }
  ]
}
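Because the catalog is the source of truth, it helps to confirm that its dataset array and the actual Dataset files agree. The following sketch compares the two sets of @id values; catalog_mismatches is a hypothetical helper, and the inputs are assumed to be already loaded from disk.

```python
def catalog_mismatches(catalog: dict, dataset_ids: set[str]) -> dict[str, set[str]]:
    """Compare the @id values listed in the catalog with the Dataset files that exist."""
    listed = {entry["@id"] for entry in catalog.get("dataset", [])}
    return {
        "listed_but_no_file": listed - dataset_ids,
        "file_but_not_listed": dataset_ids - listed,
    }


BASE = "https://www.realseolife.com/ai/"
dataset_ids = {
    BASE + "digital-karma-dataset.json",
    BASE + "entities-relationships-dataset.json",
    BASE + "offers-services-dataset.json",
}
catalog = {
    "@type": "DataCatalog",
    "@id": BASE + "catalog.json",
    "dataset": [{"@type": "Dataset", "@id": i} for i in sorted(dataset_ids)],
}
result = catalog_mismatches(catalog, dataset_ids)
```

Both sets in result are empty when the catalog and the files are in sync.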

3) Add the LLM Awareness file

/ai/llm.json is a high-level “AI sitemap” that points to the catalog and key datasets.

{
  "@context": "https://schema.org",
  "@type": "CreativeWork",
  "@id": "https://www.realseolife.com/ai/llm.json",
  "name": "Real SEO™ Life — AI Awareness Map",
  "description": "High-level map of datasets, catalog, and the Digital Karma framework.",
  "creator": { "@type": "Organization", "name": "Real SEO™ Life", "url": "https://www.realseolife.com/" },
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "isBasedOn": { "@type": "DataCatalog", "@id": "https://www.realseolife.com/ai/catalog.json" },
  "hasPart": [
    { "@id": "https://www.realseolife.com/ai/digital-karma-dataset.json" },
    { "@id": "https://www.realseolife.com/ai/entities-relationships-dataset.json" },
    { "@id": "https://www.realseolife.com/ai/offers-services-dataset.json" }
  ],
  "dateModified": "2025-11-02"
}
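Since hasPart in llm.json repeats the catalog's dataset list, one option is to derive it from the catalog instead of maintaining it by hand. This is a sketch under that assumption; build_llm_map is a hypothetical helper, and the descriptive fields (name, description, creator, license) are left out for brevity.

```python
def build_llm_map(catalog: dict, llm_id: str, modified: str) -> dict:
    """Build the linking skeleton of llm.json from a parsed catalog."""
    return {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "@id": llm_id,
        "isBasedOn": {"@type": "DataCatalog", "@id": catalog["@id"]},
        # One hasPart reference per dataset listed in the catalog.
        "hasPart": [{"@id": entry["@id"]} for entry in catalog.get("dataset", [])],
        "dateModified": modified,
    }


BASE = "https://www.realseolife.com/ai/"
catalog = {
    "@id": BASE + "catalog.json",
    "dataset": [
        {"@type": "Dataset", "@id": BASE + "digital-karma-dataset.json"},
        {"@type": "Dataset", "@id": BASE + "entities-relationships-dataset.json"},
        {"@type": "Dataset", "@id": BASE + "offers-services-dataset.json"},
    ],
}
llm_map = build_llm_map(catalog, BASE + "llm.json", "2025-11-02")
```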

4) Reference from the Homepage (Helix SP PageBuilder → Custom HTML in <head>)

Add this block alongside your existing JSON-LD, wrapped in a <script type="application/ld+json"> element, so crawlers discover the local AI hub immediately:

{
  "@context": "https://schema.org",
  "@type": "CreativeWork",
  "@id": "https://www.realseolife.com/ai/llm.json",
  "name": "Real SEO™ Life AI Awareness Map",
  "description": "Entry point for Real SEO™ Life’s AI datasets and catalog.",
  "isPartOf": { "@type": "DataCatalog", "@id": "https://www.realseolife.com/ai/catalog.json" },
  "hasPart": [
    { "@id": "https://www.realseolife.com/ai/digital-karma-dataset.json" },
    { "@id": "https://www.realseolife.com/ai/entities-relationships-dataset.json" },
    { "@id": "https://www.realseolife.com/ai/offers-services-dataset.json" }
  ],
  "creator": { "@type": "Organization", "name": "Real SEO™ Life", "url": "https://www.realseolife.com/" },
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "dateModified": "2025-11-02"
}
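JSON-LD in the page head has to sit inside a script element of type application/ld+json. If you generate the Custom HTML snippet programmatically, a sketch like the one below produces it; jsonld_script_tag is a hypothetical helper, and the block passed in is a shortened version of the one in this step.

```python
import json


def jsonld_script_tag(block: dict) -> str:
    """Serialize a JSON-LD block into a <script> element for the page head."""
    payload = json.dumps(block, ensure_ascii=False, indent=2)
    # Escape "</" so a string value can never close the <script> element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


block = {
    "@context": "https://schema.org",
    "@type": "CreativeWork",
    "@id": "https://www.realseolife.com/ai/llm.json",
    "name": "Real SEO™ Life AI Awareness Map",
}
snippet = jsonld_script_tag(block)
```

Paste the resulting snippet into the Custom HTML field as-is.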

Optional federation link: If you also want to acknowledge your parent hub at digitalkarmaweb.com, include a second block referencing its catalog (already present on your site) or add:

{
  "@context": "https://schema.org",
  "@type": "DataCatalog",
  "@id": "https://digitalkarmaweb.com/ai/catalog.json#v01",
  "name": "Digital Karma Web"
}

5) Verify Everything (CLI)

Run these headers-only checks. Each should return a 200 status (HTTP/1.1 200 OK, or HTTP/2 200 if the server speaks HTTP/2):

curl -I https://www.realseolife.com/ai/digital-karma-dataset.json
curl -I https://www.realseolife.com/ai/entities-relationships-dataset.json
curl -I https://www.realseolife.com/ai/offers-services-dataset.json
curl -I https://www.realseolife.com/ai/catalog.json
curl -I https://www.realseolife.com/ai/llm.json
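The same check can be scripted with the Python standard library, which is handy when the list of files grows. This is a sketch, not a replacement for the curl commands: head_status and failing_urls are hypothetical helper names, and unlike curl -I, urlopen follows redirects before reporting a status.

```python
import urllib.error
import urllib.request
from typing import Callable

BASE = "https://www.realseolife.com/ai/"
FILES = [
    "digital-karma-dataset.json",
    "entities-relationships-dataset.json",
    "offers-services-dataset.json",
    "catalog.json",
    "llm.json",
]


def head_status(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request and return the final HTTP status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses are raised as exceptions


def failing_urls(fetch: Callable[[str], int] = head_status) -> dict[str, int]:
    """Return every hub URL whose status is not 200, mapped to that status."""
    statuses = {BASE + name: fetch(BASE + name) for name in FILES}
    return {url: code for url, code in statuses.items() if code != 200}
```

Calling failing_urls() with no arguments performs the live requests; an empty dictionary means all five files answered with 200. The fetch parameter exists so the logic can be exercised without network access.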

6) Validate in Google

  1. Open Google Rich Results Test and paste your homepage URL.
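  2. Confirm that the three Dataset items are detected. The Rich Results Test reports only types that are eligible for rich results, so check the DataCatalog and the CreativeWork (llm.json) with the Schema Markup Validator (validator.schema.org) instead.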
  3. In Search Console, open any structured data issue that was reported and click Validate Fix.

Troubleshooting Quick Wins

  • 404 on a dataset: the file name and the URL do not match. Fix distribution.contentUrl (and the matching @id) or rename the file.
  • “Missing description/creator/license”: add those fields to each Dataset (see examples above).
  • Wrong origin: prefer canonical https://www.realseolife.com/ (www + https).
  • Plugin collisions: If a plugin injects extra JSON-LD, leave it if it’s valid; avoid duplicate @id values.
  • Cache/CDN: purge site cache after updates.
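Two of the checks above, duplicate @id values and wrong-origin URLs, are easy to automate. The sketch below assumes the top-level JSON-LD blocks of a single page have already been parsed into dictionaries; duplicate_ids and off_origin_urls are hypothetical helpers. Apply the origin check only to your own URLs, since an external reference such as the digitalkarmaweb.com catalog is off-origin by design.

```python
from collections import Counter
from urllib.parse import urlsplit

CANONICAL = ("https", "www.realseolife.com")


def duplicate_ids(blocks: list[dict]) -> list[str]:
    """Return @id values used by more than one top-level block on a page."""
    counts = Counter(block["@id"] for block in blocks if "@id" in block)
    return sorted(i for i, n in counts.items() if n > 1)


def off_origin_urls(urls: list[str]) -> list[str]:
    """Return URLs that do not use the canonical scheme and host."""
    bad = []
    for url in urls:
        parts = urlsplit(url)
        if (parts.scheme, parts.netloc) != CANONICAL:
            bad.append(url)
    return bad


blocks = [
    {"@id": "https://www.realseolife.com/ai/llm.json"},
    {"@id": "https://www.realseolife.com/ai/llm.json"},
    {"@id": "https://www.realseolife.com/ai/catalog.json"},
]
print(duplicate_ids(blocks))
# prints ['https://www.realseolife.com/ai/llm.json']
print(off_origin_urls(["http://realseolife.com/ai/catalog.json"]))
# prints ['http://realseolife.com/ai/catalog.json']
```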