Real SEO™ Life — AI Dataset Hub Setup (Step-by-Step)
Goal: Make /ai/ a crawlable, AI-readable hub using Dataset, DataCatalog, and an LLM awareness file, then link it from the homepage JSON-LD.
What Each File Does
- digital-karma-dataset.json — The scoring framework (signals, metrics) for Digital Karma.
- entities-relationships-dataset.json — Canonical entities and how they relate across the network.
- offers-services-dataset.json — Your offers, tiers, and bundles in machine-readable form.
- catalog.json — The index that lists and describes every dataset in /ai/.
- llm.json — The “AI awareness” file that points LLM crawlers to the catalog and key datasets.
Pro tip: When you add a new dataset, make two changes: set isPartOf in the new file so it points at catalog.json, and append the new dataset to the dataset array in catalog.json.
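The two-way link described in the pro tip can be sketched in a few lines of Python. This is only an illustration under stated assumptions: register_dataset is a hypothetical helper name, and the catalog and dataset are assumed to be already loaded as dictionaries (for example with json.load).

```python
import copy

# Catalog URL taken from the examples in this guide.
CATALOG_ID = "https://www.realseolife.com/ai/catalog.json"


def register_dataset(catalog: dict, dataset: dict) -> tuple[dict, dict]:
    """Return updated copies of (catalog, dataset) with both links in place.

    Hypothetical helper: the inputs are left unmodified.
    """
    catalog = copy.deepcopy(catalog)
    dataset = copy.deepcopy(dataset)

    # Link 1: the dataset file points back at the catalog.
    dataset["isPartOf"] = {"@type": "DataCatalog", "@id": CATALOG_ID}

    # Link 2: the catalog lists the dataset (skipped if it is already listed).
    entries = catalog.setdefault("dataset", [])
    if all(entry.get("@id") != dataset["@id"] for entry in entries):
        entries.append(
            {"@type": "Dataset", "@id": dataset["@id"], "name": dataset["name"]}
        )
    return catalog, dataset
```

Calling the helper twice with the same dataset leaves a single catalog entry, which keeps repeated runs harmless.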
Folder & File Map
/home/webserver005/public_html/realseolife.com/ai/
├─ digital-karma-dataset.json
├─ entities-relationships-dataset.json
├─ offers-services-dataset.json
├─ catalog.json
└─ llm.json
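Public URLs:
https://www.realseolife.com/ai/digital-karma-dataset.json
https://www.realseolife.com/ai/entities-relationships-dataset.json
https://www.realseolife.com/ai/offers-services-dataset.json
https://www.realseolife.com/ai/catalog.json
https://www.realseolife.com/ai/llm.json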
1) Create the three Dataset files
Each Dataset includes the required fields (name, description, distribution.contentUrl) and links back to the catalog through isPartOf.
Digital Karma Dataset
{
"@context": "https://schema.org",
"@type": "Dataset",
"@id": "https://www.realseolife.com/ai/digital-karma-dataset.json",
"name": "Digital Karma Dataset",
"description": "Structured dataset describing the Digital Karma™ scoring framework...",
"creator": { "@type": "Organization", "name": "Real SEO™ Life", "url": "https://www.realseolife.com/" },
"license": "https://creativecommons.org/licenses/by/4.0/",
"isPartOf": { "@type": "DataCatalog", "@id": "https://www.realseolife.com/ai/catalog.json" },
"distribution": [{ "@type": "DataDownload", "encodingFormat": "application/json",
"contentUrl": "https://www.realseolife.com/ai/digital-karma-dataset.json" }],
"keywords": ["Digital Karma","SEO Dataset","AI Marketing","Real SEO"],
"dateModified": "2025-11-02"
}
Entities & Relationships Dataset
{
"@context": "https://schema.org",
"@type": "Dataset",
"@id": "https://www.realseolife.com/ai/entities-relationships-dataset.json",
"name": "Entities & Relationships Dataset",
"description": "Dataset mapping canonical entities, attributes, and relationships...",
"creator": { "@type": "Organization", "name": "Real SEO™ Life", "url": "https://www.realseolife.com/" },
"license": "https://creativecommons.org/licenses/by/4.0/",
"isPartOf": { "@type": "DataCatalog", "@id": "https://www.realseolife.com/ai/catalog.json" },
"distribution": [{ "@type": "DataDownload", "encodingFormat": "application/json",
"contentUrl": "https://www.realseolife.com/ai/entities-relationships-dataset.json" }],
"dateModified": "2025-11-02"
}
Offers & Services Dataset
{
"@context": "https://schema.org",
"@type": "Dataset",
"@id": "https://www.realseolife.com/ai/offers-services-dataset.json",
"name": "Offers & Services Dataset",
"description": "Dataset outlining Real SEO™ Life’s offers, bundles, and service structures...",
"creator": { "@type": "Organization", "name": "Real SEO™ Life", "url": "https://www.realseolife.com/" },
"license": "https://creativecommons.org/licenses/by/4.0/",
"isPartOf": { "@type": "DataCatalog", "@id": "https://www.realseolife.com/ai/catalog.json" },
"distribution": [{ "@type": "DataDownload", "encodingFormat": "application/json",
"contentUrl": "https://www.realseolife.com/ai/offers-services-dataset.json" }],
"dateModified": "2025-11-02"
}
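Before uploading, it may help to confirm that each Dataset file carries the fields this guide relies on. The following is a minimal sketch, assuming the file has already been parsed into a dictionary; missing_dataset_fields is a hypothetical name, and the field list is drawn from the examples and troubleshooting notes in this guide rather than from any official validator.

```python
# Top-level fields used by every Dataset example in this guide.
EXPECTED_FIELDS = ("@id", "name", "description", "creator", "license", "isPartOf")


def missing_dataset_fields(dataset: dict) -> list[str]:
    """Return the names of expected fields that are absent or empty."""
    missing = [key for key in EXPECTED_FIELDS if not dataset.get(key)]

    # distribution may be a single object or a list of objects.
    distribution = dataset.get("distribution") or []
    if isinstance(distribution, dict):
        distribution = [distribution]

    # At least one distribution entry must carry a contentUrl.
    has_url = any(
        isinstance(item, dict) and item.get("contentUrl") for item in distribution
    )
    if not has_url:
        missing.append("distribution.contentUrl")
    return missing
```

An empty list means the file matches the template; any returned names point at the fields to add.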
2) Create the master DataCatalog
/ai/catalog.json lists the datasets and acts as the source of truth.
{
"@context": "https://schema.org",
"@type": "DataCatalog",
"@id": "https://www.realseolife.com/ai/catalog.json",
"name": "Real SEO™ Life Data Catalog",
"description": "Central registry listing all machine-readable datasets...",
"creator": { "@type": "Organization", "name": "Real SEO™ Life", "url": "https://www.realseolife.com/" },
"license": "https://creativecommons.org/licenses/by/4.0/",
"dateModified": "2025-11-02",
"dataset": [
{ "@type": "Dataset", "@id": "https://www.realseolife.com/ai/digital-karma-dataset.json",
"name": "Digital Karma Dataset" },
{ "@type": "Dataset", "@id": "https://www.realseolife.com/ai/entities-relationships-dataset.json",
"name": "Entities & Relationships Dataset" },
{ "@type": "Dataset", "@id": "https://www.realseolife.com/ai/offers-services-dataset.json",
"name": "Offers & Services Dataset" }
]
}
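Because the catalog acts as the source of truth, the links should agree in both directions: every dataset file names the catalog in isPartOf, and the catalog lists every dataset file. One way to check this is sketched below; catalog_link_problems is a hypothetical helper, and the inputs are assumed to be dictionaries parsed from the JSON files.

```python
def catalog_link_problems(catalog: dict, datasets: list[dict]) -> list[str]:
    """Describe every mismatch between a catalog and its dataset files."""
    problems = []
    listed = {entry["@id"] for entry in catalog.get("dataset", [])}
    found = {dataset["@id"] for dataset in datasets}

    # Each dataset file should point back at this catalog.
    for dataset in datasets:
        part = dataset.get("isPartOf")
        back = part.get("@id") if isinstance(part, dict) else part
        if back != catalog["@id"]:
            problems.append(f"{dataset['@id']}: isPartOf does not point at the catalog")

    # Dataset files that the catalog does not list.
    for missing in sorted(found - listed):
        problems.append(f"{missing}: not listed in the catalog dataset array")

    # Catalog entries with no matching dataset file.
    for orphan in sorted(listed - found):
        problems.append(f"{orphan}: listed in the catalog but no dataset file supplied")
    return problems
```

An empty result means the catalog and the dataset files are consistent with each other.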
3) Add the LLM Awareness file
/ai/llm.json is a high-level “AI sitemap” that points to the catalog and key datasets.
{
"@context": "https://schema.org",
"@type": "CreativeWork",
"@id": "https://www.realseolife.com/ai/llm.json",
"name": "Real SEO™ Life — AI Awareness Map",
"description": "High-level map of datasets, catalog, and the Digital Karma framework.",
"creator": { "@type": "Organization", "name": "Real SEO™ Life", "url": "https://www.realseolife.com/" },
"license": "https://creativecommons.org/licenses/by/4.0/",
"isBasedOn": { "@type": "DataCatalog", "@id": "https://www.realseolife.com/ai/catalog.json" },
"hasPart": [
{ "@id": "https://www.realseolife.com/ai/digital-karma-dataset.json" },
{ "@id": "https://www.realseolife.com/ai/entities-relationships-dataset.json" },
{ "@id": "https://www.realseolife.com/ai/offers-services-dataset.json" }
],
"dateModified": "2025-11-02"
}
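Since llm.json repeats the dataset URLs that the catalog already lists, one option is to derive its hasPart array from the catalog so the two never drift apart. The sketch below assumes the catalog is loaded as a dictionary and that llm.json lives next to it; build_llm_map is a hypothetical name, and descriptive fields such as description, creator, and license are omitted for brevity.

```python
def build_llm_map(catalog: dict, date_modified: str) -> dict:
    """Build a trimmed llm.json structure from the catalog's dataset list."""
    # llm.json is assumed to sit in the same folder as catalog.json.
    base = catalog["@id"].rsplit("/", 1)[0]
    return {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "@id": f"{base}/llm.json",
        "isBasedOn": {"@type": "DataCatalog", "@id": catalog["@id"]},
        # One hasPart entry per dataset listed in the catalog, in the same order.
        "hasPart": [{"@id": entry["@id"]} for entry in catalog.get("dataset", [])],
        "dateModified": date_modified,
    }
```

Regenerating the map whenever catalog.json changes keeps the dataset list identical in both files.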
4) Reference from the Homepage (Helix SP PageBuilder → Custom HTML in <head>)
Add this block alongside your existing JSON-LD, inside its own <script type="application/ld+json"> element, so crawlers discover the local AI hub immediately:
{
"@context": "https://schema.org",
"@type": "CreativeWork",
"@id": "https://www.realseolife.com/ai/llm.json",
"name": "Real SEO™ Life AI Awareness Map",
"description": "Entry point for Real SEO™ Life’s AI datasets and catalog.",
"isPartOf": { "@type": "DataCatalog", "@id": "https://www.realseolife.com/ai/catalog.json" },
"hasPart": [
{ "@id": "https://www.realseolife.com/ai/digital-karma-dataset.json" },
{ "@id": "https://www.realseolife.com/ai/entities-relationships-dataset.json" },
{ "@id": "https://www.realseolife.com/ai/offers-services-dataset.json" }
],
"creator": { "@type": "Organization", "name": "Real SEO™ Life", "url": "https://www.realseolife.com/" },
"license": "https://creativecommons.org/licenses/by/4.0/",
"dateModified": "2025-11-02"
}
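JSON-LD placed in an HTML head is carried in a script element of type application/ld+json. The snippet below is a small sketch of generating that wrapper from a dictionary so the pasted markup stays valid JSON; jsonld_script_tag is a hypothetical helper, not part of Helix or SP Page Builder.

```python
import json


def jsonld_script_tag(block: dict) -> str:
    """Serialize a JSON-LD block and wrap it for use inside <head>."""
    # ensure_ascii=False keeps characters such as the trademark sign readable.
    payload = json.dumps(block, ensure_ascii=False, indent=2)
    # "\/" is a valid JSON escape for "/", so escaping "</" stops a string
    # value from closing the script element early without changing the data.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

The returned string can be pasted into the Custom HTML field as-is.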
Optional federation link: To acknowledge your parent hub at digitalkarmaweb.com, either keep the block that references its catalog (already present on your site) or add this one:
{
"@context": "https://schema.org",
"@type": "DataCatalog",
"@id": "https://digitalkarmaweb.com/ai/catalog.json#v01",
"name": "Digital Karma Web"
}
5) Verify Everything (CLI)
Run these headers-only checks. Each should return a 200 status (for example HTTP/1.1 200 OK, or HTTP/2 200 on servers that use HTTP/2):
curl -I https://www.realseolife.com/ai/digital-karma-dataset.json
curl -I https://www.realseolife.com/ai/entities-relationships-dataset.json
curl -I https://www.realseolife.com/ai/offers-services-dataset.json
curl -I https://www.realseolife.com/ai/catalog.json
curl -I https://www.realseolife.com/ai/llm.json
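The same headers-only check can be scripted when there are many files to verify. Below is a rough Python equivalent using only the standard library; head_status and check_all are hypothetical helper names, and the URL list simply mirrors the curl commands above.

```python
import urllib.error
import urllib.request

BASE = "https://www.realseolife.com/ai/"
FILES = [
    "digital-karma-dataset.json",
    "entities-relationships-dataset.json",
    "offers-services-dataset.json",
    "catalog.json",
    "llm.json",
]
AI_URLS = [BASE + name for name in FILES]


def head_status(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request and return the HTTP status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses arrive as exceptions; report their code.
        return error.code


def check_all(urls=AI_URLS, fetch=head_status) -> dict[str, int]:
    """Map each URL to its status code; fetch can be swapped out for testing."""
    return {url: fetch(url) for url in urls}
```

Running check_all() and printing any entry whose status is not 200 gives the same signal as scanning the curl output. Note that urlopen follows redirects, so a redirected URL reports the status of its final destination.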
6) Validate in Google
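- Open the Google Rich Results Test and paste your homepage URL.
- Confirm that three Dataset items, one DataCatalog, and the CreativeWork (llm.json) are detected. The test reads only the markup embedded in the page it is given, so items that exist solely in the /ai/ files will not appear unless they are also present in the homepage JSON-LD.
- In Search Console, open the structured data issue you fixed and click Validate Fix.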
Troubleshooting Quick Wins
- 404 on a dataset: filename mismatch. Fix the distribution.contentUrl or the file name.
- “Missing description/creator/license”: add those fields to each Dataset (see examples above).
- Wrong origin: prefer the canonical https://www.realseolife.com/ (www + https).
- Plugin collisions: if a plugin injects extra JSON-LD, leave it if it’s valid; avoid duplicate @id values.
- Cache/CDN: purge the site cache after updates.
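The "wrong origin" item is easy to miss by eye, so a small scan can help. This sketch walks a parsed JSON-LD block and reports any site URL that is not on the canonical https + www origin; non_canonical_urls and iter_strings are hypothetical helper names, and the host list assumes the bare domain is the only non-canonical variant in use.

```python
from urllib.parse import urlsplit

CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.realseolife.com"
# Hosts treated as belonging to the site (assumed variants).
SITE_HOSTS = {"realseolife.com", "www.realseolife.com"}


def iter_strings(node):
    """Yield every string value found anywhere in nested dicts and lists."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_strings(value)
    elif isinstance(node, str):
        yield node


def non_canonical_urls(block: dict) -> list[str]:
    """Return site URLs that do not use https://www.realseolife.com/."""
    bad = []
    for text in iter_strings(block):
        try:
            parts = urlsplit(text)
        except ValueError:
            continue  # not a parseable URL, so nothing to check
        on_site = parts.hostname in SITE_HOSTS
        canonical = parts.scheme == CANONICAL_SCHEME and parts.hostname == CANONICAL_HOST
        if on_site and not canonical:
            bad.append(text)
    return bad
```

URLs on other domains, such as the Creative Commons license link, are ignored because only the site's own hosts are checked.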