← Blog
explainx / blog

OKF Sample Bundles: GA4 E-commerce & Bitcoin BigQuery Datasets Guide

Google OKF ships three sample bundles built from BigQuery public datasets. Hands-on guide to ga4_obfuscated_sample_ecommerce and bitcoin_blockchain— sample queries, limitations, and how OKF turns raw tables into agent knowledge.

7 min readYash Thakker
Open Knowledge FormatBigQueryGoogle AnalyticsBitcoinPublic DatasetsAI Agents

MDX restores the committed source plus an HTML comment attribution; plain text bundles the rendered markdown body with the explainx.ai attribution footer.

OKF Sample Bundles: GA4 E-commerce & Bitcoin BigQuery Datasets Guide

When Google launched Open Knowledge Format (OKF) v0.1 in June 2026, it did not ship an empty spec—it included three living sample bundles produced by a reference BigQuery enrichment agent:

OKF bundleUnderlying BigQuery public dataset
GA4 e-commercebigquery-public-data.ga4_obfuscated_sample_ecommerce
Stack OverflowStack Overflow public dataset
Bitcoinbigquery-public-data.bitcoin_blockchain

This guide goes hands-on on the two datasets you linked: GA4 e-commerce and Bitcoin in BigQuery—what they contain, how to query them, and how OKF turns raw tables into agent-readable knowledge graphs.


TL;DR

QuestionAnswer
RepoGoogleCloudPlatform/knowledge-catalog
GA4 datasetga4_obfuscated_sample_ecommerce — Nov 2020–Jan 2021
Bitcoin datasetbitcoin_blockchain — full history, updates ~every 10 min
GA4 tablesevents_* (sharded by date)
Bitcoin tablesblocks_raw, transactions
CostBigQuery Sandbox / free tier sufficient for samples
OKF valuePre-linked concept pages vs raw schema discovery

Why Public Datasets Make Good OKF Demos

BigQuery public datasets are ideal OKF teaching material because they are:

  • Real-world shape — schemas, joins, and metrics agents actually need
  • Free to query — no data ingestion required
  • Well documented — Google publishes schemas and sample SQL
  • Diverse domains — web analytics (GA4) vs immutable ledger (Bitcoin)

OKF's bet: an agent answering "How do we compute weekly active users?" or "What columns link transactions to blocks?" should read curated concept pages with cross-links—not rediscover schema from INFORMATION_SCHEMA every session. That is the Karpathy LLM Wiki compile-once pattern at organizational scale.


Bundle 1: GA4 E-commerce (Google Merchandise Store)

What it is

The Google Merchandise Store sells Google-branded merchandise. It uses GA4's standard web ecommerce implementation plus enhanced measurement.

The public BigQuery dataset ga4_obfuscated_sample_ecommerce contains a sample of obfuscated event export data for three months:

PropertyValue
Date range2020-11-01 → 2021-01-31
Project pathbigquery-public-data.ga4_obfuscated_sample_ecommerce
Primary tableevents_* (date-sharded)
DocsGA4 BigQuery sample dataset

Prerequisites

  1. Google Cloud project with BigQuery API enabled (BigQuery Quickstart)
  2. BigQuery Sandbox or free usage tier—enough for exploration
  3. Optional: billing if you exceed free limits

Limitations (read before trusting numbers)

Google is explicit about obfuscation:

  • Placeholder values: <Other>, NULL, ''
  • Internal consistency may be limited due to obfuscation
  • Not comparable to the GA4 Demo Account for Google Merchandise Store—the underlying data differs

OKF bundles document these caveats in concept pages so agents do not treat demo metrics as production truth.

Starter query: dataset overview

Open BigQuery Console, compose a new query, and run:

SELECT
  COUNT(*) AS event_count,
  COUNT(DISTINCT user_pseudo_id) AS user_count,
  COUNT(DISTINCT event_date) AS day_count
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`

BigQuery shows bytes processed before you run—useful for cost estimation. Valid queries display a check mark.

What agents learn from the OKF bundle

The GA4 OKF bundle (in knowledge-catalog) typically maps concepts like:

OKF concept typeExamples
Datasetga4_obfuscated_sample_ecommerce overview
Tableevents_* schema, sharding pattern
MetricSessions, purchases, user_pseudo_id uniqueness
Eventpurchase, add_to_cart, page_view parameters

Each file carries YAML frontmatter (type, title, description, resource, tags) and markdown cross-links—e.g., from a purchase event page to the ecommerce parameters page.

Next steps for GA4


Bundle 2: Bitcoin Blockchain

What it is

Since February 2018, Google has published the full Bitcoin blockchain to BigQuery for analytics. Allen Day and Colin Bookman announced on the Google Cloud Blog:

The Bitcoin blockchain data are now available for exploration with BigQuery. All historical data are in the bigquery-public-data:bitcoin_blockchain dataset, which updates every 10 minutes.

PropertyValue
Project pathbigquery-public-data.bitcoin_blockchain
Tablesblocks_raw, transactions
Update cadence~every 10 minutes as new blocks broadcast
Also onKaggle (BigQuery Python client in Kernels)

Why BigQuery for blockchain?

Bitcoin is an immutable distributed ledger with strong OLTP properties (atomic transactions, durability) but weak OLAP for ad-hoc reporting on money flows. BigQuery adds:

  • Real-time extraction from the ledger
  • Denormalized storage for easier exploration
  • Data Studio / Looker visualizations

OKF bundles capture this OLTP vs OLAP distinction so agents understand why the dataset exists—not just table names.

Network fundamentals queries

Google's original post highlighted network valuation analytics:

AnalysisInsight
BTC transacted per dayEconomic activity on-network
Recipient addresses per dayUser growth proxy (Metcalfe's Law)
NVT RatioNetwork Value to Transactions—valuation metric
Mining difficulty vs search volumeFundamental + attention correlation

These become metric concept pages in the OKF bundle with formulas and caveats pre-documented.

Famous transaction: Bitcoin pizza (May 17, 2010)

Laszlo Hanyecz bought two pizzas for 10,000 BTC. The transaction is permanently recorded:

  • Transaction ID: a107...d48d (full hash in blockchain)
  • From: 1XPT...rvH4To: 17Sk...xFyQ

Google visualized input transfers up to 4 degrees before the pizza purchase—red circle for Hanyecz's address, blue for others, arrow width proportional to BTC flow. The OKF bundle can link entity pages (Hanyecz, pizza purchase event) to transaction schema pages.

Anomaly detection: duplicate transactions

One transaction appears in two blocks—impossible under current rules:

#standardSQL
SELECT *
FROM (
  SELECT
    transaction_id,
    COUNT(transaction_id) AS dup_transaction_count
  FROM `bigquery-public-data.bitcoin_blockchain.transactions`
  GROUP BY transaction_id
)
WHERE dup_transaction_count > 1

Why? Early Bitcoin used BerkeleyDB (non-unique keys). After Satoshi left, the team switched to LevelDB and implemented BIP-0030 to prevent duplicate transaction IDs. Legacy duplicate entries remain in historical data.

This is exactly the kind of contradiction / anomaly an OKF lint pass or concept page flags—agents get context without rediscovering Bitcoin history each query.

What agents learn from the OKF bundle

OKF concept typeExamples
Datasetbitcoin_blockchain overview
Tabletransactions, blocks_raw columns
MetricNVT Ratio, daily transaction volume
EntityNotable addresses, pizza purchase
AnomalyPre-BIP-0030 duplicate transaction_ids

Bundle 3: Stack Overflow (brief)

The third OKF sample maps the Stack Overflow public dataset on BigQuery—questions, answers, tags, votes. Same pattern: enrichment agent drafts concept docs for tables and common join paths (e.g., questions → answers → users). Browse the bundle in knowledge-catalog for the full file tree.


Raw BigQuery → OKF Bundle: The Pipeline

Google's reference BigQuery enrichment agent (in the repo) automates:

BigQuery dataset
    ↓  (walk tables/views)
Draft OKF concept .md per table/metric
    ↓  (second LLM pass)
Enrich with citations, joins, descriptions from authoritative docs
    ↓
Commit conformant OKF bundle → git
    ↓
Visualize with static HTML graph viewer
    ↓
Ingest into Cloud Knowledge Catalog for GCP agents

You can replicate this for your own datasets—the samples are templates, not products.

Browse a bundle without BigQuery

  1. Clone knowledge-catalog
  2. Open the GA4, Bitcoin, or Stack Overflow bundle directory
  3. Start at index.md — progressive disclosure catalog
  4. Run the HTML visualizer — single self-contained file, no backend

For Claude Code or Cursor: point CLAUDE.md at the bundle path—read index.md before analytics tasks on this dataset.


Sample Queries Cheat Sheet

GA4: events by name

SELECT
  event_name,
  COUNT(*) AS event_count
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
GROUP BY event_name
ORDER BY event_count DESC
LIMIT 20

GA4: purchase events only

SELECT *
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
WHERE event_name = 'purchase'
LIMIT 100

Bitcoin: daily transaction count

SELECT
  DATE(block_timestamp) AS day,
  COUNT(*) AS tx_count
FROM `bigquery-public-data.bitcoin_blockchain.transactions`
GROUP BY day
ORDER BY day DESC
LIMIT 30

Bitcoin: largest outputs (recent)

SELECT
  transaction_id,
  output_value,
  block_timestamp
FROM `bigquery-public-data.bitcoin_blockchain.transactions`
ORDER BY output_value DESC
LIMIT 10

OKF vs Raw SQL for Agents

ApproachAgent behaviorTradeoff
Raw BigQueryAgent queries INFORMATION_SCHEMA, guesses joinsFlexible, slow, error-prone
RAG on docsRetrieves doc chunks per questionNo cross-link maintenance
OKF bundleReads pre-compiled concept graphUpfront enrichment cost; reliable at moderate scale

For personal wikis under ~100K tokens, context often beats RAG. For org-wide data catalogs with hundreds of tables, OKF + Knowledge Catalog hybrid search scales further.


Getting Started Checklist

  1. Read OKF specokf/SPEC.md in the repo
  2. Run GA4 starter querydeveloper docs
  3. Run Bitcoin duplicate-tx queryCloud Blog post
  4. Browse OKF bundles — compare markdown concept pages to raw tables
  5. Open HTML visualizer — see the knowledge graph
  6. Point your agent — add bundle path to CLAUDE.md or project instructions

Summary

Google's OKF sample bundles are not synthetic toys—they map real BigQuery public datasets agents will encounter in the wild:

  • GA4 e-commerce — obfuscated Google Merchandise Store events (Nov 2020–Jan 2021), ideal for learning ecommerce event schemas
  • Bitcoin blockchain — full ledger history since 2018, ideal for OLAP-on-ledger patterns and anomaly documentation

Query the raw data to build intuition. Read the OKF bundles to see how compile-once knowledge beats per-query schema discovery. Then apply the same enrichment pipeline to your own BigQuery project.


Related Reading

Dataset details cited from Google Analytics developer documentation and Google Cloud Blog: Bitcoin in BigQuery as of June 14, 2026.

Related posts