cb-analytics-schemalisted
Install: claude install-skill celticht32/Couchbase-Skills-for-Claude.ai
# Schema introspection
Three tools cover dataset discovery:
- `list_dataverses(cluster)` — every dataverse in metadata
- `list_datasets(dataverse, cluster)` — datasets, optionally scoped to a single dataverse
- `infer_schema(dataset, sample_size, cluster)` — sample up to `sample_size`
documents and summarise the observed top-level fields
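
To make "summarise observed top-level fields" concrete, here is a minimal Python sketch of how per-field presence counts and type lists could be computed from a batch of sampled documents. `summarise_fields` is a hypothetical helper written for illustration, not the server's implementation. Using Python type names such as `str` and `int` is an assumption based on the response format shown under "Inferring a useful schema".

```python
from collections import Counter, defaultdict


def summarise_fields(docs):
    """Summarise top-level fields across a list of dict documents.

    Hypothetical helper: mirrors the shape of the `fields` object in the
    infer_schema response, but is not the server's actual code.
    """
    present = Counter()        # field name -> number of docs containing it
    types = defaultdict(set)   # field name -> set of observed type names
    for doc in docs:
        for key, value in doc.items():
            present[key] += 1
            types[key].add(type(value).__name__)

    total = len(docs)
    return {
        key: {
            "present_count": count,
            "presence_pct": round(100.0 * count / total, 1),
            "types": sorted(types[key]),
        }
        for key, count in present.items()
    }


# Small illustrative sample (made-up documents).
sample = [
    {"id": "u1", "name": "Ann", "age": 31},
    {"id": "u2", "name": "Bob"},
    {"id": "u3", "name": "Cy", "age": "42"},
    {"id": "u4", "name": "Di", "addresses": []},
]
summary = summarise_fields(sample)
```

Only top-level keys are inspected, so a nested object shows up as a single field of type `dict` rather than being expanded.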
## Inferring a useful schema
`infer_schema` reads up to `sample_size` documents (default 100) and
returns:
```json
{
"dataset": "Default.Users",
"rows_sampled": 100,
"fields": {
"id": {"present_count": 100, "presence_pct": 100.0, "types": ["str"]},
"name": {"present_count": 100, "presence_pct": 100.0, "types": ["str"]},
"age": {"present_count": 87, "presence_pct": 87.0, "types": ["int"]},
"addresses": {"present_count": 62, "presence_pct": 62.0, "types": ["list"]}
}
}
```
Notes:
- The sample is **unordered**; don't infer cardinality or ordering from it.
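- A field with `presence_pct < 100` was missing from at least one sampled
document, so treat it as optional. A value of 100 only shows that the field
appeared in every sampled document, not in the whole dataset.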
- Multiple entries in `types` mean the field holds values of different types
across documents, so the dataset is heterogeneous; flag this to the user.
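
The notes above can be applied mechanically to an `infer_schema` response. The following is a small Python sketch, assuming the response has already been parsed into a dict with the shape shown earlier. `review_schema` is a hypothetical helper name, not part of the tool set.

```python
def review_schema(result):
    """Return the fields that look optional or mixed-type in a sample.

    Hypothetical helper operating on a parsed infer_schema response.
    """
    optional, mixed = [], []
    for name, info in result["fields"].items():
        if info["presence_pct"] < 100:
            optional.append(name)   # missing from at least one sampled doc
        if len(info["types"]) > 1:
            mixed.append(name)      # more than one value type observed
    return {"optional": sorted(optional), "mixed_types": sorted(mixed)}


# The example response from this section, as a Python dict.
result = {
    "dataset": "Default.Users",
    "rows_sampled": 100,
    "fields": {
        "id": {"present_count": 100, "presence_pct": 100.0, "types": ["str"]},
        "name": {"present_count": 100, "presence_pct": 100.0, "types": ["str"]},
        "age": {"present_count": 87, "presence_pct": 87.0, "types": ["int"]},
        "addresses": {"present_count": 62, "presence_pct": 62.0, "types": ["list"]},
    },
}
report = review_schema(result)
```

For this response, `age` and `addresses` are reported as optional and no field is reported as mixed-type.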
## Safety
The dataset name is interpolated directly into the SQL++ `FROM` clause because
SQL++ doesn't support parameterised identifiers. The server validates the name
against a strict regex before building the query, so you don't need to escape
it yourself. Backtick-quoted names like ``Default.`my dataset`.sub`` are accepted.
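
As an illustration of the validate-then-interpolate pattern described above, here is a minimal Python sketch. The server's actual regex and query text are not shown in this document, so the pattern, the `build_sample_query` helper, and the generated query are all assumptions chosen to accept dotted names with optional backtick-quoted parts.

```python
import re

# Assumed pattern: one or more dot-separated parts, each either a plain
# identifier or a backtick-quoted name without backticks or line breaks.
_PART = r"(?:[A-Za-z_][A-Za-z0-9_]*|`[^`\r\n]+`)"
_DATASET_NAME = re.compile(rf"{_PART}(?:\.{_PART})*")


def build_sample_query(dataset, sample_size=100):
    """Validate the identifier, then interpolate it into a SQL++ query.

    Hypothetical helper: rejects anything that does not fully match the
    assumed pattern, so no escaping is done after validation.
    """
    if not _DATASET_NAME.fullmatch(dataset):
        raise ValueError(f"invalid dataset name: {dataset!r}")
    if isinstance(sample_size, bool) or not isinstance(sample_size, int) or sample_size < 1:
        raise ValueError(f"invalid sample size: {sample_size!r}")
    return f"SELECT VALUE d FROM {dataset} AS d LIMIT {sample_size}"


query = build_sample_query("Default.`my dataset`.sub", 10)
```

Using `fullmatch` rather than `match` matters here: a name with trailing text such as `Users; DROP DATASET Users` is rejected as a whole instead of being accepted on its valid prefix.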
## Building a data dictionary
A typical workflow:
1. `list_dataverses` → choose one
2. `list_datasets(dataverse="X")` → enumerate datasets
3