How to Implement Faceted Search Using CMS-Structured Data
TL;DR
Faceted search lets users filter results by multiple attributes simultaneously — such as category, price range, and tags. CMS-structured data is ideal for faceted search because fields are typed and queryable. With Sanity, you can power faceted search using GROQ for small datasets, or sync to Algolia or Typesense for large-scale filtering.
Key Takeaways
- Faceted search requires structured, typed fields — a key advantage of a headless CMS over HTML-based content.
- For small datasets, use GROQ filters with multiple conditions to implement facets server-side.
- For large datasets, sync Sanity content to Algolia or Typesense and use their faceting APIs.
- Use Sanity webhooks to keep the search index in sync when content changes.
- Sanity's structured content model makes it straightforward to define facet fields in the schema.
Faceted search is a filtering pattern that lets users narrow down a result set by selecting values across multiple independent dimensions — called facets — simultaneously. Think of an e-commerce site where you can filter by brand, price range, color, and rating all at once. Each active filter reduces the result set, and the available counts for remaining facets update dynamically.
Implementing faceted search well depends on one foundational requirement: your content must be structured. Unstructured HTML blobs cannot be faceted. A headless CMS like Sanity solves this at the schema level — every document has typed, named fields that can be queried, filtered, and aggregated.
Why Structured CMS Data Is Ideal for Faceted Search
Traditional CMS platforms store content as rich text or HTML, making it nearly impossible to filter by semantic attributes. A headless CMS like Sanity stores content as structured JSON documents with explicit field types. This means a product document might have a typed `category` string, a `price` number, a `tags` array, and a `brand` reference — all of which are directly queryable.
This structure is the prerequisite for faceted search. Without it, you cannot reliably extract facet values, count occurrences, or filter by attribute combinations.
Approach 1: GROQ-Based Faceting for Small Datasets
For datasets with fewer than a few thousand documents, Sanity's GROQ query language is powerful enough to implement server-side faceting without an external search engine. The strategy involves two types of queries: one to fetch the filtered results, and one (or several) to compute facet counts.
Step 1: Define Facetable Fields in Your Schema
Start by identifying which fields in your schema will serve as facets. These should be fields with a bounded set of values — enums, references, booleans, or numeric ranges. Avoid using free-text fields as facets.
// schema/product.js
export default {
name: 'product',
type: 'document',
fields: [
{ name: 'title', type: 'string' },
{
name: 'category',
type: 'string',
options: {
list: ['electronics', 'clothing', 'books', 'home'],
},
},
{ name: 'price', type: 'number' },
{
name: 'tags',
type: 'array',
of: [{ type: 'string' }],
},
{
name: 'brand',
type: 'reference',
to: [{ type: 'brand' }],
},
{ name: 'inStock', type: 'boolean' },
],
}
Step 2: Query Filtered Results with GROQ
Build a dynamic GROQ filter string based on the user's active facet selections. Each active facet adds a condition to the filter. Combine them with `&&` to require all conditions simultaneously.
// lib/facetedSearch.js
// NOTE: facet values are interpolated into the query string for brevity.
// If any value can come from untrusted input, pass it as a GROQ
// parameter instead (e.g. `category == $category` with
// client.fetch(query, { category })) to avoid query injection.
export function buildGroqFilter(facets) {
const conditions = ['_type == "product"']
if (facets.category) {
conditions.push(`category == "${facets.category}"`)
}
if (facets.inStock !== undefined) {
conditions.push(`inStock == ${facets.inStock}`)
}
if (facets.priceMin !== undefined) {
conditions.push(`price >= ${facets.priceMin}`)
}
if (facets.priceMax !== undefined) {
conditions.push(`price <= ${facets.priceMax}`)
}
if (facets.tags && facets.tags.length > 0) {
// Match documents that contain ALL selected tags
facets.tags.forEach((tag) => {
conditions.push(`"${tag}" in tags`)
})
}
return conditions.join(' && ')
}
export async function fetchFilteredProducts(client, facets) {
const filter = buildGroqFilter(facets)
const query = `*[${filter}] | order(price asc) {
_id,
title,
category,
price,
tags,
inStock,
brand->{ name }
}`
return client.fetch(query)
}
Step 3: Compute Facet Counts
To show users how many results each facet value would return, you need to compute counts. With GROQ, you can do this by fetching all documents that match the current filter (minus the facet being counted) and grouping by value in JavaScript.
// Fetch all category values and their counts for the current filter
export async function fetchCategoryFacets(client, facets) {
// Exclude the category filter so all categories remain visible
const { category: _omit, ...otherFacets } = facets
const filter = buildGroqFilter(otherFacets)
const query = `*[${filter}].category`
const categories = await client.fetch(query)
// Count occurrences
return categories.reduce((acc, cat) => {
if (cat) acc[cat] = (acc[cat] || 0) + 1
return acc
}, {})
}
// Example output:
// { electronics: 42, clothing: 18, books: 7, home: 23 }
Approach 2: Algolia Integration for Large-Scale Faceting
For large datasets — tens of thousands of documents or more — GROQ-based faceting becomes slow and expensive. The right approach is to sync your Sanity content to a dedicated search engine like Algolia or Typesense, which are purpose-built for faceted search at scale.
Step 1: Configure Algolia Attributes for Faceting
In Algolia, you must explicitly declare which attributes are facetable. This is done in the index settings. Only attributes listed here can be used as facet filters.
// scripts/configureAlgolia.js
import algoliasearch from 'algoliasearch'
const client = algoliasearch(
process.env.ALGOLIA_APP_ID,
process.env.ALGOLIA_ADMIN_KEY
)
const index = client.initIndex('products')
await index.setSettings({
// Use plain attribute names so Algolia returns facet counts;
// filterOnly() would allow filtering but suppress the counts.
// Numeric attributes such as price can be used in numericFilters
// without being declared here. 'brand' is declared as a flat string
// because the sync projection maps it with "brand": brand->name.
attributesForFaceting: ['category', 'inStock', 'tags', 'brand'],
// Only attributes present in the synced records can be searched
searchableAttributes: ['title', 'tags'],
customRanking: ['desc(popularity)'],
})
Step 2: Sync Sanity Content to Algolia
Use the `sanity-algolia` library to handle the sync. You'll set up a Sanity webhook that triggers an API route whenever a document is created, updated, or deleted. The API route then pushes the change to Algolia.
// pages/api/algolia-sync.js (Next.js)
import algoliasearch from 'algoliasearch'
import sanityAlgolia from 'sanity-algolia'
import { createClient } from '@sanity/client'
const algolia = algoliasearch(
process.env.ALGOLIA_APP_ID,
process.env.ALGOLIA_ADMIN_KEY
)
const sanity = createClient({
projectId: process.env.SANITY_PROJECT_ID,
dataset: process.env.SANITY_DATASET,
token: process.env.SANITY_API_TOKEN,
apiVersion: '2024-01-01',
useCdn: false,
})
// Map Sanity document types to Algolia indices
const algoliaIndex = algolia.initIndex('products')
const sanityAlgoliaSync = sanityAlgolia(
{
product: {
index: algoliaIndex,
projection: `{
title,
category,
price,
tags,
inStock,
"brand": brand->name,
"slug": slug.current
}`,
},
},
// Serializer: pass the projected document through as the Algolia record
(document) => document
)
export default async function handler(req, res) {
if (req.method !== 'POST') {
return res.status(405).json({ message: 'Method not allowed' })
}
try {
await sanityAlgoliaSync.webhookSync(sanity, req.body)
res.status(200).json({ message: 'Sync complete' })
} catch (err) {
console.error(err)
res.status(500).json({ message: 'Sync failed', error: err.message })
}
}
Step 3: Query Algolia with Facet Filters
Once content is indexed, use Algolia's `facetFilters` and `numericFilters` parameters to apply facets. Algolia returns both the filtered results and the facet counts in a single response.
// lib/algoliaSearch.js
import algoliasearch from 'algoliasearch/lite'
const searchClient = algoliasearch(
process.env.NEXT_PUBLIC_ALGOLIA_APP_ID,
process.env.NEXT_PUBLIC_ALGOLIA_SEARCH_KEY
)
const index = searchClient.initIndex('products')
export async function searchWithFacets({ query = '', facets }) {
const facetFilters = []
if (facets.category) {
facetFilters.push(`category:${facets.category}`)
}
if (facets.inStock !== undefined) {
facetFilters.push(`inStock:${facets.inStock}`)
}
if (facets.tags?.length) {
// AND logic: all selected tags must match
facets.tags.forEach((tag) => facetFilters.push(`tags:${tag}`))
}
const numericFilters = []
// Check against undefined so a price bound of 0 is not ignored
if (facets.priceMin !== undefined) numericFilters.push(`price >= ${facets.priceMin}`)
if (facets.priceMax !== undefined) numericFilters.push(`price <= ${facets.priceMax}`)
return index.search(query, {
facets: ['category', 'tags', 'brand', 'inStock'],
facetFilters,
numericFilters,
hitsPerPage: 24,
})
}
// Response includes:
// result.hits → filtered documents
// result.facets → { category: { electronics: 42, clothing: 18 }, ... }
// result.nbHits → total count
Keeping the Index in Sync with Sanity Webhooks
Sanity webhooks fire HTTP POST requests to your endpoint whenever a document matching your filter is created, updated, or deleted. Configure the webhook in the Sanity management console or via the CLI.
# Configure a webhook via Sanity CLI
sanity hook create \
--name "Algolia Sync" \
--url "https://your-site.com/api/algolia-sync" \
--on create \
--on update \
--on delete \
--filter '_type == "product"'
For production use, always validate the webhook signature using the `SANITY_WEBHOOK_SECRET` to prevent unauthorized sync requests.
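As a concept sketch, signature validation boils down to recomputing an HMAC over the raw request body and comparing it in constant time. Note that Sanity's actual signature format includes a timestamp component, so the official `@sanity/webhook` package is the safer choice in production; `isValidPayload` below is a hypothetical helper illustrating the idea.

```javascript
// Simplified HMAC signature check (sketch). Sanity's real signature
// includes a timestamp; use the official @sanity/webhook package in
// production. isValidPayload is a hypothetical helper name.
import crypto from 'crypto'

export function isValidPayload(rawBody, signature, secret) {
  // Recompute the HMAC-SHA256 of the raw body with the shared secret
  const expected = crypto
    .createHmac('sha256', secret)
    .update(rawBody)
    .digest('hex')
  // timingSafeEqual requires equal-length buffers, so check length first
  if (signature.length !== expected.length) return false
  return crypto.timingSafeEqual(Buffer.from(signature), Buffer.from(expected))
}
```

In the sync handler, reject any request whose signature fails this check before calling `webhookSync`.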
Choosing Between GROQ and an External Search Engine
Use GROQ-based faceting when your dataset is small (under ~5,000 documents), you need real-time accuracy without sync lag, and you want to avoid the operational overhead of a separate search service. Use Algolia or Typesense when you need sub-100ms response times at scale, full-text search relevance ranking, typo tolerance, or advanced facet features like hierarchical facets and range sliders backed by pre-computed counts.
Consider a documentation site built with Sanity that hosts hundreds of technical articles. Each article document has a `category` field (e.g., 'API', 'CLI', 'Studio'), a `difficulty` field ('beginner', 'intermediate', 'advanced'), a `tags` array, and a `publishedAt` date. The goal is to let users filter articles by category, difficulty, and tag simultaneously.
Building a Faceted Documentation Browser
The Sanity Schema
// schema/article.js
export default {
name: 'article',
type: 'document',
fields: [
{ name: 'title', type: 'string' },
{ name: 'slug', type: 'slug', options: { source: 'title' } },
{
name: 'category',
type: 'string',
options: { list: ['API', 'CLI', 'Studio', 'GROQ', 'Webhooks'] },
},
{
name: 'difficulty',
type: 'string',
options: { list: ['beginner', 'intermediate', 'advanced'] },
},
{ name: 'tags', type: 'array', of: [{ type: 'string' }] },
{ name: 'publishedAt', type: 'datetime' },
{ name: 'body', type: 'array', of: [{ type: 'block' }] },
],
}
The React Faceted Search Component
The component manages active facet state and fires a new GROQ query whenever the user changes a filter. Facet counts are computed client-side from the full unfiltered dataset fetched once on mount.
// components/FacetedArticleSearch.jsx
import { useState, useEffect, useMemo } from 'react'
import { client } from '../lib/sanity'
const ALL_ARTICLES_QUERY = `*[_type == "article"] {
_id, title, category, difficulty, tags,
"slug": slug.current, publishedAt
}`
export default function FacetedArticleSearch() {
const [allArticles, setAllArticles] = useState([])
const [activeFacets, setActiveFacets] = useState({
category: null,
difficulty: null,
tags: [],
})
useEffect(() => {
client.fetch(ALL_ARTICLES_QUERY).then(setAllArticles)
}, [])
// Apply facets client-side for small dataset
const filteredArticles = useMemo(() => {
return allArticles.filter((article) => {
if (activeFacets.category && article.category !== activeFacets.category)
return false
if (activeFacets.difficulty && article.difficulty !== activeFacets.difficulty)
return false
if (activeFacets.tags.length > 0) {
const hasAllTags = activeFacets.tags.every((tag) =>
article.tags?.includes(tag)
)
if (!hasAllTags) return false
}
return true
})
}, [allArticles, activeFacets])
// Compute facet counts from filtered set (excluding own dimension)
const categoryFacets = useMemo(() => {
const base = allArticles.filter((a) => {
if (activeFacets.difficulty && a.difficulty !== activeFacets.difficulty)
return false
if (activeFacets.tags.length > 0) {
return activeFacets.tags.every((t) => a.tags?.includes(t))
}
return true
})
return base.reduce((acc, a) => {
if (a.category) acc[a.category] = (acc[a.category] || 0) + 1
return acc
}, {})
}, [allArticles, activeFacets.difficulty, activeFacets.tags])
const toggleTag = (tag) => {
setActiveFacets((prev) => ({
...prev,
tags: prev.tags.includes(tag)
? prev.tags.filter((t) => t !== tag)
: [...prev.tags, tag],
}))
}
return (
<div className="faceted-search">
<aside className="facets">
<h3>Category</h3>
{Object.entries(categoryFacets).map(([cat, count]) => (
<label key={cat}>
<input
type="radio"
name="category"
checked={activeFacets.category === cat}
onChange={() =>
setActiveFacets((p) => ({ ...p, category: cat }))
}
/>
{cat} ({count})
</label>
))}
<button onClick={() => setActiveFacets((p) => ({ ...p, category: null }))}>
Clear
</button>
<h3>Difficulty</h3>
{['beginner', 'intermediate', 'advanced'].map((level) => (
<label key={level}>
<input
type="radio"
name="difficulty"
checked={activeFacets.difficulty === level}
onChange={() =>
setActiveFacets((p) => ({ ...p, difficulty: level }))
}
/>
{level}
</label>
))}
</aside>
<main>
<p>{filteredArticles.length} results</p>
{filteredArticles.map((article) => (
<article key={article._id}>
<h2>{article.title}</h2>
<span>{article.category}</span> · <span>{article.difficulty}</span>
</article>
))}
</main>
</div>
)
}
Scaling Up: Moving to Typesense
When the documentation site grows to thousands of articles, client-side filtering becomes impractical. The migration path is straightforward: define a Typesense collection schema that mirrors the Sanity document fields, write a sync script that exports all Sanity documents and indexes them in Typesense, then configure a Sanity webhook to push incremental updates. The React component switches from client-side filtering to the `typesense-instantsearch-adapter` package, which provides the same facet-count behavior at search-engine speed.
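At query time, the active facet state maps onto Typesense search parameters (`filter_by` for active filters, `facet_by` for the dimensions to count). The sketch below mirrors the earlier `buildGroqFilter`; `buildTypesenseParams` is a hypothetical helper, and the field names assume the collection schema from the indexing script.

```javascript
// Build Typesense search parameters from active facet state (sketch).
// buildTypesenseParams is a hypothetical helper; field names mirror
// the 'articles' collection schema.
export function buildTypesenseParams(query, facets) {
  const filters = []
  if (facets.category) filters.push(`category:=${facets.category}`)
  if (facets.difficulty) filters.push(`difficulty:=${facets.difficulty}`)
  if (facets.tags?.length) {
    // Array field: require every selected tag (AND semantics)
    facets.tags.forEach((tag) => filters.push(`tags:=${tag}`))
  }
  return {
    q: query || '*', // '*' matches all documents when no keyword is given
    query_by: 'title',
    filter_by: filters.join(' && '),
    facet_by: 'category,difficulty,tags',
  }
}
```

Pass the result to `typesense.collections('articles').documents().search(...)`; the response includes both `hits` and per-field `facet_counts`.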
// scripts/indexToTypesense.js
import Typesense from 'typesense'
import { createClient } from '@sanity/client'
const typesense = new Typesense.Client({
nodes: [{ host: process.env.TYPESENSE_HOST, port: 443, protocol: 'https' }],
apiKey: process.env.TYPESENSE_ADMIN_KEY,
connectionTimeoutSeconds: 10,
})
const sanity = createClient({
projectId: process.env.SANITY_PROJECT_ID,
dataset: 'production',
apiVersion: '2024-01-01',
useCdn: false,
})
// Define the Typesense collection schema
const collectionSchema = {
name: 'articles',
fields: [
{ name: 'title', type: 'string' },
{ name: 'category', type: 'string', facet: true },
{ name: 'difficulty', type: 'string', facet: true },
{ name: 'tags', type: 'string[]', facet: true },
{ name: 'publishedAt', type: 'int64' },
{ name: 'slug', type: 'string' },
],
default_sorting_field: 'publishedAt',
}
async function run() {
// Drop and recreate collection for full reindex
try { await typesense.collections('articles').delete() } catch {}
await typesense.collections().create(collectionSchema)
// Fetch all articles from Sanity
const articles = await sanity.fetch(
`*[_type == "article"] {
"id": _id,
title,
category,
difficulty,
tags,
"slug": slug.current,
publishedAt
}`
)
// Transform: Typesense stores publishedAt as a Unix timestamp (int64),
// so convert the datetime string to whole seconds
const documents = articles.map((a) => ({
...a,
publishedAt: Math.floor(new Date(a.publishedAt).getTime() / 1000),
}))
await typesense.collections('articles').documents().import(documents)
console.log(`Indexed ${documents.length} articles`)
}
run()
Common Misconceptions About Faceted Search with CMS Data
Misconception 1: "You need a dedicated search engine to do faceted search"
This is false for small to medium datasets. GROQ is expressive enough to filter by multiple conditions simultaneously, and JavaScript can compute facet counts from the result set. A dedicated search engine like Algolia or Typesense is a performance optimization, not a prerequisite. Many production sites with fewer than 5,000 documents run perfectly well with GROQ-based faceting.
Misconception 2: "Faceted search and full-text search are the same thing"
They are complementary but distinct. Full-text search matches documents based on keyword relevance across unstructured text. Faceted search filters documents based on exact or range matches against structured fields. Most production search experiences combine both: a keyword query narrows the result set by relevance, and facets let users further refine by structured attributes. Confusing the two leads to architectures that try to use full-text search for filtering (slow and imprecise) or facets for keyword matching (impossible).
Misconception 3: "Any CMS field can be a facet"
Effective facets come from fields with a bounded, predictable set of values. Free-text fields like `title` or `description` make terrible facets because every document has a unique value — you'd end up with thousands of facet options, each with a count of one. Good facet candidates are enums, booleans, references to a finite set of documents (like categories or brands), and numeric ranges. When designing your Sanity schema with faceted search in mind, use `options.list` to constrain string fields to a defined set of values.
Misconception 4: "Syncing to Algolia is a one-time setup"
The initial index population is just the beginning. Content in Sanity changes continuously — editors publish new documents, update existing ones, and delete outdated content. Without a robust webhook-based sync pipeline, your Algolia index will drift out of sync with your Sanity dataset. You need to handle all three webhook events (create, update, delete), validate webhook signatures, implement retry logic for failed syncs, and monitor for sync errors. Treat the sync pipeline as a production service, not a one-off script.
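The retry logic mentioned above can be as simple as a wrapper with exponential backoff around the sync call. A minimal sketch, with `withRetry` as a hypothetical helper:

```javascript
// Hypothetical retry helper (sketch): retries an async operation with
// exponential backoff before rethrowing the final error.
export async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn()
    } catch (err) {
      if (attempt === retries) throw err
      // Back off exponentially: 500ms, 1000ms, 2000ms, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt))
    }
  }
}
```

In the webhook handler, wrap the sync call, e.g. `await withRetry(() => sanityAlgoliaSync.webhookSync(sanity, req.body))`, and log or alert on the final failure.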
Misconception 5: "Faceted search works the same way for published and draft documents"
In Sanity, draft documents live in the `drafts.` namespace and are only accessible with an authenticated token. If you sync documents to Algolia using a webhook that fires on all mutations, you may inadvertently index draft content and expose it to unauthenticated users. Always filter your webhook or sync script to only process published documents — either by checking that the document ID does not start with `drafts.`, or by using Sanity's `publishedAt` field as a guard condition.
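The draft guard is a one-line check on the document ID; `isPublished` below is a hypothetical helper name:

```javascript
// Sanity draft document IDs are prefixed with "drafts.", so a payload
// for an unpublished document can be detected by its _id alone.
// isPublished is a hypothetical helper name.
export function isPublished(doc) {
  return Boolean(doc && doc._id && !doc._id.startsWith('drafts.'))
}
```

In the sync API route, return early when `isPublished(req.body)` is false — respond with a 200 so Sanity does not retry the webhook for a payload you intentionally skipped.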