# Semantic Chunking

Semantic chunking splits a document into smaller chunks by comparing the embeddings of neighboring text segments. It uses the Chonkie library to identify natural breakpoints where the meaning changes significantly, as determined by a configurable similarity threshold. Because semantically related content stays in the same chunk and splits fall on meaningful topic transitions, this preserves context and meaning better than fixed-size chunking. Learn more about [semantic chunking](https://docs.chonkie.ai/oss/chunkers/semantic-chunker) in the Chonkie documentation.
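To make the idea of a similarity-based breakpoint concrete, here is a minimal, dependency-free sketch. It is not Chonkie's implementation: the function names (`cosine_similarity`, `split_on_similarity_drop`) and the hand-written two-dimensional vectors are illustrative assumptions, and a real chunker also applies windowing, smoothing, and size limits. The sketch only shows the core step of starting a new chunk when the similarity between adjacent sentences falls below a threshold.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def split_on_similarity_drop(sentences, vectors, threshold=0.5):
    """Hypothetical helper: group consecutive sentences into chunks.

    A new chunk starts whenever the similarity between a sentence and
    the one before it is below `threshold`.
    """
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for i in range(1, len(sentences)):
        if cosine_similarity(vectors[i - 1], vectors[i]) < threshold:
            chunks.append([])  # topic shift detected: open a new chunk
        chunks[-1].append(sentences[i])
    return [" ".join(chunk) for chunk in chunks]


# Toy data: made-up vectors standing in for real sentence embeddings.
sentences = [
    "Simmer the coconut milk.",
    "Stir in the curry paste.",
    "Postgres stores the vectors.",
    "Each row holds one chunk.",
]
vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]

chunks = split_on_similarity_drop(sentences, vectors, threshold=0.5)
for chunk in chunks:
    print(chunk)
```

With these toy vectors, the two cooking sentences stay together and the two database sentences form a second chunk, because only the middle pair falls below the threshold. Raising the threshold produces more, smaller chunks; lowering it produces fewer, larger ones.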

Semantic chunking supports three embedder configurations:

* **Agno Embeddings**: pass an Agno embedder, such as `OpenAIEmbedder`.
* **Chonkie Embeddings**: pass one of Chonkie's built-in embedding handlers, such as `OpenAIEmbeddings`.
* **AutoEmbeddings**: pass a model string and let Chonkie's AutoEmbeddings select the handler automatically.

Learn more about [Chonkie embeddings](https://docs.chonkie.ai/oss/embeddings/overview).

<Steps>
  <Step title="Create a Python file">
    <CodeGroup>
      ```python Agno Embeddings theme={null}
      from agno.agent import Agent
      from agno.knowledge.chunking.semantic import SemanticChunking
      from agno.knowledge.embedder.openai import OpenAIEmbedder
      from agno.knowledge.knowledge import Knowledge
      from agno.knowledge.reader.pdf_reader import PDFReader
      from agno.vectordb.pgvector import PgVector

      db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

      embedder = OpenAIEmbedder(id="text-embedding-3-small")

      knowledge = Knowledge(
          vector_db=PgVector(
              table_name="recipes_semantic_chunking", db_url=db_url, embedder=embedder
          ),
      )
      knowledge.insert(
          url="https://agno-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf",
          reader=PDFReader(
              name="Semantic Chunking Reader",
              chunking_strategy=SemanticChunking(
                  embedder=embedder,  # Reuse the same Agno embedder for chunking
                  chunk_size=500,
                  similarity_threshold=0.5,
                  similarity_window=3,
                  min_sentences_per_chunk=1,
                  min_characters_per_sentence=24,
                  delimiters=[". ", "! ", "? ", "\n"],
                  include_delimiters="prev",
                  skip_window=0,
                  filter_window=5,
                  filter_polyorder=3,
                  filter_tolerance=0.2,
              ),
          ),
      )

      agent = Agent(
          knowledge=knowledge,
          search_knowledge=True,
      )

      agent.print_response("How to make Thai curry?", markdown=True)
      ```

      ```python Chonkie Embeddings theme={null}
      from agno.agent import Agent
      from agno.knowledge.chunking.semantic import SemanticChunking
      from agno.knowledge.embedder.openai import OpenAIEmbedder
      from agno.knowledge.knowledge import Knowledge
      from agno.knowledge.reader.pdf_reader import PDFReader
      from agno.vectordb.pgvector import PgVector
      from chonkie.embeddings import OpenAIEmbeddings

      db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

      # Agno embedder: embeds the chunks stored in the vector database
      agno_embedder = OpenAIEmbedder(id="text-embedding-3-small")
      # Chonkie embedder: used only to find semantic breakpoints while chunking
      chonkie_embedder = OpenAIEmbeddings(model="text-embedding-3-small")

      knowledge = Knowledge(
          vector_db=PgVector(
              table_name="recipes_semantic_chunking", db_url=db_url, embedder=agno_embedder
          ),
      )
      knowledge.insert(
          url="https://agno-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf",
          reader=PDFReader(
              name="Semantic Chunking Reader",
              chunking_strategy=SemanticChunking(
                  embedder=chonkie_embedder,  # Use Chonkie embedder for chunking
                  chunk_size=500,
                  similarity_threshold=0.5,
                  similarity_window=3,
                  min_sentences_per_chunk=1,
                  min_characters_per_sentence=24,
                  delimiters=[". ", "! ", "? ", "\n"],
                  include_delimiters="prev",
                  skip_window=0,
                  filter_window=5,
                  filter_polyorder=3,
                  filter_tolerance=0.2,
              ),
          ),
      )

      agent = Agent(
          knowledge=knowledge,
          search_knowledge=True,
      )

      agent.print_response("How to make Thai curry?", markdown=True)
      ```

      ```python AutoEmbeddings theme={null}
      from agno.agent import Agent
      from agno.knowledge.chunking.semantic import SemanticChunking
      from agno.knowledge.knowledge import Knowledge
      from agno.knowledge.reader.pdf_reader import PDFReader
      from agno.vectordb.pgvector import PgVector

      db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

      # No embedder is passed to PgVector here, so the vector database uses its default embedder
      knowledge = Knowledge(
          vector_db=PgVector(table_name="recipes_semantic_chunking", db_url=db_url),
      )
      knowledge.insert(
          url="https://agno-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf",
          reader=PDFReader(
              name="Semantic Chunking Reader",
              chunking_strategy=SemanticChunking(
                  # A model ID string is resolved through Chonkie's AutoEmbeddings
                  embedder="text-embedding-3-small",
                  chunk_size=500,
                  similarity_threshold=0.5,
                  similarity_window=3,
                  min_sentences_per_chunk=1,
                  min_characters_per_sentence=24,
                  delimiters=[". ", "! ", "? ", "\n"],
                  include_delimiters="prev",
                  skip_window=0,
                  filter_window=5,
                  filter_polyorder=3,
                  filter_tolerance=0.2,
              ),
          ),
      )

      agent = Agent(
          knowledge=knowledge,
          search_knowledge=True,
      )

      agent.print_response("How to make Thai curry?", markdown=True)
      ```
    </CodeGroup>
  </Step>

  <Snippet file="create-venv-step.mdx" />

  <Step title="Install dependencies">
    ```bash theme={null}
    uv pip install -U agno sqlalchemy psycopg pgvector chonkie openai pypdf
    ```
  </Step>

  <Snippet file="set-openai-key.mdx" />

  <Snippet file="run-pgvector-step.mdx" />

  <Step title="Run the script">
    ```bash theme={null}
    python semantic_chunking.py
    ```
  </Step>
</Steps>

## Semantic Chunking Params

<Snippet file="chunking-semantic.mdx" />
