How to Build RAG in Laravel 12: Vector Embeddings, pgvector & AI Search

By LaraSpeed Team · 18 min read

Your AI chatbot answers generic questions well enough — but it can't answer "What's our refund policy?" or "How do I configure SSO?" because it has no access to your data. That's the problem Retrieval-Augmented Generation (RAG) solves. In this guide, we'll build a complete RAG pipeline in Laravel 12: ingest documents, generate vector embeddings, store them in pgvector, perform semantic search, and feed the results to an AI agent that gives accurate, source-cited answers. All with the Laravel AI SDK and PostgreSQL — no external vector database needed.

1. What Is RAG and Why Your SaaS Needs It

Retrieval-Augmented Generation (RAG) is a pattern where you retrieve relevant documents from your own data and inject them into the AI prompt as context — before the model generates a response. Instead of relying on the model's training data (which knows nothing about your product), the AI gets your actual documentation, help articles, or internal data as context.

The result: your AI answers with accurate, up-to-date information specific to your product — and it can cite sources. This is the difference between a generic chatbot and one that actually helps your customers.

Common RAG use cases in SaaS:

  • Customer support bot — answers questions using your help center and documentation
  • Internal knowledge search — employees search across wikis, Slack exports, meeting notes
  • Product search — "Find me a laptop under $1000 with 16GB RAM" searches by meaning, not keywords
  • Legal/compliance assistant — answers questions from policy documents, contracts, regulatory texts
  • Onboarding copilot — guides new users through your product based on your docs

2. RAG Architecture Overview

A RAG pipeline has two phases: ingestion (offline) and retrieval + generation (real-time).

Ingestion Pipeline (offline, queued)

  1. Load — Import documents (Markdown, PDF, HTML, database records)
  2. Chunk — Split large documents into smaller, meaningful pieces (500-1000 tokens)
  3. Embed — Convert each chunk into a vector (array of numbers) using an embedding model
  4. Store — Save the vectors in PostgreSQL with pgvector

Query Pipeline (real-time)

  1. Embed the query — Convert the user's question into a vector
  2. Search — Find the most similar document chunks using cosine similarity
  3. Augment — Inject the top-k chunks into the AI prompt as context
  4. Generate — The AI answers using the retrieved context

The entire stack: Laravel 12 + PostgreSQL 16 + pgvector + Laravel AI SDK. No Pinecone, no Weaviate, no additional infrastructure. Your existing PostgreSQL database handles everything.

3. Setting Up pgvector with Laravel

pgvector is a PostgreSQL extension that adds vector similarity search. It stores embedding vectors as a native column type and supports fast nearest-neighbor queries via HNSW and IVFFlat indexes.

Install pgvector

If you're using PostgreSQL 16+ on Ubuntu/Debian:

sudo apt install postgresql-16-pgvector

On macOS with Homebrew:

brew install pgvector

With Docker (the easiest option):

# docker-compose.yml
services:
  postgres:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_DB: your_saas
      POSTGRES_USER: laravel
      POSTGRES_PASSWORD: secret
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:

Enable the Extension in Laravel

Create a migration to enable pgvector:

php artisan make:migration enable_pgvector_extension
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Support\Facades\DB;

return new class extends Migration
{
    public function up(): void
    {
        DB::statement('CREATE EXTENSION IF NOT EXISTS vector');
    }

    public function down(): void
    {
        DB::statement('DROP EXTENSION IF EXISTS vector');
    }
};

Run php artisan migrate and you're ready to store vectors.
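
To confirm the extension is live on the connection Laravel actually uses, here's a quick smoke test you can run from php artisan tinker (illustrative query; any vector literals work):

```php
use Illuminate\Support\Facades\DB;

// Returns one row with a cosine distance between two 3-dimensional vectors.
// An error here means the extension is not enabled on this database.
DB::select("SELECT '[1,2,3]'::vector <=> '[3,2,1]'::vector AS cosine_distance");
```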

4. Document Model and Migration

We need two tables: documents (the source files) and document_chunks (the embedded pieces). This separation lets you re-chunk and re-embed documents without losing the original source.

php artisan make:model Document -m
php artisan make:model DocumentChunk -m
<?php

// database/migrations/xxxx_create_documents_table.php
use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::create('documents', function (Blueprint $table) {
            $table->id();
            $table->foreignId('team_id')->constrained()->cascadeOnDelete();
            $table->string('title');
            $table->string('source_type'); // markdown, pdf, html, url
            $table->string('source_path')->nullable();
            $table->longText('content');
            $table->string('checksum', 64); // detect changes
            $table->json('metadata')->nullable();
            $table->timestamp('last_embedded_at')->nullable();
            $table->timestamps();

            $table->index(['team_id', 'source_type']);
        });
    }

    public function down(): void
    {
        Schema::dropIfExists('documents');
    }
};
<?php

// database/migrations/xxxx_create_document_chunks_table.php
use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::create('document_chunks', function (Blueprint $table) {
            $table->id();
            $table->foreignId('document_id')->constrained()->cascadeOnDelete();
            $table->foreignId('team_id')->constrained()->cascadeOnDelete();
            $table->text('content');
            $table->integer('chunk_index');
            $table->integer('token_count');
            $table->json('metadata')->nullable();
            $table->timestamps();

            $table->index(['team_id']);
            $table->index(['document_id', 'chunk_index']);
        });

        // Add the vector column (1536 dimensions for text-embedding-3-small)
        DB::statement(
            'ALTER TABLE document_chunks ADD COLUMN embedding vector(1536)'
        );

        // Create an HNSW index for fast similarity search
        DB::statement(
            'CREATE INDEX document_chunks_embedding_idx ON document_chunks
             USING hnsw (embedding vector_cosine_ops)
             WITH (m = 16, ef_construction = 64)'
        );
    }

    public function down(): void
    {
        Schema::dropIfExists('document_chunks');
    }
};

Why 1536 dimensions? That's the output size of OpenAI's text-embedding-3-small model, which offers a strong balance of quality, speed, and cost for most RAG applications. If you use a different provider's embedding model (such as Cohere's or Voyage's; Anthropic does not offer one), adjust the dimension to match that model's output size.

The HNSW index (Hierarchical Navigable Small World) gives you fast approximate nearest-neighbor search — typically single-digit milliseconds even with millions of rows, at the cost of a small recall trade-off. It's the recommended index type for most workloads.

The Eloquent Models

<?php

// app/Models/Document.php
namespace App\Models;

use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Relations\BelongsTo;
use Illuminate\Database\Eloquent\Relations\HasMany;

class Document extends Model
{
    protected $guarded = [];

    protected $casts = [
        'metadata' => 'array',
        'last_embedded_at' => 'datetime',
    ];

    public function team(): BelongsTo
    {
        return $this->belongsTo(Team::class);
    }

    public function chunks(): HasMany
    {
        return $this->hasMany(DocumentChunk::class);
    }

    public function needsReembedding(): bool
    {
        return is_null($this->last_embedded_at)
            || $this->updated_at->gt($this->last_embedded_at);
    }
}
<?php

// app/Models/DocumentChunk.php
namespace App\Models;

use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Relations\BelongsTo;
use Illuminate\Support\Facades\DB;

class DocumentChunk extends Model
{
    protected $guarded = [];

    protected $casts = [
        'metadata' => 'array',
    ];

    public function document(): BelongsTo
    {
        return $this->belongsTo(Document::class);
    }

    public function team(): BelongsTo
    {
        return $this->belongsTo(Team::class);
    }

    /**
     * Find the most similar chunks to a query embedding.
     */
    public static function searchByEmbedding(
        array $embedding,
        int $teamId,
        int $limit = 5,
        float $threshold = 0.3
    ): \Illuminate\Support\Collection {
        $vectorString = '[' . implode(',', $embedding) . ']';

        return static::query()
            ->select('document_chunks.*')
            ->selectRaw(
                '1 - (embedding <=> ?) as similarity',
                [$vectorString]
            )
            ->where('team_id', $teamId)
            ->whereRaw(
                '1 - (embedding <=> ?) > ?',
                [$vectorString, $threshold]
            )
            // Order by ascending distance (= descending similarity) so
            // Postgres can satisfy the ORDER BY with the HNSW index.
            ->orderByRaw('embedding <=> ?', [$vectorString])
            ->limit($limit)
            ->with('document:id,title,source_type')
            ->get();
    }
}

The <=> operator is pgvector's cosine distance operator. We compute 1 - distance to get a similarity score from 0 (unrelated) to 1 (identical). The threshold parameter filters out irrelevant results.
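For intuition, here's what that similarity score computes, as a plain-PHP sketch (illustration only — in production pgvector does this natively in C, index-accelerated):

```php
<?php

// Plain-PHP equivalent of pgvector's "1 - (a <=> b)" similarity score.
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    // Dot product of the vectors, normalized by their lengths.
    return $dot / (sqrt($normA) * sqrt($normB));
}

echo cosineSimilarity([1.0, 0.0], [1.0, 0.0]), "\n"; // 1 — identical direction
echo cosineSimilarity([1.0, 0.0], [0.0, 1.0]), "\n"; // 0 — orthogonal, unrelated
```

Embedding models produce vectors where semantically similar texts point in similar directions, so this single number is a useful relevance score.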

5. Document Chunking Strategies

Chunking is the most important step in a RAG pipeline. Bad chunks = bad retrieval = bad answers. The goal: each chunk should contain one coherent idea with enough context to be useful on its own.

<?php

namespace App\Services\Rag;

class DocumentChunker
{
    public function __construct(
        private int $maxTokens = 500,
        private int $overlapTokens = 50,
    ) {}

    /**
     * Chunk a document into overlapping pieces.
     */
    public function chunk(string $content, string $sourceType = 'markdown'): array
    {
        return match ($sourceType) {
            'markdown' => $this->chunkMarkdown($content),
            'html' => $this->chunkMarkdown(strip_tags($content)),
            default => $this->chunkByTokens($content),
        };
    }

    /**
     * Smart chunking for Markdown: split by headings first,
     * then by paragraphs, then by token count.
     */
    private function chunkMarkdown(string $content): array
    {
        // Split by headings (## or ###)
        $sections = preg_split('/(?=^#{1,3}\s)/m', $content);
        $chunks = [];

        foreach ($sections as $section) {
            $section = trim($section);
            if (empty($section)) continue;

            $tokenCount = $this->estimateTokens($section);

            if ($tokenCount <= $this->maxTokens) {
                $chunks[] = [
                    'content' => $section,
                    'token_count' => $tokenCount,
                ];
            } else {
                // Section too large — split by paragraphs with overlap
                $subChunks = $this->chunkByParagraphs($section);
                array_push($chunks, ...$subChunks);
            }
        }

        return $chunks;
    }

    private function chunkByParagraphs(string $text): array
    {
        $paragraphs = preg_split('/\n\n+/', $text);
        $chunks = [];
        $currentChunk = '';
        $currentTokens = 0;

        foreach ($paragraphs as $paragraph) {
            $paragraphTokens = $this->estimateTokens($paragraph);

            if ($currentTokens + $paragraphTokens > $this->maxTokens && $currentChunk !== '') {
                $chunks[] = [
                    'content' => trim($currentChunk),
                    'token_count' => $currentTokens,
                ];

                // Keep overlap from previous chunk
                $overlapText = $this->getOverlapText($currentChunk);
                $currentChunk = $overlapText . "\n\n" . $paragraph;
                $currentTokens = $this->estimateTokens($currentChunk);
            } else {
                $currentChunk .= ($currentChunk ? "\n\n" : '') . $paragraph;
                $currentTokens += $paragraphTokens;
            }
        }

        if (trim($currentChunk) !== '') {
            $chunks[] = [
                'content' => trim($currentChunk),
                'token_count' => $currentTokens,
            ];
        }

        return $chunks;
    }

    private function chunkByTokens(string $text): array
    {
        $words = explode(' ', $text);
        $chunks = [];
        $currentWords = [];

        foreach ($words as $word) {
            $currentWords[] = $word;
            if (count($currentWords) >= $this->maxTokens * 0.75) {
                $content = implode(' ', $currentWords);
                $chunks[] = [
                    'content' => $content,
                    'token_count' => $this->estimateTokens($content),
                ];

                // Keep last N words as overlap
                $overlapCount = (int) ($this->overlapTokens * 0.75);
                $currentWords = array_slice($currentWords, -$overlapCount);
            }
        }

        if (!empty($currentWords)) {
            $content = implode(' ', $currentWords);
            $chunks[] = [
                'content' => $content,
                'token_count' => $this->estimateTokens($content),
            ];
        }

        return $chunks;
    }

    private function getOverlapText(string $text): string
    {
        $words = explode(' ', $text);
        $overlapWords = array_slice($words, -(int) ($this->overlapTokens * 0.75));
        return implode(' ', $overlapWords);
    }

    private function estimateTokens(string $text): int
    {
        // Rough estimate: 1 token ≈ 0.75 words for English
        return (int) ceil(str_word_count($text) / 0.75);
    }
}

Key chunking rules:

  • 500-1000 tokens per chunk — Large enough for context, small enough for precise retrieval
  • 50-100 token overlap — Prevents information loss at chunk boundaries
  • Respect structure — Split by headings/sections first, then paragraphs, then raw tokens as a last resort
  • Keep metadata — Attach section titles, page numbers, or source URLs to each chunk for citation

6. Generating Embeddings with the Laravel AI SDK

The Laravel AI SDK provides a clean API for generating embeddings. An embedding is a vector (array of floats) that represents the semantic meaning of text. Similar texts produce similar vectors — which is how semantic search works.

<?php

namespace App\Services\Rag;

use Laravel\Ai\Facades\Ai;

class EmbeddingService
{
    /**
     * Generate an embedding for a single text.
     */
    public function embed(string $text): array
    {
        $response = Ai::embeddings()
            ->model('text-embedding-3-small')
            ->create($text);

        return $response->embedding;
    }

    /**
     * Generate embeddings for multiple texts in a single API call.
     * Much more efficient than embedding one at a time.
     */
    public function embedBatch(array $texts): array
    {
        $response = Ai::embeddings()
            ->model('text-embedding-3-small')
            ->create($texts);

        return $response->embeddings;
    }
}

Always batch your embeddings. Embedding 100 chunks in one API call avoids 100 HTTP round trips and is dramatically faster than embedding them one at a time. The Laravel AI SDK handles this automatically when you pass an array.

Cost reference: text-embedding-3-small costs ~$0.02 per million tokens. A 10,000-word document (~13,000 tokens) costs less than $0.001 to embed. Embeddings are a one-time cost per document — you only re-embed when content changes.

7. Building the Ingestion Pipeline

Now we connect the pieces: load a document, chunk it, embed all chunks, and store the vectors. This runs as a queued job so it doesn't block the user.

<?php

namespace App\Jobs;

use App\Models\Document;
use App\Services\Rag\DocumentChunker;
use App\Services\Rag\EmbeddingService;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\DB;

class EmbedDocument implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $tries = 3;
    public int $backoff = 30;

    public function __construct(
        public Document $document,
    ) {}

    public function handle(
        DocumentChunker $chunker,
        EmbeddingService $embeddings,
    ): void {
        $chunks = $chunker->chunk(
            $this->document->content,
            $this->document->source_type
        );

        if (empty($chunks)) return;

        // Generate embeddings in batch
        $texts = array_column($chunks, 'content');
        $vectors = $embeddings->embedBatch($texts);

        DB::transaction(function () use ($chunks, $vectors) {
            // Delete old chunks for this document
            $this->document->chunks()->delete();

            // Insert new chunks with embeddings
            foreach ($chunks as $index => $chunk) {
                $vectorString = '[' . implode(',', $vectors[$index]) . ']';

                DB::statement(
                    'INSERT INTO document_chunks
                     (document_id, team_id, content, chunk_index, token_count, embedding, created_at, updated_at)
                     VALUES (?, ?, ?, ?, ?, ?::vector, NOW(), NOW())',
                    [
                        $this->document->id,
                        $this->document->team_id,
                        $chunk['content'],
                        $index,
                        $chunk['token_count'],
                        $vectorString,
                    ]
                );
            }

            $this->document->update(['last_embedded_at' => now()]);
        });
    }
}

Triggering Ingestion

<?php

namespace App\Http\Controllers;

use App\Jobs\EmbedDocument;
use App\Models\Document;
use Illuminate\Http\Request;

class DocumentController extends Controller
{
    public function store(Request $request)
    {
        $request->validate([
            'title' => 'required|string|max:255',
            'content' => 'required|string',
            'source_type' => 'required|in:markdown,html,text',
        ]);

        $document = $request->user()->currentTeam->documents()->create([
            'title' => $request->title,
            'content' => $request->content,
            'source_type' => $request->source_type,
            'checksum' => hash('sha256', $request->content),
        ]);

        // Queue the embedding job
        EmbedDocument::dispatch($document);

        return response()->json([
            'document' => $document,
            'status' => 'processing',
            'message' => 'Document is being indexed for AI search.',
        ], 201);
    }

    /**
     * Bulk import from a directory (e.g., your docs folder).
     */
    public function importFromDirectory(Request $request)
    {
        $request->validate(['path' => 'required|string']);
        $team = $request->user()->currentTeam;

        $files = glob($request->path . '/*.md');
        $imported = 0;

        foreach ($files as $file) {
            $content = file_get_contents($file);
            $checksum = hash('sha256', $content);

            $document = Document::updateOrCreate(
                ['team_id' => $team->id, 'source_path' => $file],
                [
                    'title' => basename($file, '.md'),
                    'content' => $content,
                    'source_type' => 'markdown',
                    'checksum' => $checksum,
                ]
            );

            if ($document->needsReembedding()) {
                EmbedDocument::dispatch($document);
                $imported++;
            }
        }

        return response()->json([
            'total_files' => count($files),
            'queued_for_embedding' => $imported,
        ]);
    }
}

The checksum field ensures we only re-embed documents that have actually changed. If a user re-uploads the same content, we skip the expensive embedding step.

8. The RAG Agent: Retrieval + Generation

Now the main event: an AI agent that retrieves relevant documents and generates accurate, cited answers. We use the Laravel AI SDK's tool system to give the agent access to your knowledge base.

<?php

namespace App\Ai\Agents;

use App\Ai\Tools\SearchKnowledgeBase;
use Laravel\Ai\Contracts\Agent;
use Laravel\Ai\Contracts\HasTools;
use Laravel\Ai\Enums\Lab;
use Laravel\Ai\Promptable;

#[Provider(Lab::Anthropic)]
#[Model('claude-sonnet-4-5')]
#[Temperature(0.3)]
#[MaxTokens(2048)]
class KnowledgeBaseAgent implements Agent, HasTools
{
    use Promptable;

    public function __construct(
        private int $teamId,
    ) {}

    public function instructions(): string
    {
        return <<<'PROMPT'
        You are a helpful assistant for a SaaS application. Answer the user's question
        using ONLY the information retrieved from the knowledge base. Follow these rules:

        1. Always search the knowledge base before answering.
        2. If the knowledge base contains relevant information, answer based on it
           and cite your sources using [Source: document title].
        3. If the knowledge base has no relevant results, say "I don't have information
           about that in our documentation. Please contact support for help."
        4. Never make up information. Never answer from your general training data
           when the user asks about product-specific topics.
        5. Be concise and direct. Use bullet points for multi-step instructions.
        PROMPT;
    }

    public function tools(): array
    {
        return [
            new SearchKnowledgeBase($this->teamId),
        ];
    }
}
<?php

namespace App\Ai\Tools;

use App\Services\Rag\SemanticSearch;
use Laravel\Ai\Contracts\Tool;
use Laravel\Ai\Tool\ToolParameter;

class SearchKnowledgeBase implements Tool
{
    public function __construct(
        private int $teamId,
    ) {}

    public function name(): string
    {
        return 'search_knowledge_base';
    }

    public function description(): string
    {
        return 'Search the knowledge base for relevant documentation, ' .
               'help articles, and product information. Use this tool ' .
               'to find answers to user questions.';
    }

    public function parameters(): array
    {
        return [
            ToolParameter::string('query')
                ->description('The search query — rephrase the user question as a search query')
                ->required(),
            ToolParameter::integer('limit')
                ->description('Number of results to retrieve (default: 5)')
                ->required(false),
        ];
    }

    public function handle(string $query, int $limit = 5): string
    {
        $search = app(SemanticSearch::class);

        $results = $search->search(
            query: $query,
            teamId: $this->teamId,
            limit: $limit,
        );

        if ($results->isEmpty()) {
            return 'No relevant documents found in the knowledge base.';
        }

        // Format results for the AI context
        return $results->map(fn ($r) =>
            "---\nSource: {$r['source']} (similarity: {$r['similarity']})\n{$r['content']}\n---"
        )->implode("\n\n");
    }
}
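
The SemanticSearch service the tool resolves from the container isn't shown in full anywhere above. Here's a minimal sketch, consistent with how the tool formats results (assuming the EmbeddingService and DocumentChunk::searchByEmbedding defined earlier):

```php
<?php

namespace App\Services\Rag;

use App\Models\DocumentChunk;
use Illuminate\Support\Collection;

class SemanticSearch
{
    public function __construct(
        private EmbeddingService $embeddings,
    ) {}

    public function search(string $query, int $teamId, int $limit = 5): Collection
    {
        // Embed the query, then delegate to the pgvector-backed scope.
        $queryEmbedding = $this->embeddings->embed($query);

        return DocumentChunk::searchByEmbedding($queryEmbedding, $teamId, $limit)
            ->map(fn ($chunk) => [
                'content' => $chunk->content,
                'similarity' => round($chunk->similarity, 3),
                'source' => $chunk->document->title,
                'source_type' => $chunk->document->source_type,
                'chunk_index' => $chunk->chunk_index,
            ]);
    }
}
```

Returning plain arrays (rather than models) keeps the tool's output format decoupled from the Eloquent layer.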

Using the RAG Agent

// routes/web.php
use App\Ai\Agents\KnowledgeBaseAgent;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Route;

Route::post('/ai/ask', function (Request $request) {
    $request->validate(['question' => 'required|string|max:1000']);

    $agent = new KnowledgeBaseAgent($request->user()->currentTeam->id);

    $response = $agent
        ->forUser($request->user())
        ->prompt($request->question);

    return response()->json([
        'answer' => $response->text,
        'conversation_id' => $response->conversationId,
    ]);
})->middleware(['auth', 'verified']);

// With streaming for real-time UX:
Route::post('/ai/ask/stream', function (Request $request) {
    $request->validate(['question' => 'required|string|max:1000']);

    $agent = new KnowledgeBaseAgent($request->user()->currentTeam->id);

    return $agent
        ->forUser($request->user())
        ->stream($request->question);
})->middleware(['auth', 'verified']);

The beauty of the tool-based approach: the agent decides when to search and what to search for. It might reformulate the user's question into a better search query, or make multiple searches for complex questions. The SDK handles the entire tool-use loop automatically.

9. Multi-Tenant RAG: Scoping per Team

In a SaaS application, each team has its own knowledge base. Team A's internal documents must never appear in Team B's search results. We've already built this into the data model — every query is scoped by team_id.

<?php

namespace App\Models\Scopes;

use Illuminate\Database\Eloquent\Builder;
use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Scope;

class TeamScope implements Scope
{
    public function apply(Builder $builder, Model $model): void
    {
        if ($team = currentTeam()) {
            $builder->where($model->getTable() . '.team_id', $team->id);
        }
    }
}

// Apply to both models:
// In Document.php and DocumentChunk.php:
protected static function booted(): void
{
    static::addGlobalScope(new TeamScope);
}

With global scopes, ordinary Eloquent queries in a web request are automatically filtered by team. Note the limits, though: TeamScope only applies when currentTeam() resolves a team, so queued jobs with no authenticated user bypass it, and raw SQL statements skip global scopes entirely. That's why searchByEmbedding() also takes an explicit $teamId and adds its own WHERE team_id = ? clause: the explicit filter is the guarantee, and the global scope is defense in depth.
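
The flip side: when an admin screen or a maintenance job legitimately needs cross-team access, bypass the scope explicitly rather than weakening it. This is standard Eloquent API ($documentId is a placeholder):

```php
use App\Models\DocumentChunk;
use App\Models\Scopes\TeamScope;

// Explicit, greppable escape hatch for cross-team access.
$chunks = DocumentChunk::withoutGlobalScope(TeamScope::class)
    ->where('document_id', $documentId)
    ->get();
```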

For SaaS applications with different subscription tiers, you can limit the knowledge base size per plan:

// In your Team model
public function maxDocuments(): int
{
    return match (true) {
        $this->onBusinessPlan() => 10000,
        $this->onProPlan() => 1000,
        default => 50, // Free plan
    };
}

public function maxStorageMb(): int
{
    return match (true) {
        $this->onBusinessPlan() => 5000,
        $this->onProPlan() => 500,
        default => 25,
    };
}

10. Production Optimization

Cache Embeddings for Common Queries

Users often ask similar questions. Cache the query embedding to skip the API call:

public function search(string $query, int $teamId, int $limit = 5): Collection
{
    $cacheKey = 'emb:' . hash('xxh3', $query);

    $queryEmbedding = Cache::remember($cacheKey, now()->addHours(24), function () use ($query) {
        return $this->embeddings->embed($query);
    });

    return DocumentChunk::searchByEmbedding($queryEmbedding, $teamId, $limit);
}

Hybrid Search: Semantic + Full-Text

Pure semantic search occasionally misses exact keyword matches (e.g., product names, error codes). Combine both approaches for the best results:

public function hybridSearch(string $query, int $teamId, int $limit = 5): Collection
{
    // Semantic results
    $semanticResults = $this->search($query, $teamId, $limit);

    // Full-text results
    $keywordResults = DocumentChunk::query()
        ->where('team_id', $teamId)
        ->whereRaw(
            "to_tsvector('english', content) @@ plainto_tsquery('english', ?)",
            [$query]
        )
        ->limit($limit)
        ->get()
        ->map(fn ($chunk) => [
            'content' => $chunk->content,
            'similarity' => 0.5, // Fixed score for keyword matches
            'source' => $chunk->document->title,
            'source_type' => $chunk->document->source_type,
            'chunk_index' => $chunk->chunk_index,
        ]);

    // Merge, deduplicate, and re-rank
    return $semanticResults
        ->concat($keywordResults)
        ->unique('content')
        ->sortByDesc('similarity')
        ->take($limit)
        ->values();
}
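
The fixed 0.5 score for keyword matches is a blunt instrument. A common refinement (not covered above) is Reciprocal Rank Fusion, which merges the two lists by rank position instead of comparing incompatible scores. A self-contained sketch:

```php
<?php

// Reciprocal Rank Fusion: each document scores 1 / (k + rank) in every
// list it appears in; k = 60 is the conventional smoothing constant.
function reciprocalRankFusion(array $semanticIds, array $keywordIds, int $k = 60): array
{
    $scores = [];

    foreach ([$semanticIds, $keywordIds] as $list) {
        foreach ($list as $rank => $id) {
            $scores[$id] = ($scores[$id] ?? 0.0) + 1 / ($k + $rank + 1);
        }
    }

    arsort($scores);

    return array_keys($scores); // best-ranked chunk IDs first
}

print_r(reciprocalRankFusion([101, 102, 103], [102, 104]));
// Order: 102, 101, 104, 103 — 102 ranks first because it appears high in both lists
```

In hybridSearch() you'd fuse the chunk IDs of the two result sets, then reorder the merged collection to match.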

Index Tuning

For production workloads with 100k+ chunks, tune your HNSW index:

-- For higher recall (more accurate, slightly slower)
SET hnsw.ef_search = 100; -- default is 40

-- If you have 1M+ rows, consider IVFFlat for faster inserts
CREATE INDEX ON document_chunks
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 1000);

-- Analyze after large batch imports
ANALYZE document_chunks;

Re-Embedding Pipeline

Create an Artisan command to re-embed stale documents:

php artisan make:command ReembedDocuments
<?php

namespace App\Console\Commands;

use App\Jobs\EmbedDocument;
use App\Models\Document;
use Illuminate\Console\Command;

class ReembedDocuments extends Command
{
    protected $signature = 'rag:reembed {--team= : Specific team ID} {--force : Re-embed all}';
    protected $description = 'Re-embed documents that have changed since last embedding';

    public function handle(): void
    {
        $query = Document::query();

        if ($teamId = $this->option('team')) {
            $query->where('team_id', $teamId);
        }

        if (!$this->option('force')) {
            $query->where(function ($q) {
                $q->whereNull('last_embedded_at')
                  ->orWhereColumn('updated_at', '>', 'last_embedded_at');
            });
        }

        $count = $query->count();
        $this->info("Queuing {$count} documents for re-embedding...");

        $query->each(function (Document $document) {
            EmbedDocument::dispatch($document);
        });

        $this->info('Done. Jobs are processing in the background.');
    }
}

Schedule it to run daily to catch any content updates:

// routes/console.php
Schedule::command('rag:reembed')->daily();

11. Testing Your RAG Pipeline

RAG has multiple moving parts. Here's how to test each one without hitting external APIs.

<?php

use App\Ai\Agents\KnowledgeBaseAgent;
use App\Jobs\EmbedDocument;
use App\Models\Document;
use App\Models\DocumentChunk;
use App\Models\Team;
use App\Models\User;
use App\Services\Rag\DocumentChunker;
use App\Services\Rag\EmbeddingService;

// Unit test: chunking
test('markdown document is chunked by headings', function () {
    $chunker = new DocumentChunker(maxTokens: 100, overlapTokens: 10);

    $markdown = <<<'MD'
    ## Getting Started
    This is the getting started guide with enough content to stand alone.

    ## Installation
    Run the following command to install the package.

    ## Configuration
    Configure your environment variables.
    MD;

    $chunks = $chunker->chunk($markdown, 'markdown');

    expect($chunks)->toHaveCount(3)
        ->and($chunks[0]['content'])->toContain('Getting Started')
        ->and($chunks[1]['content'])->toContain('Installation')
        ->and($chunks[2]['content'])->toContain('Configuration');
});

// Unit test: large sections are split further
test('large sections are split into smaller chunks', function () {
    $chunker = new DocumentChunker(maxTokens: 50, overlapTokens: 10);
    $longSection = "## Big Section\n\n" . implode("\n\n", array_fill(0, 20, 'This is a paragraph with enough words to take up token space in the chunk.'));

    $chunks = $chunker->chunk($longSection, 'markdown');
    expect(count($chunks))->toBeGreaterThan(1);
});

// Integration test: embedding and search
test('embedded documents are found via semantic search', function () {
    $team = Team::factory()->create();

    // Mock the embedding service to return deterministic vectors
    $this->mock(EmbeddingService::class, function ($mock) {
        $mock->shouldReceive('embedBatch')
            ->andReturn([
                array_fill(0, 1536, 0.1), // chunk 1 vector
                array_fill(0, 1536, 0.2), // chunk 2 vector
            ]);
        $mock->shouldReceive('embed')
            ->andReturn(array_fill(0, 1536, 0.1)); // query vector (similar to chunk 1)
    });

    $document = Document::factory()->create([
        'team_id' => $team->id,
        'content' => "## Refunds\nOur refund policy allows 14-day returns.\n\n## Pricing\nPlans start at \$49.",
        'source_type' => 'markdown',
    ]);

    // Run the embedding job synchronously
    (new EmbedDocument($document))->handle(
        new DocumentChunker,
        app(EmbeddingService::class),
    );

    expect(DocumentChunk::where('team_id', $team->id)->count())->toBe(2);
});

// Feature test: RAG agent returns cited answers
test('knowledge base agent cites sources', function () {
    KnowledgeBaseAgent::fake([
        'Based on our documentation, the refund policy allows returns within 14 days. [Source: Refund Policy]',
    ]);

    $user = User::factory()->withTeam()->create();

    $this->actingAs($user)
        ->postJson('/ai/ask', ['question' => 'What is your refund policy?'])
        ->assertOk()
        ->assertJsonFragment(['answer' => 'Based on our documentation, the refund policy allows returns within 14 days. [Source: Refund Policy]']);

    KnowledgeBaseAgent::assertPrompted(fn ($prompt) =>
        str_contains($prompt->prompt, 'refund policy')
    );
});

// Feature test: team isolation
test('team A cannot search team B documents', function () {
    $this->mock(EmbeddingService::class, function ($mock) {
        $mock->shouldReceive('embed')->andReturn(array_fill(0, 1536, 0.5));
    });

    $teamA = Team::factory()->create();
    $teamB = Team::factory()->create();

    DocumentChunk::factory()->create(['team_id' => $teamB->id, 'content' => 'Secret info']);

    $results = DocumentChunk::searchByEmbedding(
        array_fill(0, 1536, 0.5),
        $teamA->id,
        limit: 10,
    );

    expect($results)->toBeEmpty();
});

12. Conclusion

You now have a complete RAG pipeline in Laravel 12. Let's recap:

  1. pgvector — Vector storage and similarity search in your existing PostgreSQL database
  2. Document chunking — Smart splitting by headings, paragraphs, and token limits with overlap
  3. Batch embeddings — Efficient vector generation with the Laravel AI SDK
  4. Queued ingestion — Non-blocking document processing with change detection
  5. Semantic search — Find documents by meaning, not just keywords
  6. Hybrid search — Combine semantic and full-text for maximum recall
  7. RAG agent — AI answers grounded in your actual documentation with source citations
  8. Multi-tenancy — Team-scoped knowledge bases with plan-based limits
  9. Production optimizations — Caching, index tuning, and automated re-embedding
  10. Testing — Deterministic tests with mocked embeddings and agent fakes

This is the architecture behind AI-powered support bots, documentation search, and internal knowledge systems in production SaaS apps. The best part: it runs on your existing PostgreSQL instance with zero additional infrastructure. No Pinecone subscription, no Weaviate cluster, no vendor lock-in.

Building the SaaS foundation — authentication, teams, billing, admin panel — before adding RAG is a significant amount of work. LaraSpeed ships with that entire foundation pre-built, including multi-tenancy with team scoping, Stripe billing with plan limits, and the Filament admin panel. Combine it with the RAG pipeline from this tutorial and you have a production-ready AI-powered SaaS in days.

Build AI-powered search for your SaaS. Start with a solid foundation.

LaraSpeed gives you the complete SaaS foundation — auth, teams, billing, admin, API, tests — so you can focus on building RAG, AI agents, and the features that differentiate your product.

One-time purchase · Full source code · 14-day money-back guarantee