Level 4 · Advanced — Neural.Literacy

01 — Geometry

Embeddings — text as numbers

Every piece of text can be converted into a list of numbers that captures its meaning. These number lists are called embeddings, and they're the foundation of modern AI search, recommendation, and retrieval systems.

An embedding model takes text and outputs a vector, a list of hundreds or thousands of numbers that places the text's meaning as a point in a high-dimensional space:

vectors

"dog"    → [0.82, 0.15, -0.43, 0.67, ...]
"puppy"  → [0.79, 0.18, -0.41, 0.65, ...]
"car"    → [-0.23, 0.91, 0.34, -0.12, ...]

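Notice that "dog" and "puppy" have similar numbers because they have similar meanings, while "car" looks very different because its meaning is unrelated. Closeness between two vectors is usually measured with cosine similarity: the closer the score is to 1, the more similar the meanings.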
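To make "similar numbers" concrete, here is a minimal Python sketch of cosine similarity applied to the first four numbers of each toy vector above. These are the illustrative values from the example, not output from a real model, and real embeddings have hundreds of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of the vector lengths:
    # 1.0 means same direction, 0 means orthogonal, -1.0 means opposite.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy values from the example above (first four dimensions only).
dog = [0.82, 0.15, -0.43, 0.67]
puppy = [0.79, 0.18, -0.41, 0.65]
car = [-0.23, 0.91, 0.34, -0.12]

print(f"dog vs puppy: {cosine_similarity(dog, puppy):.3f}")
print(f"dog vs car:   {cosine_similarity(dog, car):.3f}")
```

With these toy values, "dog" vs "puppy" comes out very close to 1, and "dog" vs "car" comes out slightly negative.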


Traditional search matches exact words: search for "canine" and it won't find documents that only say "dog." Embeddings match meaning: search for "canine" and you also get "dog," "puppy," "hound," and "K9," because they all sit close together in embedding space.

This is the technology behind:

Semantic search (search by meaning, not keywords)
Recommendation engines ("users who liked X also liked Y")
RAG systems (finding relevant documents for AI context)
Clustering (grouping similar documents automatically)

Embedding models you can use:

Model	Provider	Dimensions	Cost
text-embedding-3-small	OpenAI	1536	$0.02/1M tokens
text-embedding-3-large	OpenAI	3072	$0.13/1M tokens
embed-v3	Cohere	1024	Free tier available
all-MiniLM-L6-v2	Open source	384	Free (run locally)
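Dimensions and price start to matter once you embed a whole corpus. A rough sketch, assuming the prices listed in the table (they change, so check the provider's current pricing), a hypothetical corpus of 100,000 chunks of about 500 tokens each, and vectors stored as plain float32 (4 bytes per number, before any index overhead):

```python
# Dimensions and prices (USD per 1M tokens) as listed in the table above.
# Assumption: these prices may have changed; check the provider before relying on them.
MODELS = {
    "text-embedding-3-small": {"dims": 1536, "usd_per_m_tokens": 0.02},
    "text-embedding-3-large": {"dims": 3072, "usd_per_m_tokens": 0.13},
}
BYTES_PER_FLOAT32 = 4  # assumption: raw float32 storage, no index overhead

def embedding_cost_usd(total_tokens: int, usd_per_m_tokens: float) -> float:
    return total_tokens / 1_000_000 * usd_per_m_tokens

def raw_storage_bytes(num_vectors: int, dims: int) -> int:
    return num_vectors * dims * BYTES_PER_FLOAT32

# Hypothetical corpus: 100,000 chunks of ~500 tokens each.
num_chunks, tokens_per_chunk = 100_000, 500
for name, spec in MODELS.items():
    cost = embedding_cost_usd(num_chunks * tokens_per_chunk, spec["usd_per_m_tokens"])
    gb = raw_storage_bytes(num_chunks, spec["dims"]) / 1e9
    print(f"{name}: ~${cost:.2f} to embed, ~{gb:.2f} GB of raw vectors")
```

Under those assumptions, text-embedding-3-small costs about $1.00 and needs about 0.61 GB of raw vectors, while text-embedding-3-large costs about $6.50 and needs about 1.23 GB, because it has twice as many dimensions.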

02 — Storage

Vector databases — storing meaning

A regular database stores data in rows and columns, and you query it with exact conditions. A vector database stores embeddings and lets you query by similarity: "Find the 5 most semantically similar items to this query."

How vector search works:

Store docs as embeddings → Embed your query → Find the nearest stored vectors → Return the matching docs
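The four steps above can be sketched as a brute-force search in Python. The three-number vectors are made up for illustration (a real system would get them from an embedding model), and `search` is a hypothetical helper, not any vector database's API:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Step 1: store docs as embeddings (made-up vectors for illustration).
store = {
    "Our dog loves long walks.": [0.90, 0.10, 0.05],
    "How to house-train a puppy.": [0.85, 0.20, 0.10],
    "Changing the oil in your car.": [0.05, 0.95, 0.20],
}

def search(query_vector, k=2):
    # Step 3: score every stored vector against the query (brute force, O(n)).
    scored = [(cosine_similarity(query_vector, vec), doc) for doc, vec in store.items()]
    scored.sort(reverse=True)
    # Step 4: return the top-k matching documents.
    return [doc for _, doc in scored[:k]]

# Step 2: embed the query. Hypothetical vector standing in for the embedding of "canine".
canine = [0.88, 0.12, 0.08]
print(search(canine))
```

Both returned documents are about dogs even though neither contains the word "canine". Scoring every stored vector is fine for a few thousand documents; at larger scale, vector databases typically use approximate nearest-neighbor indexes such as HNSW instead of comparing against everything.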

Popular vector databases:

Database	Type	Best for
ChromaDB	In-process library	Prototyping, small projects
pgvector	PostgreSQL extension	Existing PostgreSQL users
Pinecone	Managed cloud	Production, zero maintenance
Weaviate	Self-hosted or cloud	Complex queries, multi-modal
Qdrant	Self-hosted or cloud	High performance, filtering

✓ Use one when

You have documents/data an AI needs to search
Keyword search isn't finding what you need
You're building RAG
You need semantic similarity, not exact matching

✗ Skip it when

Your data is small enough to fit in a prompt directly
Exact matching works fine
You don't need semantic search
Simple filters (date ranges, categories) solve the problem

⚡ Try this now

Open platform.openai.com/playground and type:
Create embeddings for: "I love programming" and "I enjoy coding" and "I hate Mondays"
Look at the similarity scores. "Love programming" and "enjoy coding" score high. "Hate Mondays" scores low. That gap is embeddings working in front of you.

03 — Coordination

Multi-agent systems

One AI agent is useful. Multiple coordinated AI agents are powerful. Multi-agent systems divide complex tasks among specialized workers orchestrated by a coordinator.

architecture

        User
          ↓
   Orchestrator (coordinator)
     ┌──────┼──────┐
     ↓      ↓      ↓
  Worker A  Worker B  Worker C
  (research) (coding) (analysis)
     └──────┼──────┘
            ↓
   Orchestrator compiles → Final output
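A minimal sketch of the diagram above using Python's `asyncio`, assuming three hypothetical worker functions that stand in for real model calls. The orchestrator fans the task out with `asyncio.gather` and fans the results back in by joining them:

```python
import asyncio

# Hypothetical workers: in a real system each would call a model with its own
# system prompt and tools. Here they just sleep to simulate work.
async def research_worker(task: str) -> str:
    await asyncio.sleep(0.1)
    return f"research notes on {task!r}"

async def coding_worker(task: str) -> str:
    await asyncio.sleep(0.1)
    return f"code draft for {task!r}"

async def analysis_worker(task: str) -> str:
    await asyncio.sleep(0.1)
    return f"analysis of {task!r}"

async def orchestrator(task: str) -> str:
    # Fan-out: start all workers at once; gather() runs them concurrently
    # and returns their results in argument order.
    results = await asyncio.gather(
        research_worker(task), coding_worker(task), analysis_worker(task)
    )
    # Fan-in: compile the partial results into one output.
    return "\n".join(results)

print(asyncio.run(orchestrator("build a price tracker")))
```

Because `gather` runs the three workers concurrently, the whole run takes about as long as the slowest worker (here about 0.1 seconds) rather than the sum of all three.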

Why use multiple agents:

Parallelism: Three workers can work simultaneously. If a task splits into three independent parts of similar size, what takes one agent 30 minutes could take three agents roughly 10 minutes, plus some coordination overhead.
Specialization: Each worker can have its own system prompt, tools, and model. A coding worker can use a code-optimized model. A research worker can have web search access.
Context isolation: Each worker has its own context window. A research agent's context doesn't pollute a coding agent's context.
Fault tolerance: If one worker fails, the others continue. The orchestrator can retry or work around failures.

Common patterns:

Fan-out / Fan-in: The orchestrator splits a task, sends the pieces to workers in parallel, then collects and merges the results.
Pipeline: Worker A's output becomes Worker B's input. Processing is sequential, and each stage adds value.
Debate: Multiple agents analyze the same problem from different perspectives; the orchestrator synthesizes the best answer.
Supervisor: One agent monitors and corrects another agent's work in real time.
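The pipeline pattern boils down to function composition, where each stage receives the previous stage's output. A minimal sketch, where `outline`, `draft`, and `edit` are hypothetical stages standing in for agent calls:

```python
from functools import reduce
from typing import Callable

# Hypothetical stages; in practice each would be an agent call with its own prompt.
def outline(topic: str) -> str:
    return f"outline({topic})"

def draft(text: str) -> str:
    return f"draft({text})"

def edit(text: str) -> str:
    return f"edit({text})"

def run_pipeline(stages: list[Callable[[str], str]], initial: str) -> str:
    # Feed each stage's output into the next stage, left to right.
    return reduce(lambda acc, stage: stage(acc), stages, initial)

print(run_pipeline([outline, draft, edit], "vector databases"))
```

This prints `edit(draft(outline(vector databases)))`, which shows the order in which the stages were applied.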

Start with one agent. Add more only when you hit the limits of a single agent. Multi-agent is overkill for simple tasks or tasks that require human input at every step.

⚡ Try this now

Think about a task you do repeatedly. Could you split it into three sub-tasks? Write it out: Agent 1 handles [task], Agent 2 handles [task], Agent 3 handles [task]. That sketch is your first multi-agent architecture, on paper.

04 — Specialization

Fine-tuning — customize your model

Fine-tuning means further training a pre-trained model on your own data to make it better at specific tasks. It's the difference between a general practitioner and a specialist.

✓ Worth it when

You have 500+ examples of the exact output you want
Prompt engineering alone can't achieve consistent results
You need the model to learn a specific style, format, or domain
You'll use this specialized capability thousands of times

✗ Not worth it when

You have fewer than 100 examples (prompt engineering is better)
Your needs change frequently (fine-tuned models are static)
You just want the model to know your data (RAG is better)
You want better general reasoning (use a better base model)

Methods:

Method	Cost	Hardware	Quality
Full fine-tuning	Very high	GPU cluster	Best
LoRA	Low	Single GPU	Very good
QLoRA	Very low	Consumer GPU	Good
API fine-tuning	Low (pay per token)	None (managed)	Good
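The reason LoRA and QLoRA are so much cheaper comes down to arithmetic. LoRA freezes a pretrained weight matrix of size d × k and trains two small factors, B (d × r) and A (r × k), whose product is the update. QLoRA additionally stores the frozen base weights in 4-bit. The sizes below (a 4096 × 4096 matrix, rank 8, a 7-billion-parameter model) are illustrative assumptions, and the memory figure counts weights only:

```python
def full_params(d: int, k: int) -> int:
    # Full fine-tuning updates every entry of a d x k weight matrix.
    return d * k

def lora_params(d: int, k: int, r: int) -> int:
    # LoRA freezes W and trains B (d x r) and A (r x k); the learned update is B @ A.
    return d * r + r * k

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    # Weights only: ignores activations, optimizer state, and quantization metadata.
    return num_params * bits_per_param / 8 / 1e9

d = k = 4096  # assumption: one square weight matrix in a mid-sized transformer
r = 8         # assumption: a commonly used LoRA rank
full, lora = full_params(d, k), lora_params(d, k, r)
print(f"full fine-tuning: {full:,} trainable params")
print(f"LoRA (r={r}): {lora:,} trainable params ({lora / full:.2%} of full)")

base = 7e9  # assumption: a 7-billion-parameter base model
print(f"16-bit base weights: {weight_memory_gb(base, 16):.1f} GB")
print(f"4-bit base weights (QLoRA): {weight_memory_gb(base, 4):.1f} GB")
```

With these numbers, LoRA trains 65,536 parameters instead of 16,777,216 (about 0.39%), and the 7B base weights shrink from about 14 GB at 16-bit to about 3.5 GB at 4-bit, which goes a long way toward explaining why QLoRA can run on a consumer GPU.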

The comparison: which approach?

Approach	Best for	Cost	Flexibility
Prompt engineering	Behavior, format, persona	Free	High
RAG	Knowledge, data access	Low	High
Fine-tuning	Style, patterns, domain expertise	Medium-High	Low

Most people don't need fine-tuning. Prompt engineering plus RAG covers the large majority of use cases at a fraction of the cost. Fine-tuning is the optimization you pursue when the other two have been maxed out.

05 — Watch out

What can go wrong at this level

Advanced tools fail in advanced ways. These four are the ones that quietly wreck projects at this stage.

1. Rushing into advanced topics

You don't yet understand prompt engineering but you want to build a RAG system. You don't know how an API works but you want to fine-tune a model. That sequence is a recipe for frustration and broken builds.

How to avoid: If you can't explain "embeddings" to a friend in one sentence, you're not ready for Level 4 yet. Go back to Level 2 and lock in the fundamentals first.

2. Building RAG on bad data

You stand up a RAG system and upload every document you have. The AI answers confidently and wrongly. The reason is usually upstream: your data is outdated, contradictory, or irrelevant to the questions being asked.

How to avoid: Data quality beats quantity. Clean first. Remove duplicates, update stale docs, resolve contradictions. Garbage in, garbage out still holds.
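Part of the cleanup can be automated. A minimal Python sketch that drops exact duplicates after normalizing case and whitespace. It won't catch paraphrased near-duplicates, and `dedupe` is a hypothetical helper:

```python
import hashlib
import re

def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivially different copies compare equal.
    return re.sub(r"\s+", " ", text).strip().lower()

def dedupe(docs: list[str]) -> list[str]:
    seen: set[str] = set()
    unique = []
    for doc in docs:
        key = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)  # keep the first occurrence as written
    return unique

docs = [
    "Refund window: 30 days.",
    "refund window:   30 days.",
    "Refund window: 14 days.",
]
print(dedupe(docs))
```

Note what it does not do: the two remaining documents still contradict each other (30 days vs 14 days). Deduplication is mechanical; resolving contradictions usually needs a person who knows which version is current.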

3. Fine-tuning without evaluation

You fine-tune a model and it looks great on the examples you trained it on. In production it performs worse. That's overfitting: the model memorized your training data instead of learning the underlying pattern.

How to avoid: Always hold back a test set the model never sees during training. Evaluate on that. If performance drops on the held-out set, you've overfit.
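A minimal sketch of holding back a test set using only the standard library. The 500 examples are placeholders, and the "model" is a deliberately overfit stand-in that just memorizes its training data, to show what the train/test gap looks like:

```python
import random

def train_test_split(examples: list, test_fraction: float = 0.2, seed: int = 42):
    # Shuffle a copy so the split isn't biased by file order; a fixed seed keeps it reproducible.
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def accuracy(predict, dataset) -> float:
    return sum(predict(x) == y for x, y in dataset) / len(dataset)

# Placeholder data: 500 (input, expected output) pairs.
examples = [(f"input {i}", f"label {i % 3}") for i in range(500)]
train, test = train_test_split(examples)

# Deliberately overfit stand-in for a fine-tuned model: it memorizes training
# pairs and guesses "label 0" for anything it hasn't seen.
memory = dict(train)

def predict(x: str) -> str:
    return memory.get(x, "label 0")

print(f"train: {len(train)}, test: {len(test)}")  # train: 400, test: 100
print(f"accuracy on training data: {accuracy(predict, train):.2f}")
print(f"accuracy on held-out data: {accuracy(predict, test):.2f}")
```

The memorizer scores perfectly on data it has already seen and poorly on the held-out set, which is the overfitting pattern described above made visible.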

4. Building multi-agent without error handling

You wire up three agents that work together. One of them errors out. The other two keep running on empty data, and the final output is garbage that still looks plausible enough to ship.

How to avoid: Every agent must handle failures from the others: timeouts, retries, fallbacks. More agents means more failure points. Plan for each one.
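A minimal sketch of per-agent error handling with `asyncio`: each call gets a timeout, a few retries with backoff, and an explicit fallback. The helper `call_with_retries` and the two workers are hypothetical. The `ok` flag is the important part, because it tells the orchestrator that it received a fallback instead of real data:

```python
import asyncio

async def call_with_retries(worker, task, *, timeout_s=2.0, retries=2, fallback=None):
    # Try the worker up to retries + 1 times, each attempt bounded by a timeout.
    for attempt in range(retries + 1):
        try:
            result = await asyncio.wait_for(worker(task), timeout=timeout_s)
            return result, True  # ok=True: real output
        except Exception:  # includes timeouts; a real system would log the error here
            if attempt < retries:
                await asyncio.sleep(0.1 * 2 ** attempt)  # exponential backoff: 0.1s, 0.2s, ...
    return fallback, False  # ok=False: the orchestrator must not treat this as real data

# Hypothetical workers for the demo.
calls = {"flaky": 0}

async def flaky_worker(task: str) -> str:
    calls["flaky"] += 1
    if calls["flaky"] == 1:
        raise RuntimeError("transient upstream error")  # fails once, then recovers
    return f"result for {task}"

async def broken_worker(task: str) -> str:
    raise RuntimeError("always fails")

async def main():
    print(await call_with_retries(flaky_worker, "research"))
    print(await call_with_retries(broken_worker, "analysis", fallback="[analysis unavailable]"))

asyncio.run(main())
```

Run as-is, the flaky worker succeeds on its second attempt and returns `('result for research', True)`, while the broken worker exhausts its retries and returns `('[analysis unavailable]', False)`.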

06 — What you now know

What you should know after Level 4

You now understand the advanced landscape:

Embeddings turn text into numbers that capture meaning
Vector databases store and search those meaning-vectors
Multi-agent systems coordinate specialized workers
Fine-tuning customizes models for specific tasks
The right approach depends on your specific constraints

You've gone from zero knowledge to understanding the full stack of modern AI, from tokens to agents, from prompts to fine-tuning. The mental model you've built is the foundation for everything you'll build next. The Guide is complete.


The deep end.