Stop Throwing Context at LLMs: Use Embeddings Instead

When it comes to bringing AI into your application, the easiest thing is to load a lot of context, pick the best LLM available, and let it do the job.

There is only one small problem:

It does not scale.

It becomes expensive very fast, and at some point the feature stops making economic sense. The good news is that if you know what you are doing, you can often get the same, or even better, results at a fraction of the cost.

I have talked about my side project, gogetcv.com, before. One of the problems I had was extracting job requirements from a job description and saving them in a consistent way, so they could later be matched against a user’s experience.

The extraction itself was not the hard part. Any decent LLM can extract requirements from a job ad.

The hard part was consistency.

For example:

“Own system design decisions”
“Design scalable distributed systems”

These are not the same words, but in my system they should probably both map to something like:

Software architecture

Now add multilingual job descriptions, translation to English as the base language, and a growing list of requirements. Suddenly this becomes a more interesting problem.

Option 1: No Context

The simplest version is to ask the LLM to extract whatever it finds.

That works, but the output is not useful enough.

One job description gives you:

Design scalable distributed systems

Another gives you:

Own system design decisions

Both are reasonable extractions, but now your system treats them as different things. They are not connected to Software architecture, and you slowly end up with a database full of almost-duplicates.

This is exactly what I wanted to avoid.

Option 2: Put All Allowed Keywords in the Prompt

The next option is to give the LLM a fixed list of allowed keywords.

Something like:

You are a job requirement extraction agent.
Your job is to find requirements from a job description and return them as a JSON array of strings.
Use only the allowed keywords below.
Allowed keywords:
- Technical leadership
- Software architecture
- Backend
- ...
Job description:
{ADD JOB DESCRIPTION HERE}

This works surprisingly well at small scale.

Even cheaper models can do a decent job here. You do not always need the most expensive frontier model if the task is constrained well enough.

But there are two problems.

First, the context grows.

If you have a few hundred allowed keywords, smaller models start to struggle. The quality drops, the output becomes inconsistent, and sometimes the model simply does not have enough room to reason and respond properly.
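
To make the growth concrete, here is a minimal sketch that assembles the prompt above from a keyword list and estimates its size. The function names are hypothetical, and the "4 characters per token" rule is only a rough assumption; a real count needs your model's tokenizer.

```python
def build_prompt(allowed_keywords, job_description):
    """Assemble the constrained extraction prompt from a keyword list."""
    keyword_lines = "\n".join(f"- {kw}" for kw in allowed_keywords)
    return (
        "You are a job requirement extraction agent.\n"
        "Your job is to find requirements from a job description "
        "and return them as a JSON array of strings.\n"
        "Use only the allowed keywords below.\n"
        "Allowed keywords:\n"
        f"{keyword_lines}\n"
        "Job description:\n"
        f"{job_description}"
    )


def rough_token_count(text):
    # Assumption for illustration: roughly 4 characters per token.
    return len(text) // 4


small = build_prompt(["Technical leadership", "Software architecture", "Backend"], "...")
large = build_prompt([f"Keyword number {i}" for i in range(500)], "...")
print(rough_token_count(small), rough_token_count(large))
```

Every request pays for the full keyword list, whether or not the job description needs it.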

Second, if you allow the model to create new keywords when no match is found, it will often do that too eagerly.

So instead of mapping “Design scalable distributed systems” to “Software architecture”, it may just create a new keyword called “Design scalable distributed systems”.

Now your keyword list grows, your prompt grows, and your system gets worse over time.

That is not a scalable architecture. That is just pushing the mess into the prompt.

Option 3: Use Embeddings as Part of the System

The better solution is to introduce a matching layer between extracted requirements and your existing requirement database.

This is where embeddings become useful.

I am not going to explain embeddings from zero. The short version is: text gets represented as a point in a high-dimensional space. Similar meanings should end up closer together.

So ideally, “Software architecture” ends up closer to “Design scalable distributed systems” than to “Chemical engineering”.
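
“Closer” is usually measured with cosine similarity. A small sketch with made-up 3-dimensional vectors (real embeddings come from an embedding model and have hundreds or thousands of dimensions):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, invented for illustration only.
software_architecture = [0.9, 0.8, 0.1]
distributed_systems = [0.8, 0.9, 0.2]
chemical_engineering = [0.1, 0.2, 0.9]

print(cosine_similarity(software_architecture, distributed_systems))
print(cosine_similarity(software_architecture, chemical_engineering))
```

With these toy numbers the first score is clearly higher than the second, which is the behaviour you want from real embeddings.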

But there is an important detail here.

You should not always embed just the raw keyword.

For short concepts, a single word or phrase often does not contain enough meaning. Something like Performance, Quality Control, or Architecture can mean very different things depending on the domain.

So I always start from this formula for the embedding text and then adapt it to the use case:

[object] + [category] + [function] + [important properties]

For my use case, that evolved into:

[skill] + [what it involves] + [where it is used] + [related concepts] + [synonyms]

The important thing is that I do not only store the enriched text. I also store metadata around it.

Something like:

{
  "skill": "software architecture",
  "embedding_text": "software architecture involves designing scalable, maintainable, and reliable software systems; it is used in backend platforms, distributed systems, cloud applications, and enterprise software; related concepts include system design, microservices, scalability, reliability, maintainability, technical leadership, and architectural patterns; synonyms include solution architecture, application architecture, system architecture, and software design",
  "category": "engineering",
  "type": "technical_skill",
  "domain": [
    "backend",
    "distributed systems",
    "cloud",
    "enterprise software"
  ],
  "synonyms": [
    "solution architecture",
    "application architecture",
    "system architecture",
    "software design"
  ]
}

Then I embed only the embedding_text, while the rest stays as metadata for filtering, search, and system logic.
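
As a sketch of how the formula turns into a record, the helper below composes embedding_text from structured fields and keeps the rest as metadata. The function names and the exact sentence template are assumptions for illustration; in practice an LLM can produce the fields.

```python
def join_list(items):
    """Join items as 'a, b, and c'."""
    if len(items) <= 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + ", and " + items[-1]


def build_record(skill, involves, used_in, related, synonyms, category, type_):
    # [skill] + [what it involves] + [where it is used]
    #   + [related concepts] + [synonyms]
    embedding_text = (
        f"{skill} involves {involves}; "
        f"it is used in {join_list(used_in)}; "
        f"related concepts include {join_list(related)}; "
        f"synonyms include {join_list(synonyms)}"
    )
    return {
        "skill": skill,
        "embedding_text": embedding_text,  # the only field that gets embedded
        "category": category,              # metadata: filtering and logic
        "type": type_,
        "domain": list(used_in),
        "synonyms": list(synonyms),
    }


record = build_record(
    skill="software architecture",
    involves="designing scalable, maintainable, and reliable software systems",
    used_in=["backend platforms", "distributed systems", "cloud applications"],
    related=["system design", "microservices", "scalability"],
    synonyms=["solution architecture", "system architecture"],
    category="engineering",
    type_="technical_skill",
)
print(record["embedding_text"])
```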

You can use an LLM to generate this enrichment. The interesting part is that you do not need a very expensive model for this. Even a small model can do a decent job because the task is structured and repeatable.

Once the record is saved, you can store the embedding next to it in something like Postgres with pgvector. That takes you quite far before you need anything more exotic.
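
A rough sketch of what that could look like. The table and column names are hypothetical, and the vector size is an assumption that depends on your embedding model. The SQL uses pgvector's vector type and its <=> cosine distance operator; the %(name)s placeholders follow the psycopg style.

```python
EMBEDDING_DIM = 1536  # assumption: must match your embedding model

CREATE_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS requirements (
    id bigserial PRIMARY KEY,
    skill text NOT NULL UNIQUE,
    embedding_text text NOT NULL,
    category text,
    synonyms text[] NOT NULL DEFAULT '{{}}',
    embedding vector({EMBEDDING_DIM})
);
"""

# Nearest neighbours by cosine distance (smaller distance = more similar).
NEAREST_SQL = """
SELECT id, skill, 1 - (embedding <=> %(query)s::vector) AS similarity
FROM requirements
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s;
"""


def to_vector_literal(values):
    """Format a list of floats as a pgvector text literal, e.g. '[0.1,2.0]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


params = {"query": to_vector_literal([0.1, 0.2, 0.3]), "k": 5}
print(params)
```

You would pass NEAREST_SQL and params to your database driver's execute call.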

Matching a New Requirement

Let’s say the job description contains:

Design scalable distributed systems

I first enrich it into something like:

{
  "skill": "design scalable distributed systems",
  "embedding_text": "design scalable distributed systems involves architecting software that runs across multiple services, nodes, or regions while handling high traffic, failures, and growth; it is used in backend platforms, cloud-native applications, microservices, data-intensive systems, and enterprise-scale software; related concepts include system design, scalability, reliability, fault tolerance, load balancing, distributed architecture, high availability, event-driven architecture, and performance optimization; synonyms include distributed system design, scalable system architecture, large-scale system design, cloud architecture, and resilient backend architecture",
  "category": "engineering",
  "type": "technical_skill",
  "domain": [
    "backend",
    "distributed systems",
    "cloud",
    "microservices",
    "platform engineering",
    "enterprise software"
  ],

  "synonyms": [
    "distributed system design",
    "scalable system architecture",
    "large-scale system design",
    "solution architecture",
    "resilient backend architecture"
  ]
}

In this example, “solution architecture” appears in the synonyms list, and it is also a stored synonym of the existing “software architecture” requirement, so I may be able to match the two directly.

If there is no direct synonym match, I use embeddings.
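
The direct synonym check needs no model at all. A minimal sketch, assuming records shaped like the JSON above; the function names are hypothetical, and matching on any shared term is a deliberately simple rule that you may want to tighten:

```python
def normalize(term):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(term.lower().split())


def terms_of(record):
    """The skill name plus all of its synonyms, normalized."""
    names = [record["skill"]] + record.get("synonyms", [])
    return {normalize(n) for n in names}


def find_direct_match(new_record, existing_records):
    """Return the first existing record sharing a name or synonym, else None."""
    new_terms = terms_of(new_record)
    for record in existing_records:
        if new_terms & terms_of(record):
            return record
    return None


existing = [
    {"skill": "software architecture",
     "synonyms": ["solution architecture", "system architecture"]},
    {"skill": "chemical engineering", "synonyms": ["process engineering"]},
]
new = {"skill": "design scalable distributed systems",
       "synonyms": ["distributed system design", "Solution  Architecture"]}

match = find_direct_match(new, existing)
print(match["skill"] if match else None)
```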

The flow is:

1. Embed the enriched embedding_text
2. Find the most similar existing requirements
3. Send only those few candidates to a cheap LLM
4. Ask it whether this is the same concept, a synonym, or a new requirement
5. Update the existing requirement or create a new one

This is the important part:

I am not sending the whole database into the prompt. I am using embeddings to reduce the search space, then using the LLM only for the small judgment call.
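
A sketch of that reduction step: rank stored requirements by similarity, keep the top few, and build a small prompt for the judgment call. The vectors are toy values, and the prompt wording and the JSON answer format are my assumptions, not a fixed recipe.

```python
import heapq
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def shortlist(query_vec, stored, k=5):
    """stored: list of (name, vector). Returns the k best (score, name) pairs."""
    scored = ((cosine_similarity(query_vec, vec), name) for name, vec in stored)
    return heapq.nlargest(k, scored)


def build_judge_prompt(new_requirement, candidates):
    """Prompt containing only the shortlisted candidates, not the whole database."""
    lines = "\n".join(f"- {name}" for _, name in candidates)
    return (
        "Decide whether the new requirement is the same concept as one of the "
        "candidates, a synonym of one of them, or a new requirement.\n"
        f"New requirement: {new_requirement}\n"
        f"Candidates:\n{lines}\n"
        'Answer with JSON: {"decision": "same" | "synonym" | "new", '
        '"match": <candidate name or null>}'
    )


stored = [
    ("Software architecture", [0.9, 0.8, 0.1]),
    ("Chemical engineering", [0.1, 0.2, 0.9]),
    ("Backend", [0.7, 0.6, 0.3]),
]
top = shortlist([0.8, 0.9, 0.2], stored, k=2)
print(build_judge_prompt("Design scalable distributed systems", top))
```

The prompt size now depends on k, not on how many requirements the database holds.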

That is where the cost difference comes from.

My Current Workflow

My workflow in gogetcv.com currently looks like this:

1. Extract industry and requirements from the job description. Usually around 20 requirements.
2. Try to match directly against existing database requirements. This catches a large part of the obvious matches.
3. Enrich the remaining requirements and match using synonyms. When a synonym match is found, update the existing requirement.
4. For the remaining ones:
- embed the enriched text
- find similar existing requirements
- send only the top candidates to an LLM
- decide if it is a synonym or a new concept
5. If it is new:
- find the most common name for it
- save it as a new requirement
- store the enriched text and embedding
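
The tiers above can be sketched as one function that tries the cheap paths first. Everything here is a simplified assumption: the database is a plain dict, enrich, find_similar and judge are hypothetical callables standing in for the LLM and vector search, and the "most common name" step is skipped by reusing the extracted name.

```python
def resolve_requirement(name, db, enrich, find_similar, judge):
    """db maps canonical name -> set of synonyms. Returns (how, canonical)."""
    key = " ".join(name.lower().split())

    # Tier 1: direct match, no model involved.
    if key in db:
        return ("direct", key)

    # Tier 2: enrich, then match on shared names or synonyms.
    enriched = enrich(name)
    new_terms = {key} | {s.lower() for s in enriched["synonyms"]}
    for canonical, synonyms in db.items():
        if new_terms & ({canonical} | synonyms):
            synonyms.add(key)  # the database learns the new wording
            return ("synonym", canonical)

    # Tier 3: embeddings narrow the search, a cheap LLM makes the call.
    candidates = find_similar(enriched["embedding_text"])
    match = judge(name, candidates)  # canonical name or None
    if match is not None:
        db[match].add(key)
        return ("llm_match", match)

    # Tier 4: genuinely new requirement.
    db[key] = {s.lower() for s in enriched["synonyms"]}
    return ("new", key)


# Stub components, for demonstration only.
def fake_enrich(name):
    syns = ["Solution architecture"] if "distributed" in name.lower() else []
    return {"synonyms": syns, "embedding_text": name}


db = {"software architecture": {"solution architecture", "system architecture"}}
no_hits = lambda text: []
no_match = lambda name, candidates: None

print(resolve_requirement("Design scalable distributed systems", db, fake_enrich, no_hits, no_match))
print(resolve_requirement("Kubernetes", db, fake_enrich, no_hits, no_match))
```

Because tier 2 stores each new wording as a synonym, the same phrase is a cheap match the next time it shows up.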

The nice thing is that the system improves over time.

Early on, the system needs more help from the LLM. But as the database matures, more matches happen directly or through synonyms. That means fewer LLM calls, lower cost, and more consistent results.

Most of the work becomes programmatic.

The LLM is not the whole system. It is just one component in the system.

That distinction matters.

Bonus: Do Not Embed JSON With False Values

One small but important lesson.

A lot of AI-generated advice will suggest embedding something like this:

{
  "text": "apple",
  "kind": "fruit",
  "eatable": true,
  "not": "square shaped, banana"
}

I would avoid this.

Most embedding models do not really understand false or “not” the way your application logic does. They still see the words.

So by writing “not square shaped”, you may actually make the embedding more related to “square shaped”, not less.

Use metadata for programmatic filtering.

Use clean descriptive text for embeddings.

That separation is important.
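
A small sketch of that separation: the filter runs on metadata in plain code, and only the clean descriptive text is represented by the vector. The records, field names and 2-dimensional vectors are invented for illustration, and the vectors are assumed to be comparable with a simple dot product.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search(query_vec, records, kind=None, k=3):
    """Filter on metadata first, then rank the survivors by similarity."""
    pool = [r for r in records if kind is None or r["kind"] == kind]
    pool.sort(key=lambda r: dot(query_vec, r["embedding"]), reverse=True)
    return [r["text"] for r in pool[:k]]


records = [
    # "embedding" stands for the vector of a clean sentence such as
    # "apple is a fruit that is eaten raw or cooked; it is round and sweet".
    {"text": "apple", "kind": "fruit", "embedding": [1.0, 0.0]},
    {"text": "banana", "kind": "fruit", "embedding": [0.8, 0.6]},
    {"text": "apple inc", "kind": "company", "embedding": [0.9, 0.1]},
]

print(search([1.0, 0.0], records, kind="fruit", k=2))
print(search([1.0, 0.0], records, k=2))
```

The exclusion happens in the filter, where it is exact, instead of in the embedding text, where negation tends to get lost.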

For me, the better pattern is a sentence that starts from the base idea:

[object] + [category] + [function] + [important properties]

and then adapt it to the specifics of the context and application.

Embeddings are not magic. LLMs are not magic either. If you understand them and combine them with a bit of system design, you can build AI features that are cheaper, more consistent, and much more scalable.