Terms in this glossary
- AI Agent
- Orchestration
- Human-in-the-loop
- RAG
- Digital Worker
- GaaS
- LLM
- MCP
- Agent Pipeline
- Persistent State
- Embedding
- GEO
- Fine-tuning
- Hallucination
- Tool
- Context Window
- Data Sovereignty
- GDPR and AI
AI Agent
AI Agent
An AI agent is a system that receives a goal, reasons about how to achieve it, and executes actions autonomously—consulting systems, making decisions, and acting without a human having to guide it step by step.
The critical difference from a chatbot or a conventional assistant is the ability to execute. A chatbot responds. An agent acts. An agent can create an order in the ERP, send an email, open a ticket, or update a database—not just generate text about how to do it.
A well-designed agent has: a clear identity (who it is and how it behaves), a specific objective (what counts as “work completed”), available tools (what actions it can perform), hard constraints (what it can never do), and an escalation mechanism (when to stop and request human assistance).
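A minimal sketch of those five elements as an explicit specification; the structure and field names are illustrative, not tied to any particular framework:

```python
from dataclasses import dataclass, field

@dataclass
class AgentSpec:
    """The five elements of a well-designed agent, expressed as data."""
    identity: str        # who the agent is and how it behaves
    objective: str       # what counts as "work completed"
    tools: list[str] = field(default_factory=list)        # actions it can perform
    constraints: list[str] = field(default_factory=list)  # what it can never do
    escalation: str = "pause and request human assistance outside the rules above"

# Hypothetical order agent for a B2B distributor.
order_agent = AgentSpec(
    identity="Order intake agent; formal tone, acts only on verified data",
    objective="Order created in the ERP and customer notified",
    tools=["extract_pdf", "validate_sku", "create_order", "send_email"],
    constraints=["never create an order with an unvalidated SKU"],
)
```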
Example in B2B distribution: an order agent receives a PDF via email, extracts the data, validates the SKUs against the ERP, creates the order, and notifies the customer—without human intervention in standard cases.
Orchestration
Orchestration / Agentic Orchestration
Orchestration is the mechanism that coordinates multiple specialized agents so they work together coherently. In a well-orchestrated system, each agent does its part, writes the result to a central database, and the next agent starts when the previous one finishes.
Autonomous orchestration means this coordination occurs without human intervention in most steps. Agents do not call each other directly—the orchestrator (a system like Trigger.dev, n8n, or Celery) listens for state changes and triggers the next agent at the right time.
This ensures resilience (if an agent fails, the process resumes where it left off), traceability (every state transition is logged), and decoupling (agents do not need to know about each other).
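A toy sketch of that pattern, with a plain Python loop standing in for an orchestrator like Trigger.dev, n8n, or Celery, and a dict standing in for the central database:

```python
# Agents never call each other; the orchestrator reacts to state changes.
def build_profile(job): job["profile"] = "customer profile"
def find_twins(job): job["twins"] = ["twin-1", "twin-2"]
def select_products(job): job["products"] = ["sku-A", "sku-B"]

TRANSITIONS = {
    "new":      (build_profile, "profiled"),
    "profiled": (find_twins, "matched"),
    "matched":  (select_products, "done"),
}

def orchestrate(job: dict) -> dict:
    while job["state"] != "done":
        agent, next_state = TRANSITIONS[job["state"]]
        agent(job)                 # the agent does its part
        job["state"] = next_state  # logged transition: traceability and resume point
    return job

print(orchestrate({"state": "new"}))
```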
Example: In Kelmia Offers, orchestration coordinates the agent that builds the customer profile, the one that searches for twins, the one that selects products, and the one that calculates prices—each triggered automatically when the previous one finishes.
Human-in-the-loop
HITL
Human-in-the-loop (HITL) is the mechanism by which an agent system pauses at specific points to request human validation or approval before continuing. It is not a system failure—it is a design decision that defines which decisions require human judgment and which do not.
A good HITL is triggered when: the action has direct financial consequences, the action is irreversible, the action affects the client’s reputation, or the agent detects something outside the usual pattern. The agent doesn’t guess—it escalates with full context.
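A sketch of those triggers as an explicit escalation check; the thresholds and field names are hypothetical:

```python
def needs_human(action: dict) -> str | None:
    """Return the escalation reason, or None if the agent may proceed."""
    if action.get("amount_eur", 0) > 5_000:     # direct financial consequences
        return "amount above approval threshold"
    if action.get("irreversible"):              # cannot be undone
        return "irreversible action"
    if action.get("client_facing"):             # affects the client's reputation
        return "reputation-sensitive action"
    if action.get("anomaly_score", 0.0) > 0.8:  # outside the usual pattern
        return "unusual pattern detected"
    return None

reason = needs_human({"amount_eur": 12_000})
if reason:
    print(f"Escalating with full context: {reason}")
```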
HITL is not a limitation of the system. It is the product. A system that knows when to ask generates more trust than one that acts blindly.
Example: In Kelmia Offers, the sales representative always approves the offer before it is sent. The system generates the complete proposal in seconds—the human reviews it, makes adjustments if necessary, and approves it. Sending it to the client is always the sales representative’s responsibility.
RAG
Retrieval-Augmented Generation
RAG is a technique that combines a language model (LLM) with an external knowledge base. Instead of relying solely on what the model learned during training, the system searches for relevant information in real time within documents, databases, or catalogs—and injects it into the model’s context before generating the response.
For B2B companies, RAG is the foundation of product assistants, semantic search systems in catalogs, and any application where the model needs to respond with specific, up-to-date company information—not general knowledge.
The difference between a well-implemented RAG and a mediocre one lies in how the information is indexed. A technical data sheet fragmented into chunks loses context. A complete data sheet treated as a single retrieval unit preserves meaning.
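A minimal sketch of the retrieve-then-generate flow; the two helper functions are placeholders for a real vector search and a real LLM API:

```python
def search_catalog(query: str, k: int = 3) -> list[str]:
    """Placeholder: semantic search over indexed data sheets."""
    return ["Data sheet: nitrile gloves, powder-free, food contact, box of 100"]

def call_llm(prompt: str) -> str:
    """Placeholder: any LLM API call."""
    return "..."

def answer(question: str) -> str:
    docs = search_catalog(question)   # 1. retrieve in real time
    context = "\n---\n".join(docs)    # 2. inject into the model's context
    prompt = (
        "Answer using ONLY the catalog excerpts below. "
        "If the answer is not there, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)           # 3. generate a grounded response
```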
Example: Kelmia Products uses RAG so that the catalog assistant can answer questions like “Do you have powder-free gloves for food contact in boxes of 100?”—by consulting the distributor’s actual catalog, not generic knowledge about gloves.
Digital Worker
A digital worker is an AI agent designed to perform a complete operational role within a company—not isolated tasks, but entire workflows. An order worker processes orders from start to finish. An invoicing worker manages the full approval and accounting cycle.
The difference from traditional automation (RPA, scripts) is the ability to make decisions with context. A script follows instructions. A digital worker reasons: when faced with incomplete information, it applies business logic; when faced with an anomaly, it escalates; when faced with an exception, it decides.
Digital workers do not replace the human team—they refocus their attention. The team stops managing day-to-day routine tasks and focuses on exceptions, strategic decisions, and customer relationships.
Example: A B2B distributor with 300 daily orders via email and PDF. The digital worker processes them in seconds, validates each customer’s terms, and creates the order in the ERP. The human team only intervenes when there is a genuine exception.
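A sketch of the difference: a script rejects anything unexpected, while a worker-style function applies business logic first and escalates only genuine exceptions. The fields, default rule, and threshold are hypothetical:

```python
def process_order(order: dict) -> str:
    # Incomplete information: apply business logic instead of failing.
    if "delivery_date" not in order:
        order["delivery_date"] = "next standard delivery slot"  # hypothetical rule
    # Anomaly: escalate with context rather than guessing.
    if order.get("quantity", 0) > 10 * order.get("usual_quantity", float("inf")):
        return "escalated: quantity far above this customer's usual pattern"
    return "order created in ERP"

print(process_order({"quantity": 500, "usual_quantity": 10}))  # -> escalated
print(process_order({"quantity": 12, "usual_quantity": 10}))   # -> order created
```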
GaaS
Agentic as a Service
GaaS is a service model in which a company continuously operates an agent system for a client—and the client receives value directly, without needing to learn a tool or configure anything.
The difference from traditional SaaS is fundamental: in SaaS, the client uses the tool. In GaaS, the system works for the client. The client doesn’t log into a dashboard to generate analytics—they receive the analytics when they’re relevant, approve plans when their judgment is required, and follow up when there’s something to report.
The metric for success changes radically. SaaS measures DAU (daily active users). GaaS measures real operational impact: reduction in incidents, time saved, improved margin.
Example: Kelmia Pulse operates as GaaS. The system analyzes the customer’s operations every week, detects patterns, and proposes action plans. The customer doesn’t configure anything—they receive, decide, and follow up.
LLM
Large Language Model
An LLM is an artificial intelligence model trained on large volumes of text to understand and generate natural language. Claude, GPT, Gemini, and Llama are examples of LLMs.
In the context of business agent systems, the LLM is the reasoning engine—the component that interprets information, makes decisions, and generates responses. But an LLM alone is not an operating system. It needs tools, memory, orchestration, and constraints to function reliably in production.
Not all LLMs are equally suited for every use case. The most powerful models (Claude Opus, GPT-4) are better for complex strategic decisions. Lighter models (Haiku, on-premises models) are more efficient for high-volume, low-risk tasks. Using the right model for each task is an architectural decision, not a matter of preference.
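For illustration only, that routing can be as simple as a lookup table; the task types and model labels below are hypothetical:

```python
# Pick the model tier by task complexity and risk, not by preference.
MODEL_BY_TASK = {
    "strategic_analysis": "large-reasoning-model",  # Claude Opus / GPT-4 class
    "order_extraction":   "light-fast-model",       # Haiku class: high volume, low risk
    "sensitive_data":     "on-premises-model",      # never leaves the company
}

def pick_model(task_type: str) -> str:
    return MODEL_BY_TASK.get(task_type, "light-fast-model")  # cheap, safe default
```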
MCP
Model Context Protocol
MCP is an open standard protocol, developed by Anthropic, that allows AI models to connect to external systems in a standardized way. It is the equivalent of USB for AI: a universal connector that enables any AI assistant (Claude, ChatGPT, Gemini) to access data and perform actions on external systems without the need for custom integrations for each combination.
For businesses, MCP means their team can use the AI assistant of their choice and connect it to their systems—ERP, CRM, catalog—through a single integration point. No copying and pasting data between applications. No losing control over which data is shared.
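As a sketch of that single integration point, here is a minimal MCP server built with the FastMCP helper from the official mcp Python SDK; the ERP lookup itself is a placeholder:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("erp-connector")

@mcp.tool()
def get_stock(sku: str) -> str:
    """Return current stock for a SKU (placeholder for the real ERP query)."""
    return f"Stock for {sku}: 42 units"  # would query the ERP here

if __name__ == "__main__":
    mcp.run()  # any MCP-capable assistant can now discover and call get_stock
```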
Example: Kelmia Connect uses MCP to give Claude real-time access to a distributor’s ERP. The sales rep types in Claude’s chat and receives real data from the ERP—without leaving Claude, without exporting to Excel.
Agent Pipeline
Agent Pipeline / Multi-agent Pipeline
An agent pipeline is a chain of specialized agents working in sequence, where each agent’s output serves as the input for the next. Each agent does one thing and does it well—rather than a generalist agent trying to do everything.
Specialization is key. An agent that builds customer profiles is better at that than one that also tries to select products and calculate prices. Specialization improves quality, facilitates maintenance, and allows an agent to be replaced or improved without affecting the rest.
A well-designed pipeline is resilient: if an agent fails, the state is saved in the database and the process can resume where it left off—without losing work or starting from scratch.
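A sketch of that resilience: each stage's output is checkpointed before the next stage starts, so a rerun skips completed work. A dict stands in for the database:

```python
def build_profile(data): return {"profile": f"profile of {data}"}
def find_twins(profile): return {"twins": ["acme", "globex"]}
def price_offer(twins):  return {"offer": "draft offer"}

STAGES = [("profile", build_profile), ("twins", find_twins), ("offer", price_offer)]

def run_pipeline(job_input, checkpoints: dict):
    result = job_input
    for name, stage in STAGES:
        if name in checkpoints:     # already done: resume, don't redo
            result = checkpoints[name]
            continue
        result = stage(result)      # each output is the next stage's input
        checkpoints[name] = result  # persist before moving on
    return result

saved: dict = {}
run_pipeline("customer-123", saved)  # first run executes all stages
run_pipeline("customer-123", saved)  # after a crash, the rerun skips them
```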
Persistent State
Persistent State
Persistent state is the mechanism by which an agent system remembers what happened between executions. LLMs have no memory between conversations—each execution starts from scratch. Persistent state solves this by saving the relevant context in an external database.
In practice, this means that an agent running today knows what happened yesterday—what decisions were made, what data was processed, what plans are active. The system doesn’t start from scratch in every cycle—it accumulates context and improves over time.
Persistent state also ensures resilience: if the server goes down in the middle of an execution, the process can resume exactly where it left off, with no data loss or duplicate work.
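A minimal sketch using SQLite as the external store; the table layout and keys are illustrative:

```python
import json
import sqlite3

db = sqlite3.connect("agent_state.db")
db.execute("CREATE TABLE IF NOT EXISTS state (run_id TEXT PRIMARY KEY, context TEXT)")

def save_state(run_id: str, context: dict) -> None:
    db.execute("INSERT OR REPLACE INTO state VALUES (?, ?)",
               (run_id, json.dumps(context)))
    db.commit()

def load_state(run_id: str) -> dict:
    row = db.execute("SELECT context FROM state WHERE run_id = ?",
                     (run_id,)).fetchone()
    return json.loads(row[0]) if row else {}  # empty context on the first run

# Yesterday's run left context behind; today's run starts from it.
save_state("weekly-analysis", {"incidents_last_week": 4, "plan": "reduce returns"})
print(load_state("weekly-analysis"))
```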
Embedding
Vector Embedding
An embedding is a numerical representation of a text (or image, or document) in a high-dimensional vector space. Texts with similar meanings have close embeddings in that space—allowing for semantic similarity search rather than exact keyword matching.
For B2B companies, embeddings are the foundation of semantic search in catalogs, the identification of similar customers, and any system that needs to find “the closest match” to something. Keyword search fails with synonyms, typos, and informal descriptions. Embedding-based search understands meaning.
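The core operation is vector similarity. A sketch using cosine similarity; the embed function is a placeholder for any embedding model or API:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: call an embedding model or API here."""
    ...

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Semantic search: rank items by closeness in meaning to the query,
# so synonyms, typos, and informal descriptions still find the target.
def top_match(query: str, catalog: list[str]) -> str:
    q = embed(query)
    return max(catalog, key=lambda item: cosine_similarity(q, embed(item)))
```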
Example: Kelmia Offers uses embeddings to find “twin” customers—companies with similar profiles in the distributor’s portfolio, even if they use different terminology to describe their business.
GEO
Generative Engine Optimization
GEO is the discipline of optimizing content so that generative AI models (ChatGPT, Claude, Perplexity, Gemini) can find it, understand it, and cite it as a reference source in their responses. It is the equivalent of SEO for the era of conversational search.
While traditional SEO optimizes so that Google displays your page in search results, GEO optimizes so that when someone asks an AI about a topic related to your business, the AI cites your content or recommends your solution.
The factors that most influence GEO are: clear semantic structure, explicit definitions, verifiable data, recognizable authorship, and content that answers specific questions—not generic filler content. Glossaries, technical guides, and use cases are the formats that perform best with GEO.
Fine-tuning
Fine-tuning
Fine-tuning is the process of further training a base language model with domain- or company-specific data so that it adopts a specific style, terminology, or behavior. It is like specialized training for an employee who already has a general education.
For most business use cases, fine-tuning is unnecessary—and often counterproductive. It is expensive, requires high-quality data in large volumes, and the benefits are usually achieved more efficiently with well-implemented RAG and well-designed prompts.
Fine-tuning makes sense when you need a very specific style, highly technical terminology that the base model doesn’t handle well, or when the volume of inferences is so high that the base model’s efficiency matters. In all other cases, RAG + good prompt engineering is the right answer.
Hallucination
Hallucination
A hallucination occurs when an AI model generates information that appears correct and is presented with confidence, but is false or fabricated. The model does not “know that it does not know”—it produces plausible text even when it lacks actual evidence to support it.
Hallucinations are the most critical risk in operational AI systems. A made-up figure in a commercial quote, an incorrect price on an order, or a wrong reference on an invoice have real and costly consequences.
The correct way to manage hallucinations is not to expect the model to “improve” over time—it is to design the system so that the model never has to make things up. A well-designed agent only responds with information that is within its context. If it doesn’t have the data, it says it doesn’t have it.
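A sketch of that design rule as a guard around generation; call_llm is a placeholder, and the refusal message mirrors the example below:

```python
def call_llm(prompt: str) -> str:
    """Placeholder: any LLM API call."""
    return "..."

def grounded_answer(question: str, retrieved_docs: list[str]) -> str:
    if not retrieved_docs:
        # No evidence in context: say so instead of producing a plausible guess.
        return "I cannot find any products in the catalog that match these criteria."
    context = "\n".join(retrieved_docs)
    prompt = (
        "Answer strictly from the excerpts below. If they do not contain "
        "the answer, reply that the data is not available.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```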
Example: The Kelmia Products assistant explicitly responds, “I cannot find any products in the catalog that exactly match these criteria” when there are no results—rather than recommending a product that does not exist or does not meet the requirements.
Tool
Tool / Function Calling
A tool is an action an agent can perform in the real world—querying an API, writing to a database, sending an email, creating an order in the ERP. Tools are what turn an LLM from a text generator into an agent that takes action.
A well-designed agent has an explicit and closed list of available tools. Outside of that list, the agent cannot do anything. This ensures that the system is predictable and auditable—and eliminates the risk of the agent performing unauthorized actions.
The principle of least privilege applies just as it does in computer security: each agent has access only to the tools it needs for its role, not to the entire system.
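A sketch of a closed registry with per-agent grants; the tool names and grants are illustrative:

```python
def check_stock(sku): ...
def create_order(order): ...
def send_email(to, body): ...

# The explicit, closed list: outside of this, the system can do nothing.
TOOLS = {"check_stock": check_stock, "create_order": create_order,
         "send_email": send_email}

# Least privilege: each agent sees only the tools its role requires.
AGENT_GRANTS = {
    "catalog_assistant": {"check_stock"},
    "order_agent": {"check_stock", "create_order", "send_email"},
}

def call_tool(agent: str, tool: str, *args):
    if tool not in AGENT_GRANTS.get(agent, set()):
        raise PermissionError(f"{agent} is not authorized to use {tool}")
    return TOOLS[tool](*args)  # auditable: every action passes through here

call_tool("order_agent", "check_stock", "SKU-100")   # allowed
# call_tool("catalog_assistant", "send_email", ...)  # raises PermissionError
```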
Context Window
Context Window
The context window is the maximum amount of information an LLM can process in a single run—the equivalent of the model’s “working memory.” Everything the model needs to know to respond must fit within that window.
For enterprise agent systems, context management is a critical architectural decision. Loading too much information increases costs and can degrade quality. Loading too little means the agent makes decisions without the necessary information.
The solution is to load exactly the context relevant to each run—not the entire history. An agent analyzing incidents from the last 8 weeks does not need the history from the last 3 years.
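A sketch of that selective loading; the record structure and the 8-week window are illustrative:

```python
from datetime import datetime, timedelta

def relevant_context(incidents: list[dict], weeks: int = 8) -> list[dict]:
    """Load only what this run needs, not the full history."""
    cutoff = datetime.now() - timedelta(weeks=weeks)
    return [i for i in incidents if i["date"] >= cutoff]
```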
Data Sovereignty
Data Sovereignty
Data sovereignty is the control a company has over where its data is processed, stored, and transmitted—especially when using external AI services. In the context of generative AI, it involves deciding which data can be sent to external APIs and which must be processed locally.
For B2B companies, the most sensitive data includes: margins and commercial terms, customer data containing tax ID numbers and financial amounts, pricing strategies, and data subject to industry regulation. This data should not be processed via external APIs without prior anonymization.
Sovereignty levels range from using external APIs under data protection agreements (Standard), to anonymizing identifiable data before it leaves the system (Privacy), to complete processing on proprietary infrastructure without external APIs (Sovereign).
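A sketch of the Privacy level: replace identifiable fields before any text leaves the system. The patterns are deliberately simplified; real anonymization needs much broader coverage:

```python
import re

PATTERNS = {
    "tax_id": re.compile(r"\b[A-Z]\d{8}\b"),            # simplified, e.g. B12345678
    "email":  re.compile(r"\b[\w.+-]+@[\w-]+\.\w+\b"),  # simplified email pattern
}

def anonymize(text: str) -> str:
    """Swap identifiable data for tokens before calling an external API."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label.upper()}>", text)
    return text

print(anonymize("Invoice for Acme SL, tax ID B12345678, contact ops@acme.es"))
# -> Invoice for Acme SL, tax ID <TAX_ID>, contact <EMAIL>
```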
GDPR and AI
GDPR and AI
The General Data Protection Regulation (GDPR) establishes obligations regarding how companies collect, process, and store the personal data of individuals in the EU. With AI, these obligations extend to how that data is used to train models or generate responses.
The most relevant points for B2B companies using AI are: the legal basis for processing (consent or legitimate interest), the right to be forgotten (a model that has “learned” data from a customer may be difficult to “unlearn”), transparency regarding the use of AI in automated decisions, and the location of data processing.
The safest way to use generative AI with personal data is to ensure that identifiable data never reaches the external model—either by anonymizing it before it leaves or by processing it on proprietary infrastructure within the EU.