AI Glossary

Definitions for the model, agent, business, and infrastructure terms used across aipedia.wiki.

101 terms visible

A

Affiliate Marketing

Business terms

Affiliate marketing is earning commission by promoting third-party products or services, with compensation typically tied to sales, clicks, or conversions. In the AI tools ecosystem, this model creates financial incentives that can influence product recommendations and editorial objectivity. AI tool review platforms frequently rely on affiliate revenue from major AI vendors and SaaS suites, making disclosure of these relationships essential for reader trust. See also: SEO, GEO, SaaS

Agentic AI

Agent systems

Agentic AI is an autonomous artificial intelligence system that accomplishes specific goals by reasoning, planning, and executing multi-step actions across tools and systems without continuous human intervention. This capability enables AI to operate proactively in complex, dynamic environments rather than simply responding to prompts or generating content. Claude Opus 4.8 with Computer Use, Gemini 3.1 Pro agents, and GPT-5.5-class OpenAI agents demonstrate agentic capabilities by autonomously breaking down tasks, making contextual decisions, and coordinating across multiple specialized agents to reach defined outcomes.

AGI

Model terms

AGI (Artificial General Intelligence) refers to a hypothetical AI that matches or exceeds human ability across essentially any cognitive task, rather than excelling at narrow ones. There is no agreed definition or test, and timelines are heavily debated. Today's frontier models are powerful but remain narrow and fallible relative to this bar. See also: Frontier Model, Foundation Model, AI Alignment

AI Agent

Agent systems

An AI agent is a system that uses a model to plan and take actions toward a goal, calling tools, reading the results, and iterating with limited human input. Agents range from simple tool-using assistants to autonomous multi-step workers. Reliability depends on tool design, permissions, and guardrails against errors and prompt injection. See also: Agentic AI, Multi-agent, Function Calling

AI Alignment

Model terms

AI alignment is the field focused on making AI systems pursue their operators' and society's intended goals and values rather than unintended ones. It spans training techniques, evaluation, and oversight, and grows more important as models become more capable and autonomous. RLHF and Constitutional AI are practical alignment methods. See also: RLHF, Hallucination, Reasoning Models

AI Bias

Model terms

AI bias is systematic unfairness in a model's outputs that reflects skew in its training data, objectives, or design, which can disadvantage particular groups. It matters most in high-stakes uses like hiring, lending, and healthcare. Mitigation spans data curation, evaluation, and human oversight. See also: Synthetic Data, Hallucination, AI Alignment

AI Copilot

Model terms

An AI copilot is an assistant embedded inside an application that helps with tasks in context, suggesting, drafting, or acting while the human stays in control. The term spans coding, writing, and productivity tools. It contrasts with fully autonomous agents by keeping a human in the loop. See also: Code Completion, Coding Agent, AI Agent

AI Orchestration

Model terms

AI orchestration is the coordination of multiple models, tools, and steps into a single reliable workflow, deciding what runs when, passing data between stages, and handling errors. It is what turns individual model calls into a dependable product. Agent frameworks and workflow tools both provide orchestration. See also: AI Agent, Multi-agent, Workflow Automation

API

Build terms

An API (Application Programming Interface) is a set of rules and protocols that enables software applications to communicate, exchange data, and access features from other systems.

APIs enable developers to integrate AI services into apps and workflows by sending programmatic requests for responses.

The OpenAI API processes prompts to GPT-5.5-class models; the Claude API handles queries to Claude Opus 4.8.

App Builder

Model terms

An app builder is an AI tool that turns a plain-language description into a working, often deployable application, handling UI, logic, and sometimes a database and hosting. It targets non-developers and rapid prototyping. Output usually still needs review before it becomes production software. See also: Vibe Coding, No-code/Low-code, Coding Agent

ARR

Business terms

Annual Recurring Revenue (ARR) is the normalized annual value of predictable subscription revenue from contracts, excluding one-time fees and overages.

ARR gauges financial health and growth potential for SaaS companies, including AI tools, enabling accurate forecasting and investor evaluation.

For example, ChatGPT reportedly reached $4B ARR by 2026.

Attention Mechanism

Model terms

The attention mechanism lets a model weigh how much each token should influence every other token when producing output, so it can focus on the most relevant parts of the input. Self-attention is the core operation inside a Transformer, and it is what allows long-range context to shape each prediction. Larger context windows depend on making attention efficient. See also: Transformer, Context Window, Tokens

B

Batch Processing

Model terms

Batch processing submits many requests together for asynchronous handling, often at a lower price than real-time calls, in exchange for slower turnaround. It suits bulk jobs like classification, summarization, or dataset generation where latency does not matter. Several providers offer a discounted batch tier. See also: API, Inference Cost, Rate Limit

Benchmark

Model terms

A benchmark is a standardized test used to measure and compare model capabilities on tasks like coding, math, reasoning, or tool use. Benchmarks guide model selection but can be gamed or saturated, so real-world evaluation on your own tasks still matters. Labs cite them heavily at launch. See also: Frontier Model, Reasoning Models, Foundation Model

C

Chain-of-Thought

Model terms

Chain-of-thought is a prompting and training technique where a model works through intermediate reasoning steps before giving a final answer, which improves accuracy on math, logic, and multi-step tasks. Reasoning models internalize this behavior and spend extra inference compute deliberating. You can also elicit it by asking a model to think step by step. See also: Reasoning Models, Test-Time Compute, Prompt Engineering

Chunking

Model terms

Chunking is splitting documents into smaller passages before embedding them, so retrieval can return focused, relevant pieces rather than whole files. Chunk size and overlap strongly affect retrieval quality. It is a key tuning decision in any retrieval-augmented generation pipeline. See also: RAG, Embedding, Vector Database

Code Completion

Model terms

Code completion is inline AI suggestion of the next code as you type, from a single line to a whole function, accepted with a keystroke. It is the most widely used AI coding feature and is distinct from an agent that completes multi-step tasks. It speeds routine coding without taking over control. See also: Coding Agent, AI Copilot, Vibe Coding

Coding Agent

Agent systems

A coding agent is an AI system that writes, edits, runs, and debugs code with limited human input, iterating against tests, builds, and error output. It goes beyond autocomplete to carry out multi-step development tasks. Examples include terminal and IDE agents like those in Cursor and Claude Code. See also: AI Agent, Vibe Coding, Cursor

Compliance

Model terms

Compliance is meeting the legal, regulatory, and security standards that apply to handling data and deploying software, such as SOC 2, GDPR, and HIPAA. For AI buyers it shapes which vendors and data flows are allowed. Enterprise AI plans often add the controls and certifications compliance requires. See also: PII, Zero Data Retention, SSO

Compute

Model terms

Compute is the processing power, usually GPUs or TPUs, used to train and run AI models, and it is one of the biggest costs and constraints in the field. Training frontier models needs vast clusters; serving them needs efficient inference. Compute availability shapes which models exist and how they are priced. See also: Inference, Parameters, Quantization

Computer Use

Agent systems

Computer Use is a capability in agentic AI systems that enables models to interact directly with computer interfaces by clicking buttons, typing text, and navigating screens. This extends AI agents beyond APIs to control visual UIs and legacy software for desktop automation. Claude Opus 4.8 demonstrates Computer Use by operating browsers and applications through screen observation and mouse actions. See also: Agentic AI, Multi-agent

Constitutional AI

Model terms

Constitutional AI is Anthropic's training approach that uses a written set of principles, a constitution, to guide a model toward helpful and harmless behavior with less direct human labeling. The model critiques and revises its own outputs against the principles. It is a notable alternative to relying solely on human preference labels. See also: RLHF, AI Alignment, Claude

Content Provenance

Model terms

Content provenance is metadata and cryptographic signing that records where a piece of media came from and how it was edited, with C2PA the leading open standard. It helps audiences and platforms tell AI-generated or manipulated content from authentic media. Provenance complements, but does not replace, detection. See also: Deepfake, Diffusion Model, Voice Cloning

Context Engineering

Build terms

Context engineering is the practice of deciding what information goes into a model's context window and how it is arranged: instructions, retrieved data, examples, and history. As context windows and agents grow, managing this well matters more than single-prompt wording. It includes retrieval, compaction, and ordering. See also: Context Window, RAG, Prompt Engineering

Context Window

Build terms

Context window is the maximum number of tokens a large language model processes at once, including prompts and conversation history, acting as its working memory. Larger windows enable handling of extended documents and sustained dialogues. As of June 2026, Claude Opus 4.8, Gemini 3.1 Pro, and GPT-5.5-class OpenAI API models all support long-context workflows, with exact limits varying by model, API, app surface, and plan. See also: Tokens, LLM

D

Deepfake

Model terms

A deepfake is synthetic image, video, or audio generated to convincingly impersonate a real person, often created with diffusion or voice-cloning models. Deepfakes raise fraud, misinformation, and consent concerns, which is why provenance and watermarking efforts are growing. Detection remains difficult and imperfect. See also: Voice Cloning, Diffusion Model, Multimodal

Diffusion Model

Model terms

A diffusion model generates images, audio, or video by starting from random noise and iteratively denoising it toward a result that matches the prompt. It is the dominant approach for AI image and video generation. Tools like Midjourney and many text-to-video systems rely on diffusion or diffusion-style methods. See also: Foundation Model, Multimodal, Midjourney

Digital Human

Model terms

A digital human is an AI-driven, often photoreal avatar that can speak, listen, and present, used for video, customer service, and training. It combines generated video, voice, and sometimes real-time conversation. Quality and consent controls vary, and it overlaps with deepfake concerns when it mimics real people. See also: Deepfake, Text-to-Video, Real-time Voice

E

Edge AI

Model terms

Edge AI runs models directly on a device or near it, such as a phone, laptop, or sensor, instead of in the cloud, cutting latency and keeping data local. It depends on small or quantized models that fit limited hardware. It is central to private, offline, and real-time use cases. See also: Local LLM, Quantization, Inference

Embedding

Build terms

Embedding is a numerical vector representation of text, images, audio, or other data that captures semantic meaning and relationships in multidimensional space.

This enables machines to quantify similarity between data points by measuring vector proximity, powering semantic search and AI applications.

For example, embeddings for "dog" and "puppy" cluster closely in vector space, while "dog" and "refrigerator" remain distant.

F

Few-shot Learning

Model terms

Few-shot learning is when you include a handful of examples in the prompt to show a model the pattern you want before asking it to continue. It often raises accuracy and consistency on formatting, classification, and edge-case tasks without any training. It is a core prompt-engineering technique. See also: Zero-shot Learning, In-context Learning, Prompt Engineering

Fine-tuning

Model terms

Fine-tuning is the process of adapting a pre-trained foundation model by further training it on a task-specific dataset to improve performance on targeted applications.

Fine-tuning leverages existing model knowledge to achieve superior results with less data and compute than training from scratch.

For example, fine-tuning a current GPT-5 family model on company support tickets can improve customer-service response accuracy.

Foundation Model

Model terms

A foundation model is a large AI model trained on broad data using self-supervision at scale that adapts to a wide range of downstream tasks.

These models form the base for specialized applications, enabling faster and cost-effective development.

Examples include GPT-5.5, Claude Opus 4.8, and Gemini 3.1 Pro.

Frontier Model

Model terms

A frontier model is one of the most capable general-purpose AI models available at a given time, typically expensive to train and released by leading labs. The label tracks the moving edge of capability rather than a fixed threshold. As of 2026, examples include GPT-5.5, Claude Opus 4.8, and Gemini 3.1 Pro. See also: Foundation Model, LLM, ChatGPT

Function Calling

Model terms

Function calling lets a model return structured arguments that an application uses to run a real function or API, then feed the result back into the conversation. It is how assistants book a meeting, query a database, or fetch live data. It is the foundation of tool-using agents and is often paired with structured output. See also: AI Agent, MCP (Model Context Protocol), API

G

GEO

Business terms

Generative Engine Optimization (GEO) is the practice of structuring content so AI systems like ChatGPT, Claude Opus 4.8, and Gemini 3.1 Pro cite it in generated responses. This shifts visibility from search rankings to direct inclusion in AI-generated answers, making brand representation dependent on LLM synthesis rather than click-through traffic. Content optimization for GEO emphasizes clear structure, authoritative citations, comprehensive topic coverage, and natural language that LLMs can easily extract and reference, distinguishing it fundamentally from traditional SEO's focus on keyword ranking and backlinks.

Grounding

Model terms

Grounding ties a model's output to verifiable external information, such as retrieved documents or live data, so answers reflect real sources rather than only model memory. It reduces hallucination and enables citations. Retrieval-augmented generation is the most common grounding technique. See also: RAG, Hallucination, Semantic Search

Guardrails

Model terms

Guardrails are the safety constraints placed around a model to keep its behavior and output within acceptable bounds, through training, system prompts, input and output filtering, and tool permissions. They reduce harmful, off-topic, or unsafe responses. Effective guardrails balance safety with not over-refusing legitimate requests. See also: AI Alignment, Jailbreak, Constitutional AI

H

Hallucination

Trust and media

Hallucination is a response generated by an AI model that contains false or misleading information presented confidently as fact.

This undermines reliability in critical applications like healthcare, law, and education, where accuracy determines outcomes.

For example, a model might claim a current GPT release won two Nobel Prizes, though it won none.

I

In-context Learning

Build terms

In-context learning is a model's ability to pick up a task from information in the prompt at inference time, without updating any weights. Examples, instructions, and retrieved documents all shape the output through context alone. It is why larger context windows and good retrieval matter so much. See also: Few-shot Learning, Context Window, RAG

Inference

Build terms

Inference is the execution phase where a trained AI model analyzes new data to produce predictions, decisions, or generated outputs without learning anything new. This is where AI delivers real-world value, transforming learned patterns into actionable results at scale. When you send a prompt to Claude Opus 4.8 and receive a response, or when a GPT-5.5-class model generates text, that computational process is inference. Inference differs fundamentally from training: it requires only a forward pass through the model rather than parameter updates, making individual predictions far less computationally demanding than model development. Inference costs represent what users pay for API usage and depend on model size, input/output token length, and underlying hardware. Optimization techniques, including model quantization, prompt caching, and deploying smaller specialized models, have become critical for reducing inference expenses in production environments.

Inference Cost

Build terms

Inference cost is what it costs to run a model to produce output, usually billed per input and output token and driven by model size and context length. It is the main ongoing cost of an AI product, distinct from the one-time cost of training. Caching, smaller models, and batching all reduce it. See also: Inference, Tokens, Compute

J

Jailbreak

Model terms

A jailbreak is a prompt crafted to bypass a model's safety guardrails and make it produce restricted or disallowed output. Jailbreaks exploit role-play, obfuscation, or instruction conflicts. Labs counter them with training, guardrails, and red teaming, but it remains an ongoing cat-and-mouse problem. See also: Prompt Injection, Guardrails, Red Teaming

K

Knowledge Distillation

Model terms

Knowledge distillation trains a smaller, cheaper student model to imitate a larger teacher model, transferring much of its capability at a fraction of the size and cost. It is a common way to ship fast, affordable models derived from frontier ones. Distillation pairs well with quantization for efficient deployment. See also: Fine-tuning, Quantization, Parameters

Knowledge Graph

Model terms

A knowledge graph stores information as a network of entities and the relationships between them, enabling structured queries and reasoning over connected facts. Paired with language models, it can supply precise, relational grounding that plain text retrieval misses. It is one approach to reducing hallucination. See also: RAG, Semantic Search, Grounding

L

Latency

Build terms

Latency is the time delay between when an AI system receives an input and generates the corresponding output. This metric directly impacts user experience, with low latency enabling real-time interactions in conversational interfaces and autonomous systems. In Claude Opus 4.8 and GPT-5.5-class models, latency stems from data preprocessing, mathematical computations, data transfer between processing units, and postprocessing, with larger models typically exhibiting higher latency due to increased computational overhead. Reducing latency requires model compression, optimized inference code, hardware acceleration, and lower-precision numerical formats. Streaming responses decreases perceived latency by delivering tokens incrementally rather than waiting for complete generation.

LLM

Model terms

Large Language Model (LLM) is a deep learning neural network trained on vast text datasets to understand, generate, and process human-like natural language. LLMs underpin modern AI tools by enabling text generation, summarization, translation, and reasoning at scale. Examples include GPT-5.5, Claude Opus 4.8, and Gemini 3.1 Pro. See also: Foundation Model, Tokens, AI Writing Category

Local LLM

Model terms

A local LLM is a language model you run on your own hardware rather than a hosted API, giving full data control and no per-token fee in exchange for setup and compute. Open-weight models plus quantization make this practical on consumer machines. Tools like Ollama and LM Studio simplify it. See also: Open Weights, Edge AI, Quantization

LoRA

Model terms

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that freezes pre-trained model weights and injects trainable low-rank decomposition matrices into Transformer layers.

It reduces compute and memory needs, enabling smaller teams to customize large models without full retraining.

LoRA customizes open-source models like Llama 4 and DeepSeek V3.2 for specific tasks.

M

MCP (Model Context Protocol)

Build terms

MCP (Model Context Protocol) is an open standard for connecting AI models to external tools, data sources, and services through a common interface, so integrations work across apps instead of being rebuilt for each one. It has become a widely adopted way to give assistants and agents access to files, APIs, and databases. See also: Function Calling, AI Agent, SDK

MoE (Mixture of Experts)

Model terms

MoE (Mixture of Experts) is a machine learning architecture that divides a neural network into specialized sub-networks called experts, with a gating network activating only relevant experts per input for efficiency.

This selective activation scales models to billions of parameters while reducing compute costs during training and inference.

Mixtral 2 and Grok 4.20 deploy MoE layers to match dense model performance at lower inference expense.

Multi-agent

Agent systems

A multi-agent system is a computational architecture of multiple autonomous AI agents that interact in a shared environment to achieve complex goals difficult for a single agent.

Multi-agent systems divide tasks among specialized agents for superior efficiency, scalability, and resilience in production workflows.

CrewAI 2 can orchestrate a research agent using an OpenAI model, a writing agent with Claude Opus 4.8, and a review agent for report generation.

Multimodal

Model terms

Multimodal AI processes and generates more than one type of data, such as text, images, audio, and video, within a single model or system. It lets you ask questions about a screenshot, generate an image from a description, or analyze a chart. GPT-5.5, Claude Opus 4.8, and Gemini 3.1 Pro are all multimodal to varying degrees. See also: LLM, Foundation Model, Diffusion Model

Music Generation

Model terms

Music generation is AI creating original music, vocals, or sound from text prompts or other inputs. Modern tools produce full tracks with structure and lyrics, raising both creative possibilities and rights questions. It sits alongside image and video as a major generative-media category. See also: Diffusion Model, Multimodal, TTS (Text-to-Speech)

N

Negative Prompt

Model terms

A negative prompt tells an image or video model what to avoid, such as extra fingers, text artifacts, or a style you do not want. It complements the main prompt to steer results. Not every model supports it, and its effect varies by system. See also: Text-to-Image, Prompt Engineering, Diffusion Model

No-code/Low-code

Agent systems

No-code and low-code platforms enable building applications using visual drag-and-drop interfaces and pre-built components with minimal or no hand-coding required.

They accelerate development for developers and non-technical users, enabling rapid creation of custom software without deep programming expertise.

Bubble supports no-code web apps, while Retool provides low-code dashboards integrated with OpenAI APIs.

O

Open Source vs Closed Source

Model terms

Open Source vs Closed Source in AI distinguishes models with publicly available weights, architecture, code, and data from proprietary models where these elements remain confidential and accessible only via API or fee.

Open source enables self-hosting, fine-tuning, inspection, and privacy; closed source provides superior performance, updates, security, and ease of integration.

Examples include open source Llama 4, DeepSeek V3.2, and Mixtral 2 versus closed source GPT-5.5 and Claude Opus 4.8.

Open Weights

Model terms

Open weights means a model's trained parameters are publicly downloadable, so anyone can run, fine-tune, or self-host it, though the license may still restrict commercial use. It is distinct from fully open source, which also releases training code and data. DeepSeek and Mistral publish open-weight models. See also: Open Source vs Closed Source, Parameters, DeepSeek

Overfitting

Model terms

Overfitting happens when a model learns its training data too closely, including noise, and then performs poorly on new inputs. It signals poor generalization and is countered with more or cleaner data, regularization, and held-out evaluation. Benchmarks can mask it if a model has effectively memorized them. See also: Fine-tuning, Benchmark, Synthetic Data

P

Parameters

Model terms

Parameters are the learned numerical weights inside a neural network that encode what a model knows, adjusted during training and frozen at inference. Parameter count is a rough proxy for capacity, though architecture and training data matter just as much. Mixture-of-experts designs activate only a fraction of total parameters per token to cut cost. See also: Foundation Model, Open Weights, MoE (Mixture of Experts)

PII

Model terms

PII (personally identifiable information) is data that can identify a specific person, such as names, emails, or IDs. Sending PII to AI services raises privacy and legal obligations, so teams redact it, restrict retention, or self-host. Handling PII correctly is central to AI compliance. See also: Zero Data Retention, Compliance, Local LLM

Pretraining

Model terms

Pretraining is the initial, expensive phase where a model learns general patterns from a huge unlabeled dataset using self-supervision, before any task-specific tuning. It produces the base capabilities that fine-tuning and prompting later shape. Most of a frontier model's cost and knowledge comes from pretraining. See also: Foundation Model, Fine-tuning, Parameters

Prompt Caching

Model terms

Prompt caching stores the processed form of a stable prompt prefix so repeated requests that share it skip recomputation, cutting cost and latency. It pays off when a large system prompt or document is reused across many calls. Keeping the cached prefix byte-identical is what makes it work. See also: System Prompt, Tokens, Latency

Prompt Engineering

Model terms

Prompt engineering is the process of designing and refining natural language prompts to guide generative AI models, particularly large language models, toward producing accurate and desired outputs.

Prompt engineering optimizes AI performance without model retraining, enabling precise control over responses through techniques like few-shot prompting and chain-of-thought reasoning.

For example, Claude Opus 4.8 generates step-by-step solutions when prompted with "Think step by step" for complex math problems.

Prompt Injection

Model terms

Prompt injection is an attack where malicious text hidden in user input or external content tricks a model into ignoring its instructions or taking unintended actions. It is a leading security risk for AI agents that read web pages, emails, or documents. Defenses include input isolation, tool permission limits, and treating retrieved content as untrusted. See also: System Prompt, AI Agent, Agentic AI

Q

Quantization

Model terms

Quantization reduces the numerical precision of a model's weights, for example from 16-bit to 4-bit, to shrink its size and speed up inference, usually with a small accuracy cost. It is what makes large open-weight models runnable on consumer hardware. Many local LLM setups rely on quantized versions of frontier-class models. See also: Parameters, Inference, Open Weights

R

RAG

Build terms

Retrieval-Augmented Generation (RAG) is a technique that enables large language models to retrieve relevant information from external knowledge bases before generating responses.

RAG grounds outputs in current, domain-specific data to produce accurate responses without retraining the model.

For example, Claude Opus 4.8 uses RAG to query a company vector database for employee HR policies during a leave inquiry.

Rate Limit

Model terms

A rate limit caps how many requests or tokens you can send to an API in a given period, protecting providers from overload and controlling spend. Hitting one returns an error you handle with backoff and retries. Limits vary by plan and model and are a key planning factor for production apps. See also: API, Inference Cost, Latency

Real-time Voice

Trust and media

Real-time voice is low-latency, speech-to-speech conversation with an AI, where you talk and it replies almost immediately in a natural voice. It combines fast speech recognition, model reasoning, and speech synthesis in a tight loop. It is the basis of modern voice assistants and phone agents. See also: Speech-to-Text, TTS (Text-to-Speech), Latency

Reasoning Models

Model terms

Reasoning models are large language models trained to perform multi-step logical reasoning, breaking complex problems into chain-of-thought steps for superior accuracy on math, coding, and planning tasks. They enable reliable solutions to challenges beyond standard LLMs' pattern-matching capabilities. Examples include Claude Opus 4.8 with extended thinking and Gemini 3.1 Pro Thinking. See also: LLM, Prompt Engineering

Red Teaming

Model terms

Red teaming is the practice of deliberately attacking an AI system to find vulnerabilities, unsafe outputs, and jailbreaks before release. Internal and external red teams probe for harmful, biased, or exploitable behavior. Findings feed back into guardrails and training. See also: Jailbreak, Guardrails, AI Alignment

Reranking

Model terms

Reranking is a second retrieval step that reorders an initial set of candidate results by relevance to the query, usually with a model more precise than the first-pass search. It sharpens the context fed to a model in retrieval-augmented generation, improving answer quality. See also: Semantic Search, RAG, Embedding

RLHF

Model terms

RLHF (Reinforcement Learning from Human Feedback) is a training method that tunes a model using human preference judgments, rewarding outputs people rate as more helpful, honest, and harmless. It is a major reason modern assistants follow instructions well. Anthropic's Constitutional AI is a related approach that uses written principles to guide this feedback. See also: Fine-tuning, AI Alignment, Claude

S

SaaS

Business terms

SaaS is a cloud computing model where providers host and deliver applications over the internet on a subscription basis, managing all infrastructure and updates. This model enables AI tool users to access compute-intensive services without local installation or maintenance costs. Examples include ChatGPT Plus with GPT-5.5 access and Claude Pro or Max with Claude Opus 4.8 via browser apps. See also: ARR, API, MaaS

SDK

Build terms

Software Development Kit (SDK). Collection of tools, libraries, and documentation that simplifies building applications with an API by wrapping calls in language-specific functions.

SDKs accelerate development and reduce errors for developers integrating AI services.

Examples include Anthropic Python SDK for Claude Opus 4.8 and the OpenAI Node SDK for GPT-5.5-class models; the Claude Agent SDK adds frameworks for autonomous AI agents.

Semantic Search

Model terms

Semantic search finds results by meaning rather than exact keywords, by comparing vector embeddings of the query and the content. It returns relevant passages even when wording differs, which is why it underpins retrieval-augmented generation. It usually runs on a vector database. See also: Embedding, Vector Database, RAG

SEO

Business terms

Search Engine Optimization (SEO) is the practice of improving websites and web pages to increase visibility and organic traffic in unpaid search engine results pages (SERPs).

SEO drives targeted users searching for information, products, or services, boosting engagement, brand awareness, and conversions without paid ads.

Surfer SEO automates keyword research and on-page analysis for higher rankings.

Speech-to-Text

Model terms

Speech-to-text (STT), also called automatic speech recognition, converts spoken audio into written text. It powers transcription, captions, voice commands, and the listening half of voice assistants. Accuracy depends on audio quality, accents, and domain vocabulary. See also: TTS (Text-to-Speech), Real-time Voice, Multimodal

SSO

Model terms

SSO (single sign-on) lets users access multiple applications with one set of credentials managed by a central identity provider. In AI tools it is an enterprise requirement for security and user management, usually paired with provisioning and audit controls. It typically appears only on business and enterprise plans. See also: Compliance, SaaS, API

Streaming

Model terms

Streaming delivers a model's output token by token as it is generated rather than waiting for the full response. It cuts perceived latency, lets users read along, and avoids timeouts on long outputs. Most chat interfaces and many APIs stream by default. See also: Latency, Tokens, Inference

Structured Output

Model terms

Structured output constrains a model to return data in a defined shape, such as JSON matching a schema, so applications can parse it reliably. It removes brittle text scraping and is essential for tool use and pipelines. Many APIs enforce it with a schema or strict mode. See also: Function Calling, Tool Use, Prompt Engineering

Superintelligence

Model terms

Superintelligence is a hypothetical AI whose general capability far exceeds the best humans across virtually all domains, beyond even AGI. It is a central subject of long-term safety and governance debate. No such system exists today, and whether or when one could is deeply uncertain. See also: AGI, Frontier Model, AI Alignment

Synthetic Data

Model terms

Synthetic data is training or evaluation data generated by AI rather than collected from the real world. It helps cover rare cases, protect privacy, and scale datasets cheaply, but poor synthetic data can amplify bias or degrade quality. It is increasingly used to train and fine-tune models. See also: Fine-tuning, Foundation Model, AI Bias

System Prompt

Model terms

A system prompt is the instruction layer that sets a model's role, rules, and tone before any user message, shaping how it responds across a conversation. Applications use it to define behavior, safety boundaries, and output format. Because it carries operator authority and sits at the front of the context, keeping it stable also helps prompt caching. See also: Prompt Engineering, Prompt Injection, Context Window

T

Temperature

Model terms

Temperature is a sampling setting that controls how random a model's output is: low values make responses focused and deterministic, while high values increase variety and creativity. Lower temperature suits extraction and code; higher suits brainstorming. Note that some newer models restrict or remove this control in favor of effort settings. See also: Tokens, Prompt Engineering, Hallucination

Test-Time Compute

Model terms

Test-Time Compute allocates additional computational resources during model inference to enhance output quality through techniques like multiple sampling, search, or iterative refinement.

This scales performance on complex tasks by trading inference time and hardware for superior accuracy and reasoning.

Examples include GPT-5.5-class models allocating extra reasoning tokens and Claude Opus 4.8 using search-like deliberation patterns.

Text-to-Image

Model terms

Text-to-image generation creates pictures from a written prompt, typically using diffusion models. Quality depends on the prompt, the model, and settings like aspect ratio and seed. It is one of the most popular consumer AI uses, led by tools such as Midjourney. See also: Diffusion Model, Multimodal, Midjourney

Text-to-Video

Model terms

Text-to-video generation produces short video clips from a written prompt, an advancing frontier that is more compute-intensive and harder to control than image generation. Outputs are improving fast in coherence and length. Tools like Runway and several frontier video models compete here. See also: Diffusion Model, Text-to-Image, Runway

Tokens

Build terms

Tokens are the discrete units of text that large language models break down and process, representing words, subwords, punctuation, or character combinations. Token count directly determines both computational cost and the maximum input length a model can accept within its context window. For common English text, roughly 750 words equal 1,000 tokens, making token estimation essential for API budgeting and prompt design.

Tool Use

Model terms

Tool use is a model's ability to call external functions, APIs, or services to act and fetch information beyond its training, then incorporate the results. It is what turns a chatbot into an agent that can search, calculate, or change real systems. Function calling and MCP are the common mechanisms. See also: Function Calling, AI Agent, MCP (Model Context Protocol)

Transfer Learning

Model terms

Transfer learning reuses a model trained on one broad task as the starting point for another, so you adapt existing knowledge instead of training from scratch. It is why a single pretrained foundation model can be fine-tuned for many specialized uses with far less data and compute. See also: Pretraining, Fine-tuning, Foundation Model

Transformer

Model terms

The Transformer is the neural network architecture behind modern large language models, introduced in 2017, that uses self-attention to weigh the relationships between all tokens in a sequence in parallel. It replaced earlier recurrent approaches and made it practical to train models on internet-scale text. Nearly every current frontier model, including GPT-5.5, Claude Opus 4.8, and Gemini 3.1 Pro, is Transformer-based. See also: Foundation Model, Attention Mechanism, LLM

TTS (Text-to-Speech)

Trust and media

TTS (Text-to-Speech) converts written text into spoken audio using speech synthesis technology. Modern AI TTS enables scalable voice content creation for accessibility, audiobooks, virtual assistants, and customer service. ElevenLabs v3 and OpenAI TTS-2 produce human-like speech with emotion and natural pacing. See also: Voice Cloning, ElevenLabs, Voxtral

U

Usage-based Pricing

Model terms

Usage-based pricing charges by how much you consume, such as tokens, credits, API calls, or compute time, rather than a flat subscription. It is common for AI products because inference cost scales with use. It rewards light users but can make heavy or unpredictable workloads hard to budget. See also: Inference Cost, SaaS, Rate Limit

V

Vector Database

Build terms

A vector database stores, indexes, and queries high-dimensional vector embeddings representing unstructured data like text, images, or audio for efficient similarity search.

Vector databases enable low-latency semantic retrieval essential for RAG systems and generative AI applications.

Pinecone stores embeddings from OpenAI, Anthropic-compatible, or open-weight embedding models for querying relevant passages in enterprise RAG pipelines.

Vibe Coding

Agent systems

Vibe coding is a software development practice where developers describe tasks in natural language prompts to AI large language models, which generate, refine, and debug code.

It accelerates prototyping and experimentation by shifting focus from manual coding to guiding AI outputs.

Andrej Karpathy coined the term in February 2025, exemplified by using Claude Opus 4.8 or Cursor 2 to build MVPs from conversational descriptions.

Voice Cloning

Trust and media

Voice cloning replicates a specific person's voice using AI trained on audio samples to synthesize realistic speech matching their tone, accent, and inflections. This technology enables scalable content creation and accessibility tools while posing risks of fraud and deepfakes without consent safeguards. Professional voice-clone workflows can require longer consented samples, while instant voice methods in voice platforms use much shorter clips and need stricter consent controls. See also: TTS, ElevenLabs

W

Watermarking

Model terms

Watermarking embeds a detectable signal into AI-generated text, images, audio, or video so it can later be identified as machine-made. It supports provenance and abuse detection, but signals can be weakened by editing and are not foolproof. It complements, rather than replaces, content provenance standards. See also: Content Provenance, Deepfake, Diffusion Model

Workflow Automation

Agent systems

Workflow automation uses software to execute multi-step business processes automatically based on triggers, rules, actions, and logic, minimizing human intervention.

It enables faster operations, reduces errors, and frees teams for high-value work.

For example, Zapier can call an OpenAI model to generate social posts, then schedule them via Make.

Z

Zero Data Retention

Model terms

Zero data retention (ZDR) is a provider arrangement where prompts and outputs are not stored after a request is served, important for privacy-sensitive and regulated workloads. It is usually limited to specific API or enterprise routes rather than consumer apps, so verify scope before relying on it. Self-hosting is the stronger control. See also: API, SaaS, Local LLM

Zero-shot Learning

Model terms

Zero-shot learning is when a model performs a task it was not explicitly given examples for, relying only on its pretraining and the instruction in the prompt. Modern frontier models are strong zero-shot, which is why a plain instruction often works. Adding examples (few-shot) can still improve hard or unusual tasks. See also: Few-shot Learning, In-context Learning, Prompt Engineering