AI Glossary

Definitions for the model, agent, business, and infrastructure terms used across aipedia.wiki.

A

Affiliate Marketing

Business terms

Affiliate marketing is earning commission by promoting third-party products or services, with compensation typically tied to sales, clicks, or conversions. In the AI tools ecosystem, this model creates financial incentives that can influence product recommendations and editorial objectivity. AI tool review platforms frequently rely on affiliate revenue from major AI vendors and SaaS suites, making disclosure of these relationships essential for reader trust. See also: SEO, GEO, SaaS

Agentic AI

Agent systems

Agentic AI is an autonomous artificial intelligence system that accomplishes specific goals by reasoning, planning, and executing multi-step actions across tools and systems without continuous human intervention. This capability enables AI to operate proactively in complex, dynamic environments rather than simply responding to prompts or generating content. Claude Opus 4.8 with Computer Use, Gemini 3.1 Pro agents, and GPT-5.5-class OpenAI agents demonstrate agentic capabilities by autonomously breaking down tasks, making contextual decisions, and coordinating across multiple specialized agents to reach defined outcomes. See also: Multi-agent, Workflow Automation, Large Language Model, Autonomous Agent

AGI

Model terms

AGI (Artificial General Intelligence) refers to a hypothetical AI that matches or exceeds human ability across essentially any cognitive task, rather than excelling at narrow ones. There is no agreed definition or test, and timelines are heavily debated. Today's frontier models are powerful but remain narrow and fallible relative to this bar. See also: Frontier Model, Foundation Model, AI Alignment

AI Agent

Agent systems

An AI agent is a system that uses a model to plan and take actions toward a goal, calling tools, reading the results, and iterating with limited human input. Agents range from simple tool-using assistants to autonomous multi-step workers. Reliability depends on tool design, permissions, and guardrails against errors and prompt injection. See also: Agentic AI, Multi-agent, Function Calling

AI Alignment

Model terms

AI alignment is the field focused on making AI systems pursue their operators' and society's intended goals and values rather than unintended ones. It spans training techniques, evaluation, and oversight, and grows more important as models become more capable and autonomous. RLHF and Constitutional AI are practical alignment methods. See also: RLHF, Hallucination, Reasoning Models

AI Bias

Model terms

AI bias is systematic unfairness in a model's outputs that reflects skew in its training data, objectives, or design, which can disadvantage particular groups. It matters most in high-stakes uses like hiring, lending, and healthcare. Mitigation spans data curation, evaluation, and human oversight. See also: Synthetic Data, Hallucination, AI Alignment

AI Copilot

Model terms

An AI copilot is an assistant embedded inside an application that helps with tasks in context, suggesting, drafting, or acting while the human stays in control. The term spans coding, writing, and productivity tools. It contrasts with fully autonomous agents by keeping a human in the loop. See also: Code Completion, Coding Agent, AI Agent

AI Orchestration

Model terms

AI orchestration is the coordination of multiple models, tools, and steps into a single reliable workflow, deciding what runs when, passing data between stages, and handling errors. It is what turns individual model calls into a dependable product. Agent frameworks and workflow tools both provide orchestration. See also: AI Agent, Multi-agent, Workflow Automation

API

Build terms

An API (Application Programming Interface) is a set of rules and protocols that enables software applications to communicate, exchange data, and access features from other systems. Developers integrate AI services into apps and workflows by sending programmatic requests and receiving structured responses. The OpenAI API routes prompts to GPT-5.5-class models; the Claude API handles queries to Claude Opus 4.8. See also: SDK, Tokens, Workflow Automation

App Builder

Model terms

An app builder is an AI tool that turns a plain-language description into a working, often deployable application, handling UI, logic, and sometimes a database and hosting. It targets non-developers and rapid prototyping. Output usually still needs review before it becomes production software. See also: Vibe Coding, No-code/Low-code, Coding Agent

ARR

Business terms

Annual Recurring Revenue (ARR) is the normalized annual value of predictable subscription revenue from contracts, excluding one-time fees and overages. ARR gauges financial health and growth potential for SaaS companies, including AI tools, enabling accurate forecasting and investor evaluation. For example, ChatGPT reportedly reached $4B ARR by 2026. See also: SaaS, MRR

Attention Mechanism

Model terms

The attention mechanism lets a model weigh how much each token should influence every other token when producing output, so it can focus on the most relevant parts of the input. Self-attention is the core operation inside a Transformer, and it is what allows long-range context to shape each prediction. Larger context windows depend on making attention efficient. See also: Transformer, Context Window, Tokens
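
The weighting described above can be sketched as scaled dot-product self-attention. This is a minimal illustration, not a production implementation: it assumes the raw token vectors serve directly as queries, keys, and values, whereas a real Transformer first applies learned projection matrices. The function names and toy vectors are made up for the example.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this query against every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy "token" vectors (assumed values, for illustration only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
```

Each row of `weights` sums to 1 and says how strongly one token attends to every token in the input, including itself.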

B

Batch Processing

Model terms

Batch processing submits many requests together for asynchronous handling, often at a lower price than real-time calls, in exchange for slower turnaround. It suits bulk jobs like classification, summarization, or dataset generation where latency does not matter. Several providers offer a discounted batch tier. See also: API, Inference Cost, Rate Limit

C

Chain-of-Thought

Model terms

Chain-of-thought is a prompting and training technique where a model works through intermediate reasoning steps before giving a final answer, which improves accuracy on math, logic, and multi-step tasks. Reasoning models internalize this behavior and spend extra inference compute deliberating. You can also elicit it by asking a model to think step by step. See also: Reasoning Models, Test-Time Compute, Prompt Engineering

Chunking

Model terms

Chunking is splitting documents into smaller passages before embedding them, so retrieval can return focused, relevant pieces rather than whole files. Chunk size and overlap strongly affect retrieval quality. It is a key tuning decision in any retrieval-augmented generation pipeline. See also: RAG, Embedding, Vector Database
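
A rough sketch of fixed-size chunking with overlap follows. It assumes whitespace-separated words as the unit of size; real pipelines often count tokens or split on sentence and section boundaries instead. The function name `chunk_words` and its defaults are illustrative.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into word chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk has reached the end of the document.
        if start + chunk_size >= len(words):
            break
    return chunks

sample = " ".join(f"w{i}" for i in range(12))
chunks = chunk_words(sample, chunk_size=5, overlap=2)
```

With these settings the second chunk starts with the last two words of the first, so a sentence that straddles a boundary still appears whole in at least one chunk.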

Code Completion

Model terms

Code completion is inline AI suggestion of the next code as you type, from a single line to a whole function, accepted with a keystroke. It is the most widely used AI coding feature and is distinct from an agent that completes multi-step tasks. It speeds routine coding without taking over control. See also: Coding Agent, AI Copilot, Vibe Coding

Coding Agent

Agent systems

A coding agent is an AI system that writes, edits, runs, and debugs code with limited human input, iterating against tests, builds, and error output. It goes beyond autocomplete to carry out multi-step development tasks. Examples include terminal and IDE agents like those in Cursor and Claude Code. See also: AI Agent, Vibe Coding, Cursor

Compliance

Model terms

Compliance is meeting the legal, regulatory, and security standards that apply to handling data and deploying software, such as SOC 2, GDPR, and HIPAA. For AI buyers it shapes which vendors and data flows are allowed. Enterprise AI plans often add the controls and certifications compliance requires. See also: PII, Zero Data Retention, SSO

Compute

Model terms

Compute is the processing power, usually GPUs or TPUs, used to train and run AI models, and it is one of the biggest costs and constraints in the field. Training frontier models needs vast clusters; serving them needs efficient inference. Compute availability shapes which models exist and how they are priced. See also: Inference, Parameters, Quantization

Computer Use

Agent systems

Computer Use is a capability in agentic AI systems that enables models to interact directly with computer interfaces by clicking buttons, typing text, and navigating screens. This extends AI agents beyond APIs to control visual UIs and legacy software for desktop automation. Claude Opus 4.8 demonstrates Computer Use by operating browsers and applications through screen observation and mouse actions. See also: Agentic AI, Multi-agent

Constitutional AI

Model terms

Constitutional AI is Anthropic's training approach that uses a written set of principles, a constitution, to guide a model toward helpful and harmless behavior with less direct human labeling. The model critiques and revises its own outputs against the principles. It is a notable alternative to relying solely on human preference labels. See also: RLHF, AI Alignment, Claude

Content Provenance

Model terms

Content provenance is metadata and cryptographic signing that records where a piece of media came from and how it was edited, with C2PA the leading open standard. It helps audiences and platforms tell AI-generated or manipulated content from authentic media. Provenance complements, but does not replace, detection. See also: Deepfake, Diffusion Model, Voice Cloning

Context Engineering

Build terms

Context engineering is the practice of deciding what information goes into a model's context window and how it is arranged: instructions, retrieved data, examples, and history. As context windows and agents grow, managing this well matters more than single-prompt wording. It includes retrieval, compaction, and ordering. See also: Context Window, RAG, Prompt Engineering

Context Window

Build terms

A context window is the maximum number of tokens a large language model can process at once, including prompts and conversation history, acting as its working memory. Larger windows enable handling of extended documents and sustained dialogues. As of June 2026, Claude Opus 4.8, Gemini 3.1 Pro, and GPT-5.5-class OpenAI API models all support long-context workflows, with exact limits varying by model, API, app surface, and plan. See also: Tokens, LLM
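
One common way to stay inside a window is to drop the oldest conversation turns first. The sketch below assumes a crude word count stands in for a real tokenizer; the function `fit_to_window` and its parameters are hypothetical and not part of any provider's API.

```python
def fit_to_window(system_prompt, history, max_tokens,
                  count_tokens=lambda s: len(s.split())):
    """Keep the system prompt plus as many of the most recent messages as fit."""
    budget = max_tokens - count_tokens(system_prompt)
    kept = []
    # Walk backwards so the newest messages are kept first.
    for message in reversed(history):
        cost = count_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return [system_prompt] + kept[::-1]

history = ["one two three", "four five", "six"]
window = fit_to_window("be brief", history, max_tokens=5)
```

Here the oldest message no longer fits, so only the two most recent turns survive alongside the system prompt.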

D

Deepfake

Model terms

A deepfake is synthetic image, video, or audio generated to convincingly impersonate a real person, often created with diffusion or voice-cloning models. Deepfakes raise fraud, misinformation, and consent concerns, which is why provenance and watermarking efforts are growing. Detection remains difficult and imperfect. See also: Voice Cloning, Diffusion Model, Multimodal

Diffusion Model

Model terms

A diffusion model generates images, audio, or video by starting from random noise and iteratively denoising it toward a result that matches the prompt. It is the dominant approach for AI image and video generation. Tools like Midjourney and many text-to-video systems rely on diffusion or diffusion-style methods. See also: Foundation Model, Multimodal, Midjourney

Digital Human

Model terms

A digital human is an AI-driven, often photoreal avatar that can speak, listen, and present, used for video, customer service, and training. It combines generated video, voice, and sometimes real-time conversation. Quality and consent controls vary, and it overlaps with deepfake concerns when it mimics real people. See also: Deepfake, Text-to-Video, Real-time Voice

E

Edge AI

Model terms

Edge AI runs models directly on a device or near it, such as a phone, laptop, or sensor, instead of in the cloud, cutting latency and keeping data local. It depends on small or quantized models that fit limited hardware. It is central to private, offline, and real-time use cases. See also: Local LLM, Quantization, Inference

Embedding

Build terms

An embedding is a numerical vector representation of text, images, audio, or other data that captures semantic meaning and relationships in a multidimensional space. This lets machines quantify similarity between data points by measuring vector proximity, powering semantic search and AI applications. For example, embeddings for "dog" and "puppy" cluster closely in vector space, while "dog" and "refrigerator" remain distant. See also: Vector Database, RAG
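
Vector proximity is often measured with cosine similarity. The sketch below uses hand-written three-dimensional vectors purely for illustration; they are assumptions, not output from any real embedding model, which would typically produce hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up toy vectors standing in for real embeddings.
toy = {
    "dog": [0.9, 0.8, 0.1],
    "puppy": [0.85, 0.75, 0.2],
    "refrigerator": [0.1, 0.05, 0.95],
}

dog_puppy = cosine_similarity(toy["dog"], toy["puppy"])
dog_fridge = cosine_similarity(toy["dog"], toy["refrigerator"])
```

With these toy values, `dog_puppy` comes out far higher than `dog_fridge`, mirroring the clustering described in the definition.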

F

Fine-tuning

Model terms

Fine-tuning is the process of adapting a pre-trained foundation model by further training it on a task-specific dataset to improve performance on targeted applications. Fine-tuning leverages existing model knowledge to achieve superior results with less data and compute than training from scratch. For example, fine-tuning a current GPT-5 family model on company support tickets can improve customer-service response accuracy. See also: LoRA, Foundation Model, Prompt Engineering

Foundation Model

Model terms

A foundation model is a large AI model trained on broad data using self-supervision at scale that adapts to a wide range of downstream tasks. These models form the base for specialized applications, enabling faster and cost-effective development. Examples include GPT-5.5, Claude Opus 4.8, and Gemini 3.1 Pro. See also: LLM, Fine-tuning

Frontier Model

Model terms

A frontier model is one of the most capable general-purpose AI models available at a given time, typically expensive to train and released by leading labs. The label tracks the moving edge of capability rather than a fixed threshold. As of 2026, examples include GPT-5.5, Claude Opus 4.8, and Gemini 3.1 Pro. See also: Foundation Model, LLM, ChatGPT

Function Calling

Model terms

Function calling lets a model return structured arguments that an application uses to run a real function or API, then feed the result back into the conversation. It is how assistants book a meeting, query a database, or fetch live data. It is the foundation of tool-using agents and is often paired with structured output. See also: AI Agent, MCP (Model Context Protocol), API
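
The round trip can be sketched as follows. The JSON shape (`name` plus `arguments`), the `get_weather` function, and the `TOOLS` registry are all hypothetical stand-ins; each provider defines its own request and response format.

```python
import json

def get_weather(city: str) -> dict:
    # Hypothetical tool; a real one would call a weather service.
    return {"city": city, "forecast": "sunny"}

# Registry of functions the application is willing to run.
TOOLS = {"get_weather": get_weather}

def run_tool_call(model_output: str) -> str:
    """Parse the model's structured call, run it, and return a JSON result."""
    call = json.loads(model_output)
    func = TOOLS.get(call["name"])
    if func is None:
        return json.dumps({"error": f"unknown tool {call['name']}"})
    result = func(**call["arguments"])
    # This string is what the application feeds back into the conversation.
    return json.dumps(result)

# Assumed example of what a model might emit as a tool call.
model_output = '{"name": "get_weather", "arguments": {"city": "Lisbon"}}'
tool_result = run_tool_call(model_output)
```

The model only proposes the call; the application decides whether to execute it, which is where permissions and validation belong.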

G

GEO

Business terms

Generative Engine Optimization (GEO) is the practice of structuring content so AI systems like ChatGPT, Claude Opus 4.8, and Gemini 3.1 Pro cite it in generated responses. This shifts visibility from search rankings to direct inclusion in AI-generated answers, making brand representation dependent on LLM synthesis rather than click-through traffic. Content optimization for GEO emphasizes clear structure, authoritative citations, comprehensive topic coverage, and natural language that LLMs can easily extract and reference, distinguishing it fundamentally from traditional SEO's focus on keyword ranking and backlinks. See also: SEO, Answer Engine Optimization, Large Language Model, AI Overviews

Grounding

Model terms

Grounding ties a model's output to verifiable external information, such as retrieved documents or live data, so answers reflect real sources rather than only model memory. It reduces hallucination and enables citations. Retrieval-augmented generation is the most common grounding technique. See also: RAG, Hallucination, Semantic Search

Guardrails

Model terms

Guardrails are the safety constraints placed around a model to keep its behavior and output within acceptable bounds, through training, system prompts, input and output filtering, and tool permissions. They reduce harmful, off-topic, or unsafe responses. Effective guardrails balance safety with not over-refusing legitimate requests. See also: AI Alignment, Jailbreak, Constitutional AI

H

Hallucination

Trust and media

Hallucination is a response generated by an AI model that contains false or misleading information presented confidently as fact. This undermines reliability in critical applications like healthcare, law, and education, where accuracy determines outcomes. For example, a model might claim a current GPT release won two Nobel Prizes, though it won none. See also: RAG, LLM

I

In-context Learning

Build terms

In-context learning is a model's ability to pick up a task from information in the prompt at inference time, without updating any weights. Examples, instructions, and retrieved documents all shape the output through context alone. It is why larger context windows and good retrieval matter so much. See also: Few-shot Learning, Context Window, RAG

Inference

Build terms

Inference is the execution phase where a trained AI model analyzes new data to produce predictions, decisions, or generated outputs without learning anything new. This is where AI delivers real-world value, transforming learned patterns into actionable results at scale. When you send a prompt to Claude Opus 4.8 and receive a response, or when a GPT-5.5-class model generates text, that computational process is inference. Inference differs fundamentally from training: it requires only a forward pass through the model rather than parameter updates, making individual predictions far less computationally demanding than model development. Inference costs represent what users pay for API usage and depend on model size, input/output token length, and underlying hardware. Optimization techniques, including model quantization, prompt caching, and deploying smaller specialized models, have become critical for reducing inference expenses in production environments. See also: Training, Tokens, Latency, API, Quantization

Inference Cost

Build terms

Inference cost is what it costs to run a model to produce output, usually billed per input and output token and driven by model size and context length. It is the main ongoing cost of an AI product, distinct from the one-time cost of training. Caching, smaller models, and batching all reduce it. See also: Inference, Tokens, Compute
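
Per-token billing reduces to simple arithmetic. The prices below are placeholders chosen for the example, not any provider's actual rates.

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_million, output_price_per_million):
    """Cost in dollars for one request under per-million-token pricing."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000

# Hypothetical prices: $3 per million input tokens, $15 per million output tokens.
cost = request_cost(input_tokens=2_000, output_tokens=500,
                    input_price_per_million=3.00,
                    output_price_per_million=15.00)
```

In this made-up scenario the 500 output tokens cost more than the 2,000 input tokens, which is why trimming verbose output can matter as much as trimming the prompt.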

J

Jailbreak

Model terms

A jailbreak is a prompt crafted to bypass a model's safety guardrails and make it produce restricted or disallowed output. Jailbreaks exploit role-play, obfuscation, or instruction conflicts. Labs counter them with training, guardrails, and red teaming, but it remains an ongoing cat-and-mouse problem. See also: Prompt Injection, Guardrails, Red Teaming

K

Knowledge Distillation

Model terms

Knowledge distillation trains a smaller, cheaper student model to imitate a larger teacher model, transferring much of its capability at a fraction of the size and cost. It is a common way to ship fast, affordable models derived from frontier ones. Distillation pairs well with quantization for efficient deployment. See also: Fine-tuning, Quantization, Parameters

Knowledge Graph

Model terms

A knowledge graph stores information as a network of entities and the relationships between them, enabling structured queries and reasoning over connected facts. Paired with language models, it can supply precise, relational grounding that plain text retrieval misses. It is one approach to reducing hallucination. See also: RAG, Semantic Search, Grounding

L

Latency

Build terms

Latency is the time delay between when an AI system receives an input and generates the corresponding output. This metric directly impacts user experience, with low latency enabling real-time interactions in conversational interfaces and autonomous systems. In Claude Opus 4.8 and GPT-5.5-class models, latency stems from data preprocessing, mathematical computations, data transfer between processing units, and postprocessing, with larger models typically exhibiting higher latency due to increased computational overhead. Reducing latency requires model compression, optimized inference code, hardware acceleration, and lower-precision numerical formats. Streaming responses decreases perceived latency by delivering tokens incrementally rather than waiting for complete generation. See also: Inference, TTS, API, Model Compression

LLM

Model terms

A large language model (LLM) is a deep neural network trained on vast text datasets to understand, generate, and process natural language. LLMs underpin modern AI tools by enabling text generation, summarization, translation, and reasoning at scale. Examples include GPT-5.5, Claude Opus 4.8, and Gemini 3.1 Pro. See also: Foundation Model, Tokens, AI Writing Category

Local LLM

Model terms

A local LLM is a language model you run on your own hardware rather than a hosted API, giving full data control and no per-token fee in exchange for setup and compute. Open-weight models plus quantization make this practical on consumer machines. Tools like Ollama and LM Studio simplify it. See also: Open Weights, Edge AI, Quantization

LoRA

Model terms

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that freezes pre-trained model weights and injects trainable low-rank decomposition matrices into Transformer layers. It reduces compute and memory needs, enabling smaller teams to customize large models without full retraining. LoRA is commonly used to adapt open-weight models like Llama 4 and DeepSeek V3.2 to specific tasks. See also: Fine-tuning, Open Source vs Closed Source
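
In matrix terms, the frozen weight W (d by k) is adjusted by a low-rank product: W' = W + (alpha / r) * B A, where B is d by r, A is r by k, and only A and B are trained. The sketch below shows that update and the parameter savings in plain Python; the tiny matrices and helper names are illustrative assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_effective_weight(w, b, a, alpha):
    """Return W + (alpha / r) * (B @ A); W itself stays frozen."""
    r = len(a)  # rank of the update
    scale = alpha / r
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

def trainable_params(d, k, r):
    # Only B (d x r) and A (r x k) are trained.
    return r * (d + k)

w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [[0.0], [0.0], [0.0]]  # B starts at zero, so training begins from W unchanged
a = [[0.5, -0.5, 1.0]]
merged = lora_effective_weight(w, b, a, alpha=1.0)

full = 4096 * 4096                       # full fine-tuning of one 4096 x 4096 layer
lora = trainable_params(4096, 4096, 8)   # rank-8 adapter for the same layer
```

For that assumed layer size, the rank-8 adapter trains 65,536 values instead of 16,777,216, a 256-fold reduction.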

M

MCP (Model Context Protocol)

Build terms

MCP (Model Context Protocol) is an open standard for connecting AI models to external tools, data sources, and services through a common interface, so integrations work across apps instead of being rebuilt for each one. It has become a widely adopted way to give assistants and agents access to files, APIs, and databases. See also: Function Calling, AI Agent, SDK

MoE (Mixture of Experts)

Model terms

MoE (Mixture of Experts) is a machine learning architecture that divides a neural network into specialized sub-networks called experts, with a gating network activating only relevant experts per input for efficiency. This selective activation scales models to billions of parameters while reducing compute costs during training and inference. Mixtral 2 and Grok 4.20 deploy MoE layers to match dense model performance at lower inference expense. See also: LLM, Inference
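
Selective activation can be sketched as top-k routing. This toy version assumes the gate logits are supplied directly and that each expert is a simple function of a number; in a real MoE layer a learned gating network computes the logits from the input, and each expert is a feed-forward sub-network.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, gate_logits, experts, top_k=2):
    """Run only the top_k experts and mix their outputs by gate weight."""
    ranked = sorted(range(len(experts)),
                    key=lambda i: gate_logits[i], reverse=True)[:top_k]
    # Renormalize the gate scores over the selected experts only.
    weights = softmax([gate_logits[i] for i in ranked])
    output = sum(w * experts[i](x) for w, i in zip(weights, ranked))
    return output, ranked

# Toy experts (assumed for illustration).
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
output, used = moe_forward(3.0, gate_logits=[0.1, 2.0, -1.0, 2.0],
                           experts=experts, top_k=2)
```

Only two of the four experts run for this input, which is the source of the inference savings: total parameter count grows with the number of experts, but per-input compute grows only with `top_k`.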

Multi-agent

Agent systems

A multi-agent system is a computational architecture of multiple autonomous AI agents that interact in a shared environment to achieve complex goals difficult for a single agent. Multi-agent systems divide tasks among specialized agents for superior efficiency, scalability, and resilience in production workflows. CrewAI 2 can orchestrate a research agent using an OpenAI model, a writing agent with Claude Opus 4.8, and a review agent for report generation. See also: Agentic AI, Workflow Automation

Multimodal

Model terms

Multimodal AI processes and generates more than one type of data, such as text, images, audio, and video, within a single model or system. It lets you ask questions about a screenshot, generate an image from a description, or analyze a chart. GPT-5.5, Claude Opus 4.8, and Gemini 3.1 Pro are all multimodal to varying degrees. See also: LLM, Foundation Model, Diffusion Model

N

No-code/Low-code

Agent systems

No-code and low-code platforms enable building applications using visual drag-and-drop interfaces and pre-built components with minimal or no hand-coding required. They accelerate development for developers and non-technical users, enabling rapid creation of custom software without deep programming expertise. Bubble supports no-code web apps, while Retool provides low-code dashboards integrated with OpenAI APIs. See also: Workflow Automation, n8n, Bubble

O

Open Source vs Closed Source

Model terms

Open Source vs Closed Source in AI distinguishes models whose weights, and sometimes architecture, code, and training data, are publicly available from proprietary models where these elements remain confidential and are accessible only through a paid API or app. Open models enable self-hosting, fine-tuning, inspection, and tighter data control; closed models often offer stronger frontier performance, managed updates, and easier integration. Examples include open Llama 4, DeepSeek V3.2, and Mixtral 2 versus closed GPT-5.5 and Claude Opus 4.8. See also: Mistral, LoRA, Fine-tuning

Overfitting

Model terms

Overfitting happens when a model learns its training data too closely, including noise, and then performs poorly on new inputs. It signals poor generalization and is countered with more or cleaner data, regularization, and held-out evaluation. Benchmarks can mask it if a model has effectively memorized them. See also: Fine-tuning, Benchmark, Synthetic Data

P

Parameters

Model terms

Parameters are the learned numerical weights inside a neural network that encode what a model knows, adjusted during training and frozen at inference. Parameter count is a rough proxy for capacity, though architecture and training data matter just as much. Mixture-of-experts designs activate only a fraction of total parameters per token to cut cost. See also: Foundation Model, Open Weights, MoE (Mixture of Experts)

PII

Model terms

PII (personally identifiable information) is data that can identify a specific person, such as names, emails, or IDs. Sending PII to AI services raises privacy and legal obligations, so teams redact it, restrict retention, or self-host. Handling PII correctly is central to AI compliance. See also: Zero Data Retention, Compliance, Local LLM

Pretraining

Model terms

Pretraining is the initial, expensive phase where a model learns general patterns from a huge unlabeled dataset using self-supervision, before any task-specific tuning. It produces the base capabilities that fine-tuning and prompting later shape. Most of a frontier model's cost and knowledge comes from pretraining. See also: Foundation Model, Fine-tuning, Parameters

Prompt Caching

Model terms

Prompt caching stores the processed form of a stable prompt prefix so repeated requests that share it skip recomputation, cutting cost and latency. It pays off when a large system prompt or document is reused across many calls. Keeping the cached prefix byte-identical is what makes it work. See also: System Prompt, Tokens, Latency
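
Why the prefix must be byte-identical can be seen in a toy cache keyed by a hash of the prefix. This is only an analogy under stated assumptions: the `PrefixCache` class is hypothetical, and real providers cache internal model state rather than strings.

```python
import hashlib

class PrefixCache:
    """Toy cache: identical prefixes hit, any byte difference misses."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def process(self, prefix: str) -> str:
        key = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            # Stand-in for the expensive work of processing the prefix.
            self._store[key] = f"processed:{len(prefix)} chars"
        return self._store[key]

cache = PrefixCache()
system_prompt = "You are a support assistant. Follow the policy below.\n"
cache.process(system_prompt)         # first call: miss
cache.process(system_prompt)         # identical prefix: hit
cache.process(system_prompt + " ")   # one extra space: miss
```

A single trailing space is enough to change the key, which is why timestamps or per-user text belong after the stable prefix rather than inside it.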

Prompt Engineering

Model terms

Prompt engineering is the process of designing and refining natural language prompts to guide generative AI models, particularly large language models, toward producing accurate and desired outputs. Prompt engineering optimizes AI performance without model retraining, enabling precise control over responses through techniques like few-shot prompting and chain-of-thought reasoning. For example, Claude Opus 4.8 generates step-by-step solutions when prompted with "Think step by step" for complex math problems. See also: LLM, Fine-tuning, Tokens

Prompt Injection

Model terms

Prompt injection is an attack where malicious text hidden in user input or external content tricks a model into ignoring its instructions or taking unintended actions. It is a leading security risk for AI agents that read web pages, emails, or documents. Defenses include input isolation, tool permission limits, and treating retrieved content as untrusted. See also: System Prompt, AI Agent, Agentic AI

Q

Quantization

Model terms

Quantization reduces the numerical precision of a model's weights, for example from 16-bit to 4-bit, to shrink its size and speed up inference, usually with a small accuracy cost. It is what makes large open-weight models runnable on consumer hardware. Many local LLM setups rely on quantized versions of frontier-class models. See also: Parameters, Inference, Open Weights
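
The precision trade-off can be illustrated with simple symmetric quantization to signed 4-bit integers. This is a minimal sketch with made-up weights; production schemes usually quantize in small groups with per-group scales and use more careful rounding.

```python
def quantize(weights, bits=4):
    """Map floats to signed integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the stored integers."""
    return [v * scale for v in q]

weights = [0.70, -0.32, 0.11, 0.0]  # assumed example values
q, scale = quantize(weights)
restored = dequantize(q, scale)
```

Each weight is now stored as a small integer plus one shared scale, and the restored values differ from the originals by at most half a quantization step.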

R

RAG

Build terms

Retrieval-Augmented Generation (RAG) is a technique in which an application retrieves relevant information from external knowledge bases and supplies it to a large language model before the model generates a response. RAG grounds outputs in current, domain-specific data to produce more accurate responses without retraining the model. For example, an HR assistant built on Claude Opus 4.8 can query a company vector database for leave policies and pass the matching passages to the model when an employee asks about leave. See also: Embedding, Vector Database, Hallucination
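
The retrieve-then-generate flow can be sketched in a few lines. To stay self-contained, this example assumes plain word overlap as the retriever in place of embeddings and a vector database, and it stops at building the prompt rather than calling a model. The documents are invented for illustration.

```python
def retrieve(query, documents, k=1):
    """Return the k documents sharing the most words with the query."""
    q = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """Place retrieved passages ahead of the question so the model can cite them."""
    context = "\n".join(f"- {p}" for p in passages)
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")

# Made-up knowledge base entries.
documents = [
    "Employees accrue 20 days of paid leave per year.",
    "The office is closed on public holidays.",
    "Expense reports are due by the fifth of each month.",
]
query = "How many days of paid leave do employees get per year?"
passages = retrieve(query, documents)
prompt = build_prompt(query, passages)
```

The resulting `prompt` is what would be sent to the model; swapping the retriever for embedding search changes `retrieve` but not the overall shape.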

Rate Limit

Model terms

A rate limit caps how many requests or tokens you can send to an API in a given period, protecting providers from overload and controlling spend. Hitting one returns an error you handle with backoff and retries. Limits vary by plan and model and are a key planning factor for production apps. See also: API, Inference Cost, Latency
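
Backoff and retries can be sketched as exponential backoff with jitter. `RateLimitError` and `flaky_request` are hypothetical stand-ins for whatever error and call a given client library exposes; the injectable `sleep` parameter is an assumption made so the example runs instantly.

```python
import random
import time

class RateLimitError(Exception):
    """Hypothetical error raised when the API reports too many requests."""

def call_with_backoff(request, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry `request` on rate-limit errors, doubling the wait each time."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries, surface the error
            # Exponential delay plus random jitter to avoid synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)

attempts = {"count": 0}

def flaky_request():
    # Simulated call that is rate limited on the first two attempts.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"

delays = []  # record waits instead of actually sleeping
result = call_with_backoff(flaky_request, base_delay=0.01, sleep=delays.append)
```

If the provider returns a retry-after hint, honoring it is usually better than a computed delay.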

Real-time Voice

Trust and media

Real-time voice is low-latency, speech-to-speech conversation with an AI, where you talk and it replies almost immediately in a natural voice. It combines fast speech recognition, model reasoning, and speech synthesis in a tight loop. It is the basis of modern voice assistants and phone agents. See also: Speech-to-Text, TTS (Text-to-Speech), Latency

Reasoning Models

Model terms

Reasoning models are large language models trained to perform multi-step logical reasoning, breaking complex problems into chain-of-thought steps for superior accuracy on math, coding, and planning tasks. They enable reliable solutions to challenges beyond standard LLMs' pattern-matching capabilities. Examples include Claude Opus 4.8 with extended thinking and Gemini 3.1 Pro Thinking. See also: LLM, Prompt Engineering

Red Teaming

Model terms

Red teaming is the practice of deliberately attacking an AI system to find vulnerabilities, unsafe outputs, and jailbreaks before release. Internal and external red teams probe for harmful, biased, or exploitable behavior. Findings feed back into guardrails and training. See also: Jailbreak, Guardrails, AI Alignment

Reranking

Model terms

Reranking is a second retrieval step that reorders an initial set of candidate results by relevance to the query, usually with a model more precise than the first-pass search. It sharpens the context fed to a model in retrieval-augmented generation, improving answer quality. See also: Semantic Search, RAG, Embedding
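
The two-stage pattern can be sketched as a cheap first pass followed by a stricter scorer. Both scorers here are toy assumptions: word overlap stands in for the first-pass search, and `precise_score`, which rewards an exact phrase match, stands in for a learned reranking model. The documents are invented.

```python
def word_overlap(query, doc):
    return len(set(query.lower().split()) & set(doc.lower().split()))

def first_pass(query, docs, k=3):
    """Cheap, recall-oriented retrieval: top k by shared words."""
    return sorted(docs, key=lambda d: word_overlap(query, d), reverse=True)[:k]

def precise_score(query, doc):
    # Hypothetical stand-in for a reranking model.
    if query.lower() in doc.lower():
        return 1.0
    return 0.5 * word_overlap(query, doc) / len(set(query.lower().split()))

def rerank(query, candidates, score_fn=precise_score):
    """Reorder the candidate set with the more precise scorer."""
    return sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)

docs = [
    "leave policy for parental parking permits",
    "parental leave policy: 16 weeks paid",
    "office parking policy",
    "cafeteria menu for this week",
]
query = "parental leave policy"
candidates = first_pass(query, docs)
reranked = rerank(query, candidates)
```

The first pass ranks the parking-permit document highest because it shares the most words; the reranker promotes the document that actually contains the phrase.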

RLHF

Model terms

RLHF (Reinforcement Learning from Human Feedback) is a training method that tunes a model using human preference judgments, rewarding outputs people rate as more helpful, honest, and harmless. It is a major reason modern assistants follow instructions well. Anthropic's Constitutional AI is a related approach that uses written principles to guide this feedback. See also: Fine-tuning, AI Alignment, Claude

S

SaaS

Business terms

SaaS (Software as a Service) is a cloud computing model where providers host and deliver applications over the internet on a subscription basis, managing all infrastructure and updates. This model lets AI tool users access compute-intensive services without local installation or maintenance costs. Examples include ChatGPT Plus with GPT-5.5 access and Claude Pro or Max with Claude Opus 4.8 via browser apps. See also: ARR, API, MaaS

SDK

Build terms

A Software Development Kit (SDK) is a collection of tools, libraries, and documentation that simplifies building applications with an API by wrapping calls in language-specific functions. SDKs accelerate development and reduce errors for developers integrating AI services. Examples include the Anthropic Python SDK for Claude Opus 4.8 and the OpenAI Node SDK for GPT-5.5-class models; the Claude Agent SDK adds a framework for building autonomous AI agents. See also: API, Agentic AI

SEO

Business terms

Search Engine Optimization (SEO) is the practice of improving websites and web pages to increase visibility and organic traffic in unpaid search engine results pages (SERPs). SEO drives targeted users searching for information, products, or services, boosting engagement, brand awareness, and conversions without paid ads. Surfer SEO automates keyword research and on-page analysis for higher rankings. See also: GEO, Surfer SEO

SSO

Model terms

SSO (single sign-on) lets users access multiple applications with one set of credentials managed by a central identity provider. In AI tools it is an enterprise requirement for security and user management, usually paired with provisioning and audit controls. It typically appears only on business and enterprise plans. See also: Compliance, SaaS, API

Streaming

#

Model terms

Streaming delivers a model's output token by token as it is generated rather than waiting for the full response. It cuts perceived latency, lets users read along, and avoids timeouts on long outputs. Most chat interfaces and many APIs stream by default. See also: Latency, Tokens, Inference
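
The consumption pattern can be sketched with a plain Python generator. This is a minimal illustration, not any provider's API: `fake_model_stream` and `collect` are hypothetical names, and whitespace-delimited chunks stand in for real tokens.

```python
import time
from typing import Iterator, List


def fake_model_stream(text: str, delay_s: float = 0.0) -> Iterator[str]:
    """Yield whitespace-delimited chunks one at a time (hypothetical stand-in for a model)."""
    for chunk in text.split(" "):
        time.sleep(delay_s)  # stand-in for per-token generation time
        yield chunk + " "


def collect(stream: Iterator[str]) -> str:
    """Print each chunk as soon as it arrives and return the assembled text."""
    parts: List[str] = []
    for chunk in stream:
        print(chunk, end="", flush=True)  # the reader sees output before generation ends
        parts.append(chunk)
    print()
    return "".join(parts).strip()


if __name__ == "__main__":
    collect(fake_model_stream("Streaming shows text as it is generated.", delay_s=0.05))
```

The first chunk is printed after one delay rather than after the whole response is ready, which is where the lower perceived latency comes from.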

Superintelligence

#

Model terms

Superintelligence is a hypothetical AI whose general capability far exceeds the best humans across virtually all domains, beyond even AGI. It is a central subject of long-term safety and governance debate. No such system exists today, and whether or when one could is deeply uncertain. See also: AGI, Frontier Model, AI Alignment

Synthetic Data

#

Model terms

Synthetic data is training or evaluation data generated by AI rather than collected from the real world. It helps cover rare cases, protect privacy, and scale datasets cheaply, but poor synthetic data can amplify bias or degrade quality. It is increasingly used to train and fine-tune models. See also: Fine-tuning, Foundation Model, AI Bias

System Prompt

#

Model terms

A system prompt is the instruction layer that sets a model's role, rules, and tone before any user message, shaping how it responds across a conversation. Applications use it to define behavior, safety boundaries, and output format. Because it carries operator authority and sits at the front of the context, keeping it stable also helps prompt caching. See also: Prompt Engineering, Prompt Injection, Context Window

T

Temperature

#

Model terms

Temperature is a sampling setting that controls how random a model's output is: low values make responses more focused and repeatable (near-deterministic as the value approaches zero), while high values increase variety and creativity. Lower temperature suits extraction and code; higher suits brainstorming. Note that some newer models restrict or remove this control in favor of effort settings. See also: Tokens, Prompt Engineering, Hallucination
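
In the usual formulation, temperature divides the model's raw scores (logits) before they are turned into probabilities. The sketch below assumes that standard softmax form; the function name and the example logits are illustrative only.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing each logit by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]  # illustrative scores for three candidate tokens
    print(softmax_with_temperature(logits, 0.5))  # sharper: top token dominates
    print(softmax_with_temperature(logits, 2.0))  # flatter: more variety when sampling
```

A low temperature concentrates probability on the highest-scoring token, while a high temperature spreads it across alternatives.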

Test-Time Compute

#

Model terms

Test-time compute is the practice of spending additional computation during inference, rather than during training, to improve output quality through techniques such as sampling multiple answers, search, or iterative refinement. It improves performance on complex tasks by trading inference time and hardware for better accuracy and reasoning. Examples include GPT-5.5-class models allocating extra reasoning tokens and Claude Opus 4.8 using search-like deliberation patterns. See also: Inference, Reasoning Models
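
One simple instance of the multiple-sampling idea is majority voting over several independent answers. The sketch below is an assumption-laden toy: `noisy_solver` is a hypothetical stand-in for a single model call, and no specific vendor is known to work this way.

```python
import random
from collections import Counter
from typing import Callable


def majority_vote(sample_answer: Callable[[], str], n_samples: int) -> str:
    """Draw n_samples answers and return the most frequent one."""
    votes = Counter(sample_answer() for _ in range(n_samples))
    return votes.most_common(1)[0][0]


def noisy_solver(rng: random.Random, p_correct: float = 0.6) -> str:
    """Hypothetical single model call: returns "42" with probability p_correct."""
    return "42" if rng.random() < p_correct else rng.choice(["41", "43"])


if __name__ == "__main__":
    rng = random.Random(0)
    print(majority_vote(lambda: noisy_solver(rng), n_samples=1))   # one call, may be wrong
    print(majority_vote(lambda: noisy_solver(rng), n_samples=51))  # more compute, steadier
```

Each extra sample costs another inference call, which is the time-and-hardware trade the entry describes.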

Text-to-Image

#

Model terms

Text-to-image generation creates pictures from a written prompt, typically using diffusion models. Quality depends on the prompt, the model, and settings like aspect ratio and seed. It is one of the most popular consumer AI uses, led by tools such as Midjourney. See also: Diffusion Model, Multimodal, Midjourney

Text-to-Video

#

Model terms

Text-to-video generation produces short video clips from a written prompt, an advancing frontier that is more compute-intensive and harder to control than image generation. Outputs are improving fast in coherence and length. Tools like Runway and several frontier video models compete here. See also: Diffusion Model, Text-to-Image, Runway

Tokens

#

Build terms

Tokens are the discrete units of text that large language models process, representing words, subwords, punctuation, or character combinations. Token count drives computational cost and determines whether an input fits within a model's context window. For common English text, roughly 750 words equal 1,000 tokens, which makes token estimation useful for API budgeting and prompt design. See also: Context Window, Tokenization, LLM, API
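
The rule of thumb translates into a quick estimate. This sketch assumes ordinary English prose and whitespace word splitting; the function names are illustrative, and a real tokenizer should be used when exact counts matter.

```python
import math

WORDS_PER_1000_TOKENS = 750  # rule of thumb for common English text


def estimate_tokens(text: str) -> int:
    """Estimate the token count from the whitespace-delimited word count."""
    words = len(text.split())
    return math.ceil(words * 1000 / WORDS_PER_1000_TOKENS)


def fits_context(text: str, context_window_tokens: int, reserved_for_output: int = 0) -> bool:
    """Check whether the estimated input plus reserved output tokens fit the window."""
    return estimate_tokens(text) + reserved_for_output <= context_window_tokens


if __name__ == "__main__":
    draft = "word " * 750
    print(estimate_tokens(draft))
    print(fits_context(draft, context_window_tokens=1200, reserved_for_output=300))
```

Code, non-English text, and unusual formatting often tokenize less efficiently than this estimate suggests.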

Transfer Learning

#

Model terms

Transfer learning reuses a model trained on one broad task as the starting point for another, so you adapt existing knowledge instead of training from scratch. It is why a single pretrained foundation model can be fine-tuned for many specialized uses with far less data and compute. See also: Pretraining, Fine-tuning, Foundation Model

Transformer

#

Model terms

The Transformer is the neural network architecture behind modern large language models, introduced in 2017, that uses self-attention to weigh the relationships between all tokens in a sequence in parallel. It replaced earlier recurrent approaches and made it practical to train models on internet-scale text. Nearly every current frontier model, including GPT-5.5, Claude Opus 4.8, and Gemini 3.1 Pro, is Transformer-based. See also: Foundation Model, Attention Mechanism, LLM
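
The self-attention step can be sketched in plain Python as scaled dot-product attention. This is a simplified illustration: it omits the learned projections, multiple heads, and masking that real Transformers use, and the function names are not from any library.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Normalize a list of scores into probabilities."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """For each query row, return a weighted average of the value rows."""
    d_k = len(k[0])
    output: Matrix = []
    for q_i in q:
        # Score this token against every token, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d_k) for k_j in k]
        weights = softmax(scores)
        output.append(
            [sum(w * v_j[c] for w, v_j in zip(weights, v)) for c in range(len(v[0]))]
        )
    return output


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token vectors
    print(self_attention(x, x, x))
```

No row's result depends on another row's result, so all rows can be computed at once as matrix operations, unlike the step-by-step processing of recurrent networks.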

TTS (Text-to-Speech)

#

Trust and media

TTS (Text-to-Speech) converts written text into spoken audio using speech synthesis technology. Modern AI TTS enables scalable voice content creation for accessibility, audiobooks, virtual assistants, and customer service. ElevenLabs v3 and OpenAI TTS-2 produce human-like speech with emotion and natural pacing. See also: Voice Cloning, ElevenLabs, Voxtral

U

Usage-based Pricing

#

Model terms

Usage-based pricing charges by how much you consume, such as tokens, credits, API calls, or compute time, rather than a flat subscription. It is common for AI products because inference cost scales with use. It rewards light users but can make heavy or unpredictable workloads hard to budget. See also: Inference Cost, SaaS, Rate Limit
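
For token-priced APIs the budgeting arithmetic is straightforward. The sketch below assumes separate per-million-token prices for input and output; the prices and volumes in the example are made-up placeholders, not any provider's rates.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Cost of one request given per-million-token prices for input and output."""
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


def monthly_cost(requests_per_day: int, cost_per_request: float, days: int = 30) -> float:
    """Project a monthly bill from a steady daily request volume."""
    return requests_per_day * days * cost_per_request


if __name__ == "__main__":
    # Placeholder prices: 2.00 per million input tokens, 10.00 per million output tokens.
    per_request = request_cost(1_500, 500, 2.0, 10.0)
    print(per_request)
    print(monthly_cost(10_000, per_request))
```

Because the bill scales linearly with volume, a tenfold traffic spike means a tenfold cost, which is why unpredictable workloads are hard to budget.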

V

Vector Database

#

Build terms

A vector database stores, indexes, and queries high-dimensional vector embeddings that represent unstructured data such as text, images, or audio, so that similar items can be found efficiently. Vector databases enable the low-latency semantic retrieval that RAG systems and generative AI applications depend on. For example, Pinecone can store embeddings produced by OpenAI, Anthropic-compatible, or open-weight embedding models and return the most relevant passages in enterprise RAG pipelines. See also: Embedding, RAG
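
The core operation, similarity search over stored embeddings, can be sketched with a brute-force in-memory store. `TinyVectorStore` is a hypothetical teaching class, not Pinecone's API; production systems use approximate nearest-neighbor indexes instead of scanning every vector.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical brute-force store that scans every vector on each query."""

    def __init__(self) -> None:
        self._items: Dict[str, List[float]] = {}

    def upsert(self, item_id: str, vector: List[float]) -> None:
        self._items[item_id] = vector

    def query(self, vector: List[float], top_k: int = 3) -> List[Tuple[str, float]]:
        scored = [(i, cosine_similarity(vector, v)) for i, v in self._items.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


if __name__ == "__main__":
    store = TinyVectorStore()
    store.upsert("cat", [1.0, 0.0])  # toy 2-D "embeddings"
    store.upsert("dog", [0.9, 0.1])
    store.upsert("car", [0.0, 1.0])
    print(store.query([1.0, 0.0], top_k=2))
```

In a RAG pipeline the query vector would come from embedding the user's question, and the returned IDs would map back to passages passed to the model.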

Vibe Coding

#

Agent systems

Vibe coding is a software development practice in which developers describe tasks in natural language to a large language model, which then generates, refines, and debugs the code. It accelerates prototyping and experimentation by shifting the developer's focus from writing code by hand to guiding AI output. Andrej Karpathy coined the term in February 2025. A typical example is using Claude Opus 4.8 or Cursor 2 to build an MVP from a conversational description. See also: Agentic Engineering, Software 2.0, AI Coding Category

Voice Cloning

#

Trust and media

Voice cloning replicates a specific person's voice using AI trained on audio samples, synthesizing realistic speech that matches their tone, accent, and inflections. The technology enables scalable content creation and accessibility tools, but without consent safeguards it poses risks of fraud and deepfakes. Professional voice-clone workflows can require longer, consented samples, while the instant cloning features offered by voice platforms work from much shorter clips and therefore need stricter consent controls. See also: TTS, ElevenLabs

W

Watermarking

#

Model terms

Watermarking embeds a detectable signal into AI-generated text, images, audio, or video so it can later be identified as machine-made. It supports provenance and abuse detection, but signals can be weakened by editing and are not foolproof. It complements, rather than replaces, content provenance standards. See also: Content Provenance, Deepfake, Diffusion Model

Workflow Automation

#

Agent systems

Workflow automation uses software to execute multi-step business processes automatically based on triggers, rules, actions, and logic, minimizing human intervention. It enables faster operations, reduces errors, and frees teams for high-value work. For example, Zapier can call an OpenAI model to generate social posts, then schedule them via Make. See also: No-code/Low-code, Agentic AI, n8n

Z

Zero Data Retention

#

Model terms

Zero data retention (ZDR) is a provider arrangement where prompts and outputs are not stored after a request is served, important for privacy-sensitive and regulated workloads. It is usually limited to specific API or enterprise routes rather than consumer apps, so verify scope before relying on it. Self-hosting is the stronger control. See also: API, SaaS, Local LLM