Essential AI Terms: Your Comprehensive Glossary

I spend a lot of time explaining AI terms and acronyms in conversations, particularly network-related ones, so I decided to create a single reference post covering the most common terms. If you're starting your AI journey or just need a quick refresher, I hope this glossary helps you cut through the jargon and understand the essentials.

AI Concepts

Chatbot: A computer program that uses AI to understand and respond to human language in a conversational way, e.g. ChatGPT.

AI Agent: An AI program that can operate without human interaction and has a specific objective, e.g. booking a flight or analysing data.

Hallucination: When an AI chatbot generates incorrect or fabricated information that nevertheless sounds plausible.

ChatGPT (Generative Pre-trained Transformer): A popular chatbot from OpenAI.

Prompt Engineering: Intelligently crafting inputs to guide LLM responses so they more accurately meet your intent, i.e. asking ChatGPT something in a way that gives you exactly the response you were looking for.
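
To make this concrete, here's a minimal sketch using the OpenAI Python SDK (the model name is illustrative, and it assumes your API key is set in the environment). The second, more carefully engineered prompt specifies role, audience, scope and format, so the response is far more likely to match intent:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Vague prompt: the model has to guess what you actually want.
vague = "Tell me about RDMA."

# Engineered prompt: role, audience, scope and format are all specified.
engineered = (
    "You are a network engineer mentoring a junior colleague. "
    "In exactly 3 bullet points, explain what RDMA is and why it "
    "matters for AI training clusters. Avoid marketing language."
)

for prompt in (vague, engineered):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)
```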

Agentic AI: An advanced form of AI focused on autonomous decision-making and goal-driven behaviour with limited human intervention. Uses AI agents that can plan, adapt and execute tasks in dynamic environments.

Physical AI: AI that enables machines and robots to perceive, understand and interact with the physical world.

FP – Floating Point, i.e. FP4, FP8, FP16 up to FP32: the number of bits used to store each floating-point number. The more bits, the higher the precision but the slower the computation; the sweet spot for training is generally FP16, still accurate but with faster training times. FP8 is super fast and is used in inferencing.
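
A quick way to see the precision trade-off is with NumPy (a minimal sketch; NumPy has no native FP4 or FP8 types, so FP16 vs FP32 illustrates the idea):

```python
import numpy as np

value = 3.14159265358979

fp32 = np.float32(value)  # 32 bits: ~7 significant decimal digits
fp16 = np.float16(value)  # 16 bits: ~3 significant decimal digits

print(fp32)  # 3.1415927
print(fp16)  # 3.14

# Fewer bits also means a smaller range: float16 maxes out around 65504,
# so this overflows to infinity.
print(np.float16(70000))  # inf
```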

Token: A small piece of text, usually a word, part of a word, or punctuation, i.e. the phrase 'AI is amazing.' is 4 tokens: "AI", "is", "amazing", ".". AI models convert all inputs and outputs to tokens, and the number of tokens determines how much text you can process and how much it costs. Tokens are then converted to 'vectors' to associate a numeric meaning and context with the string of tokens, which can then be stored in a vector database for similarity search.
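
You can see tokenisation in action with OpenAI's tiktoken library (a minimal sketch; the exact split depends on the encoding used):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models

tokens = enc.encode("AI is amazing.")
print(tokens)                             # a list of integer token IDs
print([enc.decode([t]) for t in tokens])  # the text each token represents
```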

Vector Database: A type of database used in AI that stores data as vectors (lists of numbers). Instead of searching for exact matches, it enables similarity search, finding items that are "close" in meaning or features. For example, if you like Product A, the system may suggest Product B because their vectors are close together.
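
Under the hood, "close" usually means a small angle between vectors, measured by cosine similarity. A minimal sketch with NumPy (the 3-dimensional vectors here are toy values; real embeddings have hundreds or thousands of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """1.0 = identical direction, 0.0 = unrelated, -1.0 = opposite."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional "embeddings" for three products.
product_a = np.array([0.9, 0.1, 0.3])
product_b = np.array([0.8, 0.2, 0.25])  # similar features to product_a
product_c = np.array([0.1, 0.9, 0.7])   # quite different

print(cosine_similarity(product_a, product_b))  # high -> recommend
print(cosine_similarity(product_a, product_c))  # low  -> don't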

Context Window: The maximum number of tokens an LLM can process at once.

Inferencing: Using a pre-trained AI model to generate predictions or responses based solely on its training data.
Key Point: No external updates; the model relies on internal knowledge.
Analogy: Answering a question from memory.

RAG (Retrieval-Augmented Generation): An AI technique that combines a language model with real-time data retrieval from external sources before generating an answer.
Key Point: Adds fresh, domain-specific context to overcome model knowledge cutoff.
Analogy: Checking a reference book before answering.
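
A minimal sketch of the retrieve-then-generate flow (retrieve_documents and llm_generate are hypothetical placeholders for your vector database query and model call, not a real library API):

```python
def answer_with_rag(question: str) -> str:
    """Retrieve relevant context, then generate an answer grounded in it."""
    # 1. Retrieval: similarity search against a vector database.
    docs = retrieve_documents(question, top_k=3)  # hypothetical helper

    # 2. Augmentation: stuff the retrieved text into the prompt.
    context = "\n\n".join(d.text for d in docs)
    prompt = (
        f"Answer the question using ONLY the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # 3. Generation: the LLM now answers with fresh, domain-specific context.
    return llm_generate(prompt)  # hypothetical helper
```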

LLM (Large Language Model): An AI program trained on massive datasets to understand and generate human language.

LRM (Large Reasoning Model): A form of LLM that has undergone reasoning-focused fine-tuning to handle multi-step reasoning tasks, rather than just predicting the next word like a traditional LLM.

Fine-Tuning – The process of taking a pre-trained AI model and adjusting it with new, specific data so it performs better on a particular task. Like teaching an AI model that already knows a lot of general stuff to specialise in your subject.
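
A minimal sketch using the Hugging Face transformers library (the model name is illustrative, and my_labelled_dataset is a hypothetical placeholder for your own task-specific data; real fine-tuning also needs a GPU and an evaluation set):

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Start from a pre-trained model that already "knows general stuff"...
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# ...then adjust its weights on your own task-specific data.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./my-specialist-model",
                           num_train_epochs=3),
    train_dataset=my_labelled_dataset,  # hypothetical: your domain data
)
trainer.train()
```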

CUDA: A software platform developed by NVIDIA that enables programs to run thousands of tasks in parallel on GPUs, unlocking massive speedups for computing beyond graphics.
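
You rarely write raw CUDA kernels these days; libraries expose that parallelism from Python. A minimal sketch using CuPy (assumes an NVIDIA GPU and the cupy package installed):

```python
import cupy as cp

# Allocate 10 million elements directly in GPU memory.
x = cp.arange(10_000_000, dtype=cp.float32)

# One line of Python, but under the hood CUDA launches thousands of
# parallel threads across the GPU's cores to square every element at once.
y = x * x

print(float(y.sum()))  # result copied back to the CPU for printing
```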

MCP (Model Context Protocol): A specialised protocol designed for AI models. It standardises how models connect to external tools and data sources, making it easier to provide context and structured results. Unlike APIs, which vary by service, MCP acts as a universal adapter for AI systems, ensuring consistent integration across different environments.

MoE: Mixture of Experts: An AI model design where multiple specialised “expert” networks are trained for different tasks or patterns, and a gating mechanism decides which expert(s) to activate for each input. This makes the model more efficient, since only a few experts are used at a time instead of the whole network.
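
A minimal sketch of the gating idea with NumPy (real MoE layers learn the gate weights during training and route per token; this toy version just shows top-k expert selection):

```python
import numpy as np

def moe_forward(x, experts, gate_weights, top_k=2):
    """Route input x to the top_k experts chosen by a softmax gate."""
    scores = gate_weights @ x                      # one score per expert
    probs = np.exp(scores) / np.exp(scores).sum()  # softmax gate
    chosen = np.argsort(probs)[-top_k:]            # only top_k experts run

    # Weighted sum of the selected experts' outputs; the other experts
    # stay idle, which is where the efficiency saving comes from.
    return sum(probs[i] * experts[i](x) for i in chosen)

experts = [lambda x, s=s: x * s for s in (0.5, 1.0, 2.0, 4.0)]  # 4 toy experts
gate_weights = np.random.randn(4, 3)               # 4 experts, 3-dim input
print(moe_forward(np.array([1.0, 2.0, 3.0]), experts, gate_weights))
```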

Slurm (Simple Linux Utility for Resource Management): An open-source workload manager that schedules and runs jobs on clusters of computers. It's widely used in high-performance computing (HPC) and AI to allocate resources, manage queues, and coordinate parallel tasks across many nodes.

BCM (NVIDIA Base Command Manager): A proprietary workload management and cluster orchestration platform from NVIDIA, built to schedule, monitor, and optimize AI and HPC jobs on GPU-accelerated systems. It provides resource allocation, job scheduling, and Kubernetes integration, similar to Slurm but tailored for NVIDIA hardware.

Infrastructure & Hardware

Hyperscaler – A large cloud provider, e.g. AWS, Azure, GCP.

Neoscaler/Neocloud – Emerging providers focused on specialised AI infrastructure at scale, typically smaller and more agile than a hyperscaler, e.g. CoreWeave, Lambda Labs.

RDMA (Remote Direct Memory Access): Enables direct memory access between nodes without CPU involvement.

InfiniBand: A high-performance, low-latency networking technology widely used in supercomputing and AI training clusters. It supports advanced features such as RDMA (Remote Direct Memory Access) and collective offload, enabling efficient data movement at scale. While InfiniBand is an open standard maintained by the InfiniBand Trade Association (IBTA), in practice the ecosystem is heavily dominated by NVIDIA, following its acquisition of Mellanox in 2020.

RoCEv2 (RDMA over Converged Ethernet v2): A protocol that enables RDMA over Ethernet, using IP headers for routability. It requires lossless/converged Ethernet (PFC/DCQCN) and is common in HPC and storage networks.

Ultra Ethernet: An open standard designed to evolve Ethernet for AI and HPC workloads, focusing on low latency, congestion control, and scalability. It aims to provide an Ethernet-based alternative to InfiniBand for GPU clusters.

NVLink – NVIDIA's high-speed GPU interconnect, typically used within a node.

NVSwitch – An advanced switch fabric that connects multiple NVLink-enabled GPUs inside large servers (e.g., DGX systems). Connects up to 72 GPUs (NVIDIA's NVL72 rack-scale system).

NCCL (NVIDIA Collective Communication Library, pronounced "Nickel") — A library developed by NVIDIA that enables fast, efficient communication between GPUs. It provides optimised routines for collective operations such as all-reduce, broadcast, and gather, making multi-GPU and multi-node training in AI and HPC scalable and performant. NCCL runs over NVLink/NVSwitch or PCIe within a node, and InfiniBand or RoCEv2 between nodes, using CUDA under the hood.
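
Most people use NCCL indirectly through a framework. A minimal sketch with PyTorch (assumes NVIDIA GPUs on a single node and that the script is launched with torchrun, which sets the rank and world-size environment variables):

```python
import torch
import torch.distributed as dist

# torchrun sets RANK/WORLD_SIZE; backend="nccl" selects NCCL for GPU comms.
dist.init_process_group(backend="nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)  # one GPU per process on a single node

# Every GPU contributes its tensor; NCCL's all-reduce sums them and leaves
# the identical result on every GPU (over NVLink/NVSwitch, or InfiniBand
# and RoCEv2 between nodes).
t = torch.ones(4, device="cuda") * rank
dist.all_reduce(t, op=dist.ReduceOp.SUM)
print(f"rank {rank}: {t}")

dist.destroy_process_group()
```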

DGX GB200: This system uses the NVIDIA GB200 Superchip, which combines the Grace CPU with Blackwell (B200) GPUs.

DGX GB300: This system is based on the enhanced NVIDIA GB300 Superchip, featuring the Grace CPU with the Blackwell Ultra (B300) GPU.

NVIDIA NVL4 Board – A GB200 Grace Blackwell 'Superchip' board: 4 x Blackwell GPUs with 768 GB of HBM3E memory and 2 Grace CPUs with 960 GB of LPDDR5X memory, connected via NVLink to perform like a single unified processor. Designed for hyperscale AI with trillions of parameters.

NVIDIA NVL72 – Essentially 18 NVL4 boards in a single rack connected via NVLink, with 72 Blackwell GPUs and 36 Grace CPUs sharing 13.8 TB of HBM3E and LPDDR5X memory, acting like a single giant GPU pool; definitely more than the sum of its parts. Liquid cooled, and can draw ~120 kW.

NVIDIA NVL144 – The NVIDIA Vera Rubin NVL144 CPX contains 144 Rubin GPUs and 36 Vera CPUs in a single rack, providing 8 exaflops of AI performance, 7.5x more than NVIDIA GB300 NVL72 systems. Availability expected H2 2026.

NVIDIA Kyber NVL576 – 576 Rubin Ultra GPUs and 144 Vera CPUs in a single cohesive GPU domain across 2 racks (1 Kyber GPU rack and 1 Kyber sidecar rack for power conversion, chillers and monitoring), drawing up to 600 kW. Provides 15 exaflops at FP4 (inference) and 5 exaflops at FP8 (training). Availability expected H2 2027.

NVIDIA BasePOD – A reference architecture for building GPU-accelerated clusters using NVIDIA DGX systems. It provides validated designs for flexible AI and HPC deployments, allowing organisations to choose networking, storage, and supporting infrastructure from different vendors.

NVIDIA SuperPOD – A turnkey AI data center solution from NVIDIA, delivering fully integrated GPU clusters at massive scale. It combines DGX systems, high-speed networking, storage, and NVIDIA software into a complete package for enterprise and research workloads.

GPU (Graphics Processing Unit): A specialised processor originally designed for rendering graphics, now widely used in AI for its ability to perform thousands of parallel computations simultaneously. This architecture makes GPUs ideal for accelerating deep learning tasks such as training neural networks and running inference, dramatically reducing processing time compared to traditional CPUs.

DPU (Data Processing Unit): A specialised processor designed to offload and accelerate data-centric tasks such as networking, storage, and security from the CPU. DPUs typically combine programmable cores, high-speed network interfaces, and hardware accelerators to handle packet processing, encryption, and virtualisation, improving performance and freeing up CPU resources for application workloads (e.g., NVIDIA BlueField).

TPU (Tensor Processing Unit): A purpose-built chip from Google for AI and ML acceleration, offering significant speed and efficiency gains over GPU- or CPU-based systems, but not as flexible, as it is optimised for a narrower set of operations.

NPU (Neural Processing Unit): A specialised processor dedicated to accelerating neural-network workloads such as AI inference, increasingly found in phones, laptops, and edge devices, where it delivers far better performance per watt than a general-purpose CPU.

CPO (Co-Packaged Optics): Modules that use silicon photonics to integrate optical components directly alongside high-speed electronic chips, like switch ASICs or AI accelerators, within a single package.

Trainium: A custom AI accelerator ASIC (so even more specialised than a TPU) developed by Amazon Web Services (AWS) to deliver high-performance training and inference for machine learning models at lower power consumption and cost compared to general-purpose GPUs. Designed for deep learning workloads, Trainium offers optimised tensor operations and scalability for large-scale AI training in the cloud.

Smart NIC: An advanced network interface card that includes onboard processing capabilities, often using programmable CPUs or DPUs, to offload networking, security, and virtualisation tasks from the host CPU. This enables higher performance, lower latency, and improved scalability for data centre and cloud environments by handling functions such as packet processing, encryption, and traffic shaping directly on the NIC.

HBM (High Bandwidth Memory): Energy-efficient memory dedicated to a GPU, with a wide bus (up to 1024 bits) placed directly on the same package as the GPU, minimising latency.

SXM (Server PCI Module): A GPU module mounted directly on the motherboard, offering superior performance and scalability for large-scale projects like AI training.

PCIe (Peripheral Component Interconnect Express): A standard expansion card inserted into PCIe slots; provides flexibility and wider compatibility, making it suitable for smaller-scale tasks and cost-effective deployments.

CDU (Cooling Distribution Unit): A device used in liquid-cooled data centers to circulate coolant between the building's central cooling system and the liquid-cooled IT equipment. It regulates coolant temperature, pressure, and flow to ensure safe, efficient heat removal from high-density compute systems.

Think I've missed any important ones? Let me know in the comments.
