• SemiAnalysis InferenceMAX: vLLM and NVIDIA Accelerate Blackwell Inference
    Oct 9, 2025

  • DeepSeek-V3.2-Exp in vLLM: Fine-Grained Sparse Attention in Action
    Sep 29, 2025

  • The First vLLM Meetup in Korea
    Sep 16, 2025

  • vLLM Semantic Router: Next Phase in LLM inference
    Sep 11, 2025

  • vLLM Now Supports Qwen3-Next: Hybrid Architecture with Extreme Efficiency
    Sep 11, 2025

  • Serving Geospatial, Vision, and Beyond: Enabling Multimodal Output Processing in vLLM
    Sep 5, 2025

  • Inside vLLM: Anatomy of a High-Throughput LLM Inference System
    Sep 5, 2025

  • Introduction to torch.compile and How It Works with vLLM
    Aug 20, 2025

  • GLM-4.5 Meets vLLM: Built for Intelligent Agents
    Aug 19, 2025

  • CUDA Core Dump: An Effective Tool to Debug Memory Access Issues and Beyond
    Aug 11, 2025

  • vLLM Now Supports gpt-oss
    Aug 5, 2025

  • MiniMax-M1 Hybrid Architecture Meets vLLM: Long Context, Fast Inference
    Jun 30, 2025

  • Introducing vLLM Hardware Plugin, Best Practice from Ascend NPU
    May 12, 2025

  • Accelerating RLHF with vLLM, Best Practice from OpenRLHF
    Apr 23, 2025

  • Transformers backend integration in vLLM
    Apr 11, 2025

  • Llama 4 in vLLM
    Apr 5, 2025

  • PTPC-FP8: Boosting vLLM Performance on AMD ROCm
    Feb 24, 2025

  • Introducing AIBrix: A Scalable, Cost-Effective Control Plane for vLLM
    Feb 21, 2025

  • Distributed Inference with vLLM
    Feb 17, 2025

  • vLLM V1: A Major Upgrade to vLLM's Core Architecture
    Jan 27, 2025

  • Introducing vLLM Inference Provider in Llama Stack
    Jan 27, 2025

  • High Performance and Easy Deployment of vLLM in K8S with vLLM production-stack
    Jan 21, 2025

  • Structured Decoding in vLLM: a gentle introduction
    Jan 14, 2025

  • vLLM 2024 Retrospective and 2025 Vision
    Jan 10, 2025

  • Installing and Developing vLLM with Ease
    Jan 10, 2025

  • Serving LLMs on AMD MI300X: Best Practices
    Oct 23, 2024

  • How Speculative Decoding Boosts vLLM Performance by up to 2.8x
    Oct 17, 2024

  • vLLM v0.6.0: 2.7x Throughput Improvement and 5x Latency Reduction
    Sep 5, 2024

  • vLLM’s Open Governance and Performance Roadmap
    Jul 25, 2024

  • Announcing Llama 3.1 Support in vLLM
    Jul 23, 2024

  • Notes on vLLM v.s. DeepSpeed-FastGen
    Nov 14, 2023

  • vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention
    Jun 20, 2023
