Incubation

vLLM is an open-source library for fast LLM inference and serving. It is built around PagedAttention, an attention algorithm that efficiently manages the memory used for attention keys and values. With PagedAttention, vLLM sets a new state of the art in LLM serving: it delivers up to 24x higher throughput than HuggingFace Transformers, without requiring any changes to the model architecture.
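The idea of managing attention keys and values in pages can be illustrated with a small toy model. The sketch below is not vLLM code: the class ToyBlockAllocator, the constant BLOCK_SIZE, and the method names are hypothetical, chosen only to show how a per-sequence block table can map a growing sequence onto fixed-size cache blocks that need not be contiguous.

```python
from __future__ import annotations

from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per cache block; illustrative value, not a vLLM default


@dataclass
class ToyBlockAllocator:
    """Hypothetical allocator that hands out fixed-size key/value cache blocks."""

    num_blocks: int
    free: list[int] = field(default_factory=list)
    tables: dict[str, list[int]] = field(default_factory=dict)  # block table per sequence
    lengths: dict[str, int] = field(default_factory=dict)       # tokens stored per sequence

    def __post_init__(self) -> None:
        # Initially every physical block is free.
        self.free = list(range(self.num_blocks))

    def append_token(self, seq_id: str) -> tuple[int, int]:
        """Reserve a slot for one more token; return (physical block, offset)."""
        n = self.lengths.get(seq_id, 0)
        table = self.tables.setdefault(seq_id, [])
        if n % BLOCK_SIZE == 0:
            # The last block is full (or the sequence is new): take any free block.
            if not self.free:
                raise MemoryError("no free cache blocks")
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1
        return table[n // BLOCK_SIZE], n % BLOCK_SIZE

    def release(self, seq_id: str) -> None:
        """Return all blocks of a finished sequence to the free pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


# Two sequences share one pool of eight blocks.
alloc = ToyBlockAllocator(num_blocks=8)
for _ in range(5):
    alloc.append_token("a")  # 5 tokens need 2 blocks
for _ in range(3):
    alloc.append_token("b")  # 3 tokens need 1 block
```

Because blocks are allocated only when a sequence actually grows into them, at most one partially filled block per sequence is wasted in this toy model, and blocks freed by a finished sequence can be reused immediately by others.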

GitHub: https://github.com/vllm-project

Contribution Policy: https://github.com/vllm-project/vllm/blob/57b7be0e1c4e594c58a78297ab65fbb3ec206958/CONTRIBUTING.md#L4

License: Apache 2.0

Requirements Doc: https://github.com/vllm-project/vllm/blob/main/docs/requirements-docs.txt

Maintainers:

Antoni Baum
Cade Daniel
Cody Yu
Cyrus Leung
Kaichao You
Kevin Luu
Lily Liu
Michael Goin
Nick Hill
Philipp Moritz
Robert Shaw
Roger Wang
Roy Lu
SangBin Cho
Simon Mo
Woosuk Kwon
Zhuohan Li

Reference Information

vLLM Home
