1 open source AI project tagged with cuda
Python
vLLM is a high-throughput LLM serving engine using PagedAttention to optimize GPU memory. Deploy Llama and DeepSeek models via an OpenAI-compatible API today.