by waybarrios · MCP Server · ★ 1.1k
Indexed by AgentSkillsHub · Auto-synced every 8h
vLLM-MLX: vLLM-like inference for Apple Silicon, with GPU-accelerated text, image, video, and audio on Mac.

**Overview.** vllm-mlx brings native Apple Silicon GPU acceleration to vLLM by integrating:

- MLX: Apple's ML framework with unified memory and Metal kernels
- mlx-lm: optimized LLM inference with KV cache and quantization
- mlx-vlm: vision-language models for multimodal inference
- mlx-audio: speech-to-text and text-to-speech with native voices
- mlx-embeddings: text embeddings for semantic search and RAG

**Features.**

- Multimodal: text, image, video, and audio in one platform
- Native GPU acceleration on Apple Silicon (M1, M2, M3, M4)
- Native TTS voices: Spanish, French, Chinese, and Japanese, plus five more languages
- OpenAI API compatible: a drop-in replacement for the OpenAI client
- Anthropic Messages API: native `/v1/messages` endpoint
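Because the server advertises OpenAI API compatibility, the standard OpenAI Python client should work against it unchanged. The following is a minimal sketch under stated assumptions: the local port (8000), the placeholder API key, and the model id are illustrative guesses, not values confirmed by the project's documentation.

```python
# Minimal sketch: querying a locally running vllm-mlx server through the
# official OpenAI Python client. The base_url, port, and model id below
# are assumptions for illustration, not taken from the project docs.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local endpoint
    api_key="not-needed",                 # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="mlx-community/Qwen2.5-7B-Instruct-4bit",  # hypothetical MLX model id
    messages=[{"role": "user", "content": "Explain MLX in one sentence."}],
)
print(response.choices[0].message.content)
```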
| Field | Value |
| --- | --- |
| Stars | 1,075 |
| Forks | 154 |
| Language | Python |
| Category | MCP Server |
| License | Apache-2.0 |
| Quality Score | 52.57 / 100 |
| Open Issues | 41 |
| Last Updated | 2026-05-02 |
| Created | 2025-12-06 |
| Platforms | claude-code, mcp, python |
| Est. Tokens | ~782k |
Looking for a vllm-mlx alternative? If you're comparing vllm-mlx with other MCP server tools, these six projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction:
- Run Claude Code 100% on-device with local AI on Apple Silicon. MLX-native Anthropic-API server, 65 tok/s Qwen
- Supercharge Claude Code with 11 AI agents, 36 commands & 15 skills — the claude-code plugin framework inspired
- The self-hosted AI gateway for production RAG across LLMs, databases, APIs, and files.
- Own your AI. The native macOS harness for AI agents -- any model, persistent memory, autonomous execution, cry
- Voice-to-text dictation app with local (Nvidia Parakeet/Whisper) and cloud models (BYOK). Privacy-first and av
- Communicate with an LLM provider using a single interface
vllm-mlx is an OpenAI- and Anthropic-compatible inference server for Apple Silicon. It runs LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support on a native MLX backend. It is categorized as an MCP Server with 1.1k GitHub stars.
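As a sketch of the multimodal path, the OpenAI client's standard vision message format can carry an image alongside text in a single chat request. Again, the endpoint, model id, and file name here are assumptions for illustration, not values confirmed by the project.

```python
# Minimal multimodal sketch: sending an image to an assumed vision-language
# model through the OpenAI-compatible chat endpoint. All identifiers here
# (port, model id, file name) are hypothetical.
import base64

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Encode a local image as a base64 data URL, the portable way to inline
# image bytes in an OpenAI-style chat request.
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="mlx-community/Qwen2-VL-7B-Instruct-4bit",  # hypothetical model id
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
            },
        ],
    }],
)
print(response.choices[0].message.content)
```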
vllm-mlx is primarily written in Python. It covers topics such as anthropic, apple-silicon, and audio-processing.
You can find installation instructions and usage details in the vllm-mlx GitHub repository at github.com/waybarrios/vllm-mlx. The project has 1.1k stars and 154 forks, indicating an active community.
vllm-mlx is released under the Apache-2.0 license, making it free to use and modify according to the license terms.
The top alternatives to vllm-mlx on Agent Skills Hub include claude-code-local, claude-forge, and orbit. Each offers a different approach to the same problem space; compare them side by side by stars, quality score, and community activity.