by mbzuai-oryx · Agent Tool · ★ 97
Indexed by AgentSkillsHub · Auto-synced every 8h
[CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
| Stars | 97 |
| Forks | 5 |
| Language | Python |
| Category | Agent Tool |
| Quality Score | 35.25/100 |
| Open Issues | 8 |
| Last Updated | 2025-04-14 |
| Created | 2024-10-31 |
| Platforms | python |
| Est. Tokens | ~2718k |
Looking for a VideoGLaMM alternative? If you're comparing VideoGLaMM with other agent tools, these 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.
Auto-Use Computer Use — drives your OS, browser, scours the web, writes your code. One agent, end to end.
Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups
LLM Agent that leverages cheminformatics tools to provide informed responses.
On-device AI for Android — LLM chat (GGUF/llama.cpp), vision models (VLM), image generation (Stable Diffusion)
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
This repository serves as a comprehensive knowledge hub, curating cutting-edge research papers and development
VideoGLaMM is a large multimodal model for pixel-level visual grounding in videos (CVPR 2025). It is categorized as an Agent Tool and has 97 GitHub stars.
VideoGLaMM is primarily written in Python. It covers topics such as cvpr2025, foundation-models, and llm-agent.
You can find installation instructions and usage details in the VideoGLaMM GitHub repository at github.com/mbzuai-oryx/VideoGLaMM. The project has 97 stars, 5 forks, and 8 open issues, and was last updated on 2025-04-14.
The top alternatives to VideoGLaMM on Agent Skills Hub include Auto-Use, PopupAttack, and cactus. Each takes a different approach to the same problem space; compare them side by side by stars, quality score, and community activity.