VideoGLaMM — Agent Tool by mbzuai-oryx

Last updated: 2025-04-14 · Indexed by AgentSkillsHub · Auto-synced every 8h

About VideoGLaMM

VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos [CVPR 2025🔥]

Authors: Shehan Munasinghe, Hanan Gani, Wenqi Zhu, Jiale Cao, Eric Xing, Fahad Shahbaz Khan, Salman Khan

Affiliations: Mohamed bin Zayed University of Artificial Intelligence, Tianjin University, Linköping University, Australian National University, Carnegie Mellon University

📢 Latest Updates

Feb-2025: VideoGLaMM is accepted at CVPR 2025! 🎊🎊

Overview

VideoGLaMM is a large video multimodal model capable of pixel-level visual grounding. The model responds to natural language queries from the user and intertwines spatio-temporal object masks with its generated textual responses to provide a detailed understanding of video content.

cvpr2025 foundation-models llm-agent lmm vision-and-language vision-language-model

Quick Facts

Stars	97
Forks	5
Language	Python
Category	Agent Tool
Quality Score	35.25/100
Open Issues	8
Last Updated	2025-04-14
Created	2024-10-31
Platforms	python
Est. Tokens	~2718k

VideoGLaMM alternative? Top 6 similar tools

If you're comparing VideoGLaMM with other agent tools, these 6 projects are the closest alternatives on Agent Skills Hub, ranked by topic overlap, star count, and community traction.

Auto-Use by auto-use · ⭐ 117
Auto-Use Computer Use — drives your OS, browser, scours the web, writes your code. One agent, end to end.
PopupAttack by SALT-NLP · ⭐ 51
Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups
cactus by pnnl · ⭐ 50
LLM Agent that leverages cheminformatics tools to provide informed responses.
atlas-mcp-server by cyanheads · ⭐ 467
A Model Context Protocol (MCP) server for ATLAS, a Neo4j-powered task management system for LLM Agents - imple
edsl by expectedparrot · ⭐ 466
Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market
ToolNeuron by Siddhesh2377 · ⭐ 397
On-device AI for Android — LLM chat (GGUF/llama.cpp), vision models (VLM), image generation (Stable Diffusion)

Popular Python Agent Tools

TrendRadar ⭐ 59.7k · MCP Server
gpt-researcher ⭐ 27.4k · MCP Server
Scrapling ⭐ 64.6k · MCP Server
serena ⭐ 25.5k · MCP Server
MaxKB ⭐ 21.4k · MCP Server

Frequently Asked Questions

What is VideoGLaMM?

VideoGLaMM is a large multimodal model for pixel-level visual grounding in videos, accepted at CVPR 2025. It is categorized as an Agent Tool and has 97 GitHub stars.

What programming language is VideoGLaMM written in?

VideoGLaMM is primarily written in Python. It covers topics such as cvpr2025, foundation-models, llm-agent.

How do I install or use VideoGLaMM?

You can find installation instructions and usage details in the VideoGLaMM GitHub repository at github.com/mbzuai-oryx/VideoGLaMM. The project has 97 stars, 5 forks, and 8 open issues.

What are the best alternatives to VideoGLaMM?

The top alternatives to VideoGLaMM on Agent Skills Hub include Auto-Use, PopupAttack, and cactus. Each takes a different approach; compare them side by side by stars, quality score, and community activity.
