by SALT-NLP · Agent Tool · ★ 51
Last updated: · Indexed by AgentSkillsHub · Auto-synced every 8h
Attacking Vision-Language Computer Agents via Pop-ups Yanzhe Zhang, Tao Yu, Diyi Yang Overview Autonomous agents powered by large vision and language models (VLM) have demonstrated significant potential in completing daily computer tasks, such as browsing the web to book travel and operating desktop software, which requires agents to understand these interfaces. Despite such visual inputs becoming more integrated into agentic applications, what types of risks and attacks exist around them still remain unclear. In this work, we demonstrate that VLM agents can be easily attacked by a set of carefully designed adversarial pop-ups, which human users would typically recognize and ignore. This distraction leads agents to click these pop-ups instead of performing the tasks as usual. Integrating these pop-ups into existing agent testing environments like OSWorld and VisualWebArena leads to an attack success rate (the frequency of the agent clicking the pop-ups) of 86% on average and decreases the task success rate by 47%. Basic defense techniques such as asking the agent to ignore pop-ups or including an advertisement notice, are ineffective against the attack.
| Stars | 51 |
| Forks | 3 |
| Language | Python |
| Category | Agent Tool |
| Quality Score | 35.25/100 |
| Open Issues | 1 |
| Last Updated | 2024-12-23 |
| Created | 2024-11-04 |
| Platforms | claude-code, python |
| Est. Tokens | ~13278k |
Looking for a PopupAttack alternative? If you're comparing PopupAttack with other agent tool tools, these 6 projects are the closest alternatives on Agent Skills Hub — ranked by topic overlap, star count, and community traction.
Auto-Use Computer Use — drives your OS, browser, scours the web, writes your code. One agent, end to end.
A fully-featured, GUI-powered local LLM Agent sandbox with complete MCP protocol support. Features both CLI
[CVPR 2025 🔥]A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
Bilingual (中文+EN) ML / LLM / diffusion / agent interview cheat sheets for AI 秋招 — generated by ARIS /interview
AI controls your OS. OS AI Computer Use, OS and API agnostic. For now on OpenAI and Anthropic API. Desktop app
A curated list of awesome resources, tools, research papers, and projects related to the concept of Large Lang
Explore other popular agent tool tools:
PopupAttack is Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups. It is categorized as a Agent Tool with 51 GitHub stars.
PopupAttack is primarily written in Python. It covers topics such as attack, claude-3-5-sonnet, computer-use.
You can find installation instructions and usage details in the PopupAttack GitHub repository at github.com/SALT-NLP/PopupAttack. The project has 51 stars and 3 forks, indicating an active community.
The top alternatives to PopupAttack on Agent Skills Hub include Auto-Use, EdgeBox, VideoGLaMM. Each offers a different approach to the same problem space — compare them side-by-side by stars, quality score, and community activity.