PC Desktop Getting Started
This guide walks you through everything required to automate PC desktop applications with Midscene: install dependencies, configure model credentials, and run your first JavaScript script.
- Control PC desktop with JavaScript: https://github.com/web-infra-dev/midscene-example/tree/main/computer/javascript-sdk-demo
- Integrate Vitest for testing: https://github.com/web-infra-dev/midscene-example/tree/main/computer/vitest-demo
- Control a remote Windows desktop over RDP: https://github.com/web-infra-dev/midscene-example/tree/main/computer/rdp-demo
- Test Obsidian (an Electron app) on headless Linux CI with @midscene/computer: https://github.com/web-infra-dev/midscene-example/tree/main/computer/electron-demo
Set up API keys for the model
Set your model configuration in environment variables. For details, see Model strategy and Model configuration.
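A fail-fast check can catch missing credentials before any model call is made. The sketch below is not part of Midscene; the variable names in `REQUIRED_MODEL_VARS` are assumptions based on the `MIDSCENE_MODEL_*` naming, so confirm the exact set for your provider in Model configuration.

```typescript
// Assumed variable names; confirm them in Model configuration for your provider.
const REQUIRED_MODEL_VARS: readonly string[] = [
  "MIDSCENE_MODEL_BASE_URL",
  "MIDSCENE_MODEL_API_KEY",
  "MIDSCENE_MODEL_NAME",
];

// Return the names that are unset or blank in the given environment.
function missingModelVars(
  env: Record<string, string | undefined>,
  names: readonly string[] = REQUIRED_MODEL_VARS,
): string[] {
  return names.filter((name) => (env[name] ?? "").trim() === "");
}

const missing = missingModelVars(process.env);
if (missing.length > 0) {
  // Warn rather than exit so the check can sit at the top of any script.
  console.warn(`Missing model configuration: ${missing.join(", ")}`);
}
```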
System Requirements
Node.js
Node.js 18.19.0 or higher is required.
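To confirm the runtime meets this requirement at the top of a script, a numeric version comparison is enough. This is a generic sketch that relies only on `process.versions.node`; `compareVersions` is a helper written for this guide, not a Midscene API.

```typescript
// Minimum Node.js version required by this guide.
const MIN_NODE_VERSION = "18.19.0";

// Compare dotted versions numerically: <0 if a < b, 0 if equal, >0 if a > b.
// parseInt tolerates suffixes such as "0-nightly"; missing parts count as 0.
function compareVersions(a: string, b: string): number {
  const pa = a.replace(/^v/, "").split(".").map((p) => parseInt(p, 10) || 0);
  const pb = b.replace(/^v/, "").split(".").map((p) => parseInt(p, 10) || 0);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

const nodeVersion = process.versions.node; // e.g. "20.11.1", no leading "v"
if (compareVersions(nodeVersion, MIN_NODE_VERSION) < 0) {
  console.error(`Node.js ${MIN_NODE_VERSION} or higher is required; found ${nodeVersion}.`);
  process.exitCode = 1;
}
```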
Platform-Specific Dependencies
macOS: Accessibility permissions are required for keyboard and mouse control. When you run the script for the first time, macOS will prompt you to grant access. Go to System Settings > Privacy & Security > Accessibility and enable permissions for the application running your script (e.g., Terminal, iTerm2, VS Code, WebStorm, or other IDEs). For more details, see nut.js macOS setup.
Windows: No extra setup is needed for ordinary apps. However, Windows isolates input across privilege levels (UIPI): a non-elevated process cannot send mouse or keyboard input to a window that runs as Administrator (elevated). The input is silently dropped — the cursor still moves to the right spot, but clicks and keystrokes have no effect. Prefer running the target application without Administrator privileges. If the target application must stay elevated, run the terminal or Node.js that launches Midscene as Administrator too, so both processes share the same privilege level. See Windows: clicks have no effect on some apps.
Linux: ImageMagick is required for screenshot functionality.
Headless Linux (CI): To run desktop automation on a headless Linux server (e.g., GitHub Actions), install Xvfb and its dependencies, then enable headless mode.
Xvfb creates a virtual display so that mouse, keyboard, and screenshot operations work without a physical monitor. See API Reference for details.
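Whether headless mode is needed can be guessed from the environment. The sketch below is a heuristic, not Midscene's own detection: it assumes desktop automation on Linux goes through an X server (which is what Xvfb provides), so an unset `DISPLAY` variable signals that no display is available. `needsHeadlessMode` is a hypothetical helper; the actual headless option is documented in API Reference.

```typescript
// Heuristic: on Linux, treat a missing X11 DISPLAY as "no physical or virtual display".
// Assumes the input/screenshot stack talks to an X server, which Xvfb provides.
function needsHeadlessMode(
  platform: string = process.platform,
  env: Record<string, string | undefined> = process.env,
): boolean {
  if (platform !== "linux") return false; // headless mode is only described for Linux here
  return !env.DISPLAY;
}

if (needsHeadlessMode()) {
  console.log("No DISPLAY found: install Xvfb and enable headless mode (see API Reference).");
}
```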
Try Playground (no code)
Playground is the fastest way to validate the connection and observe AI-driven steps without writing code. It shares the same core as @midscene/computer, so anything that works here will behave the same once scripted.
- Launch the Playground CLI:
- Click the gear icon in the Playground window, then paste your API key configuration. Refer back to Model configuration if you still need credentials.
Start experiencing
After configuration, you can start using Midscene right away. The Playground provides several operation tabs:
- Act: interact with the interface. This is Auto Planning, corresponding to aiAct.
- Query: extract JSON data from the interface, corresponding to aiQuery.
  Similar methods include aiBoolean(), aiNumber(), and aiString() for directly extracting booleans, numbers, and strings.
- Assert: understand the interface and check a condition; if the condition is not met, an error is thrown. This corresponds to aiAssert.
- Tap: click an element. This is Instant Action, corresponding to aiTap.
For the difference between Auto Planning and Instant Action, see the API document.
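In a script, the same tabs map onto agent methods. The sketch assumes an agent object exposing the methods named above; `DesktopAgentLike` is a hypothetical, simplified typing written for this guide (the SDK's own signatures may accept extra options), and the prompts are only illustrations.

```typescript
// Hypothetical minimal typing of the agent methods named in this guide.
interface DesktopAgentLike {
  aiAct(prompt: string): Promise<unknown>;
  aiQuery<T = unknown>(prompt: string): Promise<T>;
  aiBoolean(prompt: string): Promise<boolean>;
  aiNumber(prompt: string): Promise<number>;
  aiString(prompt: string): Promise<string>;
  aiAssert(assertion: string): Promise<unknown>;
  aiTap(locate: string): Promise<unknown>;
}

// One call per Playground tab, in the order Act, Query, Assert, Tap.
async function tourOfTabs(agent: DesktopAgentLike) {
  // Act (Auto Planning): the model plans and performs the steps itself.
  await agent.aiAct('type "hello" into the search box and press Enter');
  // Query: structured extraction; the prompt describes the expected shape.
  const titles = await agent.aiQuery<string[]>("string[], titles of the visible search results");
  // Typed shortcuts for single values.
  const hasResults = await agent.aiBoolean("Is at least one search result shown?");
  const resultCount = await agent.aiNumber("How many search results are shown?");
  const firstTitle = await agent.aiString("Title of the first search result");
  // Assert: rejects when the condition does not hold.
  await agent.aiAssert("The search results relate to 'hello'");
  // Tap (Instant Action): locate one element and click it.
  await agent.aiTap("the first search result");
  return { titles, hasResults, resultCount, firstTitle };
}
```

Because the function depends only on the interface, it can be exercised with a stub agent in unit tests and pointed at a real desktop agent later.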
Integration with Midscene Agent
Once Playground works, move to a repeatable script with the JavaScript SDK.
Step 1. Install dependencies
Step 2. Write your first script
Create example.ts:
Step 3. Run the script
Connect to a Remote Windows Desktop via RDP
@midscene/computer can also drive a remote Windows desktop directly over RDP through the dedicated agentForRDPComputer() factory.
Prerequisites
- A reachable Windows machine with RDP enabled.
- FreeRDP installed on the machine running your script.
Example
Common RDP Options
- host: Remote Windows host name or IP address.
- port: RDP port. Defaults to 3389.
- username / password: Account credentials for the remote session.
- domain: Optional Windows domain.
- ignoreCertificate: Skip certificate validation, e.g. for self-signed setups.
- desktopWidth / desktopHeight: Request a specific remote desktop resolution.
- adminSession: Request the remote admin session when the server allows it.
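Collecting the options above into a typed object keeps credentials out of source code. The sketch mirrors only the option names listed here; `RdpConnectOptions`, `rdpOptionsFromEnv`, and the `RDP_*` environment variable names are hypothetical, and the real parameter type of agentForRDPComputer() may differ.

```typescript
// Mirrors the option names listed above; not the SDK's declared type.
interface RdpConnectOptions {
  host: string;
  port?: number;
  username?: string;
  password?: string;
  domain?: string;
  ignoreCertificate?: boolean;
  desktopWidth?: number;
  desktopHeight?: number;
  adminSession?: boolean;
}

const DEFAULT_RDP_PORT = 3389; // default stated in this guide

// Build options from hypothetical RDP_* environment variables.
function rdpOptionsFromEnv(env: Record<string, string | undefined>): RdpConnectOptions {
  const host = env.RDP_HOST;
  if (!host) throw new Error("RDP_HOST is not set");
  const port = env.RDP_PORT ? Number(env.RDP_PORT) : DEFAULT_RDP_PORT;
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid RDP_PORT: ${env.RDP_PORT}`);
  }
  return {
    host,
    port,
    username: env.RDP_USERNAME,
    password: env.RDP_PASSWORD,
    domain: env.RDP_DOMAIN || undefined,
    // Certificate checks stay on unless explicitly disabled, e.g. for a self-signed lab machine.
    ignoreCertificate: env.RDP_IGNORE_CERTIFICATE === "true",
  };
}
```

The resulting object could then be passed to agentForRDPComputer(); check API Reference for its exact signature before relying on this shape.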
RDP sessions are exposed to Midscene as a single remote display. You can still use the same aiAct, aiQuery, aiAssert, and report features as in local desktop automation.
Multi-Display Support
If you have multiple displays, you can control a specific one:
Example Usage
Basic Mouse Operations
Keyboard Operations
Query Information
Complex Workflows
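A longer workflow typically chains Act, Query, and Assert, with a runtime check on the queried data because the model returns untyped JSON. The scenario, `WorkflowAgent`, `FileEntry`, and `archiveLargeFiles` are hypothetical; the sketch relies only on the method names documented above, and the `{name: string, sizeKb: number}[]` prompt prefix simply describes the desired shape to the model.

```typescript
// Hypothetical minimal typing of the agent methods used below.
interface WorkflowAgent {
  aiAct(prompt: string): Promise<unknown>;
  aiQuery<T = unknown>(prompt: string): Promise<T>;
  aiBoolean(prompt: string): Promise<boolean>;
  aiAssert(assertion: string): Promise<unknown>;
}

interface FileEntry {
  name: string;
  sizeKb: number;
}

// Model output is untyped JSON; verify its shape before acting on it.
function isFileEntryList(value: unknown): value is FileEntry[] {
  return (
    Array.isArray(value) &&
    value.every(
      (v) =>
        typeof v === "object" &&
        v !== null &&
        typeof (v as FileEntry).name === "string" &&
        typeof (v as FileEntry).sizeKb === "number",
    )
  );
}

// Illustrative workflow: move files above a size threshold into an Archive folder.
async function archiveLargeFiles(agent: WorkflowAgent, thresholdKb: number): Promise<string[]> {
  await agent.aiAct("open the file manager and go to the Downloads folder");
  if (!(await agent.aiBoolean("Is a folder named Archive visible?"))) {
    await agent.aiAct("create a folder named Archive");
  }
  const raw = await agent.aiQuery<unknown>(
    "{name: string, sizeKb: number}[], the files listed in the current folder",
  );
  if (!isFileEntryList(raw)) throw new Error("Unexpected shape returned by aiQuery");
  const large = raw.filter((f) => f.sizeKb > thresholdKb).map((f) => f.name);
  for (const name of large) {
    // One focused instruction per file keeps each planning step small.
    await agent.aiAct(`move the file "${name}" into the Archive folder`);
  }
  if (large.length > 0) {
    await agent.aiAssert("none of the moved files are still listed in Downloads");
  }
  return large;
}
```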
Environment Check
You can check if your system is properly configured:
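Independently of Midscene's own checks, a short script can confirm that the external tools this guide mentions are on `PATH`. The helpers below (`findOnPath`, `missingLinuxPrerequisites`) are hypothetical and cover only the Linux requirements; macOS Accessibility permission cannot be verified this way and still has to be granted in System Settings.

```typescript
import { accessSync, constants, statSync } from "node:fs";
import { delimiter, join } from "node:path";

// True if `p` is a regular file the current user may execute (X_OK acts like F_OK on Windows).
function isExecutableFile(p: string): boolean {
  try {
    accessSync(p, constants.X_OK);
    return statSync(p).isFile();
  } catch {
    return false;
  }
}

// Look up a command on PATH; on Windows also try PATHEXT extensions such as .EXE.
function findOnPath(command: string, envPath: string = process.env.PATH ?? ""): string | null {
  const exts =
    process.platform === "win32"
      ? ["", ...(process.env.PATHEXT ?? ".EXE;.CMD;.BAT").split(";")]
      : [""];
  for (const dir of envPath.split(delimiter)) {
    if (!dir) continue;
    for (const ext of exts) {
      const candidate = join(dir, command + ext);
      if (isExecutableFile(candidate)) return candidate;
    }
  }
  return null;
}

// Report missing Linux prerequisites named in this guide.
function missingLinuxPrerequisites(env: Record<string, string | undefined> = process.env): string[] {
  const missing: string[] = [];
  // ImageMagick 7 installs `magick`; ImageMagick 6 installs `convert`.
  if (!findOnPath("magick") && !findOnPath("convert")) {
    missing.push("ImageMagick (required for screenshots)");
  }
  if (!env.DISPLAY && !findOnPath("Xvfb")) {
    missing.push("Xvfb (required when no display is available)");
  }
  return missing;
}

if (process.platform === "linux") {
  const report = missingLinuxPrerequisites();
  console.log(report.length ? `Missing: ${report.join("; ")}` : "Linux prerequisites found.");
}
```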
FAQ
macOS: Script cannot control mouse or keyboard
macOS requires Accessibility permissions for keyboard and mouse control. Go to System Settings > Privacy & Security > Accessibility and enable the toggle for the application running your script (e.g., Terminal, iTerm2, VS Code, or WebStorm).
If you have already granted permission but it still doesn't work, try removing the app from the Accessibility list and re-adding it — macOS sometimes caches stale permissions.
Windows: clicks have no effect on some apps
If the cursor moves to the correct position but clicks or key presses do nothing on a particular application — while other apps work fine — check whether the target app is running as Administrator (elevated). Windows UIPI blocks input injected from a lower-privilege process into an elevated window and drops it silently, with no error.
Prefer lowering the target application's privilege level first, for example by launching it without "Run as Administrator" or disabling any setting that always starts it elevated. If the target app must stay elevated, run the terminal or Node.js that launches Midscene as Administrator so it matches the target app's privilege level, then try again. System-level shortcuts such as Win+Tab are handled by the shell and keep working even when this happens, which is why keyboard shortcuts may appear to work while in-app clicks do not.
The health check logged at connection time prints this troubleshooting link when Midscene is not running as Administrator on Windows.
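To see which side of the privilege boundary your own process is on, a widely used heuristic is to run `net session`, which fails unless the caller is elevated. This sketch is not Midscene's health check; `isElevatedOnWindows` is a hypothetical helper, and it can report a false negative if the Windows Server service is stopped. For the target app, Task Manager's Details tab can display an Elevated column.

```typescript
import { execFileSync } from "node:child_process";

// Heuristic: `net session` succeeds only in an elevated (Administrator) process.
// It also fails when the Windows "Server" service is stopped, which yields a false "not elevated".
// Returns null on non-Windows platforms, where UIPI does not apply.
function isElevatedOnWindows(): boolean | null {
  if (process.platform !== "win32") return null;
  try {
    execFileSync("net", ["session"], { stdio: "ignore" });
    return true;
  } catch {
    return false;
  }
}

const elevated = isElevatedOnWindows();
if (elevated === false) {
  console.warn("Not elevated: input sent to Administrator windows will be dropped by UIPI.");
}
```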
Linux: Screenshots or interactions fail on a headless server
A headless Linux environment (e.g., CI) has no physical display. Install Xvfb and ImageMagick, then enable headless mode, either through the agent options or by setting the corresponding environment variable (see API Reference).

