Browser Automations - Komos Documentation

Welcome! This guide gives you the fastest possible orientation to Komos tasks—our building block for reusable browser automations.

Browser tasks at a glance

What they are: A browser task is a reusable automation graph that runs either in the always-on Komos Cloud Sandbox or on a self-hosted runner (Windows or macOS host) that you manage. Each task maps the navigation, interaction, and data capture steps needed to complete a workflow.
How to use them: Pick an existing task from the dashboard or clone a template, then launch runs as needed. Every run executes asynchronously so you can keep working while the sandbox completes the flow.
Inputs: Tasks declare a typed input schema (strings, numbers, booleans, lists, and structured objects/object lists). Provide values when you trigger a run, whether through the UI, a schedule, or the API. When you omit a field, the default defined in the task builder is applied automatically. For object inputs, declare the expected fields so teammates and API callers can see the shape.
Outputs: When a run finishes it emits structured variables captured by EXTRACT_DATA, PROCESS_DATA, and PARSE_DOCUMENT nodes. Outputs can be scalars, lists, or structured objects with defined fields. Review them in the run detail view, export via the dashboard, or fetch them through the public API. The builder now flags undefined variable references while you edit so you can fix typos before a run.
Nested tasks: Add a Task node to call any published task as a child flow. Inputs map from the parent, outputs surface back up, and the node stays pinned to the child’s published version so runs stay deterministic.
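To make the input rules above concrete, here is a minimal Python sketch of how defaults and type checks could be applied before a run. The schema layout (a field name mapped to a "type" and an optional "default") and the function name resolve_inputs are assumptions for illustration only; this guide does not show how Komos stores or validates schemas internally.

```python
# Hypothetical schema layout: field name -> {"type": ..., "default": ...}.
# The type names mirror the ones listed in this guide; the checks are illustrative.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "list": lambda v: isinstance(v, list),
    "object": lambda v: isinstance(v, dict),
    "object_list": lambda v: isinstance(v, list)
    and all(isinstance(item, dict) for item in v),
}


def resolve_inputs(schema, provided):
    """Merge caller-provided values with schema defaults and type-check them."""
    unknown = sorted(set(provided) - set(schema))
    if unknown:
        raise ValueError(f"unknown inputs: {unknown}")

    resolved = {}
    for name, spec in schema.items():
        if name in provided:
            value = provided[name]
        elif "default" in spec:
            value = spec["default"]  # omitted field falls back to its default
        else:
            raise ValueError(f"missing required input: {name}")
        if not TYPE_CHECKS[spec["type"]](value):
            raise TypeError(f"input {name!r} is not a valid {spec['type']}")
        resolved[name] = value
    return resolved


schema = {
    "query": {"type": "string"},
    "max_pages": {"type": "number", "default": 3},
    "filters": {"type": "object", "default": {}},
}
print(resolve_inputs(schema, {"query": "invoices"}))
```

Checking inputs on the caller's side like this can surface a missing or mistyped field before a run is queued, but the task's own schema remains the source of truth.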

Train your task

Komos supports two complementary training modes so you can build the perfect automation style for your team:

Human-Narrated Training Demo: Walk through the workflow once while describing what should happen. Komos turns that narration into a runnable plan, then suggests the nodes to add. This is ideal for subject-matter experts who want to articulate intent without thinking about selectors.
Visual Builder (step-by-step): Record the exact clicks, waits, and extractions directly in the browser preview. The builder captures DOM selectors and lets you edit node inputs before saving the task definition.

You can switch between the modes at any time—use narration to sketch the flow, then refine specific steps in the builder.

Give the agent visual cues

Open a Browser Action, Fill Form, or Login node and expand Advanced → Visual References to add screenshots of the exact page state you expect. Tight, well-labeled crops make AI-driven steps more reliable.
You can upload your own images or keep/delete still frames Komos suggests from your narrated training sessions.
Visual cues travel with the task, so reruns and node-slice debugging both use the same references across environments.

Record a training demo from your desktop

Start in the Training step of the task wizard and choose Add training session. The in-app recorder launches in a new view so you can pick the window or browser tab to capture.
Use Chrome or Microsoft Edge 116+ to enable screen capture and optional tab audio. When the share picker appears, select the source and, if you want system sound, check Share tab audio.
Enable the narration toggle to mix your microphone with the shared audio. A live VU meter helps you confirm levels before you hit Start recording.
The session runs for up to five minutes. You’ll get a one-minute warning and recordings stop automatically at the limit. If MP4 isn’t supported on the device, we fall back to WebM and display a banner so you know which format will be saved.
While recording, Komos opens a Picture-in-Picture window with pause/resume and stop controls. If the browser blocks PiP, the page keeps the full control set so you can continue uninterrupted.
After you stop, review the inline playback, rename the file, download a copy if needed, and click Save to task. The upload runs with progress feedback and returns you to the wizard with a success toast once the training session is attached.

Guide the execution plan

Every time you click Generate the Plan, whether during the wizard or from an existing task, you’ll see an optional guidance prompt. Use it to highlight priorities, shortcuts, or anything you want the AI to emphasize. Komos combines that note with your training sessions, supporting documents, and current task definition to rebuild the execution graph. Your most recent guidance stays prefilled so you can tweak it between regenerations without retyping from scratch, and it never overwrites the main task description.

Existing tasks now have two AI modes:

Improve current plan (incremental): starts from the saved draft, keeps structure where possible, and applies your latest guidance plus any selected training sessions or documents. Best for quick edits.
Rebuild from scratch: ignores the current draft and creates a fresh plan using the same inputs/outputs and any sources you pick.

In the Task Details page, the Build with AI button opens a modal where you choose the mode, pick which training sessions and supporting documents to ground this generation, and add optional guidance. The creation wizard flow is unchanged; it still defaults to generating a plan from your Basics/Training inputs without showing the mode toggle.

Fine-tune your workflow

Visual editor: Open any task in the builder to add, reorder, or delete actions. Each card exposes the selectors, parameters, and descriptions the runner uses, so you can refine the automation without re-recording a session. Reach for the action palette to drop new interactions or data steps; consult the Node types reference to pick the right move.
API Request node: Call REST endpoints directly inside a flow. Set the method, URL, optional Bearer token, and JSON body with ${variable} interpolation; Komos parses the JSON response into a reusable variable (e.g., ${api_call.response}). Use Send test in the editor to try a sample request without leaving the builder.
Download the graph: The canvas controls now include Download JSON, which exports the current task definition (including unsaved edits) for offline review or version control.
Reusable tasks: Drop a published task into another automation with the new Task node. Map the child inputs to parent variables, choose which outputs to expose, and Komos runs the child as a nested flow pinned to its published version. You’ll see “Child task update available” badges when the child ships a new version and can refresh nodes on your schedule.
Structured inputs & outputs: Mark task inputs and Process Data outputs as object or object_list and list their fields. Downstream nodes, API callers, and docs then see the expected shape, and Komos can validate references for you while you edit.
Guidance prompts: Nudge the planner with natural language when you regenerate the execution plan. Prompts are great for high-level adjustments such as “prefer search results over the marketing page” or “loop through every invoice row before exporting.” Because prompts never overwrite your saved nodes, you can iterate safely and revert to the prior plan at any time.
Hybrid flow: Mix both approaches—record or prompt for the broad strokes, then use the editor to wire precise waits, selectors, or output mappings so variables stay canonical.
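The ${variable} interpolation used by the API Request node can be pictured with a short Python sketch. It is an illustration of the idea, not Komos's implementation: the function names, the dotted-path lookup for references such as ${api_call.response}, and the error raised for an undefined reference are all assumptions.

```python
import re

# Matches ${name} and dotted paths such as ${api_call.response.status}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][\w.]*)\}")


def lookup(path, variables):
    """Walk a dotted path through nested dicts; fail loudly on unknown names."""
    value = variables
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            raise KeyError(f"undefined variable reference: ${{{path}}}")
        value = value[key]
    return value


def interpolate(template, variables):
    """Replace every ${...} placeholder with the string form of its value."""
    return _PLACEHOLDER.sub(
        lambda match: str(lookup(match.group(1), variables)), template
    )


variables = {
    "invoice_id": 42,
    "api_call": {"response": {"status": "ok"}},
}
print(interpolate("https://example.test/invoices/${invoice_id}", variables))
print(interpolate("Status: ${api_call.response.status}", variables))
```

Raising on an unknown name mirrors the builder's habit of flagging undefined variable references while you edit, so a typo shows up early instead of being sent as an empty string.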

Debug a slice of nodes

Multi-select in the builder: Drag-select or shift-click a contiguous slice of nodes (including IF/ELSE or LOOP structures). The selection toolbar shows how many steps are included and surfaces Run selection, Duplicate, and Delete actions.
Pick a runner once: The builder header now mirrors Task Details, so your preferred managed sandbox or self-hosted runner stays in sync while you debug. Offline runners surface warnings before you launch a slice.
Node debug drawer: Launching Run selection opens a right-hand drawer that lists the ordered nodes, required canonical inputs (scalar/list/JSON), and a reminder that fresh sandbox sessions or self-hosted prep are still manual. Provide every requested input—Run slice stays disabled until they’re filled.
Run behavior: Submissions enqueue a dedicated node_debug run and deep-link straight to the Execution stream. Slice runs execute real side effects (emails, uploads, API calls) and do not auto-hydrate state, so copy any outputs you need from the run detail view when it finishes.
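One way to see why the drawer asks for inputs is to work out which variables a slice reads before any node inside the slice has produced them. The Python sketch below does that under stated assumptions: the node shape ("reads" and "writes" lists) and both function names are hypothetical, and the builder's real logic may differ.

```python
def required_slice_inputs(nodes):
    """Return variables the slice reads before any node in it writes them."""
    produced = set()
    required = []
    for node in nodes:  # nodes are assumed to be in execution order
        for name in node.get("reads", []):
            if name not in produced and name not in required:
                required.append(name)
        produced.update(node.get("writes", []))
    return required


def can_run_slice(nodes, provided):
    """Mimic a disabled Run slice button: every required input must be filled."""
    return all(
        provided.get(name) not in (None, "")
        for name in required_slice_inputs(nodes)
    )


slice_nodes = [
    {"id": "extract_rows", "reads": ["search_term"], "writes": ["rows"]},
    {"id": "sum_totals", "reads": ["rows", "currency"], "writes": ["total"]},
]
print(required_slice_inputs(slice_nodes))
print(can_run_slice(slice_nodes, {"search_term": "acme"}))
```

Here rows is not requested because the first node in the slice produces it, while search_term and currency come from outside the slice and must be supplied by hand.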

Choose your environment

Decide where each run executes before you launch it. The Komos Cloud Sandbox stays online for scheduled or API-triggered tasks, while the self-hosted runners (Windows or macOS) connect through Komos to reuse your domain-joined desktops for intranet portals or thick-client flows. Komos no longer ships or requires a local browser extension—these managed environments cover every run target. Review Environment for setup steps and guidance on when to pick each target.

Trigger a run

Choose the trigger that matches how your team works:

Manual launch: Kick off ad-hoc runs from the dashboard when you need a one-off execution.
Schedules: Define a cadence (hourly, daily, weekly) so Komos runs the task automatically.
API key: Issue an organization API key and call POST /public/v1/tasks/{taskId}/runs to launch tasks programmatically.
Follow-up via API: Poll GET /public/v1/task-runs/{runId} or GET /public/v1/task-runs/{runId}/logs to monitor progress, or subscribe to webhooks for real-time status updates.
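The API trigger and polling flow above could look roughly like the following Python sketch. Only the two paths come from this guide. The base URL, the Bearer authorization header, the {"inputs": ...} request body, and the status names are assumptions for illustration, so check the public API reference for the real contract.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.komos.example"  # placeholder host, not a real endpoint
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}  # assumed status names


def build_run_request(task_id, inputs, api_key):
    """Build (but do not send) the POST that launches a task run."""
    return urllib.request.Request(
        f"{BASE_URL}/public/v1/tasks/{task_id}/runs",
        data=json.dumps({"inputs": inputs}).encode("utf-8"),  # assumed body shape
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


def build_status_request(run_id, api_key):
    """Build the GET that reads a run's current state."""
    return urllib.request.Request(
        f"{BASE_URL}/public/v1/task-runs/{run_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_run(run_id, fetch_run, interval=5.0, max_polls=60, sleep=time.sleep):
    """Poll fetch_run(run_id) until the run reports a terminal status."""
    for _ in range(max_polls):
        run = fetch_run(run_id)
        if run.get("status") in TERMINAL_STATUSES:
            return run
        sleep(interval)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

A caller would pass something like lambda run_id: send(build_status_request(run_id, api_key)) as fetch_run. Because runs execute asynchronously, a bounded polling loop keeps a script from waiting forever; webhooks avoid polling altogether when you need real-time updates.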

Review past runs faster

The Task Details page now includes a Past runs tab that lists the latest executions (status, trigger, duration, runner, requester). Jump straight into any run’s live log stream or open the full Execution view for deeper debugging.

Use the rest of this documentation to dive deeper into training best practices and the public API.
