Welcome! This guide gives you the fastest possible orientation to Komos tasks, our building block for
reusable browser automations.
Browser tasks at a glance
- What they are: A browser task is a reusable automation graph that can run in the always-on Komos
Cloud Sandbox or on a self-hosted runner (a Windows or macOS host that you manage). Each task maps the navigation, interaction, and data
capture steps needed to complete a workflow.
- How to use them: Pick an existing task from the dashboard or clone a template, then launch runs
as needed. Every run executes asynchronously so you can keep working while the sandbox completes
the flow.
- Inputs: Tasks declare a typed input schema (strings, numbers, booleans, lists, and
structured objects/object lists). Provide values when you trigger a run—either via the UI, a
schedule, or the API. Defaults defined in the task builder are applied automatically when you omit
a field; for object inputs, add the expected fields so teammates and API callers see the shape.
- Outputs: When a run finishes, it emits structured variables captured by EXTRACT_DATA,
PROCESS_DATA, and PARSE_DOCUMENT nodes. Outputs can be scalars, lists, or structured objects
with defined fields. Review them in the run detail view, export via the dashboard, or fetch them
through the public API. The builder now flags undefined variable references while you edit so you
can fix typos before a run.
- Nested tasks: Add a Task node to call any published task as a child flow. Inputs map from
the parent, outputs surface back up, and the node stays pinned to the child’s published version so
runs stay deterministic.
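To make the input rules above concrete, here is a minimal Python sketch of how defaults could be merged into a run's inputs and how values could be checked against declared types. The schema layout, the field names, and the `resolve_inputs` helper are all hypothetical illustrations; they are not the format Komos stores or an API it exposes.

```python
import copy
from typing import Any

# Hypothetical schema for illustration only; the real task-builder format may differ.
INPUT_SCHEMA: dict[str, dict[str, Any]] = {
    "vendor_name": {"type": "string"},  # no default, so callers must supply it
    "max_invoices": {"type": "number", "default": 25},
    "include_paid": {"type": "boolean", "default": False},
    "contacts": {"type": "object_list", "fields": ["name", "email"], "default": []},
}

# Mapping from the input types named in this guide to Python types.
_PY_TYPES: dict[str, Any] = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "list": list,
    "object": dict,
    "object_list": list,
}


def resolve_inputs(schema: dict[str, dict[str, Any]], provided: dict[str, Any]) -> dict[str, Any]:
    """Fill omitted fields from defaults and check each value's basic type."""
    unknown = set(provided) - set(schema)
    if unknown:
        raise ValueError(f"unknown inputs: {sorted(unknown)}")

    resolved: dict[str, Any] = {}
    for name, spec in schema.items():
        if name in provided:
            value = provided[name]
        elif "default" in spec:
            # Copy so callers cannot mutate the default stored in the schema.
            value = copy.deepcopy(spec["default"])
        else:
            raise ValueError(f"missing required input: {name}")

        # bool is a subclass of int in Python, so reject it explicitly for numbers.
        if spec["type"] == "number" and isinstance(value, bool):
            raise TypeError(f"{name} must be a number")
        if not isinstance(value, _PY_TYPES[spec["type"]]):
            raise TypeError(f"{name} must be of type {spec['type']}")
        resolved[name] = value
    return resolved
```

Calling `resolve_inputs(INPUT_SCHEMA, {"vendor_name": "Acme"})` fills the three omitted fields from their defaults, while omitting `vendor_name` raises an error because the sketch treats fields without a default as required.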
Train your task
Komos supports two complementary training modes so you can build the perfect automation style for
your team:
- Human-Narrated Training Demo: Walk through the workflow once while describing what should
happen. Komos turns that narration into a runnable plan, then suggests the nodes to add. This is
ideal for subject-matter experts who want to articulate intent without thinking about selectors.
- Visual Builder (step-by-step): Record the exact clicks, waits, and extractions directly in the
browser preview. The builder captures DOM selectors and lets you edit node inputs before saving the
task definition.
You can switch between the modes at any time—use narration to sketch the flow, then refine specific
steps in the builder.
Give the agent visual cues
- Open a Browser Action, Fill Form, or Login node and expand Advanced → Visual References to
add screenshots of the exact page state you expect. Tight, well-labeled crops make AI-driven steps
more reliable.
- You can upload your own images or keep/delete still frames Komos suggests from your narrated
training sessions.
- Visual cues travel with the task, so reruns and node-slice debugging both use the same references
across environments.
Record a training demo from your desktop
- Start in the Training step of the task wizard and choose Add training session. The in-app
recorder launches in a new view so you can pick the window or browser tab to capture.
- Use Chrome or Microsoft Edge 116+ to enable screen capture and optional tab audio. When the share
picker appears, select the source and, if you want system sound, check Share tab audio.
- Enable the narration toggle to mix your microphone with the shared audio. A live VU meter helps
you confirm levels before you hit Start recording.
- The session runs for up to five minutes. You’ll get a one-minute warning and
recordings stop automatically at the limit. If MP4 isn’t supported on the device, we fall back to
WebM and display a banner so you know which format will be saved.
- While recording, Komos opens a Picture-in-Picture window with pause/resume and stop controls. If
the browser blocks PiP, the page keeps the full control set so you can continue uninterrupted.
- After you stop, review the inline playback, rename the file, download a copy if needed, and click
Save to task. The upload runs with progress feedback and returns you to the wizard with a
success toast once the training session is attached.
Guide the execution plan
Every time you click Generate the Plan—either during the wizard or from an existing task—you’ll see
an optional guidance prompt. Use it to highlight priorities, shortcuts, or anything you want the AI to
emphasize. Komos combines that note with your training sessions, supporting documents, and current task
definition to rebuild the execution graph. Your most recent guidance stays prefilled so you can tweak it
between regenerations without retyping from scratch, and it never overwrites the main task description.
Existing tasks now have two AI modes:
- Improve current plan (incremental): starts from the saved draft, keeps structure where possible, and applies your latest guidance plus any selected training sessions or documents. Best for quick edits.
- Rebuild from scratch: ignores the current draft and creates a fresh plan using the same inputs/outputs and any sources you pick.
In the Task Details page, the Build with AI button opens a modal where you choose the mode, pick which training sessions and supporting documents to ground this generation, and add optional guidance. The creation wizard flow is unchanged; it still defaults to generating a plan from your Basics/Training inputs without showing the mode toggle.
Fine-tune your workflow
- Visual editor: Open any task in the builder to add, reorder, or delete actions. Each card exposes
the selectors, parameters, and descriptions the runner uses, so you can refine the automation without
re-recording a session. Reach for the action palette to drop new interactions or data steps; consult
the Node types reference to pick the right move.
- API Request node: Call REST endpoints directly inside a flow. Set the method, URL, optional
Bearer token, and JSON body with ${variable} interpolation; Komos parses the JSON response into a
reusable variable (e.g., ${api_call.response}). Use Send test in the editor to try a sample
request without leaving the builder.
- Download the graph: The canvas controls now include Download JSON, which exports the current
task definition (including unsaved edits) for offline review or version control.
- Reusable tasks: Drop a published task into another automation with the new Task node. Map the
child inputs to parent variables, choose which outputs to expose, and Komos runs the child as a nested
flow pinned to its published version. You’ll see “Child task update available” badges when the child ships a new
version and can refresh nodes on your schedule.
- Structured inputs & outputs: Mark task inputs and Process Data outputs as object or
object_list and list their fields. Downstream nodes, API callers, and docs then see the expected
shape, and Komos can validate references for you while you edit.
- Guidance prompts: Nudge the planner with natural language when you regenerate the execution plan.
Prompts are great for high-level adjustments such as “prefer search results over the marketing page”
or “loop through every invoice row before exporting.” Because prompts never overwrite your saved
nodes, you can iterate safely and revert to the prior plan at any time.
- Hybrid flow: Mix both approaches—record or prompt for the broad strokes, then use the editor to
wire precise waits, selectors, or output mappings so variables stay canonical.
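The `${variable}` interpolation and undefined-reference checks mentioned above can be pictured with a short Python sketch. This is an assumption-laden illustration, not Komos's implementation: the dotted-path lookup, the `interpolate` and `find_undefined` helpers, and the sample variable names are all hypothetical.

```python
import json
import re
from typing import Any

# Matches ${name} or ${name.nested.path}; the exact syntax Komos accepts is an assumption.
_VAR = re.compile(r"\$\{([A-Za-z_][\w.]*)\}")


def lookup(path: str, variables: dict[str, Any]) -> Any:
    """Resolve a dotted path such as 'api_call.response' against nested dicts."""
    current: Any = variables
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            raise KeyError(path)
        current = current[part]
    return current


def find_undefined(template: str, variables: dict[str, Any]) -> list[str]:
    """List referenced variables that do not exist, in order of first appearance."""
    missing: list[str] = []
    for match in _VAR.finditer(template):
        path = match.group(1)
        try:
            lookup(path, variables)
        except KeyError:
            if path not in missing:
                missing.append(path)
    return missing


def interpolate(template: str, variables: dict[str, Any]) -> str:
    """Substitute every ${...} reference; non-string values are JSON-encoded."""
    def replace(match: re.Match[str]) -> str:
        value = lookup(match.group(1), variables)
        # Strings are inserted as-is (not escaped), so the template supplies the quotes.
        return value if isinstance(value, str) else json.dumps(value)

    return _VAR.sub(replace, template)
```

For example, with `variables = {"invoice": {"id": "INV-7", "total": 129.5}}`, the template `{"invoice_id": "${invoice.id}", "total": ${invoice.total}}` interpolates to a valid JSON body, and `find_undefined` reports a misspelled reference such as `${invocie.total}` before anything is sent.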
Debug a slice of nodes
- Multi-select in the builder: Drag-select or shift-click a contiguous slice of nodes (including IF/ELSE or LOOP structures). The selection toolbar shows how many steps are included and surfaces Run selection, Duplicate, and Delete actions.
- Pick a runner once: The builder header now mirrors Task Details, so your preferred managed sandbox or self-hosted runner stays in sync while you debug. Offline runners surface warnings before you launch a slice.
- Node debug drawer: Launching Run selection opens a right-hand drawer that lists the ordered nodes, required canonical inputs (scalar/list/JSON), and a reminder that fresh sandbox sessions or self-hosted prep are still manual. Provide every requested input; Run slice stays disabled until they’re filled.
- Run behavior: Submissions enqueue a dedicated node_debug run and deep-link straight to the Execution stream. Slice runs execute real side effects (emails, uploads, API calls) and do not auto-hydrate state, so copy any outputs you need from the run detail view when it finishes.
Choose your environment
Decide where each run executes before you launch it. The Komos Cloud Sandbox stays online for
scheduled or API-triggered tasks, while the self-hosted runners (Windows or macOS) connect through Komos to reuse your
domain-joined desktops for intranet portals or thick-client flows. Komos no longer ships or requires a
local browser extension; these managed environments cover every run target. Review the
Environment page for setup steps and guidance on when to pick each target.
Trigger a run
Choose the trigger that matches how your team works:
- Manual launch: Kick off ad-hoc runs from the dashboard when you need a one-off execution.
- Schedules: Define a cadence (hourly, daily, weekly) so Komos runs the task automatically.
- API key: Issue an organization API key and call POST /public/v1/tasks/{taskId}/runs to launch
tasks programmatically.
- Follow-up via API: Poll GET /public/v1/task-runs/{runId} or GET /public/v1/task-runs/{runId}/logs
to monitor progress, or subscribe to webhooks for real-time status updates.
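The API trigger and follow-up calls above might be wired together roughly as in the Python sketch below. Only the two endpoint paths come from this guide. The base URL, the Bearer authorization header, the `{"inputs": ...}` request body, the `status` field, and the terminal status names are assumptions made for illustration, so check the public API reference for the real contract.

```python
import json
import time
import urllib.request
from typing import Any, Callable

BASE_URL = "https://api.example.com"  # placeholder; substitute your Komos API host
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}  # assumed status names


def _headers(api_key: str) -> dict[str, str]:
    # Assumption: the organization API key is sent as a Bearer token.
    return {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}


def build_run_request(task_id: str, inputs: dict[str, Any], api_key: str) -> urllib.request.Request:
    """Build POST /public/v1/tasks/{taskId}/runs; the body shape is an assumption."""
    return urllib.request.Request(
        url=f"{BASE_URL}/public/v1/tasks/{task_id}/runs",
        data=json.dumps({"inputs": inputs}).encode("utf-8"),
        headers=_headers(api_key),
        method="POST",
    )


def build_status_request(run_id: str, api_key: str) -> urllib.request.Request:
    """Build GET /public/v1/task-runs/{runId}."""
    return urllib.request.Request(
        url=f"{BASE_URL}/public/v1/task-runs/{run_id}",
        headers=_headers(api_key),
        method="GET",
    )


def send_json(request: urllib.request.Request) -> dict[str, Any]:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_run(
    run_id: str,
    api_key: str,
    fetch: Callable[[urllib.request.Request], dict[str, Any]] = send_json,
    interval_s: float = 5.0,
    max_polls: int = 120,
) -> dict[str, Any]:
    """Poll the run until it reports an assumed terminal status or polls run out."""
    for _ in range(max_polls):
        run = fetch(build_status_request(run_id, api_key))
        if run.get("status") in TERMINAL_STATUSES:
            return run
        time.sleep(interval_s)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

Because runs execute asynchronously, the sketch separates launching from waiting: send `build_run_request(...)` through `send_json`, read the run identifier from the response, then pass it to `wait_for_run`. The `fetch` parameter exists so the polling loop can be exercised with a stub instead of a live endpoint. If you subscribe to webhooks, the polling loop is unnecessary.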
Review past runs faster
- The Task Details page now includes a Past runs tab that lists the latest executions (status,
trigger, duration, runner, requester). Jump straight into any run’s live log stream or open the full
Execution view for deeper debugging.
Use the rest of this documentation to dive deeper into training best practices and the public API.