Welcome! This guide gives you a quick orientation to Komos tasks, our building block for
reusable browser automations.
Browser tasks at a glance
- What they are: A browser task is a reusable automation graph that runs inside our sandboxed
Chromium environment. Each task owns the navigation, interaction, and extraction steps needed to
complete a workflow.
- How to use them: Pick an existing task from the dashboard or clone a template, then launch runs
as needed. Every run executes asynchronously so you can keep working while the sandbox completes
the flow.
- Inputs: Tasks declare a typed input schema (strings, numbers, booleans, and lists). Provide
values when you trigger a run via the UI, a schedule, or the API. When you omit a field, the
default defined in the task builder is applied automatically.
- Outputs: When a run finishes, it emits structured variables captured by EXTRACT_DATA and
PROCESS_DATA nodes. Review them in the run detail view, export via the dashboard, or fetch them
through the public API.
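To make the input rules above concrete, here is a minimal sketch of how provided values and defaults might be merged before a run. The schema layout (a dict with "type" and optional "default" keys), the type names, and the `resolve_inputs` helper are all hypothetical illustrations, not the Komos implementation or its actual schema format.

```python
from typing import Any, Dict

# ASSUMPTION: hypothetical type names mirroring the four kinds the guide lists.
TYPE_CHECKS = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "list": list,
}


def resolve_inputs(schema: Dict[str, Dict[str, Any]],
                   provided: Dict[str, Any]) -> Dict[str, Any]:
    """Merge provided values with schema defaults and check their types."""
    unknown = set(provided) - set(schema)
    if unknown:
        raise KeyError(f"unknown inputs: {sorted(unknown)}")

    resolved: Dict[str, Any] = {}
    for name, spec in schema.items():
        if name in provided:
            value = provided[name]
        elif "default" in spec:
            # Omitted field: fall back to the default from the schema.
            value = spec["default"]
        else:
            raise ValueError(f"missing required input: {name}")

        expected = TYPE_CHECKS[spec["type"]]
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        is_stray_bool = isinstance(value, bool) and spec["type"] != "boolean"
        if is_stray_bool or not isinstance(value, expected):
            raise TypeError(f"input {name!r} must be of type {spec['type']}")
        resolved[name] = value
    return resolved
```

For example, with a schema that declares `query` as a string and `max_pages` as a number defaulting to 3, calling `resolve_inputs(schema, {"query": "invoices"})` fills in `max_pages` from the default.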
Train your task
Komos supports two complementary training modes, so you can build automations in whichever way
suits your team:
- Human-Narrated Training Demo: Walk through the workflow once while describing what should
happen. Komos turns that narration into a runnable plan, then suggests the nodes to add. This is
ideal for subject-matter experts who want to articulate intent without thinking about selectors.
- Visual Builder (step-by-step): Record the exact clicks, waits, and extractions directly in the
browser preview. The builder captures DOM selectors and lets you edit node inputs before saving the
task definition.
You can switch between the modes at any time—use narration to sketch the flow, then refine specific
steps in the builder.
Record a training demo from your desktop
- Start in the Training step of the task wizard and choose Add training session. The in-app
recorder launches in a new view so you can pick the window or browser tab to capture.
- Use Chrome or Microsoft Edge 116+ to enable screen capture and optional tab audio. When the share
picker appears, select the source and, if you want system sound, check Share tab audio.
- Enable the narration toggle to mix your microphone with the shared audio. A live VU meter helps
you confirm levels before you hit Start recording.
- The session runs for up to five minutes. You’ll get a warning one minute before the limit, and
the recording stops automatically when the limit is reached. If MP4 isn’t supported on the device,
we fall back to WebM and display a banner so you know which format will be saved.
- While recording, Komos opens a Picture-in-Picture window with pause/resume and stop controls. If
the browser blocks PiP, the page keeps the full control set so you can continue uninterrupted.
- After you stop, review the inline playback, rename the file, download a copy if needed, and click
Save to task. The upload runs with progress feedback and returns you to the wizard with a
success toast once the training session is attached.
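The time limit and format fallback described above can be modeled with a few lines of logic. This is an illustrative sketch only, not the recorder's code: the function names, the MIME strings, and the assumption that the warning fires exactly one minute before the limit are ours.

```python
from typing import List, Sequence, Tuple

MAX_SECONDS = 5 * 60        # from the guide: sessions run for up to five minutes
WARNING_LEAD_SECONDS = 60   # ASSUMPTION: the warning fires one minute before the limit


def choose_container(supported: Sequence[str]) -> Tuple[str, bool]:
    """Return (mime_type, show_banner): prefer MP4, fall back to WebM with a banner."""
    if "video/mp4" in supported:
        return "video/mp4", False
    if "video/webm" in supported:
        return "video/webm", True
    raise RuntimeError("no supported recording format")


def recorder_events(elapsed_seconds: int) -> List[str]:
    """Return which limit-related events are due after the given recorded time."""
    events: List[str] = []
    if elapsed_seconds >= MAX_SECONDS - WARNING_LEAD_SECONDS:
        events.append("warn")
    if elapsed_seconds >= MAX_SECONDS:
        events.append("stop")
    return events
```

Whether paused time counts toward the five-minute limit is not stated in this guide, so the sketch simply takes the recorded time as an input.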
Guide the execution plan
Every time you click Generate the Plan—either during the wizard or from an existing task—you’ll see
an optional guidance prompt. Use it to highlight priorities, shortcuts, or anything you want the AI to
emphasize. Komos combines that note with your training sessions, supporting documents, and current task
definition to rebuild the execution graph. Your most recent guidance stays prefilled so you can tweak it
between regenerations without retyping from scratch, and it never overwrites the main task description.
Trigger a run
Choose the trigger that matches how your team works:
- Manual launch: Kick off ad-hoc runs from the dashboard when you need a one-off execution.
- Schedules: Define a cadence (hourly, daily, weekly) so Komos runs the task automatically.
- API key: Issue an organization API key and call POST /public/v1/tasks/{taskId}/runs to launch
tasks programmatically.
- Follow-up via API: Poll GET /public/v1/task-runs/{runId} or GET /public/v1/task-runs/{runId}/logs
to monitor progress, or subscribe to webhooks for real-time status updates.
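As a rough sketch of the launch-then-poll flow, the snippet below uses only the Python standard library. Only the two endpoint paths come from this guide. The base URL, the Bearer authorization header, the `{"inputs": ...}` request body, and the shape of the responses are assumptions made for illustration; check the public API reference for the real values. Because status values are not listed here, the caller supplies the `is_finished` check.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

# ASSUMPTION: placeholder host; this guide does not state the API base URL.
BASE_URL = "https://api.example.invalid"

Transport = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def make_http_transport(api_key: str) -> Transport:
    """Build a send(method, path, body) function that talks JSON over HTTP."""
    def send(method: str, path: str,
             body: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        request = urllib.request.Request(
            BASE_URL + path,
            data=data,
            method=method,
            headers={
                # ASSUMPTION: the header scheme is a guess, not documented here.
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
            },
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode("utf-8"))
    return send


def launch_run(send: Transport, task_id: str,
               inputs: Dict[str, Any]) -> Dict[str, Any]:
    """POST to the run endpoint. ASSUMPTION: inputs travel under an "inputs" key."""
    return send("POST", f"/public/v1/tasks/{task_id}/runs", {"inputs": inputs})


def wait_for_run(send: Transport, run_id: str,
                 is_finished: Callable[[Dict[str, Any]], bool],
                 interval_s: float = 5.0, max_polls: int = 60,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll the run endpoint until is_finished(run) is true or polls run out."""
    for attempt in range(max_polls):
        run = send("GET", f"/public/v1/task-runs/{run_id}", None)
        if is_finished(run):
            return run
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

Passing the transport in as a parameter keeps the flow testable without network access. If your integration can receive webhooks, subscribing to them avoids the polling loop entirely.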
Use the rest of this documentation to dive deeper into training best practices and the public API.