Welcome! This guide is a quick orientation to Komos tasks, our building block for reusable browser automations.

Browser tasks at a glance

  • What they are: A browser task is a reusable automation graph that runs inside our sandboxed Chromium environment. Each task owns the navigation, interaction, and extraction steps needed to complete a workflow.
  • How to use them: Pick an existing task from the dashboard or clone a template, then launch runs as needed. Every run executes asynchronously so you can keep working while the sandbox completes the flow.
  • Inputs: Tasks declare a typed input schema (strings, numbers, booleans, and lists). Provide values when you trigger a run, whether through the UI, a schedule, or the API. If you omit a field, the default defined in the task builder is applied automatically.
  • Outputs: When a run finishes it emits structured variables captured by EXTRACT_DATA and PROCESS_DATA nodes. Review them in the run detail view, export via the dashboard, or fetch them through the public API.
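
To make the input behavior above concrete, here is a minimal sketch of how typed inputs and defaults could be resolved before a run starts. The schema shape, the field names (`query`, `max_pages`, `headless`, `tags`), and the `resolve_inputs` helper are all hypothetical illustrations; this guide does not show the real Komos schema format.

```python
import copy

# Hypothetical schema: field name -> expected type plus an optional default.
# This only illustrates the four input kinds named above (strings, numbers,
# booleans, lists); it is not the actual Komos schema format.
SCHEMA = {
    "query": {"type": str},                          # required, no default
    "max_pages": {"type": (int, float), "default": 5},
    "headless": {"type": bool, "default": True},
    "tags": {"type": list, "default": []},
}


def resolve_inputs(schema, provided):
    """Merge provided values with schema defaults and check their types."""
    unknown = set(provided) - set(schema)
    if unknown:
        raise ValueError(f"unknown inputs: {sorted(unknown)}")

    resolved = {}
    for name, spec in schema.items():
        if name in provided:
            value = provided[name]
        elif "default" in spec:
            # Copy the default so a mutable default (a list) is never shared.
            value = copy.deepcopy(spec["default"])
        else:
            raise ValueError(f"missing required input: {name}")

        expected = spec["type"]
        # bool is a subclass of int in Python, so reject booleans explicitly
        # for any field that is not declared as a boolean.
        if isinstance(value, bool) and expected is not bool:
            raise TypeError(f"{name}: expected {expected}, got bool")
        if not isinstance(value, expected):
            raise TypeError(f"{name}: expected {expected}, got {type(value).__name__}")
        resolved[name] = value
    return resolved
```

Under these assumptions, `resolve_inputs(SCHEMA, {"query": "invoices"})` keeps the supplied `query` and fills the other three fields from their defaults, while a call that omits `query` raises an error because that field has no default.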

Train your task

Komos supports two complementary training modes, so you can build each automation in the way that suits your team:
  1. Human-Narrated Training Demo: Walk through the workflow once while describing what should happen. Komos turns that narration into a runnable plan, then suggests the nodes to add. This is ideal for subject-matter experts who want to articulate intent without thinking about selectors.
  2. Visual Builder (step-by-step): Record the exact clicks, waits, and extractions directly in the browser preview. The builder captures DOM selectors and lets you edit node inputs before saving the task definition.
You can switch between the modes at any time—use narration to sketch the flow, then refine specific steps in the builder.

Record a training demo from your desktop

  • Start in the Training step of the task wizard and choose Add training session. The in-app recorder launches in a new view so you can pick the window or browser tab to capture.
  • Use Chrome or Microsoft Edge 116+ to enable screen capture and optional tab audio. When the share picker appears, select the source and, if you want system sound, check Share tab audio.
  • Enable the narration toggle to mix your microphone with the shared audio. A live VU meter helps you confirm levels before you hit Start recording.
  • The session runs for up to five minutes. You’ll get a one-minute warning and recordings stop automatically at the limit. If MP4 isn’t supported on the device, we fall back to WebM and display a banner so you know which format will be saved.
  • While recording, Komos opens a Picture-in-Picture window with pause/resume and stop controls. If the browser blocks PiP, the page keeps the full control set so you can continue uninterrupted.
  • After you stop, review the inline playback, rename the file, download a copy if needed, and click Save to task. The upload runs with progress feedback and returns you to the wizard with a success toast once the training session is attached.

Guide the execution plan

Every time you click Generate the Plan—either during the wizard or from an existing task—you’ll see an optional guidance prompt. Use it to highlight priorities, shortcuts, or anything you want the AI to emphasize. Komos combines that note with your training sessions, supporting documents, and current task definition to rebuild the execution graph. Your most recent guidance stays prefilled so you can tweak it between regenerations without retyping from scratch, and it never overwrites the main task description.

Trigger a run

Choose the trigger that matches how your team works:
  • Manual launch: Kick off ad-hoc runs from the dashboard when you need a one-off execution.
  • Schedules: Define a cadence (hourly, daily, weekly) so Komos runs the task automatically.
  • API key: Issue an organization API key and call POST /public/v1/tasks/{taskId}/runs to launch tasks programmatically.
  • Follow-up via API: Poll GET /public/v1/task-runs/{runId} or GET /public/v1/task-runs/{runId}/logs to monitor progress, or subscribe to webhooks for real-time status updates.
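
The API trigger and follow-up steps above can be sketched as a small Python client using only the standard library. Only the two endpoint paths come from this guide. The base URL, the `Authorization: Bearer` header, the `{"inputs": ...}` request body, the `status` response field, and the terminal status names are assumptions for illustration, so check the public API reference for the real values.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host, not the real Komos URL
# Assumed terminal status names; the real API may use different values.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def _headers(api_key):
    # Assumption: the organization API key is sent as a bearer token.
    return {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}


def build_run_request(task_id, api_key, inputs):
    """Build POST /public/v1/tasks/{taskId}/runs (body shape is assumed)."""
    task = urllib.parse.quote(task_id, safe="")
    return urllib.request.Request(
        url=f"{BASE_URL}/public/v1/tasks/{task}/runs",
        data=json.dumps({"inputs": inputs}).encode("utf-8"),
        headers=_headers(api_key),
        method="POST",
    )


def build_status_request(run_id, api_key):
    """Build GET /public/v1/task-runs/{runId}."""
    run = urllib.parse.quote(run_id, safe="")
    return urllib.request.Request(
        url=f"{BASE_URL}/public/v1/task-runs/{run}",
        headers=_headers(api_key),
        method="GET",
    )


def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_run(run_id, api_key, fetch=send, interval=5.0, max_polls=60,
                 sleep=time.sleep):
    """Poll the run until it reaches an assumed terminal status."""
    for attempt in range(max_polls):
        run = fetch(build_status_request(run_id, api_key))
        if run.get("status") in TERMINAL_STATUSES:
            return run
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

A caller would pass the result of `build_run_request` to `send`, read the run identifier from the response (its field name is not given in this guide), and hand it to `wait_for_run`. Because runs execute asynchronously, polling with a fixed interval and a cap on attempts keeps the client from waiting forever; webhooks avoid polling entirely when real-time updates matter.
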
Use the rest of this documentation to dive deeper into training best practices and the public API.