Agent Action - Komos Documentation

Agent Action is Komos’s most powerful node type. It’s a prompt-driven AI agent that can perform complex, multi-step operations in a single node.

Capabilities

Agent Action can:

Capability	Examples
Browse websites	Navigate, click, fill forms, scroll, extract data
Process data	Transform, filter, summarize, format data with AI
Call integrations	Google Sheets, Slack, CRMs, and 100+ connected tools
Search the web	Look up information, verify facts, find resources
Read/write files	Create reports, parse uploads, manage documents
Send emails	Compose and send via connected email integrations
Inspect network	Capture XHR/Fetch requests to extract hidden data

When to use Agent Action

Use Agent Action when:

The task requires judgment or adaptation
Multiple steps need to happen based on what’s found
You need to combine browser automation with data processing
The exact steps depend on page content

Use specialized nodes when:

Steps are deterministic and well-defined (Navigate, Click, Extract Data)
You need precise control over timing (Wait, Wait For)
You want explicit, auditable flows

Creating an Agent Action node

In the builder, click + Add Node and select Agent Action
Write a prompt describing what the agent should do
Define outputs the agent should produce
Optionally attach skills and integrations

Writing effective prompts

Be specific about the goal and constraints:

Extract all invoice numbers and amounts from the current page.

1. Look for a table or list of invoices
2. For each invoice found, capture:
   - Invoice number (usually starts with "INV-")
   - Amount (numeric value with currency)
   - Date if visible
3. If pagination exists, note that more pages are available
   but don't navigate to them

Return the data as a list of objects.
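
A downstream step may want to sanity-check a result like the one this prompt asks for. The following is a minimal Python sketch under stated assumptions: the field names (`invoice_number`, `amount`, `date`) and the sample records are invented for illustration and are not a schema defined by Komos. Because the prompt says invoice numbers "usually" start with "INV-", records that do not match are set aside for review rather than discarded.

```python
from typing import Optional, TypedDict


class Invoice(TypedDict):
    # Hypothetical field names; use whatever your declared output defines.
    invoice_number: str
    amount: str
    date: Optional[str]


def split_valid(invoices: list[Invoice]) -> tuple[list[Invoice], list[Invoice]]:
    """Separate records whose number starts with "INV-" from the rest."""
    valid = [inv for inv in invoices if inv["invoice_number"].startswith("INV-")]
    suspect = [inv for inv in invoices if not inv["invoice_number"].startswith("INV-")]
    return valid, suspect


# Made-up sample data standing in for the agent's "list of objects" result.
sample: list[Invoice] = [
    {"invoice_number": "INV-1001", "amount": "USD 250.00", "date": "2024-01-15"},
    {"invoice_number": "1002", "amount": "USD 90.00", "date": None},
]
valid, suspect = split_valid(sample)
```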

Tips:

Break complex tasks into numbered steps
Specify what success looks like
Mention edge cases or constraints
Don’t over-specify; leave the agent room to adapt

Defining outputs

Declare the variables the agent should produce:

Output Type	Use Case
`scalar`	Single value (string, number, boolean)
`list`	Array of values
`object`	Structured data with named fields
`object_list`	Array of structured objects

Outputs are available as ${node_id.output_name} in downstream nodes.
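
To make the reference syntax concrete, here is a small Python sketch of how a `${node_id.output_name}` placeholder could be substituted with a stored value. It is an illustration only, not Komos's implementation: the `resolve` function, the node id `agent_1`, and the output names are all hypothetical, and the pattern assumes ids made of letters, digits, and underscores.

```python
import re
from typing import Any

# Matches references of the form ${node_id.output_name}.
REFERENCE = re.compile(r"\$\{(\w+)\.(\w+)\}")


def resolve(template: str, outputs: dict[str, dict[str, Any]]) -> str:
    """Replace each ${node_id.output_name} with the matching value from `outputs`."""

    def lookup(match: re.Match) -> str:
        node_id, output_name = match.group(1), match.group(2)
        # A KeyError here means the reference names an unknown node or output.
        return str(outputs[node_id][output_name])

    return REFERENCE.sub(lookup, template)


# Hypothetical node id and output names.
outputs = {"agent_1": {"invoice_count": 3, "total": "USD 340.00"}}
message = resolve(
    "Found ${agent_1.invoice_count} invoices totalling ${agent_1.total}", outputs
)
```

In this sketch a reference to an undeclared output fails loudly instead of silently producing an empty string, which is one reason to declare every output explicitly.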

Advanced features

Network request inspection

When data isn’t visible in the DOM (hidden IDs, URLs constructed by JavaScript), use network inspection:

The table shows applicant names but the IDs aren't in the HTML.

Navigate to the applicants page
Use get_network_requests to find the API call that loads the table
Use get_network_response_body to get the JSON response
Extract the applicant IDs from the API response
Build the detail URLs: /applicants/{id}

The agent has access to:

get_network_requests(url_pattern, method) - Find captured requests
get_network_response_body(request_id) - Get response content
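
The last two steps of the example prompt (extracting IDs and building detail URLs) amount to a small JSON transformation. The Python sketch below shows the idea under assumptions: the response body, its `applicants` key, and the `id` field are invented, since the real shape depends on the site's API, and `build_detail_urls` is a hypothetical helper rather than a Komos function.

```python
import json

# Hypothetical response body, standing in for what
# get_network_response_body might return for the table's API call.
response_body = '{"applicants": [{"id": 101, "name": "Ada"}, {"id": 102, "name": "Grace"}]}'


def build_detail_urls(body: str) -> list[str]:
    """Parse the JSON body and build one /applicants/{id} URL per applicant."""
    payload = json.loads(body)
    return [f"/applicants/{applicant['id']}" for applicant in payload["applicants"]]


detail_urls = build_detail_urls(response_body)
```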

Attaching skills

Skills provide reusable guidance. Attach them in the node editor:

Open the Agent Action node
Find the Skills section
Select skills that apply to this task

The agent reads skill instructions alongside your prompt.

Using integrations

Enable connected integrations for API work:

In the node editor, find Integrations
Select which connected accounts to allow
Describe the integration work in your prompt

Using the connected Google Sheets account:
Open the spreadsheet "Monthly Reports"
Find the sheet named "January"
Add a new row with today's date and the extracted totals

Visual references

Add screenshots to show the agent what to expect:

Expand Advanced > Visual References
Upload or paste screenshots
Label them clearly (e.g., “Login page”, “Success state”)

Visual references improve reliability for complex UIs.

Combining with other nodes

Agent Action works best as part of a larger flow:

Navigate → Login → Agent Action → Wait For → Extract Data → Process Data

Use Navigate and Login for deterministic setup
Use Agent Action for the adaptive middle steps
Follow with Wait For to ensure the page is ready
Use Extract Data or Process Data for deterministic capture

Troubleshooting

Agent doesn’t find elements

Add visual references showing the expected UI
Be more specific about element descriptions
Check whether the content loads dynamically; if it does, add a Wait For node before the Agent Action

Outputs are empty

Verify that the output names used in your prompt and in downstream references match the declared outputs
Check the run logs for what the agent captured
Ensure the agent prompt asks for the data explicitly

Integration calls fail

Verify the integration is connected in Settings > Integrations
Check that the integration is enabled for this node under Integrations in the node editor
Review the integration’s required scopes/permissions

Best practices

Start simple: Write a basic prompt, run it, then refine
Use skills for patterns: Extract repeated guidance into skills
Declare all outputs: Don’t rely on implicit variables
Add verification: Follow Agent Action with Wait For or Extract Data to confirm results
Check run logs: The execution view shows exactly what the agent did
