Agent Action is Komos’s most powerful node type. It’s a prompt-driven AI agent that can perform complex, multi-step operations in a single node.

Capabilities

Agent Action can:
Capability | Examples
Browse websites | Navigate, click, fill forms, scroll, extract data
Process data | Transform, filter, summarize, format data with AI
Call integrations | Google Sheets, Slack, CRMs, and 100+ connected tools
Search the web | Look up information, verify facts, find resources
Read/write files | Create reports, parse uploads, manage documents
Send emails | Compose and send via connected email integrations
Inspect network | Capture XHR/Fetch requests to extract hidden data

When to use Agent Action

Use Agent Action when:
  • The task requires judgment or adaptation
  • Multiple steps need to happen based on what’s found
  • You need to combine browser automation with data processing
  • The exact steps depend on page content
Use specialized nodes when:
  • Steps are deterministic and well-defined (Navigate, Click, Extract Data)
  • You need precise control over timing (Wait, Wait For)
  • You want explicit, auditable flows

Creating an Agent Action node

  1. In the builder, click + Add Node and select Agent Action
  2. Write a prompt describing what the agent should do
  3. Define outputs the agent should produce
  4. Optionally attach skills and integrations

Writing effective prompts

Be specific about the goal and constraints:
Extract all invoice numbers and amounts from the current page.

1. Look for a table or list of invoices
2. For each invoice found, capture:
   - Invoice number (usually starts with "INV-")
   - Amount (numeric value with currency)
   - Date if visible
3. If pagination exists, note that more pages are available
   but don't navigate to them

Return the data as a list of objects.
Tips:
  • Break complex tasks into numbered steps
  • Specify what success looks like
  • Mention edge cases or constraints
  • Don’t over-specify; leave the agent room to adapt
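One way to make "specify what success looks like" concrete is to check the agent's result against the shape the prompt asked for. The sketch below is illustrative only: the field names (invoice_number, amount, date) and the record layout are assumptions, since the example prompt only asks for "a list of objects".

```python
from __future__ import annotations

from typing import Any

# Hypothetical field names; "date" is optional because the prompt says "if visible".
REQUIRED_FIELDS = ("invoice_number", "amount")


def check_invoices(records: list[dict[str, Any]]) -> list[str]:
    """Return human-readable problems found in the agent's invoice list."""
    problems = []
    for i, rec in enumerate(records):
        for field in REQUIRED_FIELDS:
            value = rec.get(field)
            if value is None or not str(value).strip():
                problems.append(f"record {i}: missing {field}")
        number = rec.get("invoice_number")
        # The prompt says numbers *usually* start with "INV-", so this is
        # reported as a warning to review rather than a hard failure.
        if number and not str(number).startswith("INV-"):
            problems.append(f"record {i}: unusual invoice number {number!r}")
    return problems


sample = [
    {"invoice_number": "INV-1001", "amount": "$250.00", "date": "2024-01-05"},
    {"invoice_number": "", "amount": "12.00 EUR"},
    {"invoice_number": "A-7", "amount": None},
]
for problem in check_invoices(sample):
    print(problem)
```

A check like this would run outside Komos, for example on exported run results; it is not a built-in feature.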

Defining outputs

Declare what variables the agent should produce:
Output Type | Use Case
scalar | Single value (string, number, boolean)
list | Array of values
object | Structured data with named fields
object_list | Array of structured objects
Outputs are available as ${node_id.output_name} in downstream nodes.
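To illustrate how a ${node_id.output_name} reference maps onto declared outputs, here is a minimal sketch of a resolver. It is not Komos's implementation: the resolve function, the identifier characters the pattern accepts, and the shape of the outputs dictionary are all assumptions made for the example.

```python
from __future__ import annotations

import re
from typing import Any

# Matches ${node_id.output_name}; the allowed identifier characters are an assumption.
REFERENCE = re.compile(r"\$\{([A-Za-z0-9_-]+)\.([A-Za-z0-9_]+)\}")


def resolve(template: str, outputs: dict[str, dict[str, Any]]) -> str:
    """Replace every ${node_id.output_name} in template with its value."""

    def substitute(match: re.Match) -> str:
        node_id, name = match.group(1), match.group(2)
        if node_id not in outputs or name not in outputs[node_id]:
            # An undeclared output is surfaced loudly instead of becoming "".
            raise KeyError(f"no output {name!r} on node {node_id!r}")
        return str(outputs[node_id][name])

    return REFERENCE.sub(substitute, template)


# Hypothetical upstream node "agent_1" with one scalar and one list output.
upstream = {"agent_1": {"invoice_count": 3, "invoice_ids": ["INV-1", "INV-2", "INV-3"]}}
print(resolve("Found ${agent_1.invoice_count} invoices", upstream))
```

The sketch raises on an unknown name because a misspelled reference is easier to debug as an error than as an empty value; whether Komos behaves this way is not stated here.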

Advanced features

Network request inspection

When data isn’t visible in the DOM (hidden IDs, URLs constructed by JavaScript), use network inspection:
The table shows applicant names but the IDs aren't in the HTML.

1. Navigate to the applicants page
2. Use get_network_requests to find the API call that loads the table
3. Use get_network_response_body to get the JSON response
4. Extract the applicant IDs from the API response
5. Build the detail URLs: /applicants/{id}
The agent has access to:
  • get_network_requests(url_pattern, method) - Find captured requests
  • get_network_response_body(request_id) - Get response content
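The five steps in the example prompt can be sketched in Python to show the data flow the agent follows. The two functions below are local stand-ins that reuse the tool names. The real tools are called by the agent, not from your code, and the return shapes, the glob-style url_pattern, the captured traffic, and the JSON layout are all assumptions.

```python
from __future__ import annotations

import json
from fnmatch import fnmatchcase

# Stand-in for traffic captured while the applicants page loaded (made-up data).
CAPTURED = [
    {"id": "req-1", "method": "GET", "url": "https://example.com/static/app.js", "body": ""},
    {
        "id": "req-2",
        "method": "GET",
        "url": "https://example.com/api/applicants?page=1",
        "body": json.dumps({"items": [{"id": 101, "name": "Ada"}, {"id": 102, "name": "Grace"}]}),
    },
]


def get_network_requests(url_pattern: str, method: str) -> list[dict]:
    """Stand-in: return captured requests whose URL matches a glob pattern."""
    return [
        r for r in CAPTURED
        if r["method"] == method and fnmatchcase(r["url"], url_pattern)
    ]


def get_network_response_body(request_id: str) -> str:
    """Stand-in: return the raw response body for one captured request."""
    for r in CAPTURED:
        if r["id"] == request_id:
            return r["body"]
    raise KeyError(request_id)


def applicant_detail_urls() -> list[str]:
    # Steps 2-5 of the prompt: find the API call, read its JSON, build detail URLs.
    matches = get_network_requests("*/api/applicants*", "GET")
    payload = json.loads(get_network_response_body(matches[0]["id"]))
    return [f"/applicants/{item['id']}" for item in payload["items"]]


print(applicant_detail_urls())
```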

Attaching skills

Skills provide reusable guidance. Attach them in the node editor:
  1. Open the Agent Action node
  2. Find the Skills section
  3. Select skills that apply to this task
The agent reads skill instructions alongside your prompt.

Using integrations

Enable connected integrations for API work:
  1. In the node editor, find Integrations
  2. Select which connected accounts to allow
  3. Describe the integration work in your prompt
Using the connected Google Sheets account:
1. Open the spreadsheet "Monthly Reports"
2. Find the sheet named "January"
3. Add a new row with today's date and the extracted totals

Visual references

Add screenshots to show the agent what to expect:
  1. Expand Advanced > Visual References
  2. Upload or paste screenshots
  3. Label them clearly (e.g., “Login page”, “Success state”)
Visual references improve reliability for complex UIs.

Combining with other nodes

Agent Action works best as part of a larger flow:
Navigate → Login → Agent Action → Wait For → Extract Data → Process Data
  • Use Navigate and Login for deterministic setup
  • Use Agent Action for the adaptive middle steps
  • Follow with Wait For to ensure the page is ready
  • Use Extract Data or Process Data for deterministic capture

Troubleshooting

Agent doesn’t find elements

  • Add visual references showing the expected UI
  • Be more specific about element descriptions
  • Check whether the content loads dynamically; if it does, add a Wait For node before the Agent Action

Outputs are empty

  • Verify output names match what’s declared
  • Check the run logs for what the agent captured
  • Ensure the agent prompt asks for the data explicitly
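When comparing declared outputs with what the run logs show, a small diff makes name mismatches easy to spot. This is a sketch for offline debugging, not a Komos feature: it assumes you have copied the declared names and the captured values out of the node editor and run logs by hand, and the diff_outputs helper is hypothetical.

```python
from __future__ import annotations


def diff_outputs(declared: set[str], captured: dict[str, object]) -> dict[str, list[str]]:
    """Compare declared output names with the values a run actually captured."""
    # Treat None and empty containers/strings as "not produced".
    produced = {name for name, value in captured.items() if value not in (None, "", [], {})}
    return {
        "missing": sorted(declared - produced),         # declared but empty or absent
        "undeclared": sorted(set(captured) - declared), # captured under another name
    }


declared = {"invoices", "has_more_pages"}
captured = {"invoices": [], "Has_More_Pages": True}  # note the casing mismatch
print(diff_outputs(declared, captured))
```

In this made-up run, "has_more_pages" shows up as missing while "Has_More_Pages" shows up as undeclared, which points to a naming mismatch rather than a failed extraction.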

Integration calls fail

  • Verify the integration is connected in Settings > Integrations
  • Check that the integration is enabled on the Agent Action node (Integrations section of the node editor)
  • Review the integration’s required scopes/permissions

Best practices

  1. Start simple: Write a basic prompt, run it, then refine
  2. Use skills for patterns: Extract repeated guidance into skills
  3. Declare all outputs: Don’t rely on implicit variables
  4. Add verification: Follow Agent Action with Wait For or Extract Data to confirm results
  5. Check run logs: The execution view shows exactly what the agent did