This blog post demonstrates how to properly use the REST API with the local watsonx Orchestrate server for agent orchestration. Unlike other guides that focus on direct LLM interactions, this post shows the correct approach for working with agents that have tools, knowledge bases, and complex workflows.
The post guides you step-by-step from configuring your local watsonx Orchestrate server to invoking agents via REST API using Python. We'll cover environment setup, authentication, agent discovery, and the crucial asynchronous execution pattern that makes agent orchestration work.
This post demonstrates a complete flow from server startup to agent invocation:
Step 1: Choose your preferred environment setup method and activate your Python virtual environment:

```bash
source ./.venv/bin/activate
```

Step 2: Start the server. It boots up, connecting local credentials, containers, and runtimes:

```bash
orchestrate server start --env-file .env
```
Step 3: Activate the local environment. This ensures that your client targets the local server configuration:

```bash
orchestrate env activate local
```
Step 4: When you inspect Docker Compose on your local machine, you'll find that the watsonx Orchestrate server runs on port 4321 by default. The Swagger API explorer is available at http://localhost:4321/docs, and the REST API base URL is:

http://localhost:4321/api/v1/orchestrate
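Before writing any client code, you can confirm the server is reachable. A minimal sketch, assuming the default port above:

```python
import requests

# Quick reachability check against the local server's Swagger UI.
# Assumes the default port 4321 from the step above.
resp = requests.get("http://localhost:4321/docs", timeout=5)
print(f"Server reachable, HTTP status: {resp.status_code}")
```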
Step 5: For local REST API calls, we need a Bearer token. We can extract it from the orchestrate cache:

```bash
grep -R "wxo_mcsp_token" ~/.cache/orchestrate/credentials.yaml
```

Windows Command Prompt alternative:

```cmd
findstr "wxo_mcsp_token" %USERPROFILE%\.cache\orchestrate\credentials.yaml
```
The file ~/.cache/orchestrate/credentials.yaml contains authentication information:

```yaml
auth:
  local:
    wxo_mcsp_token: YOUR_TOKEN_HERE
```
Copy the token value for use in your API calls.
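Alternatively, you can load the token programmatically. A minimal sketch, assuming the credentials.yaml layout shown above (the auth → local → wxo_mcsp_token nesting comes from that snippet):

```python
from pathlib import Path

import yaml  # pip install pyyaml

# Read the token from the orchestrate credentials cache.
# Assumes the auth -> local -> wxo_mcsp_token layout shown above.
creds_path = Path.home() / ".cache" / "orchestrate" / "credentials.yaml"
with creds_path.open() as f:
    creds = yaml.safe_load(f)

token = creds["auth"]["local"]["wxo_mcsp_token"]
print(f"Token loaded ({len(token)} characters)")
```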
Step 6: List your agents and get the ID for the agent you want to invoke:

```bash
orchestrate agents list
```
Example output:
Name | Description | LLM | Style | Collaborators | Tools | Knowledge Base | ID
---|---|---|---|---|---|---|---
DataAnalysisAgent | Elasticsearch Data Analysis Agent | watsonx/meta-llama/llama-3-2-90b-vision-instruct | default | | elasticsearch_tool | | b700b57a-29a1-40e1-b895-adf9eba4f907
Note your agent ID for the next step. In this example, the ID is b700b57a-29a1-40e1-b895-adf9eba4f907.
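If you want to grab the ID programmatically rather than copying it by hand, one option is to shell out to the same CLI command and split the pipe-separated output. A rough sketch, assuming the tabular format shown above stays stable:

```python
import subprocess

# Run the same CLI command and pick out the ID column for a named agent.
# Parsing the pipe-separated table is fragile and assumes the output
# format shown above; prefer the REST API for production tooling.
result = subprocess.run(
    ["orchestrate", "agents", "list"],
    capture_output=True, text=True, check=True,
)

agent_id = None
for line in result.stdout.splitlines():
    cells = [c.strip() for c in line.split("|")]
    if cells and cells[0] == "DataAnalysisAgent":
        agent_id = cells[-1] or cells[-2]  # ID is the last populated column
        break

print("Agent ID:", agent_id)
```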
Step 7: Create a Python file called agent_orchestrator.py with the following code. This implementation correctly uses the Runs API for agent orchestration:
```python
import requests
import json
import time

# Configuration
token = "your_wxo_mcsp_token_here"
agent_id = "your_agent_id_here"
base_url = "http://localhost:4321/api/v1/orchestrate"

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}


def get_run_result(run_id, max_attempts=30, delay=2):
    """Poll for the run result until it's complete"""
    get_url = f"{base_url}/runs/{run_id}"
    for attempt in range(max_attempts):
        try:
            resp = requests.get(get_url, headers=headers, timeout=10)
            resp.raise_for_status()
            run_data = resp.json()
            status = run_data.get("status", "unknown")
            print(f"Run status (attempt {attempt + 1}): {status}")
            if status == "completed":
                return run_data
            elif status in ["failed", "cancelled"]:
                return {"error": f"Run {status}", "details": run_data}
            time.sleep(delay)
        except Exception as e:
            print(f"Error checking run status: {e}")
            time.sleep(delay)
    return {"error": "Timeout waiting for run to complete"}


def call_orchestrate(question):
    """Invoke an agent using the Runs API"""
    chat_url = f"{base_url}/runs"
    payload = {
        "agent_id": agent_id,
        "message": {
            "role": "user",
            "content": question
        }
    }
    print("Sending payload:", json.dumps(payload, indent=2))
    resp = requests.post(chat_url, headers=headers, json=payload, timeout=10)
    resp.raise_for_status()
    data = resp.json()
    print("Initial API response:", json.dumps(data, indent=2))

    # Get the actual run result
    run_id = data.get("run_id")
    if run_id:
        print(f"Waiting for run {run_id} to complete...")
        run_result = get_run_result(run_id)
        print("Final run result:", json.dumps(run_result, indent=2))
        return run_result
    else:
        return {"error": "No run_id in response", "response": data}


def parse_agent_response(response):
    """Parse the agent response to extract answer and citations"""
    try:
        result_data = response.get('result', {}).get('data', {}).get('message', {})
        content = result_data.get('content', [])
        if content and len(content) > 0:
            answer_text = content[0].get('text', 'No answer found')
            citations = content[0].get('citations', [])
            print("Agent Response:", answer_text)
            if citations:
                print("\nSources:")
                for i, citation in enumerate(citations, 1):
                    title = citation.get('title', 'Unknown source')
                    url = citation.get('url', '')
                    # Truncate long URLs for readable console output
                    if len(url) > 60:
                        display_url = url[:60] + "..."
                    else:
                        display_url = url
                    print(f"  [{i}] {title}")
                    if display_url:
                        print(f"      {display_url}")
        else:
            print("Agent: No content found in response")
    except Exception as e:
        print(f"Agent: Error parsing response - {e}")
        print("Raw response:", response)


# Example usage
if __name__ == "__main__":
    question = "Analyze the recent sales data from our Elasticsearch database"
    response = call_orchestrate(question)
    parse_agent_response(response)
```
Important: This implementation uses the Runs API (/runs), which is the correct endpoint for agent orchestration. Other guides may incorrectly use the Chat Completions API (/chat/completions), which is designed for direct LLM interactions, not agent orchestration.
Many guides incorrectly suggest using the Chat Completions API for agent interactions. Here's an example of the wrong approach that you should avoid:
```python
import requests

agent_id = "your_agent_id"
token = "your_wxo_mcsp_token"
url = f"http://localhost:4321/api/v1/orchestrate/{agent_id}/chat/completions"

payload = {
    "messages": [
        {
            "role": "human",
            "content": "Analyze the recent sales data from our Elasticsearch database"
        }
    ],
    "additional_parameters": {},
    "context": {},
    "stream": False
}

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}

# Invoke the agent
response = requests.post(url, json=payload, headers=headers)
print(f"Agent response:\n{response.text}\n")
```
Why This Approach Fails: it sends the request to /chat/completions instead of /runs, so the call goes straight to the LLM and bypasses the agent's tools, knowledge bases, and asynchronous execution.
When to Use Chat Completions API: Only for direct LLM interactions without agent capabilities - simple chatbots, text generation, or basic conversation flows that don't require tools or knowledge bases.
When to Use Runs API: For any agent that has tools, knowledge bases, complex workflows, or requires asynchronous processing - which is most production use cases.
The watsonx Orchestrate Developer Edition provides several API endpoints for different use cases. Here's a comprehensive overview of each endpoint:
Method | Endpoint | Description | Use Case
---|---|---|---
POST | /api/v1/orchestrate/runs/stream | Chat With Orchestrate Agent As Stream: real-time streaming responses from agents with immediate feedback | Interactive chat interfaces, real-time applications, live agent interactions
GET | /api/v1/orchestrate/runs | List Orchestrate Agent Runs: retrieve a list of all agent run sessions with pagination support | Monitoring agent activity, audit trails, run history analysis
POST | /api/v1/orchestrate/runs | Chat With Orchestrate Agent: standard asynchronous agent invocation (what we used in our example) | Production applications, complex workflows, tools and knowledge base access
POST | /api/v1/orchestrate/upload/s3 | Upload To S3: upload files to S3 storage for agent processing | File processing, document analysis, data ingestion workflows
POST | /api/v1/orchestrate/runs/{run_id}/cancel | Cancel An Orchestrate Agent Run: stop an ongoing agent execution | Resource management, timeout handling, user-initiated cancellations
GET | /api/v1/orchestrate/runs/{run_id} | Get Orchestrate Agent Run: retrieve detailed information about a specific run | Run status checking, result retrieval, debugging agent executions
GET | /api/v1/orchestrate/runs/{run_id}/events | Get Orchestrate Agent Run Events: retrieve detailed execution events and logs | Debugging, execution tracing, performance analysis
POST | /api/v1/orchestrate/agents/{agent_id}/chat/completions | Chat With Agents: direct LLM interaction without agent orchestration (avoid this for agents with tools) | Simple chatbots, direct LLM access, basic text generation
Key Takeaway: For agents with tools, knowledge bases, or complex workflows, use the POST /api/v1/orchestrate/runs endpoint (as demonstrated in our example). The streaming endpoint is great for real-time interactions, while the direct chat endpoint should only be used for simple LLM interactions without agent capabilities.
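As a sketch of the streaming variant: the /runs/stream endpoint is listed above, but the wire format of the streamed events isn't documented in this post, so treat the line-by-line parsing below as an assumption (many streaming APIs emit newline-delimited JSON or server-sent events):

```python
import json

import requests

token = "your_wxo_mcsp_token_here"
agent_id = "your_agent_id_here"
stream_url = "http://localhost:4321/api/v1/orchestrate/runs/stream"
headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

payload = {
    "agent_id": agent_id,
    "message": {"role": "user", "content": "Summarize today's sales activity"},
}

# Stream the response; the newline-delimited JSON parsing below is an
# assumption about the event format, not confirmed by this guide.
with requests.post(stream_url, headers=headers, json=payload,
                   stream=True, timeout=60) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        try:
            event = json.loads(line.decode("utf-8"))
            print("event:", event)
        except json.JSONDecodeError:
            # Fall back to raw output (e.g. SSE "data: ..." frames)
            print("raw:", line.decode("utf-8"))
```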
The Orchestrate API supports rich message structures and advanced parameters for sophisticated agent interactions. Understanding these components is crucial for building production applications.
Basic Message Structure:
```json
{
  "message": {
    "role": "user",
    "content": [
      {
        "response_type": "conversational_search"
      }
    ],
    "mentions": [
      {
        "type": "document",
        "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
        "name": "Sales Report 2024"
      }
    ],
    "document_ids": [
      "3c90c3cc-0d44-4b50-8888-8dd25736052a"
    ],
    "parent_message_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "additional_properties": {
      "wxa_message": {},
      "display_properties": {
        "skip_render": false,
        "is_async": false
      },
      "tool_calls": [
        {}
      ],
      "tool_call_id": "tool_123",
      "tool_name": "elasticsearch_tool",
      "wxo_connection_status": {
        "connection_status": "connected",
        "connection_message": "Successfully connected to Elasticsearch"
      }
    },
    "assistant_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "context": {
      "values": []
    },
    "step_history": [
      {}
    ],
    "message_state": {}
  }
}
```
Component | Purpose | Example
---|---|---
mentions | Reference specific documents, tools, or entities in the conversation | Document references, tool invocations, user mentions
document_ids | Specify which documents the agent should consider | Knowledge base documents, uploaded files, reports
additional_properties | Extended metadata for tool calls, display options, and status | Tool execution details, rendering preferences, connection status
llm_params | Control LLM behavior and response generation | Temperature, token limits, sampling parameters
guardrails | Content filtering and safety controls | HAP, social bias, PII detection
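To ground these components, here's a sketch of a run payload whose message pins the agent to a specific document, reusing the agent_id + message shape from the Runs API example. The document ID and name are placeholders, and whether the local build honors every field is worth verifying against the Swagger UI:

```python
# Sketch: a Runs API payload whose message references a specific document.
# The field names come from the message structure shown above; the IDs are
# hypothetical placeholders.
doc_id = "3c90c3cc-0d44-4b50-8888-8dd25736052a"

payload = {
    "agent_id": "your_agent_id_here",
    "message": {
        "role": "user",
        "content": "Summarize the attached sales report",
        "mentions": [
            {"type": "document", "id": doc_id, "name": "Sales Report 2024"}
        ],
        "document_ids": [doc_id],
    },
}
```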
Common LLM Parameters:
"llm_params": {
"temperature": 0.7, // Controls randomness (0.0-1.0)
"max_new_tokens": 1000, // Maximum response length
"top_p": 0.9, // Nucleus sampling parameter
"top_k": 50, // Top-k sampling
"repetition_penalty": 1.1, // Prevents repetitive text
"stop_sequences": ["\n\n", "END"], // Stop generation at these tokens
"time_limit": 30, // Maximum processing time (seconds)
"return_options": {
"generated_tokens": true, // Include token-level details
"input_tokens": true, // Include input tokenization
"token_logprobs": true // Include probability scores
}
}
Guardrails Configuration:
"guardrails": {
"hap": { // Harmful content detection
"input": {
"enabled": true,
"threshold": 0.8
},
"output": {
"enabled": true,
"threshold": 0.8
}
},
"social_bias": { // Bias detection
"input": {
"enabled": true,
"threshold": 0.7
},
"output": {
"enabled": true,
"threshold": 0.7
}
},
"pii": { // Personal data protection
"input": {
"enabled": true,
"threshold": 0.9
},
"output": {
"enabled": true,
"threshold": 0.9
}
}
}
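Putting the knobs together, here's a sketch of a run payload carrying both blocks. Where exactly the local API expects llm_params and guardrails (top level versus inside the message) isn't pinned down in this guide, so placing them alongside agent_id is an assumption:

```python
# Sketch: attach llm_params and guardrails to a run request.
# Their placement at the top level of the payload is an assumption;
# verify against the Swagger UI at http://localhost:4321/docs.
payload = {
    "agent_id": "your_agent_id_here",
    "message": {"role": "user", "content": "Draft a customer-facing summary"},
    "llm_params": {
        "temperature": 0.3,    # keep the summary deterministic
        "max_new_tokens": 500,
    },
    "guardrails": {
        "pii": {
            "input": {"enabled": True, "threshold": 0.9},
            "output": {"enabled": True, "threshold": 0.9},
        }
    },
}
```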
Orchestrate provides comprehensive APIs for managing documents and knowledge bases that agents can access during conversations.
Method | Endpoint | Description | Use Case
---|---|---|---
GET | /api/v1/orchestrate/documents/collections | Fetch Document Collections: list all document collections | Browse available knowledge bases, organize documents
POST | /api/v1/orchestrate/documents/collections | Create Document Collection: create a new document collection | Set up new knowledge bases, organize related documents
DELETE | /api/v1/orchestrate/documents/collections/{id} | Delete Document Collection: remove a document collection | Clean up unused knowledge bases, data management
GET | /api/v1/orchestrate/documents/collections/{id} | Get Document Collection By Id: retrieve specific collection details | Inspect collection metadata, verify contents
PATCH | /api/v1/orchestrate/documents/collections/{id} | Update Document Collection: modify collection properties | Update collection metadata, change settings
Method | Endpoint | Description | Use Case
---|---|---|---
GET | /api/v1/orchestrate/documents | List Documents: retrieve all documents | Browse available documents, inventory management
POST | /api/v1/orchestrate/documents | Create Document: create a new document entry | Add new knowledge base entries, document metadata
GET | /api/v1/orchestrate/documents/{id} | Get Document By Id: retrieve a specific document | Access document content, verify information
DELETE | /api/v1/orchestrate/documents/{id} | Delete Document By Id: remove a document | Clean up outdated information, data management
PATCH | /api/v1/orchestrate/documents/{id} | Update Document: modify document properties | Update document metadata, correct information
POST | /api/v1/orchestrate/documents/upload | Upload Document: upload document content | Add new documents to knowledge base, file ingestion
GET | /api/v1/orchestrate/documents/{id}/download | Download Document: retrieve document content | Export documents, backup content, content verification
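As a sketch of how these endpoints compose: create a collection, then upload a document into it. The endpoints are the ones listed above, but the request body fields (the name, the multipart file upload, the collection_id link) are assumptions about the local API's schema; check the Swagger UI for the exact shapes:

```python
import requests

token = "your_wxo_mcsp_token_here"
base_url = "http://localhost:4321/api/v1/orchestrate"
headers = {"Authorization": f"Bearer {token}"}

# 1. Create a collection to hold related documents.
#    The {"name": ...} body is an assumed schema.
resp = requests.post(
    f"{base_url}/documents/collections",
    headers=headers,
    json={"name": "sales-reports"},
    timeout=10,
)
resp.raise_for_status()
collection = resp.json()
print("Created collection:", collection)

# 2. Upload a document; the multipart form upload and the collection link
#    are assumptions about the endpoint's contract.
with open("sales_report_2024.pdf", "rb") as f:
    resp = requests.post(
        f"{base_url}/documents/upload",
        headers=headers,
        files={"file": f},
        data={"collection_id": collection.get("id", "")},
        timeout=30,
    )
resp.raise_for_status()
print("Uploaded document:", resp.json())
```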
Vector indices enable semantic search and retrieval from large document collections, making knowledge bases more powerful.
Method | Endpoint | Description | Use Case
---|---|---|---
GET | /api/v1/orchestrate/vector-indices | List Vector Indices: retrieve all vector indices | Browse available semantic search indices
POST | /api/v1/orchestrate/vector-indices | Create Vector Index: create a new semantic search index | Set up semantic search for document collections
DELETE | /api/v1/orchestrate/vector-indices/{id} | Delete Vector Index: remove a vector index | Clean up unused semantic search indices
GET | /api/v1/orchestrate/vector-indices/{id} | Get Vector Index By Id: retrieve specific index details | Inspect index configuration, verify settings
PATCH | /api/v1/orchestrate/vector-indices/{id} | Update Vector Index: modify index properties | Update index settings, change search parameters
GET | /api/v1/orchestrate/vector-indices/{id}/collections | Get Collections In Vector Index: list collections in an index | See which document collections are indexed
PUT | /api/v1/orchestrate/vector-indices/{id}/collections | Add Collection To Vector Index: add documents to semantic search | Index new document collections for semantic search
PUT | /api/v1/orchestrate/vector-indices/{id}/refresh | Refresh Vector Index: update index with new documents | Sync index with updated document collections
PUT | /api/v1/orchestrate/vector-indices/{id}/rebuild | Rebuild Vector Index: completely rebuild the index | Major updates, index optimization, configuration changes
POST | /api/v1/orchestrate/vector-indices/{id}/retrieve | Retrieve Documents From The Vector Index: semantic search query | Find relevant documents using natural language queries
Integration Tip: Use document collections and vector indices together to create powerful knowledge bases. Upload documents to collections, then add those collections to vector indices for semantic search capabilities. Agents can then reference specific documents or perform semantic searches during conversations.
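Here's a sketch of that workflow end to end using the endpoints above; again, the JSON bodies (the index name, the collection_ids list, the query field) are assumed shapes, not documented contracts:

```python
import requests

token = "your_wxo_mcsp_token_here"
base_url = "http://localhost:4321/api/v1/orchestrate"
headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

# 1. Create a vector index (the {"name": ...} body is an assumed schema).
resp = requests.post(f"{base_url}/vector-indices", headers=headers,
                     json={"name": "sales-search"}, timeout=10)
resp.raise_for_status()
index_id = resp.json().get("id")

# 2. Add a document collection to the index so its documents get embedded.
#    The collection_ids field is an assumption about the request body.
resp = requests.put(f"{base_url}/vector-indices/{index_id}/collections",
                    headers=headers,
                    json={"collection_ids": ["your_collection_id_here"]},
                    timeout=30)
resp.raise_for_status()

# 3. Run a semantic search against the index; the query field and the
#    shape of the result list are likewise assumptions.
resp = requests.post(f"{base_url}/vector-indices/{index_id}/retrieve",
                     headers=headers,
                     json={"query": "quarterly revenue trends"}, timeout=30)
resp.raise_for_status()
for doc in resp.json().get("documents", []):
    print(doc)
```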
Step 8: Run the script and inspect the response:

```bash
python agent_orchestrator.py
```

The script will send your question to the agent, receive a run_id for tracking the asynchronous execution, poll until the run completes, and print the parsed answer along with any citations.

Many guides and blog posts incorrectly use the Chat Completions API for agent interactions. Here's why the Runs API approach is correct:
Aspect | Correct Approach (Runs API) | Incorrect Approach (Chat Completions API)
---|---|---
API Endpoint | /api/v1/orchestrate/runs | /api/v1/orchestrate/{agent_id}/chat/completions
Execution Model | Asynchronous with polling | Synchronous (immediate response)
Agent Features | Supports tools, knowledge bases, complex workflows | Direct LLM interaction only
Response Structure | Rich metadata with citations, tool usage | Simple text response
Use Case | Production agent orchestration | Simple chatbot without agent capabilities
Correct Approach: Agent runs are asynchronous because they need time to execute tools, access knowledge bases, and perform complex workflows. The polling mechanism waits for the agent to complete its work.
Incorrect Approach: Expects immediate responses, which won't work for complex agent operations that require time to complete.
Correct (Runs API):

```json
{
  "agent_id": "your_agent_id",
  "message": {
    "role": "user",
    "content": "Your question here"
  }
}
```
Incorrect (Chat Completions API):

```json
{
  "messages": [
    {
      "role": "human",
      "content": "Your question here"
    }
  ],
  "additional_parameters": {
    "max_tokens": 1000
  }
}
```
Correct Approach: Handles complex nested response structures that include citations, tool usage, and other agent-specific metadata.
Incorrect Approach: Expects simple chat completion responses without agent-specific features.
This local REST API flow gives you complete control over agent orchestration: no cloud dependencies, fully interactive and testable. The key is using the Runs API for agent interactions rather than the Chat Completions API.
This guide demonstrates the correct approach for working with watsonx Orchestrate agents via REST API.
I'm Justin Townsend, a technology enthusiast and developer advocate with a passion for cloud-native development and AI orchestration. My work focuses on empowering developers with practical knowledge and hands-on experience in emerging technologies.
With expertise in distributed systems, API development, and artificial intelligence, I specialize in creating comprehensive guides that bridge the gap between complex technical concepts and real-world implementation. My goal is to provide developers with the tools and knowledge they need to build robust, scalable applications in the cloud.
When I'm not coding or writing technical content, you'll find me exploring new technologies, contributing to open-source projects, and sharing insights with the developer community. I believe in the power of knowledge sharing and collaborative learning to drive innovation in the tech industry.
The views and opinions expressed in this guide are my own and do not necessarily reflect those of any organization or company.