This blog post demonstrates how to properly use the REST API with the local watsonx Orchestrate server for agent orchestration. Unlike other guides that focus on direct LLM interactions, this post shows the correct approach for working with agents that have tools, knowledge bases, and complex workflows.
The post guides you step-by-step from configuring your local watsonx Orchestrate server to invoking agents via REST API using Python. We'll cover environment setup, authentication, agent discovery, and the crucial asynchronous execution pattern that makes agent orchestration work.
This post demonstrates a complete flow from server startup to agent invocation:
Step 1: Choose your preferred environment setup method and activate your Python virtual environment:

```bash
source ./.venv/bin/activate
```

Step 2: Start the server. It boots up, connecting local credentials, containers, and runtimes:

```bash
orchestrate server start --env-file .env
```
Step 3: Activate the local environment. This ensures that your client targets the local server configuration:

```bash
orchestrate env activate local
```
Step 4: When you inspect Docker Compose on your local machine, you'll find that the watsonx Orchestrate server runs on port 4321 by default. The Swagger API explorer is available at http://localhost:4321/docs, and the REST API base URL is:

http://localhost:4321/api/v1/orchestrate
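Before writing any client code, you can confirm the server is reachable. A minimal sketch, assuming the default port above:

```python
import requests

# Quick reachability check against the local server's Swagger UI.
# Assumes the default port 4321 from the step above.
resp = requests.get("http://localhost:4321/docs", timeout=5)
print(f"Server reachable, HTTP status: {resp.status_code}")
```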
Step 5: For local REST API calls, we need a Bearer token. We can extract it from the orchestrate cache:

```bash
grep -R "wxo_mcsp_token" ~/.cache/orchestrate/credentials.yaml
```

Windows Command Prompt alternative:

```cmd
findstr "wxo_mcsp_token" %USERPROFILE%\.cache\orchestrate\credentials.yaml
```
The file ~/.cache/orchestrate/credentials.yaml contains authentication information:

```yaml
auth:
  local:
    wxo_mcsp_token: YOUR_TOKEN_HERE
```
Copy the token value for use in your API calls.
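Alternatively, you can load the token programmatically. A minimal sketch, assuming the credentials.yaml layout shown above (the auth → local → wxo_mcsp_token nesting comes from that snippet):

```python
from pathlib import Path

import yaml  # pip install pyyaml

# Read the token from the orchestrate credentials cache.
# Assumes the auth -> local -> wxo_mcsp_token layout shown above.
creds_path = Path.home() / ".cache" / "orchestrate" / "credentials.yaml"
with creds_path.open() as f:
    creds = yaml.safe_load(f)

token = creds["auth"]["local"]["wxo_mcsp_token"]
print(f"Token loaded ({len(token)} characters)")
```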
Step 6: List your agents and get the ID for the agent you want to invoke:

```bash
orchestrate agents list
```
Example output:
Name | Description | LLM | Style | Collaborators | Tools | Knowledge Base | ID
---|---|---|---|---|---|---|---
DataAnalysisAgent | Elasticsearch Data Analysis Agent | watsonx/meta-llama/llama-3-2-90b-vision-instruct | default | | elasticsearch_tool | | b700b57a-29a1-40e1-b895-adf9eba4f907
Note your agent ID for the next step. In this example, the ID is b700b57a-29a1-40e1-b895-adf9eba4f907.
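If you want to grab the ID programmatically rather than copying it by hand, one option is to shell out to the same CLI command and split the pipe-separated output. A rough sketch, assuming the tabular format shown above stays stable:

```python
import subprocess

# Run the same CLI command and pick out the ID column for a named agent.
# Parsing the pipe-separated table is fragile and assumes the output
# format shown above; prefer the REST API for production tooling.
result = subprocess.run(
    ["orchestrate", "agents", "list"],
    capture_output=True, text=True, check=True,
)

agent_id = None
for line in result.stdout.splitlines():
    cells = [c.strip() for c in line.split("|")]
    if cells and cells[0] == "DataAnalysisAgent":
        agent_id = cells[-1] or cells[-2]  # ID is the last populated column
        break

print("Agent ID:", agent_id)
```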
Step 7: Create a Python file called agent_orchestrator.py with the following code. This implementation correctly uses the Runs API for agent orchestration:
```python
import requests
import json
import time

# Configuration
token = "your_wxo_mcsp_token_here"
agent_id = "your_agent_id_here"
base_url = "http://localhost:4321/api/v1/orchestrate"

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}


def get_run_result(run_id, max_attempts=30, delay=2):
    """Poll for the run result until it's complete"""
    get_url = f"{base_url}/runs/{run_id}"
    for attempt in range(max_attempts):
        try:
            resp = requests.get(get_url, headers=headers, timeout=10)
            resp.raise_for_status()
            run_data = resp.json()
            status = run_data.get("status", "unknown")
            print(f"Run status (attempt {attempt + 1}): {status}")
            if status == "completed":
                return run_data
            elif status in ["failed", "cancelled"]:
                return {"error": f"Run {status}", "details": run_data}
            time.sleep(delay)
        except Exception as e:
            print(f"Error checking run status: {e}")
            time.sleep(delay)
    return {"error": "Timeout waiting for run to complete"}


def call_orchestrate(question):
    """Invoke an agent using the Runs API"""
    chat_url = f"{base_url}/runs"
    payload = {
        "agent_id": agent_id,
        "message": {
            "role": "user",
            "content": question
        }
    }
    print("Sending payload:", json.dumps(payload, indent=2))
    resp = requests.post(chat_url, headers=headers, json=payload, timeout=10)
    resp.raise_for_status()
    data = resp.json()
    print("Initial API response:", json.dumps(data, indent=2))

    # Get the actual run result
    run_id = data.get("run_id")
    if run_id:
        print(f"Waiting for run {run_id} to complete...")
        run_result = get_run_result(run_id)
        print("Final run result:", json.dumps(run_result, indent=2))
        return run_result
    else:
        return {"error": "No run_id in response", "response": data}


def parse_agent_response(response):
    """Parse the agent response to extract answer and citations"""
    try:
        result_data = response.get('result', {}).get('data', {}).get('message', {})
        content = result_data.get('content', [])
        if content and len(content) > 0:
            answer_text = content[0].get('text', 'No answer found')
            citations = content[0].get('citations', [])
            print("Agent Response:", answer_text)
            if citations:
                print("\nSources:")
                for i, citation in enumerate(citations, 1):
                    title = citation.get('title', 'Unknown source')
                    url = citation.get('url', '')
                    # Truncate long URLs for readable console output
                    if len(url) > 60:
                        display_url = url[:60] + "..."
                    else:
                        display_url = url
                    print(f"  [{i}] {title}")
                    if display_url:
                        print(f"      {display_url}")
        else:
            print("Agent: No content found in response")
    except Exception as e:
        print(f"Agent: Error parsing response - {e}")
        print("Raw response:", response)


# Example usage
if __name__ == "__main__":
    question = "Analyze the recent sales data from our Elasticsearch database"
    response = call_orchestrate(question)
    parse_agent_response(response)
```
Important: This implementation uses the Runs API (/runs), which is the correct endpoint for agent orchestration. Other guides may incorrectly use the Chat Completions API (/chat/completions), which is designed for direct LLM interactions, not agent orchestration.
Many guides incorrectly suggest using the Chat Completions API for agent interactions. Here's an example of the wrong approach that you should avoid:
```python
import requests

agent_id = "your_agent_id"
token = "your_wxo_mcsp_token"
url = f"http://localhost:4321/api/v1/orchestrate/{agent_id}/chat/completions"

payload = {
    "messages": [
        {
            "role": "human",
            "content": "Analyze the recent sales data from our Elasticsearch database"
        }
    ],
    "additional_parameters": {},
    "context": {},
    "stream": False
}

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}

# Invoke the agent
response = requests.post(url, json=payload, headers=headers)
print(f"Agent response:\n{response.text}\n")
```
Why This Approach Fails: it sends the request to /chat/completions instead of /runs, so the call goes straight to the LLM and bypasses the agent's tools, knowledge bases, and asynchronous execution.
When to Use Chat Completions API: Only for direct LLM interactions without agent capabilities - simple chatbots, text generation, or basic conversation flows that don't require tools or knowledge bases.
When to Use Runs API: For any agent that has tools, knowledge bases, complex workflows, or requires asynchronous processing - which is most production use cases.
The watsonx Orchestrate Developer Edition provides several API endpoints for different use cases. Here's a comprehensive overview of each endpoint:
Method | Endpoint | Description | Use Case
---|---|---|---
POST | /api/v1/orchestrate/runs/stream | Chat With Orchestrate Agent As Stream: real-time streaming responses from agents with immediate feedback | Interactive chat interfaces, real-time applications, live agent interactions
GET | /api/v1/orchestrate/runs | List Orchestrate Agent Runs: retrieve a list of all agent run sessions with pagination support | Monitoring agent activity, audit trails, run history analysis
POST | /api/v1/orchestrate/runs | Chat With Orchestrate Agent: standard asynchronous agent invocation (what we used in our example) | Production applications, complex workflows, tools and knowledge base access
POST | /api/v1/orchestrate/upload/s3 | Upload To S3: upload files to S3 storage for agent processing | File processing, document analysis, data ingestion workflows
POST | /api/v1/orchestrate/runs/{run_id}/cancel | Cancel An Orchestrate Agent Run: stop an ongoing agent execution | Resource management, timeout handling, user-initiated cancellations
GET | /api/v1/orchestrate/runs/{run_id} | Get Orchestrate Agent Run: retrieve detailed information about a specific run | Run status checking, result retrieval, debugging agent executions
GET | /api/v1/orchestrate/runs/{run_id}/events | Get Orchestrate Agent Run Events: retrieve detailed execution events and logs | Debugging, execution tracing, performance analysis
POST | /api/v1/orchestrate/agents/{agent_id}/chat/completions | Chat With Agents: direct LLM interaction without agent orchestration (avoid this for agents with tools) | Simple chatbots, direct LLM access, basic text generation
Key Takeaway: For agents with tools, knowledge bases, or complex workflows, use the POST /api/v1/orchestrate/runs endpoint (as demonstrated in our example). The streaming endpoint is great for real-time interactions, while the direct chat endpoint should only be used for simple LLM interactions without agent capabilities.
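As a sketch of the streaming variant: the /runs/stream endpoint is listed above, but the wire format of the streamed events isn't documented in this post, so treat the line-by-line parsing below as an assumption (many streaming APIs emit newline-delimited JSON or server-sent events):

```python
import json

import requests

token = "your_wxo_mcsp_token_here"
agent_id = "your_agent_id_here"
stream_url = "http://localhost:4321/api/v1/orchestrate/runs/stream"
headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

payload = {
    "agent_id": agent_id,
    "message": {"role": "user", "content": "Summarize today's sales activity"},
}

# Stream the response; the newline-delimited JSON parsing below is an
# assumption about the event format, not confirmed by this guide.
with requests.post(stream_url, headers=headers, json=payload,
                   stream=True, timeout=60) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        try:
            event = json.loads(line.decode("utf-8"))
            print("event:", event)
        except json.JSONDecodeError:
            # Fall back to raw output (e.g. SSE "data: ..." frames)
            print("raw:", line.decode("utf-8"))
```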
The Orchestrate API supports rich message structures and advanced parameters for sophisticated agent interactions. Understanding these components is crucial for building production applications.
Basic Message Structure:
```json
{
  "message": {
    "role": "user",
    "content": [
      {
        "response_type": "conversational_search"
      }
    ],
    "mentions": [
      {
        "type": "document",
        "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
        "name": "Sales Report 2024"
      }
    ],
    "document_ids": [
      "3c90c3cc-0d44-4b50-8888-8dd25736052a"
    ],
    "parent_message_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "additional_properties": {
      "wxa_message": {},
      "display_properties": {
        "skip_render": false,
        "is_async": false
      },
      "tool_calls": [
        {}
      ],
      "tool_call_id": "tool_123",
      "tool_name": "elasticsearch_tool",
      "wxo_connection_status": {
        "connection_status": "connected",
        "connection_message": "Successfully connected to Elasticsearch"
      }
    },
    "assistant_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "context": {
      "values": []
    },
    "step_history": [
      {}
    ],
    "message_state": {}
  }
}
```
Component | Purpose | Example
---|---|---
mentions | Reference specific documents, tools, or entities in the conversation | Document references, tool invocations, user mentions
document_ids | Specify which documents the agent should consider | Knowledge base documents, uploaded files, reports
additional_properties | Extended metadata for tool calls, display options, and status | Tool execution details, rendering preferences, connection status
llm_params | Control LLM behavior and response generation | Temperature, token limits, sampling parameters
guardrails | Content filtering and safety controls | HAP, social bias, PII detection
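To ground these components, here's a sketch of a run payload whose message pins the agent to a specific document, reusing the agent_id + message shape from the Runs API example. The document ID and name are placeholders, and whether the local build honors every field is worth verifying against the Swagger UI:

```python
# Sketch: a Runs API payload whose message references a specific document.
# The field names come from the message structure shown above; the IDs are
# hypothetical placeholders.
doc_id = "3c90c3cc-0d44-4b50-8888-8dd25736052a"

payload = {
    "agent_id": "your_agent_id_here",
    "message": {
        "role": "user",
        "content": "Summarize the attached sales report",
        "mentions": [
            {"type": "document", "id": doc_id, "name": "Sales Report 2024"}
        ],
        "document_ids": [doc_id],
    },
}
```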
Common LLM Parameters:
"llm_params": {
"temperature": 0.7, // Controls randomness (0.0-1.0)
"max_new_tokens": 1000, // Maximum response length
"top_p": 0.9, // Nucleus sampling parameter
"top_k": 50, // Top-k sampling
"repetition_penalty": 1.1, // Prevents repetitive text
"stop_sequences": ["\n\n", "END"], // Stop generation at these tokens
"time_limit": 30, // Maximum processing time (seconds)
"return_options": {
"generated_tokens": true, // Include token-level details
"input_tokens": true, // Include input tokenization
"token_logprobs": true // Include probability scores
}
}
Guardrails Configuration:
"guardrails": {
"hap": { // Harmful content detection
"input": {
"enabled": true,
"threshold": 0.8
},
"output": {
"enabled": true,
"threshold": 0.8
}
},
"social_bias": { // Bias detection
"input": {
"enabled": true,
"threshold": 0.7
},
"output": {
"enabled": true,
"threshold": 0.7
}
},
"pii": { // Personal data protection
"input": {
"enabled": true,
"threshold": 0.9
},
"output": {
"enabled": true,
"threshold": 0.9
}
}
}
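Putting the knobs together, here's a sketch of a run payload carrying both blocks. Where exactly the local API expects llm_params and guardrails (top level versus inside the message) isn't pinned down in this guide, so placing them alongside agent_id is an assumption:

```python
# Sketch: attach llm_params and guardrails to a run request.
# Their placement at the top level of the payload is an assumption;
# verify against the Swagger UI at http://localhost:4321/docs.
payload = {
    "agent_id": "your_agent_id_here",
    "message": {"role": "user", "content": "Draft a customer-facing summary"},
    "llm_params": {
        "temperature": 0.3,    # keep the summary deterministic
        "max_new_tokens": 500,
    },
    "guardrails": {
        "pii": {
            "input": {"enabled": True, "threshold": 0.9},
            "output": {"enabled": True, "threshold": 0.9},
        }
    },
}
```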
Orchestrate provides comprehensive APIs for managing documents and knowledge bases that agents can access during conversations.
Method | Endpoint | Description | Use Case
---|---|---|---
GET | /api/v1/orchestrate/documents/collections | Fetch Document Collections: list all document collections | Browse available knowledge bases, organize documents
POST | /api/v1/orchestrate/documents/collections | Create Document Collection: create a new document collection | Set up new knowledge bases, organize related documents
DELETE | /api/v1/orchestrate/documents/collections/{id} | Delete Document Collection: remove a document collection | Clean up unused knowledge bases, data management
GET | /api/v1/orchestrate/documents/collections/{id} | Get Document Collection By Id: retrieve specific collection details | Inspect collection metadata, verify contents
PATCH | /api/v1/orchestrate/documents/collections/{id} | Update Document Collection: modify collection properties | Update collection metadata, change settings
Method | Endpoint | Description | Use Case
---|---|---|---
GET | /api/v1/orchestrate/documents | List Documents: retrieve all documents | Browse available documents, inventory management
POST | /api/v1/orchestrate/documents | Create Document: create a new document entry | Add new knowledge base entries, document metadata
GET | /api/v1/orchestrate/documents/{id} | Get Document By Id: retrieve a specific document | Access document content, verify information
DELETE | /api/v1/orchestrate/documents/{id} | Delete Document By Id: remove a document | Clean up outdated information, data management
PATCH | /api/v1/orchestrate/documents/{id} | Update Document: modify document properties | Update document metadata, correct information
POST | /api/v1/orchestrate/documents/upload | Upload Document: upload document content | Add new documents to knowledge base, file ingestion
GET | /api/v1/orchestrate/documents/{id}/download | Download Document: retrieve document content | Export documents, backup content, content verification
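As a sketch of how these endpoints compose: create a collection, then upload a document into it. The endpoints are the ones listed above, but the request body fields (the name, the multipart file upload, the collection_id link) are assumptions about the local API's schema; check the Swagger UI for the exact shapes:

```python
import requests

token = "your_wxo_mcsp_token_here"
base_url = "http://localhost:4321/api/v1/orchestrate"
headers = {"Authorization": f"Bearer {token}"}

# 1. Create a collection to hold related documents.
#    The {"name": ...} body is an assumed schema.
resp = requests.post(
    f"{base_url}/documents/collections",
    headers=headers,
    json={"name": "sales-reports"},
    timeout=10,
)
resp.raise_for_status()
collection = resp.json()
print("Created collection:", collection)

# 2. Upload a document; the multipart form upload and the collection link
#    are assumptions about the endpoint's contract.
with open("sales_report_2024.pdf", "rb") as f:
    resp = requests.post(
        f"{base_url}/documents/upload",
        headers=headers,
        files={"file": f},
        data={"collection_id": collection.get("id", "")},
        timeout=30,
    )
resp.raise_for_status()
print("Uploaded document:", resp.json())
```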
Vector indices enable semantic search and retrieval from large document collections, making knowledge bases more powerful.
Method | Endpoint | Description | Use Case
---|---|---|---
GET | /api/v1/orchestrate/vector-indices | List Vector Indices: retrieve all vector indices | Browse available semantic search indices
POST | /api/v1/orchestrate/vector-indices | Create Vector Index: create a new semantic search index | Set up semantic search for document collections
DELETE | /api/v1/orchestrate/vector-indices/{id} | Delete Vector Index: remove a vector index | Clean up unused semantic search indices
GET | /api/v1/orchestrate/vector-indices/{id} | Get Vector Index By Id: retrieve specific index details | Inspect index configuration, verify settings
PATCH | /api/v1/orchestrate/vector-indices/{id} | Update Vector Index: modify index properties | Update index settings, change search parameters
GET | /api/v1/orchestrate/vector-indices/{id}/collections | Get Collections In Vector Index: list collections in an index | See which document collections are indexed
PUT | /api/v1/orchestrate/vector-indices/{id}/collections | Add Collection To Vector Index: add documents to semantic search | Index new document collections for semantic search
PUT | /api/v1/orchestrate/vector-indices/{id}/refresh | Refresh Vector Index: update index with new documents | Sync index with updated document collections
PUT | /api/v1/orchestrate/vector-indices/{id}/rebuild | Rebuild Vector Index: completely rebuild the index | Major updates, index optimization, configuration changes
POST | /api/v1/orchestrate/vector-indices/{id}/retrieve | Retrieve Documents From The Vector Index: semantic search query | Find relevant documents using natural language queries
Integration Tip: Use document collections and vector indices together to create powerful knowledge bases. Upload documents to collections, then add those collections to vector indices for semantic search capabilities. Agents can then reference specific documents or perform semantic searches during conversations.
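Here's a sketch of that workflow end to end using the endpoints above; again, the JSON bodies (the index name, the collection_ids list, the query field) are assumed shapes, not documented contracts:

```python
import requests

token = "your_wxo_mcsp_token_here"
base_url = "http://localhost:4321/api/v1/orchestrate"
headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

# 1. Create a vector index (the {"name": ...} body is an assumed schema).
resp = requests.post(f"{base_url}/vector-indices", headers=headers,
                     json={"name": "sales-search"}, timeout=10)
resp.raise_for_status()
index_id = resp.json().get("id")

# 2. Add a document collection to the index so its documents get embedded.
#    The collection_ids field is an assumption about the request body.
resp = requests.put(f"{base_url}/vector-indices/{index_id}/collections",
                    headers=headers,
                    json={"collection_ids": ["your_collection_id_here"]},
                    timeout=30)
resp.raise_for_status()

# 3. Run a semantic search against the index; the query field and the
#    shape of the result list are likewise assumptions.
resp = requests.post(f"{base_url}/vector-indices/{index_id}/retrieve",
                     headers=headers,
                     json={"query": "quarterly revenue trends"}, timeout=30)
resp.raise_for_status()
for doc in resp.json().get("documents", []):
    print(doc)
```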
Step 8: Run the script and inspect the response:

```bash
python agent_orchestrator.py
```

The script will send your question to the agent, receive a run_id for tracking the asynchronous execution, poll until the run completes, and print the parsed answer along with any citations.

Many guides and blog posts incorrectly use the Chat Completions API for agent interactions. Here's why the Runs API approach is correct:
Aspect | Correct Approach (Runs API) | Incorrect Approach (Chat Completions API)
---|---|---
API Endpoint | /api/v1/orchestrate/runs | /api/v1/orchestrate/{agent_id}/chat/completions
Execution Model | Asynchronous with polling | Synchronous (immediate response)
Agent Features | Supports tools, knowledge bases, complex workflows | Direct LLM interaction only
Response Structure | Rich metadata with citations, tool usage | Simple text response
Use Case | Production agent orchestration | Simple chatbot without agent capabilities
Correct Approach: Agent runs are asynchronous because they need time to execute tools, access knowledge bases, and perform complex workflows. The polling mechanism waits for the agent to complete its work.
Incorrect Approach: Expects immediate responses, which won't work for complex agent operations that require time to complete.
Correct (Runs API):

```json
{
  "agent_id": "your_agent_id",
  "message": {
    "role": "user",
    "content": "Your question here"
  }
}
```
Incorrect (Chat Completions API):

```json
{
  "messages": [
    {
      "role": "human",
      "content": "Your question here"
    }
  ],
  "additional_parameters": {
    "max_tokens": 1000
  }
}
```
Correct Approach: Handles complex nested response structures that include citations, tool usage, and other agent-specific metadata.
Incorrect Approach: Expects simple chat completion responses without agent-specific features.
This local REST API flow gives you complete control over agent orchestration: no cloud dependencies, fully interactive and testable. The key is using the Runs API for agent interactions rather than the Chat Completions API.
This guide demonstrates the correct approach for working with watsonx Orchestrate agents via REST API.
I'm Justin Townsend, a technology enthusiast and developer advocate with a passion for cloud-native development and AI orchestration. My work focuses on empowering developers with practical knowledge and hands-on experience in emerging technologies.
With expertise in distributed systems, API development, and artificial intelligence, I specialize in creating comprehensive guides that bridge the gap between complex technical concepts and real-world implementation. My goal is to provide developers with the tools and knowledge they need to build robust, scalable applications in the cloud.
When I'm not coding or writing technical content, you'll find me exploring new technologies, contributing to open-source projects, and sharing insights with the developer community. I believe in the power of knowledge sharing and collaborative learning to drive innovation in the tech industry.
The views and opinions expressed in this guide are my own and do not necessarily reflect those of any organization or company.