Mastering watsonx Orchestrate REST APIs


REST API Usage with the watsonx Orchestrate Developer Edition: A Complete Guide to Agent Orchestration

This blog post demonstrates how to properly use the REST API with the local watsonx Orchestrate server for agent orchestration. Unlike other guides that focus on direct LLM interactions, this post shows the correct approach for working with agents that have tools, knowledge bases, and complex workflows.

The post guides you step-by-step from configuring your local watsonx Orchestrate server to invoking agents via REST API using Python. We'll cover environment setup, authentication, agent discovery, and the crucial asynchronous execution pattern that makes agent orchestration work.

What's Covered in This Example

This post demonstrates a complete flow from server startup to agent invocation:

  1. Preparing your local environment
  2. Launching the watsonx Orchestrate server
  3. Activating the ADK shell
  4. Determining the local port and API endpoints
  5. Extracting your Bearer token
  6. Listing your agents and getting the agent ID
  7. Coding a Python script to invoke agents properly
  8. Running the script and handling asynchronous responses

Step 1: Environment Setup

Choose your preferred environment setup method:

Setup with a MyIBM Entitlement Key

Use this option if you have an entitlement key from MyIBM and want to use watsonx.ai as the model source.

Create a .env file with the following variables:

WO_DEVELOPER_EDITION_SOURCE=myibm
WO_ENTITLEMENT_KEY=<YOUR_ENTITLEMENT_KEY>
WATSONX_APIKEY=<YOUR_WATSONX_API_KEY>
WATSONX_SPACE_ID=<YOUR_SPACE_ID>

Note: Setting WO_DEVELOPER_EDITION_SOURCE=myibm indicates that you're using an entitlement key from MyIBM and accessing watsonx.ai as the model source.

Setup with an IBM Cloud watsonx Orchestrate Instance

Use this option to connect to your IBM Cloud watsonx Orchestrate instance. You'll need both the service instance URL and an API key.

Important: Don't use the credentials from the IBM Cloud resources page. Follow the procedure below to get the appropriate API key and service instance URL.

Getting Your IBM Cloud Credentials

  1. Log in to your watsonx Orchestrate instance
  2. Click your user icon on the top right and click Settings
  3. Go to the API details tab
  4. Copy the service instance URL
  5. Click the Generate API key button
  6. Generate an API key: The page redirects you to the IBM Cloud Identity and Access Management (IAM) console. Click Create to create a new API key
  7. Enter a name and description for your API Key
  8. Copy the API key and store it in a safe vault

Adding and Activating Your Environment

Use the ADK CLI to add and activate your environment:

orchestrate env add -n my-name -u https://my-service-instance-url --type ibm_iam --activate

Or with explicit API key:

orchestrate env add -n <environment-name> -u https://my-service-instance-url --api-key <my-api-key> --type ibm_iam --activate

Activating Your Environment

Run the following command to activate the environment you created:

orchestrate env activate <environment-name>

Note: You can also activate a local development environment. This environment is provided by the watsonx Orchestrate Developer Edition, a stripped-down version of watsonx Orchestrate that runs in Docker containers as a development server.

Starting with version 1.5.0, you can use your watsonx Orchestrate account to pull the Developer Edition images, and you no longer need an entitlement key to pull them.

Use the following variables to pull watsonx Orchestrate Developer Edition images using the watsonx Orchestrate account:

WO_DEVELOPER_EDITION_SOURCE=orchestrate
WO_INSTANCE=<service_instance_url>
WO_API_KEY=<wxo_api_key>

Note: If image pulling fails, try setting WO_DEVELOPER_EDITION_SOURCE to myibm and add the WO_ENTITLEMENT_KEY variable.

Optional: Skip Login After Pulling Images

After pulling the images, you can optionally add the WO_DEVELOPER_EDITION_SKIP_LOGIN variable and set it to true to skip ICR login. The CLI won't pull new images, but you can still use the ones already available.

Pull Images and Start Server

With your local variables configured in the .env file, use the following command with the ADK to automatically pull the watsonx Orchestrate Developer Edition images and start the containers:

orchestrate server start -e <path-.env-file>

Use watsonx Orchestrate Developer Edition with ADK

After installing watsonx Orchestrate Developer Edition, manage it with ADK commands. With these commands, you can:

  • Start or stop the watsonx Orchestrate Developer Edition server
  • Activate your local environment
  • Launch the local UI
  • Reset the server
  • View server logs


Step 2: Server Startup

The server boots up, initializing local credentials, containers, and runtimes.

source ./.venv/bin/activate
orchestrate server start --env-file .env

Step 3: ADK Activation

This ensures that your client targets the local server configuration.

orchestrate env activate local

Step 4: Port Configuration

When you inspect the docker compose configuration on your local machine, you'll find that the watsonx Orchestrate server runs on port 4321 by default. The Swagger API explorer is available at http://localhost:4321/docs.

Step 5: Authentication

For local REST API calls, we need a Bearer token. We can extract it from the orchestrate cache:

grep -R "wxo_mcsp_token" ~/.cache/orchestrate/credentials.yaml

Windows Command Prompt Alternative:

findstr "wxo_mcsp_token" %USERPROFILE%\.cache\orchestrate\credentials.yaml

The file ~/.cache/orchestrate/credentials.yaml contains authentication information:

auth:
  local:
    wxo_mcsp_token: YOUR_TOKEN_HERE

Copy the token value for use in your API calls.
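If you'd rather not copy the token by hand, it can be read programmatically. Here's a minimal sketch using only the standard library, assuming the flat credentials layout shown above; if the file structure changes, switch to a proper YAML parser instead:

```python
from pathlib import Path

def read_local_token(path=Path.home() / ".cache/orchestrate/credentials.yaml"):
    """Extract the wxo_mcsp_token value by scanning the credentials file.

    Avoids a PyYAML dependency by looking for the key directly.
    """
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line.startswith("wxo_mcsp_token:"):
            return line.split(":", 1)[1].strip()
    raise KeyError(f"wxo_mcsp_token not found in {path}")
```

You can then pass the returned value straight into the Authorization header in the scripts below.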

Step 6: Agent Discovery

List your agents and get the ID of the agent you want to invoke:

orchestrate agents list

Example output:

| Name | Description | LLM | Style | Collaborators | Tools | Knowledge Base | ID |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DataAnalysisAgent | Elasticsearch Data Analysis Agent | watsonx/meta-llama/llama-3-2-90b-vision-instruct | default | | elasticsearch_tool | | b700b57a-29a1-40e1-b895-adf9eba4f907 |

Note your agent ID for the next step. In this example, the ID is b700b57a-29a1-40e1-b895-adf9eba4f907.

Step 7: Python Implementation

Create a Python file called agent_orchestrator.py with the following code. This implementation correctly uses the Runs API for agent orchestration:

import requests
import json
import time

# Configuration
token = "your_wxo_mcsp_token_here"
agent_id = "your_agent_id_here"
base_url = "http://localhost:4321/api/v1/orchestrate"

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}

def get_run_result(run_id, max_attempts=30, delay=2):
    """Poll for the run result until it's complete"""
    get_url = f"{base_url}/runs/{run_id}"
    
    for attempt in range(max_attempts):
        try:
            resp = requests.get(get_url, headers=headers, timeout=10)
            resp.raise_for_status()
            run_data = resp.json()
            
            status = run_data.get("status", "unknown")
            print(f"Run status (attempt {attempt + 1}): {status}")
            
            if status == "completed":
                return run_data
            elif status in ["failed", "cancelled"]:
                return {"error": f"Run {status}", "details": run_data}
            
            time.sleep(delay)
            
        except Exception as e:
            print(f"Error checking run status: {e}")
            time.sleep(delay)
    
    return {"error": "Timeout waiting for run to complete"}

def call_orchestrate(question):
    """Invoke an agent using the Runs API"""
    chat_url = f"{base_url}/runs"
    
    payload = {
        "agent_id": agent_id,
        "message": {
            "role": "user",
            "content": question
        }
    }

    print("Sending payload:", json.dumps(payload, indent=2))
    resp = requests.post(chat_url, headers=headers, json=payload, timeout=10)
    resp.raise_for_status()
    data = resp.json()
    print("Initial API response:", json.dumps(data, indent=2))
    
    # Get the actual run result
    run_id = data.get("run_id")
    if run_id:
        print(f"Waiting for run {run_id} to complete...")
        run_result = get_run_result(run_id)
        print("Final run result:", json.dumps(run_result, indent=2))
        return run_result
    else:
        return {"error": "No run_id in response", "response": data}

def parse_agent_response(response):
    """Parse the agent response to extract answer and citations"""
    try:
        result_data = response.get('result', {}).get('data', {}).get('message', {})
        content = result_data.get('content', [])
        
        if content and len(content) > 0:
            answer_text = content[0].get('text', 'No answer found')
            citations = content[0].get('citations', [])
            
            print("Agent Response:", answer_text)
            
            if citations:
                print("\nSources:")
                for i, citation in enumerate(citations, 1):
                    title = citation.get('title', 'Unknown source')
                    url = citation.get('url', '')
                    if len(url) > 60:
                        display_url = url[:60] + "..."
                    else:
                        display_url = url
                    print(f"  [{i}] {title}")
                    if display_url:
                        print(f"      {display_url}")
        else:
            print("Agent: No content found in response")
    except Exception as e:
        print(f"Agent: Error parsing response - {e}")
        print("Raw response:", response)

# Example usage
if __name__ == "__main__":
    question = "Analyze the recent sales data from our Elasticsearch database"
    response = call_orchestrate(question)
    parse_agent_response(response)

Important: This implementation uses the Runs API (/runs) which is the correct endpoint for agent orchestration. Other guides may incorrectly use the Chat Completions API (/chat/completions) which is designed for direct LLM interactions, not agent orchestration.

❌ What NOT to Do: Chat Completions API (Incorrect Approach)

Many guides incorrectly suggest using the Chat Completions API for agent interactions. Here's an example of the wrong approach that you should avoid:

import requests

agent_id = "your_agent_id"
token = "your_wxo_mcsp_token"
url = f"http://localhost:4321/api/v1/orchestrate/{agent_id}/chat/completions"

payload = {
    "messages": [
        {
            "role": "human",
            "content": "Analyze the recent sales data from our Elasticsearch database"
        }
    ],
    "additional_parameters": {},
    "context": {},
    "stream": False
}

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}

# Invoke the agent
response = requests.request("POST", url, json=payload, headers=headers)
print(f"Agent response:\n{response.text}\n")

Why This Approach Fails:

The Chat Completions endpoint responds synchronously and bypasses agent orchestration entirely: tools are never executed, knowledge bases are never consulted, and multi-step workflows never run. You get a plain text completion without the citations, tool usage, and run metadata that the Runs API returns.

When to Use Chat Completions API: Only for direct LLM interactions without agent capabilities - simple chatbots, text generation, or basic conversation flows that don't require tools or knowledge bases.

When to Use Runs API: For any agent that has tools, knowledge bases, complex workflows, or requires asynchronous processing - which is most production use cases.

Orchestrate Agent API Reference

The watsonx Orchestrate Developer Edition provides several API endpoints for different use cases. Here's a comprehensive overview of each endpoint:

| Method | Endpoint | Description | Use Case |
| --- | --- | --- | --- |
| POST | /api/v1/orchestrate/runs/stream | Chat With Orchestrate Agent As Stream: real-time streaming responses from agents with immediate feedback | Interactive chat interfaces, real-time applications, live agent interactions |
| GET | /api/v1/orchestrate/runs | List Orchestrate Agent Runs: retrieve a list of all agent run sessions with pagination support | Monitoring agent activity, audit trails, run history analysis |
| POST | /api/v1/orchestrate/runs | Chat With Orchestrate Agent: standard asynchronous agent invocation (what we used in our example) | Production applications, complex workflows, tools and knowledge base access |
| POST | /api/v1/orchestrate/upload/s3 | Upload To S3: upload files to S3 storage for agent processing | File processing, document analysis, data ingestion workflows |
| POST | /api/v1/orchestrate/runs/{run_id}/cancel | Cancel An Orchestrate Agent Run: stop an ongoing agent execution | Resource management, timeout handling, user-initiated cancellations |
| GET | /api/v1/orchestrate/runs/{run_id} | Get Orchestrate Agent Run: retrieve detailed information about a specific run | Run status checking, result retrieval, debugging agent executions |
| GET | /api/v1/orchestrate/runs/{run_id}/events | Get Orchestrate Agent Run Events: retrieve detailed execution events and logs | Debugging, execution tracing, performance analysis |
| POST | /api/v1/orchestrate/agents/{agent_id}/chat/completions | Chat With Agents: direct LLM interaction without agent orchestration (avoid this for agents with tools) | Simple chatbots, direct LLM access, basic text generation |

Key Takeaway: For agents with tools, knowledge bases, or complex workflows, use the POST /api/v1/orchestrate/runs endpoint (as demonstrated in our example). The streaming endpoint is great for real-time interactions, while the direct chat endpoint should only be used for simple LLM interactions without agent capabilities.
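For the streaming endpoint, here's a hedged sketch of how a client might consume it with requests. The payload mirrors the Runs API body we used earlier; the newline-delimited JSON framing is an assumption, so verify the exact event format against the Swagger UI at /docs:

```python
import json
import requests

BASE_URL = "http://localhost:4321/api/v1/orchestrate"

def build_stream_request(agent_id, question):
    """Build the URL and payload for a streaming run (same body as /runs)."""
    url = f"{BASE_URL}/runs/stream"
    payload = {"agent_id": agent_id, "message": {"role": "user", "content": question}}
    return url, payload

def stream_run(token, agent_id, question):
    """POST to /runs/stream and yield parsed events as they arrive.

    Assumes newline-delimited JSON; adjust the parsing if /docs shows
    SSE-style ("data: ...") framing instead.
    """
    url, payload = build_stream_request(agent_id, question)
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    with requests.post(url, headers=headers, json=payload, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if line:
                yield json.loads(line)
```

Streaming trades the polling loop for incremental events, which suits chat UIs where partial output should appear immediately.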

Advanced Parameters and Message Structure

The Orchestrate API supports rich message structures and advanced parameters for sophisticated agent interactions. Understanding these components is crucial for building production applications.

Message Structure Components

Basic Message Structure:

{
  "message": {
    "role": "user",
    "content": [
      {
        "response_type": "conversational_search"
      }
    ],
    "mentions": [
      {
        "type": "document",
        "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
        "name": "Sales Report 2024"
      }
    ],
    "document_ids": [
      "3c90c3cc-0d44-4b50-8888-8dd25736052a"
    ],
    "parent_message_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "additional_properties": {
      "wxa_message": {},
      "display_properties": {
        "skip_render": false,
        "is_async": false
      },
      "tool_calls": [
        {}
      ],
      "tool_call_id": "tool_123",
      "tool_name": "elasticsearch_tool",
      "wxo_connection_status": {
        "connection_status": "connected",
        "connection_message": "Successfully connected to Elasticsearch"
      }
    },
    "assistant_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "context": {
      "values": []
    },
    "step_history": [
      {}
    ],
    "message_state": {}
  }
}

Key Message Components Explained

| Component | Purpose | Example |
| --- | --- | --- |
| mentions | Reference specific documents, tools, or entities in the conversation | Document references, tool invocations, user mentions |
| document_ids | Specify which documents the agent should consider | Knowledge base documents, uploaded files, reports |
| additional_properties | Extended metadata for tool calls, display options, and status | Tool execution details, rendering preferences, connection status |
| llm_params | Control LLM behavior and response generation | Temperature, token limits, sampling parameters |
| guardrails | Content filtering and safety controls | HAP, social bias, PII detection |
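To make these components concrete, here's a small helper that assembles a Runs API payload pinned to specific knowledge-base documents. The field names follow the message structure above; the document IDs are placeholders, and the minimal mentions entries (omitting the optional name) are a simplification:

```python
def build_run_payload(agent_id, question, document_ids=None):
    """Assemble a Runs API payload, optionally restricting the documents considered."""
    message = {"role": "user", "content": question}
    if document_ids:
        message["document_ids"] = list(document_ids)
        # Mirror the documents as mentions so they are referenced in the conversation
        message["mentions"] = [
            {"type": "document", "id": doc_id} for doc_id in document_ids
        ]
    return {"agent_id": agent_id, "message": message}
```

Without document_ids, the agent falls back to its full knowledge base as in the earlier example.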

LLM Parameters for Fine-tuning

Common LLM Parameters:

"llm_params": {
  "temperature": 0.7,           // Controls randomness (0.0-1.0)
  "max_new_tokens": 1000,       // Maximum response length
  "top_p": 0.9,                // Nucleus sampling parameter
  "top_k": 50,                 // Top-k sampling
  "repetition_penalty": 1.1,    // Prevents repetitive text
  "stop_sequences": ["\n\n", "END"], // Stop generation at these tokens
  "time_limit": 30,            // Maximum processing time (seconds)
  "return_options": {
    "generated_tokens": true,   // Include token-level details
    "input_tokens": true,       // Include input tokenization
    "token_logprobs": true      // Include probability scores
  }
}
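In Python, these parameters can be attached to a run payload with a small helper. This is a sketch: the validation mirrors the ranges noted above, the payload is shown with llm_params at the top level, and the exact placement should be confirmed against the /docs schema:

```python
def with_llm_params(payload, temperature=0.7, max_new_tokens=1000, **extra):
    """Return a copy of a Runs API payload with llm_params attached.

    Extra keyword arguments (e.g. top_p, top_k, stop_sequences) are
    passed through unchanged.
    """
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be in [0.0, 1.0]")
    params = {"temperature": temperature, "max_new_tokens": max_new_tokens}
    params.update(extra)
    return {**payload, "llm_params": params}
```

Returning a copy keeps the base payload reusable across runs with different sampling settings.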

Guardrails for Content Safety

Guardrails Configuration:

"guardrails": {
  "hap": {                     // Harmful content detection
    "input": {
      "enabled": true,
      "threshold": 0.8
    },
    "output": {
      "enabled": true,
      "threshold": 0.8
    }
  },
  "social_bias": {             // Bias detection
    "input": {
      "enabled": true,
      "threshold": 0.7
    },
    "output": {
      "enabled": true,
      "threshold": 0.7
    }
  },
  "pii": {                     // Personal data protection
    "input": {
      "enabled": true,
      "threshold": 0.9
    },
    "output": {
      "enabled": true,
      "threshold": 0.9
    }
  }
}
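Since each guardrail repeats the same input/output shape, a small builder keeps configurations consistent. The detector names (hap, social_bias, pii) come from the example above; this is just a convenience sketch:

```python
def guardrail(threshold):
    """Build a symmetric input/output guardrail entry with one threshold."""
    side = {"enabled": True, "threshold": threshold}
    return {"input": dict(side), "output": dict(side)}

def build_guardrails(hap=0.8, social_bias=0.7, pii=0.9):
    """Assemble the guardrails block shown above from per-detector thresholds."""
    return {
        "hap": guardrail(hap),
        "social_bias": guardrail(social_bias),
        "pii": guardrail(pii),
    }
```

Higher thresholds make a detector stricter about what it flags, so PII protection typically runs at the tightest setting.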

Document and Knowledge Base Management

Orchestrate provides comprehensive APIs for managing documents and knowledge bases that agents can access during conversations.

Document Collection APIs

| Method | Endpoint | Description | Use Case |
| --- | --- | --- | --- |
| GET | /api/v1/orchestrate/documents/collections | Fetch Document Collections: list all document collections | Browse available knowledge bases, organize documents |
| POST | /api/v1/orchestrate/documents/collections | Create Document Collection: create a new document collection | Set up new knowledge bases, organize related documents |
| DELETE | /api/v1/orchestrate/documents/collections/{id} | Delete Document Collection: remove a document collection | Clean up unused knowledge bases, data management |
| GET | /api/v1/orchestrate/documents/collections/{id} | Get Document Collection By Id: retrieve specific collection details | Inspect collection metadata, verify contents |
| PATCH | /api/v1/orchestrate/documents/collections/{id} | Update Document Collection: modify collection properties | Update collection metadata, change settings |

Individual Document APIs

| Method | Endpoint | Description | Use Case |
| --- | --- | --- | --- |
| GET | /api/v1/orchestrate/documents | List Documents: retrieve all documents | Browse available documents, inventory management |
| POST | /api/v1/orchestrate/documents | Create Document: create a new document entry | Add new knowledge base entries, document metadata |
| GET | /api/v1/orchestrate/documents/{id} | Get Document By Id: retrieve a specific document | Access document content, verify information |
| DELETE | /api/v1/orchestrate/documents/{id} | Delete Document By Id: remove a document | Clean up outdated information, data management |
| PATCH | /api/v1/orchestrate/documents/{id} | Update Document: modify document properties | Update document metadata, correct information |
| POST | /api/v1/orchestrate/documents/upload | Upload Document: upload document content | Add new documents to knowledge base, file ingestion |
| GET | /api/v1/orchestrate/documents/{id}/download | Download Document: retrieve document content | Export documents, backup content, content verification |
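Putting a few of these endpoints together, here's a hedged sketch of uploading a file and listing documents with requests. The multipart field name ("file") is an assumption; verify it against the upload schema in the Swagger UI:

```python
import requests

BASE_URL = "http://localhost:4321/api/v1/orchestrate"

def auth_headers(token):
    """Bearer auth header shared by the document calls below."""
    return {"Authorization": f"Bearer {token}"}

def upload_document(token, file_path):
    """POST a file to the document upload endpoint (multipart field assumed)."""
    with open(file_path, "rb") as fh:
        resp = requests.post(
            f"{BASE_URL}/documents/upload",
            headers=auth_headers(token),
            files={"file": fh},
            timeout=30,
        )
    resp.raise_for_status()
    return resp.json()

def list_documents(token):
    """GET all documents registered with the server."""
    resp = requests.get(f"{BASE_URL}/documents", headers=auth_headers(token), timeout=10)
    resp.raise_for_status()
    return resp.json()
```

A typical flow is upload, then list to confirm the new document's ID before attaching it to a collection.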

Vector Index Management

Vector indices enable semantic search and retrieval from large document collections, making knowledge bases more powerful.

| Method | Endpoint | Description | Use Case |
| --- | --- | --- | --- |
| GET | /api/v1/orchestrate/vector-indices | List Vector Indices: retrieve all vector indices | Browse available semantic search indices |
| POST | /api/v1/orchestrate/vector-indices | Create Vector Index: create a new semantic search index | Set up semantic search for document collections |
| DELETE | /api/v1/orchestrate/vector-indices/{id} | Delete Vector Index: remove a vector index | Clean up unused semantic search indices |
| GET | /api/v1/orchestrate/vector-indices/{id} | Get Vector Index By Id: retrieve specific index details | Inspect index configuration, verify settings |
| PATCH | /api/v1/orchestrate/vector-indices/{id} | Update Vector Index: modify index properties | Update index settings, change search parameters |
| GET | /api/v1/orchestrate/vector-indices/{id}/collections | Get Collections In Vector Index: list collections in an index | See which document collections are indexed |
| PUT | /api/v1/orchestrate/vector-indices/{id}/collections | Add Collection To Vector Index: add documents to semantic search | Index new document collections for semantic search |
| PUT | /api/v1/orchestrate/vector-indices/{id}/refresh | Refresh Vector Index: update the index with new documents | Sync the index with updated document collections |
| PUT | /api/v1/orchestrate/vector-indices/{id}/rebuild | Rebuild Vector Index: completely rebuild the index | Major updates, index optimization, configuration changes |
| POST | /api/v1/orchestrate/vector-indices/{id}/retrieve | Retrieve Documents From The Vector Index: semantic search query | Find relevant documents using natural language queries |

Integration Tip: Use document collections and vector indices together to create powerful knowledge bases. Upload documents to collections, then add those collections to vector indices for semantic search capabilities. Agents can then reference specific documents or perform semantic searches during conversations.
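Following that tip, here's a sketch of a semantic search query against a vector index. The request body fields ("query", "top_k") are assumptions based on common retrieval APIs; confirm the actual schema for the retrieve endpoint in the Swagger UI:

```python
import requests

BASE_URL = "http://localhost:4321/api/v1/orchestrate"

def build_retrieve_request(index_id, query, top_k=5):
    """Build the URL and body for a vector-index retrieve call (fields assumed)."""
    url = f"{BASE_URL}/vector-indices/{index_id}/retrieve"
    body = {"query": query, "top_k": top_k}
    return url, body

def retrieve_documents(token, index_id, query, top_k=5):
    """POST a natural-language query to the retrieve endpoint and return matches."""
    url, body = build_retrieve_request(index_id, query, top_k)
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    resp = requests.post(url, headers=headers, json=body, timeout=30)
    resp.raise_for_status()
    return resp.json()
```

This is useful for sanity-checking what an agent would retrieve from a knowledge base before wiring the index into a conversation.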

Step 8: Testing and Execution

Run the script and inspect the response:

python agent_orchestrator.py

The script will:

  1. Submit your question to the agent via the Runs API
  2. Receive a run_id for tracking the asynchronous execution
  3. Poll the run status until completion
  4. Parse and display the final response with any citations

Why This Approach Works vs. Others

Many guides and blog posts incorrectly use the Chat Completions API for agent interactions. Here's why this approach is correct:

| Aspect | Correct Approach (Runs API) | Incorrect Approach (Chat Completions API) |
| --- | --- | --- |
| API Endpoint | /api/v1/orchestrate/runs | /api/v1/orchestrate/{agent_id}/chat/completions |
| Execution Model | Asynchronous with polling | Synchronous (immediate response) |
| Agent Features | Supports tools, knowledge bases, complex workflows | Direct LLM interaction only |
| Response Structure | Rich metadata with citations, tool usage | Simple text response |
| Use Case | Production agent orchestration | Simple chatbot without agent capabilities |

Key Differences Explained

1. Asynchronous vs. Synchronous Processing

Correct Approach: Agent runs are asynchronous because they need time to execute tools, access knowledge bases, and perform complex workflows. The polling mechanism waits for the agent to complete its work.

Incorrect Approach: Expects immediate responses, which won't work for complex agent operations that require time to complete.
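The fixed-delay polling in the earlier script can be refined with exponential backoff, which reduces request load on long-running agent executions. In this sketch the status check is passed in as a callable so the loop can be exercised without a server:

```python
import time

def poll_until_done(check_status, max_attempts=10, base_delay=1.0, max_delay=30.0):
    """Call check_status() until a terminal state, backing off exponentially.

    check_status must return a dict with a 'status' key, mirroring the
    Runs API responses; 'completed', 'failed', and 'cancelled' are terminal.
    """
    delay = base_delay
    for _ in range(max_attempts):
        run = check_status()
        if run.get("status") in ("completed", "failed", "cancelled"):
            return run
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # double the wait, capped at max_delay
    return {"error": "Timeout waiting for run to complete"}
```

In the earlier script, check_status would be a closure around the GET /runs/{run_id} call.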

2. Payload Structure

Correct (Runs API):

{
    "agent_id": "your_agent_id",
    "message": {
        "role": "user",
        "content": "Your question here"
    }
}

Incorrect (Chat Completions API):

{
    "messages": [
        {
            "role": "human",
            "content": "Your question here"
        }
    ],
    "additional_parameters": {
        "max_tokens": 1000
    }
}

3. Response Handling

Correct Approach: Handles complex nested response structures that include citations, tool usage, and other agent-specific metadata.

Incorrect Approach: Expects simple chat completion responses without agent-specific features.

Summary

This local REST API flow gives you complete control over agent orchestration: no cloud dependencies, fully interactive and testable. The key is using the Runs API for agent interactions rather than the Chat Completions API.


This guide demonstrates the correct approach for working with watsonx Orchestrate agents via REST API.


About the Author

I'm Justin Townsend, a technology enthusiast and developer advocate with a passion for cloud-native development and AI orchestration. My work focuses on empowering developers with practical knowledge and hands-on experience in emerging technologies.

With expertise in distributed systems, API development, and artificial intelligence, I specialize in creating comprehensive guides that bridge the gap between complex technical concepts and real-world implementation. My goal is to provide developers with the tools and knowledge they need to build robust, scalable applications in the cloud.

When I'm not coding or writing technical content, you'll find me exploring new technologies, contributing to open-source projects, and sharing insights with the developer community. I believe in the power of knowledge sharing and collaborative learning to drive innovation in the tech industry.

The views and opinions expressed in this guide are my own and do not necessarily reflect those of any organization or company.