AI / LLM
AI / LLM Integration
Section titled “AI / LLM Integration”AI/LLM streaming module for Atmosphere. Provides @AiEndpoint, @Prompt, StreamingSession, the AgentRuntime SPI for auto-detected AI framework adapters, and a built-in OpenAiCompatibleClient that works with Gemini, OpenAI, Ollama, and any OpenAI-compatible API.
Maven Coordinates
Section titled “Maven Coordinates”<dependency> <groupId>org.atmosphere</groupId> <artifactId>atmosphere-ai</artifactId> <version>${project.version}</version></dependency>Architecture
Section titled “Architecture”Atmosphere has two pluggable SPI layers. AsyncSupport adapts web containers — Jetty, Tomcat, Undertow. AgentRuntime adapts AI frameworks across all twelve runtimes (Built-in, Spring AI, LangChain4j, Google ADK, Embabel, Koog, Semantic Kernel, AgentScope, Spring AI Alibaba, Anthropic, Cohere, CrewAI). Same design pattern, same discovery mechanism:
| Concern | Transport layer | AI layer |
|---|---|---|
| SPI interface | AsyncSupport | AgentRuntime |
| What it adapts | Web containers (Jetty, Tomcat, Undertow) | AI frameworks (all twelve AgentRuntime adapters) |
| Discovery | Classpath scanning | ServiceLoader |
| Resolution | Best available container | Highest priority() among isAvailable() |
| Initialization | init(ServletConfig) | configure(LlmSettings) |
| Core method | service(req, res) | execute(AgentExecutionContext, StreamingSession) |
| Fallback | BlockingIOCometSupport | Built-in AgentRuntime (OpenAI-compatible) |
This is the Servlet model for AI agents: write your @Agent once, run it on any supported AgentRuntime — determined by classpath.
Quick Start — @AiEndpoint
Section titled “Quick Start — @AiEndpoint”@AiEndpoint(path = "/ai/chat", systemPrompt = "You are a helpful assistant", conversationMemory = true)public class MyChatBot {
@Prompt public void onPrompt(String message, StreamingSession session) { session.stream(message); // auto-detects the best available AgentRuntime from classpath }}The @AiEndpoint annotation replaces the boilerplate of @ManagedService + @Ready + @Disconnect + @Message for AI streaming use cases. The @Prompt method runs on a virtual thread.
session.stream(message) auto-detects the best available AgentRuntime implementation via ServiceLoader — drop an adapter JAR on the classpath and it just works.
AgentRuntime SPI
Section titled “AgentRuntime SPI”The AgentRuntime SPI dispatches the entire agent loop — tool calling, memory, RAG, retries — to the AI framework on the classpath. When multiple implementations are available, the one with the highest priority() that reports isAvailable() wins.
public interface AgentRuntime { String name(); // e.g. "langchain4j", "spring-ai" boolean isAvailable(); // checks classpath dependencies int priority(); // higher wins void configure(AiConfig.LlmSettings settings); // called once after resolution Set<AiCapability> capabilities(); // feature discovery void execute(AgentExecutionContext context, StreamingSession session); // full agent loop}| Classpath JAR | Auto-detected AgentRuntime | Priority |
|---|---|---|
atmosphere-ai (default) | Built-in OpenAiCompatibleClient (Gemini, OpenAI, Ollama) | 0 |
atmosphere-spring-ai | Spring AI ChatClient | 100 |
atmosphere-langchain4j | LangChain4j StreamingChatLanguageModel | 100 |
atmosphere-adk | Google ADK Runner | 100 |
atmosphere-embabel | Embabel AgentPlatform | 100 |
atmosphere-koog | JetBrains Koog AIAgent | 100 |
atmosphere-semantic-kernel | Microsoft Semantic Kernel ChatCompletionService | 100 |
atmosphere-agentscope | Alibaba AgentScope ReActAgent | 100 |
atmosphere-spring-ai-alibaba | Spring AI Alibaba ReactAgent (see runtime caveat below) | 100 |
atmosphere-anthropic | Anthropic AnthropicMessagesClient (built-in Messages API client) (requires anthropic.api.key) | 100 |
atmosphere-cohere | Cohere CohereChatClient (built-in Chat API client) (requires cohere.api.key) | 100 |
atmosphere-crewai | CrewAI CrewAiSidecarClient (out-of-process Python sidecar over HTTP+SSE) (requires ATMOSPHERE_CREWAI_SIDECAR_URL + live /health) | 50 |
To switch runtimes, change a single Maven dependency — no code changes needed.
Spring AI Alibaba runtime — Spring Boot 3 only today. Spring AI Alibaba
1.1.2.2is compiled against Spring AI1.1.6, andspring-ai-alibaba-graph-core-1.1.2.xhardcodes Spring AI 1.1.x-only types likeDeepSeekAssistantMessage, so the runtime requires Spring AI 1.1.6. Spring AI 1.1.6 itself requires Spring Boot 3 — it pins the SB3-era FQN ofRestClientAutoConfiguration, which Spring Boot 4 ships at a renamed FQN. Dropatmosphere-spring-ai-alibabainto a Spring Boot 3 sample (e.g.samples/spring-boot-ai-chat -Pspring-boot3) and it round-trips end-to-end (verified via chrome-devtools against Ollama). A Spring Boot 4 path will become possible once Alibaba publishes a Spring AI 2.x-alignedspring-ai-alibaba-agent-framework.atmosphere-agentscopeis unaffected and works on Spring Boot 4.
Per-Request Runtime Extensions
Section titled “Per-Request Runtime Extensions”Each AgentRuntime runs its framework’s “happy path” by default. For requests
that need framework-native composition (Spring AI advisor chain, LangChain4j
AiServices, Koog graph DSL, ADK multi-agent topology), a small per-request
helper attaches the framework-native object to AgentExecutionContext.metadata()
and the runtime applies it for that one call — no AgentRuntime SPI growth, no
mutation of shared beans. Every helper follows the CacheHint pattern:
from(context) and attach(context, ...) static methods, with strict type
checking that throws IllegalArgumentException on a wrong-type slot (silent
drops would mask the override never firing).
| Helper | Runtime | Slot it drives |
|---|---|---|
SpringAiAdvisors | Spring AI | ChatClient.prompt().advisors(...) — RAG, memory, guardrails, observability (additive — multiple advisors compose into a chain) |
LangChain4jAiServices | LangChain4j | Routes through caller’s AiServices-backed interface (TokenStream callbacks bridged to session) — gives access to maxSequentialToolsInvocations, custom system message providers, etc. |
KoogStrategy | Koog | Swaps default chatAgentStrategy() with a custom AIAgentGraphStrategy<String, String> from the strategy {} DSL |
AdkRootAgent | ADK | Replaces the runtime’s default LlmAgent with SequentialAgent / ParallelAgent / LoopAgent / any BaseAgent subclass |
SemanticKernelInvocation | Semantic Kernel | Per-request InvocationContext — unlocks KernelHooks, withMaxAutoInvokeAttempts, custom PromptExecutionSettings |
EmbabelPromptRunner | Embabel | UnaryOperator<PromptRunner> customizer applied AFTER the runtime’s default wiring — stack withTemperature / withModel / withGuardrails on top. Atmosphere-native dispatch path only |
AgentScopeAgent | AgentScope | Per-request ReActAgent — useful when different prompts route through different agent topologies (planner vs. quick lookup) without re-installing the runtime client |
SpringAiAlibabaRunnableConfig | Spring AI Alibaba | Per-request RunnableConfig — Alibaba’s natural per-invocation handle for threadId (memory thread continuation), checkPointId (resume), streamMode, metadata, store |
ToolLoopPolicies | Built-in, Koog | Per-request ToolLoopPolicy(maxIterations, OnMaxIterations) — Built-in honors via OpenAI-compatible tool loop, Koog via AIAgent.maxIterations |
Example — Spring AI advisor scoped to one request:
var safeGuard = SafeGuardAdvisor.builder() .sensitiveWords(List.of("badword")) .failureResponse("I cannot answer that.") .build();
var ctx = SpringAiAdvisors.attach(baseContext, safeGuard, new SimpleLoggerAdvisor());runtime.execute(ctx, session);Each helper ships with a unit-level *BridgeTest that proves the runtime
honors the sidecar (e.g. SpringAiAgentRuntime.execute calls
promptSpec.advisors(perRequestAdvisors) only when SpringAiAdvisors.from(context)
returns non-empty). See the per-module READMEs for full DSL examples:
modules/spring-ai,
modules/langchain4j,
modules/koog,
modules/adk,
and the ToolLoopPolicy section.
All eight framework-wrapping runtimes ship a per-request sidecar
(SpringAiAdvisors, LangChain4jAiServices, KoogStrategy, AdkRootAgent,
SemanticKernelInvocation, EmbabelPromptRunner, AgentScopeAgent,
SpringAiAlibabaRunnableConfig). The three native runtimes — Anthropic and
Cohere (direct HTTP clients) and CrewAI (a Python sidecar process) — wrap no
third-party composition DSL, so they take per-request configuration through
AgentExecutionContext (system prompt, retry policy, tool approval) rather
than a dedicated sidecar. Embabel
also has native streaming: when
StreamingPromptRunnerBuilder.streaming().generateStream() is available the
runtime emits Flux<String> chunks directly to the session, with graceful
fallback to runner.generateText(...) when the streaming API is absent.
Model-Lifecycle Observability
Section titled “Model-Lifecycle Observability”AgentLifecycleListener exposes three model-lifecycle hooks in addition to the
tool hooks (onToolCall/onToolResult):
default void onModelStart(String model, int messageCount, int toolCount) { }default void onModelEnd(String model, TokenUsage usage, long durationMillis) { }default void onModelError(String model, Throwable t) { }Built-in OpenAiCompatibleClient fires these around every model dispatch
(including each tool-loop round). AiEventForwardingListener is a built-in
adapter that translates the hooks into AiEvent.Progress frames on the
streaming session — opt in by attaching it via context.withListeners(...):
var listeners = List.of(new AiEventForwardingListener(session));runtime.execute(context.withListeners(listeners), session);// Browser receives wire frames like:// {"type":"progress","message":"model:start (gpt-4o, msgs=3, tools=2)"}// {"type":"progress","message":"model:end (gpt-4o, in=120, out=85, ms=842)"}Conversation Memory
Section titled “Conversation Memory”Enable multi-turn conversations with one annotation attribute:
@AiEndpoint(path = "/ai/chat", systemPrompt = "You are a helpful assistant", conversationMemory = true, maxHistoryMessages = 20)public class MyChat {
@Prompt public void onPrompt(String message, StreamingSession session) { session.stream(message); }}When conversationMemory = true, the framework:
- Captures each user message and the streamed assistant response (via
MemoryCapturingSession) - Stores them as conversation turns per
AtmosphereResource - Injects the full history into every subsequent
AiRequest - Clears the history when the resource disconnects
The default implementation is InMemoryConversationMemory (capped at maxHistoryMessages, default 20). For external storage, implement the AiConversationMemory SPI:
public interface AiConversationMemory { List<ChatMessage> getHistory(String conversationId); void addMessage(String conversationId, ChatMessage message); void clear(String conversationId); int maxMessages();}@AiTool — Framework-Agnostic Tool Calling
Section titled “@AiTool — Framework-Agnostic Tool Calling”Declare tools with @AiTool and they work with every tool-capable runtime:
Built-in, Spring AI, LangChain4j, Google ADK, Embabel, Koog, Semantic
Kernel, Anthropic, Cohere, and CrewAI. No framework-specific annotations
are needed. AgentScope and Spring AI Alibaba are still AgentRuntime
adapters, but their current SDKs do not expose a native tool-dispatch
loop for Atmosphere to wrap.
Defining Tools
Section titled “Defining Tools”public class AssistantTools {
@AiTool(name = "get_weather", description = "Returns a weather report for a city") public String getWeather( @Param(value = "city", description = "City name to get weather for") String city) { return weatherService.lookup(city); }
@AiTool(name = "convert_temperature", description = "Converts between Celsius and Fahrenheit") public String convertTemperature( @Param(value = "value", description = "Temperature value") double value, @Param(value = "from_unit", description = "'C' or 'F'") String fromUnit) { return "C".equalsIgnoreCase(fromUnit) ? String.format("%.1f°C = %.1f°F", value, value * 9.0 / 5.0 + 32) : String.format("%.1f°F = %.1f°C", value, (value - 32) * 5.0 / 9.0); }}Wiring Tools to an Endpoint
Section titled “Wiring Tools to an Endpoint”@AiEndpoint(path = "/ai/chat", systemPrompt = "You are a helpful assistant", conversationMemory = true, tools = AssistantTools.class)public class MyChat {
@Prompt public void onPrompt(String message, StreamingSession session) { session.stream(message); // tools are automatically available to the LLM }}How It Works
Section titled “How It Works”@AiTool methods ↓ scan at startupDefaultToolRegistry (global) ↓ selected per-endpoint via tools = {...}AiRequest.withTools(tools) ↓ bridged to backend-native formatLangChain4jToolBridge / SpringAiToolBridge / AdkToolBridge ↓ LLM decides to call a toolToolExecutor.execute(args) → result fed back to LLM ↓StreamingSession → WebSocket → browserThe tool bridge layer converts @AiTool to the native format at runtime:
| Backend | Bridge Class | Native Format |
|---|---|---|
| LangChain4j | LangChain4jToolBridge | ToolSpecification |
| Spring AI | SpringAiToolBridge | ToolCallback |
| Google ADK | AdkToolBridge | BaseTool |
@AiTool vs Native Annotations
Section titled “@AiTool vs Native Annotations”@AiTool (Atmosphere) | @Tool (LangChain4j) | FunctionCallback (Spring AI) | |
|---|---|---|---|
| Portable | Any backend | LangChain4j only | Spring AI only |
| Parameter metadata | @Param annotation | @P annotation | JSON Schema |
| Registration | ToolRegistry (global) | Per-service | Per-ChatClient |
To swap the AI backend, change only the Maven dependency — no tool code changes:
<!-- Use LangChain4j --><artifactId>atmosphere-langchain4j</artifactId>
<!-- Or Spring AI --><artifactId>atmosphere-spring-ai</artifactId>
<!-- Or Google ADK --><artifactId>atmosphere-adk</artifactId>See the spring-boot-ai-tools sample.
AiInterceptor
Section titled “AiInterceptor”Cross-cutting concerns (RAG, guardrails, logging) go through AiInterceptor, not subclassing:
@AiEndpoint(path = "/ai/chat", interceptors = {RagInterceptor.class, LoggingInterceptor.class})public class MyChat { ... }
public class RagInterceptor implements AiInterceptor { @Override public AiRequest preProcess(AiRequest request, AtmosphereResource resource) { String context = vectorStore.search(request.message()); return request.withMessage(context + "\n\n" + request.message()); }}Filters, Routing, and Middleware
Section titled “Filters, Routing, and Middleware”The AI module includes filters and middleware that sit between the @Prompt method and the LLM:
| Class | What it does |
|---|---|
PiiRedactionFilter | Buffers messages to sentence boundaries, redacts email/phone/SSN/CC |
ContentSafetyFilter | Pluggable SafetyChecker SPI — block, redact, or pass |
CostMeteringFilter | Per-session/broadcaster message counting with budget enforcement |
RoutingLlmClient | Route by content, model, cost, or latency rules |
FanOutStreamingSession | Concurrent N-model streaming: AllResponses, FirstComplete, FastestStreamingTexts |
StreamingTextBudgetManager | Per-user/org budgets with graceful degradation |
AiResponseCacheInspector | Cache control for AI messages in BroadcasterCache |
AiResponseCacheListener | Aggregate per-session events instead of per-message noise |
Cost and Latency Routing
Section titled “Cost and Latency Routing”RoutingLlmClient supports cost-based and latency-based routing rules:
var router = RoutingLlmClient.builder(defaultClient, "gemini-2.5-flash") .route(RoutingRule.costBased(5.0, List.of( new ModelOption(openaiClient, "gpt-4o", 0.01, 200, 10), new ModelOption(geminiClient, "gemini-flash", 0.001, 50, 5)))) .route(RoutingRule.latencyBased(100, List.of( new ModelOption(ollamaClient, "llama3.2", 0.0, 30, 3), new ModelOption(openaiClient, "gpt-4o-mini", 0.005, 80, 7)))) .build();Config-driven routing (Spring Boot)
Section titled “Config-driven routing (Spring Boot)”The Spring Boot starter exposes all four RoutingRule families — content, model, cost, and latency — through atmosphere.ai.routing.* properties, with no Java wiring. Off by default. When atmosphere.ai.routing.enabled=true, the starter wraps the framework-resolved LLM client in a RoutingLlmClient and installs it via AiConfig.installClient(...), so it becomes the client every AgentRuntime dispatch reads on the request critical path. When disabled, the resolved client is left untouched and the request path is byte-identical to today’s behavior.
Compose order. Rules are added to the router (and therefore evaluated first-match-wins) in the fixed order content → model → cost → latency — most-specific intent first. Within each family, rules are evaluated in config order. Requests matching no rule fall through to the resolved client and the configured default-model (or the AiConfig model when default-model is omitted). The compose order is pinned by AtmosphereRoutingAutoConfigurationTest.
| Property | Type | Default | Description |
|---|---|---|---|
atmosphere.ai.routing.enabled | boolean | false | Wrap the resolved client in a RoutingLlmClient. |
atmosphere.ai.routing.default-model | string | (resolved AiConfig model) | Fallback model when no rule matches. |
Content rules (atmosphere.ai.routing.content-rules[i]) match on the latest user message by case-insensitive substring; a rule with no model or no keywords is skipped with a WARN:
| Property | Description |
|---|---|
…content-rules[i].keywords | Keywords matched case-insensitively against the latest user message. |
…content-rules[i].model | Model to route to when a keyword matches. |
…content-rules[i].base-url / .api-key | Optional: target a different OpenAI-compatible endpoint for this rule. |
Model rules (atmosphere.ai.routing.model-rules[i]) match on the incoming request.model() by literal case-insensitive equals (not regex; the request is routed unchanged — the model name is not rewritten). A blank model-pattern is skipped with a WARN:
| Property | Description |
|---|---|
…model-rules[i].model-pattern | Routed when request.model() equalsIgnoreCase this value. |
…model-rules[i].base-url / .api-key | Optional: dedicated endpoint for the matched model. |
Cost rules (atmosphere.ai.routing.cost-rules[i]) pick the highest-capability model whose total cost (cost-per-streaming-text × request.maxStreamingTexts()) is within max-cost. Latency rules (atmosphere.ai.routing.latency-rules[i]) pick the highest-capability model whose average-latency-ms is within max-latency-ms. Each lists candidate models[j] carrying model, cost-per-streaming-text (null → 0.0), average-latency-ms (null → 0), capability (null → 0), and optional per-option base-url / api-key. A cost/latency rule with a null budget or empty models is skipped with a WARN.
# application.yml — all four families on one router. Evaluated content → model# → cost → latency, first match wins.atmosphere: ai: model: gemini-2.5-flash # resolved default client + model routing: enabled: true default-model: gemini-2.5-flash content-rules: - keywords: [code, function, refactor, stack trace] model: gpt-4o # reuses the resolved client; only the model changes base-url: https://api.openai.com/v1 # optional: dedicated endpoint api-key: ${OPENAI_API_KEY} # optional: key for that endpoint model-rules: - model-pattern: gpt-4o # request.model()=="gpt-4o" → dedicated client, unchanged request base-url: https://api.openai.com/v1 api-key: ${OPENAI_API_KEY} cost-rules: - max-cost: 5.0 # highest-capability model fitting the budget models: - model: gpt-4o cost-per-streaming-text: 0.01 capability: 10 - model: gpt-4o-mini cost-per-streaming-text: 0.001 capability: 5 latency-rules: - max-latency-ms: 100 # highest-capability model under 100ms models: - model: gemini-2.5-flash average-latency-ms: 50 capability: 8Every rule reuses the resolved client by default (same provider/credentials; only the model name changes where applicable); set base-url and/or api-key to target a different OpenAI-compatible endpoint. For routing logic beyond these property shapes (custom predicates, budget-degradation), build a RoutingLlmClient in Java and install it with AiConfig.installClient(router).
The CLI scaffolds the opt-in block for you: atmosphere new my-app --template ai-chat --routing appends a commented, ready-to-uncomment atmosphere.ai.routing.* tree to the generated application.yml. --routing is only valid for AI templates that ship an application.yml, and the emitted block is commented so the scaffold is byte-identical until you uncomment it.
Direct Adapter Usage
Section titled “Direct Adapter Usage”You can bypass @AiEndpoint and use adapters directly:
Spring AI:
var session = StreamingSessions.start(resource);springAiAdapter.stream(chatClient, prompt, session);LangChain4j:
var session = StreamingSessions.start(resource);model.chat(ChatMessage.userMessage(prompt), new AtmosphereStreamingResponseHandler(session));Google ADK:
var session = StreamingSessions.start(resource);adkAdapter.stream(new AdkRequest(runner, userId, sessionId, prompt), session);Embabel:
val session = StreamingSessions.start(resource)embabelAdapter.stream(AgentRequest("assistant") { channel -> agentPlatform.run(prompt, channel)}, session)Browser — React
Section titled “Browser — React”import { useStreaming } from 'atmosphere.js/react';
function AiChat() { const { fullText, isStreaming, stats, routing, send } = useStreaming({ request: { url: '/ai/chat', transport: 'websocket' }, });
return ( <div> <button onClick={() => send('Explain WebSockets')} disabled={isStreaming}> Ask </button> <p>{fullText}</p> {stats && <small>{stats.totalStreamingTexts} streaming texts</small>} {routing.model && <small>Model: {routing.model}</small>} </div> );}AI in Rooms — Virtual Members
Section titled “AI in Rooms — Virtual Members”var client = AiConfig.get().client();var assistant = new LlmRoomMember("assistant", client, "gpt-5", "You are a helpful coding assistant");
Room room = rooms.room("dev-chat");room.joinVirtual(assistant);// Now when any user sends a message, the LLM responds in the same roomStreamingSession Wire Protocol
Section titled “StreamingSession Wire Protocol”The client receives JSON messages over WebSocket/SSE:
{"type":"streaming-text","content":"Hello"}— a single streaming text{"type":"progress","message":"Thinking..."}— status update{"type":"complete"}— stream finished{"type":"error","message":"..."}— stream failed
Configuration
Section titled “Configuration”Configure the built-in client with environment variables:
| Variable | Description | Default |
|---|---|---|
LLM_MODE | remote (cloud) or local (Ollama) | remote |
LLM_MODEL | gemini-2.5-flash, gpt-5, o3-mini, llama3.2, … | gemini-2.5-flash |
LLM_API_KEY | API key (or GEMINI_API_KEY for Gemini) | — |
LLM_BASE_URL | Override endpoint (auto-detected from model name) | auto |
Key Components
Section titled “Key Components”| Class | Description |
|---|---|
@AiEndpoint | Marks a class as an AI chat endpoint with a path, system prompt, and interceptors |
@Prompt | Marks the method that handles user messages |
@AiTool | Marks a method as an AI-callable tool (framework-agnostic) |
@Param | Describes a tool parameter’s name, description, and required flag |
AgentRuntime | SPI for AI framework backends (ServiceLoader-discovered) |
AiRequest | Framework-agnostic request record (message, systemPrompt, model, userId, sessionId, agentId, conversationId, metadata) |
AiEvent | Sealed interface: 15 structured event types (TextDelta, ToolStart, ToolResult, AgentStep, EntityStart, Handoff, ApprovalRequired, etc.) |
AiCapability | Enum for endpoint capability requirements (TEXT_STREAMING, TOOL_CALLING, STRUCTURED_OUTPUT, etc.) |
AiInterceptor | Pre/post processing hooks for RAG, guardrails, logging |
AiConversationMemory | SPI for conversation history storage |
MemoryStrategy | Pluggable memory selection: MessageWindowStrategy, TokenWindowStrategy, SummarizingStrategy |
StructuredOutputParser | SPI for JSON Schema generation and typed output parsing (built-in: JacksonStructuredOutputParser) |
StreamingSession | Delivers streaming texts, events, progress updates, and metadata to the client |
StreamingSessions | Factory for creating StreamingSession instances |
OpenAiCompatibleClient | Built-in HTTP client for OpenAI-compatible APIs |
RoutingLlmClient | Routes prompts to different LLM backends based on rules |
ToolRegistry | Global registry for @AiTool definitions |
ModelRouter | SPI for intelligent model routing and failover |
AiGuardrail | SPI for pre/post-LLM safety inspection |
AiMetrics | SPI for AI observability (streaming texts, latency, cost) |
ConversationPersistence | SPI for durable conversation storage (Redis, SQLite) |
RetryPolicy | Exponential backoff with circuit-breaker semantics |
Approval Gates
Section titled “Approval Gates”@RequiresApproval pauses tool execution until the client approves. The virtual thread parks cheaply on a CompletableFuture — no carrier thread consumed.
@AiTool(name = "delete_account", description = "Permanently delete a user account")@RequiresApproval("This will permanently delete the account. Are you sure?")public String deleteAccount(@Param("accountId") String accountId) { return accountService.delete(accountId);}Wire Protocol
Section titled “Wire Protocol”When the LLM calls a @RequiresApproval tool, the client receives an approval-required event:
{"event":"approval-required","data":{ "approvalId":"apr_a1b2c3d4e5f6", "toolName":"delete_account", "arguments":{"accountId":"user-42"}, "message":"This will permanently delete the account. Are you sure?", "expiresIn":300}}The client responds with:
/__approval/apr_a1b2c3d4e5f6/approve— tool executes/__approval/apr_a1b2c3d4e5f6/deny— tool returns cancelled
Default timeout: 5 minutes. Configurable via @RequiresApproval(timeoutSeconds = 120).
How It Works
Section titled “How It Works”AiStreamingSession.wrapApprovalGates()wraps@RequiresApprovaltools withApprovalGateExecutor- When the LLM calls the tool,
ApprovalGateExecutorparks the virtual thread onCompletableFuture.get(timeout) - The session emits
AiEvent.ApprovalRequiredto the client AiEndpointHandlerfast-paths/__approval/messages to the session’sApprovalRegistry(before prompt dispatch)ApprovalRegistry.tryResolve()completes the future, unparking the virtual thread- On transport reconnect, a fallback scan across all active sessions ensures the approval reaches the parked thread
ADK ToolConfirmation Bridge
Section titled “ADK ToolConfirmation Bridge”When running on Google ADK, Atmosphere also calls toolContext.requestConfirmation() to give ADK native visibility into the approval pause. If ADK resolves a confirmation before Atmosphere (e.g., via its own UI), the ADK denial short-circuits without calling the executor. This creates a two-layer model: Atmosphere-level (cross-runtime) + ADK-native (runtime-specific).
All runtimes with TOOL_CALLING also declare AiCapability.TOOL_APPROVAL (Built-in, Spring AI, LangChain4j, ADK, Embabel, Koog, Semantic Kernel, Anthropic, Cohere, CrewAI). AgentScope and Spring AI Alibaba are excluded because their underlying SDKs lack a native tool-call dispatch loop.
Context Compaction SPI
Section titled “Context Compaction SPI”The AiCompactionStrategy SPI controls how conversation history is compacted when it exceeds the configured limit. Unlike MemoryStrategy (which selects messages for the next request — read path), compaction permanently reduces stored history (write path).
public interface AiCompactionStrategy { List<ChatMessage> compact(List<ChatMessage> messages, int maxMessages); String name();}Built-in Strategies
Section titled “Built-in Strategies”SlidingWindowCompaction (default) — drops the oldest non-system messages until under the limit. System messages are always preserved.
SummarizingCompaction — condenses old messages into a single system-role summary, preserving the most recent messages verbatim. The recent window size is configurable (default: 6).
// Default: sliding windowvar memory = new InMemoryConversationMemory(20);
// Custom: summarization with 8-message recent windowvar memory = new InMemoryConversationMemory(20, new SummarizingCompaction(8));ADK Bridge
Section titled “ADK Bridge”AdkCompactionBridge.toAdkConfig() maps Atmosphere compaction settings to ADK’s EventsCompactionConfig for native compaction when using the ADK runtime.
Artifact Persistence SPI
Section titled “Artifact Persistence SPI”The ArtifactStore SPI provides binary artifact persistence across agent runs. Use cases include agent-generated reports, images, code files, and content shared between coordinated agents.
public interface ArtifactStore { Artifact save(Artifact artifact); // auto-versions Optional<Artifact> load(String namespace, String artifactId); // latest version Optional<Artifact> load(String namespace, String artifactId, int version); List<Artifact> list(String namespace); // latest of each boolean delete(String namespace, String artifactId); // all versions void deleteAll(String namespace);}Artifact Record
Section titled “Artifact Record”public record Artifact( String id, // unique identifier String namespace, // grouping key (session ID, agent name, user ID) String fileName, // human-readable name ("report.pdf") String mimeType, // MIME type ("application/pdf") byte[] data, // binary content (defensively copied) int version, // auto-incremented per save Map<String, String> metadata, // arbitrary key-value pairs Instant createdAt) { }Byte arrays are defensively copied on construction and on access — callers cannot mutate persisted data.
Implementations
Section titled “Implementations”InMemoryArtifactStore— default, for development and testing. Data does not survive JVM restart.- ADK bridge —
AdkArtifactBridge.toAdkService()wraps anArtifactStoreas ADK’sBaseArtifactService.
Interceptor Disconnect Lifecycle
Section titled “Interceptor Disconnect Lifecycle”AiInterceptor includes an onDisconnect hook called before conversation memory is cleared. This enables fact extraction, summary persistence, and other cleanup that requires access to the conversation history.
public interface AiInterceptor { default AiRequest preProcess(AiRequest request, AtmosphereResource resource) { return request; } default void postProcess(AiRequest request, AtmosphereResource resource) { } default void onDisconnect(String userId, String conversationId, List<ChatMessage> history) { }}LongTermMemoryInterceptor.onDisconnect() uses this to extract facts from the full conversation on session close via OnSessionCloseStrategy.
Execution order: preProcess runs FIFO, postProcess runs LIFO, onDisconnect runs FIFO. Exceptions in one interceptor do not prevent others from being called.
AiEvent Model
Section titled “AiEvent Model”The AiEvent sealed interface provides 15 structured event types emitted via session.emit(). All runtimes map their native events to this common model.
| Event | Description |
|---|---|
TextDelta | Streaming token |
TextComplete | Final assembled text |
ToolStart | Tool invocation begins (name + arguments) |
ToolResult | Tool executed successfully (name + result) |
ToolError | Tool execution failed |
AgentStep | Orchestration step (ADK agent steps, Embabel planning) |
StructuredField | Structured output field arrival |
EntityStart / EntityComplete | Structured entity streaming |
RoutingDecision | Backend routing event |
Progress | Long-running operation status |
Handoff | Agent handoff notification |
ApprovalRequired | Human approval gate |
Error | Error with recovery hint |
Complete | Stream completed with usage metadata |
Runtime Event Normalization
Section titled “Runtime Event Normalization”| Source | Atmosphere Event |
|---|---|
ADK event.functionCalls() | AiEvent.ToolStart |
ADK event.functionResponses() | AiEvent.ToolResult |
ADK event.author() (non-partial) | AiEvent.AgentStep |
ADK event.usageMetadata() | ai.tokens.input/output/total metadata |
Koog onToolCallStarting | AiEvent.ToolStart |
Koog onToolCallCompleted | AiEvent.ToolResult |
Koog onToolCallFailed | AiEvent.ToolError |
Koog StreamFrame.ReasoningDelta | AiEvent.Progress |
Embabel MessageOutputChannelEvent | AiEvent.TextDelta |
Embabel ProgressOutputChannelEvent | AiEvent.AgentStep |
Capability Matrix
Section titled “Capability Matrix”Each runtime declares capabilities via AiCapability. The framework uses
these flags for model routing, tool negotiation, and feature discovery. The
table below mirrors the twelve-runtime snapshot pinned by each runtime’s
expectedCapabilities() contract test; Y means the runtime declares the
capability and the contract tests assert it.
Legend: TS=TEXT_STREAMING, TC=TOOL_CALLING, SO=STRUCTURED_OUTPUT, SP=SYSTEM_PROMPT, AO=AGENT_ORCHESTRATION, CM=CONVERSATION_MEMORY, TA=TOOL_APPROVAL, V=VISION, A=AUDIO, MM=MULTI_MODAL, PC=PROMPT_CACHING, TU=TOKEN_USAGE, PRR=PER_REQUEST_RETRY, TCD=TOOL_CALL_DELTA, BE=BUDGET_ENFORCEMENT, CS=CONFIDENCE_SCORES, PSV=PASSIVATION.
| Runtime | Priority | TS | TC | SO | SP | AO | CM | TA | V | A | MM | PC | TU | PRR | TCD | BE | CS | PSV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Built-in | 0 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | |
| Spring AI | 100 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||
| LangChain4j | 100 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||
| Google ADK | 100 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | |
| Embabel | 100 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | |||
| JetBrains Koog | 100 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | |
| Alibaba AgentScope | 100 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||||||
| Spring AI Alibaba | 100 | Y¹ | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||||||
| Microsoft Semantic Kernel | 100 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||||||
| Anthropic | 100 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||||
| Cohere | 100 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||||
| CrewAI² | 50 | Y | Y | Y | Y | Y | Y | Y | Y |
¹ Spring AI Alibaba emits its final reply as one Atmosphere stream chunk, but
the upstream ReactAgent.call() path is buffered rather than token-by-token.
² CrewAI is the only out-of-process runtime: the Java side talks
HTTP + SSE to a Python sidecar (atmosphere-crewai-bridge,
FastAPI + crewai 1.14). isAvailable() is config-gated on
ATMOSPHERE_CREWAI_SIDECAR_URL pointing at a running sidecar whose
/health responds OK — Runtime Truth gate, no classpath-only
advertisement. The runtime does not own a Java-side conversation-memory
store; per-task memory lives inside the sidecar’s crew rather than
being declared at the Atmosphere layer.
How structured output works: AiPipeline wraps the streaming session with StructuredOutputCapturingSession and augments the system prompt with JSON-schema instructions before the runtime runs. Any runtime that honors SYSTEM_PROMPT therefore gets STRUCTURED_OUTPUT automatically via the pipeline — no per-runtime adapter code required. BuiltInAgentRuntime additionally enables native jsonMode on the OpenAI-compatible client for provider-level JSON enforcement on top of the pipeline wrap. Source: modules/ai/src/main/java/org/atmosphere/ai/pipeline/AiPipeline.java:128-135, modules/ai/src/main/java/org/atmosphere/ai/llm/BuiltInAgentRuntime.java:72-74.
Tool-dispatch bridges: every runtime that declares TOOL_CALLING routes
every Atmosphere @AiTool invocation through a runtime-native bridge that
calls ToolExecutionHelper.executeWithApproval, so @RequiresApproval gates
fire uniformly. AgentScope ships AgentScopeToolBridge, Spring AI Alibaba
ships SpringAiAlibabaToolBridge, Semantic Kernel ships
SemanticKernelToolBridge, Embabel ships EmbabelToolBridge, Koog ships
AtmosphereToolBridge, and the JDK runtimes (Built-in, Spring AI,
LangChain4j, ADK) wire their tool callbacks through the shared helper
directly.
Spring AI Alibaba token usage: ReactAgent.call() returns an
AssistantMessage without usage metadata, so Atmosphere wraps the configured
Spring AI ChatModel bean in a UsageCapturingChatModel decorator at
auto-configuration time. Every underlying ChatModel.call(Prompt) performed
by the ReAct graph during a single dispatch accumulates
ChatResponseMetadata.getUsage() into a per-thread collector; the runtime
emits one typed TokenUsage record via session.usage(...) after the agent
returns. Token-based AiBudget breaches therefore trip uniformly alongside
wall-clock breaches. Custom ReactAgent beans that bypass the auto-config
also bypass the wrapper — see AtmosphereSpringAiAlibabaAutoConfiguration
for the wrapping point.
Cross-Runtime Contract Tests (TCK)
Section titled “Cross-Runtime Contract Tests (TCK)”The AbstractAgentRuntimeContractTest base class in atmosphere-ai-test enforces a minimum contract across all runtime adapters.
public abstract class AbstractAgentRuntimeContractTest { protected abstract AgentRuntime createRuntime(); protected abstract AgentExecutionContext createTextContext(); protected abstract AgentExecutionContext createToolCallContext(); protected abstract AgentExecutionContext createErrorContext();
// Enforced contracts: // - runtimeDeclaresMinimumCapabilities (TEXT_STREAMING) // - runtimeHasNonBlankName // - runtimeIsAvailable // - textStreamingCompletesSession (10s timeout) // - toolCallExecutesIfSupported // - errorContextTriggersSessionError}Add atmosphere-ai-test as a test dependency and extend the base class:
<dependency> <groupId>org.atmosphere</groupId> <artifactId>atmosphere-ai-test</artifactId> <scope>test</scope></dependency>The RecordingSession test double captures all events, text chunks, metadata, and errors for assertion. The contract suite is implemented for all twelve runtime adapters.
Samples
Section titled “Samples”- Spring Boot AI Chat — built-in client with Gemini/OpenAI/Ollama
- Spring Boot AI Tools — framework-agnostic
@AiToolpipeline - Spring Boot AI Classroom — rooms-based multi-room AI with an Expo client
- Spring Boot RAG Chat — Spring AI VectorStore-backed RAG agent
- Quarkus AI Chat — five
@AiEndpointdemos on Quarkus + LangChain4j bridge - Spring Boot Dentist Agent —
@Agentwith tools, memory, and approval gates
See Also
Section titled “See Also”- Spring AI Adapter
- LangChain4j Adapter
- Google ADK Adapter
- Embabel Adapter
- MCP Server — AI-MCP bridge for tool-driven streaming
- Rooms & Presence — AI virtual members in rooms
- atmosphere.js —
useStreamingReact/Vue/Svelte hooks - Module README