AI / LLM

AI/LLM streaming module for Atmosphere. Provides @AiEndpoint, @Prompt, StreamingSession, the AgentRuntime SPI for auto-detected AI framework adapters, and a built-in OpenAiCompatibleClient that works with Gemini, OpenAI, Ollama, and any OpenAI-compatible API.

<dependency>
    <groupId>org.atmosphere</groupId>
    <artifactId>atmosphere-ai</artifactId>
    <version>${project.version}</version>
</dependency>

Atmosphere has two pluggable SPI layers. AsyncSupport adapts web containers — Jetty, Tomcat, Undertow. AgentRuntime adapts AI frameworks across all twelve runtimes (Built-in, Spring AI, LangChain4j, Google ADK, Embabel, Koog, Semantic Kernel, AgentScope, Spring AI Alibaba, Anthropic, Cohere, CrewAI). Same design pattern, same discovery mechanism:

| Concern | Transport layer | AI layer |
| --- | --- | --- |
| SPI interface | AsyncSupport | AgentRuntime |
| What it adapts | Web containers (Jetty, Tomcat, Undertow) | AI frameworks (all twelve AgentRuntime adapters) |
| Discovery | Classpath scanning | ServiceLoader |
| Resolution | Best available container | Highest priority() among isAvailable() |
| Initialization | init(ServletConfig) | configure(LlmSettings) |
| Core method | service(req, res) | execute(AgentExecutionContext, StreamingSession) |
| Fallback | BlockingIOCometSupport | Built-in AgentRuntime (OpenAI-compatible) |

This is the Servlet model for AI agents: write your @Agent once, run it on any supported AgentRuntime — determined by classpath.

@AiEndpoint(path = "/ai/chat",
        systemPrompt = "You are a helpful assistant",
        conversationMemory = true)
public class MyChatBot {

    @Prompt
    public void onPrompt(String message, StreamingSession session) {
        session.stream(message); // auto-detects the best available AgentRuntime from classpath
    }
}

The @AiEndpoint annotation replaces the boilerplate of @ManagedService + @Ready + @Disconnect + @Message for AI streaming use cases. The @Prompt method runs on a virtual thread.

session.stream(message) auto-detects the best available AgentRuntime implementation via ServiceLoader — drop an adapter JAR on the classpath and it just works.

The AgentRuntime SPI dispatches the entire agent loop — tool calling, memory, RAG, retries — to the AI framework on the classpath. When multiple implementations are available, the one with the highest priority() that reports isAvailable() wins.

public interface AgentRuntime {
    String name();                                  // e.g. "langchain4j", "spring-ai"
    boolean isAvailable();                          // checks classpath dependencies
    int priority();                                 // higher wins
    void configure(AiConfig.LlmSettings settings);  // called once after resolution
    Set<AiCapability> capabilities();               // feature discovery
    void execute(AgentExecutionContext context, StreamingSession session); // full agent loop
}

| Classpath JAR | Auto-detected AgentRuntime | Priority |
| --- | --- | --- |
| atmosphere-ai (default) | Built-in OpenAiCompatibleClient (Gemini, OpenAI, Ollama) | 0 |
| atmosphere-spring-ai | Spring AI ChatClient | 100 |
| atmosphere-langchain4j | LangChain4j StreamingChatLanguageModel | 100 |
| atmosphere-adk | Google ADK Runner | 100 |
| atmosphere-embabel | Embabel AgentPlatform | 100 |
| atmosphere-koog | JetBrains Koog AIAgent | 100 |
| atmosphere-semantic-kernel | Microsoft Semantic Kernel ChatCompletionService | 100 |
| atmosphere-agentscope | Alibaba AgentScope ReActAgent | 100 |
| atmosphere-spring-ai-alibaba | Spring AI Alibaba ReactAgent (see runtime caveat below) | 100 |
| atmosphere-anthropic | Anthropic AnthropicMessagesClient (built-in Messages API client) (requires anthropic.api.key) | 100 |
| atmosphere-cohere | Cohere CohereChatClient (built-in Chat API client) (requires cohere.api.key) | 100 |
| atmosphere-crewai | CrewAI CrewAiSidecarClient (out-of-process Python sidecar over HTTP+SSE) (requires ATMOSPHERE_CREWAI_SIDECAR_URL + live /health) | 50 |
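
The resolution rule stated above (highest priority() among the runtimes that report isAvailable()) can be sketched in a few lines. This is an illustration only, not Atmosphere's resolver: RuntimeCandidate and RuntimeResolver are hypothetical stand-ins so the example compiles without Atmosphere on the classpath, and the candidates are listed by hand instead of being discovered through ServiceLoader.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the three AgentRuntime methods that drive resolution.
record RuntimeCandidate(String name, boolean isAvailable, int priority) { }

final class RuntimeResolver {

    // Keep only the candidates that report themselves available,
    // then pick the one with the highest priority.
    static Optional<RuntimeCandidate> resolve(List<RuntimeCandidate> candidates) {
        return candidates.stream()
                .filter(RuntimeCandidate::isAvailable)
                .max(Comparator.comparingInt(RuntimeCandidate::priority));
    }

    public static void main(String[] args) {
        var candidates = List.of(
                new RuntimeCandidate("built-in", true, 0),
                new RuntimeCandidate("crewai", false, 50),      // e.g. sidecar not reachable
                new RuntimeCandidate("langchain4j", true, 100));
        System.out.println(resolve(candidates).map(RuntimeCandidate::name).orElse("none"));
    }
}
```

With only the default atmosphere-ai JAR present, the built-in candidate at priority 0 is the sole available entry and is selected. How Atmosphere breaks a tie between two available runtimes of equal priority is not covered by this sketch.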

To switch runtimes, change a single Maven dependency — no code changes needed.

Spring AI Alibaba runtime — Spring Boot 3 only today. Spring AI Alibaba 1.1.2.2 is compiled against Spring AI 1.1.6, and spring-ai-alibaba-graph-core-1.1.2.x hardcodes Spring AI 1.1.x-only types like DeepSeekAssistantMessage, so the runtime requires Spring AI 1.1.6. Spring AI 1.1.6 itself requires Spring Boot 3 — it pins the SB3-era FQN of RestClientAutoConfiguration, which Spring Boot 4 ships at a renamed FQN. Drop atmosphere-spring-ai-alibaba into a Spring Boot 3 sample (e.g. samples/spring-boot-ai-chat -Pspring-boot3) and it round-trips end-to-end (verified via chrome-devtools against Ollama). A Spring Boot 4 path will become possible once Alibaba publishes a Spring AI 2.x-aligned spring-ai-alibaba-agent-framework. atmosphere-agentscope is unaffected and works on Spring Boot 4.

Each AgentRuntime runs its framework’s “happy path” by default. For requests that need framework-native composition (Spring AI advisor chain, LangChain4j AiServices, Koog graph DSL, ADK multi-agent topology), a small per-request helper attaches the framework-native object to AgentExecutionContext.metadata() and the runtime applies it for that one call — no AgentRuntime SPI growth, no mutation of shared beans. Every helper follows the CacheHint pattern: from(context) and attach(context, ...) static methods, with strict type checking that throws IllegalArgumentException on a wrong-type slot (silent drops would mask the override never firing).

| Helper | Runtime | Slot it drives |
| --- | --- | --- |
| SpringAiAdvisors | Spring AI | ChatClient.prompt().advisors(...): RAG, memory, guardrails, observability (additive; multiple advisors compose into a chain) |
| LangChain4jAiServices | LangChain4j | Routes through caller’s AiServices-backed interface (TokenStream callbacks bridged to session); gives access to maxSequentialToolsInvocations, custom system message providers, etc. |
| KoogStrategy | Koog | Swaps default chatAgentStrategy() with a custom AIAgentGraphStrategy<String, String> from the strategy {} DSL |
| AdkRootAgent | ADK | Replaces the runtime’s default LlmAgent with SequentialAgent / ParallelAgent / LoopAgent / any BaseAgent subclass |
| SemanticKernelInvocation | Semantic Kernel | Per-request InvocationContext; unlocks KernelHooks, withMaxAutoInvokeAttempts, custom PromptExecutionSettings |
| EmbabelPromptRunner | Embabel | UnaryOperator<PromptRunner> customizer applied AFTER the runtime’s default wiring; stack withTemperature / withModel / withGuardrails on top. Atmosphere-native dispatch path only |
| AgentScopeAgent | AgentScope | Per-request ReActAgent; useful when different prompts route through different agent topologies (planner vs. quick lookup) without re-installing the runtime client |
| SpringAiAlibabaRunnableConfig | Spring AI Alibaba | Per-request RunnableConfig: Alibaba’s natural per-invocation handle for threadId (memory thread continuation), checkPointId (resume), streamMode, metadata, store |
| ToolLoopPolicies | Built-in, Koog | Per-request ToolLoopPolicy(maxIterations, OnMaxIterations); Built-in honors via OpenAI-compatible tool loop, Koog via AIAgent.maxIterations |
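
The from(context) / attach(context, ...) shape with strict type checking can be sketched as follows. Everything here is a hypothetical stand-in: Ctx plays the role of AgentExecutionContext, AdvisorSlot plays the role of a helper such as SpringAiAdvisors, plain strings replace framework-native objects, and the metadata key name is invented for the example. The real helpers may be implemented differently.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for AgentExecutionContext: immutable, carries a metadata map.
record Ctx(Map<String, Object> metadata) {
    Ctx {
        metadata = Map.copyOf(metadata);
    }

    // Returns a copy with one extra entry; the receiver is left untouched.
    Ctx withMetadata(String key, Object value) {
        var copy = new HashMap<>(metadata);
        copy.put(key, value);
        return new Ctx(copy);
    }
}

// Hypothetical per-request helper in the from(...) / attach(...) style.
final class AdvisorSlot {
    static final String KEY = "example.advisors"; // assumed key name, for illustration only

    // Attaches the values to a new context so shared state is never mutated.
    static Ctx attach(Ctx context, String... advisors) {
        return context.withMetadata(KEY, List.of(advisors));
    }

    // Empty when nothing is attached; throws when the slot holds the wrong type,
    // so a misconfigured override fails loudly instead of being silently dropped.
    static List<String> from(Ctx context) {
        Object slot = context.metadata().get(KEY);
        if (slot == null) {
            return List.of();
        }
        if (!(slot instanceof List<?> list) || !list.stream().allMatch(String.class::isInstance)) {
            throw new IllegalArgumentException(
                    KEY + " must hold a List<String> but was " + slot.getClass().getName());
        }
        return list.stream().map(String.class::cast).toList();
    }
}
```

A runtime written against this shape would call AdvisorSlot.from(context) once per request and apply the result only when the returned list is non-empty.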

Example — Spring AI advisor scoped to one request:

var safeGuard = SafeGuardAdvisor.builder()
        .sensitiveWords(List.of("badword"))
        .failureResponse("I cannot answer that.")
        .build();
var ctx = SpringAiAdvisors.attach(baseContext, safeGuard, new SimpleLoggerAdvisor());
runtime.execute(ctx, session);

Each helper ships with a unit-level *BridgeTest that proves the runtime honors the sidecar (e.g. SpringAiAgentRuntime.execute calls promptSpec.advisors(perRequestAdvisors) only when SpringAiAdvisors.from(context) returns non-empty). See the per-module READMEs for full DSL examples: modules/spring-ai, modules/langchain4j, modules/koog, modules/adk, and the ToolLoopPolicy section.

All eight framework-wrapping runtimes ship a per-request sidecar (SpringAiAdvisors, LangChain4jAiServices, KoogStrategy, AdkRootAgent, SemanticKernelInvocation, EmbabelPromptRunner, AgentScopeAgent, SpringAiAlibabaRunnableConfig). The three native runtimes — Anthropic and Cohere (direct HTTP clients) and CrewAI (a Python sidecar process) — wrap no third-party composition DSL, so they take per-request configuration through AgentExecutionContext (system prompt, retry policy, tool approval) rather than a dedicated sidecar. Embabel also has native streaming: when StreamingPromptRunnerBuilder.streaming().generateStream() is available the runtime emits Flux<String> chunks directly to the session, with graceful fallback to runner.generateText(...) when the streaming API is absent.

AgentLifecycleListener exposes three model-lifecycle hooks in addition to the tool hooks (onToolCall/onToolResult):

default void onModelStart(String model, int messageCount, int toolCount) { }
default void onModelEnd(String model, TokenUsage usage, long durationMillis) { }
default void onModelError(String model, Throwable t) { }

Built-in OpenAiCompatibleClient fires these around every model dispatch (including each tool-loop round). AiEventForwardingListener is a built-in adapter that translates the hooks into AiEvent.Progress frames on the streaming session — opt in by attaching it via context.withListeners(...):

var listeners = List.of(new AiEventForwardingListener(session));
runtime.execute(context.withListeners(listeners), session);
// Browser receives wire frames like:
// {"type":"progress","message":"model:start (gpt-4o, msgs=3, tools=2)"}
// {"type":"progress","message":"model:end (gpt-4o, in=120, out=85, ms=842)"}
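
To make the hook-to-frame translation concrete, here is a small sketch that turns the model-lifecycle callbacks into progress messages shaped like the wire frames above. ModelHooks, Usage, and ProgressCollector are hypothetical stand-ins for AgentLifecycleListener, TokenUsage, and AiEventForwardingListener; the sketch collects strings in a list instead of writing to a session, and it leaves onModelError as a no-op because the error frame format is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for TokenUsage: input and output token counts only.
record Usage(long input, long output) { }

// Hypothetical mirror of the three model-lifecycle hooks, all no-ops by default.
interface ModelHooks {
    default void onModelStart(String model, int messageCount, int toolCount) { }
    default void onModelEnd(String model, Usage usage, long durationMillis) { }
    default void onModelError(String model, Throwable t) { }
}

// Formats start/end hooks as progress messages and keeps them in memory.
final class ProgressCollector implements ModelHooks {
    final List<String> messages = new ArrayList<>();

    @Override
    public void onModelStart(String model, int messageCount, int toolCount) {
        messages.add("model:start (" + model + ", msgs=" + messageCount
                + ", tools=" + toolCount + ")");
    }

    @Override
    public void onModelEnd(String model, Usage usage, long durationMillis) {
        messages.add("model:end (" + model + ", in=" + usage.input()
                + ", out=" + usage.output() + ", ms=" + durationMillis + ")");
    }
}
```

Because every hook has a default body, a listener only overrides the callbacks it cares about.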

Enable multi-turn conversations with one annotation attribute:

@AiEndpoint(path = "/ai/chat",
        systemPrompt = "You are a helpful assistant",
        conversationMemory = true,
        maxHistoryMessages = 20)
public class MyChat {

    @Prompt
    public void onPrompt(String message, StreamingSession session) {
        session.stream(message);
    }
}

When conversationMemory = true, the framework:

  1. Captures each user message and the streamed assistant response (via MemoryCapturingSession)
  2. Stores them as conversation turns per AtmosphereResource
  3. Injects the full history into every subsequent AiRequest
  4. Clears the history when the resource disconnects

The default implementation is InMemoryConversationMemory (capped at maxHistoryMessages, default 20). For external storage, implement the AiConversationMemory SPI:

public interface AiConversationMemory {
    List<ChatMessage> getHistory(String conversationId);
    void addMessage(String conversationId, ChatMessage message);
    void clear(String conversationId);
    int maxMessages();
}
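
As a rough guide to what an implementation of this SPI involves, the sketch below mirrors the four methods with a bounded, per-conversation deque. It is self-contained on purpose: Msg is a hypothetical stand-in for ChatMessage, BoundedMemory does not implement the real interface, and eviction simply drops the oldest entry, which is an assumption rather than a description of InMemoryConversationMemory. An external store would replace the map with calls to its own backend.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for Atmosphere's ChatMessage.
record Msg(String role, String content) { }

// Mirrors the four SPI methods using one bounded deque per conversation.
final class BoundedMemory {
    private final int maxMessages;
    private final Map<String, Deque<Msg>> store = new ConcurrentHashMap<>();

    BoundedMemory(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    // Returns an immutable snapshot so callers cannot alter stored history.
    List<Msg> getHistory(String conversationId) {
        Deque<Msg> history = store.get(conversationId);
        if (history == null) {
            return List.of();
        }
        synchronized (history) {
            return List.copyOf(history);
        }
    }

    // Appends the message, then evicts from the front until the cap holds.
    void addMessage(String conversationId, Msg message) {
        Deque<Msg> history = store.computeIfAbsent(conversationId, id -> new ArrayDeque<>());
        synchronized (history) {
            history.addLast(message);
            while (history.size() > maxMessages) {
                history.removeFirst();
            }
        }
    }

    void clear(String conversationId) {
        store.remove(conversationId);
    }

    int maxMessages() {
        return maxMessages;
    }
}
```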

@AiTool — Framework-Agnostic Tool Calling

Declare tools with @AiTool and they work with every tool-capable runtime: Built-in, Spring AI, LangChain4j, Google ADK, Embabel, Koog, Semantic Kernel, Anthropic, Cohere, and CrewAI. No framework-specific annotations are needed. AgentScope and Spring AI Alibaba are still AgentRuntime adapters, but their current SDKs do not expose a native tool-dispatch loop for Atmosphere to wrap.

public class AssistantTools {

    @AiTool(name = "get_weather",
            description = "Returns a weather report for a city")
    public String getWeather(
            @Param(value = "city", description = "City name to get weather for")
            String city) {
        return weatherService.lookup(city);
    }

    @AiTool(name = "convert_temperature",
            description = "Converts between Celsius and Fahrenheit")
    public String convertTemperature(
            @Param(value = "value", description = "Temperature value") double value,
            @Param(value = "from_unit", description = "'C' or 'F'") String fromUnit) {
        return "C".equalsIgnoreCase(fromUnit)
                ? String.format("%.1f°C = %.1f°F", value, value * 9.0 / 5.0 + 32)
                : String.format("%.1f°F = %.1f°C", value, (value - 32) * 5.0 / 9.0);
    }
}

@AiEndpoint(path = "/ai/chat",
        systemPrompt = "You are a helpful assistant",
        conversationMemory = true,
        tools = AssistantTools.class)
public class MyChat {

    @Prompt
    public void onPrompt(String message, StreamingSession session) {
        session.stream(message); // tools are automatically available to the LLM
    }
}

@AiTool methods
    ↓ scan at startup
DefaultToolRegistry (global)
    ↓ selected per-endpoint via tools = {...}
AiRequest.withTools(tools)
    ↓ bridged to backend-native format
LangChain4jToolBridge / SpringAiToolBridge / AdkToolBridge
    ↓ LLM decides to call a tool
ToolExecutor.execute(args) → result fed back to LLM
StreamingSession → WebSocket → browser

The tool bridge layer converts @AiTool to the native format at runtime:

| Backend | Bridge Class | Native Format |
| --- | --- | --- |
| LangChain4j | LangChain4jToolBridge | ToolSpecification |
| Spring AI | SpringAiToolBridge | ToolCallback |
| Google ADK | AdkToolBridge | BaseTool |

| | @AiTool (Atmosphere) | @Tool (LangChain4j) | FunctionCallback (Spring AI) |
| --- | --- | --- | --- |
| Portable | Any backend | LangChain4j only | Spring AI only |
| Parameter metadata | @Param annotation | @P annotation | JSON Schema |
| Registration | ToolRegistry (global) | Per-service | Per-ChatClient |

To swap the AI backend, change only the Maven dependency — no tool code changes:

<!-- Use LangChain4j -->
<artifactId>atmosphere-langchain4j</artifactId>
<!-- Or Spring AI -->
<artifactId>atmosphere-spring-ai</artifactId>
<!-- Or Google ADK -->
<artifactId>atmosphere-adk</artifactId>

See the spring-boot-ai-tools sample.

Cross-cutting concerns (RAG, guardrails, logging) go through AiInterceptor, not subclassing:

@AiEndpoint(path = "/ai/chat", interceptors = {RagInterceptor.class, LoggingInterceptor.class})
public class MyChat { ... }

public class RagInterceptor implements AiInterceptor {

    @Override
    public AiRequest preProcess(AiRequest request, AtmosphereResource resource) {
        String context = vectorStore.search(request.message());
        return request.withMessage(context + "\n\n" + request.message());
    }
}

The AI module includes filters and middleware that sit between the @Prompt method and the LLM:

| Class | What it does |
| --- | --- |
| PiiRedactionFilter | Buffers messages to sentence boundaries, redacts email/phone/SSN/CC |
| ContentSafetyFilter | Pluggable SafetyChecker SPI: block, redact, or pass |
| CostMeteringFilter | Per-session/broadcaster message counting with budget enforcement |
| RoutingLlmClient | Route by content, model, cost, or latency rules |
| FanOutStreamingSession | Concurrent N-model streaming: AllResponses, FirstComplete, FastestStreamingTexts |
| StreamingTextBudgetManager | Per-user/org budgets with graceful degradation |
| AiResponseCacheInspector | Cache control for AI messages in BroadcasterCache |
| AiResponseCacheListener | Aggregate per-session events instead of per-message noise |

RoutingLlmClient supports cost-based and latency-based routing rules:

var router = RoutingLlmClient.builder(defaultClient, "gemini-2.5-flash")
        .route(RoutingRule.costBased(5.0, List.of(
                new ModelOption(openaiClient, "gpt-4o", 0.01, 200, 10),
                new ModelOption(geminiClient, "gemini-flash", 0.001, 50, 5))))
        .route(RoutingRule.latencyBased(100, List.of(
                new ModelOption(ollamaClient, "llama3.2", 0.0, 30, 3),
                new ModelOption(openaiClient, "gpt-4o-mini", 0.005, 80, 7))))
        .build();

The Spring Boot starter exposes all four RoutingRule families — content, model, cost, and latency — through atmosphere.ai.routing.* properties, with no Java wiring. Off by default. When atmosphere.ai.routing.enabled=true, the starter wraps the framework-resolved LLM client in a RoutingLlmClient and installs it via AiConfig.installClient(...), so it becomes the client every AgentRuntime dispatch reads on the request critical path. When disabled, the resolved client is left untouched and the request path is byte-identical to today’s behavior.

Compose order. Rules are added to the router (and therefore evaluated first-match-wins) in the fixed order content → model → cost → latency — most-specific intent first. Within each family, rules are evaluated in config order. Requests matching no rule fall through to the resolved client and the configured default-model (or the AiConfig model when default-model is omitted). The compose order is pinned by AtmosphereRoutingAutoConfigurationTest.

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| atmosphere.ai.routing.enabled | boolean | false | Wrap the resolved client in a RoutingLlmClient. |
| atmosphere.ai.routing.default-model | string | (resolved AiConfig model) | Fallback model when no rule matches. |

Content rules (atmosphere.ai.routing.content-rules[i]) match on the latest user message by case-insensitive substring; a rule with no model or no keywords is skipped with a WARN:

| Property | Description |
| --- | --- |
| …content-rules[i].keywords | Keywords matched case-insensitively against the latest user message. |
| …content-rules[i].model | Model to route to when a keyword matches. |
| …content-rules[i].base-url / .api-key | Optional: target a different OpenAI-compatible endpoint for this rule. |
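
The matching behavior of content rules (case-insensitive substring against the latest user message, rules tried in config order, first match wins) can be sketched as below. ContentRule and ContentRouting are hypothetical names for this example and are not the starter's classes; lower-casing both sides with Locale.ROOT is one way to get case-insensitive matching and is an assumption about the implementation.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical mirror of one content-rules[i] entry: keywords plus target model.
record ContentRule(List<String> keywords, String model) { }

final class ContentRouting {

    // Returns the model of the first rule with a keyword contained in the message,
    // or empty so the caller can fall through to the next rule family or the default model.
    static Optional<String> route(List<ContentRule> rules, String latestUserMessage) {
        String haystack = latestUserMessage.toLowerCase(Locale.ROOT);
        for (ContentRule rule : rules) {                 // config order
            for (String keyword : rule.keywords()) {
                if (haystack.contains(keyword.toLowerCase(Locale.ROOT))) {
                    return Optional.of(rule.model());    // first match wins
                }
            }
        }
        return Optional.empty();
    }
}
```

Because matching is by substring, a keyword such as "code" also matches "decode"; keyword lists are best kept specific.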

Model rules (atmosphere.ai.routing.model-rules[i]) match on the incoming request.model() by literal case-insensitive equals (not regex; the request is routed unchanged — the model name is not rewritten). A blank model-pattern is skipped with a WARN:

| Property | Description |
| --- | --- |
| …model-rules[i].model-pattern | Routed when request.model() equalsIgnoreCase this value. |
| …model-rules[i].base-url / .api-key | Optional: dedicated endpoint for the matched model. |

Cost rules (atmosphere.ai.routing.cost-rules[i]) pick the highest-capability model whose total cost (cost-per-streaming-text × request.maxStreamingTexts()) is within max-cost. Latency rules (atmosphere.ai.routing.latency-rules[i]) pick the highest-capability model whose average-latency-ms is within max-latency-ms. Each lists candidate models[j] carrying model, cost-per-streaming-text (null → 0.0), average-latency-ms (null → 0), capability (null → 0), and optional per-option base-url / api-key. A cost/latency rule with a null budget or empty models is skipped with a WARN.
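
A worked sketch of the two selection rules may help. Candidate and RuleSelection are hypothetical names; the sketch treats "within" as an inclusive bound and returns an empty result when no candidate fits, and both of those choices are assumptions rather than documented behavior.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical mirror of one models[j] entry.
record Candidate(String model, double costPerStreamingText, long averageLatencyMs, int capability) { }

final class RuleSelection {

    // Cost rule: total cost = cost-per-streaming-text × maxStreamingTexts;
    // among candidates that fit max-cost, take the highest capability.
    static Optional<Candidate> byCost(List<Candidate> models, double maxCost, int maxStreamingTexts) {
        return models.stream()
                .filter(m -> m.costPerStreamingText() * maxStreamingTexts <= maxCost)
                .max(Comparator.comparingInt(Candidate::capability));
    }

    // Latency rule: among candidates whose average latency fits max-latency-ms,
    // take the highest capability.
    static Optional<Candidate> byLatency(List<Candidate> models, long maxLatencyMs) {
        return models.stream()
                .filter(m -> m.averageLatencyMs() <= maxLatencyMs)
                .max(Comparator.comparingInt(Candidate::capability));
    }
}
```

For example, with max-cost 5.0 and candidates gpt-4o (0.01 per streaming text, capability 10) and gpt-4o-mini (0.001, capability 5), a request capped at 400 streaming texts costs 4.0 on gpt-4o, which fits the budget, so gpt-4o is chosen. At 1000 streaming texts gpt-4o would cost 10.0, so the rule falls back to gpt-4o-mini at 1.0.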

# application.yml: all four families on one router. Evaluated content → model
# → cost → latency, first match wins.
atmosphere:
  ai:
    model: gemini-2.5-flash              # resolved default client + model
    routing:
      enabled: true
      default-model: gemini-2.5-flash
      content-rules:
        - keywords: [code, function, refactor, stack trace]
          model: gpt-4o                  # reuses the resolved client; only the model changes
          base-url: https://api.openai.com/v1   # optional: dedicated endpoint
          api-key: ${OPENAI_API_KEY}            # optional: key for that endpoint
      model-rules:
        - model-pattern: gpt-4o          # request.model()=="gpt-4o" → dedicated client, unchanged request
          base-url: https://api.openai.com/v1
          api-key: ${OPENAI_API_KEY}
      cost-rules:
        - max-cost: 5.0                  # highest-capability model fitting the budget
          models:
            - model: gpt-4o
              cost-per-streaming-text: 0.01
              capability: 10
            - model: gpt-4o-mini
              cost-per-streaming-text: 0.001
              capability: 5
      latency-rules:
        - max-latency-ms: 100            # highest-capability model under 100ms
          models:
            - model: gemini-2.5-flash
              average-latency-ms: 50
              capability: 8

Every rule reuses the resolved client by default (same provider/credentials; only the model name changes where applicable); set base-url and/or api-key to target a different OpenAI-compatible endpoint. For routing logic beyond these property shapes (custom predicates, budget-degradation), build a RoutingLlmClient in Java and install it with AiConfig.installClient(router).

The CLI scaffolds the opt-in block for you: atmosphere new my-app --template ai-chat --routing appends a commented, ready-to-uncomment atmosphere.ai.routing.* tree to the generated application.yml. --routing is only valid for AI templates that ship an application.yml, and the emitted block is commented so the scaffold is byte-identical until you uncomment it.

You can bypass @AiEndpoint and use adapters directly:

Spring AI:

var session = StreamingSessions.start(resource);
springAiAdapter.stream(chatClient, prompt, session);

LangChain4j:

var session = StreamingSessions.start(resource);
model.chat(ChatMessage.userMessage(prompt),
new AtmosphereStreamingResponseHandler(session));

Google ADK:

var session = StreamingSessions.start(resource);
adkAdapter.stream(new AdkRequest(runner, userId, sessionId, prompt), session);

Embabel:

val session = StreamingSessions.start(resource)
embabelAdapter.stream(AgentRequest("assistant") { channel ->
    agentPlatform.run(prompt, channel)
}, session)

import { useStreaming } from 'atmosphere.js/react';

function AiChat() {
  const { fullText, isStreaming, stats, routing, send } = useStreaming({
    request: { url: '/ai/chat', transport: 'websocket' },
  });
  return (
    <div>
      <button onClick={() => send('Explain WebSockets')} disabled={isStreaming}>
        Ask
      </button>
      <p>{fullText}</p>
      {stats && <small>{stats.totalStreamingTexts} streaming texts</small>}
      {routing.model && <small>Model: {routing.model}</small>}
    </div>
  );
}

var client = AiConfig.get().client();
var assistant = new LlmRoomMember("assistant", client, "gpt-5",
        "You are a helpful coding assistant");
Room room = rooms.room("dev-chat");
room.joinVirtual(assistant);
// Now when any user sends a message, the LLM responds in the same room

The client receives JSON messages over WebSocket/SSE:

  • {"type":"streaming-text","content":"Hello"} — a single streaming text
  • {"type":"progress","message":"Thinking..."} — status update
  • {"type":"complete"} — stream finished
  • {"type":"error","message":"..."} — stream failed

Configure the built-in client with environment variables:

| Variable | Description | Default |
| --- | --- | --- |
| LLM_MODE | remote (cloud) or local (Ollama) | remote |
| LLM_MODEL | gemini-2.5-flash, gpt-5, o3-mini, llama3.2, … | gemini-2.5-flash |
| LLM_API_KEY | API key (or GEMINI_API_KEY for Gemini) | |
| LLM_BASE_URL | Override endpoint (auto-detected from model name) | auto |
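
The defaults in this table can be read as the following lookup logic. This is a sketch under stated assumptions, not the built-in client's code: LlmEnv is a hypothetical record, the precedence of LLM_API_KEY over GEMINI_API_KEY is assumed, and endpoint auto-detection is left out (an absent LLM_BASE_URL is simply reported as empty).

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical holder for the four settings described in the table.
record LlmEnv(String mode, String model, Optional<String> apiKey, Optional<String> baseUrl) {

    // Takes the environment as a map so the logic can be exercised without real variables.
    static LlmEnv from(Map<String, String> env) {
        String mode = env.getOrDefault("LLM_MODE", "remote");
        String model = env.getOrDefault("LLM_MODEL", "gemini-2.5-flash");
        // Assumed precedence: LLM_API_KEY first, GEMINI_API_KEY as the fallback.
        Optional<String> apiKey = Optional.ofNullable(env.get("LLM_API_KEY"))
                .or(() -> Optional.ofNullable(env.get("GEMINI_API_KEY")));
        // Empty means "no override"; the client would auto-detect the endpoint itself.
        Optional<String> baseUrl = Optional.ofNullable(env.get("LLM_BASE_URL"));
        return new LlmEnv(mode, model, apiKey, baseUrl);
    }
}
```

In an application the map would come from System.getenv(), as in LlmEnv.from(System.getenv()).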

| Class | Description |
| --- | --- |
| @AiEndpoint | Marks a class as an AI chat endpoint with a path, system prompt, and interceptors |
| @Prompt | Marks the method that handles user messages |
| @AiTool | Marks a method as an AI-callable tool (framework-agnostic) |
| @Param | Describes a tool parameter’s name, description, and required flag |
| AgentRuntime | SPI for AI framework backends (ServiceLoader-discovered) |
| AiRequest | Framework-agnostic request record (message, systemPrompt, model, userId, sessionId, agentId, conversationId, metadata) |
| AiEvent | Sealed interface: 15 structured event types (TextDelta, ToolStart, ToolResult, AgentStep, EntityStart, Handoff, ApprovalRequired, etc.) |
| AiCapability | Enum for endpoint capability requirements (TEXT_STREAMING, TOOL_CALLING, STRUCTURED_OUTPUT, etc.) |
| AiInterceptor | Pre/post processing hooks for RAG, guardrails, logging |
| AiConversationMemory | SPI for conversation history storage |
| MemoryStrategy | Pluggable memory selection: MessageWindowStrategy, TokenWindowStrategy, SummarizingStrategy |
| StructuredOutputParser | SPI for JSON Schema generation and typed output parsing (built-in: JacksonStructuredOutputParser) |
| StreamingSession | Delivers streaming texts, events, progress updates, and metadata to the client |
| StreamingSessions | Factory for creating StreamingSession instances |
| OpenAiCompatibleClient | Built-in HTTP client for OpenAI-compatible APIs |
| RoutingLlmClient | Routes prompts to different LLM backends based on rules |
| ToolRegistry | Global registry for @AiTool definitions |
| ModelRouter | SPI for intelligent model routing and failover |
| AiGuardrail | SPI for pre/post-LLM safety inspection |
| AiMetrics | SPI for AI observability (streaming texts, latency, cost) |
| ConversationPersistence | SPI for durable conversation storage (Redis, SQLite) |
| RetryPolicy | Exponential backoff with circuit-breaker semantics |

@RequiresApproval pauses tool execution until the client approves. The virtual thread parks cheaply on a CompletableFuture — no carrier thread consumed.

@AiTool(name = "delete_account", description = "Permanently delete a user account")
@RequiresApproval("This will permanently delete the account. Are you sure?")
public String deleteAccount(@Param("accountId") String accountId) {
    return accountService.delete(accountId);
}

When the LLM calls a @RequiresApproval tool, the client receives an approval-required event:

{"event":"approval-required","data":{
    "approvalId":"apr_a1b2c3d4e5f6",
    "toolName":"delete_account",
    "arguments":{"accountId":"user-42"},
    "message":"This will permanently delete the account. Are you sure?",
    "expiresIn":300
}}

The client responds with:

  • /__approval/apr_a1b2c3d4e5f6/approve — tool executes
  • /__approval/apr_a1b2c3d4e5f6/deny — tool returns cancelled

Default timeout: 5 minutes. Configurable via @RequiresApproval(timeoutSeconds = 120).

  1. AiStreamingSession.wrapApprovalGates() wraps @RequiresApproval tools with ApprovalGateExecutor
  2. When the LLM calls the tool, ApprovalGateExecutor parks the virtual thread on CompletableFuture.get(timeout)
  3. The session emits AiEvent.ApprovalRequired to the client
  4. AiEndpointHandler fast-paths /__approval/ messages to the session’s ApprovalRegistry (before prompt dispatch)
  5. ApprovalRegistry.tryResolve() completes the future, unparking the virtual thread
  6. On transport reconnect, a fallback scan across all active sessions ensures the approval reaches the parked thread

When running on Google ADK, Atmosphere also calls toolContext.requestConfirmation() to give ADK native visibility into the approval pause. If ADK resolves a confirmation before Atmosphere (e.g., via its own UI), the ADK denial short-circuits without calling the executor. This creates a two-layer model: Atmosphere-level (cross-runtime) + ADK-native (runtime-specific).

All runtimes with TOOL_CALLING also declare AiCapability.TOOL_APPROVAL (Built-in, Spring AI, LangChain4j, ADK, Embabel, Koog, Semantic Kernel, Anthropic, Cohere, CrewAI). AgentScope and Spring AI Alibaba are excluded because their underlying SDKs lack a native tool-call dispatch loop.

The AiCompactionStrategy SPI controls how conversation history is compacted when it exceeds the configured limit. Unlike MemoryStrategy (which selects messages for the next request — read path), compaction permanently reduces stored history (write path).

public interface AiCompactionStrategy {
    List<ChatMessage> compact(List<ChatMessage> messages, int maxMessages);
    String name();
}

SlidingWindowCompaction (default) — drops the oldest non-system messages until under the limit. System messages are always preserved.

SummarizingCompaction — condenses old messages into a single system-role summary, preserving the most recent messages verbatim. The recent window size is configurable (default: 6).

// Default: sliding window
var memory = new InMemoryConversationMemory(20);
// Custom: summarization with an 8-message recent window
var summarizingMemory = new InMemoryConversationMemory(20, new SummarizingCompaction(8));
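
The sliding-window rule (drop the oldest non-system messages until the history fits, always keep system messages) is small enough to sketch directly. Turn and SlidingWindow are hypothetical stand-ins for ChatMessage and SlidingWindowCompaction, and identifying system messages by the role string "system" is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ChatMessage.
record Turn(String role, String content) { }

final class SlidingWindow {

    // Walks from oldest to newest and removes non-system messages until the list fits.
    // If system messages alone exceed the limit, they are still kept.
    static List<Turn> compact(List<Turn> messages, int maxMessages) {
        var result = new ArrayList<>(messages);
        var it = result.iterator();
        while (result.size() > maxMessages && it.hasNext()) {
            if (!"system".equals(it.next().role())) {
                it.remove();
            }
        }
        return List.copyOf(result);
    }
}
```

Compacting a system message followed by two user/assistant exchanges down to three messages keeps the system message and the most recent exchange.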

AdkCompactionBridge.toAdkConfig() maps Atmosphere compaction settings to ADK’s EventsCompactionConfig for native compaction when using the ADK runtime.

The ArtifactStore SPI provides binary artifact persistence across agent runs. Use cases include agent-generated reports, images, code files, and content shared between coordinated agents.

public interface ArtifactStore {
    Artifact save(Artifact artifact);                                   // auto-versions
    Optional<Artifact> load(String namespace, String artifactId);       // latest version
    Optional<Artifact> load(String namespace, String artifactId, int version);
    List<Artifact> list(String namespace);                              // latest of each
    boolean delete(String namespace, String artifactId);                // all versions
    void deleteAll(String namespace);
}

public record Artifact(
        String id,                     // unique identifier
        String namespace,              // grouping key (session ID, agent name, user ID)
        String fileName,               // human-readable name ("report.pdf")
        String mimeType,               // MIME type ("application/pdf")
        byte[] data,                   // binary content (defensively copied)
        int version,                   // auto-incremented per save
        Map<String, String> metadata,  // arbitrary key-value pairs
        Instant createdAt
) { }

Byte arrays are defensively copied on construction and on access — callers cannot mutate persisted data.
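
Defensive copying on both construction and access looks like this in a Java record. Blob is a hypothetical, trimmed-down record used only to show the technique; it is not the Artifact source.

```java
// Hypothetical three-field record; only the copy semantics matter here.
record Blob(String id, byte[] data, int version) {

    // Copy on construction: later changes to the caller's array do not reach the record.
    Blob {
        data = data.clone();
    }

    // Copy on access: callers receive their own array and cannot alter the stored bytes.
    @Override
    public byte[] data() {
        return data.clone();
    }
}
```

The cost is one array copy per construction and per read, which is the usual trade for immutability with array-typed components.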

  • InMemoryArtifactStore — default, for development and testing. Data does not survive JVM restart.
  • ADK bridge: AdkArtifactBridge.toAdkService() wraps an ArtifactStore as ADK’s BaseArtifactService.

AiInterceptor includes an onDisconnect hook called before conversation memory is cleared. This enables fact extraction, summary persistence, and other cleanup that requires access to the conversation history.

public interface AiInterceptor {
    default AiRequest preProcess(AiRequest request, AtmosphereResource resource) { return request; }
    default void postProcess(AiRequest request, AtmosphereResource resource) { }
    default void onDisconnect(String userId, String conversationId, List<ChatMessage> history) { }
}

LongTermMemoryInterceptor.onDisconnect() uses this to extract facts from the full conversation on session close via OnSessionCloseStrategy.

Execution order: preProcess runs FIFO, postProcess runs LIFO, onDisconnect runs FIFO. Exceptions in one interceptor do not prevent others from being called.
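
The ordering and isolation rules can be illustrated with a simplified chain. Step, Named, and Chain are hypothetical types: the hooks here only append to a log instead of transforming an AiRequest, and failures are swallowed where a real implementation would presumably log them.

```java
import java.util.List;

// Hypothetical, simplified interceptor: hooks record their invocation in a log.
interface Step {
    default void pre(List<String> log) { }
    default void post(List<String> log) { }
}

// Named step that can be told to fail in its pre hook.
record Named(String name, boolean failOnPre) implements Step {
    @Override
    public void pre(List<String> log) {
        if (failOnPre) {
            throw new IllegalStateException(name + " failed");
        }
        log.add("pre:" + name);
    }

    @Override
    public void post(List<String> log) {
        log.add("post:" + name);
    }
}

final class Chain {
    private final List<Step> steps;

    Chain(List<Step> steps) {
        this.steps = List.copyOf(steps);
    }

    // FIFO: first registered runs first.
    void runPre(List<String> log) {
        for (Step step : steps) {
            guarded(() -> step.pre(log));
        }
    }

    // LIFO: last registered runs first, so hooks unwind like nested calls.
    void runPost(List<String> log) {
        for (int i = steps.size() - 1; i >= 0; i--) {
            Step step = steps.get(i);
            guarded(() -> step.post(log));
        }
    }

    // One failing step must not stop the others.
    private static void guarded(Runnable hook) {
        try {
            hook.run();
        } catch (RuntimeException e) {
            // a real chain would log the failure here
        }
    }
}
```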

The AiEvent sealed interface provides 15 structured event types emitted via session.emit(). All runtimes map their native events to this common model.

| Event | Description |
| --- | --- |
| TextDelta | Streaming token |
| TextComplete | Final assembled text |
| ToolStart | Tool invocation begins (name + arguments) |
| ToolResult | Tool executed successfully (name + result) |
| ToolError | Tool execution failed |
| AgentStep | Orchestration step (ADK agent steps, Embabel planning) |
| StructuredField | Structured output field arrival |
| EntityStart / EntityComplete | Structured entity streaming |
| RoutingDecision | Backend routing event |
| Progress | Long-running operation status |
| Handoff | Agent handoff notification |
| ApprovalRequired | Human approval gate |
| Error | Error with recovery hint |
| Complete | Stream completed with usage metadata |

| Source | Atmosphere Event |
| --- | --- |
| ADK event.functionCalls() | AiEvent.ToolStart |
| ADK event.functionResponses() | AiEvent.ToolResult |
| ADK event.author() (non-partial) | AiEvent.AgentStep |
| ADK event.usageMetadata() | ai.tokens.input/output/total metadata |
| Koog onToolCallStarting | AiEvent.ToolStart |
| Koog onToolCallCompleted | AiEvent.ToolResult |
| Koog onToolCallFailed | AiEvent.ToolError |
| Koog StreamFrame.ReasoningDelta | AiEvent.Progress |
| Embabel MessageOutputChannelEvent | AiEvent.TextDelta |
| Embabel ProgressOutputChannelEvent | AiEvent.AgentStep |

Each runtime declares capabilities via AiCapability. The framework uses these flags for model routing, tool negotiation, and feature discovery. The table below mirrors the twelve-runtime snapshot pinned by each runtime’s expectedCapabilities() contract test; Y means the runtime declares the capability and the contract tests assert it.

Legend: TS=TEXT_STREAMING, TC=TOOL_CALLING, SO=STRUCTURED_OUTPUT, SP=SYSTEM_PROMPT, AO=AGENT_ORCHESTRATION, CM=CONVERSATION_MEMORY, TA=TOOL_APPROVAL, V=VISION, A=AUDIO, MM=MULTI_MODAL, PC=PROMPT_CACHING, TU=TOKEN_USAGE, PRR=PER_REQUEST_RETRY, TCD=TOOL_CALL_DELTA, BE=BUDGET_ENFORCEMENT, CS=CONFIDENCE_SCORES, PSV=PASSIVATION.

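| Runtime | Priority | Capabilities declared (Y marks in legend order: TS, TC, SO, SP, AO, CM, TA, V, A, MM, PC, TU, PRR, TCD, BE, CS, PSV; the positions of undeclared capabilities are not shown) |
| --- | --- | --- |
| Built-in | 0 | YYYYYYYYYYYYYYYY |
| Spring AI | 100 | YYYYYYYYYYYYYYY |
| LangChain4j | 100 | YYYYYYYYYYYYYYY |
| Google ADK | 100 | YYYYYYYYYYYYYYYY |
| Embabel | 100 | YYYYYYYYYYYYYY |
| JetBrains Koog | 100 | YYYYYYYYYYYYYYYY |
| Alibaba AgentScope | 100 | YYYYYYYYYYY |
| Spring AI Alibaba¹ | 100 | YYYYYYYYYY |
| Microsoft Semantic Kernel | 100 | YYYYYYYYYYY |
| Anthropic | 100 | YYYYYYYYYYYYY |
| Cohere | 100 | YYYYYYYYYYYYY |
| CrewAI² | 50 | YYYYYYYY |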

¹ Spring AI Alibaba emits its final reply as one Atmosphere stream chunk, but the upstream ReactAgent.call() path is buffered rather than token-by-token.

² CrewAI is the only out-of-process runtime: the Java side talks HTTP + SSE to a Python sidecar (atmosphere-crewai-bridge, FastAPI + crewai 1.14). isAvailable() is config-gated on ATMOSPHERE_CREWAI_SIDECAR_URL pointing at a running sidecar whose /health responds OK, so the runtime is advertised only when the sidecar is actually reachable (a Runtime Truth gate), never on classpath presence alone. The runtime does not own a Java-side conversation-memory store; per-task memory lives inside the sidecar’s crew rather than being declared at the Atmosphere layer.

How structured output works: AiPipeline wraps the streaming session with StructuredOutputCapturingSession and augments the system prompt with JSON-schema instructions before the runtime runs. Any runtime that honors SYSTEM_PROMPT therefore gets STRUCTURED_OUTPUT automatically via the pipeline — no per-runtime adapter code required. BuiltInAgentRuntime additionally enables native jsonMode on the OpenAI-compatible client for provider-level JSON enforcement on top of the pipeline wrap. Source: modules/ai/src/main/java/org/atmosphere/ai/pipeline/AiPipeline.java:128-135, modules/ai/src/main/java/org/atmosphere/ai/llm/BuiltInAgentRuntime.java:72-74.

Tool-dispatch bridges: every runtime that declares TOOL_CALLING routes every Atmosphere @AiTool invocation through a runtime-native bridge that calls ToolExecutionHelper.executeWithApproval, so @RequiresApproval gates fire uniformly. AgentScope ships AgentScopeToolBridge, Spring AI Alibaba ships SpringAiAlibabaToolBridge, Semantic Kernel ships SemanticKernelToolBridge, Embabel ships EmbabelToolBridge, Koog ships AtmosphereToolBridge, and the JDK runtimes (Built-in, Spring AI, LangChain4j, ADK) wire their tool callbacks through the shared helper directly.

Spring AI Alibaba token usage: ReactAgent.call() returns an AssistantMessage without usage metadata, so Atmosphere wraps the configured Spring AI ChatModel bean in a UsageCapturingChatModel decorator at auto-configuration time. Every underlying ChatModel.call(Prompt) performed by the ReAct graph during a single dispatch accumulates ChatResponseMetadata.getUsage() into a per-thread collector; the runtime emits one typed TokenUsage record via session.usage(...) after the agent returns. Token-based AiBudget breaches therefore trip uniformly alongside wall-clock breaches. Custom ReactAgent beans that bypass the auto-config also bypass the wrapper — see AtmosphereSpringAiAlibabaAutoConfiguration for the wrapping point.

The AbstractAgentRuntimeContractTest base class in atmosphere-ai-test enforces a minimum contract across all runtime adapters.

public abstract class AbstractAgentRuntimeContractTest {
    protected abstract AgentRuntime createRuntime();
    protected abstract AgentExecutionContext createTextContext();
    protected abstract AgentExecutionContext createToolCallContext();
    protected abstract AgentExecutionContext createErrorContext();

    // Enforced contracts:
    // - runtimeDeclaresMinimumCapabilities (TEXT_STREAMING)
    // - runtimeHasNonBlankName
    // - runtimeIsAvailable
    // - textStreamingCompletesSession (10s timeout)
    // - toolCallExecutesIfSupported
    // - errorContextTriggersSessionError
}

Add atmosphere-ai-test as a test dependency and extend the base class:

<dependency>
    <groupId>org.atmosphere</groupId>
    <artifactId>atmosphere-ai-test</artifactId>
    <scope>test</scope>
</dependency>

The RecordingSession test double captures all events, text chunks, metadata, and errors for assertion. The contract suite is implemented for all twelve runtime adapters.