Streaming
How to work with our Server-Sent Events (SSE)
SSE provides a lightweight one-way stream over HTTP: the server pushes incremental tokens and events to the client, which keeps UIs responsive during long-running completions.
Key concepts
Transport: HTTP with Content-Type: text/event-stream, connection kept open.
Events: Lines prefixed by event: and data:; each event ends with a blank line.
Heartbeat: Periodic comments (: ping) keep the connection alive.
Termination: A final event (e.g., event: done) or stream close.
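To make the wire format concrete, here is a minimal parsing sketch. It follows the generic SSE framing described above (event: and data: lines, comment lines starting with a colon, a blank line ending each event). The names SSEEvent and parse_sse are hypothetical, and the sample payloads in the usage note are illustrative rather than taken from the service.

```python
from typing import Iterable, Iterator, NamedTuple


class SSEEvent(NamedTuple):
    event: str  # event name; "message" is the SSE default when no event: line is sent
    data: str   # payload; multiple data: lines are joined with "\n"


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Yield one SSEEvent per blank-line-terminated block of lines."""
    event_name = "message"
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; dispatch only if it carried data.
            if data_parts:
                yield SSEEvent(event_name, "\n".join(data_parts))
            event_name, data_parts = "message", []
            continue
        if line.startswith(":"):
            # Comment line, e.g. the ": ping" heartbeat; nothing to dispatch.
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format allows one optional space after the colon
        if field == "event":
            event_name = value
        elif field == "data":
            data_parts.append(value)
        # Other fields (id, retry) are ignored in this sketch.
```

Feeding it the lines ": ping", "event: done", "data: {}" and a blank line yields a single SSEEvent named done; the heartbeat comment is skipped. Like browser EventSource implementations, this sketch drops an event block that has no data: line at all.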
Typical flow
Client sends a completion request indicating streaming mode.
Server responds with text/event-stream and starts sending tokens/events.
Client renders tokens incrementally and listens for the terminal event.
Full SSE stream example

How our SSE works

Our streaming communication flow consists of four main components:
Client
Streaming Service
Completion Service
AI Provider
The sequence can be broken down into several key phases:
Initial Connection Setup
The Client initiates a connection to the Streaming Service using the URL endpoint /message-stream/{convo_id}/{session_id}.
Once the connection is established, the Streaming Service responds with a connected event.
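As a rough illustration of the connection step, the sketch below builds the GET request for the stream endpoint and checks for the connected event. The base URL, the function names, and the Accept header are assumptions for illustration; only the path template and the connected event name come from the description above. Authentication, if required, is not shown.

```python
from urllib.parse import quote
from urllib.request import Request


def build_stream_request(base_url: str, convo_id: str, session_id: str) -> Request:
    """Build the GET request that opens the SSE stream for one conversation/session."""
    # Percent-encode the IDs so unusual characters cannot break the path.
    path = f"/message-stream/{quote(convo_id, safe='')}/{quote(session_id, safe='')}"
    return Request(
        base_url.rstrip("/") + path,
        headers={"Accept": "text/event-stream"},  # assumed: ask for an SSE response
        method="GET",
    )


def is_connected(first_event_name: str) -> bool:
    """Treat the stream as ready once the first event received is `connected`."""
    return first_event_name == "connected"
```

A client would pass the returned Request to urllib.request.urlopen (or an equivalent HTTP client), read the response line by line, and wait for the connected event before sending the completion request.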
Request Initiation
The Client sends a request to the Completion Service using the URL endpoint /cortex/completion/user-input.
The Completion Service forwards this request to the AI Provider with streaming enabled. The actual endpoint varies by AI Provider (OpenAI, Azure, Anthropic, Google, ...).
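The client-side request might be assembled as in the sketch below. The request schema is not described here, so the payload is treated as an opaque dictionary supplied by the caller. The JSON encoding, the POST method, the base URL, and the function name are all assumptions; only the endpoint path comes from the text above.

```python
import json
from urllib.request import Request


def build_completion_request(base_url: str, payload: dict) -> Request:
    """Build the completion request; `payload` is whatever the API schema requires."""
    body = json.dumps(payload).encode("utf-8")  # assumed: the endpoint accepts JSON
    return Request(
        base_url.rstrip("/") + "/cortex/completion/user-input",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumed HTTP method
    )
```

Because the response events arrive on the separate stream connection, a client would normally open the stream first and send this request only after the connected event has been received.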
Streaming Process
Streaming begins with a message_start event, which propagates from the Completion Service to the Streaming Service via the streaming bus. The Streaming Service then forwards it to the Client, indicating that the service has started generating a response.
The AI Provider streams generated tokens incrementally to the Completion Service.
Each token is forwarded in a new_token event through the chain: AI Provider → Completion Service → Streaming Service → Client.
This token streaming process repeats until text generation is complete.
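On the client, incremental rendering can be sketched as a loop over (event name, data) pairs. This sketch assumes each new_token event carries the token text directly in its data field; the actual payload may be structured differently (for example JSON), in which case the token would need to be extracted first. render_stream and on_token are hypothetical names.

```python
from typing import Callable, Iterable, Tuple


def render_stream(
    events: Iterable[Tuple[str, str]],
    on_token: Callable[[str], None],
) -> str:
    """Accumulate new_token payloads and report the partial text after each one."""
    text = ""
    for name, data in events:
        if name == "message_start":
            text = ""  # a new response begins; discard any earlier partial text
        elif name == "new_token":
            text += data    # assumed: data is the raw token text
            on_token(text)  # e.g. redraw the message bubble with the partial text
        # Any other event is ignored in this sketch.
    return text
```

Passing the whole partial text to the callback, rather than only the newest token, keeps the UI update idempotent: a redraw always shows everything received so far.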
Completion and Persistence
The Completion Service signals message_end when text generation is complete.
The response message is persisted internally by the Completion Service.
A message_ready event is sent to indicate the full response message is now available for further processing (copy, delete, generate text-to-speech, ...).
The Completion Service returns a 200 OK response to the original client request.
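The event order implied by these phases (message_start, then any number of new_token events, then message_end, then message_ready) can be tracked with a small state machine. This is a sketch of one way a client might model it; Phase, TRANSITIONS, and advance are hypothetical names, and events not listed in the table are simply ignored.

```python
from enum import Enum


class Phase(Enum):
    IDLE = "idle"            # no response in progress
    STREAMING = "streaming"  # message_start seen, tokens may arrive
    ENDED = "ended"          # message_end seen, generation finished
    READY = "ready"          # message_ready seen, message is persisted and usable


# Allowed transitions: (current phase, event name) -> next phase.
TRANSITIONS = {
    (Phase.IDLE, "message_start"): Phase.STREAMING,
    (Phase.STREAMING, "new_token"): Phase.STREAMING,
    (Phase.STREAMING, "message_end"): Phase.ENDED,
    (Phase.ENDED, "message_ready"): Phase.READY,
}


def advance(phase: Phase, event_name: str) -> Phase:
    """Return the next phase; any (phase, event) pair not in the table leaves it unchanged."""
    return TRANSITIONS.get((phase, event_name), phase)
```

A UI could, for instance, enable actions such as copy, delete, or text-to-speech only once the phase reaches READY.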
Connection Termination
The sequence ends with the Streaming Service closing the connection to the Client.
List of SSE Events
Besides the main SSE events, other events provide extra information about the completion request. The details are as follows: