oumi.core.types#

Types module for the Oumi (Open Universal Machine Intelligence) library.

This module provides custom types and exceptions used throughout the Oumi framework.

Exceptions:

HardwareException: Exception raised for hardware-related errors.

Example

>>> from oumi.core.types import HardwareException
>>> try:
...     # Some hardware-related operation
...     pass
... except HardwareException as e:
...     print(f"Hardware error occurred: {e}")

Note

This module is part of the core Oumi framework and is used across various components to ensure consistent error handling and type definitions.

class oumi.core.types.ContentItem(*, type: Type, content: str | None = None, binary: bytes | None = None)[source]#

Bases: BaseModel

A sub-part of Message.content.

For example, a multimodal USER message may include two ContentItem objects: one for text and another for an image.

Note

Either content or binary must be provided when creating an instance.

__repr__() str[source]#

Returns a string representation of the content item.

binary: bytes | None#

Optional binary data for the message content item, used for image data.

One of content or binary must be provided.

The field is required for IMAGE_BINARY. It may optionally be populated for IMAGE_URL and IMAGE_PATH, in which case it must contain the loaded bytes of the image specified in the content field.

The field must be None for TEXT.

content: str | None#

Optional text content of the content item.

One of content or binary must be provided.
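The validity rules above can be restated as a plain-Python sketch. This is illustrative only; the actual enforcement lives in ContentItem.model_post_init and may differ in detail:

```python
def is_valid_content_item(item_type: str, content, binary) -> bool:
    # Either content or binary must always be provided.
    if content is None and binary is None:
        return False
    if item_type == "text":
        return binary is None  # binary must be None for TEXT
    if item_type == "image_binary":
        return binary is not None  # binary is required for IMAGE_BINARY
    if item_type in ("image_url", "image_path"):
        return content is not None  # the URL/path is carried in content
    return False
```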

is_image() bool[source]#

Checks if the item contains an image.

is_text() bool[source]#

Checks if the item contains text.

model_config = {'frozen': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(_ContentItem__context) None[source]#

Post-initialization method for the ContentItem model.

This method is automatically called after the model is initialized. Performs additional validation e.g., to ensure that either content or binary is provided for the message.

Raises:

ValueError – If fields are set to invalid or inconsistent values.

type: Type#

The type of the content (e.g., text, image path, image URL).

class oumi.core.types.ContentItemCounts(total_items: int, text_items: int, image_items: int)[source]#

Bases: NamedTuple

Contains counts of content items in a message by type.

image_items: int#

The number of image content items in a message.

text_items: int#

The number of text content items in a message.

total_items: int#

The total number of content items in a message.

class oumi.core.types.Conversation(*, conversation_id: str | None = None, messages: list[~oumi.core.types.conversation.Message], metadata: dict[str, ~typing.Any] = <factory>, tools: list[~oumi.core.types.tool_call.ToolDefinition] | None = None)[source]#

Bases: BaseModel

Represents a conversation, which is a sequence of messages.

__getitem__(idx: int) Message[source]#

Gets the message at the specified index.

Parameters:

idx (int) – The index of the message to retrieve.

Returns:

The message at the specified index.

Return type:

Message

__repr__() str[source]#

Returns a string representation of the conversation.

append_id_to_string(s: str) str[source]#

Appends conversation ID to a string.

Can be useful for log or exception error messages, to allow users to identify the relevant conversation.

conversation_id: str | None#

Optional unique identifier for the conversation.

This attribute can be used to assign a specific identifier to the conversation, which may be useful for tracking or referencing conversations in a larger context.

filter_messages(*, role: Role | None = None, filter_fn: Callable[[Message], bool] | None = None) list[Message][source]#

Gets all messages in the conversation, optionally filtered by role.

Parameters:
  • role (Optional) – The role to filter messages by. If None, no filtering by role is applied.

  • filter_fn (Optional) – A predicate to filter messages by. If the predicate returns True for a message, then the message is returned. Otherwise, the message is excluded.

Returns:

A list of all messages matching the criteria.

Return type:

List[Message]
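The filtering semantics can be sketched over plain dicts. This is an illustrative stand-in: the real Conversation.filter_messages operates on Message objects, not dicts:

```python
def filter_messages(messages, role=None, filter_fn=None):
    # Filter by role first (if given), then by the predicate (if given).
    out = messages
    if role is not None:
        out = [m for m in out if m["role"] == role]
    if filter_fn is not None:
        out = [m for m in out if filter_fn(m)]
    return out
```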

first_message(role: Role | None = None) Message | None[source]#

Gets the first message in the conversation, optionally filtered by role.

Parameters:

role – The role to filter messages by. If None, considers all messages.

Returns:

The first message matching the criteria, or None if no messages are found.

Return type:

Optional[Message]

classmethod from_dict(data: dict) Conversation[source]#

Converts a dictionary to a conversation.

classmethod from_json(data: str) Conversation[source]#

Converts a JSON string to a conversation.

last_message(role: Role | None = None) Message | None[source]#

Gets the last message in the conversation, optionally filtered by role.

Parameters:

role – The role to filter messages by. If None, considers all messages.

Returns:

The last message matching the criteria, or None if no messages are found.

Return type:

Optional[Message]

messages: list[Message]#

List of Message objects that make up the conversation.

metadata: dict[str, Any]#

Optional metadata associated with the conversation.

This attribute allows for storing additional information about the conversation in a key-value format. It can be used to include any relevant contextual data.

model_config = {}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

to_dict()[source]#

Converts the conversation to a dictionary.

to_json() str[source]#

Converts the conversation to a JSON string.

tools: list[ToolDefinition] | None#

Tool definitions available to the model for this conversation.

Accepts ToolDefinition instances, OpenAI-format dicts, or Python callables on input. Callables are converted via transformers.utils.chat_template_utils.get_json_schema (which requires a Google-style docstring and type hints on every user-facing argument) and dicts are validated by Pydantic into ToolDefinition. After construction, every entry is a ToolDefinition instance.

Uses the OpenAI function-calling schema, e.g.:

[{"type": "function",
  "function": {"name": "get_weather",
               "description": "...",
               "parameters": {...}}}]

When set, to_dict() emits tools as a top-level key (in dict form, via Pydantic’s model_dump(mode="json")), which TRL’s SFTTrainer forwards to tokenizer.apply_chat_template(messages, tools=...) so the model’s chat template can render tools natively.
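An illustrative sketch of the dict form described above. The function name and schema are hypothetical examples, not part of the oumi API:

```python
import json

# Illustrative OpenAI-format tool definition, matching the schema shown above.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name."},
            },
            "required": ["city"],
        },
    },
}

# The structure round-trips through JSON unchanged, which is what to_dict()
# relies on when emitting tools as a top-level key.
assert json.loads(json.dumps(get_weather_tool)) == get_weather_tool
```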

class oumi.core.types.FinishReason(value)[source]#

Bases: str, Enum

Reason why the model stopped generating tokens.

CONTENT_FILTER = 'content_filter'#

Content was filtered due to safety/moderation.

ERROR = 'error'#

Generation failed due to an error.

LENGTH = 'length'#

Model reached max_tokens limit.

STOP = 'stop'#

Model hit a natural stopping point or stop sequence.

TOOL_CALLS = 'tool_calls'#

Model made a tool/function call.

UNKNOWN = 'unknown'#

Finish reason could not be determined.

__str__() str[source]#

Return the string representation of the FinishReason enum.

Returns:

The string value of the FinishReason enum.

Return type:

str

class oumi.core.types.FunctionCall(*, name: str, arguments: str, **extra_data: Any)[source]#

Bases: BaseModel

A function call made by the model.

arguments: str#

The arguments to call the function with, as a JSON string.

OpenAI wire format keeps this as an unparsed JSON-encoded string (NOT a dict). Some providers return malformed JSON; downstream code is responsible for parsing and handling errors.
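Since providers may return malformed JSON, downstream code should parse defensively. A minimal sketch of such a helper (illustrative, not part of the oumi API):

```python
import json


def parse_arguments(arguments: str) -> dict:
    """Defensively parse a FunctionCall.arguments JSON string.

    Falls back to an empty dict on malformed JSON or non-dict payloads
    rather than crashing.
    """
    try:
        parsed = json.loads(arguments)
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}
```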

model_config = {'extra': 'allow', 'frozen': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

name: str#

The name of the function being called.

class oumi.core.types.FunctionDefinition(*, name: str, description: str | None = None, parameters: JSONSchema | None = None, strict: bool | None = None, **extra_data: Any)[source]#

Bases: BaseModel

Definition of a function that can be called by the model.

description: str | None#

A description of what the function does.

Used by the model to choose when and how to call the function.

model_config = {'extra': 'allow', 'frozen': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

name: str#

The name of the function to be called.

Must be a-z, A-Z, 0-9, underscores and dashes, with a maximum length of 64.
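The naming constraint can be restated as a regex check (illustrative, not part of oumi):

```python
import re

# Letters, digits, underscores, and dashes, 1-64 characters.
FUNCTION_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_function_name(name: str) -> bool:
    return FUNCTION_NAME_RE.fullmatch(name) is not None
```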

parameters: JSONSchema | None#

The parameters the function accepts, as a JSON Schema object.

See https://json-schema.org/understanding-json-schema/ for the format. To describe a function that accepts no parameters, provide {"type": "object", "properties": {}}.

strict: bool | None#

Whether to enable strict schema adherence in the generated call.

If true, the model will follow the exact schema provided in parameters. Only supported by OpenAI gpt-4o and later; ignored by other providers.

exception oumi.core.types.HardwareException[source]#

Bases: Exception

An exception thrown for invalid hardware configurations.

class oumi.core.types.JSONSchema(*, type: Literal['object', 'string', 'number', 'integer', 'boolean', 'array', 'null'] | list[Literal['object', 'string', 'number', 'integer', 'boolean', 'array', 'null']] | None = None, description: str | None = None, title: str | None = None, properties: dict[str, JSONSchema] | None = None, required: list[str] | None = None, items: JSONSchema | None = None, enum: list[Any] | None = None, default: Any = None, format: str | None = None, **extra_data: Any)[source]#

Bases: BaseModel

A JSON Schema object describing the shape of a value.

Models the subset of JSON Schema commonly used in LLM tool definitions. extra="allow" lets less-common keywords ($ref, $defs, additionalProperties, anyOf, numeric constraints, etc.) round-trip unchanged, matching the rest of this module — see the module docstring for why round-tripping matters.

default: Any#

Default value used when the field is omitted.

description: str | None#

Human-readable description, used by the model to choose values.

enum: list[Any] | None#

Restricts the value to a fixed set of allowed values.

format: str | None#

Semantic format hint (e.g., "date-time", "email").

items: JSONSchema | None#

For type="array", the schema for array elements.

model_config = {'extra': 'allow', 'frozen': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

properties: dict[str, JSONSchema] | None#

For type="object", the schema for each named property.

required: list[str] | None#

For type="object", the names of properties that must be present.

title: str | None#

Short human-readable label.

type: Literal['object', 'string', 'number', 'integer', 'boolean', 'array', 'null'] | list[Literal['object', 'string', 'number', 'integer', 'boolean', 'array', 'null']] | None#

JSON type(s) of this value. A list expresses a union (e.g., ["string", "null"] for a nullable string).

class oumi.core.types.Message(*, id: str | None = None, content: str | list[ContentItem] | None = None, role: Role, tool_calls: list[ToolCall] | None = None, tool_call_id: str | None = None)[source]#

Bases: BaseModel

A message in a conversation.

This class represents a single message within a conversation, containing attributes such as role, content, and identifier.

__repr__() str[source]#

Returns a string representation of the message.

compute_flattened_text_content(separator=' ') str[source]#

Joins contents of all text items.

contains_image_content_items_only() bool[source]#

Checks if the message contains only image items.

At least one image item is required.

contains_images() bool[source]#

Checks if the message contains at least one image.

contains_single_image_content_item_only() bool[source]#

Checks if the message contains exactly 1 image item, and nothing else.

contains_single_text_content_item_only() bool[source]#

Checks if the message contains exactly 1 text item, and nothing else.

These are the most common and simple messages, and may need special handling.

contains_text() bool[source]#

Checks if the message contains at least one text item.

contains_text_content_items_only() bool[source]#

Checks if the message contains only text items.

At least one text item is required.

content: str | list[ContentItem] | None#

Content of the message.

For text messages, content can be set to a string value. For multimodal messages, content should be a list of content items of potentially different types e.g., text and image. May be None on assistant messages that only contain tool_calls (OpenAI tool-calling wire format).

property content_items: list[ContentItem]#

Returns a list of all content items in the message.

count_content_items() ContentItemCounts[source]#

Counts content items by type.

id: str | None#

Optional unique identifier for the message.

This attribute can be used to assign a specific identifier to the message, which may be useful for tracking or referencing messages within a conversation.

Returns:

The unique identifier of the message, if set; otherwise None.

Return type:

Optional[str]

property image_content_items: list[ContentItem]#

Returns a list of image content items.

model_config = {'frozen': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(_Message__context) None[source]#

Post-initialization method for the Message model.

This method is automatically called after the model is initialized. It validates that the message has at least one of content or tool_calls and that content, when set, is a string or list.

Raises:

ValueError – If content is None and tool_calls is missing or empty, or if content has an unsupported type.

role: Role#

The role of the entity sending the message (e.g., user, assistant, system).

property text_content_items: list[ContentItem]#

Returns a list of text content items.

tool_call_id: str | None#

Identifier linking a tool response message to the call it responds to.

Only set on messages with role == 'tool', matching the id of a prior assistant tool_calls entry.

tool_calls: list[ToolCall] | None#

Structured tool calls emitted by an assistant message.

Uses the OpenAI function-calling wire format. Pydantic auto-coerces dict input (e.g., from JSONL) into ToolCall instances:

[{"id": "call_abc", "type": "function",
  "function": {"name": "get_weather", "arguments": "{...}"}}]

Only set on assistant messages. The chat template renders these via tokenizer.apply_chat_template(messages, tools=...); serialization (to_dict() / to_json()) emits them in the OpenAI dict shape.
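An illustrative pair of wire-format messages showing the linkage between an assistant tool call and its tool response. The ids, function names, and payloads are hypothetical:

```python
# Assistant message emitting a tool call (OpenAI wire format).
assistant_msg = {
    "role": "assistant",
    "content": None,  # content may be None when tool_calls is set
    "tool_calls": [
        {
            "id": "call_abc",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
        }
    ],
}

# Tool response message; tool_call_id links it back to the call above.
tool_msg = {
    "role": "tool",
    "tool_call_id": "call_abc",
    "content": '{"temperature_c": 21}',
}

# The response is matched to the call that requested it via the id.
assert tool_msg["tool_call_id"] == assistant_msg["tool_calls"][0]["id"]
```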

class oumi.core.types.Role(value)[source]#

Bases: str, Enum

Role of the entity sending the message.

ASSISTANT = 'assistant'#

Represents an assistant message in the conversation.

SYSTEM = 'system'#

Represents a system message in the conversation.

TOOL = 'tool'#

Represents a tool message in the conversation.

USER = 'user'#

Represents a user message in the conversation.

__str__() str[source]#

Return the string representation of the Role enum.

Returns:

The string value of the Role enum.

Return type:

str

class oumi.core.types.TemplatedMessage(*, template: str, role: Role)[source]#

Bases: BaseModel

Represents a templated message.

This class is used to create messages with dynamic content using a template. The template can be rendered with variables to produce the final message content.

property content: str#

Renders the content of the message.

property message: Message#

Returns the message in oumi format.

model_config = {}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

role: Role#

The role of the message sender (e.g., USER, ASSISTANT, SYSTEM).

template: str#

The template string used to generate the message content.

class oumi.core.types.ToolCall(*, id: str, type: ToolType = ToolType.FUNCTION, function: FunctionCall, **extra_data: Any)[source]#

Bases: BaseModel

A tool call emitted by the model.

function: FunctionCall#

The function the model called.

id: str#

The ID of the tool call.

Used to match a tool response message back to the call that requested it (via Message.tool_call_id).

model_config = {'extra': 'allow', 'frozen': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

type: ToolType#

The type of tool call. Defaulted; see ToolDefinition.type.

class oumi.core.types.ToolDefinition(*, type: ToolType = ToolType.FUNCTION, function: FunctionDefinition, **extra_data: Any)[source]#

Bases: BaseModel

Definition of a tool available to the model.

function: FunctionDefinition#

The function definition.

model_config = {'extra': 'allow', 'frozen': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

type: ToolType#

The type of the tool. Currently only function is supported.

Defaulted for ergonomics — when a second tool type lands, drop the default to force callers to be explicit.

class oumi.core.types.ToolResult(output: str | dict[str, Any], updated_state: dict[str, Any] | None = None)[source]#

Bases: object

Result returned by an environment step().

Runtime value (not an OpenAI wire-format type) — projected by the synthesizer into Message(role=TOOL, content=...) before output. output may be a string or a JSON-serializable dict; the synthesizer json-encodes dicts at the message boundary.
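A minimal sketch of the boundary encoding described above (illustrative; the actual projection into Message(role=TOOL, content=...) is performed by the synthesizer):

```python
import json


def project_output(output) -> str:
    """String outputs pass through; dict outputs are JSON-encoded."""
    return output if isinstance(output, str) else json.dumps(output)
```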

output: str | dict[str, Any]#

updated_state: dict[str, Any] | None = None#

class oumi.core.types.ToolType(value)[source]#

Bases: str, Enum

Type of tool available to the model.

FUNCTION = 'function'#

A callable function that the model can invoke.

__str__() str[source]#

Return the string representation of the ToolType enum.

class oumi.core.types.Type(value)[source]#

Bases: str, Enum

Type of the message.

IMAGE_BINARY = 'image_binary'#

Represents an image stored as binary data.

IMAGE_PATH = 'image_path'#

Represents an image referenced by its file path.

IMAGE_URL = 'image_url'#

Represents an image referenced by its URL.

TEXT = 'text'#

Represents a text message.

__str__() str[source]#

Return the string representation of the Type enum.

Returns:

The string value of the Type enum.

Return type:

str