Chapter 2
Deep Dive into the Mixed Reality Toolkit (MRTK)
The Mixed Reality Toolkit is the cornerstone for building interactive, immersive, and adaptive MR applications. This chapter opens the door to its architecture and workflows, unpacking its extensibility model, service management, and streamlined input handling. You will discover how leveraging MRTK can turn complex spatial and interaction design into accessible, robust solutions tailored to virtually any MR scenario.
2.1 MRTK Architecture and Core Components
The Mixed Reality Toolkit (MRTK) is architected as a modular framework designed to facilitate the development of mixed reality (MR) applications across diverse platforms and hardware configurations. The core philosophy of MRTK's architecture centers on decomposition into loosely coupled, extensible components that collaborate through well-defined interfaces and service registries. This approach balances modular independence with integration, enabling scalable development and flexible customization without incurring coupling overhead that can hinder maintainability and portability.
At the heart of MRTK lies the Service Locator pattern, realized primarily through the MixedRealityToolkit core class. This singleton instance orchestrates access to and lifecycle management of core services, effectively serving as a centralized registry and factory. Services correspond to distinct functional domains, such as input, spatial awareness, diagnostics, boundary systems, and teleportation, each encapsulated in a discrete module implementing an interface defined by the toolkit. This pattern allows developers to swap out implementations or extend functionality by registering new services without modifying dependent components.
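In MRTK 2.x this pattern surfaces through the static CoreServices accessors and the MixedRealityServiceRegistry. A minimal sketch, assuming an MRTK-configured Unity scene, might look like this:

```csharp
using Microsoft.MixedReality.Toolkit;
using Microsoft.MixedReality.Toolkit.Input;
using UnityEngine;

public class ServiceAccessExample : MonoBehaviour
{
    private void Start()
    {
        // Convenience accessor for the registered input system implementation.
        IMixedRealityInputSystem inputSystem = CoreServices.InputSystem;

        // Equivalent lookup through the central service registry.
        if (MixedRealityServiceRegistry.TryGetService<IMixedRealityInputSystem>(out var registered))
        {
            Debug.Log($"Input system: {registered.Name}");
        }
    }
}
```

Both paths resolve to the same registered instance; which one is preferable is mostly a matter of whether the calling code already depends on the CoreServices helper.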
The principle of interface-based design is ubiquitous within MRTK. Each core service defines a contract (interfaces such as IMixedRealityInputSystem, IMixedRealitySpatialAwarenessSystem, and IMixedRealityBoundarySystem) that delineates capabilities and expected behaviors while abstracting away platform-specific details. Concrete implementations for these interfaces are provided out of the box but can be replaced or augmented to accommodate custom hardware, platform constraints, or experimental features. This encourages extensibility and interoperability while preserving the independence of the architectural layers.
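Custom functionality follows the same contract-first approach: declare an interface deriving from IMixedRealityExtensionService, back it with an implementation, and register it with the toolkit. A sketch under MRTK 2.x conventions (the INoteService contract and its members are hypothetical):

```csharp
using Microsoft.MixedReality.Toolkit;

// Hypothetical service contract; extension services plug into MRTK's lifecycle.
public interface INoteService : IMixedRealityExtensionService
{
    void AddNote(string text);
}

public class NoteService : BaseExtensionService, INoteService
{
    public NoteService(string name, uint priority, BaseMixedRealityProfile profile)
        : base(name, priority, profile) { }

    public void AddNote(string text)
    {
        // Store or display the note; body omitted for brevity.
    }
}
```

Registration can then happen declaratively through the Extensions section of the active configuration profile, or in code via the toolkit's service registrar, without any dependent component needing to know the concrete type.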
A foundational building block is the Input System, responsible for aggregating and normalizing diverse input modalities including hand tracking, motion controllers, eye gaze, speech, and spatial gestures. The input system maintains a clear separation between input sources, which abstract raw physical devices, and input handlers, which process events derived from these sources. This event-driven architecture relies on a focus and pointer model, enabling MR applications to respond to complex user interactions in a consistent, scalable manner. The modularity of input components means new input devices or interaction paradigms can be integrated without rearchitecting the input system core.
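The handler side of this model is expressed as interfaces implemented on scene components. For example, a MonoBehaviour that responds to pointer events, following the standard MRTK 2.x pattern, can be sketched as:

```csharp
using Microsoft.MixedReality.Toolkit.Input;
using UnityEngine;

// Receives pointer events when this object has focus; the object needs a collider.
public class PointerResponder : MonoBehaviour, IMixedRealityPointerHandler
{
    public void OnPointerDown(MixedRealityPointerEventData eventData) { }

    public void OnPointerDragged(MixedRealityPointerEventData eventData) { }

    public void OnPointerUp(MixedRealityPointerEventData eventData) { }

    public void OnPointerClicked(MixedRealityPointerEventData eventData)
    {
        // eventData.Pointer identifies the source: hand ray, gaze, controller, etc.
        Debug.Log($"Clicked by {eventData.Pointer.PointerName}");
    }
}
```

Because the handler only sees normalized pointer event data, the same component works unmodified whether the click originated from an articulated hand, a motion controller, or gaze plus an air tap.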
Spatial awareness constitutes another critical subsystem, enabling applications to understand and interact with the physical environment. The Spatial Awareness System offers abstractions for environmental meshes, planes, and spatial anchors, acquired through platform-specific capabilities such as spatial mapping on HoloLens or environmental meshing on ARKit/ARCore. This system manages observer lifecycles, data caching, and updates, delivering a unified spatial representation for other systems to query and manipulate. The independence of this component from input and rendering modules is a deliberate design choice to enhance portability and composability.
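At the API level, observers are reached through the data provider access interface of the spatial awareness system. A sketch of inspecting cached meshes and pausing observation, assuming a mesh observer is configured in the active profile:

```csharp
using Microsoft.MixedReality.Toolkit;
using Microsoft.MixedReality.Toolkit.SpatialAwareness;
using UnityEngine;

public class MeshInspector : MonoBehaviour
{
    private void Start()
    {
        var spatialSystem = CoreServices.SpatialAwarenessSystem;

        // Observers are exposed as data providers on the spatial awareness system.
        if (spatialSystem is IMixedRealityDataProviderAccess access)
        {
            var observer = access.GetDataProvider<IMixedRealitySpatialAwarenessMeshObserver>();
            if (observer != null)
            {
                Debug.Log($"Cached meshes: {observer.Meshes.Count}");
            }
        }

        // Observer lifecycles can be suspended and resumed globally.
        spatialSystem.SuspendObservers();
        spatialSystem.ResumeObservers();
    }
}
```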
Complementing the input and spatial systems is the Boundary System, which provides runtime awareness of user-defined or platform-defined spatial boundaries. It exposes APIs to query boundary geometry, trigger actions upon boundary crossing, and render visualizations to ensure user safety and spatial coherence. The boundary system integrates with other subsystems via standardized interfaces but remains self-contained to allow platform-specific boundary implementations or custom boundary services.
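The boundary system's query surface can be exercised in a similar way; the sketch below uses query members that IMixedRealityBoundarySystem exposes in MRTK 2.x (exact signatures vary slightly across toolkit versions):

```csharp
using Microsoft.MixedReality.Toolkit;
using UnityEngine;

public class BoundaryCheck : MonoBehaviour
{
    private void Update()
    {
        var boundary = CoreServices.BoundarySystem;
        if (boundary == null) { return; }

        // Is this object inside the configured boundary (defaults to the play area)?
        bool inside = boundary.Contains(transform.position);

        // Inscribed rectangle of the boundary, if one could be computed.
        if (boundary.TryGetRectangularBoundsParams(out Vector2 center, out float angle,
                out float width, out float height))
        {
            Debug.Log($"Play area {width}x{height} at {center}, rotated {angle} degrees, inside: {inside}");
        }
    }
}
```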
The Diagnostics System and the Performance Profiler modules encapsulate tooling functionality critical for monitoring application health and performance characteristics in real time. These components plug into the core framework via service registration and utilize extensible configurations to adapt metrics collection and visualization presentation, emphasizing MRTK's broader commitment to extensibility beyond input and spatial data.
In terms of internal structure, MRTK organizes components around a profile-driven configuration model. Profiles serve as declarative data containers, specifying service implementations, system parameters, and feature toggles. This separation of configuration from code enables rapid iteration and experimentation, as modifying the behavior of the entire toolkit can occur through editing or swapping profiles without recompilation or direct source-level modifications.
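At runtime, the active profile is available from the toolkit instance, which makes this configuration-driven behavior inspectable in code. For example, using the feature-toggle properties of MRTK 2.x's MixedRealityToolkitConfigurationProfile:

```csharp
using Microsoft.MixedReality.Toolkit;
using UnityEngine;

public class ProfileInspector : MonoBehaviour
{
    private void Start()
    {
        var profile = MixedRealityToolkit.Instance.ActiveProfile;

        // Feature toggles are declared in the profile asset, not in code.
        Debug.Log($"Input system enabled: {profile.IsInputSystemEnabled}");
        Debug.Log($"Spatial awareness enabled: {profile.IsSpatialAwarenessSystemEnabled}");
    }
}
```

Swapping the profile asset assigned to the MixedRealityToolkit component changes all of these answers without touching a line of application code.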
A critical MRTK architectural advantage is the independence of rendering concerns from input and sensor data processing. MRTK components interface through normalized events and shared data structures rather than direct references to rendering pipelines or scene hierarchies. This design choice facilitates cross-platform compatibility, supports multiple rendering engines, and allows developers to replace or extend rendering without impacting core interaction or spatial modules.
The modular structure promotes a clean dependency graph, wherein higher-level application logic depends on core services but avoids circular or tight coupling. Component lifecycles and service initialization are orchestrated by the MixedRealityToolkit instance, enforcing startup order, singleton guarantees, and dynamic reconfiguration when profiles are updated. Event propagation utilizes a broadcast model in combination with focused event routing, striking a balance between responsiveness and performance.
MRTK's architecture decomposes the complex requirements of MR application development into discrete, well-defined subsystems that interact through common interfaces and managed service registries. This modularity ensures that the toolkit can evolve with emerging hardware and interaction paradigms while providing developers with a stable, extensible foundation. The input, spatial awareness, boundary, and diagnostics systems together constitute the indispensable building blocks upon which robust MR applications are constructed, each encapsulated to promote the maintainability, flexibility, and scalability expected of modern software engineering.
2.2 Input System Architecture
The Mixed Reality Toolkit (MRTK) employs a comprehensive input system architecture designed to reconcile and unify a wide array of input modalities, including gaze, gestures, controllers, and voice, into a coherent and extensible framework. Central to this architecture is the abstraction model, which decouples input sources from application logic, thus enabling developers to create adaptive, multi-modal interactive experiences without binding their code to specific hardware devices or sensor technologies.
At its core, MRTK's input system is composed of three principal layers: input sources, input data providers, and the input system service. Each of these plays a vital role in abstracting and funneling user inputs in a manner that optimizes flexibility and scalability.
Input Sources represent the raw points of interaction emanating from hardware or sensors. These include spatial pointers such as gaze rays, articulated hand data, physical controllers, and voice command events. MRTK defines these sources using standardized interfaces, such as IMixedRealityInputSource and IMixedRealityController, which encapsulate essential properties like handedness, supported input types, and source state.
Input Data Providers act as adapters or bridges between the native platform APIs (e.g., Unity Input System, Windows Mixed Reality SDK, OpenXR) and the MRTK abstraction layer. Each provider implements the logic to translate raw hardware signals into MRTK-understood events and data structures. This decoupling allows the system to evolve as new input hardware and APIs emerge, without necessitating changes in the higher application layers.
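A data provider announces itself to the toolkit with the MixedRealityDataProvider attribute and a device manager skeleton. The shape below follows MRTK 2.x conventions; the class name and display name are hypothetical:

```csharp
using Microsoft.MixedReality.Toolkit;
using Microsoft.MixedReality.Toolkit.Input;
using Microsoft.MixedReality.Toolkit.Utilities;

// Registers this provider with the input system for the given platforms.
[MixedRealityDataProvider(
    typeof(IMixedRealityInputSystem),
    SupportedPlatforms.WindowsStandalone | SupportedPlatforms.WindowsUniversal,
    "Example Device Manager")]
public class ExampleDeviceManager : BaseInputDeviceManager
{
    public ExampleDeviceManager(
        IMixedRealityInputSystem inputSystem,
        string name, uint priority, BaseMixedRealityProfile profile)
        : base(inputSystem, name, priority, profile) { }

    public override void Update()
    {
        // Poll the native API here, translate the raw signals into MRTK
        // controllers and input events, and raise them through the owning
        // input system.
    }
}
```

Because the provider is discovered through its attribute and configured through the input profile, higher application layers never reference it directly; they simply observe the normalized events it raises.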
At the highest level, the Input System Service orchestrates input event propagation,...