Controlled Network Environment for Ray Clusters in Red Hat OpenShift AI 3.0



The adoption of Ray for scalable AI and ML workloads has skyrocketed. The Ray framework is powerful, but as the official documentation emphasizes, developers or platform providers are responsible for securing their own deployments.

With Red Hat OpenShift AI, we are committed to providing a production-ready environment for complex AI workloads, and we recognize that robust security is essential. That’s why we’re enhancing the existing Controlled Network Environment (CNE) for Ray Clusters in OpenShift AI 3.0 and delivering it natively through KubeRay. CNE is a platform-enforced policy that puts Ray’s recommended security best practices into effect, protecting your clusters by default.

The three pillars of the Controlled Network Environment


Figure 1: Different aspects of secure system design, focusing on network isolation, authenticated data flow and controlled user access.

The Controlled Network Environment is built on three essential, platform-enforced security features that are automatically applied to every Ray Cluster you create in OpenShift AI 3.0.

1. Network isolation

We’ve streamlined network isolation by having the KubeRay operator automatically apply Kubernetes-native network policies. This setup strictly limits network traffic to within the Ray Cluster itself, blocking access from other pods on the network and creating a secure perimeter around your workload.
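To make the isolation idea concrete, here is a minimal sketch of the kind of Kubernetes NetworkPolicy that achieves it, built as a Python dict and printed as JSON. It assumes Ray pods carry the `ray.io/cluster` label that KubeRay applies to cluster pods; the helper name `ray_isolation_policy` and the policy name are hypothetical, and this is not the exact policy KubeRay generates in OpenShift AI 3.0.

```python
import json


def ray_isolation_policy(cluster_name: str, namespace: str) -> dict:
    """Hypothetical helper: a NetworkPolicy admitting only same-cluster ingress.

    Illustration only; not the policy KubeRay actually creates.
    """
    # Assumed label that identifies every pod (head and workers) of one cluster.
    same_cluster = {"matchLabels": {"ray.io/cluster": cluster_name}}
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": f"{cluster_name}-isolation",  # hypothetical naming scheme
            "namespace": namespace,
        },
        "spec": {
            # The policy applies to all pods of this Ray Cluster.
            "podSelector": same_cluster,
            "policyTypes": ["Ingress"],
            # Only pods of the same cluster (in the same namespace) may connect;
            # all other ingress to the selected pods is denied.
            "ingress": [{"from": [{"podSelector": same_cluster}]}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(ray_isolation_policy("my-raycluster", "my-project"), indent=2))
```

Because a pod selected by any NetworkPolicy with the `Ingress` policy type rejects ingress that no rule allows, the single rule above admits only same-cluster traffic. A real policy would likely also need narrowly scoped exceptions, for example for the gateway that fronts the dashboard; those are left out here.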

2. Verified backend (mTLS)

OpenShift AI now enforces an authenticated backend that uses mTLS (mutual Transport Layer Security). This critical feature authenticates and encrypts all internal communication within the Ray Cluster. The re-architected feature uses cert-manager to automatically manage the necessary certificates and secrets, simplifying deployment. If you use the codeflare-sdk client, your existing workflow remains unchanged.
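What “mutual” means in practice is that both ends present a certificate and both verify the other’s certificate against a trusted CA. The sketch below shows that with Python’s standard `ssl` module; it is a generic illustration, not Ray’s internal TLS code. The optional file paths stand in for certificate material that, under this assumption, would come from a cert-manager-managed secret mounted into the pods, and the function names are hypothetical.

```python
import ssl
from typing import Optional


def mtls_server_context(ca_file: Optional[str] = None,
                        cert_file: Optional[str] = None,
                        key_file: Optional[str] = None) -> ssl.SSLContext:
    """Server side of mutual TLS: present a cert and demand one from clients."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)  # loads no system CAs
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # handshake fails without a client cert
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # trust only the cluster CA
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def mtls_client_context(ca_file: Optional[str] = None,
                        cert_file: Optional[str] = None,
                        key_file: Optional[str] = None) -> ssl.SSLContext:
    """Client side of mutual TLS: verify the server and present our own cert."""
    # PROTOCOL_TLS_CLIENT enables CERT_REQUIRED and hostname checks by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # trust only the cluster CA
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


if __name__ == "__main__":
    server = mtls_server_context()
    client = mtls_client_context()
    print(server.verify_mode == ssl.CERT_REQUIRED, client.check_hostname)  # True True
```

Using `PROTOCOL_TLS_SERVER` and `PROTOCOL_TLS_CLIENT` rather than `ssl.create_default_context()` means no system CAs are loaded, so only certificates signed by the cluster’s own CA are trusted.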

3. Controlled access

OpenShift AI 3.0 also improves the user experience and security of accessing the Ray dashboard. The controlled access feature now integrates with the platform’s broader authentication redesign, which is built on the Gateway API.

The platform now reuses your existing OpenShift AI session to authenticate dashboard access, delivering a consistent user experience (UX) without repeated logins.
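For readers new to the Gateway API, the following sketch shows roughly what routing dashboard traffic through a gateway can look like: an `HTTPRoute` that forwards a path prefix to the Ray head service on Ray’s default dashboard port, 8265. The gateway name, the path scheme, and the `<cluster>-head-svc` service name (KubeRay’s usual head-service naming) are assumptions for illustration. Authentication against the OpenShift AI session happens at the platform gateway and is not modeled here.

```python
import json

RAY_DASHBOARD_PORT = 8265  # Ray's default dashboard port


def ray_dashboard_route(cluster_name: str, namespace: str,
                        gateway_name: str, gateway_namespace: str) -> dict:
    """Hypothetical HTTPRoute exposing one Ray Cluster's dashboard via a Gateway."""
    prefix = f"/ray/{namespace}/{cluster_name}"  # hypothetical path scheme
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": f"{cluster_name}-dashboard", "namespace": namespace},
        "spec": {
            # Attach the route to a shared platform Gateway (name assumed).
            "parentRefs": [{"name": gateway_name, "namespace": gateway_namespace}],
            "rules": [{
                "matches": [{"path": {"type": "PathPrefix", "value": prefix}}],
                # Strip the prefix so the dashboard sees requests rooted at "/".
                "filters": [{
                    "type": "URLRewrite",
                    "urlRewrite": {"path": {"type": "ReplacePrefixMatch",
                                            "replacePrefixMatch": "/"}},
                }],
                # Assumed head Service name, following KubeRay's usual pattern.
                "backendRefs": [{"name": f"{cluster_name}-head-svc",
                                 "port": RAY_DASHBOARD_PORT}],
            }],
        },
    }


if __name__ == "__main__":
    route = ray_dashboard_route("my-raycluster", "my-project",
                                "platform-gateway", "gateway-ns")
    print(json.dumps(route, indent=2))
```

Because the Gateway in this sketch lives in a different namespace, its listeners’ `allowedRoutes` setting must permit routes from the Ray Cluster’s namespace for the route to attach.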

Simplify and strengthen the platform

Beyond the security benefits, these changes also improve the platform itself.

  • Simplified design: The main driver of these changes was to simplify the overall architecture. Moving core security logic, such as network isolation and mTLS configuration, directly into the KubeRay reconciler reduced complexity and paves the way for faster updates and feature delivery.
  • Improved UX: The new controlled access feature builds on the platform-wide authentication redesign, providing a smoother, more secure UX.
  • Platform Enforcement Policy: CNE automatically applies the necessary configuration to any Ray Cluster created within an OpenShift AI environment, strengthening security by default.
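The enforcement described above follows the usual Kubernetes controller pattern: on every reconcile, compare what should exist for a Ray Cluster with what does exist, and create whatever is missing. The Python sketch below models only that comparison. KubeRay’s reconciler is written in Go, and the feature names, `ClusterState`, and `reconcile` here are hypothetical placeholders for the three CNE pillars.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical identifiers for the three CNE pillars; a real controller would
# manage concrete Kubernetes objects instead of these strings.
REQUIRED_FEATURES = ("network-isolation", "mtls", "controlled-access")


@dataclass
class ClusterState:
    name: str
    # Features whose backing objects currently exist for this cluster.
    present: Set[str] = field(default_factory=set)


def reconcile(cluster: ClusterState) -> List[str]:
    """Bring one Ray Cluster in line with the required features.

    Returns the features that had to be (re)created on this pass.
    """
    missing = [f for f in REQUIRED_FEATURES if f not in cluster.present]
    # Stand-in for create calls against the Kubernetes API.
    cluster.present.update(missing)
    return missing


if __name__ == "__main__":
    demo = ClusterState("demo", {"mtls"})
    print(reconcile(demo))  # ['network-isolation', 'controlled-access']
    print(reconcile(demo))  # []
    demo.present.discard("network-isolation")  # simulate drift
    print(reconcile(demo))  # ['network-isolation']
```

Because the function works from observed state rather than from individual events, it is idempotent, and if a required object is deleted, the next pass restores it.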

Contribute upstream

The re-architecture wasn’t just about simplification; it also laid the groundwork for future collaboration. We have already started contributing these changes to the upstream KubeRay community.

Next steps

Red Hat OpenShift AI 3.0 delivers a production-ready Ray experience by making robust security the default. Get started with your Ray workloads today.

Want to learn more about the Ray and Kueue integration (currently in Technology Preview) in OpenShift AI 3.0? See the technical deep dive: Tame Ray workloads on OpenShift AI with KubeRay and Kueue.



Eva Grace
