OKEP-6019: VRF-Lite with Shared Gateway Mode using Uplinks¶
- Issue: #6019
Networking Glossary¶
| Term | Definition |
|---|---|
| VRF-Lite | Linux host VRF separation where interfaces are attached to specific VRFs without MPLS. |
| Shared GW | OVN and host share an external OVS bridge for north/south traffic. |
| CUDN | ClusterUserDefinedNetwork, cluster-scoped primary or secondary user-defined network API. |
| MEG | Multiple External Gateways feature for pod egress steering using additional gateway(s). |
| Uplink | A named, cluster-scoped connectivity target that CUDNs can use for external traffic. |
| OVSBridge | An OVS bridge-backed uplink type. This is the only uplink type in this OKEP. |
Problem Statement¶
OVN-Kubernetes support for VRF-Lite is limited to local gateway mode today. VRF-Lite also requires manual host preparation, including interface-to-VRF attachment. This is operationally expensive and error-prone in heterogeneous node environments.
OVN-Kubernetes currently assumes one default external bridge and optionally one extra bridge via
config.Gateway.EgressGWInterface. This model is not sufficient for shared gateway mode VRF-Lite, where different CUDNs
may need to use different pre-provisioned OVS bridges and physical uplinks.
The goal of this OKEP is not to make OVN-Kubernetes a general host bridge provisioning system. Bridge creation, bridge port membership, IP addressing, default routes, MTU, and VLAN setup remain external preparation tasks, for example via NMState or other node configuration tooling. OVN-Kubernetes consumes that prepared state, wires it into OVN and VRF-Lite, and publishes typed per-node gateway state for CUDN reconciliation.
Goals¶
- Support VRF-Lite with shared gateway mode by allowing primary Layer2 and Layer3 CUDNs to reference a list of named
Uplinkobjects, limited to one item in this OKEP. - Introduce a generic
UplinkAPI that can be extended later beyond OVS bridges. - Support one
OVSBridgenode config type in this OKEP, where the backing OVS bridge is pre-created by the administrator. - Allow each
Uplinkto describe different host-side gateway interfaces for different node groups throughspec.nodeConfigs, while resolving to at most one link per node. - Store per-node discovery, validation, and gateway data in a typed
UplinkStateresource. - Use
UplinkStateas the typed gateway configuration source for this feature instead of adding CUDN entries to the nodek8s.ovn.org/l3-gateway-configannotation. - Configure OVN bridge mappings automatically for CUDNs that select a
Uplink. - Attach the deployment-specific gateway interface to the matching CUDN VRF only when matching
RouteAdvertisementsresolves the CUDN to a per-CUDN VRF. In a non-DPU deployment without hardware offload, this is the OVS bridge internal interface. With SmartNIC/offload, this is the accelerated VF/PF representor used as the host gateway interface. In a DPU deployment, the DPU-Host attaches the host PF to the host-side CUDN VRF, while the DPU side attaches the resolved bridgeLOCALinterface to the DPU-local route-import VRF. When the CUDN uses the default routing domain, these interfaces remain in the default VRF. - Preserve existing behavior for the default gateway bridge and current MEG deployments when the new API is not used.
- Support DPU deployments by splitting
UplinkStatediscovery between the DPU and DPU-Host while keeping OVS programming on the DPU side. - Support service traffic through the CUDN's resolved OVS bridge by programming the relevant per-UDN service flows there.
Non-Goals¶
- Configuring IP addresses, routes, MTU, or VLAN tags on the selected host interface or resolved bridge. OVN-Kubernetes only consumes existing VLAN tags from OVS and applies the corresponding OVN gateway configuration.
- Replacing NMState or other node network provisioning tools.
- Replacing FRR/FRR-K8S APIs or changing BGP peering semantics.
- Collapsing multiple CUDNs into the same non-default VRF for Uplink-backed VRF-Lite isolation. CUDNs may intentionally
share the default VRF/routing domain when
RouteAdvertisements targetVRF: defaultis used; that is not VRF isolation.
Future Goals¶
- Support
Uplinkin local gateway mode, including any requiredInterfaceuplink type or host-routing integration. - Support
Uplinkwith EVPN transport after shared gateway EVPN support exists and the relationship between Uplink bridge selection, VTEPs, and EVPN underlay selection is defined. - Allow localnet CUDNs to reference
Uplink, so OVN-Kubernetes can configure bridge mappings automatically instead of requiring the user to coordinatephysicalNetworkNameand bridge mappings separately. - Allow a CUDN to reference multiple
Uplinkobjects for ECMP, multi-homing, or reachability to additional external subnets. - Consider OVN-Kubernetes-owned bridge lifecycle in a follow-up proposal if there is clear demand.
- Evaluate consumers beyond VRF-Lite CUDNs, such as secondary EgressIP, EgressService, or ICNI, without expanding this OKEP's initial scope.
- Converge legacy
EgressGWInterfaceand default gateway bridge behavior into a more general uplink model. - Evaluate migrating the default shared gateway bridge and legacy MEG bridge gateway data from the
k8s.ovn.org/l3-gateway-confignode annotation into typedUplinkState-style status, so default and non-default gateway bridges eventually use the same gateway data model.
Introduction¶
The BGP enhancement introduced RouteAdvertisements and VRF-aware route export/import with FRR. It also introduced VRF-Lite use cases for CUDNs, but those use cases rely on manual host configuration and are currently scoped to local gateway mode in the documentation.
OVN-Kubernetes node gateway behavior today assumes:
- One default external bridge (
br-ex/breth0) that OVN-Kubernetes can create automatically. - One optional secondary bridge selected by
config.Gateway.EgressGWInterface, primarily for MEG related paths on the cluster default network (CDN); MEG is not supported for CUDNs.
This works for default networking and limited dual-uplink use cases, but not for shared gateway VRF-Lite environments where multiple tenant networks may need distinct uplinks while keeping the shared gateway datapath and DPU offload advantages.
In DPU deployments, OVS runs on the DPU rather than the DPU-Host. OVN-Kubernetes therefore needs a typed way for the DPU
and DPU-Host components to exchange gateway data for a selected uplink without relying on host-local PCI identifiers or
extending the node l3-gateway-config annotation with CUDN-specific entries.
User Stories¶
Story 1: Tenant VRFs on different pre-provisioned bridges¶
As an admin, I want to create CUDNs blue and red that use different already-created OVS bridges, br-blue and
br-red, on my nodes. I create Uplink CRs named blue and red, each with an OVSBridge node config that points to
the host-side gateway interface backed by the appropriate bridge for the matching node group. I then configure each CUDN
with spec.uplinks.
OVN-Kubernetes validates the host interface and resolves the backing bridge on each relevant node, configures the CUDN
localnet bridge mapping, attaches the selected host interface to the CUDN VRF when RouteAdvertisements resolves the
CUDN to per-CUDN VRF-Lite isolation, and publishes UplinkState with the gateway data needed by the CUDN controller and
RouteAdvertisements.
Story 2: Split default network and CUDN external traffic¶
As an admin, I want the cluster default network to keep using the existing default gateway bridge (br-ex / breth0),
while a primary CUDN uses a different pre-created OVS bridge. I create a Uplink for the host-side gateway interface
backed by that bridge and configure the CUDN with spec.uplinks. OVN-Kubernetes programs the CUDN external path on the
resolved bridge while leaving the default network bridge behavior unchanged.
Without BGP, CUDN egress follows normal shared gateway behavior and is SNAT'ed to the selected host interface or gateway IP when applicable. This OKEP does not add a general non-BGP ingress-to-pod routing mechanism.
With BGP, RouteAdvertisements determines whether the selected uplink participates in the default routing domain or in a
per-CUDN VRF-Lite routing domain. OVN-Kubernetes updates the selected uplink's VRF membership and the UDN VRF route
programming from that effective routing-domain decision.
Proposed Solution¶
Introduce a new cluster-scoped Uplink CRD and add an uplinks list to CUDN. This OKEP limits the CUDN list to one
item, but the list shape leaves room for future multi-uplink routing without another CUDN API shape change.
At a high level:
- The admin prepares the OVS bridges and bridge ports on each node using existing host configuration tooling.
- The admin creates a
Uplinkwhose node configs map node groups to host-side gateway interface names. - The admin creates a primary Layer2 or Layer3 CUDN with
spec.uplinks: [<uplink-name>]. - OVN-Kubernetes validates the selected host interface, resolves the backing OVS bridge on each relevant node, and
publishes one
UplinkStateperUplink/node. - OVN-Kubernetes configures OVN bridge mappings and the CUDN shared-gateway external path using
UplinkState. - OVN-Kubernetes derives the CUDN's effective routing domain from matching
RouteAdvertisements. When the CUDN resolves to a per-CUDN VRF, OVN-Kubernetes attaches the selected host interface to that VRF. When the CUDN resolves to the default VRF, or no matchingRouteAdvertisementsselects it, the selected host interface remains in the default VRF. - Existing RouteAdvertisements behavior remains;
targetVRF: autocan now work with shared gateway mode through the selected uplink andUplinkState.
Workflow Description¶
- Admin creates one or more pre-existing OVS bridges on the nodes.
- Admin creates a
UplinkCR with one or moreOVSBridgenode configs. - Admin creates a primary Layer2 or Layer3 CUDN with
spec.uplinksreferencing theUplink. - For VRF-Lite route advertisement, admin configures FRR peering as done today.
- For VRF-Lite route advertisement, admin creates
RouteAdvertisementswithtargetVRF: autoselecting the CUDN. - OVN-Kubernetes validates host interface and bridge state, updates
UplinkState, configures bridge mappings, and attaches the selected host interface to the CUDN VRF only when matchingRouteAdvertisementsresolves the CUDN to a per-CUDN VRF on nodes where the CUDN is active. - When BGP is configured, routes are advertised and received as today; dataplane uses shared gateway bridge flows and UDN VRF routing semantics.
Host interface and bridge validation means OVN-Kubernetes checks that the configured host interface exists, resolves to
one backing OVS bridge, the relevant link states are up, and the resolved bridge or gateway interface has the IP address
data OVN-Kubernetes needs for gateway configuration. In Full mode, resolution means the host interface is attached to the
local OVS bridge. In DPU mode, resolution means the host-side interface can be matched to the DPU-side representor and
bridge. OVN-Kubernetes discovers this information from existing node state; it does not configure these admin-provisioned
properties through the Uplink API.
API Details¶
New CRD: Uplink¶
An Uplink represents a physical network path out of the node. Different Uplink types can describe different ways to
model that path. In this OKEP, the only supported type is OVSBridge, which may represent a network path across host,
DPU, or SmartNIC boundaries. The Uplink gives OVN-Kubernetes a named path that it can connect the logical OVN topology
to.
apiVersion: k8s.ovn.org/v1alpha1
kind: Uplink
metadata:
name: blue-underlay
spec:
nodeConfigs:
- type: OVSBridge
nodeSelector:
matchLabels:
rack: rack-a
hostInterfaceName: pf0hpf
- type: OVSBridge
nodeSelector:
matchLabels:
rack: rack-b
hostInterfaceName: pf1hpf
- type: OVSBridge
nodeSelector:
matchLabels:
offload: "false"
hostInterfaceName: br-blue
status:
The Uplink name is the stable API reference used by CUDNs. It does not need to match the host interface name or the
resolved OVS bridge name on any node.
UplinkSpec¶
nodeConfigs(required): list of node-scoped uplink configs. Each item describes the single link thisUplinkuses on nodes selected by that item.nodeConfigs[*].type(required): uplink type and union discriminator. This OKEP supports onlyOVSBridge.nodeConfigs[*].nodeSelector(required): nodes where this config applies. An empty selector matches all nodes.nodeConfigs[*].hostInterfaceName(required): host-visible Linux interface name that carries the gateway L3 identity for this uplink on nodes selected by this config.
hostInterfaceName has the same API meaning in all modes: it is the host-visible Linux interface carrying the uplink
gateway L3 identity on the node. What differs by deployment mode is how OVN-Kubernetes resolves that interface to the OVS
bridge used for bridge mappings and OpenFlow. In Full mode, the interface is expected to be attached directly to the local
pre-existing OVS bridge, typically the bridge LOCAL interface or a host-side internal port. In DPU mode, the interface
is the DPU-Host-side gateway interface; OVN-Kubernetes uses its MAC and representor peer relationship to discover the
DPU-side representor and OVS bridge. DPU-side representor names are not part of the Uplink API or published in
UplinkState. Different interface names on different node groups are represented with multiple nodeConfigs and
nodeSelector; CIDR-based interface selection is out of scope for this OKEP.
nodeConfigs is a per-node selection table, not a request to use multiple links from one Uplink. For any given node,
one Uplink resolves to at most one host-side gateway interface and one backing OVS bridge.
This OKEP does not define OVSBridge-specific spec fields. type: OVSBridge means OVN-Kubernetes resolves
hostInterfaceName to a pre-existing OVS bridge and reports the resolved bridge name in UplinkState.status.ovsBridge.
The Uplink object intentionally does not describe the OVS bridge name, bridge ports, physical NICs, IP addresses, MTU,
default gateways, or VLANs. Those are properties of the pre-provisioned node network. OVN-Kubernetes discovers the subset
of that state needed for OVN gateway configuration and reports it through UplinkState.
Uplink validation¶
Static API-shape rules are enforced by CRD schema/kubebuilder validation and CEL. Rules that depend on live node labels
or node-local interface/OVS state are enforced by controllers and reported through Uplink and UplinkState conditions.
spec.nodeConfigshasMinItems=1andMaxItems=64.nodeConfigs[*].typemust beOVSBridge.nodeConfigs[*].hostInterfaceNamemust be a valid Linux interface name. Existence, resolution to exactly one backing OVS bridge, link state, IP address data on the resolved bridge or gateway interface, MAC address, route/default-gateway data when present, MTU, and any OVS VLAN tag are discovered or validated by the node reconcilers because they depend on node-local state. OVN-Kubernetes does not configure these admin-provisioned properties through theUplinkAPI.- For a given node, at most one node config in a
Uplinkmay apply in this OKEP. BecausenodeSelectoroverlap depends on live node labels, this cannot be fully enforced by CRD schema validation. The Uplink controller resolves selected node configs per node; if more than one applies to a node, theUplinkand affectedUplinkStateobjects report degraded status until the overlap is resolved. - If no node config applies to a node where a CUDN referencing the
Uplinkis active, cluster-manager reportsUplinkNotFoundForNodein CUDN status and no gateway configuration is generated for that node. Uplink.spec.nodeConfigsis mutable. Updating a node config can temporarily degrade CUDNs that reference theUplinkwhile node state is rediscovered.
Uplink conditions¶
Uplink.status.conditions is owned by the ovnkube-cluster-manager Uplink controller and contains aggregate state only.
The controller derives this state from Uplink, Nodes, and the per-node UplinkState objects:
Degraded: at least one selected node has an error or required missing state in its correspondingUplinkState, or current node labels make thenodeConfigsselection ambiguous.
Because nodeConfigs[*].nodeSelector depends on node labels, Uplink.status can change without Uplink.spec changing.
This is expected. For example, labeling a new node into a node config creates or updates that node's UplinkState; if
the host interface is missing or cannot be resolved to an OVS bridge on that node,
Uplink.status.conditions[Degraded] becomes true.
When one or more CUDNs reference an Uplink, ovnkube-cluster-manager keeps a finalizer on the Uplink so it cannot be
deleted while still selected by a CUDN.
Uplink.status.conditions[Degraded] is aggregate, operator-facing health for the Uplink object. It is not consumed
directly as CUDN readiness, because two Dynamic UDN-backed CUDNs can reference the same Uplink while being active on
different node sets. CUDN uplink readiness is derived by cluster-manager from the referenced Uplink spec and the
UplinkState objects relevant to that specific CUDN/UDN active node set. This OKEP allows only one referenced Uplink,
but the status logic should be structured around the list.
New CRD: UplinkState¶
Per-node discovery and gateway state is stored in a separate cluster-scoped UplinkState resource instead of a large
list in Uplink.status or a node annotation. This avoids many nodes contending on the same Uplink object and keeps
node-owned gateway data in a typed API.
UplinkState is keyed by Uplink and node. A typical object name is <uplink-name>.<node-name> when that fits within
Kubernetes object name limits. If the combined name would be too long, OVN-Kubernetes uses a deterministic
DNS-subdomain-safe name built from truncated name prefixes plus a stable hash of the full <uplink-name>/<node-name>
pair. Controllers must treat status.uplinkName and status.nodeName as the canonical identity, not parse identity from
the object name. Labels are lookup aids; when full names do not fit label value limits, OVN-Kubernetes should use stable
hash label values and store the full names in the k8s.ovn.org/uplink and k8s.ovn.org/node annotations and status. On
UplinkState delete events, controllers should derive the affected Uplink from status.uplinkName when available, or
from the k8s.ovn.org/uplink annotation when status is unavailable. If neither identity source is present in the delete
tombstone, the controller may fall back to reconciling all Uplink objects. The user does not set spec fields on this
resource; OVN-Kubernetes creates and updates it.
apiVersion: k8s.ovn.org/v1alpha1
kind: UplinkState
metadata:
name: blue-underlay.ovn-worker-a
labels:
k8s.ovn.org/uplink: blue-underlay
k8s.ovn.org/node: ovn-worker-a
status:
uplinkName: blue-underlay
nodeName: ovn-worker-a
type: OVSBridge
hostInterfaceName: pf0hpf
ovsBridge:
name: br-blue
macAddress: 02:42:c0:00:02:06
ipAddresses:
- 192.0.2.6/24
defaultGateways:
- 192.0.2.1
conditions:
- type: Ready
status: "True"
reason: Ready
UplinkState status¶
These fields are reconciliation inputs as well as user-visible status:
The top-level status fields describe generic resolved L3 gateway data that OVN-Kubernetes needs regardless of the uplink
type: selected host interface, MAC address, IP addresses, default gateways, and readiness. Type-specific resolved data
belongs under the corresponding union field. For this OKEP, only ovsBridge.name is OVS bridge-specific. Future uplink
types should add their own type-specific status blocks rather than moving the common gateway fields under ovsBridge.
uplinkName:Uplinkthis state belongs to.nodeName: node this state belongs to.type: resolved uplink type. This OKEP supports onlyOVSBridge.hostInterfaceName: host-visible Linux interface selected by theUplinknode config. This field has the same meaning in all modes: the interface carrying the host-side gateway L3 identity. In Full mode, OVN-Kubernetes resolves it to a local OVS bridge. In DPU mode, OVN-Kubernetes uses it as the DPU-Host-side handoff for discovering the DPU-side representor and bridge. DPU-side representor names are not exposed inUplinkState.ovsBridge(required whentypeisOVSBridge): resolved OVS bridge data.ovsBridge.name: resolved OVS bridge used for CUDN bridge mappings and OpenFlow on this node. In Full mode this is resolved locally fromhostInterfaceName. In DPU mode this is resolved and published by the DPU-side reconciler after matching the host interface MAC to the DPU-side representor and bridge.macAddress: MAC address used for the OVN gateway interface on this bridge. In DPU deployments, the DPU-Host side publishes the selected host interface MAC here so the DPU can find the corresponding representor without publishing PF IDs or host PCI addresses.ipAddresses: host-side shared gateway IP addresses discovered from the selected host interface.defaultGateways: default route next-hop IPs discovered from host routing for the selected host interface, when present. This maps tonext-hopsin the existingl3-gateway-configshape when internal compatibility requires it. If no default route exists, this field can be empty; egress can still work for destinations covered by BGP-learned routes imported by OVN-Kubernetes. WhenRouteAdvertisements targetVRF: autoresolves the CUDN to per-CUDN VRF-Lite isolation, OVN-Kubernetes does not install the discovered default gateway as the CUDN VRF default route; BGP-imported routes drive external reachability for that isolated VRF. FortargetVRF: defaultor no matchingRouteAdvertisements, normal shared gateway default-route behavior remains.
In DPU deployments, the DPU-Host initializes the top-level status with host-side L3 data. The DPU side patches DPU-local
fields on the same object, including ovsBridge.name and DPU-side validation state.
Gateway mode is derived from OVN-Kubernetes configuration rather than
stored per UplinkState. UplinkState stores only reusable per-node host gateway data and is not serialized into
k8s.ovn.org/l3-gateway-config for CUDNs.
UplinkState conditions¶
UplinkState.status.conditions reports node-local discovery state with a single Ready condition. Reasons describe the
current state:
Ready: bridge discovery and gateway data discovery are complete for thisUplink/node.HostInterfaceNotFound: the selectedhostInterfaceNamedoes not exist on the node or DPU-Host.BridgeNotFound: the selected host interface cannot be resolved to a backing OVS bridge.BridgeInvalid: the bridge exists but has an unsupported or ambiguous layout.MTUInvalid: the resolved bridge/gateway path MTU is smaller than the effective OVN-Kubernetes gateway/pod MTU this network must carry.GatewayInfoUnavailable: OVN-Kubernetes cannot discover the gateway MAC/IP data needed for OVN configuration.WaitingForDPU: DPU-Host has published host-side data but is waiting for DPU-side bridge validation.WaitingForDPUHost: DPU-side reconciliation is waiting for the DPU-Host to publish host-side L3 data.NodeSelectorOverlap: more than oneUplink.spec.nodeConfigsentry applies to this node, so OVN-Kubernetes cannot select a single node config for thisUplink/node pair.
When discovery is complete, Ready=True with reason Ready. For the other reasons above, Ready=False.
Per-CUDN reconciliation failures, such as VRF attachment failure, bridge mapping failure, or unsupported bridge sharing,
are surfaced through CUDN/UDN status rather than UplinkState. UplinkState remains the reusable gateway-data source for
the Uplink/node pair.
VLAN configuration is not part of the Uplink or UplinkState API. When the selected host interface in Full mode, or
the resolved DPU-side representor or bridge-local gateway interface in DPU mode, has an OVS VLAN tag, ovnkube-node or
ovnkube-node-dpu detects that tag from OVS and configures the OVN gateway router external port to use the same VLAN
behavior. The user is responsible for provisioning the OVS bridge with the correct tagged host-facing interface. In DPU
deployments, the DPU-side representor is discovered locally and is not published in UplinkState.
MTU is also not part of the Uplink or UplinkState API. The user must provision a consistent MTU on the physical
uplink, OVS bridge, and host gateway interface. In DPU deployments, this includes the host-side gateway interface and the
DPU-side representor path that backs the resolved bridge. OVN-Kubernetes validates that the resolved bridge/gateway path
MTU is at least large enough for the effective OVN-Kubernetes gateway/pod MTU this network must carry. If that validation
fails, the node's UplinkState reports Ready=False with reason MTUInvalid. Uplink does not provide a way to raise
a CUDN above OVN-Kubernetes' effective network MTU.
CUDN CRD extension¶
Add an optional uplinks field to CUDN spec.
apiVersion: k8s.ovn.org/v1
kind: ClusterUserDefinedNetwork
metadata:
name: blue
spec:
namespaceSelector:
matchLabels:
tenant: blue
network:
topology: Layer3
layer3:
role: Primary
subnets:
- cidr: 10.20.0.0/16
uplinks:
- br-blue
CUDN validation and status¶
spec.uplinksis optional. CUDNs that do not set it behave exactly as they do today.spec.uplinks[*]is a direct reference to oneUplinkby name.spec.uplinkshasMaxItems=1in this OKEP.spec.uplinksis valid for primary Layer2 and Layer3 CUDNs only when OVN-Kubernetes is configured for shared gateway mode in this OKEP.- CEL rule:
!has(self.uplinks) || size(self.uplinks) == 0 || ((self.network.topology == 'Layer2' && has(self.network.layer2) && self.network.layer2.role == 'Primary') || (self.network.topology == 'Layer3' && has(self.network.layer3) && self.network.layer3.role == 'Primary')) spec.uplinksis not valid when the CUDN setsspec.network.transport: EVPN.- CEL rule:
!has(self.uplinks) || size(self.uplinks) == 0 || !has(self.network.transport) || self.network.transport != 'EVPN' spec.uplinksis not valid for localnet topology in this OKEP. Localnet support is a future goal.spec.uplinksis immutable after CUDN creation.- CEL rule on
spec.uplinks:self == oldSelf - If OVN-Kubernetes is configured for local gateway mode and a CUDN sets
spec.uplinks, set a CUDN condition with reasonUplinkUnsupportedGatewayMode. - If a CUDN sets both
spec.uplinksandspec.network.transport: EVPN, set a CUDN condition with reasonUplinkUnsupportedTransport. - If a referenced
Uplinkdoes not exist, set a CUDN condition with reasonUplinkNotFound. - If more than one
Uplink.spec.nodeConfigsitem applies to a selected node, set reasonUplinkOverlapOnNode. - If no
Uplink.spec.nodeConfigsitem applies to a node where the CUDN/UDN is active, set reasonUplinkNotFoundForNode. - If the matching
UplinkStateis not ready for a node where the CUDN/UDN is active, set reasonUplinkNotReadyForNode. - If OVN-Kubernetes cannot attach or detach the selected host interface according to the CUDN's effective routing domain,
set reason
UplinkVRFAttachmentFailed. - If OVN-Kubernetes cannot configure the CUDN bridge mapping on the resolved OVS bridge, set reason
UplinkBridgeMappingFailed. - For Dynamic UDN, if a node is selected by both the CUDN and
Uplinkbut the UDN is not active on that node, OVN-Kubernetes delays Uplink reconciliation until the UDN becomes active. If the UDN later becomes inactive on that node, OVN-Kubernetes cleans up OVN-Kubernetes-owned per-CUDN artifacts for that node and leaves the admin-owned OVS bridge and host network configuration unchanged.
These reasons are surfaced through a dedicated CUDN condition rather than overloading NetworkReady semantics:
type: UplinksReadystatus: Truewhen the CUDN does not setspec.uplinks, or when all required uplinks for every active node for the CUDN/UDN have usable referenced Uplink state and required per-CUDN uplink programming has succeeded.status: Falsewhen one or more required uplinks are missing, unsupported, not ready on an active node, or cannot be programmed for the CUDN. The condition uses the reasons listed above, includingUplinkUnsupportedGatewayMode,UplinkUnsupportedTransport,UplinkNotFound,UplinkOverlapOnNode,UplinkNotFoundForNode,UplinkNotReadyForNode,UplinkVRFAttachmentFailed,UplinkBridgeMappingFailed, andUplinkConfigurationConflict.
UplinksReady is aggregate readiness for the CUDN's full spec.uplinks selection. This OKEP limits that list to one
item. Future multi-uplink work may add keyed per-uplink status if detailed per-uplink reporting is needed.
When multiple active nodes fail uplink readiness for the same CUDN, UplinksReady remains a single aggregate condition.
The condition reason is selected deterministically from the failing reasons. The message contains a bounded summary, such
as the number of affected active nodes and a small sample of node names. It must not list every failed node in large
clusters. Full per-node details remain in the corresponding UplinkState objects, which can be queried by the
k8s.ovn.org/uplink and k8s.ovn.org/node labels or annotations.
UplinksReady=True for CUDNs without spec.uplinks keeps the condition usable as a component of any future aggregate
CUDN readiness condition without requiring consumers to special-case the absence of an uplink.
spec.uplinks is modeled as a list now to preserve room for future multi-uplink routing. For example, a future CUDN
could select three Uplinks. If RouteAdvertisements targetVRF: auto selects that CUDN, OVN-Kubernetes would attach all
three selected host interfaces to the CUDN VRF. If the node already had kernel routes with two equal-cost default routes
through the first two uplinks and a more specific subnet route through the third uplink, traffic selection would follow
those routes. With RouteAdvertisements and BGP, targetVRF: auto could import and export routes only in the CUDN VRF:
two peers could advertise ECMP paths to one network while a third peer advertises a different reachable subnet. This
routing policy belongs to host routing and RouteAdvertisements/FRR-K8S configuration, not to the Uplink object.
targetVRF: default would intentionally leave those interfaces in the shared default routing table and can include routes
from unrelated default-network or CDN uplinks, so it is not the isolated multi-uplink CUDN case this future expansion is
meant to cover. A mode where OVN-Kubernetes installs routes in the absence of BGP or pre-existing node routes would
require a separate managed routing/provisioning API and is out of scope.
The CUDN status controller runs in ovnkube-cluster-manager. It watches the referenced Uplink objects, Nodes, and
UplinkState objects, then derives the CUDN's external uplink readiness from the node-local state relevant to that CUDN.
It does not treat aggregate Uplink.status.conditions[Degraded] as a direct CUDN readiness input, and it does not rely on
ovnkube-node directly updating CUDN status. This OKEP allows only one referenced Uplink.
For node-local checks, the CUDN status controller filters UplinkState by the CUDN/UDN's relevant node set. For Dynamic
UDN, nodes where the CUDN and Uplink both select the node but the UDN is not active do not block CUDN readiness and do
not produce UplinkNotFoundForNode or UplinkNotReadyForNode.
Bridge sharing¶
Multiple CUDNs can reference the same Uplink, but whether they can use the same resolved OVS bridge on the same node
depends on the routing domain:
RouteAdvertisements targetVRF: auto: OVN-Kubernetes resolves a distinct CUDN VRF name for each selected CUDN. Sharing the same effective OVS bridge between multiple active CUDNs is not supported for this per-CUDN VRF-Lite isolation mode.RouteAdvertisements targetVRF: default: selected CUDNs intentionally share the default routing domain. In this mode, multiple CUDNs may share the same effective OVS bridge. This is not VRF isolation, and overlapping CUDN subnets remain user misconfiguration.- No RouteAdvertisements/BGP: selected CUDNs use normal shared gateway egress behavior on the resolved bridge. Egress
routing follows the default gateways discovered for that node's selected
Uplinkand published inUplinkState.status.defaultGateways; pod traffic is SNAT'ed to the selected host interface or gateway IP when applicable. For bridge-sharing conflict detection, this is treated like shared/default routing-domain use, so multiple CUDNs may share the same effective OVS bridge. This is not VRF-Lite route isolation. - Explicit non-default
targetVRF: assigning multiple CUDNs to the same custom non-default VRF is not supported by this OKEP.
Unsupported sharing is treated as a symmetric invalid configuration. All active CUDNs in the unsupported conflict set are
reported not ready with reason UplinkConfigurationConflict; OVN-Kubernetes does not use first-writer-wins ordering or
preserve a previous CUDN assignment as the winner. If a new CUDN creates an unsupported conflict with an already-programmed
CUDN, OVN-Kubernetes does not program new conflicting uplink artifacts while the conflict exists. It also does not
proactively tear down previously programmed dataplane for the existing CUDN solely because another CUDN introduced a
conflict. Existing traffic may continue until normal reconciliation or deletion removes the previously programmed
artifacts, but the conflict is reported as unsupported and users should not rely on that dataplane behavior. Admin-owned
OVS bridge and host network configuration is left unchanged.
RouteAdvertisements interaction¶
No new RouteAdvertisements API is required.
Uplink selection is not limited to CUDNs using BGP or RouteAdvertisements. A CUDN may select a Uplink without
configuring BGP. In shared gateway mode, egress traffic uses the resolved OVS bridge and follows normal shared gateway
behavior, including SNAT to the selected host interface or gateway IP when applicable. This OKEP does not add a general
non-BGP ingress-to-pod routing mechanism.
OVN-Kubernetes derives whether an Uplink-backed CUDN needs VRF attachment from the RouteAdvertisements that select that
CUDN and node:
targetVRF: auto: the CUDN resolves to its per-network VRF. OVN-Kubernetes attaches the selected host interface to that VRF in Full/DPU-Host mode. In DPU mode, OVN-Kubernetes creates the DPU-local route-import VRF and attaches the resolved bridgeLOCALinterface to it.targetVRF: default: the CUDN uses the default routing domain. OVN-Kubernetes leaves the selected host interface and DPU bridgeLOCALinterface in the default VRF.- No matching
RouteAdvertisements: the CUDN uses the default routing domain for Uplink purposes. OVN-Kubernetes leaves the selected host interface and DPU bridgeLOCALinterface in the default VRF.
This decision is level driven. Changes to RouteAdvertisements, selected FRR configuration, CUDN labels, node labels, or
network activation can move an Uplink-backed CUDN between the default routing domain and per-CUDN VRF-Lite isolation.
OVN-Kubernetes owns only the VRF membership it creates for the selected Uplink interface; it does not create routes,
delete admin-owned routes, or move bridge/IP/MTU/VLAN configuration owned by the administrator.
For per-CUDN VRF-Lite isolation, OVN-Kubernetes omits or removes the normal shared gateway default route from the UDN VRF
so egress follows BGP-imported routes instead of falling back to a default route from another routing domain. For the
default routing domain, or when no matching RouteAdvertisements selects the CUDN, the UDN VRF keeps the normal shared
gateway default route and service route behavior.
Expected usage with VRF-Lite + shared gateway:
apiVersion: k8s.ovn.org/v1
kind: RouteAdvertisements
metadata:
name: blue-ra
spec:
targetVRF: auto
advertisements:
- PodNetwork
nodeSelector: {}
frrConfigurationSelector:
matchLabels:
routeAdvertisements: vpn-blue
networkSelectors:
- networkSelectionType: ClusterUserDefinedNetworks
clusterUserDefinedNetworkSelector:
networkSelector:
matchLabels:
k8s.ovn.org/metadata.name: blue
In DPU deployments, targetVRF: auto still resolves to the CUDN VRF name. Since the DPU does not naturally have the
host-side CUDN VRF, ovnkube on the DPU creates a DPU-local route-import VRF with that name only for this per-CUDN
VRF-Lite case. FRR peers in that VRF and installs learned routes into the VRF routing table; OVN-Kubernetes route import
then reflects those routes into the CUDN gateway router. This DPU-local VRF is separate from the DPU-Host VRF and
nftables state. For targetVRF: default or CUDNs without matching RouteAdvertisements, the DPU bridge LOCAL interface
remains in the default VRF and no DPU-local CUDN route-import VRF is created for that CUDN.
The DPU-local FRR peering IP is not stored in UplinkState. For targetVRF: auto, it is expected to be configured on the
resolved OVS bridge's LOCAL interface as local DPU state. OVNKube on the DPU enslaves that LOCAL interface to the
DPU-local route-import VRF and uses that interface for FRR peering. The advertised next-hop for pod routes remains the
host shared gateway IP from UplinkState.status.ipAddresses.
For DPU deployments, generated FRR configuration for the host node is applied to the corresponding DPU FRR instance. The
RouteAdvertisements controller uses UplinkState.status.ipAddresses as the source for the host shared gateway next-hop
when generating next-hop override configuration. If the matching UplinkState does not yet have host shared gateway IPs,
RouteAdvertisements reconciliation waits and reports pending/degraded status rather than generating incomplete FRR
configuration.
For Uplink-backed networks using route import, OVN-Kubernetes also scopes imported BGP routes to the selected uplink
bridge. RouteImport resolves the Linux link index from the matching UplinkState.status.ovsBridge.name and imports only
BGP routes whose kernel route output link matches that bridge. This prevents routes learned or leaked through the default
bridge or another uplink from being installed into a CUDN gateway router that is attached only to the selected uplink
bridge. If the matching UplinkState is missing, not ready, or has no resolved bridge, RouteImport treats the route set
as pending for that network rather than importing routes from another interface.
Implementation Details¶
Controller changes¶
Uplink controller (cluster manager)¶
- Watches
Uplink,UplinkState, Nodes, and relevant CUDNs. - Relies on CRD schema and CEL for static
UplinkAPI-shape validation. - Resolves
Uplink.spec.nodeConfigsagainst live Nodes and reports controller-detected configuration problems, including multiple node configs matching the same node and CUDN active nodes that have no matching node config. - Aggregates
UplinkState.status.conditions[Ready]reasons such asHostInterfaceNotFound,BridgeNotFound,BridgeInvalid,MTUInvalid,GatewayInfoUnavailable,WaitingForDPU,WaitingForDPUHost, andNodeSelectorOverlapintoUplink.status.conditions[Degraded]. - Derives CUDN reason
UplinkOverlapOnNodefrom affectedUplinkStateobjects with reasonNodeSelectorOverlap. - Resolves nodes selected by a referencing active CUDN but not selected by any
Uplink.spec.nodeConfigsentry into CUDN reasonUplinkNotFoundForNode. - Maintains a finalizer on
Uplinkobjects referenced by one or more CUDNs. The finalizer is removed only after no CUDNs reference theUplink. - Aggregates per-node state without storing per-node details in
Uplink.status. - Reflects uplink readiness into CUDN status using dedicated uplink readiness reasons derived from the referenced
Uplinkspec and active-node-scopedUplinkStateobjects, not from aggregateUplink.status.conditions[Degraded].
Node uplink reconciler (OVNKube-Node Full mode)¶
For each CUDN with spec.uplinks on a node where the CUDN/UDN is active:
- Resolve the referenced
Uplinkand the node config that applies to this node. If more than oneUplink.spec.nodeConfigsentry applies, update this node'sUplinkStatewithReady=Falseand reasonNodeSelectorOverlap. - Validate that
nodeConfigs[*].hostInterfaceNameexists and carries the expected host-side gateway L3 identity. - Resolve the OVS bridge that contains the host interface and validate the bridge layout.
- Discover the host interface MAC address, IP addresses, default gateways, MTU, and any OVS VLAN tag on that interface from existing host and OVS state.
- Determine the CUDN's effective routing domain from matching
RouteAdvertisements. If the CUDN resolves to a per-CUDN VRF, attach the selected host interface to the CUDN VRF. If the CUDN resolves to the default VRF, remove any OVN-Kubernetes-owned CUDN VRF attachment for that interface and leave it in the default VRF. OVN-Kubernetes does not create/delete the bridge, remove bridge ports, move IP addresses, or alter admin-owned VLAN/MTU/route configuration. - Configure OVS bridge mappings using the normalized CUDN network name and the resolved OVS bridge name.
- Manage OpenFlow on the resolved bridge according to the CUDN running on it, including per-UDN service, masquerade, and return-path flows for services reachable through that uplink.
- Configure the OVN gateway router external port to use the discovered VLAN tag, when the host-facing gateway interface is tagged in OVS.
- Detect unsupported effective per-node bridge conflicts, including multiple active CUDNs selecting the same bridge while
requiring distinct non-default VRFs, and record the node-local conflict in
UplinkState. - Create or update this node's
UplinkState, including ownership labels,hostInterfaceName,ovsBridge.name, gateway data, and node-local discovery conditions. - Record node-local failure details for VRF attachment failure, bridge mapping failure, or unsupported bridge sharing in
UplinkState, so cluster-manager can derive the corresponding CUDN status reason such asUplinkConfigurationConflict.
The node reconciler should requeue affected Uplinks when local host or OVS state changes. Netlink subscriptions cover interface, address, route, and link-state changes; OVSDB or equivalent OVS change notifications cover bridge, port, VLAN tag, and bridge-layout changes. Normal retry handling remains required for transient failures, with a periodic resync as a fallback so admin-side network changes are eventually reflected even if an event is missed.
RouteAdvertisements and network advertised-VRF changes also requeue affected CUDN gateway reconciliation. Those changes can require attaching or detaching the selected uplink interface, adding or removing the normal default route from the UDN VRF, updating service route steering, and refreshing route-import filtering.
DPU reconciliation split¶
DPU deployments run ovnkube on both the DPU-Host and the DPU. OVS runs on the DPU on behalf of the host, so the Uplink reconciler is split across the two components.
DPU-Host reconciler:
- Watch
Uplinkand the matchingUplinkStatefor the host node. - Resolve the referenced
Uplinkand the node config that applies to the host node. - Find
nodeConfigs[*].hostInterfaceNameon the DPU-Host and discover its MAC address, IP addresses, default gateways, and MTU. - Determine the CUDN's effective routing domain from matching
RouteAdvertisements. Attach the selected host interface to the host-side CUDN VRF only for per-CUDN VRF-Lite isolation; otherwise leave it in the default VRF. - Update this node's
UplinkState.statuswithhostInterfaceName,macAddress,ipAddresses,defaultGateways, and host-side discovery conditions.
DPU-side reconciler:
- Watch the matching
UplinkStatefor the host node it represents. - Wait for the DPU-Host side to publish
status.macAddress. - Detect the DPU-side host gateway/PF representor by matching the host interface MAC to the existing representor peer relationship. The bridge is pre-provisioned, so the representor must already be attached to an OVS bridge on the DPU.
- Resolve the OVS bridge that contains the representor, validate the bridge layout, and publish
status.ovsBridge.nameplus DPU-side bridge validation conditions inUplinkState. - Create or reconcile the DPU-local route-import VRF for the CUDN only when the CUDN resolves to per-CUDN VRF-Lite
isolation. This VRF uses the same name that
RouteAdvertisements targetVRF: autoresolves to for the CUDN. - For per-CUDN VRF-Lite isolation, use the resolved bridge's
LOCALinterface as the DPU-local FRR peering interface and enslave thatLOCALinterface to the DPU-local route-import VRF. Any DPU-local FRR peering IP is expected to be configured on that interface and is not published inUplinkState. For default routing-domain use, leave theLOCALinterface in the default VRF. - Detect any OVS VLAN tag on the DPU-side host gateway representor or bridge-local gateway interface and configure the OVN gateway router external port to use the same VLAN behavior.
- Reconcile the OVS bridge mapping, DPU-local route-import VRF state when applicable, gateway interface data, and
OpenFlow on the DPU side using host shared gateway data from
UplinkState.status.
Although both DPU-Host and DPU-side reconcilers update the same UplinkState, updates are dependency ordered and low
frequency. The DPU side waits for the DPU-Host to publish host-side gateway data, especially status.macAddress, before
resolving and publishing DPU-local bridge data such as status.ovsBridge.name. Implementations should use status patches
or server-side apply field ownership for the fields each side owns, so normal retry-on-conflict handling is sufficient
and the object should not become a hot write target.
This avoids publishing PF IDs, host PCI addresses, or DPU-local bridge names in the Uplink spec. The DPU-Host publishes
the host interface MAC as the handoff point, and the DPU side uses the existing representor peer relationship to discover
the DPU-local representor and OVS bridge.
For gateway interfaces backed by VF, SF, or PF representors, OVN-Kubernetes continues to use the dynamic representor
derivation mechanisms used today rather than adding representor names to the Uplink API. Full mode derives the
representor from device details, and DPU mode uses existing switchdev/DPU representor discovery.
CUDN/UDN controller integration (OVNKube-Controller)¶
- When configuring the CUDN topology in OVN with shared gateway mode, the UDN controller reads the matching
UplinkStateand derives the gateway router configuration from that typed status. - If existing OVN-Kubernetes internals require an
L3GatewayConfig-like structure, the implementation should build it in memory fromUplinkState,Uplink, node, and CUDN context rather than serializing CUDN gateway data through a node annotation. - The CUDN remains a primary UDN. The external connectivity path uses OVN localnet plumbing for the CUDN's external logical switch port. OVN-Kubernetes configures that localnet port to use the normalized CUDN network name, and ovnkube-node configures the matching per-node OVS bridge mapping:
<normalized-cudn-network-name>:<resolved-ovs-bridge>
This mapping is the join point between OVN's logical localnet port and the Uplink selected on that node. In DPU
deployments, the same logical localnet network name is used, but the bridge mapping is reconciled by the DPU-side
component because OVS runs on the DPU.
Shared gateway datapath updates¶
The current openflow manager models one default bridge plus one optional secondary egress bridge. Each bridge is
represented by a BridgeConfiguration, and BridgeConfiguration already carries per-network state in its
netConfig map[string]*BridgeUDNConfiguration. This enhancement reuses that model but makes the non-default bridge set
dynamic and explicitly records which CUDN network is assigned to which resolved Uplink on the node.
Required updates:
- Keep the default bridge as the existing fixed
BridgeConfiguration. - Replace the single optional
externalGatewayBridgefield with a map of resolved uplink OVS bridges toBridgeConfigurationobjects. - Replace the single secondary-bridge flow cache with per-uplink-bridge flow caches keyed by the resolved bridge.
- Add a node-local assignment cache that maps each CUDN network name to the selected resolved
Uplinkand OVS bridge. The assignment cache carries the CUDN/Uplink relationship used for conflict detection, cleanup, bridge mappings, and status;BridgeConfigurationremains the per-OVS-bridge object. - When a CUDN selects a
Uplinkon the node, create or reuse that bridge'sBridgeConfigurationand add that CUDN'sBridgeUDNConfigurationonly to the resolved bridge. - A
BridgeConfigurationmay contain multiple CUDN network configs only for supported shared-routing-domain cases such astargetVRF: default. For per-CUDN VRF-Lite isolation or explicit non-default VRF targets, reject another active CUDN assignment to the same resolved bridge withUplinkConfigurationConflict. - Generate and apply ingress/egress flows for each active uplink bridge using the CUDN network configurations assigned to that bridge.
- Preserve existing default bridge flow behavior for clusters not using
Uplink.
Gateway configuration source¶
The existing k8s.ovn.org/l3-gateway-config node annotation remains the source of truth for the default network gateway
and existing legacy consumers. This enhancement does not add CUDN or non-default uplink entries to that annotation.
For CUDNs using Uplink, OVN-Kubernetes uses UplinkState as the typed gateway configuration source. The status object
carries the host interface, resolved OVS bridge, MAC, IP addresses, and default gateways needed to configure the CUDN
gateway router. OVN interface identifiers are derived from UplinkState, node, and CUDN context. This avoids
duplicating the same data in both a CRD status object and a node annotation, and avoids stale dual-source-of-truth
behavior.
In DPU deployments, UplinkState.status.ipAddresses is also the source for the host shared gateway IP used as the
advertised BGP next-hop. The DPU-local FRR peering IP is expected on the resolved bridge's LOCAL interface and is not
included in UplinkState.
Compatibility requirement:
- Existing
defaultkey behavior ink8s.ovn.org/l3-gateway-configremains unchanged. - Existing single egress-gateway fields remain supported.
- New CUDN uplink gateway data is not serialized into node annotations.
- Internal code that still expects an
L3GatewayConfigshape must derive that structure fromUplinkState,Uplink, node, and CUDN context for this feature.
Feature Compatibility¶
Local gateway mode¶
Uplink is not supported when OVN-Kubernetes is configured for local gateway mode in this OKEP. If a CUDN sets
spec.uplinks while the cluster uses local gateway mode, OVN-Kubernetes reports UplinkUnsupportedGatewayMode and does
not generate Uplink-backed gateway configuration for that CUDN.
Local gateway VRF-Lite continues to work as documented by the BGP OKEP for CUDNs that do not set spec.uplinks; the user
remains responsible for attaching the relevant IP interface to the matching CUDN VRF. Extending Uplink to automate that
local gateway interface selection and VRF attachment is a future goal.
EVPN¶
Uplink is not supported for CUDNs using EVPN transport in this OKEP. EVPN is a distinct external connectivity model and
is currently scoped to local gateway mode, while this OKEP defines Uplink behavior for shared gateway mode. If a CUDN sets
both spec.uplinks and spec.network.transport: EVPN, OVN-Kubernetes reports UplinkUnsupportedTransport and does not
generate Uplink-backed gateway configuration for that CUDN.
Shared gateway EVPN support and any future mapping between Uplink, EVPN VTEPs, and EVPN underlay selection should be
covered by a separate proposal.
Egress IP¶
Egress IP behavior for CUDNs should continue to follow existing shared gateway behavior. Supporting additional consumers
such as secondary EgressIP through Uplink is a future goal and is not part of this OKEP's initial scope.
Egress Service¶
Egress Service is not currently supported with CUDN. Future behavior can use the same Uplink bridge selection model once CUDN support exists.
MEG¶
MEG remains supported only for CDN and remains outside CUDN scope.
Service access¶
Ingress traffic into the selected uplink¶
Service flows must be configured on the OVS bridge selected by the CUDN on that node. The flow programming is per-CUDN and per-bridge: a CUDN bound to one uplink bridge gets its service ingress and return-path flows on that bridge, while a different CUDN bound to a different uplink bridge gets its own flows on its resolved bridge. Traffic entering the resolved bridge uplink will function like shared gateway does today for services, with traffic forwarded to the patch port on the gateway router of the respective CUDN. When service traffic enters the selected OVS bridge from that bridge's uplink port and the service belongs to a CUDN mapped to that bridge, bridge OpenFlow steers the packet directly to the CUDN gateway router patch port.
Traffic that enters the selected uplink destined for a non-CUDN service like the Kubernetes API is sent to the local host path and then handled through existing UDN service behavior.
Ingress traffic from other host interfaces¶
For CUDNs that use the default routing domain (targetVRF: default or no matching RouteAdvertisements), service traffic
may enter the node through another default-VRF host interface, including the normal shared gateway bridge. NodePort,
ExternalIP, or LoadBalancer traffic is DNAT'ed to the service cluster IP as it is today. The existing service packet mark
and IP rule path selects the UDN VRF for the forward lookup, and that UDN VRF service CIDR route must use the selected
uplink gateway interface. That route gets the post-DNAT packet onto the selected OVS bridge. OpenFlow on that bridge then
uses the preserved service mark to dispatch the packet to the CUDN's patch port and gateway router. The selected uplink
bridge must therefore have the same per-UDN service handling, masquerade, and return-path flows that the default shared
gateway bridge has for UDN services.
In the default routing-domain case, reply traffic does not need to be marked back into the CUDN VRF for routing. The forward path uses the UDN VRF lookup to reach the selected bridge, but after the reply returns from OVN through that bridge and the service/masquerade conntrack state restores the host-side tuple, the final host routing lookup can use the default routing domain where the relevant routes are intentionally shared.
For CUDNs using per-CUDN VRF-Lite isolation (targetVRF: auto), service ingress is intentionally scoped to interfaces and
routes in that CUDN VRF. Traffic that enters a different VRF is not implicitly steered into the CUDN VRF by spec.uplinks.
If the ingress interface is part of the same CUDN VRF, service DNAT and the VRF-local service route can steer traffic to
the selected uplink bridge, and the reply remains in that CUDN VRF because the selected uplink interface is enslaved to
it.
Because service reply traffic can re-enter the kernel through the selected uplink bridge while reverse-path lookup points at another route, OVN-Kubernetes must configure reverse-path filtering for the involved bridge/interface path consistently with existing shared gateway service handling.
If a future requirement needs stricter service isolation based on ingress interface or uplink, that should be handled by a
separate policy design rather than being implicit in spec.uplinks.
Pod service access from a CUDN using Uplink¶
Pods should still be able to access services on their CUDN internally. The selected uplink bridge must install the same per-UDN masquerade and service fallback flows used by shared gateway mode today, using the CUDN's own masquerade IPs and packet marks to keep networks identified.
Pods trying to access exposed Kubernetes services like the Kubernetes API service should still work through the existing UDN-enabled default service path. This requires the selected uplink bridge to have the relevant per-UDN service flows and the static FDB/MAC programming needed to send host-bound masqueraded traffic to the correct LOCAL or DPU host-representor path.
Failure and Cleanup Behavior¶
Uplink¶
- If the selected host interface is missing, or the resolved OVS bridge is missing or malformed on a node, the matching
UplinkStateis degraded. Retries continue, and local netlink/OVS change notifications or periodic resync requeue the Uplink when admin-owned host or OVS state changes. - Per-node failures are reported on the matching
UplinkState;Uplink.status.conditionsreports aggregate health only. - If an admin deletes or changes the OVS bridge, OVN-Kubernetes does not recreate or repair the bridge. It updates status and retries validation.
- If an admin requests deletion of a
Uplinkwhile one or more CUDNs reference it, OVN-Kubernetes keeps theUplinkfinalizer and deletion remains pending. This prevents accidental disruption of active CUDNs. Sincespec.uplinksis immutable in this OKEP, clearing the reference requires deleting or recreating the CUDN without thatUplink. Once no CUDNs reference theUplink, OVN-Kubernetes removes the finalizer and deletion can complete. - In Full mode, cleanup removes OVN-Kubernetes-owned CUDN VRF membership from the selected host interface when such membership was created, and removes OVN-Kubernetes-owned bridge mappings and OpenFlow for the CUDN. OVN-Kubernetes does not delete the OVS bridge, remove bridge ports, or restore admin-owned IP/VLAN/MTU/route state.
- In DPU-Host mode, cleanup removes OVN-Kubernetes-owned CUDN VRF membership from the selected host interface for the uplink when such membership was created. DPU-side cleanup is handled separately by the DPU-side reconciler.
- In DPU mode, cleanup removes OVN-Kubernetes-owned state only: generated FRR configuration for the per-CUDN VRF when
applicable, RouteImport-owned static routes from the CUDN gateway router, the resolved bridge
LOCALinterface's membership in the DPU-local route-import VRF when such membership was created, the DPU-local route-import VRF itself when no longer needed, CUDN bridge mappings, and CUDN OpenFlow. OVN-Kubernetes does not delete the OVS bridge, remove bridge ports, or remove admin-owned DPU-local peering IP/VLAN state.
CUDN Uplinks¶
- If an uplink for a node does not exist, the CUDN reports error status for that node, but OVN-Kubernetes does not halt configuration for all other nodes.
- If a CUDN references a non-existent
Uplink, CUDN reconciliation reportsUplinkNotFounduntil theUplinkis created or the CUDN is recreated with a valid reference. - If a
Uplinkbecomes valid later for a node, OVNKube-Controller and OVNKube-Node UDN reconciliation should succeed.
Testing Details¶
- Unit tests:
UplinkAPI validation for required node configs, supported types, host interface names, and selector shape.- selector resolution and per-node overlap handling.
Uplinkfinalizer behavior when CUDNs reference or stop referencing anUplink.UplinkStatestatus ownership and aggregation intoUplink.status.conditions.- CUDN validation for
spec.uplinks. - CUDN external gateway data is derived from
UplinkStatewithout mutating thek8s.ovn.org/l3-gateway-confignode annotation. - rejection/degraded status when multiple active CUDNs select the same effective OVS bridge on a node for unsupported distinct non-default VRF use.
- local gateway mode rejecting
spec.uplinkswithUplinkUnsupportedGatewayMode. - EVPN transport rejecting
spec.uplinkswithUplinkUnsupportedTransport. - localnet topology rejecting
spec.uplinksin this OKEP. - Node integration tests:
- host interface validation and pre-existing OVS bridge resolution.
- missing or ambiguous bridge layout status.
- host interface MAC address, IP address, default gateway, MTU, resolved OVS bridge, and OVS VLAN tag discovery from existing node state.
- full-mode RouteAdvertisements-driven VRF attachment, detachment, and cleanup for the selected host interface.
- RouteAdvertisements-driven UDN VRF default-route add/remove behavior for Uplink-backed networks.
- bridge mapping creation and cleanup for the normalized CUDN network name.
- DPU-Host publishing of selected host interface MAC, IPs, and default gateways.
- DPU-side host gateway representor detection from the DPU-Host-published MAC.
- DPU-side publishing of resolved OVS bridge name into
UplinkState. - DPU-Host RouteAdvertisements-driven VRF attachment, detachment, and cleanup for the selected host interface.
- DPU-side route-import VRF creation using the CUDN VRF name for
targetVRF: auto. - DPU-side route-import VRF enslaving and detachment of the resolved bridge's
LOCALinterface fortargetVRF: auto. - DPU-side cleanup for route-import VRF state.
- DPU-side OVS VLAN tag discovery from existing bridge gateway ports.
- RouteImport filtering of BGP routes by the selected uplink bridge link index.
- E2E tests:
- shared gateway VRF-Lite with two CUDNs on two distinct pre-created uplink bridges.
- heterogeneous nodes using different
Uplink.spec.nodeConfigsfor the sameUplink. - service access from a CUDN on a Uplink, including verification that per-UDN service flows are programmed on the resolved bridge and that CUDN services plus UDN-enabled default services such as the Kubernetes API are reachable.
- default-routing-domain service ingress where NodePort or LoadBalancer traffic enters through the normal shared gateway interface and reaches a service backed by an Uplink-backed CUDN through the selected uplink bridge.
- DPU shared gateway VRF-Lite with FRR peering in the DPU-local CUDN VRF and advertised next-hop set to the host shared gateway IP.
- route advertisement + ingress/egress datapath verification.
- Cross-feature coverage:
- MEG coexistence with legacy
EgressGWInterface. - default bridge behavior unchanged when no CUDN uses
spec.uplinks. - local gateway mode unchanged behavior for CUDNs without
spec.uplinks.
Documentation Details¶
- Add a user-facing guide for preparing OVS bridges and referencing the host-side gateway interface with
Uplink. - Document that bridge provisioning is external to OVN-Kubernetes; NMState is one valid way to prepare bridges.
- Update BGP/RouteAdvertisements docs to include VRF-Lite shared gateway workflows.
- Add troubleshooting docs for
UplinkandUplinkStateconditions.
Risks, Known Limitations and Mitigations¶
- Pre-provisioned bridge dependency.
- Mitigation: explicitly document the boundary and include examples using existing node configuration tooling such as NMState.
- Incorrect or ambiguous bridge layout.
- Mitigation: deterministic validation, node-local
UplinkStateconditions, and E2E coverage for malformed bridge layouts. - Incorrect default route behavior if the selected host interface has no usable default route.
- Mitigation: publish discovered
defaultGateways; allow operation with BGP-learned/imported routes when no default route is present. - Increased complexity from dynamic multi-bridge flow programming.
- Mitigation: incremental implementation and focused scale/perf testing.
- Service access breaks if per-UDN service flows, service CIDR routing, reverse-path filtering, or static FDB/MAC entries are only programmed for the default bridge.
- Mitigation: program UDN service, masquerade, route-steering, reverse-path filtering, and return-path flows on each CUDN's selected uplink bridge and add E2E coverage for CUDN service access, cross-interface default-VRF service ingress, and UDN-enabled default services.
- Dynamic UDN first-pod latency can increase when the first pod activates a CUDN that selects a
Uplink. - Mitigation: keep bridge validation and bridge mapping reconciliation incremental and idempotent. Since bridge lifecycle is external, OVN-Kubernetes does not perform bridge creation or IP migration on first pod startup.
- Legacy field/API overlap (
EgressGWInterfacevsUplink). - Mitigation: additive rollout and clear precedence documentation.
- Hot status object contention in large clusters.
- Mitigation: store per-node state in one
UplinkStateobject per uplink/node pair and keepUplink.statuslimited to aggregate conditions. - DPU representor or peer MAC discovery fails.
- Mitigation: report a clear per-node condition. The DPU-Host interface must have a discoverable peer representor that is attached to a pre-provisioned OVS bridge on the DPU.
- DPU route import cannot map BGP routes to the CUDN for
targetVRF: auto. - Mitigation: create a DPU-local route-import VRF using the CUDN VRF name and have RouteAdvertisements target that VRF for shared gateway VRF-Lite mode. For default routing-domain use, do not create a CUDN route-import VRF on the DPU.
- Multiple CUDNs select the same uplink bridge on a node for unsupported routing-domain combinations.
- Mitigation: allow sharing only for supported shared-routing-domain cases such as
targetVRF: default; reject or degrade mappings that would place distinct non-default CUDN VRFs on the same effective bridge.
Known limitations:
- A CUDN can reference only one
Uplinkin this OKEP. - One effective OVS bridge per node per CUDN.
- Only
OVSBridgeuplink node configs are supported. - One
Uplinkcan resolve to only one host-side gateway interface and one backing OVS bridge per node. - The selected host interface must exist on selected nodes and resolve to a pre-created OVS bridge that is correctly connected to the external network.
- Local gateway mode cannot use
spec.uplinksin this OKEP. - EVPN transport cannot use
spec.uplinksin this OKEP. - Localnet topology cannot use
spec.uplinksin this OKEP. - Uplink sharing for per-CUDN VRF-Lite isolation is not supported. Multiple CUDNs may share an effective OVS bridge only when using a supported shared routing domain such as the default VRF; overlapping subnets remain user misconfiguration.
OVN-Kubernetes Version Skew¶
Planned introduction: 1.4.
Backwards Compatibility¶
- Clusters not using
Uplinkhave no behavior change. - Existing default bridge and
EgressGWInterfacebehavior remains supported. - New CUDN fields are additive and optional.
- The
k8s.ovn.org/l3-gateway-configannotation is not extended for this feature.
Alternatives¶
- Keep all uplink handling outside OVN-Kubernetes.
- Rejected as the only model because CUDN, RouteAdvertisements, DPU route-import VRFs, bridge mappings, and gateway router configuration need a typed contract. Manually prepared networking with tools such as NMState remains the expected way to create the bridge itself.
- Use only node annotations for gateway data.
- Rejected: annotations are an untyped, shared surface and would extend the existing
l3-gateway-configannotation into a multi-network source of truth. - Put Uplink selection and lifecycle into RouteAdvertisements only.
- Rejected: the selected external path belongs to network connectivity intent on the CUDN. RouteAdvertisements decides whether that CUDN uses the default routing domain or per-CUDN VRF-Lite isolation for BGP route exchange.
- Keep the API named
ExternalBridge. - Rejected: the bridge is an implementation detail of the first uplink type.
Uplinkkeeps the API extensible for future non-bridge types.