Connecting to Kubeflow Pipelines using the SDK client
This guide demonstrates how to connect to Kubeflow Pipelines using the Kubeflow Pipelines SDK client, and how to configure the SDK client using environment variables.
The Kubeflow Pipelines REST API is available at the same endpoint as the Kubeflow Pipelines user interface (UI). The SDK client can send requests to this endpoint to upload pipelines, create pipeline runs, schedule recurring runs, and more.
Before you begin
- You need a Kubeflow Pipelines deployment using one of the installation options.
- Install the Kubeflow Pipelines SDK.
Connect to Kubeflow Pipelines from outside your cluster
Kubeflow distributions secure the Kubeflow Pipelines public endpoint with authentication and authorization. Since Kubeflow distributions can have different authentication and authorization requirements, the steps needed to connect to your Kubeflow Pipelines instance might be different depending on the Kubeflow distribution you installed. Refer to documentation for your Kubeflow distribution:
- Connecting to Kubeflow Pipelines on Google Cloud using the SDK
- Kubeflow Pipelines on AWS
- Authentication using OIDC in Azure
- Pipelines on IBM Cloud Kubernetes Service (IKS)
For Kubeflow Pipelines standalone and Google Cloud AI Platform Pipelines, you can also connect to the API via `kubectl port-forward`.
Kubeflow Pipelines standalone deploys a Kubernetes service named `ml-pipeline-ui` in your Kubernetes cluster without extra authentication.
You can use kubectl port-forward to port forward the Kubernetes service locally to your laptop outside of the cluster:
```shell
# Change the namespace if you deployed Kubeflow Pipelines in a different
# namespace.
kubectl port-forward svc/ml-pipeline-ui 3000:80 --namespace kubeflow
```
You can verify that port forwarding is working properly by visiting http://localhost:3000 in your browser. If port forwarding is working properly, the Kubeflow Pipelines UI appears.
Run the following Python code to instantiate the `kfp.Client`:
```python
import kfp

client = kfp.Client(host='http://localhost:3000')
print(client.list_experiments())
```
Note: for Kubeflow Pipelines in multi-user mode, you cannot access the API using `kubectl port-forward`, because the API requires authentication. Refer to the distribution-specific documentation recommended above.
Connect to Kubeflow Pipelines from the same cluster
Non-multi-user mode
As mentioned above, the Kubeflow Pipelines API Kubernetes service is `ml-pipeline-ui`.
Using standard Kubernetes service discovery, you can access the `ml-pipeline-ui` service from a Pod in the same namespace by its DNS name:
```python
import kfp

client = kfp.Client(host='http://ml-pipeline-ui:80')
print(client.list_experiments())
```
Or, you can access the `ml-pipeline-ui` service by using environment variables:
```python
import os

import kfp

host = os.getenv('ML_PIPELINE_UI_SERVICE_HOST')
port = os.getenv('ML_PIPELINE_UI_SERVICE_PORT')
client = kfp.Client(host=f'http://{host}:{port}')
print(client.list_experiments())
```
When accessing Kubeflow Pipelines from a Pod in a different namespace, you must use the service name qualified with its namespace:
```python
import kfp

namespace = 'kubeflow'  # or the namespace where you deployed Kubeflow Pipelines
client = kfp.Client(host=f'http://ml-pipeline-ui.{namespace}:80')
print(client.list_experiments())
```
Multi-User mode
Note: the technical details of multi-user mode are covered in the How Multi-User mode in-cluster authentication works section below.
Choose your use-case from one of the options below:
- Access Kubeflow Pipelines from a Jupyter notebook

In order to access Kubeflow Pipelines from a Jupyter notebook, an additional per-namespace (profile) manifest is required:
```yaml
apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: access-ml-pipeline
  namespace: "<YOUR_USER_PROFILE_NAMESPACE>"
spec:
  desc: Allow access to Kubeflow Pipelines
  selector:
    matchLabels:
      access-ml-pipeline: "true"
  volumes:
    - name: volume-kf-pipeline-token
      projected:
        sources:
          - serviceAccountToken:
              path: token
              expirationSeconds: 7200
              audience: pipelines.kubeflow.org
  volumeMounts:
    - mountPath: /var/run/secrets/kubeflow/pipelines
      name: volume-kf-pipeline-token
      readOnly: true
  env:
    - name: KF_PIPELINES_SA_TOKEN_PATH
      value: /var/run/secrets/kubeflow/pipelines/token
```
After the manifest is applied, a newly created Jupyter notebook server shows an additional option in the configurations section. Read more about configurations in the Jupyter notebook server documentation.
Note: `kfp.Client` expects the token either in the `KF_PIPELINES_SA_TOKEN_PATH` environment variable or mounted at `/var/run/secrets/kubeflow/pipelines/token`. Do not change these values in the manifest. Similarly, `audience` should not be modified. No additional setup is required to refresh tokens.
Remember that this setup has to be repeated for each namespace (profile) that should have access to the Kubeflow Pipelines API from within a Jupyter notebook.
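As a sketch of how the SDK locates the token inside such a notebook (the `sa_token_path` helper is illustrative; `kfp.Client` performs this lookup on its own):

```python
import os

# Default mount path from the PodDefault manifest above. kfp.Client first
# checks the KF_PIPELINES_SA_TOKEN_PATH environment variable, then falls
# back to this path.
DEFAULT_TOKEN_PATH = '/var/run/secrets/kubeflow/pipelines/token'


def sa_token_path():
    """Return the path where the projected ServiceAccount token is expected."""
    return os.environ.get('KF_PIPELINES_SA_TOKEN_PATH', DEFAULT_TOKEN_PATH)


# Inside the notebook Pod, no explicit token handling is needed:
#   import kfp
#   client = kfp.Client()
#   print(client.list_experiments(namespace='<YOUR_USER_PROFILE_NAMESPACE>'))
```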
- Access Kubeflow Pipelines from within any Pod

In this case, the configuration is similar to the Jupyter notebook case described above. The Pod manifest has to be extended with a projected volume whose token is mounted either at the path set in `KF_PIPELINES_SA_TOKEN_PATH` or at `/var/run/secrets/kubeflow/pipelines/token`.

The manifest below shows an example Pod with the token mounted at `/var/run/secrets/kubeflow/pipelines/token`:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: access-kfp-example
  namespace: my-namespace
spec:
  containers:
    - image: my-image:latest
      name: access-kfp-example
      volumeMounts:
        - mountPath: /var/run/secrets/kubeflow/pipelines
          name: volume-kf-pipeline-token
          readOnly: true
  serviceAccountName: default-editor
  volumes:
    - name: volume-kf-pipeline-token
      projected:
        sources:
          - serviceAccountToken:
              path: token
              expirationSeconds: 7200
              audience: pipelines.kubeflow.org
```
Note that this example uses `default-editor` in `my-namespace` as the service account identity, but you can configure any service account for your Pod. You need to bind the service account to the cluster role `kubeflow-pipelines-edit` or `kubeflow-pipelines-view`, documented in view-edit-cluster-roles.yaml.
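For example, the binding for the Pod above could look like the following sketch (the RoleBinding name `allow-kfp-edit` is illustrative; choose any name that fits your conventions):

```yaml
# Grant the default-editor ServiceAccount in my-namespace the
# kubeflow-pipelines-edit ClusterRole, scoped to my-namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: allow-kfp-edit  # illustrative name
  namespace: my-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubeflow-pipelines-edit
subjects:
  - kind: ServiceAccount
    name: default-editor
    namespace: my-namespace
```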
Managing access to Kubeflow Pipelines API across namespaces
As already mentioned, access to Kubeflow Pipelines API requires per namespace setup. Alternatively, you can configure the access in a single namespace and allow other namespaces to access Kubeflow Pipelines API through it.
Note: the examples below assume that `namespace-1` is a namespace (profile) that will be granted access to the Kubeflow Pipelines API through the `namespace-2` namespace, and that `namespace-2` is already configured to access the Kubeflow Pipelines API.
Cross-namespace access can be achieved in two ways:
- With additional RBAC settings.

This option requires that only `namespace-2` has the `PodDefault` manifest configured.

Access is granted by giving `namespace-1:ServiceAccount/default-editor` the `ClusterRole/kubeflow-edit` in `namespace-2`:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubeflow-edit-namespace-1
  namespace: namespace-2
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubeflow-edit
subjects:
  - kind: ServiceAccount
    name: default-editor
    namespace: namespace-1
```
- By sharing access to the other profile.

In this scenario, access is granted by `namespace-2` adding `namespace-1` as a contributor. Specifically, the owner of `namespace-2` opens the “Manage Contributors” page in the Kubeflow UI and adds the email address associated with `namespace-1` in the “Contributors to your namespace” textbox.
How Multi-User mode in-cluster authentication works
When calling the Kubeflow Pipelines API from within the same cluster, the Kubeflow Pipelines SDK authenticates as your Pod’s service account in your namespace using ServiceAccountToken projection, whereby a verifiable token with a limited lifetime is injected into the Pod (for example, a Jupyter notebook’s). The SDK then uses this token to authorize against the Kubeflow Pipelines API.
It is important to understand that the `serviceAccountToken` method respects the Kubeflow Pipelines RBAC and does not allow access beyond what the ServiceAccount running the notebook Pod has.
More details about `PodDefault` can be found here.
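A minimal sketch of this flow, assuming the token mount path from the manifests above (the `read_projected_token` helper and the host value are illustrative; `kfp.Client` normally performs these steps for you when `KF_PIPELINES_SA_TOKEN_PATH` is set):

```python
def read_projected_token(path='/var/run/secrets/kubeflow/pipelines/token'):
    """Read the short-lived ServiceAccount token that the kubelet keeps refreshed."""
    with open(path) as f:
        return f.read().strip()


# Passing the token explicitly via `existing_token` is roughly equivalent to
# what the SDK does on its own inside the cluster:
#   import kfp
#   client = kfp.Client(host='http://ml-pipeline-ui.kubeflow',
#                       existing_token=read_projected_token())
```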
Configure SDK client by environment variables
It is usually beneficial to configure the Kubeflow Pipelines SDK client using Kubeflow Pipelines environment variables, so that you can instantiate `kfp.Client` without any explicit arguments.
For example, when the API endpoint is http://localhost:3000, run the following to configure environment variables in bash:
```shell
export KF_PIPELINES_ENDPOINT=http://localhost:3000
```
Or configure it in a Jupyter notebook by using the IPython built-in `%env` magic command:

```
%env KF_PIPELINES_ENDPOINT=http://localhost:3000
```
Then you can use the SDK client without explicit arguments.
```python
import kfp

# When not specified, host defaults to the env var KF_PIPELINES_ENDPOINT.
# This is now equivalent to `client = kfp.Client(host='http://localhost:3000')`.
client = kfp.Client()
print(client.list_experiments())
```
Refer to more configurable environment variables here.
Next Steps
- Using the Kubeflow Pipelines SDK
- Kubeflow Pipelines SDK Reference
- Experiment with the Kubeflow Pipelines API