Merge branch 'feat/add-private-eks' into 'master' (b0354f13) · Commits · to be continuous... / tools / aws-auth-provider

Dockerfile

+20 −2

Original line number	Diff line number	Diff line
		@@ -6,8 +6,26 @@ WORKDIR /code
		COPY pyproject.toml poetry.lock README.md /code/
		COPY ./aws_auth_provider /code/aws_auth_provider

		# Use ash and enable pipefail so commands using | fail the RUN step on errors (fix DL4006)
		SHELL ["/bin/ash", "-o", "pipefail", "-c"]

		# Install dependencies and Session Manager plugin
		RUN apk upgrade --no-cache \
		&& pip install --no-cache-dir .
		&& apk add --no-cache \
		curl=8.14.1-r2 \
		gcompat=1.1.0-r4 \
		rpm=4.19.1.1-r2 \
		cpio=2.15-r0 \
		# Install pip packages
		&& pip install --no-cache-dir . \
		# Install Session Manager plugin (extract binary directly from RPM)
		&& curl "https://s3.amazonaws.com/session-manager-downloads/plugin/latest/linux_64bit/session-manager-plugin.rpm" -o "session-manager-plugin.rpm" \
		&& rpm2cpio session-manager-plugin.rpm | cpio -idmv \
		&& mv usr/local/sessionmanagerplugin/bin/session-manager-plugin /usr/local/bin/ \
		&& chmod +x /usr/local/bin/session-manager-plugin \
		&& rm -rf usr session-manager-plugin.rpm \
		# Verify installation
		&& session-manager-plugin --version

		EXPOSE ${PORT}
		# hadolint ignore=DL3025

README.md

+187 −0

		@@ -118,8 +118,169 @@ This API retrieves the [CodeArtifact repository endpoint](https://docs.aws.amazo
		| `region` | AWS region to use | no _(can be retrieved from env)_ |
		| `env_ctx` | the [environment context to consider](#the-notion-of-env_ctx) | no _(can be guessed)_ |

		### `GET /ssm/port-forward`

		This API starts an [AWS Systems Manager (SSM) port forwarding session](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-working-with-sessions-start.html#sessions-start-port-forwarding) to tunnel traffic from GitLab CI to, or through, an EC2 instance.

		Returns: Plain text URL to access the forwarded port (e.g., `http://aws-auth-provider:14240`)

		#### Query Parameters

		| Name | Description | Required |
		|---------------|---------------------------------------------------------------|-----------------------|
		| `instance_id` | EC2 instance ID to connect to | yes |
		| `remote_port` | Port on the remote instance | yes |
		| `remote_host` | Remote host for advanced port forwarding scenarios | no |
		| `local_port` | Local port to bind (auto-allocated if not specified) | no |
		| `protocol` | URL protocol (`http` or `https`, default: `http`) | no |
		| `region` | AWS region to use | no _(can be retrieved from env)_ |
		| `env_ctx` | the [environment context to consider](#the-notion-of-env_ctx) | no _(can be guessed)_ |
		| `role_arn` | AWS IAM role ARN to assume | no _(can be retrieved from env)_ |
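
As an illustration of the query parameters above, here is a minimal Python sketch that builds a request URL for this endpoint. The helper name `build_port_forward_url` and the `base_url` default are assumptions made for the example; they are not part of aws-auth-provider.

```python
from typing import Optional
from urllib.parse import urlencode


def build_port_forward_url(
    instance_id: str,
    remote_port: int,
    base_url: str = "http://aws-auth-provider",  # assumed service alias
    remote_host: Optional[str] = None,
    local_port: Optional[int] = None,
    protocol: str = "http",
) -> str:
    """Hypothetical helper: build the GET /ssm/port-forward request URL."""
    if protocol not in ("http", "https"):
        raise ValueError("protocol must be 'http' or 'https'")

    # Required parameters first, optional ones only when they are set.
    params = {"instance_id": instance_id, "remote_port": remote_port}
    if remote_host is not None:
        params["remote_host"] = remote_host
    if local_port is not None:
        params["local_port"] = local_port
    if protocol != "http":
        params["protocol"] = protocol

    return f"{base_url}/ssm/port-forward?{urlencode(params)}"
```

Fetching the resulting URL (for example with `urllib.request.urlopen`) should return the plain text URL of the forwarded port described above.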

		### `GET /kubeconfig`

		This API generates a complete kubeconfig file for an AWS EKS cluster with a valid authentication token.

		The generated kubeconfig includes:
		- Cluster endpoint and certificate authority
		- Pre-authenticated token (matches `aws eks get-token` format)
		- Configurable namespace and user name
		- Automatic private endpoint handling via SSM port forwarding

		The authentication token is generated using AWS STS and is compatible with EKS authentication requirements.
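
To make the token format concrete: an `aws eks get-token` style token is the `k8s-aws-v1.` prefix followed by a presigned STS `GetCallerIdentity` URL, base64url-encoded with the padding stripped. The sketch below covers only that encoding step. It takes an already presigned URL as input, and the function names are hypothetical; how `kubeconfig.py` actually does this is not visible in this diff.

```python
import base64

TOKEN_PREFIX = "k8s-aws-v1."


def encode_eks_token(presigned_url: str) -> str:
    """Wrap a presigned STS URL into the EKS bearer token format."""
    encoded = base64.urlsafe_b64encode(presigned_url.encode("utf-8")).decode("ascii")
    # The token format carries no base64 padding.
    return TOKEN_PREFIX + encoded.rstrip("=")


def decode_eks_token(token: str) -> str:
    """Recover the presigned URL from a token (handy when debugging)."""
    payload = token[len(TOKEN_PREFIX):]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped padding
    return base64.urlsafe_b64decode(padded).decode("utf-8")
```

Decoding the truncated token shown in the example responses (`aHR0cHM6Ly9zdHMuYW1hem9u...`) yields a URL starting with `https://sts.amazon`, which is consistent with this format.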

		#### Private Endpoint Support

		This endpoint automatically detects private EKS clusters (clusters with `endpointPublicAccess: false` and `endpointPrivateAccess: true`) and establishes an SSM port forwarding session to provide connectivity.

		How it works:
		1. The API reads the cluster's VPC configuration to determine whether it is a private cluster
		2. It establishes an SSM Session Manager tunnel through the specified EC2 instance to forward traffic to the private Kubernetes API server
		3. It configures the kubeconfig with the appropriate TLS settings (`insecure-skip-tls-verify` for proxied connections)
		4. It validates the connection by making a test request to the Kubernetes API `/version` endpoint to verify the certificate

		Requirements for private clusters:
		- You must provide an `instance_id` parameter pointing to a running EC2 instance in the EKS cluster's VPC, with the SSM agent installed and configured
		- You need appropriate IAM permissions for SSM Session Manager and for cluster access
		- The instance must be reachable from the cluster's subnets

		You can also force SSM tunneling for public clusters by providing the `instance_id` parameter.
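
The detection rule described in this section can be sketched as a small pure function. This is an illustration only: `needs_ssm_tunnel` is a hypothetical name, and `vpc_config` is assumed to be the `resourcesVpcConfig` dictionary returned by the EKS `DescribeCluster` call.

```python
from typing import Optional


def needs_ssm_tunnel(vpc_config: dict, instance_id: Optional[str] = None) -> bool:
    """Decide whether the kubeconfig should point at an SSM tunnel."""
    public = vpc_config.get("endpointPublicAccess", False)
    private = vpc_config.get("endpointPrivateAccess", False)
    is_private_only = (not public) and private

    # A private cluster cannot be reached without a jump instance.
    if is_private_only and not instance_id:
        raise ValueError("instance_id is required for private clusters")

    # Tunnel for private clusters, or when the caller forces it with instance_id.
    return is_private_only or bool(instance_id)
```

With this rule, a public cluster is only tunneled when `instance_id` is supplied, which matches the "force SSM tunneling" behavior mentioned above.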

		#### Query Parameters

		| Name | Description | Required | Default |
		|----------------|---------------------------------------------------------------|-----------------------|-----------------|
		| `cluster_name` | Name of the EKS cluster | yes | - |
		| `region` | AWS region where the cluster is located | no _(can be retrieved from env)_ | - |
		| `env_ctx` | the [environment context to consider](#the-notion-of-env_ctx) | no _(can be guessed)_ | - |
		| `role_arn` | AWS role ARN for authentication | no _(can be retrieved from env)_ | - |
		| `ttl_minutes` | Time-to-live for the token in minutes (max: 15) | no | `15` |
		| `namespace` | Default namespace for kubectl context | no | `default` |
		| `user_name` | Username in the kubeconfig | no | `kubectl-user` |
		| `instance_id` | EC2 instance ID for SSM port forwarding | yes _(for private clusters only)_ | - |

		#### Example Usage

		```bash
		# Generate kubeconfig for a public cluster
		curl -s "http://aws-auth-provider/kubeconfig?cluster_name=my-cluster&region=us-east-1" > kubeconfig.yaml

		# Generate kubeconfig for a private cluster (requires instance_id)
		curl -s "http://aws-auth-provider/kubeconfig?cluster_name=my-private-cluster&region=us-east-1&instance_id=i-1234567890abcdef0" > kubeconfig.yaml

		# Force SSM tunneling for a public cluster with a specific instance
		curl -s "http://aws-auth-provider/kubeconfig?cluster_name=my-cluster&instance_id=i-1234567890abcdef0" > kubeconfig.yaml

		# Use the kubeconfig
		export KUBECONFIG=kubeconfig.yaml
		kubectl get nodes
		```

		#### Example Response

		Public Cluster:
		```yaml
		apiVersion: v1
		kind: Config
		clusters:
		- name: my-cluster
		  cluster:
		    server: https://ABC123.gr7.us-east-1.eks.amazonaws.com
		    certificate-authority-data: LS0tLS1CRUdJTi...
		contexts:
		- name: my-cluster-context
		  context:
		    cluster: my-cluster
		    user: kubectl-user
		    namespace: default
		current-context: my-cluster-context
		users:
		- name: kubectl-user
		  user:
		    token: k8s-aws-v1.aHR0cHM6Ly9zdHMuYW1hem9u...
		preferences: {}
		metadata:
		  token-ttl-minutes: 15
		  token-expires-at: '2025-10-03T12:00:00Z'
		  cluster-arn: arn:aws:eks:us-east-1:123456789012:cluster/my-cluster
		  cluster-version: '1.33'
		  cluster-status: ACTIVE
		  authentication-mode: API
		  endpoint-public-access: true
		  endpoint-private-access: false
		  ssm-forwarded: false
		  note: If authentication fails, ensure your AWS IAM user/role is mapped in the cluster's access entries (for API mode) or aws-auth ConfigMap (for CONFIG_MAP mode)
		```

		Private Cluster (with SSM Port Forwarding):
		```yaml
		apiVersion: v1
		kind: Config
		clusters:
		- name: my-private-cluster
		  cluster:
		    server: https://localhost:12345
		    insecure-skip-tls-verify: true
		contexts:
		- name: my-private-cluster-context
		  context:
		    cluster: my-private-cluster
		    user: kubectl-user
		    namespace: default
		current-context: my-private-cluster-context
		users:
		- name: kubectl-user
		  user:
		    token: k8s-aws-v1.aHR0cHM6Ly9zdHMuYW1hem9u...
		preferences: {}
		metadata:
		  token-ttl-minutes: 15
		  token-expires-at: '2025-10-05T12:15:00Z'
		  cluster-arn: arn:aws:eks:us-east-1:123456789012:cluster/my-private-cluster
		  cluster-version: '1.33'
		  cluster-status: ACTIVE
		  authentication-mode: API
		  endpoint-public-access: false
		  endpoint-private-access: true
		  ssm-forwarded: true
		  original-endpoint: https://ABC123.gr7.us-east-1.eks.amazonaws.com
		  tls-hostname-verification: disabled
		  note: "This is a PRIVATE cluster. SSM port forwarding has been established. TLS hostname verification is disabled (insecure-skip-tls-verify: true) because traffic is proxied through localhost. Ensure your AWS IAM user/role is mapped in the cluster's access entries (for API mode) or aws-auth ConfigMap (for CONFIG_MAP mode). The SSM session will remain active as long as the aws-auth-provider service is running."
		```

		#### Notes

		- The token TTL can be set from 1 to 15 minutes (default: 15 minutes)
		- The TTL is capped at 15 minutes as a security best practice (as recommended by AWS)
		- Ensure your AWS credentials have the `eks:DescribeCluster` permission for the cluster, as well as permission to access the cluster itself
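
The TTL rule in these notes can be illustrated with a short sketch that validates `ttl_minutes` and derives a `token-expires-at` style timestamp. The function name `token_expiry` is hypothetical, and the timestamp format is an assumption inferred from the example responses above.

```python
from datetime import datetime, timedelta, timezone

MAX_TTL_MINUTES = 15


def token_expiry(issued_at: datetime, ttl_minutes: int = MAX_TTL_MINUTES) -> str:
    """Return the expiry time as an ISO-8601 UTC string.

    `issued_at` must be timezone-aware.
    """
    # Same bounds as the documented 1..15 minute range.
    if not 1 <= ttl_minutes <= MAX_TTL_MINUTES:
        raise ValueError(f"ttl_minutes must be between 1 and {MAX_TTL_MINUTES}")

    expires = issued_at.astimezone(timezone.utc) + timedelta(minutes=ttl_minutes)
    return expires.strftime("%Y-%m-%dT%H:%M:%SZ")
```

For example, a token issued at 12:00 UTC with the default TTL expires at 12:15 UTC, as in the private cluster response shown earlier.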



		Environment Variables:
		- `API_HOST`: Host for generated URLs (default: `aws-auth-provider`)
		- `PORT_RANGE_START`: Minimum port for auto-allocation (default: `10000`)
		- `PORT_RANGE_END`: Maximum port for auto-allocation (default: `20000`)
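
One plausible way to auto-allocate a port inside `PORT_RANGE_START`..`PORT_RANGE_END` is to try binding each candidate until one succeeds. The sketch below is an assumption about the approach, not the service's actual allocation code, and it treats the upper bound as inclusive.

```python
import socket


def find_free_port(start: int = 10000, end: int = 20000, host: str = "127.0.0.1") -> int:
    """Return the first port in [start, end] that can currently be bound."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                # Port is in use (or not bindable): try the next one.
                continue
            return port
    raise RuntimeError(f"no free port available in range {start}-{end}")
```

Note that the probe socket is closed before the caller uses the port, so another process could still grab it in between; a real implementation would need to handle that bind failure and retry.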

		## Use in GitLab CI

		@@ -164,4 +325,30 @@ docker-build-step1:
		    # build and push my image
		    - docker build --tag ecr_registry/my-image:latest .
		    - docker push ecr_registry/my-image:latest

		deploy-to-eks:
		  image: bitnami/kubectl:latest
		  services:
		    # add AWS Auth Provider as a service
		    - name: $CI_REGISTRY/to-be-continuous/tools/aws-auth-provider:latest
		      alias: aws-auth-provider
		  variables:
		    EKS_CLUSTER_NAME: my-production-cluster
		    # secrets have to be explicitly declared in the YAML to be exported to the service
		    AWS_JWT: "$AWS_JWT"
		  id_tokens:
		    # required by the AWS auth provider service (OIDC authentication method)
		    AWS_JWT:
		      aud: "https://sts.amazonaws.com"
		  before_script:
		    # retrieve kubeconfig from AWS Auth Provider
		    - curl -s -S -f "http://aws-auth-provider/kubeconfig?cluster_name=$EKS_CLUSTER_NAME&env_ctx=PROD" > /tmp/kubeconfig.yaml
		    - export KUBECONFIG=/tmp/kubeconfig.yaml
		    # verify connection
		    - kubectl cluster-info
		    - kubectl get nodes
		  script:
		    # deploy your application
		    - kubectl apply -f k8s/deployment.yaml
		    - kubectl rollout status deployment/my-app
		```

aws_auth_provider/kubeconfig.py

0 → 100644

+431 −0

File added.

Preview size limit exceeded, changes collapsed.

aws_auth_provider/main.py

+60 −92

		import base64
		import json
		import logging
		import os
		import re
		import tempfile
		from functools import cache
		from http import HTTPStatus
		from typing import Literal, Optional
		from typing import Literal

		import boto3
		from fastapi import FastAPI, HTTPException, Query, Request, Response
		@@ -14,96 +13,16 @@ from fastapi.exceptions import RequestValidationError
		from fastapi.responses import PlainTextResponse
		from starlette.exceptions import HTTPException as StarletteHTTPException

		logger = logging.getLogger("aws-auth-provider")
		app = FastAPI()


		# manages the different AWS authentication methods
		def configure_boto(env_ctx: str = None, region: str = None, role_arn: str = None):
		# auto-determine env type
		if not env_ctx:
		env_ctx = guess_env_ctx()

		# set region
		if region is None:
		region = (
		getenv_cleared(f"AWS_{env_ctx}_REGION")
		or getenv_cleared("AWS_REGION")
		or getenv_cleared("AWS_DEFAULT_REGION")
		)
		if not region:
		logger.error("AWS region not found")
		raise HTTPException(status_code=400, detail="AWS region not found")
		os.environ["AWS_DEFAULT_REGION"] = region

		# determine auth method
		jwt_token = os.environ.get("AWS_JWT")
		if role_arn is None:
		role_arn = getenv_cleared(f"AWS_{env_ctx}_OIDC_ROLE_ARN") or getenv_cleared(
		"AWS_OIDC_ROLE_ARN"
		)
		if jwt_token and role_arn:
		# Assume Role with Web Identity Provider
		# see: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#assume-role-with-web-identity-provider
		logger.info("Auth method: STS Assume Role with Web Identity Provider")
		with tempfile.NamedTemporaryFile(
		mode="w", encoding="utf-8", delete=False
		) as token_file:
		token_file.write(jwt_token)
		token_file.close()
		os.environ["AWS_ROLE_ARN"] = role_arn
		os.environ["AWS_WEB_IDENTITY_TOKEN_FILE"] = token_file.name
		os.environ["AWS_ROLE_SESSION_NAME"] = (
		f"GitLabRunner-{os.getenv('CI_PROJECT_ID')}-{os.getenv('CI_PIPELINE_ID')}"
		)
		return
		from . import kubeconfig, ssm
		from .utils import configure_boto, getenv_cleared

		access_key_id = getenv_cleared(f"AWS_{env_ctx}_ACCESS_KEY_ID") or getenv_cleared(
		"AWS_DEFAULT_ACCESS_KEY_ID"
		logger = logging.getLogger("aws-auth-provider")
		setting_level = os.getenv("LOG_LEVEL", "INFO").upper()
		logging.basicConfig(
		level=setting_level, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
		)
		secret_access_key = getenv_cleared(
		f"AWS_{env_ctx}_SECRET_ACCESS_KEY"
		) or getenv_cleared("AWS_DEFAULT_SECRET_ACCESS_KEY")
		if access_key_id and secret_access_key:
		# see: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#environment-variables
		logger.info("Auth method: basic (access key ID & secret access key)")
		os.environ["AWS_ACCESS_KEY_ID"] = access_key_id
		os.environ["AWS_SECRET_ACCESS_KEY"] = secret_access_key
		return

		logger.error("Auth method not found: no credentials available")
		raise HTTPException(status_code=401, detail="AWS credentials not found")


		def guess_env_ctx() -> str:
		# guess from GitLab CI predefined vars
		ref_name = os.getenv("CI_COMMIT_REF_NAME", "-")
		prod_ref = os.getenv("PROD_REF", "/^(master|main)$/").strip("/")
		if re.match(prod_ref, ref_name):
		# could be staging or prod
		if os.getenv("CI_JOB_STAGE", "-") in [
		"publish",
		"infra-prod",
		"production",
		".post",
		]:
		return "PROD"
		return "STAGING"

		integ_ref = os.getenv("INTEG_REF", "/^develop$/").strip("/")
		if re.match(integ_ref, ref_name):
		return "INTEG"

		return "REVIEW"


		# Workaround the GitLab bug with forced exposed variables:
		# variables:
		# SOMEVAR: "$SOMEVAR"
		# os.getenv("SOMEVAR") may have value '$SOMEVAR' if the variable is not defined as a project variable
		def getenv_cleared(name: str) -> Optional[str]:
		value = os.getenv(name)
		return None if value == f"${name}" else value
		logger.setLevel(setting_level)
		app = FastAPI()


		@app.get("/health", response_class=PlainTextResponse)
		@@ -133,7 +52,7 @@ async def request_validation_exception_handler(
		) -> Response:
		return PlainTextResponse(
		status_code=HTTPStatus.UNPROCESSABLE_ENTITY,
		content=jsonable_encoder(exc.errors()),
		content=json.dumps(jsonable_encoder(exc.errors()), indent=2),
		)


		@@ -269,3 +188,52 @@ def get_codeartifact_repository_endpoint(
		)

		return response["repositoryEndpoint"]


		@app.get("/ssm/port-forward", response_class=PlainTextResponse)
		def start_ssm_port_forward_endpoint(
		    env_ctx: str = Query(default=None, alias="env_ctx"),
		    region: str = Query(default=None, alias="region"),
		    role_arn: str = Query(default=None, alias="role_arn"),
		    instance_id: str = Query(alias="instance_id"),
		    remote_port: int = Query(alias="remote_port"),
		    remote_host: str = Query(default=None, alias="remote_host"),
		    local_port: int = Query(default=None, alias="local_port"),
		    protocol: Literal["http", "https"] = Query(default="http", alias="protocol"),
		) -> PlainTextResponse:
		    """Start an SSM port forwarding session."""
		    result = ssm.start_ssm_port_forward(
		        env_ctx=env_ctx,
		        region=region,
		        role_arn=role_arn,
		        instance_id=instance_id,
		        remote_port=remote_port,
		        remote_host=remote_host,
		        local_port=local_port,
		        protocol=protocol,
		    )
		    return PlainTextResponse(content=result.url, status_code=result.status_code)


		@app.get("/kubeconfig", response_class=PlainTextResponse)
		def generate_kubeconfig(
		    cluster_name: str = Query(alias="cluster_name"),
		    env_ctx: str = Query(default=None, alias="env_ctx"),
		    region: str = Query(default=None, alias="region"),
		    role_arn: str = Query(default=None, alias="role_arn"),
		    ttl_minutes: int = Query(default=15, ge=1, le=15, alias="ttl_minutes"),
		    namespace: str = Query(default="default", alias="namespace"),
		    user_name: str = Query(default="kubectl-user", alias="user_name"),
		    instance_id: str = Query(default=None, alias="instance_id"),
		) -> str:
		    """Generate a kubeconfig file by retrieving cluster information from AWS EKS."""
		    return kubeconfig.generate_kubeconfig(
		        cluster_name=cluster_name,
		        env_ctx=env_ctx,
		        region=region,
		        role_arn=role_arn,
		        ttl_minutes=ttl_minutes,
		        namespace=namespace,
		        user_name=user_name,
		        instance_id=instance_id,
		    )

aws_auth_provider/proxy.py

0 → 100644

+182 −0

		import logging
		import socket
		import threading
		from typing import Optional

		logger = logging.getLogger("aws-auth-provider.proxy")


		class TcpProxy:
		    """
		    A simple TCP proxy that forwards traffic from one port to another.
		    Allows binding to 0.0.0.0 while the session-manager-plugin binds to 127.0.0.1.
		    """

		    def __init__(
		        self, listen_host: str, listen_port: int, target_host: str, target_port: int
		    ):
		        """
		        Initialize the TCP proxy.

		        Args:
		            listen_host: Host to listen on (typically 0.0.0.0)
		            listen_port: Port to listen on
		            target_host: Target host to forward to (typically 127.0.0.1)
		            target_port: Target port to forward to
		        """
		        self.listen_host = listen_host
		        self.listen_port = listen_port
		        self.target_host = target_host
		        self.target_port = target_port
		        self.server_socket: Optional[socket.socket] = None
		        self.running = False
		        self.server_thread: Optional[threading.Thread] = None

		    def _forward_data(self, source: socket.socket, destination: socket.socket):
		        """
		        Forward data from source socket to destination socket.

		        Args:
		            source: Source socket to read from
		            destination: Destination socket to write to
		        """
		        try:
		            while self.running:
		                data = source.recv(4096)
		                if not data:
		                    break
		                destination.sendall(data)
		        except Exception as e:
		            logger.debug(f"Connection closed: {e}")
		        finally:
		            try:
		                source.close()
		            except Exception:
		                pass
		            try:
		                destination.close()
		            except Exception:
		                pass

		    def _handle_client(self, client_socket: socket.socket, client_address):
		        """
		        Handle a client connection by creating a connection to the target
		        and forwarding data bidirectionally.

		        Args:
		            client_socket: Client socket
		            client_address: Client address
		        """
		        target_socket = None
		        try:
		            logger.debug(f"New connection from {client_address}")

		            # Connect to target
		            target_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
		            target_socket.connect((self.target_host, self.target_port))
		            logger.debug(f"Connected to target {self.target_host}:{self.target_port}")

		            # Create two threads for bidirectional forwarding
		            client_to_target = threading.Thread(
		                target=self._forward_data,
		                args=(client_socket, target_socket),
		                daemon=True,
		            )
		            target_to_client = threading.Thread(
		                target=self._forward_data,
		                args=(target_socket, client_socket),
		                daemon=True,
		            )

		            client_to_target.start()
		            target_to_client.start()

		            # Wait for both threads to complete
		            client_to_target.join()
		            target_to_client.join()

		        except Exception as e:
		            logger.error(f"Error handling client: {e}")
		        finally:
		            if target_socket:
		                try:
		                    target_socket.close()
		                except Exception:
		                    pass
		            try:
		                client_socket.close()
		            except Exception:
		                pass

		    def start(self):
		        """Start the proxy server."""
		        if self.running:
		            logger.warning("Proxy is already running")
		            return

		        try:
		            self.server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
		            self.server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
		            self.server_socket.bind((self.listen_host, self.listen_port))
		            self.server_socket.listen(5)
		            self.running = True

		            logger.info(
		                f"Proxy listening on {self.listen_host}:{self.listen_port}, forwarding to {self.target_host}:{self.target_port}"
		            )

		            def accept_connections():
		                while self.running:
		                    try:
		                        self.server_socket.settimeout(1.0)
		                        try:
		                            client_socket, client_address = self.server_socket.accept()
		                        except socket.timeout:
		                            continue

		                        # Handle each client in a separate thread
		                        client_thread = threading.Thread(
		                            target=self._handle_client,
		                            args=(client_socket, client_address),
		                            daemon=True,
		                        )
		                        client_thread.start()
		                    except Exception as e:
		                        if self.running:
		                            logger.error(f"Error accepting connection: {e}")

		            self.server_thread = threading.Thread(
		                target=accept_connections, daemon=True
		            )
		            self.server_thread.start()

		        except Exception as e:
		            logger.error(f"Failed to start proxy: {e}")
		            self.running = False
		            if self.server_socket:
		                try:
		                    self.server_socket.close()
		                except Exception:
		                    pass
		            raise

		    def stop(self):
		        """Stop the proxy server."""
		        if not self.running:
		            return

		        logger.info(f"Stopping proxy on {self.listen_host}:{self.listen_port}")
		        self.running = False

		        if self.server_socket:
		            try:
		                self.server_socket.close()
		            except Exception:
		                pass

		        if self.server_thread:
		            self.server_thread.join(timeout=2)

		    def is_running(self) -> bool:
		        """Check if the proxy is currently running."""
		        return self.running