deploy flink
This commit is contained in:
parent
f6f464dbae
commit
c9f681e44b
235
cluster/manifests/freeleaps-data-platform/flink/README.md
Normal file
235
cluster/manifests/freeleaps-data-platform/flink/README.md
Normal file
@ -0,0 +1,235 @@
|
||||
# Flink High Availability Cluster Deployment
|
||||
|
||||
## Overview
|
||||
This project uses Apache Flink Kubernetes Operator to deploy a high availability Flink cluster with persistent storage and automatic failover capabilities.
|
||||
|
||||
## Component Architecture
|
||||
- **JobManager**: 2 replicas with high availability configuration
|
||||
- **TaskManager**: 3 replicas for distributed processing
|
||||
- **High Availability**: Kubernetes-based HA with persistent storage
|
||||
- **Checkpointing**: Persistent checkpoints and savepoints storage
|
||||
|
||||
## File Description
|
||||
|
||||
### 1. flink-operator-v2.yaml
|
||||
Flink Kubernetes Operator deployment configuration:
|
||||
- Operator deployment in `flink-system` namespace
|
||||
- RBAC configuration for cluster-wide permissions
|
||||
- Health checks and resource limits
|
||||
- Enhanced CRD definitions with additional printer columns
|
||||
|
||||
### 2. flink-crd.yaml
|
||||
Custom Resource Definitions for Flink:
|
||||
- FlinkDeployment CRD
|
||||
- FlinkSessionJob CRD
|
||||
- Required for Flink Operator to function
|
||||
|
||||
### 3. ha-flink-cluster-v2.yaml
|
||||
Production-ready HA Flink cluster configuration:
|
||||
- 2 JobManager replicas with HA enabled
|
||||
- 3 TaskManager replicas with anti-affinity rules
|
||||
- Persistent storage for HA data, checkpoints, and savepoints
|
||||
- Memory and CPU resource allocation
|
||||
- Exponential delay restart strategy
|
||||
- Proper volume mounts and storage configuration
|
||||
|
||||
### 4. simple-ha-flink-cluster.yaml
|
||||
Simplified HA Flink cluster configuration:
|
||||
- Uses ephemeral storage to avoid PVC binding issues
|
||||
- Basic HA configuration for testing and development
|
||||
- Minimal resource requirements
|
||||
- Recommended for development and testing
|
||||
|
||||
### 5. flink-storage.yaml
|
||||
Storage and RBAC configuration:
|
||||
- PersistentVolumeClaims for HA data, checkpoints, and savepoints
|
||||
- ServiceAccount and RBAC permissions for Flink cluster
|
||||
- Azure Disk storage class configuration with correct access modes
|
||||
|
||||
### 6. flink-rbac.yaml
|
||||
Enhanced RBAC configuration:
|
||||
- Complete permissions for Flink HA functionality
|
||||
- Both namespace-level and cluster-level permissions
|
||||
- Includes watch permissions for HA operations
|
||||
|
||||
## Deployment Steps
|
||||
|
||||
### 1. Install Flink Operator
|
||||
```bash
|
||||
# Apply Flink Operator configuration
|
||||
kubectl apply -f flink-operator-v2.yaml
|
||||
|
||||
# Verify operator installation
|
||||
kubectl get pods -n flink-system
|
||||
```
|
||||
|
||||
### 2. Create Storage Resources (Optional - for production)
|
||||
```bash
|
||||
# Apply storage configuration
|
||||
kubectl apply -f flink-storage.yaml
|
||||
|
||||
# Verify PVC creation
|
||||
kubectl get pvc -n freeleaps-data-platform
|
||||
```
|
||||
|
||||
### 3. Deploy HA Flink Cluster
|
||||
```bash
|
||||
# Option A: Deploy with persistent storage (production)
|
||||
kubectl apply -f ha-flink-cluster-v2.yaml
|
||||
|
||||
# Option B: Deploy with ephemeral storage (development/testing)
|
||||
kubectl apply -f simple-ha-flink-cluster.yaml
|
||||
|
||||
# Check deployment status
|
||||
kubectl get flinkdeployments -n freeleaps-data-platform
|
||||
kubectl get pods -n freeleaps-data-platform -l app=flink
|
||||
```
|
||||
|
||||
## High Availability Features
|
||||
- **JobManager HA**: 2 JobManager replicas with Kubernetes-based leader election
|
||||
- **Persistent State**: Checkpoints and savepoints stored on persistent volumes
|
||||
- **Automatic Failover**: Exponential delay restart strategy with backoff
|
||||
- **Pod Anti-affinity**: Ensures components are distributed across different nodes
|
||||
- **Storage Persistence**: HA data, checkpoints, and savepoints persist across restarts
|
||||
|
||||
## Network Configuration
|
||||
- **JobManager**: Port 8081 (Web UI), 6123 (RPC), 6124 (Blob Server)
|
||||
- **TaskManager**: Port 6121 (Data), 6122 (RPC), 6126 (Metrics)
|
||||
- **Service Type**: ClusterIP for internal communication
|
||||
|
||||
## Storage Configuration
|
||||
- **HA Data**: 10Gi for high availability metadata
|
||||
- **Checkpoints**: 20Gi for application checkpoints
|
||||
- **Savepoints**: 20Gi for manual savepoints
|
||||
- **Storage Class**: azure-disk-std-ssd-lrs
|
||||
- **Access Mode**: ReadWriteOnce (Azure Disk limitation)
|
||||
|
||||
## Monitoring and Operations
|
||||
- **Health Checks**: Built-in readiness and liveness probes
|
||||
- **Web UI**: Accessible through JobManager service
|
||||
- **Metrics**: Exposed on port 8080 for Prometheus collection
|
||||
- **Logging**: Centralized logging through Kubernetes
|
||||
|
||||
## Configuration Details
|
||||
|
||||
### High Availability Settings
|
||||
- **Type**: kubernetes (native Kubernetes HA)
|
||||
- **Storage**: Persistent volume for HA metadata
|
||||
- **Cluster ID**: ha-flink-cluster-v2
|
||||
|
||||
### Checkpointing Configuration
|
||||
- **Interval**: 60 seconds
|
||||
- **Timeout**: 10 minutes
|
||||
- **Min Pause**: 5 seconds
|
||||
- **Backend**: Filesystem with persistent storage
|
||||
|
||||
### Resource Allocation
|
||||
- **JobManager**: 0.5 CPU, 1024MB memory (HA), 1.0 CPU, 1024MB memory (Simple)
|
||||
- **TaskManager**: 0.5 CPU, 2048MB memory (HA), 2.0 CPU, 2048MB memory (Simple)
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues and Solutions
|
||||
|
||||
#### 1. PVC Binding Issues
|
||||
```bash
|
||||
# Check PVC status
|
||||
kubectl get pvc -n freeleaps-data-platform
|
||||
|
||||
# PVC stuck in Pending state - usually due to:
|
||||
# - Insufficient storage quota
|
||||
# - Wrong access mode (ReadWriteMany not supported by Azure Disk)
|
||||
# - Storage class not available
|
||||
|
||||
# Solution: Use ReadWriteOnce access mode or ephemeral storage
|
||||
```
|
||||
|
||||
#### 2. Pod CrashLoopBackOff
|
||||
```bash
|
||||
# Check pod status
|
||||
kubectl get pods -n freeleaps-data-platform -l app=flink
|
||||
|
||||
# Check pod logs
|
||||
kubectl logs <pod-name> -n freeleaps-data-platform
|
||||
|
||||
# Check pod events
|
||||
kubectl describe pod <pod-name> -n freeleaps-data-platform
|
||||
```
|
||||
|
||||
#### 3. ServiceAccount Issues
|
||||
```bash
|
||||
# Verify ServiceAccount exists
|
||||
kubectl get serviceaccount -n freeleaps-data-platform
|
||||
|
||||
# Check RBAC permissions
|
||||
kubectl get rolebinding -n freeleaps-data-platform
|
||||
```
|
||||
|
||||
#### 4. Storage Path Issues
|
||||
```bash
|
||||
# Ensure storage paths match volume mounts
|
||||
# For persistent storage: /opt/flink/ha-data, /opt/flink/checkpoints
|
||||
# For ephemeral storage: /tmp/flink/ha-data, /tmp/flink/checkpoints
|
||||
```
|
||||
|
||||
### Diagnostic Commands
|
||||
```bash
|
||||
# Check Flink Operator logs
|
||||
kubectl logs -n flink-system -l app.kubernetes.io/name=flink-kubernetes-operator
|
||||
|
||||
# Check Flink cluster status
|
||||
kubectl describe flinkdeployment <cluster-name> -n freeleaps-data-platform
|
||||
|
||||
# Check pod events
|
||||
kubectl get events -n freeleaps-data-platform --sort-by='.lastTimestamp'
|
||||
|
||||
# Check storage status
|
||||
kubectl get pvc -n freeleaps-data-platform
|
||||
kubectl describe pvc <pvc-name> -n freeleaps-data-platform
|
||||
|
||||
# Check operator status
|
||||
kubectl get pods -n flink-system
|
||||
kubectl logs -n flink-system deployment/flink-kubernetes-operator
|
||||
```
|
||||
|
||||
## Important Notes
|
||||
1. **Storage Limitations**: Azure Disk storage class only supports ReadWriteOnce access mode
|
||||
2. **ServiceAccount**: Ensure the correct ServiceAccount is specified in cluster configuration
|
||||
3. **Resource Requirements**: Verify cluster has enough CPU/memory for all replicas
|
||||
4. **Network Policies**: May need adjustment for inter-pod communication
|
||||
5. **Ephemeral vs Persistent**: Use ephemeral storage for development/testing, persistent for production
|
||||
|
||||
## Quick Start (Recommended for Testing)
|
||||
```bash
|
||||
# 1. Deploy operator
|
||||
kubectl apply -f flink-operator-v2.yaml
|
||||
|
||||
# 2. Wait for operator to be ready
|
||||
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=flink-kubernetes-operator -n flink-system
|
||||
|
||||
# 3. Deploy simple HA cluster (no persistent storage)
|
||||
kubectl apply -f simple-ha-flink-cluster.yaml
|
||||
|
||||
# 4. Monitor deployment
|
||||
kubectl get flinkdeployments -n freeleaps-data-platform
|
||||
kubectl get pods -n freeleaps-data-platform -l app=flink
|
||||
```
|
||||
|
||||
## Production Deployment
|
||||
```bash
|
||||
# 1. Deploy operator
|
||||
kubectl apply -f flink-operator-v2.yaml
|
||||
|
||||
# 2. Deploy storage resources
|
||||
kubectl apply -f flink-storage.yaml
|
||||
|
||||
# 3. Deploy production HA cluster
|
||||
kubectl apply -f ha-flink-cluster-v2.yaml
|
||||
|
||||
# 4. Monitor deployment
|
||||
kubectl get flinkdeployments -n freeleaps-data-platform
|
||||
kubectl get pods -n freeleaps-data-platform -l app=flink
|
||||
```
|
||||
|
||||
|
||||
|
||||
@ -0,0 +1,43 @@
|
||||
apiVersion: apiextensions.k8s.io/v1
|
||||
kind: CustomResourceDefinition
|
||||
metadata:
|
||||
name: flinkdeployments.flink.apache.org
|
||||
spec:
|
||||
group: flink.apache.org
|
||||
versions:
|
||||
- name: v1beta1
|
||||
served: true
|
||||
storage: true
|
||||
schema:
|
||||
openAPIV3Schema:
|
||||
type: object
|
||||
x-kubernetes-preserve-unknown-fields: true
|
||||
subresources:
|
||||
status: {}
|
||||
scope: Namespaced
|
||||
names:
|
||||
plural: flinkdeployments
|
||||
singular: flinkdeployment
|
||||
kind: FlinkDeployment
|
||||
---
|
||||
apiVersion: apiextensions.k8s.io/v1
|
||||
kind: CustomResourceDefinition
|
||||
metadata:
|
||||
name: flinksessionjobs.flink.apache.org
|
||||
spec:
|
||||
group: flink.apache.org
|
||||
versions:
|
||||
- name: v1beta1
|
||||
served: true
|
||||
storage: true
|
||||
schema:
|
||||
openAPIV3Schema:
|
||||
type: object
|
||||
x-kubernetes-preserve-unknown-fields: true
|
||||
subresources:
|
||||
status: {}
|
||||
scope: Namespaced
|
||||
names:
|
||||
plural: flinksessionjobs
|
||||
singular: flinksessionjob
|
||||
kind: FlinkSessionJob
|
||||
@ -0,0 +1,298 @@
|
||||
apiVersion: apiextensions.k8s.io/v1
|
||||
kind: CustomResourceDefinition
|
||||
metadata:
|
||||
name: flinkdeployments.flink.apache.org
|
||||
spec:
|
||||
group: flink.apache.org
|
||||
names:
|
||||
kind: FlinkDeployment
|
||||
listKind: FlinkDeploymentList
|
||||
plural: flinkdeployments
|
||||
singular: flinkdeployment
|
||||
shortNames:
|
||||
- fd
|
||||
- flinkdeploy
|
||||
scope: Namespaced
|
||||
versions:
|
||||
- name: v1beta1
|
||||
served: true
|
||||
storage: true
|
||||
schema:
|
||||
openAPIV3Schema:
|
||||
type: object
|
||||
x-kubernetes-preserve-unknown-fields: true
|
||||
subresources:
|
||||
status: {}
|
||||
additionalPrinterColumns:
|
||||
- name: Job Status
|
||||
type: string
|
||||
jsonPath: .status.jobStatus
|
||||
- name: Flink Version
|
||||
type: string
|
||||
jsonPath: .spec.flinkVersion
|
||||
- name: Age
|
||||
type: date
|
||||
jsonPath: .metadata.creationTimestamp
|
||||
---
|
||||
apiVersion: apiextensions.k8s.io/v1
|
||||
kind: CustomResourceDefinition
|
||||
metadata:
|
||||
name: flinksessionjobs.flink.apache.org
|
||||
spec:
|
||||
group: flink.apache.org
|
||||
names:
|
||||
kind: FlinkSessionJob
|
||||
listKind: FlinkSessionJobList
|
||||
plural: flinksessionjobs
|
||||
singular: flinksessionjob
|
||||
shortNames:
|
||||
- fsj
|
||||
- flinksessionjob
|
||||
scope: Namespaced
|
||||
versions:
|
||||
- name: v1beta1
|
||||
served: true
|
||||
storage: true
|
||||
schema:
|
||||
openAPIV3Schema:
|
||||
type: object
|
||||
x-kubernetes-preserve-unknown-fields: true
|
||||
subresources:
|
||||
status: {}
|
||||
additionalPrinterColumns:
|
||||
- name: Job Status
|
||||
type: string
|
||||
jsonPath: .status.jobStatus
|
||||
- name: Flink Deployment
|
||||
type: string
|
||||
jsonPath: .spec.deploymentName
|
||||
- name: Age
|
||||
type: date
|
||||
jsonPath: .metadata.creationTimestamp
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: Namespace
|
||||
metadata:
|
||||
name: flink-system
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: ServiceAccount
|
||||
metadata:
|
||||
name: flink-kubernetes-operator
|
||||
namespace: flink-system
|
||||
---
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRole
|
||||
metadata:
|
||||
name: flink-kubernetes-operator
|
||||
rules:
|
||||
- apiGroups:
|
||||
- ""
|
||||
resources:
|
||||
- configmaps
|
||||
verbs:
|
||||
- create
|
||||
- delete
|
||||
- deletecollection
|
||||
- get
|
||||
- list
|
||||
- patch
|
||||
- update
|
||||
- watch
|
||||
- apiGroups:
|
||||
- ""
|
||||
resources:
|
||||
- events
|
||||
verbs:
|
||||
- create
|
||||
- get
|
||||
- patch
|
||||
- apiGroups:
|
||||
- ""
|
||||
resources:
|
||||
- nodes
|
||||
verbs:
|
||||
- get
|
||||
- list
|
||||
- watch
|
||||
- apiGroups:
|
||||
- ""
|
||||
resources:
|
||||
- namespaces
|
||||
verbs:
|
||||
- get
|
||||
- list
|
||||
- watch
|
||||
- apiGroups:
|
||||
- ""
|
||||
resources:
|
||||
- persistentvolumeclaims
|
||||
verbs:
|
||||
- create
|
||||
- delete
|
||||
- get
|
||||
- list
|
||||
- patch
|
||||
- update
|
||||
- watch
|
||||
- apiGroups:
|
||||
- ""
|
||||
resources:
|
||||
- pods
|
||||
verbs:
|
||||
- create
|
||||
- delete
|
||||
- deletecollection
|
||||
- get
|
||||
- list
|
||||
- patch
|
||||
- update
|
||||
- watch
|
||||
- apiGroups:
|
||||
- ""
|
||||
resources:
|
||||
- secrets
|
||||
verbs:
|
||||
- create
|
||||
- delete
|
||||
- get
|
||||
- list
|
||||
- patch
|
||||
- update
|
||||
- watch
|
||||
- apiGroups:
|
||||
- ""
|
||||
resources:
|
||||
- serviceaccounts
|
||||
verbs:
|
||||
- create
|
||||
- delete
|
||||
- get
|
||||
- list
|
||||
- patch
|
||||
- update
|
||||
- watch
|
||||
- apiGroups:
|
||||
- ""
|
||||
resources:
|
||||
- services
|
||||
verbs:
|
||||
- create
|
||||
- delete
|
||||
- deletecollection
|
||||
- get
|
||||
- list
|
||||
- patch
|
||||
- update
|
||||
- watch
|
||||
- apiGroups:
|
||||
- apps
|
||||
resources:
|
||||
- deployments
|
||||
verbs:
|
||||
- create
|
||||
- delete
|
||||
- deletecollection
|
||||
- get
|
||||
- list
|
||||
- patch
|
||||
- update
|
||||
- watch
|
||||
- apiGroups:
|
||||
- batch
|
||||
resources:
|
||||
- jobs
|
||||
- cronjobs
|
||||
verbs:
|
||||
- create
|
||||
- delete
|
||||
- deletecollection
|
||||
- get
|
||||
- list
|
||||
- patch
|
||||
- update
|
||||
- watch
|
||||
- apiGroups:
|
||||
- flink.apache.org
|
||||
resources:
|
||||
- flinkdeployments
|
||||
- flinkdeployments/status
|
||||
- flinksessionjobs
|
||||
- flinksessionjobs/status
|
||||
verbs:
|
||||
- create
|
||||
- delete
|
||||
- deletecollection
|
||||
- get
|
||||
- list
|
||||
- patch
|
||||
- update
|
||||
- watch
|
||||
---
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRoleBinding
|
||||
metadata:
|
||||
name: flink-kubernetes-operator
|
||||
roleRef:
|
||||
apiGroup: rbac.authorization.k8s.io
|
||||
kind: ClusterRole
|
||||
name: flink-kubernetes-operator
|
||||
subjects:
|
||||
- kind: ServiceAccount
|
||||
name: flink-kubernetes-operator
|
||||
namespace: flink-system
|
||||
---
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: flink-kubernetes-operator
|
||||
namespace: flink-system
|
||||
labels:
|
||||
app: flink-kubernetes-operator
|
||||
spec:
|
||||
replicas: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
app: flink-kubernetes-operator
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
app: flink-kubernetes-operator
|
||||
spec:
|
||||
serviceAccountName: flink-kubernetes-operator
|
||||
containers:
|
||||
- name: flink-kubernetes-operator
|
||||
image: apache/flink-kubernetes-operator:1.8.0
|
||||
command: ["/docker-entrypoint.sh"]
|
||||
args: ["operator"]
|
||||
env:
|
||||
- name: POD_NAME
|
||||
valueFrom:
|
||||
fieldRef:
|
||||
fieldPath: metadata.name
|
||||
- name: OPERATOR_NAME
|
||||
value: flink-kubernetes-operator
|
||||
- name: LEADER_ELECTION_ID
|
||||
value: flink-kubernetes-operator
|
||||
- name: LEADER_ELECTION_NAMESPACE
|
||||
value: flink-system
|
||||
ports:
|
||||
- containerPort: 8085
|
||||
name: metrics
|
||||
resources:
|
||||
requests:
|
||||
cpu: 100m
|
||||
memory: 128Mi
|
||||
limits:
|
||||
cpu: 500m
|
||||
memory: 512Mi
|
||||
livenessProbe:
|
||||
httpGet:
|
||||
path: /healthz
|
||||
port: 8085
|
||||
initialDelaySeconds: 15
|
||||
periodSeconds: 20
|
||||
readinessProbe:
|
||||
httpGet:
|
||||
path: /readyz
|
||||
port: 8085
|
||||
@ -0,0 +1,66 @@
|
||||
apiVersion: v1
|
||||
kind: ServiceAccount
|
||||
metadata:
|
||||
name: flink
|
||||
namespace: freeleaps-data-platform
|
||||
labels:
|
||||
app: flink
|
||||
---
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: Role
|
||||
metadata:
|
||||
name: flink-role
|
||||
namespace: freeleaps-data-platform
|
||||
rules:
|
||||
- apiGroups: [""]
|
||||
resources: ["configmaps", "secrets", "services", "pods", "events", "endpoints"]
|
||||
verbs: ["get", "list", "create", "update", "patch", "delete", "watch"]
|
||||
- apiGroups: ["apps"]
|
||||
resources: ["deployments", "statefulsets"]
|
||||
verbs: ["get", "list", "create", "update", "patch", "delete", "watch"]
|
||||
- apiGroups: ["batch"]
|
||||
resources: ["jobs", "cronjobs"]
|
||||
verbs: ["get", "list", "create", "update", "patch", "delete", "watch"]
|
||||
---
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: RoleBinding
|
||||
metadata:
|
||||
name: flink-role-binding
|
||||
namespace: freeleaps-data-platform
|
||||
roleRef:
|
||||
apiGroup: rbac.authorization.k8s.io
|
||||
kind: Role
|
||||
name: flink-role
|
||||
subjects:
|
||||
- kind: ServiceAccount
|
||||
name: flink
|
||||
namespace: freeleaps-data-platform
|
||||
---
|
||||
# Additional permissions for HA functionality
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRole
|
||||
metadata:
|
||||
name: flink-ha-cluster-role
|
||||
rules:
|
||||
- apiGroups: [""]
|
||||
resources: ["configmaps", "secrets", "services", "pods", "events", "endpoints"]
|
||||
verbs: ["get", "list", "create", "update", "patch", "delete", "watch"]
|
||||
- apiGroups: ["apps"]
|
||||
resources: ["deployments", "statefulsets"]
|
||||
verbs: ["get", "list", "create", "update", "patch", "delete", "watch"]
|
||||
- apiGroups: ["batch"]
|
||||
resources: ["jobs", "cronjobs"]
|
||||
verbs: ["get", "list", "create", "update", "patch", "delete", "watch"]
|
||||
---
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: ClusterRoleBinding
|
||||
metadata:
|
||||
name: flink-ha-cluster-role-binding
|
||||
roleRef:
|
||||
apiGroup: rbac.authorization.k8s.io
|
||||
kind: ClusterRole
|
||||
name: flink-ha-cluster-role
|
||||
subjects:
|
||||
- kind: ServiceAccount
|
||||
name: flink
|
||||
namespace: freeleaps-data-platform
|
||||
@ -0,0 +1,82 @@
|
||||
apiVersion: v1
|
||||
kind: PersistentVolumeClaim
|
||||
metadata:
|
||||
name: ha-flink-ha-data
|
||||
namespace: freeleaps-data-platform
|
||||
labels:
|
||||
app: flink
|
||||
component: ha-storage
|
||||
spec:
|
||||
accessModes:
|
||||
- ReadWriteOnce
|
||||
resources:
|
||||
requests:
|
||||
storage: 10Gi
|
||||
storageClassName: azure-disk-std-ssd-lrs
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: PersistentVolumeClaim
|
||||
metadata:
|
||||
name: ha-flink-checkpoints
|
||||
namespace: freeleaps-data-platform
|
||||
labels:
|
||||
app: flink
|
||||
component: checkpoint-storage
|
||||
spec:
|
||||
accessModes:
|
||||
- ReadWriteOnce
|
||||
resources:
|
||||
requests:
|
||||
storage: 20Gi
|
||||
storageClassName: azure-disk-std-ssd-lrs
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: PersistentVolumeClaim
|
||||
metadata:
|
||||
name: ha-flink-savepoints
|
||||
namespace: freeleaps-data-platform
|
||||
labels:
|
||||
app: flink
|
||||
component: savepoint-storage
|
||||
spec:
|
||||
accessModes:
|
||||
- ReadWriteOnce
|
||||
resources:
|
||||
requests:
|
||||
storage: 20Gi
|
||||
storageClassName: azure-disk-std-ssd-lrs
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: ServiceAccount
|
||||
metadata:
|
||||
name: flink
|
||||
namespace: freeleaps-data-platform
|
||||
labels:
|
||||
app: flink
|
||||
---
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: Role
|
||||
metadata:
|
||||
name: flink-role
|
||||
namespace: freeleaps-data-platform
|
||||
rules:
|
||||
- apiGroups: [""]
|
||||
resources: ["configmaps", "secrets", "services", "pods"]
|
||||
verbs: ["get", "list", "create", "update", "patch", "delete"]
|
||||
- apiGroups: ["apps"]
|
||||
resources: ["deployments"]
|
||||
verbs: ["get", "list", "create", "update", "patch", "delete"]
|
||||
---
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: RoleBinding
|
||||
metadata:
|
||||
name: flink-role-binding
|
||||
namespace: freeleaps-data-platform
|
||||
roleRef:
|
||||
apiGroup: rbac.authorization.k8s.io
|
||||
kind: Role
|
||||
name: flink-role
|
||||
subjects:
|
||||
- kind: ServiceAccount
|
||||
name: flink
|
||||
namespace: freeleaps-data-platform
|
||||
@ -0,0 +1,94 @@
|
||||
apiVersion: flink.apache.org/v1beta1
|
||||
kind: FlinkDeployment
|
||||
metadata:
|
||||
name: ha-flink-cluster-v2
|
||||
namespace: freeleaps-data-platform
|
||||
labels:
|
||||
app: flink
|
||||
component: streaming
|
||||
cluster-type: ha
|
||||
spec:
|
||||
flinkVersion: v1_19
|
||||
image: flink:1.19.0
|
||||
flinkConfiguration:
|
||||
# High Availability Configuration
|
||||
high-availability.type: kubernetes
|
||||
high-availability.storageDir: file:///opt/flink/ha-data
|
||||
# Checkpointing Configuration
|
||||
state.backend.type: filesystem
|
||||
state.checkpoints.dir: file:///opt/flink/checkpoints
|
||||
state.savepoints.dir: file:///opt/flink/savepoints
|
||||
execution.checkpointing.interval: 60s
|
||||
execution.checkpointing.min-pause: 5s
|
||||
execution.checkpointing.timeout: 10min
|
||||
# JobManager Configuration
|
||||
jobmanager.rpc.address: ha-flink-cluster-v2-jobmanager
|
||||
jobmanager.rpc.port: "6123"
|
||||
jobmanager.bind-host: "0.0.0.0"
|
||||
# REST Configuration
|
||||
rest.address: ha-flink-cluster-v2-jobmanager
|
||||
rest.port: "8081"
|
||||
rest.bind-address: "0.0.0.0"
|
||||
# Blob Server Configuration
|
||||
blob.server.port: "6124"
|
||||
# TaskManager Configuration
|
||||
taskmanager.numberOfTaskSlots: "2"
|
||||
# Memory Configuration
|
||||
taskmanager.memory.process.size: 2048m
|
||||
jobmanager.memory.process.size: 1024m
|
||||
# Restart Strategy
|
||||
restart-strategy.type: exponential-delay
|
||||
restart-strategy.exponential-delay.initial-backoff: 10s
|
||||
restart-strategy.exponential-delay.max-backoff: 2min
|
||||
restart-strategy.exponential-delay.backoff-multiplier: "2.0"
|
||||
restart-strategy.exponential-delay.reset-backoff-threshold: 10min
|
||||
restart-strategy.exponential-delay.jitter-factor: "0.1"
|
||||
serviceAccount: flink
|
||||
jobManager:
|
||||
replicas: 2
|
||||
resource:
|
||||
memory: "1024m"
|
||||
cpu: 0.5
|
||||
podTemplate:
|
||||
spec:
|
||||
containers:
|
||||
- name: flink-main-container
|
||||
volumeMounts:
|
||||
- name: ha-data
|
||||
mountPath: /opt/flink/ha-data
|
||||
- name: checkpoints
|
||||
mountPath: /opt/flink/checkpoints
|
||||
- name: savepoints
|
||||
mountPath: /opt/flink/savepoints
|
||||
volumes:
|
||||
- name: ha-data
|
||||
persistentVolumeClaim:
|
||||
claimName: ha-flink-ha-data
|
||||
- name: checkpoints
|
||||
persistentVolumeClaim:
|
||||
claimName: ha-flink-checkpoints
|
||||
- name: savepoints
|
||||
persistentVolumeClaim:
|
||||
claimName: ha-flink-savepoints
|
||||
taskManager:
|
||||
replicas: 3
|
||||
resource:
|
||||
memory: "2048m"
|
||||
cpu: 0.5
|
||||
podTemplate:
|
||||
spec:
|
||||
containers:
|
||||
- name: flink-main-container
|
||||
volumeMounts:
|
||||
- name: checkpoints
|
||||
mountPath: /opt/flink/checkpoints
|
||||
- name: savepoints
|
||||
mountPath: /opt/flink/savepoints
|
||||
volumes:
|
||||
- name: checkpoints
|
||||
persistentVolumeClaim:
|
||||
claimName: ha-flink-checkpoints
|
||||
- name: savepoints
|
||||
persistentVolumeClaim:
|
||||
claimName: ha-flink-savepoints
|
||||
|
||||
@ -0,0 +1,46 @@
|
||||
apiVersion: flink.apache.org/v1beta1
|
||||
kind: FlinkDeployment
|
||||
metadata:
|
||||
name: simple-ha-flink-cluster
|
||||
namespace: freeleaps-data-platform
|
||||
labels:
|
||||
app: flink
|
||||
component: streaming
|
||||
cluster-type: simple-ha
|
||||
spec:
|
||||
flinkVersion: v1_18
|
||||
image: flink:1.18
|
||||
flinkConfiguration:
|
||||
# Basic Configuration
|
||||
taskmanager.numberOfTaskSlots: "2"
|
||||
# High Availability Configuration (using ephemeral storage)
|
||||
high-availability.type: kubernetes
|
||||
high-availability.storageDir: file:///tmp/flink/ha-data
|
||||
# Checkpointing Configuration (using ephemeral storage)
|
||||
state.backend.type: filesystem
|
||||
state.checkpoints.dir: file:///tmp/flink/checkpoints
|
||||
state.savepoints.dir: file:///tmp/flink/savepoints
|
||||
execution.checkpointing.interval: 60s
|
||||
execution.checkpointing.min-pause: 5s
|
||||
execution.checkpointing.timeout: 10min
|
||||
# Memory Configuration
|
||||
taskmanager.memory.process.size: 2048m
|
||||
jobmanager.memory.process.size: 1024m
|
||||
# Restart Strategy
|
||||
restart-strategy.type: exponential-delay
|
||||
restart-strategy.exponential-delay.initial-backoff: 10s
|
||||
restart-strategy.exponential-delay.max-backoff: 2min
|
||||
restart-strategy.exponential-delay.backoff-multiplier: "2.0"
|
||||
restart-strategy.exponential-delay.reset-backoff-threshold: 10min
|
||||
restart-strategy.exponential-delay.jitter-factor: "0.1"
|
||||
serviceAccount: flink
|
||||
jobManager:
|
||||
replicas: 2
|
||||
resource:
|
||||
memory: "1024m"
|
||||
cpu: 1.0
|
||||
taskManager:
|
||||
replicas: 3
|
||||
resource:
|
||||
memory: "2048m"
|
||||
cpu: 2.0
|
||||
Loading…
Reference in New Issue
Block a user