# Kubernetes Bootstrap Guide

## 🎯 **Overview**

This guide explains how to bootstrap a complete Kubernetes cluster from scratch on Azure VMs using the `freeleaps-ops` repository. **The cluster is not created automatically**: you bootstrap the entire infrastructure yourself, running the Kubespray playbooks vendored in `3rd/kubespray` with Ansible.
## 📋 **Prerequisites**

### **1. Azure Infrastructure**

- ✅ Azure VMs (already provisioned)
- ✅ Network connectivity between VMs
- ✅ Azure AD tenant configured
- ✅ Resource group: `k8s`

### **2. Local Environment**

- ✅ `freeleaps-ops` repository cloned
- ✅ Ansible installed (`pip install ansible`)
- ✅ Azure CLI installed and configured
- ✅ SSH access to VMs

### **3. VM Requirements**

- **Master Nodes**: 2+ VMs for the control plane (etcd also runs on these nodes; an odd count such as 3 is preferable, because a 2-member etcd cluster loses quorum if either member fails)
- **Worker Nodes**: 2+ VMs for workloads
- **Network**: All VMs in the same subnet
- **OS**: Ubuntu 20.04+ recommended
---

## 🚀 **Step-by-Step Bootstrap Process**

### **Step 1: Verify Azure VMs**

```bash
# Check VM status (--show-details is required to include powerState and privateIps)
az vm list --resource-group k8s --show-details --query "[].{name:name,powerState:powerState,privateIP:privateIps}" -o table

# Start any VM that is not running
az vm start --resource-group k8s --name <vm-name>
```
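Starting VMs one by one gets tedious with four or more nodes. The hypothetical helper below is a minimal sketch, assuming Bash, a logged-in Azure CLI, and the `k8s` resource group: it lists each VM's power state and starts every VM that is not reported as `VM running`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: reads "name<TAB>powerState" lines on stdin and prints
# the names of VMs whose power state is anything other than "VM running".
filter_not_running() {
  awk -F'\t' '$2 != "VM running" { print $1 }'
}

# Hypothetical helper: starts every VM in the resource group that is not running.
# --show-details is required for powerState to be populated.
start_stopped_vms() {
  local rg="${1:-k8s}" vm
  az vm list --resource-group "$rg" --show-details \
      --query "[].[name, powerState]" -o tsv \
    | filter_not_running \
    | while read -r vm; do
        echo "Starting $vm ..."
        # stdin is redirected so az cannot consume the remaining VM names
        az vm start --resource-group "$rg" --name "$vm" < /dev/null
      done
}

# Usage: start_stopped_vms k8s
```

Because `az vm start` waits for each VM to finish starting by default, the loop starts the VMs one after another; adding `--no-wait` to that command starts them in parallel instead.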
### **Step 2: Configure Inventory**

Edit the Ansible inventory file:

```bash
cd freeleaps-ops
vim cluster/ansible/manifests/inventory.ini
```

**Example inventory structure:**

```ini
[all:vars]
ansible_user=wwwadmin@mathmast.com
ansible_ssh_common_args='-o StrictHostKeyChecking=no'

[kube_control_plane]
prod-usw2-k8s-freeleaps-master-01 ansible_host=10.10.0.4 etcd_member_name=freeleaps-etcd-01 host_name=prod-usw2-k8s-freeleaps-master-01
prod-usw2-k8s-freeleaps-master-02 ansible_host=10.10.0.5 etcd_member_name=freeleaps-etcd-02 host_name=prod-usw2-k8s-freeleaps-master-02

[kube_node]
prod-usw2-k8s-freeleaps-worker-nodes-01 ansible_host=10.10.0.6 host_name=prod-usw2-k8s-freeleaps-worker-nodes-01
prod-usw2-k8s-freeleaps-worker-nodes-02 ansible_host=10.10.0.7 host_name=prod-usw2-k8s-freeleaps-worker-nodes-02

[etcd]
prod-usw2-k8s-freeleaps-master-01
prod-usw2-k8s-freeleaps-master-02

[k8s_cluster:children]
kube_control_plane
kube_node
```
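The example above lists two hosts under `[etcd]`. etcd stays available only while a majority of its members (floor(n/2) + 1) is up, so 2 members tolerate no failures while 3 tolerate one. A minimal sketch for checking this before running Kubespray, assuming Bash and an INI inventory laid out like the example (section headers in square brackets, one host per line); both function names are hypothetical:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints how many hosts are listed under [<group>]
# in an Ansible INI inventory (blank lines and comment lines are ignored).
count_group_hosts() {
  local inventory="$1" group="$2"
  awk -v want="[$group]" '
    /^[[:space:]]*\[/ { in_group = ($1 == want); next }
    in_group && NF && $1 !~ /^[#;]/ { n++ }
    END { print n + 0 }
  ' "$inventory"
}

# Hypothetical helper: warns and fails when [etcd] has an even number of members.
check_etcd_quorum() {
  local inventory="$1" n
  n=$(count_group_hosts "$inventory" etcd)
  if (( n % 2 == 0 )); then
    echo "WARNING: [etcd] lists $n member(s); use an odd number such as 3" >&2
    return 1
  fi
  echo "[etcd] lists $n member(s)"
}

# Usage: check_etcd_quorum cluster/ansible/manifests/inventory.ini
```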
### **Step 3: Test Connectivity**

```bash
cd cluster/ansible/manifests
# -k prompts for the SSH password, -K for the sudo (become) password
ansible -i inventory.ini all -m ping -kK
```
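With `-k`, Ansible authenticates over SSH with a password, which requires the `sshpass` program on the machine running Ansible. The hypothetical preflight helper below, a sketch assuming Bash, reports every missing command-line tool in one pass before you start:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: returns non-zero and lists any commands not found on PATH.
require_tools() {
  local missing=() tool
  for tool in "$@"; do
    command -v "$tool" > /dev/null 2>&1 || missing+=("$tool")
  done
  if (( ${#missing[@]} > 0 )); then
    echo "Missing required tools: ${missing[*]}" >&2
    return 1
  fi
}

# Usage: require_tools ansible ansible-playbook az kubectl sshpass
```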
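### **Step 4: Bootstrap Kubernetes Cluster**

```bash
# From cluster/ansible/manifests, go up to the repository root and into Kubespray
cd ../../../3rd/kubespray
# -b runs the tasks with privilege escalation (become)
ansible-playbook -i ../../cluster/ansible/manifests/inventory.ini ./cluster.yml -kK -b
```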
**What this does:**

- Installs the container runtime (Docker/containerd) on all nodes
- Downloads the Kubernetes binaries (v1.31.4)
- Generates certificates and keys
- Bootstraps the etcd cluster
- Starts the Kubernetes control plane
- Joins the worker nodes to the cluster
- Configures Calico networking
- Sets up OIDC authentication
### **Step 5: Get Kubeconfig**

```bash
# Copy the admin kubeconfig from the first master node (overwrites any existing ~/.kube/config)
mkdir -p ~/.kube
ssh wwwadmin@mathmast.com@10.10.0.4 "sudo cat /etc/kubernetes/admin.conf" > ~/.kube/config
chmod 600 ~/.kube/config

# Test cluster access
kubectl get nodes
kubectl get pods -n kube-system
```
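The Step 5 one-liner replaces `~/.kube/config` outright and leaves a truncated file behind if the SSH command fails. A more careful variant, sketched as a hypothetical Bash helper (the default SSH user comes from this guide; the argument order is an assumption of the sketch), backs up any existing config and writes the new one with owner-only permissions:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: copies /etc/kubernetes/admin.conf from a master node.
# Arguments: <master-ip> [ssh-user] [destination]
fetch_kubeconfig() {
  local master_ip="$1"
  local ssh_user="${2:-wwwadmin@mathmast.com}"
  local dest="${3:-$HOME/.kube/config}"
  local tmp

  mkdir -p "$(dirname "$dest")"
  # Keep a timestamped backup instead of silently overwriting an existing config.
  if [[ -f "$dest" ]]; then
    cp "$dest" "$dest.bak.$(date +%Y%m%d%H%M%S)"
  fi

  # Download to a temporary file first so a failed SSH call cannot truncate $dest.
  tmp=$(mktemp)
  if ! ssh "$ssh_user@$master_ip" "sudo cat /etc/kubernetes/admin.conf" > "$tmp"; then
    rm -f "$tmp"
    echo "Could not read admin.conf from $master_ip" >&2
    return 1
  fi
  # admin.conf holds cluster-admin credentials, so make it readable by the owner only.
  install -m 600 "$tmp" "$dest"
  rm -f "$tmp"
}

# Usage: fetch_kubeconfig 10.10.0.4
```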
### **Step 6: Deploy Infrastructure**

```bash
# From 3rd/kubespray, this resolves to freeleaps-ops/cluster/manifests
cd ../../cluster/manifests

# Deploy in order
kubectl apply -f freeleaps-controls-system/
kubectl apply -f freeleaps-devops-system/
kubectl apply -f freeleaps-monitoring-system/
kubectl apply -f freeleaps-logging-system/
kubectl apply -f freeleaps-data-platform/
```
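Running the five commands by hand makes it easy to carry on past a failure. A minimal Bash sketch (hypothetical helper; it assumes the directory names above and is run from `cluster/manifests` or given that path) applies the directories in the same order and stops at the first `kubectl apply` that fails:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Component directories in the order Step 6 applies them.
INFRA_COMPONENTS=(
  freeleaps-controls-system
  freeleaps-devops-system
  freeleaps-monitoring-system
  freeleaps-logging-system
  freeleaps-data-platform
)

# Hypothetical helper: applies each component directory in order and stops
# at the first failure so later components are not applied on a broken base.
# Argument: [path to cluster/manifests, default current directory]
deploy_infra() {
  local manifests_dir="${1:-.}" component
  for component in "${INFRA_COMPONENTS[@]}"; do
    echo "Applying $component ..."
    if ! kubectl apply -f "$manifests_dir/$component/"; then
      echo "kubectl apply failed for $component; stopping" >&2
      return 1
    fi
  done
  echo "All components applied"
}

# Usage (from freeleaps-ops/cluster/manifests): deploy_infra
```

A successful `kubectl apply` only means the API server accepted the objects; pods may still be starting, so follow up with the checks under Post-Bootstrap Verification.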
### **Step 7: Set Up Authentication**

```bash
cd ../../cluster/bin
./freeleaps-cluster-authenticator auth
```
---

## 🤖 **Automated Bootstrap Script**

Use the provided bootstrap script for automated deployment:

```bash
cd freeleaps-ops/docs
./bootstrap-k8s-cluster.sh
```

**Script Features:**

- ✅ Prerequisites verification
- ✅ Azure VM status check
- ✅ Connectivity testing
- ✅ Automated cluster bootstrap
- ✅ Infrastructure deployment
- ✅ Authentication setup
- ✅ Status verification

**Usage Options:**

```bash
# Full bootstrap
./bootstrap-k8s-cluster.sh

# Only verify prerequisites
./bootstrap-k8s-cluster.sh --verify

# Only bootstrap the cluster (skip infrastructure)
./bootstrap-k8s-cluster.sh --bootstrap
```
---

## 🔧 **Manual Bootstrap Commands**

If you prefer to run each step yourself, use the following commands:

### **1. Install Prerequisites**

```bash
# Install Ansible
pip install ansible

# Install Azure CLI (this script targets Debian/Ubuntu)
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash

# Install kubectl (latest stable; kubectl is supported within one minor version of the API server)
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
```
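The kubectl commands above install the latest stable release, which can drift more than one minor version away from the v1.31.4 cluster that Kubespray installs, and they skip checksum verification. A sketch of a pinned, verified install that follows the checksum procedure from the official kubectl installation docs (download `kubectl.sha256` alongside the binary and check it with `sha256sum`); the helper names and the `v1.31.4` default are assumptions of this sketch:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeeds only if FILE's SHA-256 matches the bare
# hex digest stored in SUMFILE (the format of dl.k8s.io *.sha256 files).
verify_sha256() {
  local file="$1" sumfile="$2"
  echo "$(cat "$sumfile")  $file" | sha256sum --check --status
}

# Hypothetical helper: downloads a pinned kubectl release for Linux,
# verifies it, and installs it to /usr/local/bin.
# The v1.31.4 default mirrors the Kubernetes version listed in Step 4.
install_kubectl() {
  local version="${1:-v1.31.4}" arch="${2:-amd64}"
  local base="https://dl.k8s.io/release/${version}/bin/linux/${arch}"
  local workdir
  workdir=$(mktemp -d)

  curl -fsSL -o "$workdir/kubectl" "$base/kubectl"
  curl -fsSL -o "$workdir/kubectl.sha256" "$base/kubectl.sha256"

  if ! verify_sha256 "$workdir/kubectl" "$workdir/kubectl.sha256"; then
    echo "kubectl checksum mismatch; not installing" >&2
    rm -rf "$workdir"
    return 1
  fi
  sudo install -o root -g root -m 0755 "$workdir/kubectl" /usr/local/bin/kubectl
  rm -rf "$workdir"
}

# Usage: install_kubectl            # v1.31.4 for linux/amd64
```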
### **2. Configure Azure**

```bash
# Log in to Azure
az login

# Set subscription
az account set --subscription <subscription-id>
```
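When you work with several subscriptions, it is easy to run the `az` commands in this guide against the wrong one. A hypothetical Bash helper, sketched under the assumption that the Azure CLI is installed, logs in only when there is no active session, switches subscription only when needed, and prints the result:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: makes <subscription-id> the active Azure subscription.
ensure_subscription() {
  local want="${1:?usage: ensure_subscription <subscription-id>}" current
  # `az account show` fails when there is no login session yet.
  if ! current=$(az account show --query id -o tsv 2>/dev/null); then
    az login > /dev/null
    current=""
  fi
  if [[ "$current" != "$want" ]]; then
    az account set --subscription "$want"
  fi
  az account show --query "{name:name, id:id}" -o table
}

# Usage: ensure_subscription <subscription-id>
```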
### **3. Bootstrap Cluster**

```bash
# Navigate to kubespray
cd freeleaps-ops/3rd/kubespray

# Run cluster installation
ansible-playbook -i ../../cluster/ansible/manifests/inventory.ini ./cluster.yml -kK -b
```
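Kubespray runs are long, and the terminal scrollback is often gone by the time a failure needs investigating. Ansible writes a log file when the `ANSIBLE_LOG_PATH` environment variable is set, so a wrapper like the hypothetical one below keeps a timestamped log for every run. It is a sketch: the `logs/` location and the `LOG_DIR` override are assumptions, and passing `reset.yml` as the second argument runs Kubespray's reset playbook the same way.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: runs a Kubespray playbook with a timestamped Ansible log.
# Arguments: <path-to-freeleaps-ops> [playbook, default cluster.yml]
# LOG_DIR (assumption) overrides the log location, default ./logs.
run_kubespray() {
  local repo_root="${1:?usage: run_kubespray <freeleaps-ops path> [playbook]}"
  local playbook="${2:-cluster.yml}"
  local log_dir="${LOG_DIR:-$PWD/logs}"
  local log_file

  mkdir -p "$log_dir"
  log_dir=$(cd "$log_dir" && pwd)   # absolute, so it survives the cd below
  log_file="$log_dir/kubespray-${playbook%.yml}-$(date +%Y%m%d-%H%M%S).log"
  echo "Logging to $log_file"
  # Subshell so the caller's working directory is left unchanged.
  (
    cd "$repo_root/3rd/kubespray"
    ANSIBLE_LOG_PATH="$log_file" ansible-playbook \
      -i ../../cluster/ansible/manifests/inventory.ini "./$playbook" -kK -b
  )
}

# Usage:
#   run_kubespray ~/freeleaps-ops              # cluster.yml
#   run_kubespray ~/freeleaps-ops reset.yml    # tear down a failed install
```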
### **4. Verify Installation**

```bash
# Get kubeconfig (overwrites any existing ~/.kube/config)
mkdir -p ~/.kube
ssh wwwadmin@mathmast.com@<master-ip> "sudo cat /etc/kubernetes/admin.conf" > ~/.kube/config
chmod 600 ~/.kube/config

# Test cluster
kubectl get nodes
kubectl get pods -n kube-system
```
---

## 🔍 **Verification Steps**

### **1. Cluster Health**

```bash
# Check nodes
kubectl get nodes -o wide

# Check system pods
kubectl get pods -n kube-system

# Check cluster info
kubectl cluster-info
```
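To turn the node check into a pass/fail result (for example in a script), you can parse the STATUS column of `kubectl get nodes --no-headers`. A sketch with hypothetical helper names; it deliberately also flags nodes shown as `Ready,SchedulingDisabled` (cordoned), which is usually worth knowing right after a bootstrap:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin and
# prints "<name> <status>" for every node whose STATUS is not exactly "Ready".
not_ready_nodes() {
  awk '$2 != "Ready" { print $1, $2 }'
}

# Hypothetical helper: fails and lists the offending nodes if any node is not Ready.
check_nodes() {
  local bad
  bad=$(kubectl get nodes --no-headers | not_ready_nodes)
  if [[ -n "$bad" ]]; then
    echo "Nodes not Ready:" >&2
    echo "$bad" >&2
    return 1
  fi
  echo "All nodes Ready"
}
```

For a blocking wait instead of a one-off check, `kubectl wait --for=condition=Ready nodes --all --timeout=10m` returns once every node reports Ready, or fails when the timeout expires.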
### **2. Network Verification**

```bash
# Check Calico pods
kubectl get pods -n kube-system | grep calico

# Check network policies
kubectl get networkpolicies --all-namespaces
```
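Listing the Calico pods shows that they exist, not that pod networking and cluster DNS actually work. A sketch of a slightly deeper check, assuming Kubespray's `calico-node` DaemonSet in `kube-system` and the default `cluster.local` DNS domain (the helper name is hypothetical): it waits for the DaemonSet rollout, then resolves the API server's Service name from a throwaway pod. `busybox:1.28` is pinned because it is commonly used for this test; `nslookup` in some later busybox images is known to misbehave inside Kubernetes.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: checks the Calico rollout, then pod networking plus cluster DNS.
check_calico() {
  local timeout="${1:-5m}"
  # Wait until every calico-node pod in the DaemonSet is updated and ready.
  kubectl -n kube-system rollout status daemonset/calico-node --timeout="$timeout"
  # Resolve the API server's Service name from a throwaway pod (deleted on exit).
  kubectl run dns-check --rm -i --restart=Never --image=busybox:1.28 -- \
    nslookup kubernetes.default.svc.cluster.local
}

# Usage: check_calico 5m
```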
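### **3. Authentication Test**

```bash
# Show the identity the API server sees for your current credentials.
# With the admin kubeconfig from Step 5 this is the admin certificate user;
# after Step 7 it should be your Azure AD (OIDC) identity.
kubectl auth whoami

# Check permissions
kubectl auth can-i --list
```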
---

## 🚨 **Troubleshooting**

### **Common Issues**

#### **1. Ansible Connection Failed**

```bash
# Check VM status (--show-details is required to include powerState)
az vm show --resource-group k8s --name <vm-name> --show-details --query powerState -o tsv

# Test SSH manually
ssh wwwadmin@mathmast.com@<vm-ip>

# Check network security group rules
az network nsg rule list --resource-group k8s --nsg-name <nsg-name> -o table
```
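When several hosts fail at once, checking TCP port 22 on every inventory host separates network or NSG problems from authentication problems: an open port with a failing `ssh` points at credentials, a closed one at the network. A sketch assuming Bash (it uses Bash's `/dev/tcp` redirection and coreutils `timeout`) and the inventory format from Step 2; the helper names are hypothetical:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints "<inventory-name> <ansible_host>" for every
# inventory line that sets ansible_host.
inventory_hosts() {
  awk '{
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^ansible_host=/) {
        sub(/^ansible_host=/, "", $i)
        print $1, $i
      }
    }
  }' "$1"
}

# Hypothetical helper: tries a TCP connection to port 22 on each host,
# giving up after 5 seconds per host.
check_ssh_ports() {
  local inventory="$1" name ip
  inventory_hosts "$inventory" | while read -r name ip; do
    if timeout 5 bash -c "exec 3<>/dev/tcp/$ip/22" 2>/dev/null; then
      echo "OK      $name ($ip):22"
    else
      echo "FAILED  $name ($ip):22"
    fi
  done
}

# Usage: check_ssh_ports cluster/ansible/manifests/inventory.ini
```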
#### **2. Cluster Bootstrap Failed**

```bash
# Re-run the playbook with verbose output (from freeleaps-ops/3rd/kubespray)
ansible-playbook -i ../../cluster/ansible/manifests/inventory.ini ./cluster.yml -kK -b -vvv

# Check node conditions and resources (requires a reachable API server)
kubectl describe node <node-name>

# Check system pods
kubectl get pods -n kube-system
kubectl describe pod <pod-name> -n kube-system
```
#### **3. Infrastructure Deployment Failed**

```bash
# Check CRDs
kubectl get crd

# Check operator pods
kubectl get pods --all-namespaces | grep operator

# Check events
kubectl get events --all-namespaces --sort-by='.lastTimestamp'
```
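Field selectors narrow the cluster-wide output to what usually matters after a failed deployment. A sketch with a hypothetical helper name; keep in mind that a pod in `CrashLoopBackOff` still reports phase `Running`, so it will not appear in the first list, and the RESTARTS column of `kubectl get pods` is where to spot it:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: shows pods that are not Running/Succeeded
# (Pending, Failed, Unknown) and Warning events across all namespaces.
show_unhealthy() {
  echo "== Pods not Running or Succeeded =="
  kubectl get pods --all-namespaces \
    --field-selector=status.phase!=Running,status.phase!=Succeeded
  echo "== Warning events (oldest first) =="
  kubectl get events --all-namespaces --field-selector=type=Warning \
    --sort-by=.lastTimestamp
}

# Usage: show_unhealthy
```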
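### **Recovery Procedures**

#### **If Bootstrap Fails**

1. **Clean up the failed installation**

   Restarting the VMs does not remove anything Kubespray installed. Use Kubespray's `reset.yml` playbook instead; it asks for confirmation, then removes the Kubernetes and etcd state (including etcd data) from every host in the inventory:

   ```bash
   cd freeleaps-ops/3rd/kubespray
   ansible-playbook -i ../../cluster/ansible/manifests/inventory.ini ./reset.yml -kK -b
   ```

2. **Retry the bootstrap** from the same directory:

   ```bash
   ansible-playbook -i ../../cluster/ansible/manifests/inventory.ini ./cluster.yml -kK -b
   ```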
#### **If Infrastructure Deployment Fails**

1. **Check prerequisites**

   ```bash
   kubectl get nodes
   kubectl get pods -n kube-system
   ```

2. **Redeploy components**

   ```bash
   # Caution: delete removes every resource in the directory, including any namespaces and PVCs it defines
   kubectl delete -f <component-directory>/
   kubectl apply -f <component-directory>/
   ```
---

## 📊 **Post-Bootstrap Verification**

### **1. Core Components**

```bash
# ArgoCD
kubectl get pods -n freeleaps-devops-system | grep argocd

# Cert-manager
kubectl get pods -n freeleaps-controls-system | grep cert-manager

# Prometheus/Grafana
kubectl get pods -n freeleaps-monitoring-system | grep prometheus
kubectl get pods -n freeleaps-monitoring-system | grep grafana

# Logging
kubectl get pods -n freeleaps-logging-system | grep loki
```

### **2. Access Points**

```bash
# ArgoCD UI
kubectl port-forward svc/argocd-server -n freeleaps-devops-system 8080:80

# Grafana UI
kubectl port-forward svc/kube-prometheus-stack-grafana -n freeleaps-monitoring-system 3000:80

# Kubernetes Dashboard
kubectl port-forward svc/kubernetes-dashboard-kong-proxy -n freeleaps-infra-system 8443:443
```
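Both UIs need credentials. A hypothetical helper for reading one key from a Secret is sketched below. The secret names in the usage lines are assumptions: `argocd-initial-admin-secret` is the Secret Argo CD creates for the initial `admin` password (it may already have been deleted after first login), and `kube-prometheus-stack-grafana` is inferred from the Grafana service name above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints one base64-decoded value from a Kubernetes Secret.
# Arguments: <namespace> <secret-name> <key>
get_secret_value() {
  local namespace="$1" secret="$2" key="$3"
  kubectl -n "$namespace" get secret "$secret" -o "jsonpath={.data.$key}" | base64 -d
  echo   # end the output with a newline
}

# Usage (secret names are assumptions, see above):
#   get_secret_value freeleaps-devops-system argocd-initial-admin-secret password
#   get_secret_value freeleaps-monitoring-system kube-prometheus-stack-grafana admin-password
```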
### **3. Authentication Setup**

```bash
# Set up user authentication
cd freeleaps-ops/cluster/bin
./freeleaps-cluster-authenticator auth

# Test authentication
kubectl auth whoami
kubectl get nodes
```
---

## 🔒 **Security Considerations**

### **1. Network Security**

- Keep the VMs in private subnets
- Restrict network security group rules to the ports and source addresses the cluster actually needs
- Use a VPN or bastion host for access

### **2. Access Control**

- Use Azure AD OIDC for authentication
- Implement RBAC for authorization
- Review access regularly

### **3. Monitoring**

- Enable audit logging
- Monitor cluster health
- Set up alerts
---

## 📚 **Next Steps**

### **1. Application Deployment**

- Deploy applications via ArgoCD
- Configure CI/CD pipelines
- Set up monitoring and alerting

### **2. Maintenance**

- Apply security updates regularly
- Back up etcd data
- Monitor resource usage

### **3. Scaling**

- Add more worker nodes (Kubespray provides a `scale.yml` playbook for this)
- Configure autoscaling
- Optimize resource allocation
---

## 🆘 **Support**

### **Emergency Contacts**

- **Infrastructure Team**: [Contact Information]
- **Azure Support**: [Contact Information]
- **Kubernetes Community**: [Contact Information]

### **Useful Commands**

```bash
# Cluster status
kubectl get nodes
kubectl get pods --all-namespaces

# Logs
kubectl logs -n kube-system <pod-name>

# Events
kubectl get events --all-namespaces --sort-by='.lastTimestamp'

# Resource usage (requires metrics-server)
kubectl top nodes
kubectl top pods --all-namespaces
```
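The Maintenance list mentions backing up etcd. A sketch of an on-demand snapshot taken over SSH on one etcd member, with hypothetical helper names. The certificate paths assume Kubespray's default `etcd_cert_dir` (`/etc/ssl/etcd/ssl`) and its `admin-<inventory hostname>.pem` naming, and the `/var/backups` destination is arbitrary; verify both on your nodes before relying on this.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: builds the etcdctl command that snapshots one etcd member.
# Assumes Kubespray's default certificate directory and file naming.
# Arguments: <inventory-hostname> [cert-dir]
etcd_snapshot_cmd() {
  local member="$1" cert_dir="${2:-/etc/ssl/etcd/ssl}"
  local snapshot="/var/backups/etcd-${member}-$(date +%Y%m%d-%H%M%S).db"
  printf 'sudo mkdir -p /var/backups && sudo ETCDCTL_API=3 etcdctl'
  printf ' --endpoints=https://127.0.0.1:2379'
  printf ' --cacert=%s/ca.pem' "$cert_dir"
  printf ' --cert=%s/admin-%s.pem --key=%s/admin-%s-key.pem' "$cert_dir" "$member" "$cert_dir" "$member"
  printf ' snapshot save %s\n' "$snapshot"
}

# Hypothetical helper: runs the snapshot on the member over SSH.
# -t allocates a terminal in case sudo prompts for a password.
backup_etcd() {
  local member="$1" ip="$2" ssh_user="${3:-wwwadmin@mathmast.com}"
  ssh -t "$ssh_user@$ip" "$(etcd_snapshot_cmd "$member")"
}

# Usage: backup_etcd prod-usw2-k8s-freeleaps-master-01 10.10.0.4
# Then copy the .db file off the node: a snapshot stored only on the
# member it protects is lost together with that member.
```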
---

**Last Updated**: September 3, 2025
**Version**: 1.0
**Maintainer**: Infrastructure Team