Deployment Deepseek R1 70B with VLLM on Deka GPU's Kubernetes
Create Deployment



Create a Deployment to run the DeepSeek container. If you are using a Linux operating system, run the following command to create the deployment.yaml file.

nano deployment.yaml

If you are using a Windows operating system, open a text editor such as Notepad or Notepad++.

Enter the following manifest.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: deepseek-r1
  namespace: vllm
  labels:
    app: deepseek-r1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: deepseek-r1

  template:
    metadata:
      labels:
        app: deepseek-r1

    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: nvidia.com/gpu.product
                operator: In
                values:
                - NVIDIA-H100-80GB-HBM3
      volumes:
      - name: cache-volume
        persistentVolumeClaim:
          claimName: deepseek-r1
      # vLLM needs to access the host's shared memory for tensor parallel inference
      - name: shm
        emptyDir:
          medium: Memory
          sizeLimit: "2Gi"
      containers:
      - name: deepseek-r1

        image: vllm/vllm-openai:v0.6.4 
        command: ["/bin/sh", "-c"]
        args: [
          "vllm serve deepseek-ai/DeepSeek-R1-Distill-Llama-70B --gpu-memory-utilization 0.95 --tensor-parallel-size 4 --enforce-eager"
        ]
        ports:
        - containerPort: 8000
        resources:
          limits:
            cpu: "16"
            memory: 64Gi
            nvidia.com/gpu: "4" 
          requests:
            cpu: "4"
            memory: 8Gi
            nvidia.com/gpu: "4"
        volumeMounts:
        - mountPath: /.cache
          name: cache-volume
        - name: shm
          mountPath: /dev/shm
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 240
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 240
          periodSeconds: 10
        securityContext:
          runAsUser: 1000
          runAsNonRoot: true
          allowPrivilegeEscalation: false
      runtimeClassName: nvidia

There are several lines in the manifest above that you may need to change.

  • You may need to adjust the vLLM image version tag to the release you want to use. The relevant line in deployment.yaml is:

image: vllm/vllm-openai:v0.6.4
  • The model can be run on 2 GPUs, but you need to reduce --max-model-len to 16000. The consequence is that the model can only process 16000 input tokens. To do this, change the args section in deployment.yaml as follows.

args: [
"vllm serve deepseek-ai/DeepSeek-R1-Distill-Llama-70B --gpu-memory-utilization 0.95 --tensor-parallel-size 2 --max-model-len 16000 --enforce-eager"
]
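If you run the model on 2 GPUs this way, you would presumably also lower the GPU count in the resources section of the manifest to match (a sketch; the original manifest requests 4 GPUs, and this change is our assumption, not stated in the source):

```yaml
resources:
  limits:
    nvidia.com/gpu: "2"
  requests:
    nvidia.com/gpu: "2"
```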
  • Detailed Parameters for vllm serve. The vllm serve command is used to start the vLLM server. Here are the detailed parameters used in the command:

    1. <model>: The name of the model to serve. In this case, it is deepseek-ai/DeepSeek-R1-Distill-Llama-70B.

    2. --gpu-memory-utilization <float>: The GPU memory utilization factor. This controls how much of the available GPU memory is used by the model. A value of 0.95 means 95% of the GPU memory will be used.

    3. --tensor-parallel-size <int>: The number of GPUs to use for tensor parallelism. This helps in distributing the model across multiple GPUs. In this case, it is set to 4.

    4. --max-model-len <int>: The maximum length of the input tokens the model can process. Reducing this value can help in running the model with fewer GPUs.

    5. --enforce-eager: Enforces eager execution mode, which can improve performance for certain workloads.
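To see why four H100s are used by default and why the 2-GPU variant needs a smaller --max-model-len, a rough memory estimate helps. The sketch below is our back-of-the-envelope calculation, not from the vLLM documentation; it assumes FP16/BF16 weights at 2 bytes per parameter and treats everything left after the weights as room for the KV cache.

```python
# Back-of-the-envelope GPU memory check for the manifest above.
# Assumption: FP16/BF16 weights, 2 bytes per parameter; overheads ignored.

GPU_MEM_GB = 80             # NVIDIA H100 80GB, as selected by the nodeAffinity rule
GPU_MEM_UTILIZATION = 0.95  # matches --gpu-memory-utilization 0.95

def leftover_gb(num_params: float, tensor_parallel_size: int) -> float:
    """Usable memory across all GPUs minus the FP16 weight footprint."""
    weights_gb = num_params * 2 / 1e9
    usable_gb = tensor_parallel_size * GPU_MEM_GB * GPU_MEM_UTILIZATION
    return usable_gb - weights_gb

for tp in (2, 4):
    print(f"tensor-parallel-size={tp}: "
          f"{leftover_gb(70e9, tp):.0f} GB left for KV cache")
```

A 70B-parameter model needs about 140 GB for weights alone, so 2 GPUs leave only around 12 GB for the KV cache, which is why the context length must be capped at 16000 tokens, while 4 GPUs leave ample headroom.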

If you are using a Linux operating system, run the following command. If you are using a Windows operating system, after saving the file as deployment.yaml, open CMD, navigate to the folder that contains the deployment.yaml file, and run the same command.

kubectl apply -f deployment.yaml

To delete the deployment.yaml configuration that has been applied, run the following command.

kubectl delete -f deployment.yaml -n [namespace]

Replace [namespace] with the namespace you created in the Create Namespace sub-chapter.
