Set Up CloudWatch Monitoring for GPU EC2 Instances

STEP 1: Install CloudWatch Agent

  1. Download CloudWatch Agent MSI

$msiUrl = "https://s3.amazonaws.com/amazoncloudwatch-agent/windows/amd64/latest/amazon-cloudwatch-agent.msi"
$installerPath = "$env:TEMP\amazon-cloudwatch-agent.msi"
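
  2. Download installer

Invoke-WebRequest -Uri $msiUrl -OutFile $installerPath

  3. Install MSI (wait for msiexec to finish so the verification below does not run too early)

Start-Process msiexec.exe -ArgumentList "/i `"$installerPath`" /quiet" -Wait

  4. Verify installation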

Get-Service AmazonCloudWatchAgent

STEP 2: Check the NVIDIA Plugin for GPU Monitoring

The CloudWatch Agent for Windows already includes the NVIDIA plugin, so no separate installation is needed. To confirm, list the agent's plugins directory:

cd "C:\Program Files\Amazon\AmazonCloudWatchAgent"
dir .\plugins\
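
As an additional check, the GPU values can be read directly from the NVIDIA driver. This is a minimal sketch, assuming that the NVIDIA driver is installed and that its nvidia-smi tool is on PATH; the source does not state either, so treat both as assumptions. It queries the same three values that the configuration in Step 3 collects.

```powershell
# Assumption: nvidia-smi ships with the NVIDIA driver and is on PATH.
$smi = Get-Command nvidia-smi -ErrorAction SilentlyContinue

if ($null -eq $smi) {
    Write-Warning "nvidia-smi was not found on PATH; check the NVIDIA driver installation."
} else {
    # The argument is quoted so PowerShell does not split it at the commas.
    & $smi.Source "--query-gpu=utilization.gpu,utilization.memory,memory.used" "--format=csv"
}
```

If this prints one CSV row per GPU, the driver is reporting the values the agent is expected to collect.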

STEP 3: Create the Configuration File

Create the file C:\ProgramData\Amazon\AmazonCloudWatchAgent\config.json:

{
    "metrics": {
        "metrics_collected": {
            "nvidia_gpu": {
                "measurement": [
                    "utilization_gpu",
                    "utilization_memory",
                    "memory_used"
                ],
                "metrics_collection_interval": 300
            },
            "Memory": {
                "measurement": [
                    "% Committed Bytes In Use",
                    "Available MBytes"
                ]
            },
            "LogicalDisk": {
                "measurement": [
                    "% Free Space",
                    "Free Megabytes"
                ],
                "resources": [
                    "*"
                ]
            },
            "Processor": {
                "measurement": [
                    "% Processor Time"
                ],
                "resources": [
                    "_Total"
                ]
            }
        },
        "append_dimensions": {
            "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
            "ImageId": "${aws:ImageId}",
            "InstanceId": "${aws:InstanceId}",
            "InstanceType": "${aws:InstanceType}"
        },
        "aggregation_dimensions": [
            [
                "InstanceId"
            ]
        ],
        "force_flush_interval": 60
    }
}

STEP 4: Configure and Start the Agent

Run PowerShell as Administrator.

  1. Navigate to the agent directory

  2. Apply the configuration

  3. Start the service

  4. Check the status
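
The four actions above are listed without commands. A minimal sketch of what they might look like follows, assuming the default install directory from Step 2, the config path from Step 3, and the amazon-cloudwatch-agent-ctl.ps1 control script that the agent installs; adjust the paths if your installation differs.

```powershell
# Assumption: default install directory and the config path used in Step 3.
$agentDir   = "C:\Program Files\Amazon\AmazonCloudWatchAgent"
$configPath = "C:\ProgramData\Amazon\AmazonCloudWatchAgent\config.json"

# 1. Navigate to the agent directory
Set-Location $agentDir

# 2. Apply the configuration (-m ec2: running on EC2, -s: restart the agent afterwards)
.\amazon-cloudwatch-agent-ctl.ps1 -a fetch-config -m ec2 -s -c "file:$configPath"

# 3. Start the service (does nothing if -s above already started it)
Start-Service AmazonCloudWatchAgent

# 4. Check the status
.\amazon-cloudwatch-agent-ctl.ps1 -m ec2 -a status
```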

STEP 5: Verify the Installation

  1. Check the service status

  2. Check the logs (if there are issues)

  3. Check that the process is running

  4. Test the config (validate syntax)
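
The verification checks above could be sketched as follows. The log path and the process name are assumptions based on a default Windows installation and are not given in the source. The last check only confirms that the file is well-formed JSON; it does not validate the agent's configuration schema.

```powershell
# Assumptions: default log location, default process name, config path from Step 3.
$logPath    = "C:\ProgramData\Amazon\AmazonCloudWatchAgent\Logs\amazon-cloudwatch-agent.log"
$configPath = "C:\ProgramData\Amazon\AmazonCloudWatchAgent\config.json"

# 1. Check the service status
Get-Service AmazonCloudWatchAgent

# 2. Check the logs: show the last 50 lines
Get-Content -Path $logPath -Tail 50

# 3. Check that the process is running
Get-Process -Name "amazon-cloudwatch-agent" -ErrorAction SilentlyContinue

# 4. Test the config: ConvertFrom-Json throws on malformed JSON
try {
    Get-Content -Path $configPath -Raw -ErrorAction Stop | ConvertFrom-Json | Out-Null
    Write-Output "config.json is well-formed JSON."
} catch {
    Write-Error "config.json could not be read or parsed: $($_.Exception.Message)"
}
```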
