# 🖥️ GPU Acceleration for Discovery on macOS

## 🔍 Overview

The NEAT-AI Discovery Rust library **already supports GPU acceleration** on
macOS using Metal via the `wgpu` crate. GPU acceleration is automatically
enabled for Mac systems (`darwin`) and uses Metal under the hood for
high-performance compute operations.

> [!NOTE]
> GPU acceleration is enabled by default on macOS (`darwin`) and requires no
> additional configuration. All modern Macs with Apple Silicon or recent
> AMD/Intel GPUs are supported via Metal.

## ⚡ Current GPU Implementation

### ✅ What's Already GPU-Accelerated

1. **Synapse Analysis** - Both helpful and harmful synapse evaluation use GPU
   compute shaders
2. **Neuron Analysis** - The helpful statistics calculation uses GPU, though
   activation function evaluation still runs on CPU

### 🧰 GPU Technology Stack

- **wgpu 0.19** - Cross-platform GPU abstraction layer
- **Metal** - Apple's GPU API (used automatically on macOS)
- **Compute Shaders** - WGSL shaders for parallel processing

## 🔎 Verifying GPU Usage

### Method 1: Check Logs

When discovery runs with logging enabled, you'll now see messages like:

```
Rust synapse analysis using GPU (X helpful, Y harmful candidates)
Rust neuron analysis using GPU (Z candidates)
```

If you see "using CPU fallback" instead, the GPU failed to initialise.

### Method 2: Enable Debug Mode

Set the environment variable to see detailed GPU diagnostics:

```bash
export NEAT_AI_DISCOVERY_GPU_DEBUG=1
```

This will print:

- GPU adapter information
- Device initialisation status
- Any fallback to CPU

> [!TIP]
> Enable `NEAT_AI_DISCOVERY_GPU_DEBUG=1` during initial setup to confirm that
> Metal is being picked up correctly. You can disable it once GPU usage is
> verified.

### Method 3: Check Return Values

The Rust analysis functions return a `gpuUsed` boolean field indicating whether
GPU was actually used:

```typescript
const result = analyzeSynapses(input);
if (result.gpuUsed) {
  console.log("GPU acceleration is active!");
} else {
  console.warn("Falling back to CPU - check GPU availability");
}
```

## 📊 Performance Considerations

### ⚡ GPU Utilisation Improvements (2 Jan 2025)

The code now batches multiple GPU operations together to improve utilisation:

- **Batched Evaluation**: Multiple synapse evaluations are collected and
  submitted together in batches of 32
- **Better GPU Saturation**: Instead of submitting one operation at a time and
  waiting, multiple operations are queued before waiting
- **Reduced Idle Time**: GPU spends less time waiting for CPU to prepare the
  next operation

### 🐢 Why It Might Still Be Slow

Even with GPU acceleration, discovery can be CPU-bound due to:

1. **Data I/O** - Reading Parquet files and preparing data for GPU
2. **Neuron Analysis** - Activation function evaluation (GELU, ELU, SELU, etc.)
   still runs on CPU
3. **Memory Transfers** - Copying data between CPU and GPU memory
4. **Small Workloads** - GPU overhead may not be worth it for small datasets

### 🖥️ Current GPU Usage

The implementation uses GPU for:

- ✅ Helpful synapse statistics (batched parallel evaluation)
- ✅ Harmful synapse statistics (parallel evaluation)
- ❌ Neuron activation function evaluation (still CPU-bound)

### 🚀 Future Optimisation Opportunities

- Move activation function evaluation to GPU
- Batch harmful synapse evaluations as well
- Optimise memory transfers with buffer reuse
- Use async GPU execution to overlap CPU/GPU work

## 🔧 Troubleshooting

### ❌ GPU Not Being Used

If you see "using CPU fallback" in logs:

1. **Check Metal Support**: Ensure your Mac supports Metal (all modern Macs do)
2. **Check Permissions**: Some systems may require GPU access permissions
3. **Check wgpu Version**: Ensure `wgpu = "0.19"` in `Cargo.toml`
4. **Rebuild**: After updating Rust code, rebuild the library:
   ```bash
   cd NEAT-AI-Discovery
   cargo build --release
   ```

> [!WARNING]
> If GPU initialisation fails and `requireGpu` is set to `true`, the discovery
> process will exit with an error rather than silently falling back to CPU. Set
> `requireGpu: false` if you need a CPU fallback in environments without Metal
> support.

### 🐌 Performance Issues

If GPU is active but still slow:

1. **Check Dataset Size**: GPU benefits increase with larger datasets
2. **Monitor GPU Usage**: Use Activity Monitor or `gpuStats` to verify GPU is
   being utilised
3. **Check Other Bottlenecks**: Parquet I/O, TypeScript processing, etc.

## ⚙️ Configuration

### 🔒 Requiring GPU (Default on Mac)

On macOS, GPU is required by default:

```typescript
requireGpu: Deno.build.os === "darwin"; // true on Mac
```

If GPU initialisation fails and `requireGpu` is true, discovery will fail with
an error. If false, it falls back to CPU.

### 🔓 Disabling GPU Requirement

To allow CPU fallback even on Mac:

```typescript
requireGpu: false; // Will use GPU if available, but fall back to CPU
```

## 🔬 Technical Details

### 💻 GPU Compute Shaders

The GPU uses two compute shaders:

1. **Helpful Synapse Shader** - Evaluates which synapses would help reduce error
2. **Harmful Synapse Shader** - Evaluates which existing synapses are harmful

Both use workgroup size of 256 threads for parallel processing.

### 🗂️ Memory Layout

Data is transferred to GPU as:

- `GpuHelpfulSample` - Activation and error pairs
- Results returned as `HelpfulContribution` or `HarmfulContribution`

## 📅 History

- **2 Jan 2025**: Initial GPU batching improvements for synapse evaluation.
- GPU acceleration is actively maintained as part of the NEAT-AI-Discovery Rust
  module.