Weights & Biases Experiment Tracking Tutorial

Introduction

Weights & Biases (WandB) is a comprehensive machine learning platform that provides experiment tracking, model management, and collaboration tools for ML teams. It offers powerful visualization capabilities, hyperparameter optimization, and seamless integration with popular ML frameworks.

Note

WandB can be integrated into ML-Train for robust experiment tracking with cloud-based storage and advanced visualization features.

What Makes WandB Special

WandB pairs lightweight experiment tracking with a full platform for managing the ML lifecycle. Key advantages include:

  • Rich Visualizations: Interactive charts, plots, and dashboards

  • Cloud Storage: Secure cloud-based experiment storage and sharing

  • Team Collaboration: Share experiments and insights with team members

  • Hyperparameter Optimization: Built-in sweep functionality for automated hyperparameter tuning

  • Model Registry: Track and version your trained models

  • Framework Integration: Native support for PyTorch, TensorFlow, Keras, and more

Getting Started

Basic Tracking

Here’s your first WandB experiment:

import wandb
import math

# Initialize a new run
wandb.init(project="my-first-project", name="basic-experiment")

# Track metrics during training
for step in range(100):
    loss = math.exp(-step/50) + 0.1 * math.sin(step/10)
    accuracy = 1 - math.exp(-step/30)

    wandb.log({
        "loss": loss,
        "accuracy": accuracy
    }, step=step)

# Finish the run
wandb.finish()

Important

Always call wandb.finish() at the end of your training script to ensure all data is properly uploaded.
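
If you prefer not to call wandb.finish() manually, wandb.init() can also be used as a context manager, which finishes the run automatically even when an exception is raised. A minimal sketch:

import wandb

# The run is finished automatically when the with-block exits, even on errors
with wandb.init(project="my-first-project", name="context-manager-run") as run:
    run.log({"loss": 0.5})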

Advanced Tracking

Configuration and Hyperparameters

Track your experiment configuration:

import wandb

# Define configuration
config = {
    "learning_rate": 0.001,
    "batch_size": 32,
    "epochs": 10,
    "model_type": "transformer",
    "hidden_size": 768
}

# Initialize with config
wandb.init(
    project="my-project",
    config=config,
    name="transformer-experiment"
)

# Access config during training
lr = wandb.config.learning_rate
batch_size = wandb.config.batch_size
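
Config values can also be added or updated after initialization; a short sketch using wandb.config.update():

# Add config values after init (pass allow_val_change=True to overwrite existing keys)
wandb.config.update({"optimizer": "adam", "seed": 42})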

Multiple Metrics at Once

Log multiple metrics simultaneously:

# Log multiple metrics in one call
wandb.log({
    "train/loss": train_loss,
    "train/accuracy": train_acc,
    "val/loss": val_loss,
    "val/accuracy": val_acc,
    "learning_rate": current_lr,
    "epoch": epoch
}, step=global_step)
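
If training and validation metrics are logged at different frequencies, wandb.define_metric() lets you plot validation metrics against the epoch rather than the global step. A sketch of how this could look, called once after wandb.init():

# Plot every metric under val/ against the logged "epoch" value
wandb.define_metric("epoch")
wandb.define_metric("val/*", step_metric="epoch")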

Tracking Rich Objects

Images and Plots

WandB provides excellent support for tracking images and matplotlib figures:

import wandb
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

wandb.init(project="image-tracking")

# Track matplotlib figures
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 2])
ax.set_title("Training Progress")

wandb.log({"training_plot": wandb.Image(fig)})
plt.close(fig)

# Track PIL images
img = Image.fromarray(np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8))
wandb.log({"sample_image": wandb.Image(img, caption="Generated Sample")})

# Track image arrays directly
img_array = np.random.random((32, 32, 3))
wandb.log({"numpy_image": wandb.Image(img_array)})

Tables and DataFrames

Track structured data with WandB Tables:

import pandas as pd
import wandb

# Create a table from pandas DataFrame
df = pd.DataFrame({
    "epoch": [1, 2, 3, 4, 5],
    "train_loss": [0.8, 0.6, 0.4, 0.3, 0.2],
    "val_loss": [0.9, 0.7, 0.5, 0.4, 0.3],
    "accuracy": [0.7, 0.8, 0.85, 0.9, 0.92]
})

table = wandb.Table(dataframe=df)
wandb.log({"results_table": table})

# Create table manually
columns = ["image", "prediction", "target", "correct"]
data = []

# sample_images, predictions, and targets come from your evaluation loop
for i in range(10):
    img = wandb.Image(sample_images[i])
    data.append([img, predictions[i], targets[i], predictions[i] == targets[i]])

table = wandb.Table(data=data, columns=columns)
wandb.log({"predictions": table})

Audio and Video

Track multimedia content:

import numpy as np
import wandb

# Track audio files
wandb.log({
    "generated_audio": wandb.Audio("path/to/audio.wav", caption="Generated Speech"),
    "sample_rate": 22050
})

# Track video files
wandb.log({
    "training_animation": wandb.Video("path/to/video.mp4", caption="Training Progress")
})

# Track audio arrays
audio_data = np.random.randn(22050 * 2)  # 2 seconds of random audio
wandb.log({
    "numpy_audio": wandb.Audio(audio_data, sample_rate=22050, caption="Random Audio")
})

3D Objects and Point Clouds

Track 3D data for computer vision and robotics applications:

import numpy as np
import wandb

# Track 3D point clouds
points = np.random.uniform(-1, 1, (1000, 3))
colors = np.random.randint(0, 255, (1000, 3))

point_cloud = wandb.Object3D({
    "type": "lidar/beta",
    "points": points,
    "colors": colors
})

wandb.log({"point_cloud": point_cloud})

Model Checkpoints and Artifacts

Artifacts System

WandB Artifacts provide versioned storage for datasets, models, and other files:

import wandb

wandb.init(project="artifact-example")

# Save a model as an artifact
artifact = wandb.Artifact("my-model", type="model")
artifact.add_file("model.pth")
artifact.add_file("config.json")
wandb.log_artifact(artifact)

# Use an artifact from another run
run = wandb.init(project="artifact-example")
artifact = run.use_artifact("my-model:latest")
artifact_dir = artifact.download()
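
Artifacts are versioned automatically (v0, v1, ...), and you can attach extra aliases when logging so other runs can pin a specific version by name. A sketch, assuming a freshly created artifact like the one logged earlier in this section:

# Attach custom aliases in addition to the automatic version and "latest"
run.log_artifact(artifact, aliases=["baseline", "best"])

# Downstream runs can then reference an alias instead of "latest"
best_model = run.use_artifact("my-model:best")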

Model Checkpointing

Automatically save model checkpoints during training:

import torch
import wandb

wandb.init(project="checkpoint-example")

# During training loop
for epoch in range(num_epochs):
    # ... training code ...

    # Save checkpoint
    if epoch % 5 == 0:
        checkpoint = {
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': loss,
        }

        torch.save(checkpoint, f'checkpoint_epoch_{epoch}.pth')

        # Log the checkpoint as an artifact; re-logging the same name creates a new version
        artifact = wandb.Artifact("model-checkpoint", type="model")
        artifact.add_file(f'checkpoint_epoch_{epoch}.pth')
        wandb.log_artifact(artifact)
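
To resume training later, the checkpoint artifact can be downloaded and restored in a new run. A sketch, assuming the checkpoint filename saved above:

# In a new run: fetch the latest checkpoint artifact and restore training state
run = wandb.init(project="checkpoint-example", resume="allow")
checkpoint_dir = run.use_artifact("model-checkpoint:latest").download()

checkpoint = torch.load(f"{checkpoint_dir}/checkpoint_epoch_5.pth")  # adjust to the saved epoch
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch'] + 1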

Hyperparameter Optimization

WandB Sweeps

Automate hyperparameter optimization with WandB Sweeps:

# sweep_config.yaml
program: train.py
method: bayes
metric:
  goal: maximize
  name: val_accuracy
parameters:
  learning_rate:
    distribution: log_uniform_values
    min: 0.0001
    max: 0.1
  batch_size:
    values: [16, 32, 64, 128]
  hidden_size:
    values: [128, 256, 512, 1024]
  dropout:
    distribution: uniform
    min: 0.1
    max: 0.5
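
Optionally, Sweeps can stop poorly performing runs early with Hyperband-based early termination, configured in the same YAML file. A sketch, assuming val_accuracy is logged at least once per epoch:

early_terminate:
  type: hyperband
  min_iter: 3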

Create and run the sweep:

import wandb
import yaml

# Load sweep configuration
with open('sweep_config.yaml') as f:
    sweep_config = yaml.safe_load(f)

# Create sweep
sweep_id = wandb.sweep(sweep_config, project="hyperparameter-optimization")

# Run sweep agents
wandb.agent(sweep_id, function=train, count=50)
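
The same sweep can also be created and run from the command line, which is convenient for launching agents on several machines:

# Create the sweep (prints a sweep ID)
wandb sweep sweep_config.yaml

# Start an agent on each machine, using the entity/project/ID printed above
wandb agent <entity>/<project>/<sweep_id>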

Training Function for Sweeps

def train():
    # Initialize wandb
    wandb.init()

    # Get hyperparameters from sweep
    config = wandb.config

    # Build model with sweep parameters
    model = build_model(
        hidden_size=config.hidden_size,
        dropout=config.dropout
    )

    optimizer = torch.optim.Adam(
        model.parameters(),
        lr=config.learning_rate
    )

    # Training loop
    for epoch in range(num_epochs):
        train_loss, train_acc = train_epoch(model, train_loader, optimizer)
        val_loss, val_acc = validate(model, val_loader)

        # Log metrics
        wandb.log({
            "epoch": epoch,
            "train_loss": train_loss,
            "train_accuracy": train_acc,
            "val_loss": val_loss,
            "val_accuracy": val_acc
        })

Framework Integrations

WandB provides seamless PyTorch integration:

import torch
import torch.nn as nn
import wandb

# Initialize wandb
wandb.init(project="pytorch-integration")

# Log model architecture
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(128, 10)
)

# Optimizer used in the training loop below
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

wandb.watch(model, log_freq=100)  # Log gradients and parameters

# Training loop
for epoch in range(num_epochs):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = nn.CrossEntropyLoss()(output, target)
        loss.backward()
        optimizer.step()

        # Log metrics
        if batch_idx % 100 == 0:
            wandb.log({
                "batch_loss": loss.item(),
                "epoch": epoch,
                "batch": batch_idx
            })

Use WandB with Hugging Face Transformers:

from transformers import TrainingArguments, Trainer
import wandb

# Initialize wandb
wandb.init(project="huggingface-integration")

# Setup training arguments with wandb
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
    report_to="wandb",  # Enable wandb logging
    run_name="bert-fine-tuning"
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

trainer.train()

Use the Keras WandbCallback for TensorFlow and Keras models:

import tensorflow as tf
from wandb.keras import WandbCallback
import wandb

# Initialize wandb
wandb.init(project="tensorflow-integration")

# Build model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Train with WandB callback
model.fit(
    x_train, y_train,
    batch_size=32,
    epochs=10,
    validation_data=(x_val, y_val),
    callbacks=[WandbCallback()]
)

Best Practices

Experiment Organization

  1. Use Meaningful Project Names:

wandb.init(
    project="image-classification-resnet",
    name=f"resnet50-lr{learning_rate}-bs{batch_size}",
    tags=["resnet", "baseline", "imagenet"]
)

  2. Consistent Naming Conventions:

# Use hierarchical metric names
wandb.log({
    "train/loss": train_loss,
    "train/accuracy": train_acc,
    "val/loss": val_loss,
    "val/accuracy": val_acc,
    "optimizer/learning_rate": current_lr
})

  3. Use Tags and Groups:

wandb.init(
    project="my-project",
    group="experiment-1",  # Group related runs
    tags=["baseline", "bert", "fine-tuning"],  # Add searchable tags
    notes="Initial baseline with default hyperparameters"
)

Configuration Management

Store comprehensive experiment configuration:

config = {
    # Model config
    "model": {
        "type": "transformer",
        "num_layers": 12,
        "hidden_size": 768,
        "num_heads": 12,
        "dropout": 0.1
    },

    # Training config
    "training": {
        "learning_rate": 2e-5,
        "batch_size": 32,
        "num_epochs": 10,
        "warmup_steps": 1000,
        "weight_decay": 0.01
    },

    # Data config
    "data": {
        "dataset": "imdb",
        "max_length": 512,
        "train_size": 25000,
        "val_size": 5000
    },

    # Environment
    "environment": {
        "gpu_type": "V100",
        "pytorch_version": "1.9.0",
        "cuda_version": "11.1"
    }
}

wandb.init(project="my-project", config=config)

Error Handling and Robustness

import wandb
import sys

try:
    wandb.init(project="robust-training")

    # Training code here
    for epoch in range(num_epochs):
        try:
            train_loss = train_epoch()
            val_loss = validate()

            wandb.log({
                "train_loss": train_loss,
                "val_loss": val_loss,
                "epoch": epoch
            })

        except Exception as e:
            wandb.log({"error": str(e), "epoch": epoch})
            print(f"Error in epoch {epoch}: {e}")
            continue

except KeyboardInterrupt:
    print("Training interrupted by user")
finally:
    wandb.finish()
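
For long-running jobs you can also push a notification when something fails using wandb.alert(), which delivers messages through the channels configured in your W&B settings. A sketch that could sit inside the except block above:

# Notify the team that an epoch failed (requires alerts to be enabled for your account)
wandb.alert(
    title="Training error",
    text=f"Error in epoch {epoch}: {e}"
)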

Team Collaboration

Sharing and Reports

Create shareable reports for team collaboration:

# Add run notes and documentation
wandb.init(
    project="team-project",
    notes="""
    ## Experiment Goals
    - Test new attention mechanism
    - Compare with baseline transformer

    ## Key Findings
    - 15% improvement in accuracy
    - 2x faster convergence

    ## Next Steps
    - Scale to larger dataset
    - Test on additional tasks
    """
)

# Log important insights
wandb.log({
    "insight": "Attention mechanism shows significant improvement",
    "recommendation": "Deploy to production pipeline"
})

Model Registry

Use WandB Model Registry for model versioning:

import wandb

# After training
wandb.init(project="model-registry-demo")

# Log model to registry
artifact = wandb.Artifact("sentiment-classifier", type="model")
artifact.add_file("model.pth")
artifact.add_file("tokenizer.json")
artifact.add_file("config.json")

# Add metadata
artifact.metadata = {
    "accuracy": 0.95,
    "f1_score": 0.94,
    "training_data": "imdb-50k",
    "framework": "pytorch"
}

wandb.log_artifact(artifact)

# Link the logged artifact to the Model Registry
wandb.run.link_artifact(artifact, "model-registry/sentiment-classifier")

Troubleshooting

Common Issues

Slow Upload Speeds: Configure WandB for better performance:

import os

# Reduce upload frequency
os.environ["WANDB_LOG_INTERNAL"] = "false"

wandb.init(
    project="my-project",
    settings=wandb.Settings(
        _disable_stats=True,  # Disable system stats
        _disable_meta=True    # Disable metadata collection
    )
)

Authentication Issues: Check your API key setup:

# Verify login status
wandb verify

# Re-login if needed
wandb login --relogin

Offline Mode: Run experiments without internet connection:

import os
os.environ["WANDB_MODE"] = "offline"

wandb.init(project="offline-project")
# Your training code here
wandb.finish()

# Sync when online
# wandb sync wandb/offline-run-*/

Memory Issues: For large experiments:

# Reduce run overhead (internal logging, code saving, git metadata)
wandb.init(
    project="large-experiment",
    settings=wandb.Settings(
        log_internal=None,
        save_code=False,
        disable_git=True
    )
)

# Log less frequently
if step % 100 == 0:  # Log every 100 steps instead of every step
    wandb.log(metrics, step=step)

Migration and Integration

From TensorBoard

Convert existing TensorBoard logs:

# Install tensorboard integration
pip install wandb[tensorboard]

# Sync TensorBoard logs
wandb sync --tensorboard ./tensorboard_logs
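
For new runs, WandB can also mirror TensorBoard logs automatically by passing sync_tensorboard=True to wandb.init(); anything written through tf.summary or torch.utils.tensorboard is then uploaded alongside your wandb.log() calls:

import wandb

# Mirror everything written by TensorBoard SummaryWriters into this run
wandb.init(project="tensorboard-migration", sync_tensorboard=True)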

From Other Platforms

# Import existing experiment data
import wandb
import pandas as pd

# Read existing experiment results
df = pd.read_csv("previous_experiments.csv")

for _, row in df.iterrows():
    wandb.init(
        project="migrated-experiments",
        name=row["experiment_name"],
        config=row["config"],
        reinit=True
    )

    # Log historical results
    for metric in ["loss", "accuracy", "f1_score"]:
        if metric in row:
            wandb.log({metric: row[metric]})

    wandb.finish()

Conclusion

WandB provides a comprehensive platform for machine learning experiment tracking with powerful visualization, collaboration, and optimization features. Key takeaways:

  • Start Simple: Begin with basic metric and config tracking

  • Leverage Rich Media: Use images, tables, and multimedia logging

  • Optimize Systematically: Use Sweeps for hyperparameter optimization

  • Collaborate Effectively: Share experiments and insights with your team

  • Scale Intelligently: Use artifacts and model registry for production workflows

With WandB, you can build a complete MLOps pipeline from experimentation to production, ensuring reproducibility and enabling effective collaboration across your ML team.

Tip

Create your first WandB project today by signing up at https://wandb.ai and running your first experiment. The platform’s intuitive interface and powerful features will transform how you approach ML experimentation.