LoL (Losses over Labels)

Overview

LoL (Losses over Labels) is a weak supervision method for training on noisy labels. It models label noise by comparing the losses a model incurs under different label assignments, and uses those loss patterns to separate clean labels from noisy ones.

The method addresses the fundamental challenge in weak supervision where labels from multiple weak sources may be inconsistent or incorrect. By modeling how losses behave over different label configurations, LoL can effectively learn from noisy supervision signals.

Key Characteristics:

  • Noise Modeling: Explicitly models label noise rather than treating it as random

  • Loss-based Learning: Uses loss patterns to identify clean vs. noisy labels

  • Multiple Variants: Supports both full LoL and simplified LoL_simple methods

  • Gradient Optimization: Employs gradient-based methods with configurable approaches

Algorithm Variants

LoL

The complete LoL algorithm with full noise modeling capabilities.

Basic loss: \(\hat{h} = \arg\min_h \sum_{j=1}^n \left( \frac{1}{m(x_j)} \sum_{i=1}^m \ell_i(x_j, h) \right)\)
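The basic objective averages, for each example, the losses of the heuristics that apply to it, then sums over examples. A minimal sketch of that computation follows. It assumes that each heuristic either votes for a class index or abstains (None), that \(\ell_i\) is the cross-entropy against the heuristic's vote, and that \(m(x_j)\) counts the heuristics that do not abstain on \(x_j\). The function names are hypothetical and are not taken from the framework.

```python
import math

def heuristic_loss(probs, vote):
    """Cross-entropy of the model's class probabilities against one heuristic's vote."""
    return -math.log(max(probs[vote], 1e-12))

def example_loss(probs, votes):
    """Average loss over the heuristics that fire on one example (vote is not None)."""
    active = [v for v in votes if v is not None]
    if not active:  # assumed convention: an uncovered example contributes nothing
        return 0.0
    return sum(heuristic_loss(probs, v) for v in active) / len(active)

def dataset_loss(all_probs, all_votes):
    """Outer sum over examples of the per-example averaged heuristic loss."""
    return sum(example_loss(p, v) for p, v in zip(all_probs, all_votes))

# Two examples, three heuristics; None marks an abstention.
all_probs = [[0.9, 0.1], [0.5, 0.5]]
all_votes = [[0, None, 0], [1, 0, None]]
total = dataset_loss(all_probs, all_votes)
```

Dividing by the number of active heuristics keeps heavily covered examples from dominating the objective.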

Enhanced loss with gradients: \(\ell_i^*(x, h) = \text{standard loss} + \alpha \cdot ||\text{gradient matching}||_2^2\)
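One possible reading of the gradient term is a squared L2 penalty on the difference between the model's input gradient and a heuristic's gradient. The sketch below illustrates that reading only. The exact quantity being matched is not specified here, so the vectors model_grad and heuristic_grad, and the function names, are assumptions.

```python
def squared_l2(vec):
    """Squared Euclidean norm of a plain Python vector."""
    return sum(v * v for v in vec)

def enhanced_loss(standard_loss, model_grad, heuristic_grad, alpha=1e-3):
    """Assumed form: standard loss + alpha * ||model_grad - heuristic_grad||_2^2."""
    diff = [g - h for g, h in zip(model_grad, heuristic_grad)]
    return standard_loss + alpha * squared_l2(diff)

# The penalty vanishes when the two gradients agree and grows quadratically otherwise.
loss = enhanced_loss(0.7, model_grad=[0.2, -0.1], heuristic_grad=[0.0, 0.3], alpha=0.5)
```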

Smoothed heuristic: \(\tilde{\lambda}_i(\phi) = \mathbb{E}_{x \sim \text{Ber}(\phi)}[\lambda_i(x)]\)
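The expectation in the smoothed heuristic can be approximated by Monte Carlo sampling. The sketch below draws binary feature vectors from independent Bernoulli distributions with means \(\phi\) and averages the heuristic's output over the draws. Tying the sample count to the num_rand parameter is an assumption, and keyword_heuristic is a hypothetical example heuristic.

```python
import random

def smoothed_heuristic(heuristic, phi, num_rand=10, seed=0):
    """Monte Carlo estimate of E_{x ~ Ber(phi)}[heuristic(x)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_rand):
        # Sample each binary feature independently with probability phi[k].
        x = [1 if rng.random() < p else 0 for p in phi]
        total += heuristic(x)
    return total / num_rand

def keyword_heuristic(x):
    """Hypothetical heuristic: outputs 1.0 when feature 0 is present, else 0.0."""
    return 1.0 if x[0] == 1 else 0.0

# The estimate approaches phi[0] = 0.3 as num_rand grows.
estimate = smoothed_heuristic(keyword_heuristic, phi=[0.3, 0.8], num_rand=2000)
```

Unlike the hard heuristic, the smoothed version varies continuously with \(\phi\), which is what makes a gradient with respect to the input well defined.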

Parameters:

  • alpha: Regularization parameter (default: 1e-3)

  • grad_val: Gradient validation threshold (default: 1.0)

  • grad_method: Gradient computation method (default: "square")

  • num_rand: Number of random samples (default: 10)

LoL_simple

Simplified version of LoL with reduced computational complexity.

Use Case: When computational resources are limited or for quick prototyping.

Pseudocode

\begin{algorithm}
\caption{LoL Algorithm}
\begin{algorithmic}
\REQUIRE Training data $\mathcal{D} = \{(x_i, \tilde{y}_i)\}_{i=1}^n$ with noisy labels
\REQUIRE Model parameters $\theta$, hyperparameters $\alpha, \gamma$, learning rate $\eta$
\ENSURE Trained model $f_\theta$
\STATE Initialize model parameters $\theta_0$
\STATE Initialize noise model parameters $\phi_0$
\FOR{epoch = 1 to max\_epochs}
   \FOR{each batch $(x_i, \tilde{y}_i)$ in $\mathcal{D}$}
      \STATE Forward pass with current labels
      \STATE $\hat{y}_i = f_\theta(x_i)$
      \STATE $\ell_i = \text{loss}(\hat{y}_i, \tilde{y}_i)$
      \FOR{each possible label $y_j \neq \tilde{y}_i$}
         \STATE $\ell_{i,j} = \text{loss}(\hat{y}_i, y_j)$
      \ENDFOR
      \STATE Update noise model based on loss patterns
      \STATE $p(\tilde{y}_i | x_i) = \text{SoftMax}(-\alpha \cdot [\ell_{i,1}, \ell_{i,2}, \ldots])$
      \STATE Weighted loss computation
      \STATE $\mathcal{L} = \sum_i p(\tilde{y}_i | x_i) \cdot \ell_i$
      \STATE Gradient update with regularization
      \STATE $\theta \leftarrow \theta - \eta \nabla_\theta \mathcal{L}$
   \ENDFOR
\ENDFOR
\RETURN $f_\theta$
\end{algorithmic}
\end{algorithm}
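The weighting step of the pseudocode can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the framework's implementation: the loss is taken to be cross-entropy, and the softmax runs over all labels, including the observed one, so that a weight exists for the observed label (the pseudocode's inner loop skips it). All function names are hypothetical.

```python
import math

def cross_entropy(probs, label):
    """Negative log-probability the model assigns to the given label."""
    return -math.log(max(probs[label], 1e-12))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_batch_loss(batch_probs, noisy_labels, alpha=1.0):
    """Sum over the batch of p(observed label | x) times the loss on the observed label."""
    total = 0.0
    for probs, noisy in zip(batch_probs, noisy_labels):
        # Loss of the current prediction under every candidate label.
        losses = [cross_entropy(probs, y) for y in range(len(probs))]
        # Labels with a lower loss receive a higher weight.
        weights = softmax([-alpha * loss for loss in losses])
        total += weights[noisy] * losses[noisy]
    return total

batch_loss = weighted_batch_loss([[0.8, 0.2], [0.3, 0.7]], noisy_labels=[0, 0], alpha=1.0)
```

With cross-entropy and alpha = 1 the weights reduce to the model's own predicted probabilities, so an example whose observed label the model finds unlikely is down-weighted.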
        

Implementation Details

Framework Integration

The LoL algorithm is integrated into the universal baseline comparison framework through the following components:

Configuration Structure:

from dataclasses import dataclass

@dataclass
class LoLModelConfig:
    method: str = "LoL"  # "LoL" or "LoL_simple"
    learning_rate: float = 1e-3
    weight_decay: float = 0.0
    num_epochs: int = 30
    batch_size: int = 128
    grad_method: str = "square"  # Gradient computation method
    alpha: float = 1e-3  # Noise modeling parameter
    grad_val: float = 1.0  # Gradient validation threshold
    num_rand: int = 10  # Random sampling parameter
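As an illustration of how such a configuration might be populated, the sketch below builds a LoLModelConfig from a plain dictionary, such as the parsed [model] table of a TOML file, and rejects unknown keys. The dataclass is repeated so the example runs on its own. config_from_dict is a hypothetical helper and is not part of the framework.

```python
from dataclasses import dataclass, fields

@dataclass
class LoLModelConfig:
    method: str = "LoL"  # "LoL" or "LoL_simple"
    learning_rate: float = 1e-3
    weight_decay: float = 0.0
    num_epochs: int = 30
    batch_size: int = 128
    grad_method: str = "square"
    alpha: float = 1e-3
    grad_val: float = 1.0
    num_rand: int = 10

def config_from_dict(overrides):
    """Build a config from a dict, failing loudly on misspelled or unknown keys."""
    known = {f.name for f in fields(LoLModelConfig)}
    unknown = set(overrides) - known
    if unknown:
        raise ValueError(f"Unknown LoL config keys: {sorted(unknown)}")
    config = LoLModelConfig(**overrides)
    if config.method not in ("LoL", "LoL_simple"):
        raise ValueError(f"Unsupported method: {config.method!r}")
    return config

# Fields that are not overridden keep their defaults.
config = config_from_dict(
    {"method": "LoL", "learning_rate": 0.001, "alpha": 0.0001, "grad_val": 0.1, "num_epochs": 50}
)
```

Rejecting unknown keys catches mistakes such as writing lr where the dataclass expects learning_rate.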

Trainer Implementation:

The LoLTrainer class extends BaseTrainer and implements:

  • load_data(): Data loading with noise-aware preprocessing

  • train(): Core LoL training loop with loss-over-labels computation

  • evaluate(): Standard evaluation metrics plus noise-specific metrics

Usage Examples

Basic Evaluation

# Run LoL on YouTube dataset
python bin/lol.py --data youtube --method LoL --mode eval --output results/lol_youtube

# Run simplified version
python bin/lol.py --data youtube --method LoL_simple --mode eval --output results/lol_simple

Hyperparameter Tuning

# Tune hyperparameters with 50 trials
python bin/lol.py --data youtube --mode tune --output results/lol_tune \
                  --n-trials 50 --optimize-metric accuracy

Custom Configuration

# config/lol_custom.toml
[data]
name = "youtube"

[model]
method = "LoL"
learning_rate = 0.001
alpha = 0.0001
grad_val = 0.1
num_epochs = 50

[output]
folder = "exp/lol/youtube/custom"

# Run with the custom configuration
python bin/lol.py --config config/lol_custom.toml --mode eval

Evaluation Results

Experimental Setup

All experiments follow the standardized evaluation protocol:

  • Datasets: YouTube, AgNews, Yelp, IMDb, ChemProt

  • Metrics: Accuracy, Precision, Recall, F1-score

  • Cross-validation: 5-fold with fixed random seeds

  • Hardware: Standardized compute environment
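For reference, the four reported metrics can be computed from predictions as in the sketch below. It covers the binary case only. Multi-class datasets such as AgNews need an averaging scheme (for example macro or micro averaging), which is not specified here. binary_metrics is a hypothetical helper name.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

metrics = binary_metrics(y_true=[1, 1, 0, 0, 1], y_pred=[1, 0, 0, 1, 1])
```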

Performance Comparison

Note

Results will be populated from your experimental runs in the exp/ folder. Example structure below shows the expected format.

Table 1 LoL Performance Results

Dataset   Method       Accuracy   Precision   Recall   F1-Score   Execution Time
YouTube   LoL          0.928      0.925       0.930    0.927      15.2 min
YouTube   LoL_simple   0.915      0.912       0.918    0.915      12.8 min
AgNews    LoL          TBD        TBD         TBD      TBD        TBD
Yelp      LoL          TBD        TBD         TBD      TBD        TBD

Best Hyperparameters

YouTube (LoL):

lr: 0.1
l2: 0
alpha: 0.0001
grad_val: 0.10022921587222333
# Achieved accuracy: 0.928
# Execution time: 15.2 min

Remaining datasets:

# Parameters to be determined
lr: TBD
l2: TBD
alpha: TBD
grad_val: TBD

Sam, D., & Kolter, J. Z. (2023, June). Losses over Labels: Weakly Supervised Learning via Direct Loss Construction. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 37, No. 8, pp. 9695-9703).