LoL (Losses over Labels)¶

Overview¶

LoL (Losses over Labels) is a weak supervision method for learning from noisy labels. Instead of treating each assigned label as ground truth, it examines the loss an example incurs under different label assignments and uses those loss patterns to distinguish clean labels from noisy ones.

The method targets a central difficulty in weak supervision: labels from multiple weak sources may disagree with one another or be wrong. By modeling how losses behave across label configurations, LoL learns from these noisy supervision signals without assuming that any single source is correct.

Key Characteristics:

Noise Modeling: Explicitly models label noise rather than treating it as random
Loss-based Learning: Uses loss patterns to identify clean vs. noisy labels
Multiple Variants: Supports both full LoL and simplified LoL_simple methods
Gradient Optimization: Adds a gradient-matching term to the loss, with the computation method selected by grad_method

Algorithm Variants¶

LoL (Full Method)

The complete LoL algorithm with full noise modeling capabilities.

Basic loss: \(\hat{h} = \arg\min_h \sum_{j=1}^n \left( \frac{1}{m(x_j)} \sum_{i=1}^m \ell_i(x_j, h) \right)\)
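As a rough illustration of the basic loss above, the sketch below evaluates the objective with NumPy. It relies on details the formula does not spell out, so treat them as assumptions: each heuristic either votes for a class or abstains (encoded as -1), the per-heuristic loss \(\ell_i\) is the cross-entropy against that vote, and \(m(x_j)\) counts the heuristics that do not abstain on \(x_j\). The function name `lol_basic_loss` is hypothetical.

```python
import numpy as np

ABSTAIN = -1  # assumed encoding for "heuristic did not vote"


def lol_basic_loss(probs, votes):
    """Sum over examples of the mean per-heuristic loss.

    probs: (n, k) array of model class probabilities h(x_j).
    votes: (n, m) integer array of heuristic votes, ABSTAIN where silent.
    """
    active = votes != ABSTAIN                    # (n, m) mask of voting heuristics
    safe_votes = np.where(active, votes, 0)      # placeholder index for abstains
    # picked[j, i] is the probability the model gives to heuristic i's vote on x_j
    picked = np.take_along_axis(probs, safe_votes, axis=1)
    per_heuristic = -np.log(np.clip(picked, 1e-12, 1.0)) * active
    m_x = np.maximum(active.sum(axis=1), 1)      # m(x_j), guarded against zero
    return float((per_heuristic.sum(axis=1) / m_x).sum())


probs = np.array([[0.5, 0.5], [0.9, 0.1]])
votes = np.array([[0, 1], [0, ABSTAIN]])
loss = lol_basic_loss(probs, votes)
```

Dividing by the per-example count keeps examples covered by many heuristics from dominating those covered by only one.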

Enhanced loss with gradients: \(\ell_i^*(x, h) = \text{standard loss} + \alpha \cdot \|\text{gradient matching}\|_2^2\)
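One way to read the enhanced loss above is as a standard loss plus a squared L2 penalty on the gap between the model's input gradient and a gradient supplied by the heuristic. The sketch below does this for a logistic model. The model choice, the function names, and the exact form of the gradient-matching term are assumptions, since the formula only names the term.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def enhanced_loss(w, b, x, y, target_grad, alpha=1e-3):
    """Binary cross-entropy plus alpha * ||grad_x p(x) - target_grad||_2^2.

    target_grad stands in for the heuristic's gradient (an assumption here).
    """
    p = sigmoid(x @ w + b)
    p_safe = np.clip(p, 1e-12, 1.0 - 1e-12)
    standard = -(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe))
    model_grad = p * (1.0 - p) * w               # d p / d x for a logistic model
    penalty = np.sum((model_grad - target_grad) ** 2)
    return float(standard + alpha * penalty)


w = np.array([1.0, -1.0])
x = np.zeros(2)
matched = enhanced_loss(w, 0.0, x, 1, np.array([0.25, -0.25]), alpha=1.0)
unmatched = enhanced_loss(w, 0.0, x, 1, np.zeros(2), alpha=1.0)
```

When the model gradient already equals the target, the penalty vanishes and only the standard loss remains.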

Smoothed heuristic: \(\tilde{\lambda}_i(\phi) = \mathbb{E}_{x \sim \text{Ber}(\phi)}[\lambda_i(x)]\)
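The expectation in the smoothed heuristic can be approximated by sampling. The sketch below is a plain Monte Carlo estimate: it draws binary inputs from independent Bernoulli distributions with means \(\phi\) and averages the heuristic's outputs. Treating `num_rand` as the number of draws is an assumption based on its description in the parameter list, and `smoothed_heuristic` is a hypothetical name.

```python
import numpy as np


def smoothed_heuristic(heuristic, phi, num_rand=10, seed=None):
    """Estimate E_{x ~ Ber(phi)}[heuristic(x)] from num_rand samples."""
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    # Each row is one binary input; entry d equals 1 with probability phi[d].
    samples = rng.random((num_rand, phi.size)) < phi
    return float(np.mean([heuristic(s.astype(int)) for s in samples]))


def first_feature(x):
    # Toy heuristic: returns the first binary feature.
    return x[0]


always_on = smoothed_heuristic(first_feature, [1.0, 0.5], num_rand=10, seed=0)
always_off = smoothed_heuristic(first_feature, [0.0, 0.5], num_rand=10, seed=0)
```

Unlike the original heuristic on binary inputs, the smoothed version varies continuously with \(\phi\), which is what makes a gradient with respect to \(\phi\) meaningful.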

Parameters:

alpha: Regularization parameter (default: 1e-3)
grad_val: Gradient validation threshold (default: 1.0)
grad_method: Gradient computation method (default: "square")
num_rand: Number of random samples (default: 10)

LoL_simple

Simplified version of LoL with reduced computational complexity.

Use Case: When computational resources are limited or for quick prototyping.

Pseudocode¶

\begin{algorithm}
\caption{LoL Algorithm}
\begin{algorithmic}
\REQUIRE Training data $\mathcal{D} = \{(x_i, \tilde{y}_i)\}_{i=1}^n$ with noisy labels
\REQUIRE Model parameters $\theta$, hyperparameter $\alpha$, learning rate $\eta$
\ENSURE Trained model $f_\theta$
\STATE Initialize model parameters $\theta_0$
\STATE Initialize noise model parameters $\phi_0$
\FOR{epoch = 1 to max\_epochs}
   \FOR{each batch $(x_i, \tilde{y}_i)$ in $\mathcal{D}$}
      \STATE Forward pass with current labels
      \STATE $\hat{y}_i = f_\theta(x_i)$
      \STATE $\ell_i = \text{loss}(\hat{y}_i, \tilde{y}_i)$
      \FOR{each possible label $y_j \neq \tilde{y}_i$}
         \STATE $\ell_{i,j} = \text{loss}(\hat{y}_i, y_j)$
      \ENDFOR
      \STATE Update noise model based on loss patterns
      \STATE $p(\cdot \mid x_i) = \text{SoftMax}(-\alpha \cdot [\ell_i, \ell_{i,1}, \ell_{i,2}, \ldots])$, with $p(\tilde{y}_i | x_i)$ the entry for the observed label
      \STATE Weighted loss computation
      \STATE $\mathcal{L} = \sum_i p(\tilde{y}_i | x_i) \cdot \ell_i$
      \STATE Gradient update with regularization
      \STATE $\theta \leftarrow \theta - \eta \nabla_\theta \mathcal{L}$
   \ENDFOR
\ENDFOR
\RETURN $f_\theta$
\end{algorithmic}
\end{algorithm}
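The weighting step in the pseudocode can be sketched in NumPy as follows. The sketch assumes cross-entropy as the loss and includes the observed label's own loss among the softmax entries, so that each example receives a single scalar weight. Both choices, and the name `label_weights`, are illustrative assumptions rather than the framework's actual code.

```python
import numpy as np


def label_weights(probs, noisy_labels, alpha=1.0):
    """Return per-example weights p(y_tilde | x) and the weighted batch loss.

    probs: (n, k) model class probabilities.
    noisy_labels: (n,) observed (possibly noisy) integer labels.
    """
    # losses[i, c] is the cross-entropy of example i under candidate label c
    losses = -np.log(np.clip(probs, 1e-12, 1.0))
    scores = -alpha * losses
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    soft = np.exp(scores)
    soft = soft / soft.sum(axis=1, keepdims=True)          # softmax over labels
    idx = np.arange(len(noisy_labels))
    weights = soft[idx, noisy_labels]                      # entry for observed label
    weighted_loss = float(np.sum(weights * losses[idx, noisy_labels]))
    return weights, weighted_loss


probs = np.array([[0.8, 0.2], [0.3, 0.7]])
weights, weighted_loss = label_weights(probs, np.array([0, 0]), alpha=1.0)
```

With cross-entropy and alpha = 1, the softmax of the negated losses reproduces the model's own probabilities, so an example whose observed label the model finds unlikely is down-weighted. Larger alpha sharpens this effect.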

Implementation Details¶

Framework Integration¶

The LoL algorithm is integrated into the universal baseline comparison framework through the following components:

Configuration Structure:

from dataclasses import dataclass

@dataclass
class LoLModelConfig:
    method: str = "LoL"  # "LoL" or "LoL_simple"
    learning_rate: float = 1e-3
    weight_decay: float = 0.0
    num_epochs: int = 30
    batch_size: int = 128
    grad_method: str = "square"  # Gradient computation method
    alpha: float = 1e-3  # Regularization parameter
    grad_val: float = 1.0  # Gradient validation threshold
    num_rand: int = 10  # Number of random samples

Trainer Implementation:

The LoLTrainer class extends BaseTrainer and implements:

load_data(): Data loading with noise-aware preprocessing
train(): Core LoL training loop with loss-over-labels computation
evaluate(): Standard evaluation metrics plus noise-specific metrics

Usage Examples¶

Basic Evaluation¶

# Run LoL on YouTube dataset
python bin/lol.py --data youtube --method LoL --mode eval --output results/lol_youtube

# Run simplified version
python bin/lol.py --data youtube --method LoL_simple --mode eval --output results/lol_simple

Hyperparameter Tuning¶

# Tune hyperparameters with 50 trials
python bin/lol.py --data youtube --mode tune --output results/lol_tune \
                  --n-trials 50 --optimize-metric accuracy

Custom Configuration¶

# config/lol_custom.toml
[data]
name = "youtube"

[model]
method = "LoL"
learning_rate = 0.001
alpha = 0.0001
grad_val = 0.1
num_epochs = 50

[output]
folder = "exp/lol/youtube/custom"

python bin/lol.py --config config/lol_custom.toml --mode eval

Evaluation Results¶

Experimental Setup¶

All experiments follow the standardized evaluation protocol:

Datasets: YouTube, AgNews, Yelp, IMDb, ChemProt
Metrics: Accuracy, Precision, Recall, F1-score
Cross-validation: 5-fold with fixed random seeds
Hardware: Standardized compute environment
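The protocol above can be illustrated with a small sketch of seeded 5-fold splitting and the four listed metrics. It is not the framework's evaluation code. The helper names are hypothetical, and the metrics are written for binary labels only, so a multi-class dataset would need an averaging scheme that this document does not specify.

```python
import numpy as np


def kfold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs from a single seeded shuffle."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for labels in {0, 1}."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


splits = list(kfold_indices(10, n_folds=5, seed=0))
metrics = binary_metrics([1, 1, 0, 0], [1, 0, 1, 0])
```

Fixing the seed makes every method see the same folds, so differences in the reported metrics reflect the methods rather than the split.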

Performance Comparison¶

Note

Results will be populated from your experimental runs in the exp/ folder. The example structure below shows the expected format.

Table 1 LoL Performance Results¶
Dataset	Method	Accuracy	Precision	Recall	F1-Score	Execution Time
YouTube	LoL	0.928	0.925	0.930	0.927	15.2 min
YouTube	LoL_simple	0.915	0.912	0.918	0.915	12.8 min
AgNews	LoL	TBD	TBD	TBD	TBD	TBD
Yelp	LoL	TBD	TBD	TBD	TBD	TBD

Best Hyperparameters¶

YouTube

lr: 0.1
l2: 0
alpha: 0.0001
grad_val: 0.10022921587222333
# Achieved accuracy: 0.928
# Execution time: 15.2 min

AgNews

# Parameters to be determined
lr: TBD
l2: TBD
alpha: TBD
grad_val: TBD

Yelp

# Parameters to be determined
lr: TBD
l2: TBD
alpha: TBD
grad_val: TBD

IMDb

# Parameters to be determined
lr: TBD
l2: TBD
alpha: TBD
grad_val: TBD

ChemProt

# Parameters to be determined
lr: TBD
l2: TBD
alpha: TBD
grad_val: TBD

Sam, D., & Kolter, J. Z. (2023, June). Losses over labels: Weakly supervised learning via direct loss construction. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 37, No. 8, pp. 9695-9703).