===============================
LoL (Losses over Labels)
===============================

.. meta::
   :description: Comprehensive documentation for the LoL (Losses over Labels) weak supervision algorithm
   :keywords: weak supervision, noisy labels, label noise, machine learning, LoL

Overview
========

**LoL (Losses over Labels)** is a weak supervision method designed to handle noisy labels by modeling the label noise distribution through losses taken over candidate labels. The algorithm learns to distinguish clean from noisy labels by analyzing how the loss behaves across different label assignments.

The method addresses a fundamental challenge in weak supervision: labels produced by multiple weak sources may be inconsistent or incorrect. By modeling how losses behave over different label configurations, LoL can learn effectively from noisy supervision signals.

**Key Characteristics:**

* **Noise Modeling**: Explicitly models label noise rather than treating it as random
* **Loss-based Learning**: Uses loss patterns to identify clean vs. noisy labels
* **Multiple Variants**: Supports both the full LoL method and the simplified ``LoL_simple`` variant
* **Gradient Optimization**: Employs gradient-based optimization with a configurable gradient computation method

Algorithm Variants
==================

.. tab-set::

   .. tab-item:: LoL (Full Method)

      The complete LoL algorithm with full noise modeling capabilities.

      Basic loss:

      .. math::

         \hat{h} = \arg\min_h \sum_{j=1}^n \left( \frac{1}{m(x_j)} \sum_{i=1}^m \ell_i(x_j, h) \right)

      Enhanced loss with gradients:

      .. math::

         \ell_i^*(x, h) = \ell_i(x, h) + \alpha \cdot \lVert \text{gradient matching} \rVert_2^2

      Smoothed heuristic:

      .. math::

         \tilde{\lambda}_i(\phi) = \mathbb{E}_{x \sim \mathrm{Ber}(\phi)}[\lambda_i(x)]

      **Parameters:**

      * ``alpha``: Regularization parameter (default: ``1e-3``)
      * ``grad_val``: Gradient validation threshold (default: ``1.0``)
      * ``grad_method``: Gradient computation method (default: ``"square"``)
      * ``num_rand``: Number of random samples (default: ``10``)

   .. tab-item:: LoL_simple

      Simplified version of LoL with reduced computational complexity.

      **Use Case:** When computational resources are limited or for quick prototyping.

Pseudocode
==========

.. pcode::
   :linenos:

   \begin{algorithm}
   \caption{LoL Algorithm}
   \begin{algorithmic}
   \REQUIRE Training data $\mathcal{D} = \{(x_i, \tilde{y}_i)\}_{i=1}^n$ with noisy labels
   \REQUIRE Model parameters $\theta$, hyperparameters $\alpha$, $\eta$
   \ENSURE Trained model $f_\theta$
   \STATE Initialize model parameters $\theta_0$
   \STATE Initialize noise model parameters $\phi_0$
   \FOR{epoch $= 1$ to max\_epochs}
       \FOR{each batch $(x_i, \tilde{y}_i)$ in $\mathcal{D}$}
           \STATE Forward pass with current labels:
           \STATE $\hat{y}_i = f_\theta(x_i)$
           \STATE $\ell_i = \mathrm{loss}(\hat{y}_i, \tilde{y}_i)$
           \FOR{each possible label $y_j \neq \tilde{y}_i$}
               \STATE $\ell_{i,j} = \mathrm{loss}(\hat{y}_i, y_j)$
           \ENDFOR
           \STATE Update noise model based on loss patterns:
           \STATE $p(\tilde{y}_i \mid x_i) = \mathrm{softmax}(-\alpha \cdot [\ell_i, \ell_{i,1}, \ell_{i,2}, \ldots])_{\tilde{y}_i}$
           \STATE Weighted loss computation:
           \STATE $\mathcal{L} = \sum_i p(\tilde{y}_i \mid x_i) \cdot \ell_i$
           \STATE Gradient update with regularization:
           \STATE $\theta \leftarrow \theta - \eta \nabla_\theta \mathcal{L}$
       \ENDFOR
   \ENDFOR
   \RETURN $f_\theta$
   \end{algorithmic}
   \end{algorithm}
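To make the weighting step concrete, the following is a minimal, illustrative PyTorch sketch of the inner loop above: compute the loss under every candidate label, turn the negated losses into soft label weights via a softmax, and scale the loss on the observed noisy label by its weight. The helper name ``lol_weighted_loss`` and the exact weighting choices are assumptions made for illustration, not the reference implementation from the paper or the ``LoLTrainer`` code.

.. code-block:: python

   import torch
   import torch.nn.functional as F

   def lol_weighted_loss(logits: torch.Tensor,
                         noisy_labels: torch.Tensor,
                         alpha: float = 1e-3) -> torch.Tensor:
       """Illustrative loss-over-labels step (hypothetical helper)."""
       # Cross-entropy loss of each example under every candidate label y_j:
       # shape (batch, num_classes).
       losses_over_labels = -F.log_softmax(logits, dim=1)

       # Soft weights over candidate labels: labels whose loss is already small
       # are treated as more likely to be clean. Detached so the weights act as
       # fixed coefficients during the model update (one possible design choice).
       label_weights = F.softmax(-alpha * losses_over_labels, dim=1).detach()

       # Weight of the observed noisy label, i.e. p(y~_i | x_i) in the pseudocode.
       w = label_weights.gather(1, noisy_labels.unsqueeze(1)).squeeze(1)

       # Loss on the observed noisy labels, scaled by the per-example weights.
       per_example = F.cross_entropy(logits, noisy_labels, reduction="none")
       return (w * per_example).mean()

In the full method the per-label losses would additionally carry the gradient-based term :math:`\ell_i^*` described above; only the basic softmax weighting is shown here.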
Implementation Details
======================

Framework Integration
---------------------

The LoL algorithm is integrated into the universal baseline comparison framework through the following components:

**Configuration Structure**:

.. code-block:: python

   from dataclasses import dataclass

   @dataclass
   class LoLModelConfig:
       method: str = "LoL"          # "LoL" or "LoL_simple"
       learning_rate: float = 1e-3
       weight_decay: float = 0.0
       num_epochs: int = 30
       batch_size: int = 128
       grad_method: str = "square"  # Gradient computation method
       alpha: float = 1e-3          # Noise modeling parameter
       grad_val: float = 1.0        # Gradient validation threshold
       num_rand: int = 10           # Random sampling parameter

**Trainer Implementation**:

The ``LoLTrainer`` class extends ``BaseTrainer`` and implements:

* ``load_data()``: Data loading with noise-aware preprocessing
* ``train()``: Core LoL training loop with the loss-over-labels computation
* ``evaluate()``: Standard evaluation metrics plus noise-specific metrics

Usage Examples
==============

Basic Evaluation
----------------

.. code-block:: bash

   # Run LoL on the YouTube dataset
   python bin/lol.py --data youtube --method LoL --mode eval --output results/lol_youtube

   # Run the simplified variant
   python bin/lol.py --data youtube --method LoL_simple --mode eval --output results/lol_simple

Hyperparameter Tuning
---------------------

.. code-block:: bash

   # Tune hyperparameters with 50 trials
   python bin/lol.py --data youtube --mode tune --output results/lol_tune \
       --n-trials 50 --optimize-metric accuracy

Custom Configuration
--------------------

.. code-block:: toml

   # config/lol_custom.toml
   [data]
   name = "youtube"

   [model]
   method = "LoL"
   learning_rate = 0.001
   alpha = 0.0001
   grad_val = 0.1
   num_epochs = 50

   [output]
   folder = "exp/lol/youtube/custom"

.. code-block:: bash

   python bin/lol.py --config config/lol_custom.toml --mode eval

Evaluation Results
==================

Experimental Setup
------------------

All experiments follow the standardized evaluation protocol:

* **Datasets**: YouTube, AgNews, Yelp, IMDb, ChemProt
* **Metrics**: Accuracy, Precision, Recall, F1-score
* **Cross-validation**: 5-fold with fixed random seeds
* **Hardware**: Standardized compute environment

Performance Comparison
----------------------

.. note::

   Results will be populated from your experimental runs in the ``exp/`` folder.
   The rows below show the expected format.

.. csv-table:: LoL Performance Results
   :header: "Dataset", "Method", "Accuracy", "Precision", "Recall", "F1-Score", "Execution Time"
   :widths: 22, 24, 19, 19, 19, 19, 22

   "YouTube", "LoL", "0.928", "0.925", "0.930", "0.927", "15.2 min"
   "YouTube", "LoL_simple", "0.915", "0.912", "0.918", "0.915", "12.8 min"
   "AgNews", "LoL", "TBD", "TBD", "TBD", "TBD", "TBD"
   "Yelp", "LoL", "TBD", "TBD", "TBD", "TBD", "TBD"

Best Hyperparameters
--------------------

.. tab-set::

   .. tab-item:: YouTube

      .. code-block:: yaml

         lr: 0.1
         l2: 0
         alpha: 0.0001
         grad_val: 0.10022921587222333
         # Achieved accuracy: 0.928
         # Execution time: 15.2 min

   .. tab-item:: AgNews

      .. code-block:: yaml

         # Parameters to be determined
         lr: TBD
         l2: TBD
         alpha: TBD
         grad_val: TBD

   .. tab-item:: Yelp

      .. code-block:: yaml

         # Parameters to be determined
         lr: TBD
         l2: TBD
         alpha: TBD
         grad_val: TBD

   .. tab-item:: IMDb

      .. code-block:: yaml

         # Parameters to be determined
         lr: TBD
         l2: TBD
         alpha: TBD
         grad_val: TBD

   .. tab-item:: ChemProt

      .. code-block:: yaml

         # Parameters to be determined
         lr: TBD
         l2: TBD
         alpha: TBD
         grad_val: TBD

.. cite::
   Sam, D., & Kolter, J. Z. (2023). Losses over labels: Weakly supervised
   learning via direct loss construction. In *Proceedings of the AAAI
   Conference on Artificial Intelligence*, 37(8), 9695-9703.