===============================
LoL (Losses over Labels)
===============================

.. meta::
   :description: Comprehensive documentation for the LoL (Losses over Labels) weak supervision algorithm
   :keywords: weak supervision, noisy labels, label noise, machine learning, LoL

Overview
========

**LoL (Losses over Labels)** is a weak supervision method designed to handle noisy labels by modeling the label noise distribution through losses taken over candidate labels. The algorithm learns to distinguish clean from noisy labels by analyzing how the loss behaves across different label assignments.

The method addresses a fundamental challenge in weak supervision: labels produced by multiple weak sources may be inconsistent or incorrect. By modeling how losses behave over different label configurations, LoL can learn effectively from noisy supervision signals.

**Key Characteristics:**

* **Noise Modeling**: Explicitly models label noise rather than treating it as random
* **Loss-based Learning**: Uses loss patterns to identify clean vs. noisy labels
* **Multiple Variants**: Supports both the full LoL method and the simplified ``LoL_simple`` variant
* **Gradient Optimization**: Employs gradient-based optimization with a configurable gradient computation method

Algorithm Variants
==================

.. tab-set::

   .. tab-item:: LoL (Full Method)

      The complete LoL algorithm with full noise modeling capabilities.

      Basic loss:

      .. math::

         \hat{h} = \arg\min_h \sum_{j=1}^n \left( \frac{1}{m(x_j)} \sum_{i=1}^m \ell_i(x_j, h) \right)

      Enhanced loss with gradients:

      .. math::

         \ell_i^*(x, h) = \ell_i(x, h) + \alpha \cdot \lVert \text{gradient matching} \rVert_2^2

      Smoothed heuristic:

      .. math::

         \tilde{\lambda}_i(\phi) = \mathbb{E}_{x \sim \mathrm{Ber}(\phi)}[\lambda_i(x)]

      **Parameters:**

      * ``alpha``: Regularization parameter (default: ``1e-3``)
      * ``grad_val``: Gradient validation threshold (default: ``1.0``)
      * ``grad_method``: Gradient computation method (default: ``"square"``)
      * ``num_rand``: Number of random samples (default: ``10``)

   .. tab-item:: LoL_simple

      Simplified version of LoL with reduced computational complexity.

      **Use Case:** When computational resources are limited or for quick prototyping.

Pseudocode
==========

.. pcode::
   :linenos:

   \begin{algorithm}
   \caption{LoL Algorithm}
   \begin{algorithmic}
   \REQUIRE Training data $\mathcal{D} = \{(x_i, \tilde{y}_i)\}_{i=1}^n$ with noisy labels
   \REQUIRE Model parameters $\theta$, hyperparameters $\alpha$, $\eta$
   \ENSURE Trained model $f_\theta$
   \STATE Initialize model parameters $\theta_0$
   \STATE Initialize noise model parameters $\phi_0$
   \FOR{epoch $= 1$ to max\_epochs}
       \FOR{each batch $(x_i, \tilde{y}_i)$ in $\mathcal{D}$}
           \STATE Forward pass with current labels:
           \STATE $\hat{y}_i = f_\theta(x_i)$
           \STATE $\ell_i = \mathrm{loss}(\hat{y}_i, \tilde{y}_i)$
           \FOR{each possible label $y_j \neq \tilde{y}_i$}
               \STATE $\ell_{i,j} = \mathrm{loss}(\hat{y}_i, y_j)$
           \ENDFOR
           \STATE Update noise model based on loss patterns:
           \STATE $p(\tilde{y}_i \mid x_i) = \mathrm{softmax}(-\alpha \cdot [\ell_i, \ell_{i,1}, \ell_{i,2}, \ldots])_{\tilde{y}_i}$
           \STATE Weighted loss computation:
           \STATE $\mathcal{L} = \sum_i p(\tilde{y}_i \mid x_i) \cdot \ell_i$
           \STATE Gradient update with regularization:
           \STATE $\theta \leftarrow \theta - \eta \nabla_\theta \mathcal{L}$
       \ENDFOR
   \ENDFOR
   \RETURN $f_\theta$
   \end{algorithmic}
   \end{algorithm}
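To make the weighting step concrete, the following is a minimal, illustrative PyTorch sketch of the inner loop above: compute the loss under every candidate label, turn the negated losses into soft label weights via a softmax, and scale the loss on the observed noisy label by its weight. The helper name ``lol_weighted_loss`` and the exact weighting choices are assumptions made for illustration, not the reference implementation from the paper or the ``LoLTrainer`` code.

.. code-block:: python

   import torch
   import torch.nn.functional as F

   def lol_weighted_loss(logits: torch.Tensor,
                         noisy_labels: torch.Tensor,
                         alpha: float = 1e-3) -> torch.Tensor:
       """Illustrative loss-over-labels step (hypothetical helper)."""
       # Cross-entropy loss of each example under every candidate label y_j:
       # shape (batch, num_classes).
       losses_over_labels = -F.log_softmax(logits, dim=1)

       # Soft weights over candidate labels: labels whose loss is already small
       # are treated as more likely to be clean. Detached so the weights act as
       # fixed coefficients during the model update (one possible design choice).
       label_weights = F.softmax(-alpha * losses_over_labels, dim=1).detach()

       # Weight of the observed noisy label, i.e. p(y~_i | x_i) in the pseudocode.
       w = label_weights.gather(1, noisy_labels.unsqueeze(1)).squeeze(1)

       # Loss on the observed noisy labels, scaled by the per-example weights.
       per_example = F.cross_entropy(logits, noisy_labels, reduction="none")
       return (w * per_example).mean()

In the full method the per-label losses would additionally carry the gradient-based term :math:`\ell_i^*` described above; only the basic softmax weighting is shown here.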
Implementation Details
======================

Framework Integration
---------------------

The LoL algorithm is integrated into the universal baseline comparison framework through the following components:

**Configuration Structure**:

.. code-block:: python

   from dataclasses import dataclass

   @dataclass
   class LoLModelConfig:
       method: str = "LoL"          # "LoL" or "LoL_simple"
       learning_rate: float = 1e-3
       weight_decay: float = 0.0
       num_epochs: int = 30
       batch_size: int = 128
       grad_method: str = "square"  # Gradient computation method
       alpha: float = 1e-3          # Noise modeling parameter
       grad_val: float = 1.0        # Gradient validation threshold
       num_rand: int = 10           # Random sampling parameter

**Trainer Implementation**:

The ``LoLTrainer`` class extends ``BaseTrainer`` and implements:

* ``load_data()``: Data loading with noise-aware preprocessing
* ``train()``: Core LoL training loop with the loss-over-labels computation
* ``evaluate()``: Standard evaluation metrics plus noise-specific metrics

Usage Examples
==============

Basic Evaluation
----------------

.. code-block:: bash

   # Run LoL on the YouTube dataset
   python bin/lol.py --data youtube --method LoL --mode eval --output results/lol_youtube

   # Run the simplified variant
   python bin/lol.py --data youtube --method LoL_simple --mode eval --output results/lol_simple

Hyperparameter Tuning
---------------------

.. code-block:: bash

   # Tune hyperparameters with 50 trials
   python bin/lol.py --data youtube --mode tune --output results/lol_tune \
       --n-trials 50 --optimize-metric accuracy

Custom Configuration
--------------------

.. code-block:: toml

   # config/lol_custom.toml
   [data]
   name = "youtube"

   [model]
   method = "LoL"
   learning_rate = 0.001
   alpha = 0.0001
   grad_val = 0.1
   num_epochs = 50

   [output]
   folder = "exp/lol/youtube/custom"

.. code-block:: bash

   python bin/lol.py --config config/lol_custom.toml --mode eval

Evaluation Results
==================

Experimental Setup
------------------

All experiments follow the standardized evaluation protocol:

* **Datasets**: YouTube, AgNews, Yelp, IMDb, ChemProt
* **Metrics**: Accuracy, Precision, Recall, F1-score
* **Cross-validation**: 5-fold with fixed random seeds
* **Hardware**: Standardized compute environment

Performance Comparison
----------------------

.. note::

   Results will be populated from your experimental runs in the ``exp/`` folder.
   The rows below show the expected format.

.. csv-table:: LoL Performance Results
   :header: "Dataset", "Method", "Accuracy", "Precision", "Recall", "F1-Score", "Execution Time"
   :widths: 22, 24, 19, 19, 19, 19, 22

   "YouTube", "LoL", "0.928", "0.925", "0.930", "0.927", "15.2 min"
   "YouTube", "LoL_simple", "0.915", "0.912", "0.918", "0.915", "12.8 min"
   "AgNews", "LoL", "TBD", "TBD", "TBD", "TBD", "TBD"
   "Yelp", "LoL", "TBD", "TBD", "TBD", "TBD", "TBD"

Best Hyperparameters
--------------------

.. tab-set::

   .. tab-item:: YouTube

      .. code-block:: yaml

         lr: 0.1
         l2: 0
         alpha: 0.0001
         grad_val: 0.10022921587222333
         # Achieved accuracy: 0.928
         # Execution time: 15.2 min

   .. tab-item:: AgNews

      .. code-block:: yaml

         # Parameters to be determined
         lr: TBD
         l2: TBD
         alpha: TBD
         grad_val: TBD

   .. tab-item:: Yelp

      .. code-block:: yaml

         # Parameters to be determined
         lr: TBD
         l2: TBD
         alpha: TBD
         grad_val: TBD

   .. tab-item:: IMDb

      .. code-block:: yaml

         # Parameters to be determined
         lr: TBD
         l2: TBD
         alpha: TBD
         grad_val: TBD

   .. tab-item:: ChemProt

      .. code-block:: yaml

         # Parameters to be determined
         lr: TBD
         l2: TBD
         alpha: TBD
         grad_val: TBD

.. cite::
   Sam, D., & Kolter, J. Z. (2023). Losses over labels: Weakly supervised
   learning via direct loss construction. In *Proceedings of the AAAI
   Conference on Artificial Intelligence*, 37(8), 9695-9703.