src.common.Evaluator

class src.common.Evaluator[source]

Bases: BaseEvaluator

Extended evaluator with weak-supervision-specific metrics

Methods

compute_adjusted_accuracy(soft_predictions, ...)

Compute accuracy that handles abstaining predictions

compute_confusion_matrix(predictions, labels)

Compute confusion matrix

compute_coverage(predictions[, ...])

Compute coverage (percentage of non-abstaining predictions)

compute_coverage_metrics(prediction_probs[, ...])

Compute coverage and confidence metrics from prediction probabilities

compute_metrics(predictions, labels[, average])

Compute standard classification metrics with optional weak supervision metrics

compute_weak_supervision_metrics(...)

Compute metrics specific to weak supervision using Snorkel's LFAnalysis

evaluate_weak_supervision_method(results, labels)

Comprehensive evaluation for weak supervision methods

static compute_adjusted_accuracy(soft_predictions, labels)[source]

Compute accuracy that handles abstaining predictions

Args:

soft_predictions: Soft prediction probabilities
labels: True labels

Returns:

Adjusted accuracy

Parameters:
  • soft_predictions (numpy.ndarray)

  • labels (numpy.ndarray)

Return type:

float
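
Example:

A minimal usage sketch; the array shapes and values below are illustrative assumptions, and how near-uniform rows are treated as abstentions is up to the implementation:

    import numpy as np
    from src.common import Evaluator

    # Assumed layout: (n_samples, n_classes) row-wise probabilities
    soft_preds = np.array([[0.8, 0.2],
                           [0.1, 0.9],
                           [0.5, 0.5]])  # near-uniform row may be treated as abstaining
    labels = np.array([0, 1, 0])

    acc = Evaluator.compute_adjusted_accuracy(soft_preds, labels)  # returns a float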

static compute_confusion_matrix(predictions, labels)[source]

Compute confusion matrix

Args:

predictions: Predicted labels
labels: True labels

Returns:

Confusion matrix

Parameters:
  • predictions (numpy.ndarray)

  • labels (numpy.ndarray)

Return type:

numpy.ndarray
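
Example:

A minimal usage sketch with hard label arrays; the row/column orientation of the returned matrix is not specified here:

    import numpy as np
    from src.common import Evaluator

    preds = np.array([0, 1, 1, 0])
    labels = np.array([0, 1, 0, 0])

    cm = Evaluator.compute_confusion_matrix(preds, labels)  # numpy.ndarray of per-class counts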

static compute_coverage(predictions, abstain_threshold=0.001)[source]

Compute coverage (percentage of non-abstaining predictions)

Args:

predictions: Soft predictions or confidence scores
abstain_threshold: Threshold below which predictions are considered abstaining

Returns:

Coverage ratio

Parameters:
  • predictions (numpy.ndarray)

  • abstain_threshold (float)

Return type:

float
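
Example:

A minimal usage sketch; the input is assumed to be per-example confidence scores, with values at or below abstain_threshold counted as abstentions:

    import numpy as np
    from src.common import Evaluator

    scores = np.array([0.95, 0.0, 0.6, 0.0005])

    coverage = Evaluator.compute_coverage(scores, abstain_threshold=0.001)  # float in [0, 1]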

static compute_coverage_metrics(prediction_probs, confidence_threshold=0.5)[source]

Compute coverage and confidence metrics from prediction probabilities

Args:

prediction_probs: Prediction probabilities/logits
confidence_threshold: Threshold for confident predictions

Returns:

Dictionary of coverage metrics

Parameters:
  • prediction_probs (numpy.ndarray)

  • confidence_threshold (float)

Return type:

Dict[str, float]
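
Example:

A minimal usage sketch; the probability layout is an assumption, and the keys of the returned dictionary are defined by the implementation:

    import numpy as np
    from src.common import Evaluator

    # Assumed layout: (n_samples, n_classes) probabilities or logits
    probs = np.array([[0.90, 0.10],
                      [0.55, 0.45],
                      [0.50, 0.50]])

    metrics = Evaluator.compute_coverage_metrics(probs, confidence_threshold=0.5)
    # metrics is a Dict[str, float] of coverage/confidence statistics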

static compute_metrics(predictions, labels, average='weighted', **kwargs)[source]

Compute standard classification metrics with optional weak supervision metrics

Args:

predictions: Predicted labels
labels: True labels
average: Averaging strategy for multi-class metrics
kwargs: Additional arguments (weak_labels, prediction_probs, loss_info, etc.)

Returns:

Dictionary of metrics

Parameters:
  • predictions (numpy.ndarray)

  • labels (numpy.ndarray)

  • average (str)

Return type:

Dict[str, float]
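
Example:

A minimal usage sketch; the weak-supervision kwargs shown in the comment follow the argument names listed above, but their expected shapes are assumptions:

    import numpy as np
    from src.common import Evaluator

    preds = np.array([0, 1, 1, 0])
    labels = np.array([0, 1, 0, 0])

    # Standard classification metrics only
    metrics = Evaluator.compute_metrics(preds, labels, average='weighted')

    # Optional weak-supervision inputs can be passed via kwargs, e.g.
    # Evaluator.compute_metrics(preds, labels, weak_labels=L, prediction_probs=probs)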

static compute_weak_supervision_metrics(predictions, labels, weak_labels)[source]

Compute metrics specific to weak supervision using Snorkel’s LFAnalysis

Args:

predictions: Model predictions
labels: True labels
weak_labels: Weak supervision labels (L matrix from Snorkel)

Returns:

Dictionary of weak supervision metrics

Parameters:
  • predictions (numpy.ndarray)

  • labels (numpy.ndarray)

  • weak_labels (numpy.ndarray)

Return type:

Dict[str, float]
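
Example:

A usage sketch; the L-matrix layout (one column per labeling function, -1 for abstain) follows the usual Snorkel convention and is an assumption here:

    import numpy as np
    from src.common import Evaluator

    preds = np.array([0, 1, 1])
    labels = np.array([0, 1, 0])

    # Assumed shape: (n_samples, n_labeling_functions), -1 = abstain
    weak_labels = np.array([[ 0, -1,  0],
                            [ 1,  1, -1],
                            [-1,  0,  1]])

    ws_metrics = Evaluator.compute_weak_supervision_metrics(preds, labels, weak_labels)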

static evaluate_weak_supervision_method(results, labels, method_name='unknown')[source]

Comprehensive evaluation for weak supervision methods

Args:

results: Dictionary containing predictions and other method outputs
labels: True labels
method_name: Name of the method being evaluated

Returns:

Comprehensive evaluation dictionary

Parameters:
  • results (Dict[str, Any])

  • labels (numpy.ndarray)

  • method_name (str)

Return type:

Dict[str, Any]
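
Example:

A usage sketch; the exact keys expected in results are defined by the implementation, so the "predictions" entry and the method name below are assumptions:

    import numpy as np
    from src.common import Evaluator

    labels = np.array([0, 1, 0, 1])
    results = {"predictions": np.array([0, 1, 1, 1])}  # assumed key

    report = Evaluator.evaluate_weak_supervision_method(
        results, labels, method_name="snorkel_label_model"
    )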