src.common.Evaluator
- class src.common.Evaluator
Bases: BaseEvaluator
Extended evaluator with weak-supervision-specific metrics
Methods

compute_adjusted_accuracy(soft_predictions, ...)
    Compute accuracy that handles abstaining predictions
compute_confusion_matrix(predictions, labels)
    Compute confusion matrix
compute_coverage(predictions[, ...])
    Compute coverage (percentage of non-abstaining predictions)
compute_coverage_metrics(prediction_probs[, ...])
    Compute coverage and confidence metrics from prediction probabilities
compute_metrics(predictions, labels[, average])
    Compute standard classification metrics with optional weak supervision metrics
compute_weak_supervision_metrics(predictions, labels, weak_labels)
    Compute metrics specific to weak supervision using Snorkel's LFAnalysis
evaluate_weak_supervision_method(results, labels)
    Comprehensive evaluation for weak supervision methods
- static compute_adjusted_accuracy(soft_predictions, labels)
Compute accuracy that handles abstaining predictions
- Args:
soft_predictions: Soft prediction probabilities
labels: True labels
- Returns:
Adjusted accuracy
- Parameters:
soft_predictions (numpy.ndarray)
labels (numpy.ndarray)
- Return type:
float
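A minimal usage sketch; the example array values, and the assumption that soft_predictions is an (n_samples, n_classes) probability matrix, are illustrative only, and how the method detects abstentions is internal to the implementation:

    import numpy as np
    from src.common import Evaluator

    # One probability row per sample (shape assumed from the parameter types).
    soft_predictions = np.array([
        [0.9, 0.1],   # confident class 0
        [0.2, 0.8],   # confident class 1
        [0.5, 0.5],   # ambiguous; may be treated as abstaining
    ])
    labels = np.array([0, 1, 0])

    acc = Evaluator.compute_adjusted_accuracy(soft_predictions, labels)
    print(f"adjusted accuracy: {acc:.3f}")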
- static compute_confusion_matrix(predictions, labels)
Compute confusion matrix
- Args:
predictions: Predicted labels
labels: True labels
- Returns:
Confusion matrix
- Parameters:
predictions (numpy.ndarray)
labels (numpy.ndarray)
- Return type:
numpy.ndarray
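A short usage sketch with made-up hard predictions and labels; the row/column ordering of the returned matrix follows the underlying implementation:

    import numpy as np
    from src.common import Evaluator

    predictions = np.array([0, 1, 1, 0, 1])
    labels      = np.array([0, 1, 0, 0, 1])

    cm = Evaluator.compute_confusion_matrix(predictions, labels)
    print(cm)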
- static compute_coverage(predictions, abstain_threshold=0.001)
Compute coverage (percentage of non-abstaining predictions)
- Args:
predictions: Soft predictions or confidence scores
abstain_threshold: Threshold below which predictions are considered abstaining
- Returns:
Coverage ratio
- Parameters:
predictions (numpy.ndarray)
abstain_threshold (float)
- Return type:
float
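A minimal sketch, assuming a 1-D array of per-sample confidence scores; the exact comparison against abstain_threshold is defined by the implementation:

    import numpy as np
    from src.common import Evaluator

    # Per-sample confidence scores (illustrative values).
    confidences = np.array([0.95, 0.0005, 0.7, 0.0, 0.4])

    coverage = Evaluator.compute_coverage(confidences, abstain_threshold=0.001)
    print(f"coverage: {coverage:.2f}")  # fraction of non-abstaining predictions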
- static compute_coverage_metrics(prediction_probs, confidence_threshold=0.5)
Compute coverage and confidence metrics from prediction probabilities
- Args:
prediction_probs: Prediction probabilities/logits
confidence_threshold: Threshold for confident predictions
- Returns:
Dictionary of coverage metrics
- Parameters:
prediction_probs (numpy.ndarray)
confidence_threshold (float)
- Return type:
Dict[str, float]
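A usage sketch with an assumed (n_samples, n_classes) probability matrix; the keys of the returned dictionary are not listed here and depend on the implementation:

    import numpy as np
    from src.common import Evaluator

    prediction_probs = np.array([
        [0.85, 0.15],
        [0.55, 0.45],
        [0.30, 0.70],
    ])

    metrics = Evaluator.compute_coverage_metrics(prediction_probs, confidence_threshold=0.6)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")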
- static compute_metrics(predictions, labels, average='weighted', **kwargs)
Compute standard classification metrics with optional weak supervision metrics
- Args:
predictions: Predicted labels
labels: True labels
average: Averaging strategy for multi-class metrics
kwargs: Additional arguments (weak_labels, prediction_probs, loss_info, etc.)
- Returns:
Dictionary of metrics
- Parameters:
predictions (numpy.ndarray)
labels (numpy.ndarray)
average (str)
- Return type:
Dict[str, float]
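A minimal call with hard predictions and labels; passing weak_labels or prediction_probs through kwargs, as the docstring suggests, would add the optional weak supervision metrics:

    import numpy as np
    from src.common import Evaluator

    predictions = np.array([0, 1, 2, 1, 0])
    labels      = np.array([0, 1, 1, 1, 0])

    # Standard classification metrics with the default weighted averaging.
    metrics = Evaluator.compute_metrics(predictions, labels)
    print(metrics)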
- static compute_weak_supervision_metrics(predictions, labels, weak_labels)
Compute metrics specific to weak supervision using Snorkel's LFAnalysis
- Args:
predictions: Model predictions
labels: True labels
weak_labels: Weak supervision labels (L matrix from Snorkel)
- Returns:
Dictionary of weak supervision metrics
- Parameters:
predictions (numpy.ndarray)
labels (numpy.ndarray)
weak_labels (numpy.ndarray)
- Return type:
Dict[str, float]
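A usage sketch assuming the standard Snorkel L-matrix convention (one column per labeling function, -1 for abstain); the example data is illustrative:

    import numpy as np
    from src.common import Evaluator

    predictions = np.array([0, 1, 1, 0])
    labels      = np.array([0, 1, 0, 0])

    # L matrix: rows are samples, columns are labeling functions, -1 = abstain.
    weak_labels = np.array([
        [ 0, -1,  0],
        [ 1,  1, -1],
        [-1,  1,  0],
        [ 0,  0, -1],
    ])

    ws_metrics = Evaluator.compute_weak_supervision_metrics(predictions, labels, weak_labels)
    print(ws_metrics)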
- static evaluate_weak_supervision_method(results, labels, method_name='unknown')
Comprehensive evaluation for weak supervision methods
- Args:
results: Dictionary containing predictions and other method outputs
labels: True labels
method_name: Name of the method being evaluated
- Returns:
Comprehensive evaluation dictionary
- Parameters:
results (Dict[str, Any])
labels (numpy.ndarray)
method_name (str)
- Return type:
Dict[str, Any]
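A sketch of a possible call; the docstring only says that results contains predictions and other method outputs, so the key names used below ("predictions", "prediction_probs") are assumptions rather than a documented contract:

    import numpy as np
    from src.common import Evaluator

    labels = np.array([0, 1, 1, 0])

    results = {
        "predictions": np.array([0, 1, 0, 0]),
        "prediction_probs": np.array([[0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.9, 0.1]]),
    }

    report = Evaluator.evaluate_weak_supervision_method(results, labels, method_name="majority_vote")
    print(report)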