src.common.Evaluator

class src.common.Evaluator[source]

Bases: BaseEvaluator

Extended evaluator with weak-supervision-specific metrics

Methods

compute_adjusted_accuracy(soft_predictions, ...)

Compute accuracy that handles abstaining predictions

compute_confusion_matrix(predictions, labels)

Compute confusion matrix

compute_coverage(predictions[, ...])

Compute coverage (percentage of non-abstaining predictions)

compute_coverage_metrics(prediction_probs[, ...])

Compute coverage and confidence metrics from prediction probabilities

compute_metrics(predictions, labels[, average])

Compute standard classification metrics with optional weak supervision metrics

compute_weak_supervision_metrics(...)

Compute metrics specific to weak supervision using Snorkel's LFAnalysis

evaluate_weak_supervision_method(results, labels)

Comprehensive evaluation for weak supervision methods

static compute_adjusted_accuracy(soft_predictions, labels)[source]

Compute accuracy that handles abstaining predictions

Args:

soft_predictions: Soft prediction probabilities
labels: True labels

Returns:

Adjusted accuracy

Parameters:
  • soft_predictions (numpy.ndarray)

  • labels (numpy.ndarray)

Return type:

float
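
Example:

A minimal usage sketch; the array shapes and values below are illustrative assumptions, and how near-uniform rows are treated as abstentions is up to the implementation:

    import numpy as np
    from src.common import Evaluator

    # Assumed layout: (n_samples, n_classes) row-wise probabilities
    soft_preds = np.array([[0.8, 0.2],
                           [0.1, 0.9],
                           [0.5, 0.5]])  # near-uniform row may be treated as abstaining
    labels = np.array([0, 1, 0])

    acc = Evaluator.compute_adjusted_accuracy(soft_preds, labels)  # returns a float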

static compute_confusion_matrix(predictions, labels)[source]

Compute confusion matrix

Args:

predictions: Predicted labels
labels: True labels

Returns:

Confusion matrix

Parameters:
  • predictions (numpy.ndarray)

  • labels (numpy.ndarray)

Return type:

numpy.ndarray
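
Example:

A minimal usage sketch with hard label arrays; the row/column orientation of the returned matrix is not specified here:

    import numpy as np
    from src.common import Evaluator

    preds = np.array([0, 1, 1, 0])
    labels = np.array([0, 1, 0, 0])

    cm = Evaluator.compute_confusion_matrix(preds, labels)  # numpy.ndarray of per-class counts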

static compute_coverage(predictions, abstain_threshold=0.001)[source]

Compute coverage (percentage of non-abstaining predictions)

Args:

predictions: Soft predictions or confidence scores
abstain_threshold: Threshold below which predictions are considered abstaining

Returns:

Coverage ratio

Parameters:
  • predictions (numpy.ndarray)

  • abstain_threshold (float)

Return type:

float
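
Example:

A minimal usage sketch; the input is assumed to be per-example confidence scores, with values at or below abstain_threshold counted as abstentions:

    import numpy as np
    from src.common import Evaluator

    scores = np.array([0.95, 0.0, 0.6, 0.0005])

    coverage = Evaluator.compute_coverage(scores, abstain_threshold=0.001)  # float in [0, 1]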

static compute_coverage_metrics(prediction_probs, confidence_threshold=0.5)[source]

Compute coverage and confidence metrics from prediction probabilities

Args:

prediction_probs: Prediction probabilities/logits
confidence_threshold: Threshold for confident predictions

Returns:

Dictionary of coverage metrics

Parameters:
  • prediction_probs (numpy.ndarray)

  • confidence_threshold (float)

Return type:

Dict[str, float]
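
Example:

A minimal usage sketch; the probability layout is an assumption, and the keys of the returned dictionary are defined by the implementation:

    import numpy as np
    from src.common import Evaluator

    # Assumed layout: (n_samples, n_classes) probabilities or logits
    probs = np.array([[0.90, 0.10],
                      [0.55, 0.45],
                      [0.50, 0.50]])

    metrics = Evaluator.compute_coverage_metrics(probs, confidence_threshold=0.5)
    # metrics is a Dict[str, float] of coverage/confidence statistics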

static compute_metrics(predictions, labels, average='weighted', **kwargs)[source]

Compute standard classification metrics with optional weak supervision metrics

Args:

predictions: Predicted labels
labels: True labels
average: Averaging strategy for multi-class metrics
kwargs: Additional arguments (weak_labels, prediction_probs, loss_info, etc.)

Returns:

Dictionary of metrics

Parameters:
  • predictions (numpy.ndarray)

  • labels (numpy.ndarray)

  • average (str)

Return type:

Dict[str, float]
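
Example:

A minimal usage sketch; the weak-supervision kwargs shown in the comment follow the argument names listed above, but their expected shapes are assumptions:

    import numpy as np
    from src.common import Evaluator

    preds = np.array([0, 1, 1, 0])
    labels = np.array([0, 1, 0, 0])

    # Standard classification metrics only
    metrics = Evaluator.compute_metrics(preds, labels, average='weighted')

    # Optional weak-supervision inputs can be passed via kwargs, e.g.
    # Evaluator.compute_metrics(preds, labels, weak_labels=L, prediction_probs=probs)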

static compute_weak_supervision_metrics(predictions, labels, weak_labels)[source]

Compute metrics specific to weak supervision using Snorkel’s LFAnalysis

Args:

predictions: Model predictions
labels: True labels
weak_labels: Weak supervision labels (L matrix from Snorkel)

Returns:

Dictionary of weak supervision metrics

Parameters:
  • predictions (numpy.ndarray)

  • labels (numpy.ndarray)

  • weak_labels (numpy.ndarray)

Return type:

Dict[str, float]
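
Example:

A usage sketch; the L-matrix layout (one column per labeling function, -1 for abstain) follows the usual Snorkel convention and is an assumption here:

    import numpy as np
    from src.common import Evaluator

    preds = np.array([0, 1, 1])
    labels = np.array([0, 1, 0])

    # Assumed shape: (n_samples, n_labeling_functions), -1 = abstain
    weak_labels = np.array([[ 0, -1,  0],
                            [ 1,  1, -1],
                            [-1,  0,  1]])

    ws_metrics = Evaluator.compute_weak_supervision_metrics(preds, labels, weak_labels)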

static evaluate_weak_supervision_method(results, labels, method_name='unknown')[source]

Comprehensive evaluation for weak supervision methods

Args:

results: Dictionary containing predictions and other method outputs
labels: True labels
method_name: Name of the method being evaluated

Returns:

Comprehensive evaluation dictionary

Parameters:
  • results (Dict[str, Any])

  • labels (numpy.ndarray)

  • method_name (str)

Return type:

Dict[str, Any]
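
Example:

A usage sketch; the exact keys expected in results are defined by the implementation, so the "predictions" entry and the method name below are assumptions:

    import numpy as np
    from src.common import Evaluator

    labels = np.array([0, 1, 0, 1])
    results = {"predictions": np.array([0, 1, 1, 1])}  # assumed key

    report = Evaluator.evaluate_weak_supervision_method(
        results, labels, method_name="snorkel_label_model"
    )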