Blind label ratio estimation

Document Type : Original Scientific Paper

Authors

1 Department of Electrical and Computer Engineering, Babol Noshirvani University of Technology, Tehran, Iran

2 Son Corporate Group, Tehran, Iran

10.22034/jsmta.2026.22764.1173

Abstract

Many anomaly detection algorithms require knowledge of the ratio of the two labels to operate. In practice, however, this value is often unavailable, so anomaly detection packages are frequently run with default values that may differ significantly from the actual ratio. Experiments on multiple datasets show that determining this ratio correctly, or at least obtaining a close estimate, can make a significant difference in the final performance of the anomaly detection algorithm. In this paper, we address the problem of estimating this ratio using both theoretical and heuristic techniques. In the theoretical method, we maximize the mutual information between the features and the labels to find the exact ratio. In the heuristic method, we sweep the [0,1] range in steps of 0.01 to search for the ratio. In each iteration, we run the anomaly detection algorithm with that iteration's candidate ratio and record the correlation coefficient between the features and the labels generated by the algorithm. After the 100th iteration, we declare the ratio that yields the maximum correlation coefficient to be our estimate of the label ratio. Our experiments on multiple datasets and several anomaly detection algorithms show that maximizing the correlation coefficient leads to the best results.
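
The heuristic sweep described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `toy_detector` is a hypothetical stand-in for a real anomaly detection algorithm, the feature/label correlation is taken here as the mean absolute Pearson coefficient over feature columns (the abstract does not say how multiple features are combined), and the sweep is assumed to cover 0.01 through 1.00 so that it runs for exactly 100 iterations.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def toy_detector(rows, ratio):
    """Hypothetical stand-in detector (an assumption, not from the paper):
    label the `ratio` fraction of rows farthest from the feature-wise mean
    as anomalies (1) and the rest as normal (0)."""
    n, d = len(rows), len(rows[0])
    center = [sum(r[j] for r in rows) / n for j in range(d)]
    dist = [math.dist(r, center) for r in rows]
    k = round(ratio * n)
    order = sorted(range(n), key=lambda i: dist[i], reverse=True)
    flagged = set(order[:k])
    return [1 if i in flagged else 0 for i in range(n)]


def estimate_ratio(rows, detector=toy_detector, step=0.01):
    """Sweep candidate ratios and return (ratio, score) for the candidate
    whose generated labels correlate most strongly with the features."""
    d = len(rows[0])
    columns = [[r[j] for r in rows] for j in range(d)]
    n_steps = round(1 / step)
    best_ratio, best_score = None, -1.0
    # Assumed grid: 0.01, 0.02, ..., 1.00 (100 iterations). A ratio of 1.00
    # yields constant labels, which score 0.0 and so cannot win the sweep
    # unless every candidate scores 0.0.
    for i in range(1, n_steps + 1):
        ratio = i / n_steps
        labels = detector(rows, ratio)
        # Assumed aggregation: mean absolute correlation over feature columns.
        score = sum(abs(pearson(c, labels)) for c in columns) / d
        if score > best_score:
            best_ratio, best_score = ratio, score
    return best_ratio, best_score


# Synthetic usage example: 180 "normal" points and 20 shifted points.
random.seed(0)
normal = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(180)]
shifted = [[random.gauss(6, 1), random.gauss(6, 1)] for _ in range(20)]
ratio, score = estimate_ratio(normal + shifted)
print(f"estimated ratio: {ratio:.2f}, correlation score: {score:.3f}")
```

Any detector that accepts a ratio (often exposed as a contamination-style parameter) and returns binary labels could be passed as `detector` in place of the toy one; the sweep itself does not depend on how the labels are produced.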
