As a student of the Scholomance Academy, you are studying a course called \textit{Machine Learning}. You are currently working on your course project: training a binary classifier. A binary classifier is an algorithm that predicts the classes of instances, which may be positive $(+)$ or negative $(-)$. A typical binary classifier consists of a scoring function $S$ that gives a score for every instance and a threshold $\theta$ that determines the category. Specifically, if the score of an instance $S(x) \geq \theta$, then the instance $x$ is classified as positive; otherwise, it is classified as negative. Clearly, choosing different thresholds may yield different classifiers.

Of course, a binary classifier may have misclassification: it could either classify a positive instance as negative (false negative) or classify a negative instance as positive (false positive). Given a dataset and a classifier, we may define the true positive rate ($\mathrm{TPR}$) and the false positive rate ($\mathrm{FPR}$) as follows:

$$\mathrm{TPR} = \frac{\#\mathrm{TP}}{\#\mathrm{TP} + \#\mathrm{FN}}, \quad \mathrm{FPR} = \frac{\#\mathrm{FP}}{\#\mathrm{TN} + \#\mathrm{FP}}$$

where $\#\mathrm{TP}$ is the number of true positives in the dataset; $\#\mathrm{FP}$, $\#\mathrm{TN}$, $\#\mathrm{FN}$ are defined likewise.

Now you have trained a scoring function, and you want to evaluate the performance of your classifier. The classifier may exhibit different TPR and FPR if we change the threshold $\theta$.
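The two rates defined above can be computed directly from their definitions. The following is a minimal sketch, not a reference solution: the function name `rates`, the `(is_positive, score)` tuple layout, and the eight-instance dataset are all assumptions made for illustration, and the sketch assumes the dataset contains at least one positive and one negative instance.

```python
def rates(samples, theta):
    """Return (TPR, FPR) for threshold theta.

    samples: list of (is_positive, score) pairs (hypothetical layout).
    An instance is classified as positive when score >= theta.
    """
    tp = sum(1 for is_pos, s in samples if is_pos and s >= theta)
    fn = sum(1 for is_pos, s in samples if is_pos and s < theta)
    fp = sum(1 for is_pos, s in samples if not is_pos and s >= theta)
    tn = sum(1 for is_pos, s in samples if not is_pos and s < theta)
    return tp / (tp + fn), fp / (tn + fp)


# Hypothetical illustrative data: four positives and four negatives.
data = [
    (True, 34), (True, 33), (True, 26), (False, 34),
    (False, 38), (True, 39), (False, 7), (False, 27),
]

print(rates(data, 30))  # (0.75, 0.5)
```

With this data and $\theta = 30$, three positives score at least 30 and one does not, while two negatives score at least 30 and two do not.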
Let $\mathrm{TPR}(\theta)$, $\mathrm{FPR}(\theta)$ be the $\mathrm{TPR}$, $\mathrm{FPR}$ when the threshold is $\theta$. Define the area under curve ($\mathrm{AUC}$) as

$$\mathrm{AUC} = \int_{0}^{1} \max_{\theta \in \mathbb{R}} \{\mathrm{TPR}(\theta) \mid \mathrm{FPR}(\theta) \leq r\} \, dr$$

where the integrand, called the receiver operating characteristic (ROC), means the maximum possible $\mathrm{TPR}$ given that $\mathrm{FPR} \leq r$.

Given the actual classes and predicted scores of the instances in a dataset, can you compute the $\mathrm{AUC}$ of your classifier?

For example, consider the third test data. If we set threshold $\theta = 30$, there are 3 true positives, 2 false positives, 2 true negatives, and 1 false negative; hence, $\mathrm{TPR}(30) = 0.75$ and $\mathrm{FPR}(30) = 0.5$. Also, as $\theta$ varies, we may plot the ROC curve and compute the AUC accordingly, as shown in Figure 1.
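One way to evaluate the integral, sketched under stated assumptions: with $P$ positives and $N$ negatives, the integrand is a step function that is constant on each interval $[k/N, (k+1)/N)$. Allowing at most $k$ false positives forces $\theta$ above the $(k+1)$-th largest negative score, so the best TPR there is the fraction of positives scoring strictly higher than that negative. Summing over $k$ gives the number of (positive, negative) pairs in which the positive scores strictly higher, divided by $P \cdot N$; a tie contributes nothing, because $S(x) \geq \theta$ admits both instances at once. The function name `auc`, the input layout, and the dataset below are hypothetical, and the sketch assumes at least one instance of each class.

```python
from bisect import bisect_right


def auc(samples):
    """AUC as the fraction of (positive, negative) pairs with pos score > neg score.

    samples: list of (is_positive, score) pairs (hypothetical layout).
    """
    pos = sorted(s for is_pos, s in samples if is_pos)
    neg = [s for is_pos, s in samples if not is_pos]
    # For each negative, count the positives with a strictly greater score.
    wins = sum(len(pos) - bisect_right(pos, n) for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical illustrative data, chosen to match the counts quoted for theta = 30.
data = [
    (True, 34), (True, 33), (True, 26), (False, 34),
    (False, 38), (True, 39), (False, 7), (False, 27),
]

print(auc(data))  # 0.5625
```

Sorting the positives once and using binary search for each negative keeps the work at $O((P + N) \log P)$, instead of the $O(P \cdot N)$ cost of comparing every pair.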