In general, BCVA is measured by examination with a Snellen or Landolt C chart. However, when a patient is unconscious because of brain injury or an illness such as dementia, the BCVA level cannot be determined in this way. Such manual measurement of BCVA also cannot be applied to infants and young children. In addition, some patients give false answers during the examination to obtain a higher disability rating. In such cases, the BCVA level must be estimated by direct fundus inspection, in which the macula and the major retinal arteries and veins are examined.

Recently, CNNs have been widely applied to retinal fundus image analysis. Specifically, the CNN-based detection and classification of DR, a complication of diabetes that can lead to blindness, has been studied.^{42,43,44,45} Additionally, studies have confirmed that CNN-based approaches can be used for sex estimation from retinal fundus images^{46} and for left and right fundus image identification.^{47} Since CNNs can extract the spatial features of fundus images, we investigated the feasibility of CNN-based BCVA estimation in this study.

Table 1 shows the accuracy of the considered CNN-based BCVA estimation schemes, i.e. Res-Cla, Eff-Cla, Res-Reg and Eff-Reg. Unlike in conventional classification tasks, the severity of a misclassification depends on which incorrect level is predicted. For example, when the actual level of BCVA is 0.1, a prediction of 0.2 can be considered more accurate than a prediction of 0.9, even though both predictions are incorrect. It should be noted that results obtained through retrospective medical chart review may differ slightly for the same fundus image, and such a minor deviation is usually acceptable in practice. To reflect these properties in the evaluation, we assessed accuracy while tolerating a small difference in the prediction, which we call the level of relaxation. Accordingly, when 2-level relaxation is adopted, we regard the prediction as correct if \(|\hat{VA} - VA| \le 0.2\), where \(VA\) and \(\hat{VA}\) are the actual and predicted BCVA levels, respectively.
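
The relaxed accuracy described above can be sketched as follows. This is a minimal illustration rather than the evaluation code used in the study: the function name `relaxed_accuracy`, the sample values and the small tolerance `eps` are hypothetical, and the step of 0.1 per relaxation level is inferred from the 2-level threshold of 0.2. The sketch compares raw predictions with the actual levels and does not round regression outputs.

```python
def relaxed_accuracy(actual, predicted, relaxation_levels=0, step=0.1):
    """Fraction of predictions within `relaxation_levels` steps of the actual BCVA.

    Assumption: one relaxation level corresponds to a BCVA difference of `step`.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tolerance = relaxation_levels * step
    eps = 1e-9  # absorbs floating-point rounding such as 1.0 - 0.7
    hits = sum(
        1
        for va, va_hat in zip(actual, predicted)
        if abs(va_hat - va) <= tolerance + eps
    )
    return hits / len(actual)


# Hypothetical sample: actual and predicted BCVA levels for four images.
actual = [0.1, 0.1, 0.5, 1.0]
predicted = [0.2, 0.9, 0.5, 0.7]

for level in range(4):
    print(level, relaxed_accuracy(actual, predicted, level))
```

With this sample, the prediction of 0.2 for an actual level of 0.1 counts as correct from 1-level relaxation onward, whereas the prediction of 0.9 remains incorrect at every relaxation level shown.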

As shown in Table 1, the classification schemes achieve higher accuracy than the regression schemes when relaxation is not adopted. On the other hand, the regression schemes achieve higher accuracy when relaxation is adopted. We attribute this to the different loss functions. Specifically, we train the classification schemes to match the predicted BCVA exactly to the actual BCVA, while the regression schemes are trained to minimize the difference between the predicted and actual BCVA levels. As a result, the classification schemes are better at finding the exact level of BCVA, while the predictions of the regression schemes tend to be closer to the true value when the predicted level differs from it. Moreover, Res-Cla achieves the highest accuracy of 44.28% without relaxation, which is somewhat low for practical use. However, when 1, 2 and 3 levels of relaxation are adopted, the achievable accuracy increases to 71.11% (Res-Reg), 87.03% (Eff-Reg) and 94.37% (Eff-Reg), respectively. It should be noted that the manual measurement of BCVA is subjective and may change depending on the measurement environment and time. Accordingly, a minor difference in BCVA prediction is generally acceptable, so we conclude that the considered BCVA estimation schemes can be useful in practice.

Table 2 shows the RMSE and \(R^2\) score of the considered BCVA estimation schemes. The regression schemes perform better than the classification schemes because, as explained earlier, the regression schemes are trained to minimize the difference between the actual and predicted BCVA levels, and this difference determines both the RMSE and the \(R^2\) score. Specifically, the Res-Reg scheme achieves the best performance (i.e. the lowest RMSE and the highest \(R^2\) score), while the Eff-Reg scheme performs almost as well as the Res-Reg scheme.
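
For reference, the two metrics can be computed as in the sketch below. It uses the standard definitions of RMSE and the coefficient of determination; the function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted BCVA levels."""
    squared_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical sample values.
actual = [0.0, 0.5, 1.0]
predicted = [0.1, 0.5, 0.9]

print(round(rmse(actual, predicted), 4))
print(round(r2_score(actual, predicted), 4))
```

Both metrics depend only on the differences between the predicted and actual levels, which is the quantity that the regression schemes minimize during training.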

Next, to better understand the prediction accuracy for each level of BCVA, Fig. 1 shows the confusion matrices of the considered BCVA estimation schemes. Each diagonal element corresponds to the accuracy for one level of BCVA. As can be seen from the confusion matrices, the diagonal elements are generally small, and they are smaller when the BCVA level is close to 0.5 than when it is 0.0 or 1.0. For example, the accuracy of Res-Cla is 79% and 72% when the BCVA level is 0.0 and 1.0, respectively, while it is 24% when the BCVA level is 0.5. Indeed, when the BCVA level is 0.0 or 1.0, the fundus image contains distinctive features that are easy to identify. However, as the level of BCVA approaches 0.5, these features become ambiguous, which makes identification of the BCVA level more difficult. Examinations performed by expert ophthalmologists are likewise more reliable when the BCVA level is 0.0 or 1.0. Among all the schemes considered, the trace of the confusion matrix is the highest for Res-Cla, i.e. it has the highest accuracy, which is in line with the performance evaluation in Table 1.
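
The relationship between the confusion matrix, the per-level accuracy and the trace can be illustrated with the sketch below. It assumes that rows correspond to actual levels, that columns correspond to predicted levels, and that each row is normalized by its total, which is what makes a diagonal element equal to the accuracy for that level. BCVA levels are represented by integer indices, and all names and sample data are hypothetical.

```python
def confusion_matrix(actual_idx, predicted_idx, num_levels):
    """Count matrix with rows = actual level index, columns = predicted level index."""
    matrix = [[0] * num_levels for _ in range(num_levels)]
    for a, p in zip(actual_idx, predicted_idx):
        matrix[a][p] += 1
    return matrix


def per_level_accuracy(matrix):
    """Diagonal of the row-normalized matrix: accuracy for each actual level."""
    accuracies = []
    for i, row in enumerate(matrix):
        total = sum(row)
        accuracies.append(row[i] / total if total else 0.0)
    return accuracies


# Hypothetical sample with three levels (indices 0, 1, 2).
actual_idx = [0, 0, 0, 0, 1, 1, 2, 2]
predicted_idx = [0, 0, 0, 1, 0, 1, 2, 1]

counts = confusion_matrix(actual_idx, predicted_idx, 3)
accuracies = per_level_accuracy(counts)
trace = sum(accuracies)  # larger trace = higher per-level accuracy overall

print(counts)
print(accuracies, trace)
```

For a regression scheme, the continuous prediction would first have to be mapped to a level index, for example by rounding to the nearest level; that mapping is an assumption of this sketch and is not described in the text.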

Although the diagonal elements are not dominant in the confusion matrices, the values close to the diagonal are considerably large. This means that even when the predicted BCVA level is incorrect, the difference between the prediction and the actual BCVA level is small. Also, the values are more concentrated around the diagonal for the regression schemes (i.e. Res-Reg and Eff-Reg) than for the classification schemes (i.e. Res-Cla and Eff-Cla). Thus, the regression schemes yield smaller prediction errors, which agrees with the results in Table 2. Additionally, we found only a minor difference in performance between the baseline CNN models, i.e. ResNet-18 and EfficientNet-B0.

Figure 2 presents the histogram of the estimated level of BCVA for all the schemes considered. The distribution of the predicted BCVA level concentrates around the actual BCVA level, i.e. mispredictions tend to be close to the ground truth value, and the distribution becomes more densely concentrated when the BCVA is 0.0 or 1.0, which agrees with our previous observation in Fig. 1. Notably, the distribution of the classification schemes is also concentrated around the actual level of BCVA, even though the difference between the prediction and the ground truth level of BCVA is not taken into account during training. To be more specific, we adopted the cross-entropy loss function for DNN training in the classification schemes. This loss depends only on the probability assigned to the actual level, so its value is the same for erroneous predictions regardless of how far they are from the actual level. For example, when the actual level of BCVA is 0.2, an inaccurate prediction of 0.3 does not have a lower loss value than an erroneous prediction of 1.0. Therefore, training does not force the predictions to be near the actual BCVA level. The results thus suggest that fundus images with nearby BCVA levels are similar; for example, the spatial characteristics of fundus images corresponding to a BCVA level of 0.2 are similar to those corresponding to a BCVA level of 0.3, as will be confirmed later using clustering with t-SNE.
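
The property of the cross-entropy loss discussed above can be checked with a small worked example. The sketch assumes eleven BCVA levels from 0.0 to 1.0 in steps of 0.1 and uses two hypothetical softmax outputs that assign the same probability to the actual level (0.2) but peak at different wrong levels (0.3 and 1.0). The helper `peaked_distribution` and all probabilities are illustrative only.

```python
import math

LEVELS = [i / 10 for i in range(11)]  # assumed BCVA levels 0.0, 0.1, ..., 1.0


def cross_entropy(probs, true_idx):
    """Cross-entropy for a one-hot target: -log of the true-class probability."""
    return -math.log(probs[true_idx])


def peaked_distribution(peak_idx, true_idx, peak_p=0.8, true_p=0.1):
    """Hypothetical output: most mass on a wrong level, a fixed mass on the true level."""
    rest = (1.0 - peak_p - true_p) / (len(LEVELS) - 2)
    probs = [rest] * len(LEVELS)
    probs[peak_idx] = peak_p
    probs[true_idx] = true_p
    return probs


true_idx = 2                              # actual BCVA level 0.2
near = peaked_distribution(3, true_idx)   # wrong prediction of 0.3
far = peaked_distribution(10, true_idx)   # wrong prediction of 1.0

ce_near = cross_entropy(near, true_idx)
ce_far = cross_entropy(far, true_idx)

# Squared error of the predicted level, as used by a regression loss.
se_near = (LEVELS[3] - LEVELS[true_idx]) ** 2
se_far = (LEVELS[10] - LEVELS[true_idx]) ** 2

print(ce_near, ce_far)   # identical cross-entropy values
print(se_near, se_far)   # the far prediction has a much larger squared error
```

The cross-entropy is identical in both cases, whereas the squared error penalizes the distant prediction far more heavily, which is consistent with the difference in training objectives between the classification and regression schemes.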

Due to the black-box nature of DNNs^{9}, it is difficult to justify why a given BCVA level is predicted from a fundus image, which is one of the major drawbacks of CNN-based schemes. To better understand how the considered CNN-based schemes work, we applied Guided Grad-CAM, which combines Grad-CAM and guided backpropagation to identify the dominant spatial features for a prediction. Figure 3 illustrates the original fundus image and the resulting Grad-CAM, guided backpropagation and Guided Grad-CAM maps overlaid on the original fundus image for all schemes considered, where the actual level of BCVA is 1.0 and the predictions of Res-Cla, Eff-Cla, Res-Reg and Eff-Reg are 0.7, 1.0, 0.934 and 0.968, respectively. In the Grad-CAM results, the area near the macula is highlighted for all schemes considered. In the guided backpropagation results, on the other hand, the blood vessels, macula and optic disc are highlighted. Accordingly, in the Guided Grad-CAM results, the macula and the blood vessels surrounding it are highlighted, i.e. the BCVA estimation schemes make their predictions by observing these features. Since the same spatial features are used in the retrospective medical chart review to identify BCVA levels, we conclude that the predictions of the considered BCVA estimation schemes are reasonable and trustworthy.
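
The way Guided Grad-CAM combines the two maps can be sketched as an element-wise product of the upsampled Grad-CAM heatmap and the guided backpropagation map. The example below is a simplified illustration on nested lists: it uses nearest-neighbour upsampling for brevity, whereas Guided Grad-CAM is usually described with bilinear upsampling, and all names and values are hypothetical rather than taken from the study.

```python
def upsample_nearest(heatmap, out_h, out_w):
    """Nearest-neighbour upsampling of a coarse 2-D heatmap (simplifying assumption)."""
    in_h, in_w = len(heatmap), len(heatmap[0])
    return [
        [heatmap[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def guided_grad_cam(grad_cam, guided_backprop):
    """Element-wise product of the upsampled Grad-CAM map and the guided backprop map."""
    out_h, out_w = len(guided_backprop), len(guided_backprop[0])
    cam = upsample_nearest(grad_cam, out_h, out_w)
    return [
        [cam[r][c] * guided_backprop[r][c] for c in range(out_w)]
        for r in range(out_h)
    ]


# Hypothetical inputs: a 2x2 Grad-CAM map that is active only in the top-left
# region, and a 4x4 guided backpropagation map with uniform fine detail.
grad_cam = [[1.0, 0.0], [0.0, 0.0]]
guided_backprop = [[0.5] * 4 for _ in range(4)]

result = guided_grad_cam(grad_cam, guided_backprop)
for row in result:
    print(row)
```

Only the pixels that are both inside the region highlighted by Grad-CAM and active in the guided backpropagation map remain non-zero, which mirrors how the macula and its surrounding vessels remain highlighted in the combined result.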

To further analyze the inaccurate prediction results, Figure 4 presents randomly selected samples of incorrect predictions for all considered schemes, where the true BCVA is 0.0, 0.2, 0.4, 0.6, 0.8 and 1.0. In some cases, erroneous predictions are caused by inappropriate fundus images. For example, the macula and optic disc could not be identified in the fundus images for the case where the actual BCVA was 0.0. Additionally, for Eff-Cla with a BCVA level of 1.0 and Eff-Reg with a BCVA level of 1.0, the macula is shadowed and difficult to recognize, which makes identification of the BCVA level more difficult.

Finally, to study the correlation of fundus images for each level of BCVA, Figure 5 illustrates the result of visualization using t-SNE. Data points are monotonically aligned based on their corresponding BCVA. This result reveals that there is a similarity in the spatial characteristics of fundus images based on their BCVA levels. Moreover, the data points for the BCVA level of 0.0 can be clustered separately, suggesting that the fundus image with a BCVA level of 0.0 can be classified accurately.