Automated Semantic Annotation Deploying Machine Learning Approaches: A Systematic Review
Abstract
The Semantic Web is the vision of making Internet data machine-readable so that information retrieval can achieve higher granularity and personalisation. Semantic annotation is the process that binds machine-understandable descriptions to Web resources such as text and images. Hence, the success of the Semantic Web depends on the wide availability of semantically annotated Web resources. However, a huge amount of Web resources remains unannotated because of the limited annotation capability available. To address this, machine learning approaches have been used to improve the automation of the annotation process. This Systematic Review summarises the existing state-of-the-art literature to answer five Research Questions focusing on machine learning driven automation of semantic annotation. The analysis of 40 selected primary studies reveals that using a single machine learning algorithm and using a combination of algorithms are both current directions. Support Vector Machine (SVM) is the most-used algorithm, and supervised learning is the predominant machine learning type. Both semi-automated and fully automated annotation have nearly been achieved. Meanwhile, text is the most annotated Web resource, and the availability of third-party annotation tools is in line with this. While Precision, Recall, F-Measure and Accuracy are the most deployed quality metrics, not all the studies measured the quality of the annotated results. Standardising quality measures is a direction for future research.
Copyright (c) 2023 MENDEL
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
MENDEL open access articles are normally published under a Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA 4.0) license: https://creativecommons.org/licenses/by-nc-sa/4.0/. Under the CC BY-NC-SA 4.0 license, third-party reuse is permitted for non-commercial purposes only. Articles posted under this license allow users to share, copy, and redistribute the material in any medium or format, and to adapt, remix, transform, and build upon the material, for non-commercial purposes. Reuse under the CC BY-NC-SA 4.0 license requires appropriate attribution to the source of the material, a link to the license, and an indication of any changes made to the original material.