First steps towards a platform for the analysis of civil law documentary heritage

Authors

Neji, H.; Nogueras-Iso, J.; García-Marco, F. J.; Bayod López, M. del C.

DOI:

https://doi.org/10.54886/scire.v30i1.5018

Keywords:

Civil law, Documentary heritage, Text processing, Named entity recognition, Deep learning, Molino, Miguel del

Abstract

The documentary heritage of civil law is an important asset whose study allows us to learn about the political, social and cultural context of the period to which historical documents refer. This paper presents the design of a prototype platform to support researchers in the analysis of civil law documentary heritage. The platform involves creating an online version of these materials, making them more accessible. It provides automatic assistants for transcription, translation, and the extraction of specific information items associated with civil law concepts (voices), such as citations of external sources and named entities (locations, persons, and organizations), to better identify their context. The feasibility of the platform has been tested by processing a doctrinal work written by Miguel del Molino, a well-known civil law expert of the 15th century in the Kingdom of Aragon.

Published

2024-06-14

How to Cite

Neji, H., Nogueras-Iso, J., García-Marco, F. J., & Bayod López, M. del C. (2024). First steps towards a platform for the analysis of civil law documentary heritage. Scire: Knowledge Representation and Organization (ISSNe 2340-7042; ISSN 1135-3716), 30(1), 75–83. https://doi.org/10.54886/scire.v30i1.5018

Section

Articles