CLS INFRA is an EU Horizon 2020 funded project building shared and sustainable infrastructure for computational literary studies. The resources we have built link existing tools and processes, experiment with new programmable corpora, and review existing resources for their multilingual applicability and interoperability. We hope these will be of use to the GLAM sector, so this session will introduce the project and its outputs and invite suggestions as to what would be most useful for DH and non-DH experts there.
In this lightning talk, I will introduce the concept of Linked Open Data (LOD) and its application in Persian literature. LOD is a method of structuring data to make it easily connectable and shareable across the web, which is particularly useful for digital humanities. Persian literature contains a wealth of dispersed textual, historical, and cultural data. By applying LOD principles, we can link these scattered resources, creating a more cohesive and accessible network of information. This talk will briefly cover how LOD can enhance the study of Persian literature by interlinking various datasets, improving data discoverability, and fostering new research opportunities. The focus will be on practical examples of how LOD has been successfully implemented in other literary fields, demonstrating its potential for Persian literary studies.
This brief lightning talk will highlight two or three grant programs offered by the National Endowment for the Humanities that have supported, and continue to support, efforts to foster linked data infrastructure. From early convenings and experiments to more recent efforts focused on equitable access to collections, NEH has invested in such work for well over a decade. This snapshot will take the form of an overview and an invitation for conversation with potential applicants.
Due to the complex and unique nature of manuscripts as handwritten objects, there is no standard cataloging methodology for them. Institutional metadata contributed to the Digital Scriptorium (DS) Catalog, an online union catalog aggregating manuscript records from institutions across North America, varies in robustness of description, encoding formats, and other elements of data organization. The DS Catalog, therefore, enables the harmonization of diverse institutional descriptions with the broader linked data environment, which includes Wikidata, an open, crowdsourced, global database for structuring data.
Out of a desire for increased discoverability and data reusability, the research team developed a crosswalk between the DS Catalog and Wikidata to address issues of interoperability between metadata schemas and vocabularies by matching semantically equivalent or similar elements or values. In order to upload manuscript records from the DS Catalog to Wikidata, the research team identified ways to map the DS data model, and the manuscript records and data values found in the DS Catalog, to Wikidata. This lightning talk will provide a brief introduction to the development of this mapping process, the tools used, the obstacles encountered, the solutions identified, and the implications for the future of manuscript cataloging and data reuse.
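As a rough illustration of what a crosswalk of this kind involves, the sketch below maps a flat manuscript record onto Wikidata property IDs. It is not the DS team's mapping: the field names (`holding_institution`, `shelfmark`, and so on) and the sample record are hypothetical, and the selection of properties is only an assumption about which Wikidata properties might plausibly be used.

```python
# Hypothetical crosswalk: DS-style field name -> Wikidata property ID.
# Field names are invented for illustration; the property choices are assumptions.
CROSSWALK = {
    "holding_institution": "P195",  # collection
    "shelfmark": "P217",            # inventory number
    "title": "P1476",               # title
    "material": "P186",             # made from material
    "language": "P407",             # language of work or name
}


def apply_crosswalk(record, crosswalk=CROSSWALK):
    """Return (statements, unmapped_fields) for one flat record.

    statements is a list of (property_id, value) pairs; fields with no
    equivalent in the crosswalk are reported rather than silently dropped.
    """
    statements = []
    unmapped = []
    for field, value in record.items():
        prop = crosswalk.get(field)
        if prop is None:
            unmapped.append(field)
            continue
        # Treat single values and lists of values uniformly.
        values = value if isinstance(value, list) else [value]
        for v in values:
            statements.append((prop, v))
    return statements, unmapped


# Hypothetical record, not taken from the DS Catalog.
record = {
    "title": "Book of Hours",
    "language": ["Latin", "French"],
    "binding_note": "later binding",
}
statements, unmapped = apply_crosswalk(record)
print(statements)
print(unmapped)
```

Reporting unmapped fields separately keeps the gaps between the two schemas visible, which is where most crosswalk decisions have to be made.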
ZOOM PASSCODE: ld4-2024
Afternoon panel consisting of three short talks plus questions and answers. Please click on the links to the fuller descriptions of each talk for more details.
From Linked Art to Text and Back Again: An Unsupervised Approach
William Thorne, PhD Candidate, University of Sheffield; National Gallery (London)
I'm the cultural heritage data engineer on Yale's LUX platform, a native LOD cross-collections discovery service. I came to Yale in the summer of 2022, after eight years working at the Getty Provenance Index, a program of the Getty Research Institute. My background is art history...
PhD Candidate, University of Sheffield; National Gallery (London)
I'm Liam. I am studying for a joint PhD between the University of Sheffield and the National Gallery (London) on information extraction, organisation and searching of art historical text collections. My key areas of research interest are in reducing computational and data costs of language...
The intersection of human understanding and machine processing in cultural heritage presents a fundamental challenge: humans naturally express their interpretations through textual descriptions, while machines reason most reliably over structured data.
Whilst researchers, developers and the public need data to be available in numerous formats, manual translation requires time and intimate knowledge of the data and chosen ontology; machine learning approaches generally require numerous paired training examples to perform well.
However, large quantities of linked data and natural language samples already exist separately. We use cycle-consistency training, an unsupervised approach for learning bidirectional translation between linked data and natural language. Using two sequence-to-sequence language models and two unpaired datasets, we learn to align their feature spaces through iterative back-translation: one model generates a synthetic example as input to a second model, which attempts to recreate the real, original input to the first model. Once trained, these models may be used to translate arbitrary data from one representation to the other. This approach has already been shown to be highly effective in a graph-to-text setting (Q. Guo et al., 2020) but has yet to be applied in cultural heritage.
This presentation gives an overview of the datasets, the key differences between them, and the implications these have for the task of translation, particularly with respect to our training paradigm. I will then close with some proposed remedies before opening up to questions.
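The data flow of iterative back-translation described above can be sketched as follows. This is a toy illustration of the control flow only, not the presenter's implementation: `ToyTranslator` is a hypothetical stand-in that memorises pairs instead of training a sequence-to-sequence language model, and the example triple and sentence are invented.

```python
class ToyTranslator:
    """Hypothetical stand-in for a sequence-to-sequence model.

    It memorises (source, target) pairs; a real model would instead take a
    gradient step on a reconstruction loss inside update().
    """

    def __init__(self, fallback):
        self.memory = {}
        self.fallback = fallback  # naive guess for inputs never trained on

    def translate(self, source):
        return self.memory.get(source, self.fallback(source))

    def update(self, source, target):
        self.memory[source] = target


def cycle_step(forward, backward, real_example):
    """One back-translation step.

    forward produces a synthetic example from real data; backward is then
    trained to recreate the real example from that synthetic input.
    """
    synthetic = forward.translate(real_example)
    backward.update(synthetic, real_example)
    return synthetic


# Naive fallbacks: join a triple into a string, or split a string into three parts.
graph_to_text = ToyTranslator(fallback=lambda triple: " ".join(triple))
text_to_graph = ToyTranslator(fallback=lambda text: tuple(text.split(" ", 2)))

# Two unpaired datasets (invented examples): no graph corresponds to any text.
graphs = [("painting_1", "created_by", "artist_1")]
texts = ["artist_2 painted painting_2 in 1650"]

for _ in range(2):  # iterate: each direction supplies training input for the other
    for g in graphs:
        cycle_step(graph_to_text, text_to_graph, g)
    for t in texts:
        cycle_step(text_to_graph, graph_to_text, t)

# Cycle consistency: translating there and back recovers the original input.
round_trip_graph = text_to_graph.translate(graph_to_text.translate(graphs[0]))
round_trip_text = graph_to_text.translate(text_to_graph.translate(texts[0]))
print(round_trip_graph == graphs[0], round_trip_text == texts[0])
```

The point of the sketch is that neither direction ever sees a human-made pair: each model only ever trains on synthetic input produced by the other, with real data as the reconstruction target.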
In January 2024, the Technical Services team at Binghamton University launched an informal, monthly linked data study group. These meetings create space to discuss, research, and ask open questions about linked data projects and how they could be integrated into our daily workflows. This lightning talk will explain how the group was formed, our initial plans, how those plans changed, and what we hope to accomplish in the future.
For years, organizations have been releasing authority data as Linked Open Data, using properties like owl:sameAs and skos:exactMatch to maintain reciprocal relationships between their data and that of others. Organizations have been following their own data management practices and best practices to create these relationships, and large-scale projects leveraging them have been rare, so any inaccuracies have remained dormant. However, with the launch of LUX, Yale’s cross-collections linked open data discovery portal, in June 2023, this dynamic has changed. LUX reveals the technical and research debt that has accumulated across the cultural heritage field, particularly in authority control and consistency in property usage. The previously obscured relationship graph that LUX now exposes raises important questions: if these properties are to be effectively leveraged, who is responsible for maintaining best practices in their use? How can we come together as a community to establish these practices? This lightning talk will explore the sometimes amusing and often unfortunate downstream effects of incorrect reciprocal relationships now revealed by LUX and invite the community to reconsider our approach to data creation in light of these challenges.
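One simple consistency check on equivalence links of this kind is to look for assertions that are not reciprocated by the other party. The sketch below is an assumption about how such a check could be written, not a description of how LUX works; the `ex-a:` and `ex-b:` identifiers are hypothetical and the triples are invented.

```python
EQUIVALENCE_PREDICATES = {"owl:sameAs", "skos:exactMatch"}


def missing_reciprocals(triples):
    """Return sorted (subject, object) equivalence links with no link back.

    triples is an iterable of (subject, predicate, object) strings. The check
    ignores which equivalence predicate is used in each direction, so a sameAs
    answered by an exactMatch still counts as reciprocated.
    """
    links = {(s, o) for s, p, o in triples if p in EQUIVALENCE_PREDICATES}
    return sorted((s, o) for s, o in links if (o, s) not in links)


# Invented example data from two hypothetical organizations.
triples = [
    ("ex-a:person/1", "skos:exactMatch", "ex-b:agent/9"),
    ("ex-b:agent/9", "skos:exactMatch", "ex-a:person/1"),
    ("ex-a:person/2", "owl:sameAs", "ex-b:agent/7"),  # never asserted in return
    ("ex-a:person/2", "rdfs:label", "Example Name"),   # not an equivalence link
]
print(missing_reciprocals(triples))
```

A one-way link is not necessarily wrong, but it is a cheap signal for where two datasets disagree and a human should look first.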