Introduction to the conference, including conference technology platforms, communication channels, formats, social opportunities, code of conduct, and more!
In December 2022 the National Library Board Singapore (NLB) launched a continuously updated, Linked Data based, Semantic Knowledge Graph (KG) to manage and aggregate resources from their library, authority, National Archives, and content management systems. The design of the data, and operational architecture of the KG, based upon the BIBFRAME and Schema.org vocabularies, took a unique approach to the management and cataloguing of data about library resources. It did not seek to change or replace established cataloging systems or processes in order to introduce a linked data KG; these remain unchanged in the source systems. The creation of linked data entities and descriptions from source data resides in the daily import pipeline processes of the KG. This brings the dual benefits of not requiring new end-to-end systems and not disrupting current cataloging practices. It also separates the concerns of linked data entity management into the KG system. Developments have continued since the successful launch, utilising the data and functionality from the KG for sharing across the web and embedding in other NLB hosted services. Additionally, processes have been implemented to use external authority services, such as the Library of Congress Name Authority File, to enrich and improve the data quality of KG entities. Richard will review the architecture, its benefits and challenges, plus advancements made since the initial launch of the system.
CLS INFRA is an EU Horizon 2020 funded project building shared and sustainable infrastructure for computational literary studies. The resources we have built link existing tools and processes, experiment with new programmable corpora, and review resources for their multilingual applicability and interoperability. We hope these will be of use to the GLAM sector, so this session will introduce the project and outputs and invite suggestions as to what would be most useful for DH and non-DH experts there.
In this lightning talk, I will introduce the concept of Linked Open Data (LOD) and its application in Persian literature. LOD is a method of structuring data to make it easily connectable and shareable across the web, which is particularly useful for digital humanities. Persian literature contains a wealth of dispersed textual, historical, and cultural data. By applying LOD principles, we can link these scattered resources, creating a more cohesive and accessible network of information. This talk will briefly cover how LOD can enhance the study of Persian literature by interlinking various datasets, improving data discoverability, and fostering new research opportunities. The focus will be on practical examples of how LOD has been successfully implemented in other literary fields, demonstrating its potential for Persian literary studies.
This brief lightning talk will highlight 2-3 grant programs offered by the National Endowment for the Humanities that have supported, and continue to support, efforts to foster linked data infrastructure. From early convenings and experiments to more recent efforts focused on equitable access to collections, NEH has invested in such work for well over a decade. This snapshot will take the form of an overview and an invitation for conversation with potential applicants.
Due to the complex and unique nature of manuscripts as handwritten objects, there exists no standard cataloging methodology for manuscripts. Institutional metadata contributed to the Digital Scriptorium (DS) Catalog, an online union catalog aggregating manuscript records from institutions across North America, varies in robustness of description, encoding formats, and other elements of data organization. The DS Catalog, therefore, enables the harmonization of diverse institutional descriptions with the broader linked data environment, which includes Wikidata, an open, crowdsourced, global database for structuring data.
Out of a desire for increased discoverability and data reusability, the research team developed a crosswalk from the DS Catalog to Wikidata to address issues of interoperability between metadata schemas and vocabularies by matching semantically equivalent or similar elements or values. In order to upload manuscript records from the DS Catalog to Wikidata, the research team identified ways to map the DS data model, and the manuscript records and data values found in the DS Catalog, to Wikidata. This lightning talk will provide a brief introduction to the development of this mapping process, the tools used, obstacles encountered, and solutions identified, and the implications for the future of manuscript cataloging and data reuse.
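As a rough illustration of what a crosswalk of this kind can look like, the sketch below maps a flat record onto Wikidata property/value pairs. It is not the DS team's actual mapping: the field names (`title`, `author`, `holding_institution`, `shelfmark`, `language`) are hypothetical, and the Wikidata property IDs are illustrative choices that should be checked against Wikidata before any real use.

```python
# Hypothetical DS-style field names mapped to Wikidata property IDs.
# Both the field names and the choice of properties are assumptions.
CROSSWALK = {
    "title": "P1476",                # title
    "author": "P50",                 # author
    "holding_institution": "P195",   # collection
    "shelfmark": "P217",             # inventory number
    "language": "P407",              # language of work or name
}

MANUSCRIPT_QID = "Q87167"  # Wikidata item for "manuscript"


def to_wikidata_statements(record):
    """Translate one record into (property, value) pairs.

    Returns the mapped statements plus the list of fields that have no
    equivalent in the crosswalk, so they can be reviewed by a person.
    """
    statements = [("P31", MANUSCRIPT_QID)]  # instance of: manuscript
    unmapped = []
    for field_name, value in record.items():
        prop = CROSSWALK.get(field_name)
        if prop is None:
            unmapped.append(field_name)
            continue
        statements.append((prop, value))
    return statements, unmapped
```

Returning the unmapped fields separately, instead of silently dropping them, keeps the gaps in the crosswalk visible during review.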
ZOOM PASSCODE IS: ld4-2024 Jeff Mixter, Senior Product Manager of linked data at OCLC, will discuss topics related to OCLC work in linked data and answer any questions about linked data use in the library community today and into the future. Linked data resources will also be available to help educate and inform attendees about OCLC’s linked data initiatives.
ZOOM PASSCODE IS: ld4-2024 Cataloging over 2,000 episodes of On the Line (1989-2002), the predecessor to The Brian Lehrer Show and a highlight in the WNYC collection, offered the opportunity to create new tools for applying linked data. The goal of creating “selector” tools was to create a central view for a variety of inputs, including an original producer database and Library of Congress authority records, each carefully selected based on collection relevance. The selector tools allow efficient and targeted human review, resulting in persistent URLs to be uploaded as XML and affixed to digitized episode audio. This process allows for efficient cataloging while maintaining ownership over original asset metadata and newly available audio, and resulted in adding 6,000 LCNAF links, creating 3,000 internal authorities, and applying LCSH to more than 9,000 segments. The result is a case study in adding Linked Data to assets at scale, with lessons learned that will go into effect in future collections, as well as a demonstration of the importance of Linked Data to create discoverability and inspire trust through accuracy. This presentation will share the exact steps taken and tools created alongside highlights along the way, with the aim to help series cataloging to be more standardized, thorough, and streamlined.
Our stations are nearly a century old, so there is a wide variety of quality in both metadata and audio. My focus has been on normalization of metadata (within and across platforms), as well as data augmentation via APIs and Linked Data. For example, we are developing tools that analyze...
ZOOM PASSCODE: ld4-2024 Afternoon panel consisting of three short talks + questions and answers. Please click on the links to the fuller descriptions of each talk for more details. From Linked Art to Text and Back Again: An Unsupervised Approach William Thorne, PhD Candidate, University of Sheffield; National Gallery (London)
I'm the cultural heritage data engineer on Yale's LUX platform, a native LOD cross-collections discovery service. I came to Yale in the summer of 2022, after eight years working at the Getty Provenance Index, a program of the Getty Research Institute. My background is art history...
PhD Candidate, University of Sheffield; National Gallery (London)
I'm Liam, I am studying a joint PhD between the University of Sheffield and the National Gallery (London) into information extraction, organisation and searching of art historical text collections. My key areas of research interest are in reducing computational and data costs of language...
ZOOM PASSCODE: ld4-2024 The intersection of human understanding and machine processing in cultural heritage presents a fundamental challenge: humans naturally express their interpretations through textual descriptions, while machines reason most reliably over structured data.
Whilst researchers, developers and the public need data to be available in numerous formats, manual translation requires time and intimate knowledge of the data and chosen ontology; machine learning approaches generally require numerous paired training examples to perform well.
However, large quantities of linked-data and natural language samples already exist separately. We use cycle-consistency training, an unsupervised approach for learning bidirectional translation between linked-data and natural language. Using two sequence-to-sequence language models and two unpaired datasets, we learn to align their feature spaces through iterative back-translation: one model generates a synthetic example as input to a second model, which attempts to recreate the real, original input data to the first model. Once trained, these models may be used to translate arbitrary data from one representation to the other. This approach has already been shown to be incredibly effective in a graph-to-text setting (Q. Guo et al., 2020) but is yet to be applied in cultural heritage.
This presentation gives an overview of the datasets, the key differences between them, and the implications this has for the task of translation, particularly with respect to our training paradigm. I will then close with some proposed remedies before opening up to questions.
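The back-translation loop described in this abstract can be sketched structurally as follows. This is only an outline of the data flow, not the presenter's implementation: `Seq2Seq` is a hypothetical stand-in for a real sequence-to-sequence model, and `train_step` merely records the pair it would be trained on instead of computing a loss.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Seq2Seq:
    """Hypothetical stand-in for a sequence-to-sequence model."""
    generate: Callable[[str], str]
    seen_pairs: List[Tuple[str, str]] = field(default_factory=list)

    def train_step(self, source: str, target: str) -> None:
        # A real model would compute a loss for source -> target and update
        # its weights; here we only record the training pair.
        self.seen_pairs.append((source, target))


def cycle_step(real_text: str, real_graph: str,
               text_to_graph: Seq2Seq, graph_to_text: Seq2Seq) -> None:
    """One iteration of back-translation over two unpaired samples."""
    # Text cycle: real text -> synthetic graph; the other model is then
    # trained to recover the real text from that synthetic graph.
    synthetic_graph = text_to_graph.generate(real_text)
    graph_to_text.train_step(synthetic_graph, real_text)

    # Graph cycle: real graph -> synthetic text; the first model is then
    # trained to recover the real graph from that synthetic text.
    synthetic_text = graph_to_text.generate(real_graph)
    text_to_graph.train_step(synthetic_text, real_graph)
```

Note that each model is only ever trained towards real data as its target; the synthetic output of the other model serves as the input, which is why no paired examples are needed.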
ZOOM PASSCODE: ld4-2024 In January 2024, the Technical Services team at Binghamton University launched an informal, monthly linked data study group. These meetings create space to discuss, research, and ask open questions about linked data projects and how they could be integrated into our daily workflows. This lightning talk will explain what the group formation looked like, our initial plans, how those plans changed, and what we hope to accomplish in the future.
ZOOM PASSCODE: ld4-2024 For years, organizations have been releasing authority data as Linked Open Data, using properties like owl:sameAs and skos:exactMatch to maintain reciprocal relationships between their data and that of others. Organizations have been following their own data management practices and best practices to create these relationships, and large-scale projects leveraging them have been rare, so any inaccuracies have remained dormant. However, with the launch of LUX, Yale’s cross-collections, linked open data discovery portal in June 2023, this dynamic has changed. LUX reveals the technical and research debt that has accumulated across the cultural heritage field, particularly in authority control and consistency in property usage. The obscured relationship graph that LUX now exposes raises an important question: If these properties are to be effectively leveraged, who is responsible for maintaining best practices in their use? How can we come together as a community to establish these practices? This lightning talk will explore the sometimes amusing and often unfortunate downstream effects of incorrect reciprocal relationships now revealed by LUX and invite the community to reconsider our approach to data creation in light of these challenges.
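To make the problem concrete, here is a minimal sketch, assuming equivalence links are available as plain (subject, predicate, object) string triples. It is not how LUX processes data; the identifiers in the usage note are invented. The first function lists `owl:sameAs` links that lack their reverse, and the second shows how treating the links as equivalences merges everything reachable into one cluster, so a single wrong link propagates.

```python
SAME_AS = "owl:sameAs"


def missing_reciprocals(triples):
    """Return the reverse links that would be needed for full reciprocity."""
    links = {(s, o) for s, p, o in triples if p == SAME_AS}
    return sorted((o, s) for s, o in links if (o, s) not in links)


def clusters(triples):
    """Group identifiers joined by sameAs links, using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == SAME_AS:
            parent[find(s)] = find(o)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(g) for g in groups.values())
```

For example, given the invented triples `a:painter sameAs b:painter`, `b:painter sameAs a:painter`, and `b:painter sameAs c:sculptor`, `missing_reciprocals` reports the absent link from `c:sculptor` back to `b:painter`, and `clusters` places all three identifiers in one group even though only one link mentions the sculptor.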
Mobile Subjects, Contrapuntal Modernisms investigates the circulation of artists from the decolonizing world through the colonial and artistic capitals of London and Paris. It examines and compares London and Paris as contrapuntal capitals of decolonizing empires that functioned as critical meeting places, anti-colonial hubs, and sites of exchange after WWII due to postwar mass migration. The project addresses the invisibility of overseas artists in histories of art through computational methods revealing their connections and intersections.
The relational database built for this project has been modeled upon the CIDOC CRM ontology and establishes an event-based schema that connects people (or actors as defined by the CRM) to each other and defines their identities and social relationships, including racial identity, citizenship, gender, social class, political affiliations, language(s) used, and belonging to artistic groups. We decided to adopt a “universally” recognised ontology for the benefits of data integration and exchange with GLAM institutions and other art history projects. However, we are increasingly aware of the epistemological biases and knowledge gaps present in the CIDOC CRM. In this lightning talk we will discuss our work, focusing on the classes and properties that need to be addressed to better represent the identities of artists, and review existing efforts to tackle elements of this issue in other ontologies and CIDOC CRM extensions.
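An event-based schema of this kind connects actors through the events they take part in, not through direct person-to-person links. The sketch below illustrates that idea only; it is not the project's database schema. The class and field names are hypothetical, and the CRM identifiers in the comments indicate the concepts each one loosely stands in for.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Actor:
    """Loosely corresponds to CRM E39 Actor (E21 Person is a subclass)."""
    name: str


@dataclass
class Event:
    """Loosely corresponds to CRM E5 Event."""
    label: str
    place: str   # stands in for P7 took place at -> E53 Place
    year: int    # stands in for P4 has time-span -> E52 Time-Span
    participants: set = field(default_factory=set)  # P11 had participant


def co_participants(events, actor):
    """Actors connected to `actor` through at least one shared event."""
    linked = set()
    for event in events:
        if actor in event.participants:
            linked |= event.participants - {actor}
    return linked
```

With two placeholder actors who both appear in the `participants` of one event, `co_participants` returns the other actor; relationships therefore emerge from shared events and never need to be stored as direct links.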
I am in my first year of studying for my PhD in Cultural Mediations at Carleton University. My research on the international “Art Bank” model uses data to examine difficult histories in public art collections. My recent MA thesis work: ARTiculating Canadian Identities (padlet...
Carleton University and University of the Arts London
Maribel Hidalgo Urbaneja is a postdoctoral researcher working on the Worlding Public Cultures research project at the University of the Arts London and on the Mobile Subjects, Contrapuntal Modernisms research project at Carleton University in Canada. Her research interests span digital...
Launched in June 2023, LUX: Yale Collections Discovery represents a groundbreaking shift in the conversation around linked data within the cultural heritage sector. As the largest cross-collections linked data portal in the U.S., LUX consolidates records from eight distinct Yale units: the Yale Center for British Art, Yale University Art Gallery, Peabody Museum, Yale University Library System, Beinecke Rare Book and Manuscript Library, Paul Mellon Centre, Yale Collection of Musical Instruments, and the Yale Campus Art Collection. These units span a wide array of cultural heritage domains, and LUX integrates their records into a unified search and discovery portal. This presentation will provide an overview of how LUX operates, exploring both the technical infrastructure and the collaborative, social processes that were essential to its development. We’ll delve into what was required to build this comprehensive portal and how the work has influenced cataloging and access methodologies across the participating units. Additionally, the session will include a live demonstration of the platform, showcasing its powerful features and illustrating how LUX contributes to the expansion of the cultural heritage knowledge graph. While focusing on the platform’s capabilities, the presentation will also touch upon the challenges encountered during implementation, particularly in areas of data quality and reconciliation.
Join the LD4 Rare Materials Affinity Group (RMAG | https://github.com/LD4/rare-materials) to learn more about our group activities and programming and contribute to an informal discussion regarding community needs for a revision of the Art and Rare Materials (ARM) BIBFRAME Ontology Extension | https://github.com/Art-and-Rare-Materials-BF-Ext/arm. The session will include a short presentation on the history of ARM and plans for a new joint task force to review the ontology. Then the floor will be open for discussion.
Metadata Librarian at the Harry Ransom Center, University of Texas at Austin, where I oversee the creation and management of MARC-based cataloging and develop metadata strategies to enhance access and discovery of rare and unique materials. Working collaboratively across departments...