GLAM, CH, DH?
Search for "library, museum" vs "Google, Facebook, Twitter" in books: the web sites are negligible
Compare two specific orgs over time: "Facebook" is more popular than "British Museum" in recent books
Web searches over the last 12 years: "Facebook, Google" are much more popular than "library, museum"
Since ancient times GLAMs have been the centers of knowledge and wisdom
To survive, GLAMs must adopt the internet as their default modus operandi
GLAM data is complex and varied
Thus professional organizations have found it useful to define content standards
Examples are extremely useful for data modelers to decide how to map the data
Cataloging Cultural Objects: content standard for art, architecture, museums
How to describe one aspect of the data
UK Museum Collections Management Standard
Image by D.Pitti, 2015
Extremely detailed and comprehensive (see RDA later). But they sometimes pay more attention to where to put the commas than to:
Functional Requirements for Bibliographic Records (FRBR), Subject Authority Data (FRSAD), Authority Data (FRAD) (J.Mitchell, M.Zeng, M.Zumer, 2011)
Starts from user tasks (find, identify, select, obtain, explore). Introduces the important 4-level WEMI model (relates to Uniform Titles):
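A minimal Python sketch of the 4-level WEMI hierarchy, assuming the usual FRBR reading: a Work is realized through Expressions, embodied in Manifestations, exemplified by Items. Class names, field names and the sample editions are illustrative assumptions, not part of any FRBR schema:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """A single copy (exemplifies a Manifestation)."""
    identifier: str


@dataclass
class Manifestation:
    """A particular publication (embodies an Expression)."""
    label: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """A particular realization, eg a translation (realizes a Work)."""
    label: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The abstract intellectual creation."""
    title: str
    expressions: List[Expression] = field(default_factory=list)


def count_items(work: Work) -> int:
    """Count all copies reachable from a Work through the WEMI levels."""
    return sum(len(m.items) for e in work.expressions for m in e.manifestations)


# Hypothetical editions and copies, for illustration only
hamlet = Work("Hamlet", [
    Expression("English original text", [
        Manifestation("Hypothetical paperback edition", [Item("copy-1"), Item("copy-2")]),
    ]),
    Expression("German translation", [
        Manifestation("Hypothetical hardcover edition", [Item("copy-3")]),
    ]),
])
print(count_items(hamlet))
```

A Uniform Title roughly plays the role of the Work title here: it groups all Expressions and Manifestations under one heading.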
Anything can be subject (thema), referred to by various names/titles (nomen)
FRBR-Library Reference Model (P.Riva, P.Le Bœuf, M.Žumer, Draft for World-Wide Review 2016-02). Merges the previous standards
How many of the standards listed in Seeing Standards: A Visualization of the Metadata Universe apply to your work? (by Jenn Riley, Associate Dean for Digital Initiatives at McGill University Library)
Do you deal with XML? I bet you do
Tools:
Categories for the Description of Works of Art (CDWA): realization of CCO, 532 "categories" (data elements).
XML schema implementing part of CDWA. Moderate complexity, about 300 elements. Display vs Indexing (structured) elements, eg for Dimension.
Cultural Objects Name Authority (CONA): Getty museum data aggregation. Moderate complexity, about 280 elements:
SPECTRUM Schema 4.0b has 10 entities and 592 fields, of which 490 are Object (artwork) fields. I am not aware of any systems producing this.
Lightweight Information Describing Objects (LIDO). Evolved from CDWA, museumdat, with inspiration from CIDOC CRM. (Images by R.Stein and A.Vitzthum, ATHENA workshop, 2010)
Pay a lot of attention to presentation, not enough to linking (difficult to "semanticize"). Emphasis on documents, not historic agents and events
<bioghist>
  <head>Chronological Events</head>
  <chronlist>
    <chronitem>
      <date normal="19781028">October 28, 1978</date>
      <event><persname normal="Wossname, Samuel">Sam Wossname</persname> succeeds <persname normal="Othername, John">John Othername</persname> as department head.</event>
    </chronitem>
    <chronitem>
      <date normal="19790315">March 15, 1979</date>
      <event>Departmental reorganization.</event>
    </chronitem>
  </chronlist>
</bioghist>
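To illustrate the "difficult to semanticize" point, here is a rough Python sketch that pulls dates and normalized person names out of the chronlist above using only the standard library. The function name and output shape are my own assumptions. Note what it cannot recover: the relation "succeeds" between the two persons lives only in the prose.

```python
import xml.etree.ElementTree as ET

EAD_SNIPPET = """
<bioghist>
  <head>Chronological Events</head>
  <chronlist>
    <chronitem>
      <date normal="19781028">October 28, 1978</date>
      <event><persname normal="Wossname, Samuel">Sam Wossname</persname> succeeds
        <persname normal="Othername, John">John Othername</persname> as department head.</event>
    </chronitem>
    <chronitem>
      <date normal="19790315">March 15, 1979</date>
      <event>Departmental reorganization.</event>
    </chronitem>
  </chronlist>
</bioghist>
"""


def extract_events(xml_text):
    """Return one dict per chronitem: normalized date, persons, flattened text."""
    root = ET.fromstring(xml_text)
    events = []
    for item in root.iter("chronitem"):
        date = item.find("date")
        event = item.find("event")
        # Normalized names are available as attributes...
        persons = [p.get("normal") for p in event.iter("persname")]
        # ...but how the persons relate is only stated in mixed-content text
        text = " ".join("".join(event.itertext()).split())
        events.append({"date": date.get("normal"), "persons": persons, "text": text})
    return events


events = extract_events(EAD_SNIPPET)
for e in events:
    print(e["date"], e["persons"], e["text"])
```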
MARC is 50 years old, unreadable, and doesn't accommodate new FRBR principles. MARC-XML is not much better
A whole emotional subculture, based on a slogan by Roy Fielding, 2002.
Presentation by Sally Chambers, ELAG 2011
Why do they call conversion to RDF "lifting" and back to some other format "lowering"?
Model used by the Europeana aggregator (53M objects), and adopted by the Digital Public Library of America (DPLA). Based on:
Evolving specification (since 2009)
CIDOC CRM: comprehensive reference model used for history, historic events, archaeology, museum data, etc by CIDOC (ICOM documentation committee). Standardized as ISO 21127:2014, still evolving. About 85 classes, fundamental branches: Persistent (endurant) vs Temporal (perdurant), Physical vs Conceptual
Classes represent abstract things (eg crm:E24_Physical_Man-Made_Thing), specific things (eg Paintings, Coins) are accommodated with crm:P2_has_type. 135 props (plus their inverses); prop hierarchy (see "- - -" at bottom):
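A toy Python sketch of this typing pattern, with triples held as plain tuples rather than in a real RDF store. The ex: identifiers and type nodes are hypothetical; I use crm:E22_Man-Made_Object (a subclass of E24 in the CRM versions I know) as the abstract class for both objects:

```python
# Both objects share one abstract CRM class; the specific kind is a P2_has_type value.
triples = {
    ("ex:object/1", "rdf:type", "crm:E22_Man-Made_Object"),
    ("ex:object/1", "crm:P2_has_type", "ex:type/painting"),  # hypothetical type node
    ("ex:object/2", "rdf:type", "crm:E22_Man-Made_Object"),
    ("ex:object/2", "crm:P2_has_type", "ex:type/coin"),      # hypothetical type node
}


def things_of_type(data, type_node):
    """Select subjects by their P2_has_type value instead of by RDF class."""
    return sorted(s for (s, p, o) in data if p == "crm:P2_has_type" and o == type_node)


def classes_used(data):
    """List the distinct RDF classes: a single abstract one for all objects."""
    return sorted({o for (s, p, o) in data if p == "rdf:type"})


print(things_of_type(triples, "ex:type/painting"))
print(classes_used(triples))
```

In practice the type nodes would usually come from a thesaurus (eg Getty AAT) rather than a local namespace.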
W3C TR: mark, annotate, relate any web resources, eg: Webpage and bookmark, Image and region over it, Document and translation, Paragraph and commentary. Diagram of Complete Example from spec (using my rdfpuml)
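A hedged sketch of the "Image and region over it" case, built as a Python dict and serialized as JSON-LD. The keys follow the W3C Web Annotation Data Model as I understand it; the URLs, the helper name and the comment text are made up for illustration:

```python
import json


def make_region_annotation(anno_id, image_url, xywh, comment):
    """Build a comment annotation that targets a rectangular region of an image."""
    x, y, w, h = xywh
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "motivation": "commenting",
        "body": {"type": "TextualBody", "value": comment, "format": "text/plain"},
        "target": {
            "source": image_url,
            # The region is a Media Fragment (xywh in pixels) on the source image
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"xywh={x},{y},{w},{h}",
            },
        },
    }


anno = make_region_annotation(
    "http://example.org/anno/1",        # hypothetical annotation URL
    "http://example.org/image/1.jpg",   # hypothetical image URL
    (10, 20, 300, 400),
    "Signature of the artist",
)
print(json.dumps(anno, indent=2))
```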
Standard API for DeepZoom (hi-res) images. Supported by many servers and viewers. http://iiif.io
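The core of the Image API is a URL pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. A small Python sketch of building such URLs follows; the server base and identifier are hypothetical, and the default size "full" assumes Image API 2.x (version 3.0 uses "max" instead):

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="full",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL from its path parameters."""
    ident = quote(identifier, safe="")  # identifiers must be URL-encoded
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base, identifier):
    """URL of the info.json document describing the image (sizes, tiles, etc)."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


BASE = "https://example.org/iiif"  # hypothetical IIIF server

# Top-left 512x512 pixel region, scaled to 256 pixels wide
print(iiif_image_url(BASE, "abc", region="0,0,512,512", size="256,"))
print(iiif_info_url(BASE, "abc"))
```

Deep-zoom viewers work by requesting many such region/size tiles as the user pans and zooms.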
Based on OA and SharedCanvas. Strong attention to JSON-LD representation (convenient for developers). Allows assembling manuscripts from pieces, presenting folios, etc. See Rob Sanderson presentations, eg IIIF and JSON-LD:
War of the Bibliographic Ontologies?
Resource Description and Access (RDA). Registry info is well organized
Many props (306 for Work alone), for specific purposes (eg "appellee" for court decisions, "granting institution" for academic theses). Numeric prop names, but lexical (natural language) also supported. Serves many semantic formats.
EDM–FRBRoo Application Profile Task Force: asked what to add to EDM to better fit FRBRoo.
EDM variant:
Simpler FRBRoo variant:
More complex FRBRoo variant:
Pragmatic data model that reuses several ontologies, and adds own props
Oslo Public Library (http://data.deichman.no, since 2014) uses Koha open source software, RDF in the core, and marc2rdf/rdf2marc conversions. Pragmatic data model that reuses several ontologies, and adds own props. Enables a number of agile apps, eg search related books on Kiosk
d_res:tnr_749919 rdf:type bibo:Document , fabio:Manifestation ;
  dc:title "About time" ;
  d:titleURLized "about_time" ;
  fabio:hasSubtitle "Einstein's unfinished revolution" ;
  ctag:tagged d_keyword:imaginary , d_keyword:dilation , d_keyword:time , d_keyword:tidsreiser , d_keyword:tidsdilatasjon ;
  foaf:depiction <http://covers.openlibrary.org/b/id/96714-M.jpg> , <http://covers.openlibrary.org/b/id/96715-M.jpg> , <http://www.bokkilden.no/SamboWeb/servlet/VisBildeServlet?produktId=81081> ;
  owl:sameAs <http://purl.org/NET/book/isbn/0140174613#book> , <http://www4.wiwiss.fu-berlin.de/bookmashup/books/0140174613> ;
  dc:language lexvo:eng ;
  d:bibliofilID "931138" ;
  dc:format <http://data.deichman.no/format/Book> ;
  d:location_signature "Dav" ;
  dc:publisher d_org:penguin ;
  bibo:numPages "316" ;
  d:physicalDescription "fig." ;
  d:bibsubject d_subject:einstein_albert , d_subject:tid_metafysikk ;
  fabio:isManifestationOf d_work:x24918900_about_time ;
  d:signatureNote "07x0619gq" ;
  d:bindingInfo <http://data.deichman.no/bindingInfo/h> ;
  d:bsID "0181541" ;
  dc:description "Bibliografi: s. 293-294"@no ;
  d:priceInfo "Nkr 170.00" ;
  foaf:isPrimaryTopicOf <http://www.goodreads.com/book/show/286461> , <http://www.librarything.com/work/23493> ;
  dc:identifier "749919" ;
  d:dewey "115" , "530.11" ;
  d:location_dewey "530.11" ;
  bibo:isbn "9780140174618" , "0140174613" ;
3 attempts to represent EAD as RDF, but IMHO none is very good.
Records in Context (RiC): new upcoming semantic standard by ICA
Tons of info on everything, including GLAMs, artists, artworks, etc. Eg Frans Hals on Reasonator
Wikidata Project Sum of All Paintings. Data used for:
Excellent image search. Shows links to WD, Wikimedia Commons, original website. Eg Frans Hals on Crotos
Hunting for missing inventory numbers (9.9k of 140k). Important because <collection, inventory number> is used to identify the painting. Eg US (1k), Getty Museum (2)
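A sketch of how such a hunt could be phrased against the Wikidata Query Service, written as a Python helper that only builds the query and its URL (no network call). It assumes P31/Q3305213 for "instance of painting", P195 for collection and P217 for inventory number. The Getty Museum QID used below is my assumption and should be verified before use:

```python
import re
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def missing_inventory_query(collection_qid, limit=100):
    """SPARQL for paintings in a collection that lack an inventory number."""
    if not re.fullmatch(r"Q\d+", collection_qid):
        raise ValueError(f"not a Wikidata QID: {collection_qid!r}")
    return f"""SELECT ?painting ?paintingLabel WHERE {{
  ?painting wdt:P31 wd:Q3305213 ;          # instance of: painting
            wdt:P195 wd:{collection_qid} . # collection
  FILTER NOT EXISTS {{ ?painting wdt:P217 ?inv }}  # no inventory number
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {limit}"""


def query_url(sparql):
    """URL that asks the query service for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


GETTY_MUSEUM = "Q731126"  # assumed QID of the J. Paul Getty Museum; verify before use
query = missing_inventory_query(GETTY_MUSEUM, limit=10)
print(query)
print(query_url(query))
```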
Find it on Getty's site, add the info like this:
Timelines of everything. Eg paintings by Leonardo
Virtual International Authority File: 20 national libraries, 10 other contributors including Getty ULAN and Wikidata. Eg coreferencing cluster of Spinoza:
Analyzed records of Lucas Cranach in 7 LOD datasets (Wikidata: Freebase, DBpedia, Yago; VIAF: ISNI, ULAN).
A global Authority on everything: librarian's dream come true! Mix-n-Match is a collaborative tool to create coreferences. 234 authorities, including Getty AAT, TGN, ULAN; RKD artists, works; LoC Authorities; VIAF (not in M-n-M but on WD); BM persons; BBC YourPaintings; Artsy, etc etc
Eg checking matches to Getty AAT. Single sign-on, a click per item. Easy!
GLAM and DH projects present a bewildering variety, eg
Research functions are sometimes integrated into Virtual Research Environments
The Andrew Mellon Foundation funds many projects in CH and DH, and a few software projects, including:
Executed by the British Museum. Ontotext developed the first prototype (2010-2013). Semantic Search
Powerful and precise search: Drawings by Rembrandt that are about Mammals
First implementation experience of the CIDOC CRM Fundamental Relations approach
120 GraphDB rules, woven using a Literate Programming approach. Inference dependencies between props (text=input, gray=intermediate, white=output)
(Not Ontotext work). Watch the video (D.Oldman)
Executed by a consortium led by US National Gallery of Art. Developed by Sirma ITT (Ontotext sibling). Based on Ontotext GraphDB (semantic metadata), Alfresco (document management), Smart Documents (Sirma product).
Ontotext created and hosted the Europeana SPARQL and OAI-PMH services
Eg chart of newspapers (several millions) by year: can't do this using the Europeana API, but it is easy with SPARQL
Food & Drink content, semantically enriched (place and FD topic). EFD Semantic App: open data, SPARQL endpoint, open source (Github). Uses GraphDB and ElasticSearch enterprise connector
Eg 150 with beer, including pancakes!
Objects from the Roman Empire to Antarctica (Scott's expedition to the South Pole), and everything in-between
Use Wikipedia Categories to extract a FD Gazetteer.
Selected French as second enrichment language after English, considering category overlap (work by L.Tolosi, x-axis is cat level), available content, NLP capabilities
We used standard Ontotext Concept Enrichment Service, which is a mix of DBpedia+Wikidata. But also had to add Geonames, to leverage the place hierarchy
Hierarchical semantic facet based on Geonames
Once we have places, it's relatively easy to map them. We used the Cluster Mapper library
There are 9k objects marked "Bulgaria". We don't want all flags in the center of Bulgaria, so we jitter them up
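A minimal sketch of such jittering in Python: each marker is moved by a random offset within a given radius of the original point. This is not necessarily how the app does it. The radius, the km-per-degree constant and the centre coordinates are rough assumptions:

```python
import math
import random

KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude


def jitter(lat, lon, max_km=20.0, rng=random):
    """Return (lat, lon) moved by a random offset of at most max_km."""
    # sqrt() makes the points uniform over the disk instead of bunched in the middle
    r = max_km * math.sqrt(rng.random())
    theta = 2 * math.pi * rng.random()
    dlat = r * math.cos(theta) / KM_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude
    dlon = r * math.sin(theta) / (KM_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon


BULGARIA = (42.7, 25.5)  # rough centre of the country, for illustration only
rng = random.Random(42)  # seeded so the layout is reproducible
points = [jitter(*BULGARIA, max_km=20.0, rng=rng) for _ in range(5)]
for lat, lon in points:
    print(f"{lat:.4f}, {lon:.4f}")
```

Seeding the generator keeps markers in the same place between page loads, which avoids flags jumping around on refresh.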
Why should GLAMs bother about Wikidata? Because it gives an excellent way to connect and expose your collection data to a multilingual audience
GVP well-known and respected in GLAM. Dependencies: AAT-TGN-ULAN-CONA. Center of LODLAM cloud? GVP Training Materials (Diagram by J.Cobb, 2014)
AAT 2014-02, TGN 2014-08, ULAN 2015-03. Publicized in blog posts by J.Cuno, head of the Getty Trust
See GVP LOD: Ontologies and Semantic Representation, V.Alexiev, CIDOC 2014. External Ontologies:
Prefix | Ontology | Used for |
---|---|---|
bibo: | Bibliography Ontology | Sources |
dc: | Dublin Core Elements | common |
dct: | Dublin Core Terms | common |
foaf: | Friend of a Friend ontology | Contributors |
iso: | ISO 25964 (latest on thesauri) | iso:ThesaurusArray, BTG/BTP/BTI |
owl: | Web Ontology Language | Basic RDF representation |
prov: | Provenance Ontology | Revision history |
rdf: | Resource Description Framework | Basic RDF representation |
rdfs: | RDF Schema | Basic RDF representation |
schema: | Schema.org | common, geo (TGN), bio (ULAN) |
skos: | Simple Knowledge Organization System | Basic vocabulary representation |
skosxl: | SKOS Extension for Labels | Rich labels |
wgs: | W3C WGS84 Basic Geo vocabulary | Geo (TGN) |
xsd: | XML Schema Datatypes | Basic RDF representation |
Excel-driven Ontology Generation™. Key val can be mapped to Custom sub-class, Custom (sub-)prop, Ontology Value (eg <term/kind/Abbreviation>)
More Excel-driven Ontology Generation™
Getty Vocabularies Linked Open Data: Semantic Representation. Alexiev, V.; Cobb, J.; Garcia, G.; Harpring, P. Getty Research Institute, 3.2 edition, March 2015.
Some charts, eg "Year Joined UN" (TGN), "Pope Reign Durations" (ULAN)
Collected about 100 usages of the vocabs, many in Collection Management and Search. Many described in Getty Vocabs: Why LOD? Why Now?, J.Cobb, 2014. Eg
Working with JPGM on publishing LOD. Considering CIDOC CRM, maybe also simpler ontologies. Hoping to generate R2RML from instance examples like:
Discussing making data for Wikidata. WD has 480 Getty paintings, but the Museum has 180k artworks. WD query shown as image grid
American Art Collaborative: 14 US art museums committed to establishing a critical mass of LOD on the semantic web. Consulting on CRM mapping.
EHRI is a large-scale EU project that involves 23 Holocaust archives (Europe, Israel and the US), DH and IT organizations.
"Semantic Archive Integration for Holocaust Research: the EHRI Research Infrastructure", V.Alexiev, L.Brazzo, CIDOC Congress 2016.
Research question: how person networks influenced chance of survival. Idea:
Match USHMM places to Geonames, also achieving deduplication. A pipeline for matching Geonames places in free text was also developed
Analyze 2.5k OH Interviews:
guard | Cosine similarity | punishment | Cosine similarity |
---|---|---|---|
guarding | 0.593507 | punishments | 0.668144 |
sentry | 0.512083 | punish | 0.601212 |
hlinka | 0.496201 | punishing | 0.543213 |
gate | 0.490032 | beatings | 0.527033 |
watching | 0.484647 | penalty | 0.497262 |
rifle | 0.484379 | deserved | 0.490157 |
lookout | 0.482025 | beaten | 0.473870 |
patrol | 0.477233 | straf | 0.473338 |
soldier | 0.475982 | offense | 0.461230 |
guarded | 0.474689 | executing | 0.459965 |
police | 0.474291 | merciless | 0.455123 |
semantic "differencing" (interesting)
KGB - Stalin + Hitler = SS
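The nearest-neighbour tables and the "differencing" above both rest on vector arithmetic plus cosine similarity. A self-contained Python sketch follows. The 2-dimensional vectors are hand-made toy values (axis 1 = regime, axis 2 = leader vs organization), not the embeddings trained on the interviews:

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors (1 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(vectors, query_vec, exclude=(), topn=3):
    """Rank vocabulary words by cosine similarity to query_vec."""
    scored = [(w, cosine(query_vec, v)) for w, v in vectors.items() if w not in exclude]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


def analogy(vectors, a, b, c, topn=1):
    """Nearest words to vec(a) - vec(b) + vec(c), skipping the input words."""
    query = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    return most_similar(vectors, query, exclude={a, b, c}, topn=topn)


# Toy vectors, assumptions for illustration only
toy = {
    "kgb": [1.0, 1.0],
    "stalin": [1.0, 0.0],
    "hitler": [-1.0, 0.0],
    "ss": [-1.0, 1.0],
    "red_army": [1.0, 0.5],
}

best_word, best_score = analogy(toy, "kgb", "stalin", "hitler")[0]
print(best_word, round(best_score, 3))
```

With real embeddings the answer is rarely an exact match; one reads off the top few neighbours, as in the tables above.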
And referencing to Geonames so we can get coordinates
Vienna University of Technology (site, paper)
NLP analysis of medieval Charters and Deeds. Funded by Digging Into Data cross-country SSH funding initiative. Visualized with BRAT
My good friend Ethan Gruber at the American Numismatic Society has developed a host of amazing software that uses and produces LOD.
Spatiotemporal distribution of hoards containing a particular Roman Republican coin type. Below: examples of this type in partner collections
Distribution of the Roman denarius: blue dots for mints, heatmap of finds (a lot in the UK Portable Antiquities Scheme)
Data platform with over 100k coin types. Powers custom collections, eg Art of Devastation: Medallic Art of the Great War
Shared authorities for numismatics. Eg a mint:
Denominations issued by Augustus, Tiberius… rendered in a chart using d3js
Kerameikos Project editor. Based on XForms, leverages Getty and BM LOD
Blog, Wiki. Based on XForms. Leverages the Getty thesauri and VIAF, imports data as needed