Semantic API
The SemanticApi class provides access to Dataverse’s Semantic/Linked Data API endpoints. It specializes in retrieving dataset metadata in JSON-LD (JSON for Linking Data) format, which enables semantic web applications, knowledge graphs, and linked data workflows. While other APIs return structured metadata models, SemanticApi returns JSON-LD dictionaries that can be converted to RDF graphs for advanced semantic processing.
Compared to other APIs, SemanticApi focuses on semantic web standards and linked data. It supports converting JSON-LD responses to RDFLib Graph objects, enabling SPARQL queries, RDF serialization, and integration with semantic web tools. Each method returns JSON-LD dictionaries that include semantic context and can be processed as linked data.
Initialization
To start using the Semantic API, create a SemanticApi instance with the base URL of your Dataverse installation and, if needed, an API token for authenticated operations.
```python
from pyDataverse.api import SemanticApi

# Read-only access (public datasets)
api = SemanticApi(base_url="https://demo.dataverse.org")

# Authenticated access for private datasets
api = SemanticApi(
    base_url="https://dataverse.example.edu",
    api_token="your-api-token-here",
)
```
Understanding the Parameters
SemanticApi supports the same core parameters as other API classes:
- base_url (str, required): The base URL of the Dataverse installation, such as "https://demo.dataverse.org" or "https://dataverse.harvard.edu". All API calls are constructed from this URL.
- api_token (str, optional): API token used for endpoints that require authentication, such as accessing private datasets.
- api_version (str, optional): API version string passed to the Dataverse server. This is typically left at its default unless you have a specific reason to override it.
The SemanticApi class automatically manages request URLs, parameters, and authentication headers for you. Methods that retrieve public dataset metadata can be called without an API token, while accessing private datasets requires authentication.
Retrieving Dataset Metadata
The Semantic API provides methods for retrieving dataset metadata in JSON-LD format, supporting both single and batch operations.
Fetching a Single Dataset
Use get_dataset to retrieve metadata for a single dataset by its persistent identifier (PID) or numeric database ID. The method returns a dictionary containing the dataset metadata in JSON-LD format.
```python
from pyDataverse.api import SemanticApi

api = SemanticApi("https://demo.dataverse.org")

# Fetch by persistent identifier (DOI)
metadata = api.get_dataset("doi:10.11587/8H3N93")
print(metadata["@context"])  # JSON-LD context
print(metadata.get("name"))  # Dataset title

# Fetch by numeric ID
metadata = api.get_dataset(42)
print(metadata["@type"])  # Dataset type
```
JSON-LD is a lightweight syntax for encoding linked data using JSON. The response includes standard JSON-LD fields like @context (which defines the vocabulary), @type, and dataset-specific metadata fields.
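As a point of reference, a heavily trimmed response has roughly the following shape. The values here are illustrative, not taken from a real dataset; only the "@"-prefixed keys are fixed JSON-LD keywords:

```python
# Illustrative shape of a JSON-LD dataset response (values are made up)
sample = {
    "@context": {
        "name": "http://schema.org/name",
        "author": "http://schema.org/author",
    },
    "@type": "Dataset",
    "name": "Example Survey Data",
    "author": [{"name": "Jane Doe"}],
}

# JSON-LD keywords start with "@"; everything else is dataset metadata
jsonld_keys = [key for key in sample if key.startswith("@")]
print(jsonld_keys)     # ['@context', '@type']
print(sample["name"])  # Example Survey Data
```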
Accessing Metadata Fields
The JSON-LD response contains structured metadata that you can access programmatically:
```python
from pyDataverse.api import SemanticApi

api = SemanticApi("https://demo.dataverse.org")

metadata = api.get_dataset("doi:10.11587/8H3N93")

# Access dataset title
title = metadata.get("name")
print(f"Title: {title}")

# Access authors
authors = metadata.get("author", [])
for author in authors:
    print(f"Author: {author.get('name')}")

# Access description
description = metadata.get("description")
print(f"Description: {description}")
```
The @context field is essential for properly interpreting the semantic meaning of the data fields, as it defines the vocabulary and mappings used in the metadata.
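One practical caveat: depending on how a JSON-LD document is compacted, a single-valued field can appear as a plain object rather than a one-element list. A small helper (hypothetical, not part of pyDataverse) makes iteration safe either way:

```python
def as_list(value):
    """Normalize a JSON-LD field that may be absent, a single object, or a list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]

# Here "author" is a single object, not a list (illustrative metadata)
metadata = {"name": "Example Survey Data", "author": {"name": "Jane Doe"}}
for author in as_list(metadata.get("author")):
    print(author["name"])  # Jane Doe
```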
Fetching Multiple Datasets
For bulk operations, use get_datasets to retrieve metadata for multiple datasets efficiently. The method processes datasets in concurrent batches for improved performance.
```python
from pyDataverse.api import SemanticApi

api = SemanticApi("https://demo.dataverse.org")

# Fetch multiple datasets
identifiers = [
    "doi:10.11587/8H3N93",
    "doi:10.11587/ABC123",
    42,  # numeric ID also supported
]

all_metadata = api.get_datasets(identifiers)
print(f"Retrieved {len(all_metadata)} datasets")

# Process each dataset
for metadata in all_metadata:
    title = metadata.get("name", "Unknown")
    author_count = len(metadata.get("author", []))
    print(f"Dataset: {title}, Authors: {author_count}")
```
The method automatically handles concurrent API requests within batches, proper async client lifecycle management, and error handling. Results are returned in the same order as the input identifiers.
Customizing Batch Size
For large collections, you can adjust the batch size to balance performance and resource usage:
```python
from pyDataverse.api import SemanticApi

api = SemanticApi("https://demo.dataverse.org")

# Process a large collection with a smaller batch size
large_collection = [f"doi:10.11587/ID{i}" for i in range(1000)]
metadata = api.get_datasets(large_collection, batch_size=25)
```
The default batch size is 50, which provides a good balance for most use cases. Consider using smaller batch sizes (10-25) when processing very large collections or when working with slower networks.
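Conceptually, batching just splits the identifier list into fixed-size chunks, and the requests within each chunk are issued concurrently. A simplified sketch of the chunking step (the real logic lives inside get_datasets):

```python
def chunked(items, batch_size):
    """Yield successive fixed-size chunks from a list (last chunk may be smaller)."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

identifiers = [f"doi:10.11587/ID{i}" for i in range(7)]
batches = list(chunked(identifiers, 3))
print(len(batches))  # 3 batches of sizes 3, 3, 1
print(batches[-1])   # ['doi:10.11587/ID6']
```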
Converting Directly to a Graph
You can convert multiple datasets directly to a single RDF graph by setting as_graph=True:
```python
from pyDataverse.api import SemanticApi

api = SemanticApi("https://demo.dataverse.org")

identifiers = [
    "doi:10.11587/8H3N93",
    "doi:10.11587/ABC123",
    "doi:10.11587/XYZ789",
]

# Get datasets and convert directly to a merged graph
combined_graph = api.get_datasets(identifiers, as_graph=True)
print(f"Combined graph has {len(combined_graph)} triples")

# Query across all datasets
query = '''
    SELECT ?title
    WHERE {
        ?dataset <http://schema.org/name> ?title .
    }
'''
results = combined_graph.query(query)
for row in results:
    print(f"Title: {row.title}")
```
This is a convenient shortcut that combines fetching multiple datasets and merging them into a single graph in one step.
Working with RDF Graphs
The Semantic API provides utilities for converting JSON-LD responses to RDFLib Graph objects, enabling advanced semantic data processing, SPARQL queries, and RDF serialization.
Converting to RDF Graphs
Use response_to_graph to convert a single JSON-LD response to an RDFLib Graph object:
```python
from pyDataverse.api import SemanticApi

api = SemanticApi("https://demo.dataverse.org")

# Get dataset metadata
metadata = api.get_dataset("doi:10.11587/8H3N93")

# Convert to RDF graph
graph = api.response_to_graph(metadata)
print(f"Graph contains {len(graph)} triples")
```
RDFLib is a Python library for working with RDF (Resource Description Framework) data. By converting JSON-LD to an RDFLib Graph, you can perform advanced semantic operations.
Querying with SPARQL
Once converted to a graph, you can execute SPARQL queries on the metadata:
```python
from pyDataverse.api import SemanticApi

api = SemanticApi("https://demo.dataverse.org")

metadata = api.get_dataset("doi:10.11587/8H3N93")
graph = api.response_to_graph(metadata)

# Execute a SPARQL query
query = '''
    SELECT ?title
    WHERE {
        ?dataset a ?type .
        ?dataset <http://schema.org/name> ?title .
    }
'''

results = graph.query(query)
for row in results:
    print(f"Title: {row.title}")
```
SPARQL queries allow you to extract specific information from the semantic graph using a powerful query language designed for RDF data.
Serializing to RDF Formats
RDFLib graphs can be serialized to various RDF formats:
```python
from pyDataverse.api import SemanticApi

api = SemanticApi("https://demo.dataverse.org")

metadata = api.get_dataset("doi:10.11587/8H3N93")
graph = api.response_to_graph(metadata)

# Serialize to Turtle format
turtle_data = graph.serialize(format='turtle')
print(turtle_data)

# Serialize to RDF/XML
rdf_xml = graph.serialize(format='xml')
print(rdf_xml)

# Serialize to N-Triples
ntriples = graph.serialize(format='nt')
print(ntriples)
```
This enables integration with other semantic web tools and workflows that work with different RDF serialization formats.
Merging Multiple Datasets
You can merge multiple datasets into a single knowledge graph for combined analysis:
```python
from pyDataverse.api import SemanticApi
from rdflib import Graph

api = SemanticApi("https://demo.dataverse.org")

identifiers = ["doi:10.11587/8H3N93", "doi:10.11587/ABC123"]

# Method 1: Convert individually and merge
combined_graph = Graph()
for metadata in api.get_datasets(identifiers):
    dataset_graph = api.response_to_graph(metadata)
    combined_graph += dataset_graph

print(f"Combined graph has {len(combined_graph)} triples")

# Method 2: Use responses_to_graph for direct conversion
all_metadata = api.get_datasets(identifiers)
combined_graph = api.responses_to_graph(all_metadata)
print(f"Combined graph has {len(combined_graph)} triples")
```
The responses_to_graph method provides a convenient way to convert multiple JSON-LD responses directly into a single merged graph.
Converting Multiple Datasets to a Single Graph
For batch operations, you can convert multiple datasets directly to a single graph:
```python
from pyDataverse.api import SemanticApi

api = SemanticApi("https://demo.dataverse.org")

identifiers = [
    "doi:10.11587/8H3N93",
    "doi:10.11587/ABC123",
    "doi:10.11587/XYZ789",
]

# Get all datasets and convert to a single graph
all_metadata = api.get_datasets(identifiers)
combined_graph = api.responses_to_graph(all_metadata)

# Now you can query across all datasets
query = '''
    SELECT ?title ?author
    WHERE {
        ?dataset <http://schema.org/name> ?title .
        ?dataset <http://schema.org/author> ?author .
    }
'''

results = combined_graph.query(query)
for row in results:
    print(f"{row.title} by {row.author}")
```
This approach is useful when you need to perform cross-dataset queries or build a unified knowledge graph from multiple sources.
When to Use SemanticApi
Use SemanticApi when you:
- need JSON-LD format for semantic web applications or linked data workflows.
- want to build knowledge graphs from dataset metadata and need RDF graph structures.
- need to execute SPARQL queries on dataset metadata to extract specific information.
- are integrating with semantic web tools that require RDF or JSON-LD formats.
- want to merge multiple datasets into a unified semantic graph for combined analysis.
- are building linked data applications that need to understand semantic relationships between datasets.
For most everyday workflows (accessing dataset metadata, creating datasets, uploading files), the high-level Dataverse class or NativeApi provides convenient methods that return structured Pydantic models. When you need semantic web capabilities, linked data processing, or RDF graph operations, SemanticApi gives you access to JSON-LD formatted metadata and RDF graph conversion utilities.