Search

The SearchResult class represents search results from the Dataverse search API. It provides convenient access to datasets and collections found in search results, similar to how Collection provides access to its child content.

Search results differ from direct collection/dataset access in that they:

  • Are read-only (no update_metadata support)
  • Only contain identifiers (DOIs for datasets, aliases for collections), not database IDs
  • Represent a snapshot at search time
  • Fetch items lazily, only when they are accessed

The SearchResult class encapsulates search results from the Dataverse search API, providing a unified interface for working with datasets and collections found through search. It abstracts away the complexity of working with search API responses while maintaining full access to the found content.

The class handles several key responsibilities:

  • Result access: Access datasets and collections found in search results through convenient view objects that support both iteration and dictionary-like access
  • Lazy loading: Fetch datasets and collections on demand, so only the items you actually access are requested
  • Identifier-based access: Work with identifiers (DOIs, aliases) rather than database IDs, which search results don’t provide
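
The view behavior described above (iteration, dictionary-like access, and fetching on demand) can be illustrated with a small stand-in class. This is only a sketch of the general pattern, not pyDataverse's implementation: LazyView, fake_fetch, and the sample DOIs are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")


class LazyView(Generic[T]):
    """Hypothetical stand-in for a view such as DatasetView (not pyDataverse code)."""

    def __init__(self, identifiers: Iterable[str], fetch: Callable[[str], T]) -> None:
        self._identifiers = list(identifiers)  # known up front from the search hits
        self._fetch = fetch                    # called only when an item is accessed
        self._cache: Dict[str, T] = {}         # items that were already fetched

    def __getitem__(self, identifier: str) -> T:
        if identifier not in self._identifiers:
            raise KeyError(identifier)
        if identifier not in self._cache:
            self._cache[identifier] = self._fetch(identifier)
        return self._cache[identifier]

    def __iter__(self) -> Iterator[T]:
        for identifier in self._identifiers:
            yield self[identifier]

    def __len__(self) -> int:
        return len(self._identifiers)


fetched = []  # records which identifiers were actually fetched


def fake_fetch(identifier: str) -> dict:
    """Assumed fetch function; a real one would call the Dataverse API."""
    fetched.append(identifier)
    return {"identifier": identifier}


view = LazyView(["doi:10.5072/FK2/AAA", "doi:10.5072/FK2/BBB"], fake_fetch)
item = view["doi:10.5072/FK2/BBB"]   # fetches only this one item
again = view["doi:10.5072/FK2/BBB"]  # served from the cache, no second fetch
```

Constructing the view costs nothing beyond storing the identifiers; a fetch happens only for items that are looked up or reached during iteration.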

The Dataverse class provides a search() method for searching across collections, datasets, and files:

from pyDataverse import Dataverse
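dv = Dataverse(base_url="https://demo.dataverse.org")
# Perform a search
results = dv.search("climate change")
# Note: _search_response is a private attribute (leading underscore) and may change between releases
print(f"Found {results._search_response.total_count} results")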

The search() method returns a SearchResult object that provides convenient access to datasets and collections found in the search results.

Collections also provide a search() method that searches within the collection and its sub-collections:

from pyDataverse import Dataverse
dv = Dataverse(base_url="https://demo.dataverse.org")
# Access a specific collection
simtech = dv.collections["simtech"]
# Search within the collection
for dataset in simtech.search("Deactivation").datasets:
    print(dataset.title)

When you call search() on a collection, the search is automatically scoped to that collection and its nested sub-collections. This provides a convenient way to search within a specific organizational unit without having to manually configure search filters.
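
At the HTTP level, the Dataverse Search API exposes a subtree parameter that restricts a query to one collection and everything beneath it. The sketch below only builds such a request URL to show what a scoped search looks like; it makes no request, and whether pyDataverse uses this exact parameter internally is an assumption. The helper name scoped_search_url is hypothetical.

```python
from urllib.parse import urlencode


def scoped_search_url(base_url: str, query: str, collection_alias: str) -> str:
    """Build a Search API URL limited to one collection and its sub-collections.

    Hypothetical helper for illustration; it is not part of pyDataverse.
    """
    params = {
        "q": query,                   # the search terms
        "subtree": collection_alias,  # scope: this collection and its children
    }
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


url = scoped_search_url("https://demo.dataverse.org", "Deactivation", "simtech")
print(url)
```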

For advanced search options such as filtering by type, pagination, faceted search, or using Solr query syntax, you can use the search_api property directly. See the Search API documentation for more details.
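
To give a sense of what those advanced options look like on the wire, the following sketch builds paginated request URLs using the Dataverse Search API parameters q, type, per_page, and start. It does not use the search_api property, whose exact interface is not shown here; search_page_urls is a hypothetical helper, and the total of 25 results is an assumed value.

```python
from typing import Iterator, Optional
from urllib.parse import urlencode


def search_page_urls(
    base_url: str,
    query: str,
    total_count: int,
    per_page: int = 10,
    item_type: Optional[str] = None,
) -> Iterator[str]:
    """Yield one Search API URL per result page (hypothetical helper)."""
    for start in range(0, total_count, per_page):
        params = {"q": query, "per_page": per_page, "start": start}
        if item_type is not None:
            params["type"] = item_type  # e.g. "dataset", "dataverse", or "file"
        yield f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


# A Solr-style fielded query, restricted to datasets, over an assumed 25 hits
urls = list(
    search_page_urls(
        "https://demo.dataverse.org",
        'title:"climate change"',
        total_count=25,
        item_type="dataset",
    )
)
for page_url in urls:
    print(page_url)
```

Each URL advances start by per_page, so 25 results at 10 per page produce three requests.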

The datasets property provides a view of all datasets in the search results:

from pyDataverse import Dataverse
dv = Dataverse(base_url="https://demo.dataverse.org")
# Perform a search and iterate over datasets
for dataset in dv.search("climate change").datasets:
    print(dataset.title)
# You can also store the results first
results = dv.search("climate change")
# Iterate over all datasets in search results
for dataset in results.datasets:
    print(f"{dataset.title}: {dataset.identifier}")
# Access a specific dataset by identifier (DOI)
dataset = results.datasets["doi:10.5072/FK2/ABC123"]
# Check how many datasets are in the results
dataset_count = sum(1 for _ in results.datasets)

The datasets property returns a DatasetView object that supports both iteration and dictionary-like access. Datasets are fetched on-demand using their global_id (DOI) from the search results. Note that search results only contain identifiers, not database IDs, so all dataset access uses identifier-based lookups.
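
Because items are fetched on demand, it can be handy to walk the view once and keep only the fields you need. The sketch below collects titles keyed by identifier; titles_by_identifier is a hypothetical helper, and the SimpleNamespace objects are stand-ins for real datasets so the example runs without a server. It relies only on the title and identifier attributes used in the examples above.

```python
from types import SimpleNamespace
from typing import Dict, Iterable


def titles_by_identifier(datasets: Iterable) -> Dict[str, str]:
    """Map each dataset's identifier to its title in a single pass."""
    return {dataset.identifier: dataset.title for dataset in datasets}


# Stand-in objects with made-up values; real code would pass results.datasets
sample = [
    SimpleNamespace(identifier="doi:10.5072/FK2/ABC123", title="Climate Change Data"),
    SimpleNamespace(identifier="doi:10.5072/FK2/XYZ789", title="Sea Level Records"),
]
index = titles_by_identifier(sample)
print(index["doi:10.5072/FK2/ABC123"])
```

With a real search, the call would presumably be titles_by_identifier(results.datasets).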

The collections property provides a view of all collections (dataverses) in the search results:

from pyDataverse import Dataverse
dv = Dataverse(base_url="https://demo.dataverse.org")
# Perform a search and iterate over collections
for collection in dv.search("research").collections:
    print(collection.alias)
# You can also store the results first
results = dv.search("research")
# Iterate over all collections in search results
for collection in results.collections:
    print(f"{collection.metadata.name}: {collection.identifier}")
# Access a specific collection by identifier (alias)
collection = results.collections["harvard"]

Like the datasets property, the collections property returns a view object, here a CollectionView, that loads items lazily and caches them. Collections are fetched by their identifier (the collection alias) from the search results.

The following example demonstrates searching at the Dataverse level and within a specific collection:

from pyDataverse import Dataverse
# Initialize connection
dv = Dataverse(base_url="https://demo.dataverse.org")
# Search across the entire Dataverse installation
# Iterate through datasets found in search results
for dataset in dv.search("climate change").datasets:
    print(dataset.title)
# Iterate through collections found in search results
for collection in dv.search("climate change").collections:
    print(collection.alias)
# Search within a specific collection
simtech = dv.collections["simtech"]
for dataset in simtech.search("climate change").datasets:
    print(dataset.title)

You can also store search results and work with them in more detail:

from pyDataverse import Dataverse
# Initialize connection
dv = Dataverse(base_url="https://demo.dataverse.org")
# Perform a search and store results
results = dv.search("climate change")
print(f"Found {results._search_response.total_count} total results")
# Iterate through datasets
print("\nDatasets:")
for dataset in results.datasets:
    print(f" - {dataset.title} ({dataset.identifier})")
# Iterate through collections
print("\nCollections:")
for collection in results.collections:
    print(f" - {collection.metadata.name} ({collection.identifier})")
# Inspect the first dataset in the results, if there is one
first_dataset = next(iter(results.datasets), None)
if first_dataset is not None:
    print(f"\nFirst dataset: {first_dataset.title}")
    citation_block = first_dataset.metadata_blocks.get("citation")
    if citation_block:
        description = getattr(citation_block, "dsDescriptionValue", None)
        if description:
            print(f"Description: {description[0].value}")
# Inspect the first collection in the results, if there is one
first_collection = next(iter(results.collections), None)
if first_collection is not None:
    print(f"\nFirst collection: {first_collection.metadata.name}")
    print(f"Description: {first_collection.metadata.description or 'N/A'}")

These examples demonstrate the typical workflow: performing a search, browsing datasets and collections found in the results, and accessing specific items by identifier. Search results provide a convenient way to discover and access content across a Dataverse installation or within a specific collection.

Search results differ from direct Collection access in several important ways:

  • Read-only: Search results are read-only. You cannot call update_metadata() on a SearchResult object.
  • Identifier-only: Search results only contain identifiers (DOIs for datasets, aliases for collections), not database IDs. All access uses identifier-based lookups.
  • Snapshot: Search results represent a snapshot at the time of the search. The results don’t automatically update if content changes.
  • Lazy loading: Items are fetched lazily when accessed, optimizing for performance when you only need specific items.

Related documentation:

  • Dataverse - Factory class that provides the search() method
  • Collection - Represents a Dataverse collection with direct access to its content
  • Dataset - Represents a Dataverse dataset that can be accessed from search results
  • Search API - Low-level search API for advanced search options