Creating a Dataverse MCP Server
Setting up a Dataverse MCP server enables Large Language Models to explore and interact with your Dataverse repositories through natural language. This guide walks you through creating a server, understanding how it works internally, and configuring it for your specific needs.
How It Works Internally
Before diving into the code, it’s helpful to understand what happens when you create a Dataverse MCP server:
- Connection Establishment: Your server connects to one or more Dataverse installations using the pyDataverse client
- Tool Registration: The DataverseMCP class examines your configuration and registers the appropriate tools with the MCP server
- Middleware Setup: A middleware layer is added to manage Dataverse connections and make them available to all tools
- Context Injection: When tools are called, the Dataverse connection is automatically injected so tools can execute operations
- Response Formatting: Tool results are encoded in TOON format and returned to the LLM for interpretation
The key insight is that DataverseMCP adds functionality to an existing MCP server—it doesn’t create the server itself. You create a FastMCP server instance, then use DataverseMCP to enhance it with Dataverse-specific capabilities.
Basic Server Setup
Here’s a minimal example that creates an MCP server for a single Dataverse installation:
```python
from fastmcp import FastMCP
from pyDataverse import Dataverse
from pyDataverse.mcp import DataverseMCP

# Step 1: Create the MCP server
mcp = FastMCP(name="My Dataverse Server")

# Step 2: Connect to your Dataverse installation
dataverse = Dataverse(
    base_url="https://demo.dataverse.org",
    verbose=0  # Reduce logging noise
)

# Step 3: Add Dataverse tools to the MCP server
DataverseMCP(dataverse=dataverse).to_mcp(mcp)

# Step 4: Run the server
if __name__ == "__main__":
    mcp.run()
```

Let’s break down what each step does:
Step 1: Create a FastMCP instance. This is the standard MCP server that will handle communication with LLMs. The name identifies your server in logs and client connections.
Step 2: Create a Dataverse connection. This establishes the connection to your target installation. You can provide an api_token parameter if you need authenticated access for operations beyond reading public data.
Step 3: Create a DataverseMCP instance and call to_mcp(). This is where the magic happens—all Dataverse tools are registered with your MCP server. The to_mcp() method configures middleware, registers tools, and handles all the setup automatically.
Step 4: Run the server. This starts the MCP server and makes it available for LLM connections.
Server Configuration
The DataverseMCP class accepts an optional configuration object that controls which tools are available. This gives you fine-grained control over what operations LLMs can perform.
Using MCPConfiguration
The MCPConfiguration class defines four categories of tools, each corresponding to a level of the Dataverse hierarchy:
```python
from fastmcp import FastMCP
from pyDataverse import Dataverse
from pyDataverse.mcp import DataverseMCP, MCPConfiguration

# Create a custom configuration
config = MCPConfiguration(
    dataverse=["metrics"],         # Installation-level tools
    collection=["read", "graph"],  # Collection-level tools
    dataset=["read"],              # Dataset-level tools
    file=["read", "metadata"]      # File-level tools
)

# Set up the server with the custom configuration
mcp = FastMCP(name="Configured Dataverse Server")
dataverse = Dataverse("https://demo.dataverse.org")
DataverseMCP(dataverse=dataverse, config=config).to_mcp(mcp)

if __name__ == "__main__":
    mcp.run()
```

Configuration Options
Each category accepts a list of capabilities to enable:
Dataverse Level (dataverse):

- "metrics": Enable installation-wide statistics and metrics

Collection Level (collection):

- "read": Enable collection metadata and content listing
- "graph": Enable knowledge graph summaries and SPARQL queries

Dataset Level (dataset):

- "read": Enable dataset metadata retrieval and file listings

File Level (file):

- "read": Enable reading file contents
- "metadata": Enable file schema and metadata retrieval
Default Configuration
If you don’t provide a configuration, DataverseMCP uses sensible defaults that enable comprehensive read-only access:

```python
# These are the defaults
MCPConfiguration(
    dataverse=["metrics"],
    collection=["read", "graph"],
    dataset=["read"],
    file=["read", "metadata"]
)
```

This configuration provides full exploration capabilities while preventing any modifications to your Dataverse repositories.
Multi-Dataverse Setup
You can configure a single MCP server to connect to multiple Dataverse installations simultaneously. This is useful for federated search, cross-repository analysis, or comparing content across institutions.
```python
from fastmcp import FastMCP
from pyDataverse import Dataverse
from pyDataverse.mcp import DataverseMCP

# Create the MCP server
mcp = FastMCP(name="Multi-Dataverse Server")

# Connect to multiple Dataverse installations
dataverses = {
    "demo": Dataverse("https://demo.dataverse.org"),
    "harvard": Dataverse("https://dataverse.harvard.edu"),
    "darus": Dataverse("https://darus.uni-stuttgart.de"),
}

# Add all installations to the MCP server
DataverseMCP(dataverse=dataverses).to_mcp(mcp)

if __name__ == "__main__":
    mcp.run()
```

When you provide a dictionary of Dataverse instances, the server automatically:
- Makes all tools available for each installation
- Adds a dataverse_name parameter to search tools
- Lists available installation names in tool descriptions
- Routes requests to the correct installation based on LLM decisions
The LLM can then direct questions to specific installations: “Search demo Dataverse for climate data” or “Compare metrics between Harvard and DaRUS”.
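For illustration, the routed request that a client ends up sending follows the standard MCP tools/call shape, with the dataverse_name argument selecting the installation. A rough sketch (the tool name search_datasets is hypothetical, chosen only to show the routing parameter):

```json
{
  "method": "tools/call",
  "params": {
    "name": "search_datasets",
    "arguments": {
      "query": "climate data",
      "dataverse_name": "demo"
    }
  }
}
```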
Authenticated Access
For operations that require authentication (such as accessing restricted datasets or private collections), provide an API token when creating your Dataverse connection:
```python
from fastmcp import FastMCP
from pyDataverse import Dataverse
from pyDataverse.mcp import DataverseMCP

mcp = FastMCP(name="Authenticated Dataverse Server")

# Connect with authentication
dataverse = Dataverse(
    base_url="https://your-dataverse.edu",
    api_token="your-api-token-here"
)

DataverseMCP(dataverse=dataverse).to_mcp(mcp)

if __name__ == "__main__":
    mcp.run()
```

Real-World Example
Here’s a practical example that sets up an MCP server for DaRUS (Data Repository of the University of Stuttgart):
```python
from fastmcp import FastMCP
from pyDataverse import Dataverse
from pyDataverse.mcp import DataverseMCP, MCPConfiguration

# Create the MCP server
darus_mcp = FastMCP(name="DaRUS Dataverse MCP")

# Connect to DaRUS
darus = Dataverse(
    base_url="https://darus.uni-stuttgart.de",
    verbose=0  # Quiet logging
)

# Configure with default read-only access
# (We could customize the config here if needed)
DataverseMCP(dataverse=darus).to_mcp(darus_mcp)

# Run the server
if __name__ == "__main__":
    darus_mcp.run()
```

Save this as server.py and run it with:

```shell
python server.py
```

Your MCP server is now running and ready to accept connections from LLM clients!
Connecting from LLM Clients
Once your server is running, you can connect to it from various LLM clients:
Claude Desktop
Add to your Claude configuration file (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
```json
{
  "mcpServers": {
    "dataverse": {
      "command": "python",
      "args": ["/path/to/your/server.py"]
    }
  }
}
```

Other MCP Clients
Any MCP-compatible client can connect to your server. Refer to the client’s documentation for configuration details. The server uses the standard MCP protocol, so it should work with any compliant client.
Adding Custom Tools
The DataverseMCP class adds Dataverse-specific tools to your MCP server, but you can also add your own custom tools to the same server instance:
```python
from fastmcp import FastMCP
from pyDataverse import Dataverse
from pyDataverse.mcp import DataverseMCP

# Create the server
mcp = FastMCP(name="Extended Dataverse Server")

# Add Dataverse tools
dataverse = Dataverse("https://demo.dataverse.org")
DataverseMCP(dataverse=dataverse).to_mcp(mcp)

# Add your custom tools
@mcp.tool()
def custom_analysis(dataset_id: str) -> str:
    """Perform custom analysis on a dataset."""
    # Your custom logic here
    return f"Analysis results for {dataset_id}"

# Run the server
if __name__ == "__main__":
    mcp.run()
```

The LLM will now have access to both the standard Dataverse tools and your custom tools, enabling powerful hybrid workflows.
Best Practices
When setting up your Dataverse MCP server:
- Start with defaults: Use the default configuration first, then customize only if you have specific security or functionality requirements
- Use environment variables: Store sensitive information like API tokens in environment variables rather than hardcoding them
- Name your servers clearly: Choose descriptive names that indicate which Dataverse installation they connect to
- Document your configuration: If you customize the configuration, document which tools you’ve enabled and why
- Test with public data first: Validate your setup with public Dataverse installations before connecting to production repositories
- Monitor usage: Keep an eye on server logs to understand how the LLM is using your tools
- Version control: Keep your server configuration in version control (excluding tokens and secrets)
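As a small sketch of the environment-variable practice above (the variable name DATAVERSE_API_TOKEN is an assumption for this example, not something pyDataverse requires):

```python
import os

# Read the API token from the environment instead of hardcoding it.
# DATAVERSE_API_TOKEN is a hypothetical variable name chosen for illustration.
api_token = os.environ.get("DATAVERSE_API_TOKEN")

if api_token is None:
    # Fall back to anonymous, read-only access rather than failing outright.
    print("DATAVERSE_API_TOKEN not set; using anonymous access")

# The token would then be passed along when creating the connection, e.g.:
# dataverse = Dataverse(base_url="https://your-dataverse.edu", api_token=api_token)
```

This keeps the token out of version control while leaving the rest of the server script unchanged.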
Troubleshooting
Server won’t start: Check that all dependencies are installed (pip install pyDataverse[mcp]) and that your Python version is compatible (3.8+).
Can’t connect to Dataverse: Verify the base URL is correct and that the Dataverse installation is accessible from your network. Test the connection manually using a browser.
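One way to script that manual check is to hit the installation’s public version endpoint (/api/info/version is a standard Dataverse API route). A minimal helper that builds the URL to test:

```python
def info_endpoint(base_url: str) -> str:
    """Build the URL of a Dataverse installation's public version endpoint,
    a convenient target for a quick connectivity check."""
    return base_url.rstrip("/") + "/api/info/version"

# Open this URL in a browser (or curl it) to confirm the installation
# is reachable from your network:
print(info_endpoint("https://demo.dataverse.org"))
# → https://demo.dataverse.org/api/info/version
```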
Tools not appearing: Verify that your configuration includes the tool categories you need. Remember that tools must be explicitly enabled in the configuration.
Authentication errors: Ensure your API token is valid and has the necessary permissions. Test the token using the pyDataverse API directly before setting up the MCP server.
Performance issues: For large repositories, consider enabling only the tools you need. Knowledge graph queries and full metadata retrieval can be expensive—use them judiciously.
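As a sketch of that advice, the tool selection for a leaner setup could be drafted as plain keyword arguments first (the field names follow the configuration options listed earlier; which categories you actually need is an assumption about your workload):

```python
# Keyword arguments for MCPConfiguration, trimmed for performance:
# the expensive "graph" capability and full file reads are omitted.
lean_config_kwargs = {
    "dataverse": ["metrics"],
    "collection": ["read"],  # no "graph": skip knowledge-graph/SPARQL tools
    "dataset": ["read"],
    "file": ["metadata"],    # schema/metadata only, no full file contents
}

# Applied as: MCPConfiguration(**lean_config_kwargs)
```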