@cyanheads/gutenberg-mcp-server

Name: gutenberg mcp server
Author: cyanheads

Search, browse, and read 75,000+ public-domain books from Project Gutenberg with full plain-text retrieval and offset/limit chunking via MCP. STDIO or Streamable HTTP.

4 Tools

Public Hosted Server: https://gutenberg.caseyjhand.com/mcp

Tools

Four tools for searching and reading Project Gutenberg's public-domain library:

Tool	Description
`gutenberg_search_books`	Search the Gutenberg catalog by title, author, topic, language, or author lifespan — returns popularity-ordered results with IDs ready for follow-up calls
`gutenberg_get_book`	Fetch complete metadata for a book by ID — full formats map, translators, editors, subjects, bookshelves, copyright status, and the `has_plain_text` flag
`gutenberg_get_text`	Retrieve the plain-text content of a book, stripped of license boilerplate, with offset/limit chunking for context-budget management
`gutenberg_browse_popular`	Browse the most-downloaded books, optionally filtered by language or topic — useful as a discovery entry point

`gutenberg_search_books`

Search the Project Gutenberg catalog of 78,000+ public-domain books.

Full-text search against titles and author names (space-separated words, case-insensitive)
Topic filter matches subject headings and bookshelf categories
Language filter by ISO 639-1 two-character codes (e.g., ["en"], ["fr", "de"])
Author lifespan range filter via author_year_start / author_year_end
Sort by popularity (download count), or by Gutenberg ID ascending/descending
Batch lookup by known ID list via ids parameter
Paginated — up to 32 books per page; use totalCount to determine total pages
Each result includes has_plain_text to indicate whether gutenberg_get_text will work

`gutenberg_get_book`

Fetch complete metadata for a single Project Gutenberg book.

Returns the full formats map (MIME type → download URL) including plain text, HTML, EPUB, and cover image
Includes translators and editors alongside authors, each with birth/death years
has_plain_text flag confirms whether a UTF-8 or ASCII plain-text format is available
media_type distinguishes readable text books from audio recordings
Use this before gutenberg_get_text to confirm text availability and inspect the formats map

`gutenberg_get_text`

Retrieve the plain-text content of a Project Gutenberg book, stripped of license boilerplate.

Strips the standard Gutenberg license header and footer — response contains only the literary work
Offset/limit chunking for long works: novels routinely run 500 KB–2 MB; read in manageable chunks without loading the whole file
Response includes totalChars, offset, length, and remainingChars for precise pagination
Paragraph-boundary trimming: actual returned length may be slightly less than limit — use length (not limit) to compute the next offset
Prefers UTF-8 plain text; falls back to ASCII plain text; converts HTML as a last resort
Refuses audio books (media_type "Sound") with a clear recovery hint
provenance field carries the Gutenberg ID, title, and license URL for attribution

`gutenberg_browse_popular`

Browse the most-downloaded Project Gutenberg books.

Returns up to 32 titles ordered by download count (most popular first)
Optionally filter by language (ISO 639-1 codes) and/or topic keyword
Useful as a discovery entry point: "what are the most popular classics in French?"
totalInCatalog provides full context — "top 20 of 60,000"

Features

Built on @cyanheads/mcp-ts-core:

Declarative tool definitions — single file per tool, framework handles registration and validation
Unified error handling — handlers throw, framework catches, classifies, and formats with recovery hints
Pluggable auth: none, jwt, oauth
Swappable storage backends: in-memory, filesystem, Supabase, Cloudflare KV/R2/D1
Structured logging with optional OpenTelemetry tracing
STDIO and Streamable HTTP transports

Project Gutenberg integration:

Catalog search and metadata via Gutendex — an unofficial but stable JSON API over the Gutenberg dataset
Full plain-text retrieval directly from Project Gutenberg file servers with transparent UTF-8/ASCII/HTML fallback chain
In-session text caching: book text is fetched once per session and served from cache for subsequent chunk reads
No API key required — Project Gutenberg data is freely available; no registration needed

Agent-friendly output:

has_plain_text flag on every search/browse result so agents can pre-filter before attempting text retrieval
Precise chunking contract: offset, length, totalChars, remainingChars, hasMore on every gutenberg_get_text response for reliable sequential reads
provenance field on every text response for attribution
Discriminated sourceFormat field (text/plain; charset=utf-8, text/plain; charset=us-ascii, text/html) so agents know the fidelity of the text

Getting started

Public Hosted Instance

A public instance is available at https://gutenberg.caseyjhand.com/mcp — no installation required. Point any MCP client at it via Streamable HTTP:

{
  "mcpServers": {
    "gutenberg-mcp-server": {
      "type": "streamable-http",
      "url": "https://gutenberg.caseyjhand.com/mcp"
    }
  }
}

Self-Hosted / Local

No API key required. Add the following to your MCP client configuration file:

{
  "mcpServers": {
    "gutenberg-mcp-server": {
      "type": "stdio",
      "command": "bunx",
      "args": ["@cyanheads/gutenberg-mcp-server@latest"],
      "env": {
        "MCP_TRANSPORT_TYPE": "stdio",
        "MCP_LOG_LEVEL": "info"
      }
    }
  }
}

Or with npx (no Bun required):

{
  "mcpServers": {
    "gutenberg-mcp-server": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@cyanheads/gutenberg-mcp-server@latest"],
      "env": {
        "MCP_TRANSPORT_TYPE": "stdio",
        "MCP_LOG_LEVEL": "info"
      }
    }
  }
}

Or with Docker:

{
  "mcpServers": {
    "gutenberg-mcp-server": {
      "type": "stdio",
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-e", "MCP_TRANSPORT_TYPE=stdio",
        "ghcr.io/cyanheads/gutenberg-mcp-server:latest"
      ]
    }
  }
}

For Streamable HTTP, set the transport and start the server:

MCP_TRANSPORT_TYPE=http MCP_HTTP_PORT=3010 bun run start:http
# Server listens at http://localhost:3010/mcp

Prerequisites

Bun v1.3.11 or higher (or Node.js v24+).
No API key required — Project Gutenberg data is freely available.

Installation

Clone the repository:

git clone https://github.com/cyanheads/gutenberg-mcp-server.git

Navigate into the directory:

cd gutenberg-mcp-server

Install dependencies:

bun install

Configure environment:

cp .env.example .env
# edit .env if you need to override any defaults

Configuration

Variable	Description	Default
`GUTENDEX_BASE_URL`	Base URL for the Gutendex catalog API. Override for self-hosted instances.	`https://gutendex.com/books/`
`GUTENBERG_TEXT_BASE_URL`	Base URL for Project Gutenberg file servers. Override for mirrors.	`https://www.gutenberg.org`
`MCP_TRANSPORT_TYPE`	Transport: `stdio` or `http`.	`stdio`
`MCP_HTTP_PORT`	Port for HTTP server.	`3010`
`MCP_AUTH_MODE`	Auth mode: `none`, `jwt`, or `oauth`.	`none`
`MCP_LOG_LEVEL`	Log level (RFC 5424).	`info`
`LOGS_DIR`	Directory for log files (Node.js only).	`<project-root>/logs`
`STORAGE_PROVIDER_TYPE`	Storage backend.	`in-memory`
`OTEL_ENABLED`	Enable OpenTelemetry instrumentation.	`false`

See .env.example for the full list of optional overrides.

Running the server

Local development

Build and run:

# One-time build
bun run rebuild

# Run the built server
bun run start:stdio
# or
bun run start:http

Run checks and tests:

bun run devcheck   # Lint, format, typecheck, security
bun run test       # Vitest test suite
bun run lint:mcp   # Validate MCP definitions against spec

Docker

docker build -t gutenberg-mcp-server .
docker run --rm -p 3010:3010 gutenberg-mcp-server

The Dockerfile defaults to HTTP transport, stateless session mode, and logs to /var/log/gutenberg-mcp-server. OpenTelemetry peer dependencies are installed by default — build with --build-arg OTEL_ENABLED=false to omit them.

Project structure

Path	Purpose
`src/index.ts`	`createApp()` entry point — registers tools and inits services.
`src/config/server-config.ts`	Server-specific environment variable parsing (Gutendex and file-server URL overrides).
`src/mcp-server/tools/definitions/`	Tool definitions (`*.tool.ts`).
`src/services/gutendex/`	Gutendex catalog API client — search and book metadata.
`src/services/gutenberg-text/`	Full plain-text retrieval, boilerplate stripping, in-session caching, and chunking.
`tests/`	Unit and integration tests mirroring `src/`.

Development guide

See CLAUDE.md / AGENTS.md for development guidelines and architectural rules. The short version:

Handlers throw, framework catches — no try/catch in tool logic
Use ctx.log for request-scoped logging, ctx.state for tenant-scoped storage
Register new tools via the entry arrays in src/index.ts
Wrap external API calls: validate raw → normalize to domain type → return output schema; never fabricate missing fields

Contributing

Issues and pull requests are welcome. Run checks and tests before submitting:

bun run devcheck
bun run test

License

Apache-2.0 — see LICENSE for details.

Data from Project Gutenberg is in the public domain. Catalog metadata sourced from Gutendex (MIT license).