WebScrape MCP Server
English · Español
English
MCP server that lets AI agents search the web and extract clean Markdown content — no ads, no clutter, just the text your LLM needs.
Tools
| Tool | Description |
|---|---|
| `webscrape_search` | Search the web (DuckDuckGo) and scrape results into Markdown |
| `webscrape_fetch_url` | Fetch a single URL and return clean Markdown. Supports `use_readability` and auto-detects PDFs |
| `webscrape_batch_fetch` | Fetch up to 5 URLs in parallel. Supports PDF auto-detection |
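
For orientation, an MCP client invokes any of these tools by sending a JSON-RPC `tools/call` request. The sketch below only builds such a request in Python and does not contact the server. The `url` argument name is an assumption made for illustration (only `use_readability` is named in this README), so check the schema the server returns from `tools/list` before relying on it.

```python
import json
from itertools import count

_request_ids = count(1)  # each JSON-RPC request in a session needs its own id


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Return an MCP `tools/call` request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical argument name: "url" is assumed; "use_readability" comes from the README.
fetch_request = build_tool_call(
    "webscrape_fetch_url",
    {"url": "https://example.com/article", "use_readability": True},
)
print(json.dumps(fetch_request, indent=2))
```

In practice an MCP client library sends this message for you; the dict is shown only to make the tool name and arguments visible.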
Features
- PDF support: URLs ending in `.pdf` or with an `application/pdf` content-type are auto-detected and text is extracted page by page
- Readability mode: Pass `use_readability=True` to `webscrape_fetch_url` for cleaner article extraction using Mozilla Readability (removes nav, sidebars, ads, comments)
- DuckDuckGo search: No API key required, just a search query
- Built-in cache: 200-entry cache with automatic eviction for repeated URLs
- Batch fetching: Up to 5 URLs in parallel
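
To make the built-in cache concrete, here is a minimal sketch of a 200-entry cache with automatic eviction. The README does not say which eviction policy the server uses, so least-recently-used eviction is an assumption, and the class name `BoundedCache` is hypothetical.

```python
from collections import OrderedDict
from typing import Optional


class BoundedCache:
    """URL -> Markdown cache that evicts the least recently used entry when full."""

    def __init__(self, max_entries: int = 200) -> None:
        self.max_entries = max_entries
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def get(self, url: str) -> Optional[str]:
        if url not in self._entries:
            return None
        self._entries.move_to_end(url)  # mark as most recently used
        return self._entries[url]

    def put(self, url: str, markdown: str) -> None:
        self._entries[url] = markdown
        self._entries.move_to_end(url)
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # drop the least recently used entry

    def __len__(self) -> int:
        return len(self._entries)


# Storing 201 pages in a 200-entry cache pushes out the first one.
cache = BoundedCache(max_entries=200)
for i in range(201):
    cache.put(f"https://example.com/page/{i}", f"# Page {i}")
```

After the loop the cache still holds 200 entries, and the first URL has to be fetched again.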
How to use
Option 1 — MCPize (recommended)
- Go to https://mcpize.com/marketplace
- Search for "Web Scrape" and click Start Free
- You'll get an API key
- Configure in your AI client:
{
  "mcpServers": {
    "webscrape": {
      "url": "https://webscrape.mcpize.run",
      "headers": {
        "Authorization": "Bearer your-api-key"
      }
    }
  }
}
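
If you would rather not paste the API key into a config file by hand, the same structure can be generated from an environment variable. This is a sketch only: the variable name `WEBSCRAPE_API_KEY` and the helper `build_client_config` are hypothetical, and where the resulting JSON belongs depends on your AI client.

```python
import json
import os


def build_client_config(api_key: str) -> dict:
    """Return the MCPize client configuration shown above as a dict."""
    return {
        "mcpServers": {
            "webscrape": {
                "url": "https://webscrape.mcpize.run",
                "headers": {"Authorization": f"Bearer {api_key}"},
            }
        }
    }


# WEBSCRAPE_API_KEY is an assumed variable name; the fallback mirrors the README placeholder.
api_key = os.environ.get("WEBSCRAPE_API_KEY", "your-api-key")
config = build_client_config(api_key)
print(json.dumps(config, indent=2))
```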
Option 2 — Render (dev)
{
  "mcpServers": {
    "webscrape": {
      "url": "https://webscrape-mcp.onrender.com"
    }
  }
}
Option 3 — Local
git clone https://github.com/carrasquelalex1/webscrape-mcp.git
cd webscrape-mcp
pip install -r requirements.txt
python webscrape_mcp.py
Official Registry
io.github.carrasquelalex1/webscrape-mcp
Dependencies
mcp, httpx, beautifulsoup4, markdownify, pydantic, ddgs, readability-lxml, PyMuPDF
License
MIT
Español
MCP server that lets AI agents search the web and extract clean Markdown content: no ads, no navigation, just the text your LLM needs.
Tools
| Tool | Description |
|---|---|
| `webscrape_search` | Searches the web (DuckDuckGo) and extracts the results to Markdown |
| `webscrape_fetch_url` | Fetches a URL and converts it to clean Markdown. Supports `use_readability` and detects PDFs automatically |
| `webscrape_batch_fetch` | Fetches up to 5 URLs in parallel. Supports automatic PDF detection |
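
As an illustration of the parallel fetch with a 5-URL cap, the sketch below uses `asyncio.gather`. It is not the server's implementation: `batch_fetch` and `fake_fetch` are hypothetical names, rejecting larger batches with an error is an assumption, and the fetcher is a stand-in so the example runs without network access.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

MAX_BATCH_SIZE = 5  # limit stated in the README


async def batch_fetch(
    urls: List[str], fetch_one: Callable[[str], Awaitable[str]]
) -> Dict[str, str]:
    """Fetch all URLs concurrently; one failure does not abort the others."""
    if len(urls) > MAX_BATCH_SIZE:
        raise ValueError(f"at most {MAX_BATCH_SIZE} URLs per batch, got {len(urls)}")
    results = await asyncio.gather(
        *(fetch_one(url) for url in urls), return_exceptions=True
    )
    return {
        url: f"Error: {result}" if isinstance(result, Exception) else result
        for url, result in zip(urls, results)
    }


async def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP fetch so the sketch runs offline."""
    await asyncio.sleep(0.01)
    if "broken" in url:
        raise RuntimeError("simulated network failure")
    return f"# Content of {url}"


pages = asyncio.run(
    batch_fetch(["https://example.com/a", "https://example.com/broken"], fake_fetch)
)
```

Passing `return_exceptions=True` keeps the successful pages when one URL fails, which is usually what a batch tool wants.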
Features
- PDF support: URLs ending in `.pdf` or with an `application/pdf` content-type are detected automatically and text is extracted page by page
- Readability mode: Pass `use_readability=True` to `webscrape_fetch_url` for cleaner article extraction (removes navigation, sidebars, ads, comments)
- DuckDuckGo search: No API key required
- Built-in cache: 200 entries with automatic eviction for repeated URLs
- Batch fetching: Up to 5 URLs in parallel
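
The PDF auto-detection rule described above can be sketched as a small predicate. The function name `looks_like_pdf` is hypothetical, and ignoring query strings and content-type parameters is an assumption about reasonable behavior, not a description of the server's code.

```python
from urllib.parse import urlparse


def looks_like_pdf(url: str, content_type: str = "") -> bool:
    """Return True if the URL path ends in .pdf or the response is application/pdf."""
    path = urlparse(url).path.lower()  # the path excludes ?query and #fragment
    media_type = content_type.split(";", 1)[0].strip().lower()
    return path.endswith(".pdf") or media_type == "application/pdf"


print(looks_like_pdf("https://example.com/paper.pdf?download=1"))
print(looks_like_pdf("https://example.com/doc?id=7", "application/pdf; charset=binary"))
print(looks_like_pdf("https://example.com/page.html", "text/html"))
```

The first two calls report a PDF and the third does not, since neither its path nor its content-type matches.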
How to use
Option 1: MCPize (recommended)
- Go to https://mcpize.com/marketplace
- Search for "Web Scrape" and click Start Free
- You'll get an API key
- Configure in your AI client:
{
  "mcpServers": {
    "webscrape": {
      "url": "https://webscrape.mcpize.run",
      "headers": {
        "Authorization": "Bearer your-api-key"
      }
    }
  }
}
Option 2: Render (development)
{
  "mcpServers": {
    "webscrape": {
      "url": "https://webscrape-mcp.onrender.com"
    }
  }
}
Option 3: Local
git clone https://github.com/carrasquelalex1/webscrape-mcp.git
cd webscrape-mcp
pip install -r requirements.txt
python webscrape_mcp.py
Official Registry
io.github.carrasquelalex1/webscrape-mcp
Dependencies
mcp, httpx, beautifulsoup4, markdownify, pydantic, ddgs, readability-lxml, PyMuPDF, playwright
License
MIT