Endpoint
Request Body
The URL of the website to scrape.
Output formats to include:
- markdown - Clean markdown content
- html - Raw HTML structure
- screenshot - Page screenshot
Additional scraping options:
- onlyMainContent: Boolean - Extract only main content (default: true)
- waitFor: Number - Wait time for dynamic content in ms (default: 2000)
- timeout: Number - Request timeout in ms (default: 30000)
Example Request
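A minimal sketch of building the request body in Python. The top-level field names `url`, `formats`, and `options` are assumptions, since the field names are not given above; the option names and defaults come from the list above. The endpoint path is not shown here, so the sketch stops at producing the JSON payload.

```python
import json

# Allowed output formats, taken from the list above.
ALLOWED_FORMATS = {"markdown", "html", "screenshot"}


def build_scrape_request(url, formats=("markdown",), only_main_content=True,
                         wait_for=2000, timeout=30000):
    """Build a request body dict. Top-level keys are assumed names."""
    unknown = set(formats) - ALLOWED_FORMATS
    if unknown:
        raise ValueError(f"unsupported formats: {sorted(unknown)}")
    return {
        "url": url,                      # assumed field name
        "formats": list(formats),        # assumed field name
        "options": {                     # assumed field name
            "onlyMainContent": only_main_content,
            "waitFor": wait_for,         # milliseconds
            "timeout": timeout,          # milliseconds
        },
    }


body = build_scrape_request("https://example.com", formats=("markdown", "screenshot"))
print(json.dumps(body, indent=2))
```

Validating the formats before sending keeps typos from turning into a failed scrape on the server side.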
Response
Whether the scrape succeeded.
Scraped content:
- title: Page title
- description: Meta description
- content: Combined content
- markdown: Markdown version
- html: HTML version
- metadata: Page metadata
- screenshot: Base64 screenshot (if requested)
- links: Extracted links
Success Response
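As an illustration, a successful response might be consumed like this. The top-level keys `success` and `data` are assumptions; the keys inside the scraped content follow the list above. The sample payload is made up for the example.

```python
import json

# Hypothetical sample payload; values are made up for illustration.
SAMPLE = json.loads("""
{
  "success": true,
  "data": {
    "title": "Example Domain",
    "description": "An example page",
    "markdown": "# Example Domain",
    "links": ["https://example.com/about"]
  }
}
""")


def summarize(response):
    """Return a small summary of the scraped content."""
    data = response.get("data") or {}
    return {
        "ok": bool(response.get("success")),
        "title": data.get("title", ""),
        "has_screenshot": "screenshot" in data,  # only present if requested
        "link_count": len(data.get("links", [])),
    }


summary = summarize(SAMPLE)
print(summary)
```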
Website Cloning Workflow
Using Scraped Content for Generation
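One way to pass scraped content on to a generation step is to fold the main fields into a single prompt string. This is only a sketch: `build_generation_prompt` and the prompt wording are hypothetical, and the generation call itself is omitted because it is not described here.

```python
def build_generation_prompt(scraped, max_chars=4000):
    """Combine title, description, and markdown into one prompt string.

    `scraped` is the scraped-content object; the markdown is truncated to
    `max_chars` to keep the prompt bounded.
    """
    markdown = scraped.get("markdown", "")[:max_chars]
    return (
        "Recreate this website as a single page.\n"
        f"Title: {scraped.get('title', '')}\n"
        f"Description: {scraped.get('description', '')}\n"
        "Content:\n"
        f"{markdown}"
    )


prompt = build_generation_prompt(
    {"title": "Example", "description": "Demo page", "markdown": "# Hello"}
)
print(prompt)
```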
Enhanced Scraping
For more detailed extraction, use the enhanced endpoint, which provides:
- Better handling of SPAs
- CSS extraction
- Asset downloading
- Structure analysis
Screenshot Capture
For visual reference, include screenshot in the requested output formats.
Error Handling
Scrape Failed
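A failed scrape could be surfaced to callers as an exception. In this sketch, the `success`, `error`, and `data` keys are assumptions about the response shape, and `ScrapeError` is a hypothetical helper.

```python
class ScrapeError(Exception):
    """Raised when the response reports a failed scrape (hypothetical helper)."""


def unwrap(response):
    """Return the scraped content, or raise ScrapeError on failure."""
    if not response.get("success"):
        # "error" is an assumed key; fall back to a generic message.
        raise ScrapeError(response.get("error", "scrape failed"))
    return response.get("data", {})


try:
    unwrap({"success": False, "error": "Failed to scrape website"})
except ScrapeError as exc:
    message = str(exc)
    print(f"scrape failed: {message}")
```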
Limitations
- Some websites block automated scraping
- JavaScript-heavy SPAs may not render completely
- Rate limits may apply
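Because blocking and rate limits can make a scrape fail intermittently, callers may want to retry with increasing delays. The sketch below assumes the response carries a `success` flag; `with_retries` is a hypothetical helper, and the actual HTTP call is passed in as a function so the example runs without a network.

```python
import time


def with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call() until its result reports success, up to `attempts` times.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    Returns the last result, whether or not it succeeded.
    """
    result = {}
    for attempt in range(attempts):
        result = call()
        if result.get("success"):
            return result
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return result


# Simulated responses: two failures, then a success.
responses = iter([{"success": False}, {"success": False}, {"success": True}])
delays = []
final = with_retries(lambda: next(responses), sleep=delays.append)
print(final, delays)
```

Injecting `sleep` keeps the example fast and makes the backoff schedule easy to inspect.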
Best Practices
- Use onlyMainContent - Excludes headers/footers for cleaner content
- Increase waitFor - For dynamic content, wait longer
- Combine with screenshots - Visual reference improves generation
- Check the markdown - Verify important content was captured
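The last check can be automated in a simple way by looking for phrases you expect to find in the scraped markdown. `missing_phrases` is a hypothetical helper, and the sample text is made up for illustration.

```python
def missing_phrases(markdown, expected):
    """Return the expected phrases not found in the markdown (case-insensitive)."""
    haystack = markdown.lower()
    return [p for p in expected if p.lower() not in haystack]


missing = missing_phrases(
    "# Pricing\nStarter plan: $9/month",
    ["pricing", "Contact us"],
)
print(missing)  # ['Contact us']
```

A non-empty result suggests re-running the scrape with a longer waitFor or with onlyMainContent disabled.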