
[Bug] Python SDK Not Catching All scraping methods failed Error #851

Open
brian-carnot opened this issue Oct 31, 2024 · 0 comments
Labels: bug (Something isn't working)
Describe the Bug
When scraping a page fails on the server with the error All scraping methods failed, the SDK does not raise an exception; instead, the returned content is simply empty.

To Reproduce
Steps to reproduce the issue:

  1. Wait for a website scrape to raise the exception All scraping methods failed for url: on the dashboard
  2. Inspect the value returned by the .scrape_url(url) method
  3. Example: {'content': '', 'markdown': '', 'linksOnPage': [], 'metadata': {'sourceURL': 'https://ycombinator.com/people', 'pageStatusCode': 200}}

Expected Behavior
An exception should be thrown by the scrape_url method instead of returning empty content.
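As a stopgap until the SDK raises on failure, a caller can wrap the scrape and raise when the result comes back empty. This is a minimal sketch: the wrapper function and exception names are hypothetical (not part of the SDK), and the result shape is assumed to match the dict shown in the reproduction steps above.

```python
class ScrapeFailedError(Exception):
    """Hypothetical helper exception raised when a scrape returns no content."""

def scrape_or_raise(app, url: str) -> dict:
    # app is assumed to be a FirecrawlApp instance whose scrape_url(url)
    # returns a dict like:
    # {'content': '', 'markdown': '', 'linksOnPage': [], 'metadata': {...}}
    result = app.scrape_url(url)
    # Treat an empty 'content' and 'markdown' as a failed scrape, since the
    # SDK currently returns this shape instead of raising an exception.
    if not result.get("content") and not result.get("markdown"):
        raise ScrapeFailedError(f"All scraping methods failed for url: {url}")
    return result
```

This only detects the symptom (empty content); the proper fix would be for the SDK itself to surface the server-side failure.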


Environment (please complete the following information):

  • OS: Linux (python:3.12.3-bookworm image)
  • Firecrawl Version: ^0.0.20 Python SDK
  • Node.js Version: v23.1.0

Logs

{
    "url": "https://ycombinator.com/people",
    "type": "scrape",
    "method": "fetch",
    "result": {
        "error": null,
        "success": false,
        "time_taken": 591,
        "response_code": 200,
        "response_size": 55917
    },
    "createdAt": "2024-10-31T05:50:34.653524+00:00"
}
{
    "type": "error",
    "stack": "Error: All scraping methods failed for URL: https://ycombinator.com/people\n    at scrapSingleUrl (/app/dist/src/scraper/WebScraper/single_url.js:378:19)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async /app/dist/src/scraper/WebScraper/index.js:66:32\n    at async Promise.all (index 0)\n    at async WebScraperDataProvider.convertUrlsToDocuments (/app/dist/src/scraper/WebScraper/index.js:64:13)\n    at async Promise.all (index 0)\n    at async WebScraperDataProvider.processLinks (/app/dist/src/scraper/WebScraper/index.js:208:40)\n    at async WebScraperDataProvider.handleSingleUrlsMode (/app/dist/src/scraper/WebScraper/index.js:174:25)\n    at async runWebScraper (/app/dist/src/main/runWebScraper.js:77:23)\n    at async startWebScraperPipeline (/app/dist/src/main/runWebScraper.js:13:13)\n    at async processJob (/app/dist/src/services/queue-worker.js:236:44)\n    at async processJobInternal (/app/dist/src/services/queue-worker.js:72:24)\n    at async /app/dist/src/services/queue-worker.js:174:39\n    at async /app/dist/src/services/queue-worker.js:161:25",
    "message": "All scraping methods failed for URL: https://ycombinator.com/people",
    "createdAt": "2024-10-31T05:50:35.242789+00:00"
}

Additional Context

The error is not consistently reproducible, and I am not hitting the rate limit for my API key.
