0.004     2026-06-11 14:24:19Z

 - Detect: content volume is now the master signal. Every signal derived from a
   page's VISIBLE rendered text -- js_required, the blocked body-phrase arm, and
   the captcha marker -- only fires on a THIN page. A content-rich page that
   merely mentions JavaScript ("enable JavaScript" footer), quotes a bot-wall
   phrase ("unusual traffic", "access denied"), or carries captcha-prompt wording
   is no longer discarded: once 500+ chars of markdown are in hand the scrape
   succeeded, and body words can never prove otherwise. STRUCTURAL fingerprints
   are unchanged and still fire regardless of size -- WAF tokens in the HTML
   markup (__cf_chl, datadome), a "Just a moment" / "Access denied" <title>, and
   redirects to a known WAF / captcha challenge endpoint. Removes the rich-page
   captcha-PROMPT heuristic (and $RE_CAPTCHA_PROMPT) added in 0.003, which fell
   into the same word-on-a-rendered-page false-positive class
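
   A minimal sketch of the gating described in this entry, for illustration
   only. The helper name sketch_signals, its arguments, the regexes, and the
   reading of "thin" as "fewer than 500 markdown chars" are all assumptions;
   this is not the actual WWW::Crawl4AI::Detect code.

   ```perl
   use strict;
   use warnings;

   # Assumed cut-off: the entry says 500+ chars of markdown count as success.
   my $RICH_MARKDOWN_CHARS = 500;

   # Hypothetical helper: takes markdown / html / title strings and returns a
   # hashref of raised signals.
   sub sketch_signals {
       my (%page) = @_;
       my $markdown = $page{markdown} // '';
       my $html     = $page{html}     // '';
       my $title    = $page{title}    // '';
       my %signals;

       # Structural fingerprints fire regardless of content volume.
       $signals{blocked} = 1 if $html  =~ /__cf_chl|datadome/i;
       $signals{blocked} = 1 if $title =~ /Just a moment|Access denied/i;

       # Visible-text signals are only consulted on a thin page.
       if (length($markdown) < $RICH_MARKDOWN_CHARS) {
           $signals{js_required} = 1 if $markdown =~ /enable JavaScript/i;
           $signals{blocked}     = 1 if $markdown =~ /unusual traffic|access denied/i;
           $signals{captcha}     = 1 if $markdown =~ /captcha/i;
       }

       return \%signals;
   }
   ```

   Under these assumptions, a long article with an "enable JavaScript" footer
   raises nothing, while the same footer on a near-empty page raises
   js_required.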

0.003     2026-06-11 02:22:02Z

 - Detect: captcha signal no longer false-positives on cookie-banner /
   privacy-policy mentions of reCAPTCHA. A content-rich page now only walls when
   a markdown captcha marker co-occurs with captcha-PROMPT language ("complete
   the captcha to continue", "I'm not a robot", "verify you are human", …); thin
   pages still wall on any marker (JS-rendered gate) and html-only markers on
   rich pages still never wall

 - Detect: signals now also flags WAF / bot-management gates that REDIRECT to a
   challenge URL instead of embedding a widget. When the page's final_url
   matches a known challenge endpoint, blocked is raised for Cloudflare /
   DataDome / PerimeterX (/cdn-cgi/challenge, __cf_chl, /challenge-platform/,
   datadome, geo.captcha-delivery.com, /px/captcha, perimeterx) and captcha for
   provider verification endpoints (google.com/recaptcha, /recaptcha/api,
   hcaptcha.com). Additive and OR-ed in; cosmetic redirects (http->https, www,
   trailing slash) and an absent final_url never trigger
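
   A rough sketch of the redirect check described in the entry above. The
   fragment lists are copied from the entry; the helper name
   sketch_redirect_signals and the case-insensitive substring matching are
   assumptions, and the real implementation may match differently.

   ```perl
   use strict;
   use warnings;

   # URL fragments named in the changelog entry.
   my @BLOCKED_FRAGMENTS = (
       '/cdn-cgi/challenge', '__cf_chl', '/challenge-platform/',
       'datadome', 'geo.captcha-delivery.com', '/px/captcha', 'perimeterx',
   );
   my @CAPTCHA_FRAGMENTS = (
       'google.com/recaptcha', '/recaptcha/api', 'hcaptcha.com',
   );

   # Hypothetical helper: returns a hashref of signals raised by final_url.
   sub sketch_redirect_signals {
       my ($final_url) = @_;
       my %signals;

       # An absent final_url never triggers.
       return \%signals unless defined $final_url && length $final_url;

       my $url = lc $final_url;
       $signals{blocked} = 1 if grep { index($url, $_) >= 0 } @BLOCKED_FRAGMENTS;
       $signals{captcha} = 1 if grep { index($url, $_) >= 0 } @CAPTCHA_FRAGMENTS;

       return \%signals;
   }
   ```

   Cosmetic redirects (http->https, www, trailing slash) contain none of the
   fragments, so in this sketch they fall through without raising anything.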

0.002     2026-05-30 22:57:39Z

 - Result: expose response_headers (lowercased keys) from the origin HTTP fetch;
   the field round-trips through to_hash/from_hash JSON persistence and
   defaults to an empty hash

0.001     2026-05-29 23:36:25Z

 - Initial release
 - WWW::Crawl4AI: Perl client and fallback orchestrator for Crawl4AI
 - WWW::Crawl4AI::Client: UA-agnostic REST client (/crawl, /md, /crawl/job,
   /crawl/job/{task_id}, /health) with request/parse/convenience flavours
 - WWW::Crawl4AI::Request: BrowserConfig/CrawlerRunConfig payload builder
 - Visible strategy chain: plain, browser, stealth, cloakbrowser (CDP), proxy,
   callback — escalated in cost/complexity order
 - CloakBrowser strategy: per-domain fingerprint seed is now a deterministic
   32-bit FNV-1a hash of the host (CloakBrowser requires a numeric seed and
   rejects raw host strings with HTTP 400)
 - WWW::Crawl4AI::Result with attempt history, signals, backend and cost_class
 - Result link accessors: urls (deduped, absolute, fragment-stripped),
   internal_links, external_links, links — no reaching into raw
 - deep_crawl: breadth-first crawl following each page's links through the full
   strategy chain (max_pages / max_depth / same_host / url_filter / on_page)
 - Single-URL action endpoints on the Client (and delegated from WWW::Crawl4AI):
   screenshot / pdf (raw bytes), html (preprocessed), execute_js (page +
   js_result), llm (LLM Q&A), token (JWT) — each with request/parse/convenience
   flavours like the rest
 - WWW::Crawl4AI::Detect: service detection + content-quality classification
   (js_required / blocked / captcha / thin_html)
 - WWW::Crawl4AI::Error structured error model (transport/api/job/content)
 - bin/www-crawl4ai-doctor and bin/www-crawl4ai-test-url
 - examples/docker-compose.yml (+ proxy escalation variant)
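
   For reference, the 32-bit FNV-1a hash mentioned in the CloakBrowser entry
   above can be sketched as follows. The function name fnv1a_32 is
   hypothetical, and how the distribution normalises the host before hashing
   is not stated here. The sketch assumes a Perl built with 64-bit integers
   and a host made of single-byte characters.

   ```perl
   use strict;
   use warnings;

   # Standard 32-bit FNV-1a: XOR each byte into the hash, then multiply by the
   # FNV prime, keeping only the low 32 bits.
   sub fnv1a_32 {
       my ($string) = @_;
       my $hash = 0x811c9dc5;                          # FNV offset basis
       for my $byte (unpack 'C*', $string) {
           $hash ^= $byte;
           $hash = ($hash * 0x01000193) & 0xFFFFFFFF;  # FNV prime, mod 2**32
       }
       return $hash;
   }

   # The same host always yields the same numeric seed.
   printf "%u\n", fnv1a_32('example.com');
   ```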
