
Self-Hostable Bookmark-and-Full-Page-Archiver That Captures Reddit Threads Before They Vanish Behind the 2026 Paywall

dev tool • weekend hack • multiple requests

Reddit has confirmed that paywalled subreddits are coming this year (CEO Steve Huffman, late 2025), and its admins keep tightening API and search access. Self-hosters who rely on bookmark-everything tools (Karakeep, Linkwarden, Wallabag) all hit the same wall: a snapshot of a Reddit thread today comes back as 'just a small blurb' or an empty shell, because Reddit's mobile-web layout hides most of the comment tree behind a 'see more' button. The demand is for a self-hosted archiver that drives a real browser engine (Playwright/Chromium), applies Reddit-specific comment-tree expansion, saves the full thread as a single static HTML file, and can replay the archived copy when the original becomes paywalled or returns a 404.
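A minimal sketch of the expand-then-capture loop described above, using Playwright's Python sync API. The selector is a placeholder built from the 'see more' wording and is an assumption, as are the function names, the click budget, and the settle delay; Reddit's real markup will differ and change over time.

```python
# Placeholder selector (assumption): adjust to whatever Reddit currently renders.
EXPAND_SELECTOR = "button:has-text('see more')"


def expand_comment_tree(page, selector=EXPAND_SELECTOR, max_clicks=500, settle_ms=300):
    """Click 'expand' controls until none remain or the click budget is spent.

    `page` is a Playwright sync Page (or anything with the same methods).
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        buttons = page.locator(selector)
        if buttons.count() == 0:
            break  # nothing left to expand
        buttons.first.click()
        page.wait_for_timeout(settle_ms)  # give lazy-loaded replies time to render
        clicks += 1
    return clicks


def capture_thread(url, out_path):
    """Render `url` in headless Chromium, expand comments, write the DOM to disk."""
    # Imported lazily so the expansion logic can be exercised without a browser.
    # Requires: pip install playwright && playwright install chromium
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="networkidle")
            expand_comment_tree(page)
            html = page.content()  # serialized DOM after expansion
        finally:
            browser.close()

    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(html)
    return out_path
```

Note that page.content() only serializes the DOM. Producing a truly single-file archive would still need stylesheets and images inlined in a separate pass, which this sketch leaves out. The click budget keeps the loop bounded on very large threads or when a button fails to disappear.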

builder note

The unsexy play is to be a Karakeep plugin, not a competing app. Ship a 'site adapter pack' (Reddit, Twitter, Substack, Hacker News) that drops into Karakeep or Linkwarden through their plugin or sidecar API, and treat adapter packs as a recurring product: open-source the engine, then charge $3/mo for the maintained adapter set. The fee doubles as a demand signal and pays for the headless-Chromium upkeep.
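One way an adapter pack could be shaped, as a rough sketch: a list of per-site records plus a hostname lookup that the engine consults before archiving. Everything here is hypothetical (the SiteAdapter fields, the selectors, the pick_adapter helper); it does not reflect any actual Karakeep or Linkwarden plugin interface.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
from urllib.parse import urlparse


@dataclass(frozen=True)
class SiteAdapter:
    """Hypothetical per-site record; field names are illustrative only."""
    name: str
    domains: Tuple[str, ...]
    expand_selector: Optional[str]  # placeholder CSS selector, None if not needed


# The "pack": the maintained part that would need regular updates.
ADAPTER_PACK = (
    SiteAdapter("reddit", ("reddit.com",), "button:has-text('see more')"),
    SiteAdapter("twitter", ("twitter.com", "x.com"), None),
    SiteAdapter("substack", ("substack.com",), None),
    SiteAdapter("hackernews", ("news.ycombinator.com",), None),
)


def pick_adapter(url, pack=ADAPTER_PACK):
    """Return the adapter whose domain matches the URL's host, or None."""
    host = (urlparse(url).hostname or "").lower()
    for adapter in pack:
        for domain in adapter.domains:
            # Match the bare domain and any subdomain (old.reddit.com, etc.).
            if host == domain or host.endswith("." + domain):
                return adapter
    return None  # caller falls back to the generic snapshot path
```

Keeping the pack as plain data separate from the engine is what would make the split workable: the open-source engine stays stable while the pack ships selector fixes on its own schedule.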

landscape (4 existing solutions)

Generic web archiving tools are getting outflanked by site-specific anti-archiving techniques (Reddit's lazy-loaded comments, Twitter's auth-walling, Substack's truncation). A self-hostable archiver with site-specific extractors fills a legitimate product gap.

Karakeep: Uses monolith for snapshots, which works on most pages, but Reddit's tree-collapsing JS defeats it. Open issue #739 has been parked since early April 2026.
ArchiveBox: Pumps URLs through wget + chromium + youtube-dl. Reddit threads frequently come back as login-walled landing pages or empty bodies. No Reddit-specific extraction.
Linkwarden: Same root cause: generic page snapshot. No comment-tree expansion. No deduplication if a thread gets re-archived after edits.
archive.today / Wayback: Hosted, not self-hosted. Wayback skips JS-rendered content; archive.today rate-limits hard and is a single point of failure.
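The deduplication gap noted for Linkwarden could be closed with a content-fingerprint check, sketched below under simple assumptions: an in-memory store, whitespace-only normalization, and SHA-256 as the fingerprint. The class and method names are hypothetical, and a real archiver would probably need to strip volatile markup (timestamps, vote counts) before hashing.

```python
import hashlib
import re
from typing import Dict, List, Optional

_WHITESPACE = re.compile(r"\s+")


def content_fingerprint(html: str) -> str:
    """SHA-256 of the HTML with whitespace runs collapsed."""
    normalized = _WHITESPACE.sub(" ", html).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SnapshotStore:
    """In-memory sketch: keeps a new version only when the content changed."""

    def __init__(self) -> None:
        self._history: Dict[str, List[str]] = {}  # url -> fingerprints, oldest first
        self._blobs: Dict[str, str] = {}          # fingerprint -> html

    def add(self, url: str, html: str) -> bool:
        """Store a snapshot; return False if it matches the latest version."""
        fingerprint = content_fingerprint(html)
        history = self._history.setdefault(url, [])
        if history and history[-1] == fingerprint:
            return False  # re-archive of an unchanged thread
        history.append(fingerprint)
        self._blobs.setdefault(fingerprint, html)
        return True

    def versions(self, url: str) -> int:
        return len(self._history.get(url, []))

    def latest(self, url: str) -> Optional[str]:
        """Most recent stored HTML, e.g. for replay when the original is gone."""
        history = self._history.get(url)
        return self._blobs[history[-1]] if history else None
```

Re-archiving an unchanged thread then costs nothing, while an edited thread gains a new version, and latest() gives the replay path something to serve once the original is paywalled or returns a 404.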

sources (3)

other https://github.com/karakeep-app/karakeep/issues/739 "Reddit full page archiving" 2026-04-08
other https://www.niemanlab.org/reading/reddit-will-soon-put-some-... "Reddit will soon put some subreddits behind a paywall" 2025-02-19
other https://www.removepaywall.com/ "search various internet archives, which do not require a login" 2026-04-01
self-hosted • archiving • reddit • bookmarks • anti-paywall