Reddit’s API is effectively dead for archival. Third-party apps are gone. Reddit has threatened to cut off access to the Pushshift dataset multiple times. But 3.28TB of Reddit history exists as a torrent right now, and I built a tool to turn it into something you can browse on your own hardware.

The key point: This doesn’t touch Reddit’s servers. Ever. Download the Pushshift dataset, run my tool locally, get a fully browsable archive. Works on an air-gapped machine. Works on a Raspberry Pi serving your LAN. Works on a USB drive you hand to someone.

What it does: Takes compressed data dumps from Reddit (.zst), Voat (SQL), and Ruqqus (.7z) and generates static HTML. No JavaScript, no external requests, no tracking. Open index.html and browse. Want search? Run the optional Docker stack with PostgreSQL – still entirely on your machine.
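
For anyone curious what that pipeline looks like in miniature, here is a rough sketch. It is not the tool's actual code. It assumes the Reddit dumps are zstd-compressed, newline-delimited JSON with fields such as `title`, `author` and `selftext` (typical of Pushshift submissions), and that the third-party `zstandard` package is installed for decompression. The function names are made up for illustration.

```python
import html
import io
import json


def open_zst_lines(path):
    """Yield text lines from a .zst dump (needs the third-party `zstandard` package)."""
    import zstandard  # imported lazily so the rest of the sketch runs without it

    with open(path, "rb") as fh:
        # Assumption: the dumps use a large compression window, so raise the limit.
        reader = zstandard.ZstdDecompressor(max_window_size=2**31).stream_reader(fh)
        yield from io.TextIOWrapper(reader, encoding="utf-8", errors="replace")


def iter_posts(lines):
    """Parse newline-delimited JSON, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def render_post(post):
    """Render one post as a standalone HTML page: no scripts, no external requests."""
    title = html.escape(str(post.get("title", "(untitled)")))
    author = html.escape(str(post.get("author", "[deleted]")))
    body = html.escape(str(post.get("selftext", "")))
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{title}</title></head>\n'
        f"<body><h1>{title}</h1><p>by {author}</p><pre>{body}</pre></body></html>\n"
    )
```

Escaping every field with `html.escape` matters here, because archived posts contain arbitrary user text that would otherwise be interpreted as markup.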

API & AI Integration: Full REST API with 30+ endpoints – posts, comments, users, subreddits, full-text search, aggregations. Also ships with an MCP server (29 tools) so you can query your archive directly from AI tools.

Self-hosting options:

  • USB drive / local folder (just open the HTML files)
  • Home server on your LAN
  • Tor hidden service (2 commands, no port forwarding needed)
  • VPS with HTTPS
  • GitHub Pages for small archives
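
As an illustration of the LAN option, the generated folder can be served with nothing but Python's standard library. This is a minimal sketch, not something shipped with the tool; `make_server` and the `archive_output` folder name are placeholders.

```python
import functools
import http.server


def make_server(directory, host="0.0.0.0", port=8000):
    """Build an HTTP server that serves static files from `directory`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer((host, port), handler)


# Usage (blocks until interrupted); "archive_output" is a placeholder path:
#   make_server("archive_output").serve_forever()
```

Binding to `0.0.0.0` makes the archive reachable from other machines on the network; use `127.0.0.1` instead to keep it local to one machine.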

Why this matters: Once you have the data, you own it. No API keys, no rate limits, no ToS changes can take it away.

Scale: Tens of millions of posts per instance. PostgreSQL backend keeps memory constant regardless of dataset size. For the full 2.38B post dataset, run multiple instances by topic.
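
One common way to keep memory flat regardless of input size is to stream records into the database in fixed-size batches rather than loading a whole dump at once. The sketch below shows that general idea only; it is not necessarily how the tool does it, and `batched`, `load_in_batches` and the `insert_batch` callback are hypothetical names.

```python
from itertools import islice


def batched(records, batch_size=1000):
    """Yield lists of at most `batch_size` items from any iterable, lazily."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load_in_batches(records, insert_batch, batch_size=1000):
    """Feed records to `insert_batch` one batch at a time; return the total count."""
    total = 0
    for batch in batched(records, batch_size):
        insert_batch(batch)  # hypothetical callback, e.g. a bulk INSERT into PostgreSQL
        total += len(batch)
    return total
```

Only one batch is held in memory at a time, so peak usage depends on `batch_size` rather than on the size of the dump.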

How I built it: Python, PostgreSQL, Jinja2 templates, Docker. Used Claude Code throughout as an experiment in AI-assisted development. Learned that the workflow is “trust but verify” – it accelerates the boring parts but you still own the architecture.

Live demo: https://online-archives.github.io/redd-archiver-example/

GitHub: https://github.com/19-84/redd-archiver (Public Domain)

Pushshift torrent: https://academictorrents.com/details/1614740ac8c94505e4ecb9d88be8bed7b6afddd4

    • irmadlad@lemmy.world · 11 upvotes · 19 hours ago (edited)

      Maybe read where OP says ‘Yes I used AI, English is not my first language.’ Furthermore, are ethnic slurs really necessary here?

        • pixeltree@lemmy.blahaj.zone · 1 upvote · 5 minutes ago

          Using AI to help normal everyday people cross language barriers is one of the few good ethical uses for it. I hate AI and its implications as much as the next gal, but this is clearly fine.

        • El Barto@lemmy.world · 7 upvotes · 15 hours ago

          I disagree. I don’t like AI slop. But he’s using AI here in a way that is very much intended. Say I want to share something in Mandarin, but I don’t know Mandarin. If only there were a way to transform my thoughts into Mandarin…

        • irmadlad@lemmy.world · 6 upvotes · 16 hours ago

          How many languages do you know fluently? I get that people have a definite opinion about AI. Like I told another Lemmy user, I have a definite opinion about the ‘arr’ stack, which, conservatively, 75% of selfhosters run. However, you don’t hear me out here beating my tin pan at the very mention of the ‘arr’ stack. Why? Because I assume you all are autonomous adults, capable of making your own decisions. Secondly, wouldn’t that get a bit tedious and annoying over time? If you don’t like AI, don’t use it ffs. Why castigate individuals who use AI? What does that do? I would really like to know what denigrating and browbeating users who use AI accomplishes.

          • euAppleHater@feddit.org · 2 upvotes · 25 minutes ago

            Wait, do you have an issue with piracy in general or an issue with the arr stack specifically? No judgement or interest in argument, just genuinely curious. Feel free to DM if you don’t want to start a whole thing, or beat your tin pan, as you said, in an unrelated post.