I have several technical questions I want to ask. I went to Stack Overflow, but it blocks registration over a VPN. It’s really fucking annoying. I could buy a residential IP to bypass this, but I’d rather not use these enshittified platforms that are so hostile to VPNs.

Is there any decent alternative to Stack Overflow? I have tried getting AI answers to my technical questions, but they are not good.

And no, I can’t just create a GitHub account over the VPN and use it to log in; they block GitHub logins based on IP as well.

    • someone@lemmy.today (OP)
      1 day ago

      That’s not for TrOCR, it’s just for generic OCR, which may not work for handwriting.

      I did try some of the GPT steps:

      pip install --upgrade transformers pillow pdf2image
      
      

      getting some errors:

      WARNING: The scripts transformers and transformers-cli are installed in '/home/user/.local/bin' which is not on PATH.
        Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
      ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
      mistral-common 1.5.2 requires pillow<11.0.0,>=10.3.0, but you have pillow 12.1.0 which is incompatible.
      moviepy 2.1.2 requires pillow<11.0,>=9.2.0, but you have pillow 12.1.0 which is incompatible.
      
      
      

      This is what GPT said to run, but it makes no sense to me because I don’t even have TrOCR downloaded or running.

      Install packages: pip install --upgrade transformers pillow pdf2image
      Ensure poppler is installed:
      
      Ubuntu/Debian: sudo apt install -y poppler-utils
      macOS: brew install poppler
      
      Execute: python3 trocr_pdf.py input.pdf output.txt
      

      This is the script to save and run:

      #!/usr/bin/env python3
      import sys
      from pdf2image import convert_from_path
      from PIL import Image
      import torch
      from transformers import TrOCRProcessor, VisionEncoderDecoderModel
      
      def main(pdf_path, out_path="output.txt", dpi=300):
          device = "cuda" if torch.cuda.is_available() else "cpu"
          model_name = "microsoft/trocr-base-handwritten"
          processor = TrOCRProcessor.from_pretrained(model_name)
          model = VisionEncoderDecoderModel.from_pretrained(model_name).to(device)
      
          pages = convert_from_path(pdf_path, dpi=dpi)
          results = []
          for i, page in enumerate(pages, 1):
              page = page.convert("RGB")
              # downscale if very large to avoid OOM
              max_dim = 1600
              if max(page.width, page.height) > max_dim:
                  scale = max_dim / max(page.width, page.height)
                  page = page.resize((int(page.width*scale), int(page.height*scale)), Image.Resampling.LANCZOS)
      
              pixel_values = processor(images=page, return_tensors="pt").pixel_values.to(device)
              generated_ids = model.generate(pixel_values, max_length=512)
              text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
              results.append(f"--- Page {i} ---\n{text.strip()}\n")
      
          with open(out_path, "w", encoding="utf-8") as f:
              f.write("\n".join(results))
          print(f"Saved OCR text to {out_path}")
      
      if __name__ == "__main__":
          if len(sys.argv) < 2:
              print("Usage: python3 trocr_pdf.py input.pdf [output.txt]")
              sys.exit(1)
          pdf_path = sys.argv[1]
          out_path = sys.argv[2] if len(sys.argv) > 2 else "output.txt"
          main(pdf_path, out_path)
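
One caveat about the script above, as far as I know: the TrOCR handwritten checkpoints are trained on images of a single line of text, so passing a whole page to `model.generate` tends to give poor results. A rough sketch of one way to split a page into line bands first is below. It uses a simple horizontal projection (count dark pixels per row). The function names `ink_profile` and `find_line_bands` and the thresholds are my own illustrative choices, not part of TrOCR, and slanted or overlapping handwriting will likely need a proper text-line detector instead.

```python
def ink_profile(rows, threshold=128):
    """Count dark pixels in each row.

    rows is a list of rows, each a list of grayscale values (0 = black, 255 = white).
    """
    return [sum(1 for value in row if value < threshold) for row in rows]


def find_line_bands(ink_per_row, min_ink=1, min_height=2):
    """Return (top, bottom) row ranges that contain ink; bottom is exclusive.

    Bands shorter than min_height rows are treated as noise and dropped.
    """
    bands = []
    start = None
    for y, ink in enumerate(ink_per_row):
        if ink >= min_ink and start is None:
            start = y  # a new band of text begins
        elif ink < min_ink and start is not None:
            if y - start >= min_height:
                bands.append((start, y))
            start = None  # back in whitespace
    # Close a band that runs to the bottom edge of the page.
    if start is not None and len(ink_per_row) - start >= min_height:
        bands.append((start, len(ink_per_row)))
    return bands
```

Each band could then be cropped with Pillow, for example `page.crop((0, top, page.width, bottom))`, and the crop passed to the processor in place of the full page.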
      
      
      • jdr8@lemmy.world
        1 day ago

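        Ok, so from the error, your installed version of pillow (12.1.0) is incompatible with mistral-common and moviepy.

        You have to downgrade pillow to a 10.x release: both packages require a version below 11.0, and mistral-common needs at least 10.3.0.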

        That’s the first step.
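
To make the constraint from the pip output concrete: mistral-common wants `pillow<11.0.0,>=10.3.0` and moviepy wants `pillow<11.0,>=9.2.0`, so the overlap is 10.3.0 up to (but not including) 11.0. Here is a small stdlib-only sketch that checks an installed version against that range. The helper names are made up for illustration, and the parser only understands plain dotted release numbers such as `12.1.0`.

```python
from importlib import metadata


def parse_version(text):
    """Turn '12.1.0' into (12, 1, 0); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Pad so that '11.0' and '11.0.0' compare as equal.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def in_range(version, lower, upper):
    """True if lower <= version < upper, comparing release numbers only."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pillow_is_compatible():
    """Check the installed pillow against the range both packages accept."""
    version = installed_version("pillow")
    return version is not None and in_range(version, "10.3.0", "11.0.0")
```

If `pillow_is_compatible()` returns `False`, something like `pip install "pillow>=10.3.0,<11.0"` should pull in a release both packages accept, assuming nothing else you have installed requires a newer pillow.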

        EDIT: Sorry just saw the rest of your comment. Do you really have to use that tech?

        You have other alternatives. AWS has a service for handwriting OCR, but I can’t remember the name.

        You can also have a look at this, but it’s paid: https://www.handwritingocr.com/

        More OCR alternatives: https://github.com/michaelben/OCR-handwriting-recognition-libraries

        +1 for Tesseract. I came across this one a while ago. It may not recognise all handwriting, but you can train it to get better at it.

        • someone@lemmy.today (OP)
          1 day ago

          I don’t trust big tech not to extract data and metadata and save it. Many companies get served with government requests to save data and keep it secret. Even if handwritingocr.com has no such agreement, it could run on AWS, which might. I would much rather do this locally, since some of the writings are confidential. Handwritingocr.com says data is encrypted in transit and at rest, but it’s not open source, and even if it were, I couldn’t verify the server code.

          Also, Tesseract is CPU-only, right? It will be so slow.

          • jdr8@lemmy.world
            1 day ago

            Fair point.

            So what about TensorFlow and some local LLM to do the job?

            You just need to find a reliable LLM on Hugging Face, for example.

            • someone@lemmy.today (OP)
              6 hours ago

              That’s exactly what I am trying to do; I’m just not sure how to do it. I have the hardware needed. I just need to set up a Docker container with PyTorch, find a way to set up Gradio inside it, and then add TrOCR from Hugging Face, and then I’m good. It seems hard, and when I ask AI for advice, it often says “just run the following” and it’s wrong, and I’m not skilled enough to know why.
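
For the Gradio plus TrOCR part, a minimal sketch might look like the following. It assumes `torch`, `transformers` and `gradio` are already installed in the container, and it reuses the `microsoft/trocr-base-handwritten` checkpoint from the script earlier in the thread. The function names (`make_recognizer`, `load_trocr`, `build_demo`) are made up for illustration. The heavy imports sit inside the functions so the file can be loaded even where those packages are missing.

```python
def make_recognizer(processor, model, device="cpu", max_length=64):
    """Wrap a processor/model pair into a plain function: image -> text."""
    def recognize(image):
        # The processor resizes and normalizes the image for the encoder.
        batch = processor(images=image.convert("RGB"), return_tensors="pt")
        pixel_values = batch.pixel_values.to(device)
        generated_ids = model.generate(pixel_values, max_length=max_length)
        text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
        return text.strip()
    return recognize


def load_trocr(model_name="microsoft/trocr-base-handwritten"):
    """Download (or load from cache) the checkpoint and return a recognizer."""
    import torch
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = TrOCRProcessor.from_pretrained(model_name)
    model = VisionEncoderDecoderModel.from_pretrained(model_name).to(device)
    return make_recognizer(processor, model, device)


def build_demo(recognize):
    """Build a small web UI: upload an image, get the recognized text back."""
    import gradio as gr

    return gr.Interface(fn=recognize, inputs=gr.Image(type="pil"), outputs="text")
```

Running `build_demo(load_trocr()).launch(server_name="0.0.0.0")` inside the container should make the UI reachable from the host, provided the container's port (Gradio uses 7860 by default) is published. Whether the GPU is visible inside Docker depends on the container runtime setup, which this sketch does not cover.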

      • someone@lemmy.today (OP)
        1 day ago

        Terminal error after running GPT code:

        
        
        python3 trocr_pdf.py small.pdf output.txt
        Traceback (most recent call last):
          File "/home/user/.local/lib/python3.10/site-packages/transformers/utils/hub.py", line 479, in cached_files
            hf_hub_download(
          File "/home/user/.local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
            return fn(*args, **kwargs)
          File "/home/user/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1007, in hf_hub_download
            return _hf_hub_download_to_cache_dir(
          File "/home/user/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1124, in _hf_hub_download_to_cache_dir
            os.makedirs(os.path.dirname(blob_path), exist_ok=True)
          File "/usr/lib/python3.10/os.py", line 215, in makedirs
            makedirs(head, exist_ok=exist_ok)
          File "/usr/lib/python3.10/os.py", line 225, in makedirs
            mkdir(name, mode)
        PermissionError: [Errno 13] Permission denied: '/home/user/.cache/huggingface/hub/models--microsoft--trocr-base-handwritten'
        
        The above exception was the direct cause of the following exception:
        
        Traceback (most recent call last):
          File "/home/user/Documents/trocr_pdf.py", line 39, in <module>
            main(pdf_path, out_path)
          File "/home/user/Documents/trocr_pdf.py", line 11, in main
            processor = TrOCRProcessor.from_pretrained(model_name)
          File "/home/user/.local/lib/python3.10/site-packages/transformers/processing_utils.py", line 1394, in from_pretrained
            args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
          File "/home/user/.local/lib/python3.10/site-packages/transformers/processing_utils.py", line 1453, in _get_arguments_from_pretrained
            args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
          File "/home/user/.local/lib/python3.10/site-packages/transformers/models/auto/image_processing_auto.py", line 489, in from_pretrained
            raise initial_exception
          File "/home/user/.local/lib/python3.10/site-packages/transformers/models/auto/image_processing_auto.py", line 476, in from_pretrained
            config_dict, _ = ImageProcessingMixin.get_image_processor_dict(
          File "/home/user/.local/lib/python3.10/site-packages/transformers/image_processing_base.py", line 333, in get_image_processor_dict
            resolved_image_processor_files = [
          File "/home/user/.local/lib/python3.10/site-packages/transformers/image_processing_base.py", line 337, in <listcomp>
            resolved_file := cached_file(
          File "/home/user/.local/lib/python3.10/site-packages/transformers/utils/hub.py", line 322, in cached_file
            file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
          File "/home/user/.local/lib/python3.10/site-packages/transformers/utils/hub.py", line 524, in cached_files
            raise OSError(
        OSError: PermissionError at /home/user/.cache/huggingface/hub/models--microsoft--trocr-base-handwritten when downloading microsoft/trocr-base-handwritten. Check cache directory permissions. Common causes: 1) another user is downloading the same model (please wait); 2) a previous download was canceled and the lock file needs manual removal.
        

        LLMs are so bad at code sometimes. This happens to me all the time with LLMs and code: the code is unusable and it saves no time because it’s a rabbit hole leading nowhere.

        I also don’t know if this is the right approach to the problem. Any sort of GUI would be easier. I also have hundreds of pages of handwritten material that I want to convert to text.

        • pinball_wizard@lemmy.zip
          1 day ago

          This error looks like it is saying a previous attempt aborted, and it needs you to clean up some file that was only partly downloaded.

          Edit: The “please wait” makes me think I would try again in a couple hours.
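
Since the first exception in the traceback is a `PermissionError` from `mkdir` under `/home/user/.cache/huggingface/hub`, it may also be worth checking who owns that directory. One plausible cause (an assumption on my part, nothing in the output confirms it) is that an earlier command was run with `sudo` and left the cache owned by root. A small stdlib-only sketch to check; `diagnose_cache_dir` is a made-up helper name and it relies on `os.getuid`, so it is for Linux/macOS only:

```python
import os
from pathlib import Path


def nearest_existing(path):
    """Walk up from path until a directory that actually exists is found."""
    path = Path(path)
    while not path.exists():
        parent = path.parent
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def diagnose_cache_dir(path="~/.cache/huggingface/hub"):
    """Report ownership and writability of the cache dir (or its nearest parent)."""
    target = nearest_existing(Path(path).expanduser())
    info = os.stat(target)
    return {
        "checked": str(target),
        "owner_uid": info.st_uid,      # who owns the directory
        "my_uid": os.getuid(),         # who is running this script
        "writable": os.access(target, os.W_OK | os.X_OK),
    }
```

If `owner_uid` differs from `my_uid` or `writable` is `False`, changing the owner back to your user (for example with `sudo chown -R "$USER" ~/.cache/huggingface`) or pointing the `HF_HOME` environment variable at a directory you can write to should get past this error.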

          • someone@lemmy.today (OP)
            1 day ago

            So try again… in a couple of hours…

            Why would that make a difference? It’s a local model, right?

            • pinball_wizard@lemmy.zip
              1 day ago

              If it is local only, then waiting probably won’t help.

              Another thought for you: pip behaves much better inside a virtual environment - using the Python venv module, or uv.

              The instructions you have shared so far look more compatible with venv.

              • someone@lemmy.today (OP)
                2 hours ago

                I don’t understand what venv is or why this would work better. Will this make the compatibility issues go away? I could also just create a fresh virtual Ubuntu environment, if that would be easier, and try to give it access to my GPU, but I don’t know if that would work.

                • pinball_wizard@lemmy.zip
                  1 hour ago

                  “I don’t understand what venv is or why this would work better.”

                  venv is a Python module that creates isolated environments, so the packages you install for one project are kept separate from the system-wide Python packages.

                  “Will this make the compatibility issues go away?”

                  No guarantees, but it often does.

                  “I could also just create a virtual Ubuntu environment that’s fresh if that would be easier and try to give that environment access to my GPU but I don’t know if that would work.”

                  I’m not sure either. But you’ve got the idea. Python packages install better when they’re allowed to exist separately from the underlying operating system.
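
To show roughly what that looks like, here is a sketch using only the standard library. The directory name and the helper names `create_env` and `in_virtualenv` are arbitrary choices for illustration; the usual shell equivalent is `python3 -m venv <directory>`. On Debian/Ubuntu, creating an environment with pip included may require the `python3-venv` package first.

```python
import sys
import venv
from pathlib import Path


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


def create_env(env_dir, with_pip=True):
    """Create an isolated environment and return the path to its interpreter.

    The returned path assumes the POSIX layout (bin/python), as on Ubuntu.
    """
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir) / "bin" / "python"
```

After `create_env("trocr-env")`, running `trocr-env/bin/python -m pip install transformers pillow pdf2image torch` installs the packages into that directory only, so the pillow versions that mistral-common and moviepy need in the user-wide install are left alone.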