• 1 Post
  • 22 Comments
Joined 5 months ago
Cake day: January 25th, 2024




  • Most open source tools share the same thing: they feel like they were made by engineers. I think that’s because it’s true. Most FOSS tools are made by engineers, for engineers, because most projects start with someone needing something, building it, and then sharing it.

    The chance of a programmer needing something and then making it is a lot higher than the chance of an artist needing it and then making it, because the artist would also need the skills to write the software. As someone from outside the CS field, I’ve seen how many redundant programs exist for CS-related tasks while barely any exist for other fields, because the overlap between programmers (FOSS programmers specifically) and those fields is small. And the few programmers such a field does have often lack high-level software development skills, so most open source tools made by them are “works on my machine” or “works for this specific task”, even though with less than 1% more effort they could have made a generalized tool.






  • For the OCR, have you tried tesseract? For printed documents, it can take an image as input and generate a PDF with selectable text. I don’t do OCR much, but it has been useful the few times I tried it.

    You might be able to write a script that feeds the scanner output into tesseract and produces a PDF. It only works on a single image per run, so I had to write a script that runs it on a whole PDF by splitting it into pages and stitching the results back together.
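
    A rough sketch of that split-and-stitch idea, assuming pdftoppm and pdfunite (from poppler-utils) are installed alongside tesseract. The function name ocr_pdf and the 300 dpi resolution are made up for illustration, not taken from my actual script:

```shell
#!/usr/bin/env bash
# Sketch: OCR a whole PDF by splitting it into page images, running
# tesseract on each page, and merging the per-page PDFs again.
# Assumes pdftoppm, pdfunite (poppler-utils) and tesseract are installed.

ocr_pdf() (
    input=$1
    output=$2
    workdir=$(mktemp -d) || exit 1
    # The body runs in a subshell, so this cleans up when it exits
    trap 'rm -rf "$workdir"' EXIT

    # One PNG per page; pdftoppm zero-pads the page numbers,
    # so the globs below expand in page order
    pdftoppm -r 300 -png "$input" "$workdir/page" || exit 1

    # "pdf" makes tesseract write <outputbase>.pdf with a text layer
    for img in "$workdir"/page-*.png; do
        tesseract "$img" "${img%.png}" pdf || exit 1
    done

    # Stitch the single-page PDFs back together
    pdfunite "$workdir"/page-*.pdf "$output"
)
```

    Usage would be something like ocr_pdf scan.pdf scan-ocr.pdf. Going through PNG files at a fixed resolution is just the simplest route; it trades file size for not having to care what is inside the original PDF.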


  • Someone already talked about the XY problem, so I’ll say this.

    Why match on the notification sound instead of the notification content? If your notification daemon (dunst in my case) supports pattern matching or calling scripts based on patterns, and the script has access to the app name, notification title, contents, etc., then it’s just a matter of calling something in your bash script.

    And any time you want to add that functionality to something else, add one more line with a different pattern, or add a condition in your script. Comparing text is a lot more reliable than comparing audio.

    Of course, your use case could be completely different, so maybe give some examples of it so people can suggest different ways to solve it instead of just the one you’re thinking of.
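
    To make the text-matching idea concrete, here is a minimal sketch of a handler script. It assumes dunst’s script rule option, which runs the script with appname, summary, body, icon and urgency as positional arguments. The rule name, the script path, the app names, the patterns and the action names are all hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical handler for dunst's "script" rule option.
# Assumed rule in dunstrc:
#   [react-to-notifications]
#       appname = "*"
#       script = ~/.config/dunst/handler.sh
# dunst then calls: handler.sh appname summary body icon urgency

handle_notification() {
    appname=$1
    summary=$2
    body=$3

    # Plain text matching; one more line per case you care about
    case "$appname:$summary $body" in
        Thunderbird:*invoice*)  echo "mail-invoice" ;;
        *:*"build failed"*)     echo "build-failed" ;;
        *)                      echo "ignored" ;;
    esac
}

# Example wiring for the real script (left commented out here):
# action=$(handle_notification "$@")
# [ "$action" != "ignored" ] && echo "would run the action for: $action"
```

    The function only decides which case applies, so whatever the sound was supposed to trigger can hang off the returned name.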






  • Hi there, I did say it’s easily doable, but I didn’t have a script, because I adjust things manually based on the image before running OCR (like negating dark mode, which I attempted in this script; when doing it manually it’s just one command, since I know myself whether it’s dark mode or not; the same goes for the threshold).

    But here’s one I made for you:

    #!/usr/bin/env bash
    
    # ImageMagick has a cute little command for capturing part of the screen into a file
    import -colorspace gray /tmp/screenshot.png
    # mogrify expects its options before the file name
    mogrify -color-threshold "100-200" /tmp/screenshot.png
    # extra magic to invert if the average pixel is dark:
    # %[fx:mean] is the mean pixel value in the range 0..1
    brightness=$(convert /tmp/screenshot.png -format '%[fx:int(100*mean)]' info:)
    if (( brightness < 50 )); then
       mogrify -negate /tmp/screenshot.png
    fi
    
    # now run the OCR ("-" as the output base prints the text to stdout)
    text=$(tesseract /tmp/screenshot.png -)
    # quote the text so line breaks survive on the clipboard
    printf '%s' "$text" | xclip -selection c
    notify-send OCR-Screen "$text"
    

    So the middle part is there to accommodate images in dark mode. It negates the image based on a threshold that you can change. Without that, you only need import for the screen capture and tesseract for running OCR, and you can optionally pipe the result to xclip for the clipboard or to notify-send for a notification.

    In my use case, I have a keybind to take a screenshot like this: import png:- | xclip -selection c -t image/png, which gives me a cursor to select part of the screen and copies that to the clipboard. I can save that as an image (through another bash script) or paste it directly into messenger applications. And when I need to do OCR, I just run tesseract in the terminal and copy the text from there.
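
    For the “save that as an image” part, a sketch along these lines should do. The function name, the default folder and the file name pattern are assumptions for illustration, not my actual script:

```shell
#!/usr/bin/env bash
# Sketch: write the PNG currently on the clipboard to a file.
# The default location and file name pattern are assumptions.

save_clipboard_image() {
    out=${1:-"$HOME/Pictures/screenshot-$(date +%Y%m%d-%H%M%S).png"}

    # -o prints the selection, -t asks for the image/png target that
    # the screenshot keybind stored on the clipboard
    if xclip -selection clipboard -t image/png -o > "$out"; then
        printf '%s\n' "$out"
    else
        # Nothing usable on the clipboard: do not leave an empty file
        rm -f "$out"
        return 1
    fi
}
```

    Since it prints the path it wrote, it chains with OCR as well, e.g. tesseract "$(save_clipboard_image /tmp/clip.png)" -.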


  • Not for handwritten text, but for printed fonts, getting OCR is as easy as drawing a box on the screen with current technology. So I don’t think we need AI things for that.

    Personally, I use tesseract. I have a simple bash script that, when run, lets me select a rectangle on the screen, saves that image, runs OCR on it in a temp folder, and copies the text to the clipboard. Done.

    Edit: for extra flavor, you can also use notify-send to send that text as a notification, so you know what the OCR produced without having to paste it.
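
    Roughly, the steps above could be sketched like this, assuming ImageMagick’s import, tesseract, xclip and notify-send are installed. The function name, file name and notification title are placeholders:

```shell
#!/usr/bin/env bash
# Sketch: select a region, OCR it in a temp folder, copy the text to
# the clipboard and show it as a notification.
# Assumes import (ImageMagick), tesseract, xclip and notify-send.

ocr_screen() (
    tmpdir=$(mktemp -d) || exit 1
    # The body runs in a subshell, so the temp folder is removed on exit
    trap 'rm -rf "$tmpdir"' EXIT

    # import waits for a rectangle to be dragged with the mouse
    import "$tmpdir/region.png" || exit 1

    # "-" as the output base makes tesseract print the text to stdout
    text=$(tesseract "$tmpdir/region.png" -) || exit 1

    printf '%s' "$text" | xclip -selection clipboard
    notify-send "OCR" "$text"
)
```

    Bind ocr_screen (or a script that calls it) to a key and the whole thing is one keypress plus one drag.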