> Full page OCR processing tools?

 
post Feb 7 2021, 05:35
Post #1
innyinny



Newcomer
**
Group: Members
Posts: 81
Joined: 31-December 20


Does anyone know of any downloadable tools that can run on a full page, find all of the characters, and run OCR and export them to a text file?

i found this website: [www.ocrconvert.com] https://www.ocrconvert.com/japanese-ocr

it worked.. perfectly on some test files, very impressive

but um.. ya know for converting this kind of content i don't know if i want to rely on an online tool, and i'm not sure if i'll run into any limits on free use for whole mangas

anyone know if there are similar tools that you can download and run locally? i've tried kanjitomo and it is great, but it only does a tiny section you pick at a time, which is mostly helpful enough so far but it would be handy to just scan the whole thing in an instant

 
post Feb 7 2021, 06:48
Post #2
innyinny



Newcomer
**
Group: Members
Posts: 81
Joined: 31-December 20


i investigated tesseract, and it seems to 'work' but it misses a TON of lines..

i also tried to use nhocr but i can't get it to build..
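
For anyone hitting the same missed-lines problem: with tesseract, the language model and the page segmentation mode (`--psm`) tend to matter a lot on manga pages. Below is a minimal sketch, assuming the `tesseract` CLI is on PATH and the `jpn_vert` traineddata is installed. The function names, the `ocr_out` folder and the default values are made up for illustration, not what was actually used in this thread.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_cmd(image, out_base, lang="jpn_vert", psm=11):
    """Build the tesseract CLI call.

    The defaults are guesses for manga pages: jpn_vert is the vertical
    Japanese model, and psm 11 is the "sparse text" mode.
    """
    return ["tesseract", str(image), str(out_base), "-l", lang, "--psm", str(psm)]


def ocr_page(image, out_dir="ocr_out", **kwargs):
    """Run tesseract on one page image and return the path of the text file."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_base = out_dir / Path(image).stem
    subprocess.run(build_tesseract_cmd(image, out_base, **kwargs), check=True)
    # tesseract appends ".txt" to the output base name it is given
    return Path(str(out_base) + ".txt")
```

Calling `ocr_page("page_001.png")` would write `ocr_out/page_001.txt`. If you crop a single speech bubble first, `psm=5` (one uniform block of vertically aligned text) may be worth trying instead.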

 
post Feb 7 2021, 08:36
Post #3
innyinny



Newcomer
**
Group: Members
Posts: 81
Joined: 31-December 20


after much handholding i did finally manage to make tesseract perform usefully.. it's not ideal though

might still save some time and trouble in the long run

 
post Feb 8 2021, 07:00
Post #4
프레이



Only one group, up to homomorphism
*****
Group: Gold Star Club
Posts: 691
Joined: 21-October 17
Level 500 (Ponyslayer)


Offline libraries like Tesseract suck ass.
Your best bet will be using either Microsoft's or Google's online APIs. Last I checked, both were free with rate limits that were more than sufficient for personal use (I've scanned ~1k pages in a day without issue).
I work mostly on Korean stuff but I imagine they'd perform mostly the same for JP stuff. In particular, Google's thingy is noticeably more accurate and better with spacing, but their API is dogshit (hard to install / hard-to-navigate docs / clunky and barebones responses).

----

I mostly use it for generating a script template like this: [docs.google.com] https://docs.google.com/spreadsheets/d/1FRQ...tpOQ/edit#gid=0 (click top-left cell for the raws)

As for the python code I used to generate that... it's messy as hell so it's kind of a pain to share. I don't mind sharing specific bits of it / answering any questions, though.
Also heads-up that if you want the bubble texts sorted in reading order, you'll probably want to do some kind of contour detection / ML model training for the panels. I was too lazy for that so I just sorted them by bbox centers and manually corrected lol.
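
A rough sketch of what sorting by bbox centers could look like, for reference. This is not the actual code behind the spreadsheet. It assumes each box is an `(x0, y0, x1, y1)` tuple in pixels, and the `row_height` banding and the function names are made up for illustration.

```python
def bbox_center(box):
    """Return the (x, y) center of an (x0, y0, x1, y1) box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def sort_reading_order(boxes, row_height, right_to_left=True):
    """Sort boxes top to bottom in bands of row_height pixels,
    then by center x inside each band."""
    def key(box):
        cx, cy = bbox_center(box)
        row = int(cy // row_height)  # which horizontal band the center falls in
        return (row, -cx if right_to_left else cx)
    return sorted(boxes, key=key)


boxes = [(10, 10, 50, 50), (200, 20, 260, 60), (100, 300, 150, 340)]
ordered = sort_reading_order(boxes, row_height=100)
```

Set `right_to_left=False` for Korean or other left-to-right pages. Fixed bands ignore panel borders, so the result will still need manual correction on busy layouts.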

This post has been edited by 프레이: Feb 8 2021, 08:08

 
post Oct 7 2021, 18:25
Post #5
mspykez



Lurker
Group: Recruits
Posts: 4
Joined: 17-September 11
Level 12 (Novice)


There is a Windows app called [www.basiccat.org] ImageTrans that, as far as I understand and as shown in the demo videos they provide, not only OCRs all the raw text from the page but also lets you replace it with a manual or machine translation. It apparently can do all that automatically across multiple files/images in a row. It isn't free though; their site says it costs $10.99 for personal use.

Also there's a free tool being developed by the same team behind the [www.reddit.com] Sugoi Translator that kinda does the same but atm uses a web interface instead.
They have a discord that is listed somewhere in the reddit thread, with links to download. The manga tool is not the one shown in the screenshots on reddit; they have different apps.






 
post Feb 26 2023, 00:55
Post #6
TyshawnVaughn



Lurker
Group: Lurkers
Posts: 1
Joined: 26-February 23


Hello there, folks. I need your help. I need to scan my ID ASAP, but I don't have a scanner nearby. Maybe there's any online service or app that could help me make it real? Sorry for disturbing you in such an old thread.

 
post Feb 26 2023, 01:30
Post #7
rinruririn



Casual Poster
****
Group: Gold Star Club
Posts: 353
Joined: 23-April 12
Level 414 (Dovahkiin)


QUOTE(TyshawnVaughn @ Feb 26 2023, 07:55)

Hello there, folks. I need your help. I need to scan my ID ASAP, but I don't have a scanner nearby. Maybe there's any online service or app that could help me make it real? Sorry for disturbing you in such an old thread.


How big is this ID document? If you can take a good clear picture of it with your mobile phone camera that might be acceptable for whoever needs to see it.

 
post Feb 27 2023, 02:18
Post #8
MarisaEwing



Lurker
Group: Lurkers
Posts: 1
Joined: 14-February 23


Hey, man! It's OK, never mind. If you need to scan your ID but don't have a scanner, you can use a document scanning and OCR service like Smart Engines. Just go to [smartengines.com] https://smartengines.com/ and check out their app. It's got full page OCR processing tools that can handle everything from IDs to passports. Plus, it's super user-friendly and should get the job done in no time. So don't stress, my dude. LMK if you have any other questions left.

This post has been edited by MarisaEwing: Feb 27 2023, 02:18

 
post Mar 24 2023, 02:49
Post #9
MistaLOD



Lurker
Group: Lurkers
Posts: 1
Joined: 14-July 16
Level 12 (Novice)


I usually use Capture2Text for my OCR then go through it with Google Translate to check that the Kanji are correct.

Not sure if it's really good for your use case, but I figured I might as well add my two cents.

 
post Mar 24 2023, 06:18
Post #10
castle17



Casual Poster
***
Group: Members
Posts: 157
Joined: 9-October 22
Level 69 (Master)


Not a full page OCR tool, but you can use Poricom. It works with messy backgrounds too.
It gets everything right about 99% of the time.

[github.com] https://github.com/blueaxis/Poricom


 
post Apr 3 2023, 07:27
Post #11
innyinny



Newcomer
**
Group: Members
Posts: 81
Joined: 31-December 20


wow some pretty incredible new tools have come out since i asked this..

thanks for the heads up

 
post May 21 2023, 04:52
Post #12
Shevane12



Lurker
Group: Lurkers
Posts: 1
Joined: 28-December 22


use mokuro, you can just take the text out of the html file it generates for the manga volume/chapter/page or whatever

you can ocr a billion and 14 pages in one go if you want, but it takes a while on my shit pc, like an hour or a few for 20ish pages, but idrk, i dont track it. all i know is that its not fast

https://github.com/kha-white/mokuro

took me a day to set it up cuz i barely know how to code, but i was just dumb, so dont think its not working
you can probably find some tutorials somewhere or ask someone that knows how to code if you dont
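
For pulling the text back out of the generated html, something like the sketch below might be a starting point. It uses only the Python standard library and assumes the OCR lines sit inside `<p>` tags, which is a guess about the markup, so check your own file. The class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text of every <p> element, one string per paragraph."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._depth = 0  # > 0 while inside a <p>
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                text = "".join(self._buf).strip()
                if text:
                    self.lines.append(text)
                self._buf = []

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def extract_text(html):
    """Return the list of paragraph strings found in an html string."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return parser.lines
```

To dump a whole volume, read the html file as UTF-8, pass the string to `extract_text`, and write the joined lines to a .txt file.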



 
post Jun 3 2023, 07:17
Post #13
MisterJ167



Newcomer
**
Group: Members
Posts: 56
Joined: 17-February 14
Level 11 (Novice)


I tend to stick with more manual OCR software: I prefer ABBYY, and often it's necessary to clean up the image so that the letters are not blocked by background objects (see attached images as samples). I also use a program called KanjiTomo that can scan individual characters as you hover over them, because sometimes ABBYY has issues even after cleanup.

In cases like this, it's my experience that it's just best to do the work and not rely on automatic anything to read your text.

-J
[Attached Image] [Attached Image]