5 Free Online OCR Services Tested and Reviewed
Have a PDF document or an image that you would like to convert to text? Recently, someone sent me a document in the mail that I needed to edit and send back with corrections. The person couldn’t locate a digital copy, so I was tasked with getting all that text into digital format.
There was no way I was going to spend hours typing everything back in, so I ended up taking a nice high-quality picture of the document and then burned my way through a bunch of online OCR services to see which one would give me the best results.
In this article, I’ll go through a few of my favorite free OCR sites. It’s worth noting that most of these sites provide a basic free service and then have paid options if you want extra features like bigger images, multi-page PDF documents, different input languages, etc.
It’s also good to know beforehand that most of these services will not be able to match the formatting of your original document. These are mainly for extracting text and that’s it. If you need everything to be in a specific layout or format, you’ll have to manually do that once you get all the text from the OCR.
In addition, the best results for getting the text will come from documents with a 200 to 400 DPI resolution. If you have a low DPI image, the results will not be as good.
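If you are photographing a page rather than scanning it, you can estimate the effective DPI from the photo's pixel dimensions and the paper size. The sketch below is only a rough illustration: the function name `effective_dpi` is made up for this example, and it assumes an 8-megapixel photo (3264 x 2448 pixels) of a US Letter page that fills the entire frame. In practice the page rarely fills the frame, so the real figure will be lower.

```python
def effective_dpi(width_px, height_px, page_width_in, page_height_in):
    """Approximate DPI of a photographed page, assuming the page fills the frame."""
    # Pair the long side of the photo with the long side of the page,
    # so it doesn't matter whether the photo is portrait or landscape.
    long_px, short_px = max(width_px, height_px), min(width_px, height_px)
    long_in = max(page_width_in, page_height_in)
    short_in = min(page_width_in, page_height_in)
    # The tighter of the two axes limits the usable resolution.
    return min(long_px / long_in, short_px / short_in)


# Assumed inputs: 3264 x 2448 photo, 8.5 x 11 inch page.
dpi = effective_dpi(3264, 2448, 8.5, 11.0)
print(round(dpi))  # 288
```

Under those assumptions the photo lands inside the 200 to 400 DPI range, but only if the page is shot straight on and fills most of the picture.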
Lastly, there were a lot of sites I tested that just didn’t work. If you Google free online OCR, you’ll see a bunch of sites, but several of the sites in the top 10 results didn’t even complete the conversion. Some would time out, others would give errors and some just got stuck on the “converting” page, so I didn’t even bother to mention those sites.
For each site, I tested two documents to see how good the output would be. For my tests, I simply used my iPhone 5S to take a picture of both documents and then uploaded them directly to the websites for conversion.
In case you want to see what the test images looked like, I have attached them here: Test1 and Test2. Note that these are not the full-resolution versions of the images taken from the phone. I used the full-resolution images when uploading to the sites.
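The comparisons in this article were done by eye. If you would rather put a rough number on how close an OCR result is to the original, one option is to type out a short passage by hand as a reference and compare it against the OCR output. The sketch below uses Python's standard `difflib` module; the function name `ocr_similarity` and the sample strings are made up for illustration and are not part of any of the services reviewed here.

```python
from difflib import SequenceMatcher


def ocr_similarity(reference: str, ocr_output: str) -> float:
    """Return a 0.0-1.0 similarity ratio between known-good text and OCR text."""
    # Collapse runs of whitespace so different line wrapping isn't counted as an error.
    ref = " ".join(reference.split())
    out = " ".join(ocr_output.split())
    # autojunk=False keeps the heuristic from ignoring frequent characters in long texts.
    return SequenceMatcher(None, ref, out, autojunk=False).ratio()


reference = "The cat sat on the mat."
print(ocr_similarity(reference, "The cat sat\non the mat."))  # 1.0 (only the wrapping differs)
print(ocr_similarity(reference, "The caT saT on The maT."))   # lower, because of the stray capitals
```

A ratio like this only measures the text itself. It says nothing about whether the layout, tables or columns survived the conversion.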
OnlineOCR
OnlineOCR.net is a clean and simple site that delivered very good results in my test. The main thing I like about it is that it doesn’t have tons of ads all over the place, which is usually the case with these kinds of niche service sites.
To start, select your file and wait until it finishes uploading. The max upload size for this site is 100 MB. If you register for a free account, you get a few extra features like a bigger upload size, multi-page PDFs, different input languages, more conversions per hour, etc.
Next, choose your input language and then choose the output format. You can choose from Word, Excel, or Plain Text. Click the Convert button and you’ll see the text displayed at the bottom in a box along with a download link.
If all you want is the text, just copy and paste it from the box. However, I suggest you download the Word document because it does a surprisingly great job of keeping the layout of the original document.
For example, when I opened the Word document for my second test, I was surprised to find that the document included a table with three columns, just like in the image.
Out of all the sites, this one was the best by far. It’s totally worth registering for if you need to do a lot of conversions.
For completeness, I am also going to link to the output files created by each service so you can see the results for yourself. Here are the results from OnlineOCR: Test1 Doc and Test2 Doc.
Note that when opening these Word documents on your computer, you’ll get a message in Word stating that it’s from the Internet and editing has been disabled. That is perfectly OK because Word doesn’t trust documents from the Internet and you really do not have to enable editing if you just want to view the document.
i2OCR
Another site that gave pretty good results was i2OCR. The process is very similar: choose your language, file, and then press Extract Text.
You’ll have to wait a minute or two here because this site takes a bit longer. Also, in Step 2, make sure that your image is showing right-side up in the preview, otherwise you’ll get a bunch of gibberish as output. For some reason, the images from my iPhone were showing in portrait mode on my computer, but landscape when I uploaded to this site.
I had to manually open the image in a photo editing app, rotate it 90 degrees, then rotate it back to portrait and then save it again. Once complete, scroll down and it’ll show you a preview of the text along with a download button.
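A likely explanation for the sideways upload (this is an assumption, not something I confirmed with the site) is that phone cameras often store the pixels in sensor orientation and record the intended rotation in an EXIF Orientation tag. Viewers that honor the tag show the photo upright, while tools that ignore it show it sideways. Re-saving the image in an editor typically bakes the rotation into the pixels. The sketch below simply maps the common tag values to the clockwise rotation needed to display the image upright; the function name is hypothetical and the mirrored variants are left out.

```python
# EXIF Orientation tag values mapped to the clockwise rotation (in degrees)
# needed to display the image upright. Mirrored variants (2, 4, 5, 7) are omitted.
ROTATION_FOR_ORIENTATION = {1: 0, 3: 180, 6: 90, 8: 270}


def clockwise_rotation_needed(orientation: int) -> int:
    """Return the clockwise rotation for a non-mirrored EXIF orientation value."""
    if orientation not in ROTATION_FOR_ORIENTATION:
        raise ValueError(f"unsupported or mirrored orientation: {orientation}")
    return ROTATION_FOR_ORIENTATION[orientation]


# A portrait photo is commonly tagged 6, meaning the stored pixels
# must be turned 90 degrees clockwise to look right.
print(clockwise_rotation_needed(6))  # 90
```

If you have the Pillow library installed, its `ImageOps.exif_transpose` function can apply this rotation for you before you upload the file.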
This site fared pretty well with the output for the first test, but didn’t do so well with the second test that had the column layout. Here are the results from i2OCR: Test1 Doc and Test2 Doc.
FreeOCR
Free-OCR.com will take your images and convert them into plain text. It does not have an option to export to Word format. Choose your file, select a language and then click Start.
The site is fast and you’ll get the output fairly quickly. Just click on the link to download the text file to your computer.
As with NewOCR, mentioned below, this site capitalizes all the T’s in the document. I have no idea why it would do that, but for some odd reason this site and NewOCR both did. It’s not a big deal to change them back, but it’s a tedious process you really shouldn’t have to go through.
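If you end up with text full of stray capital T’s, a few lines of script can undo most of the damage. The sketch below is a rough heuristic, not a perfect fix: the function name `fix_capital_t` is made up for this example, and it assumes a capital T is only legitimate at the very start of the text or right after sentence-ending punctuation. That means proper nouns and acronyms containing a T will be lowercased too, so proofread the result.

```python
import re


def fix_capital_t(text: str) -> str:
    """Lowercase every 'T' except at the start of the text or of a sentence."""

    def replace(match: re.Match) -> str:
        # Look at the last non-whitespace character before this 'T'.
        before = text[: match.start()].rstrip()
        if not before or before[-1] in ".!?":
            return "T"  # start of the text or of a new sentence: keep the capital
        return "t"      # anywhere else: assume it is an OCR artifact

    return re.sub(r"T", replace, text)


print(fix_capital_t("The caT saT on The maT. Then iT lefT."))
# The cat sat on the mat. Then it left.
```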
Here are the results from FreeOCR: Test1 Doc and Test2 Doc.
ABBYY FineReader Online
To use FineReader Online, you have to register for an account, which gets you a 15-day free trial to OCR up to 10 pages. If you only need to do a one-time OCR of a couple of pages, then you can use this service. Make sure that you click the verify link in the confirmation email after you register.
Click on Recognize at the top and then click Upload to select your file. Choose your language and output format, and then click Recognize at the bottom. This site also has a clean interface and no ads.
In my tests, this site was able to grab the text from the first test document, but the text was absolutely enormous when I opened the Word doc, so I ended up doing it again and choosing Plain Text as the output format.
For the second test with the columns, the Word document was empty and I couldn’t even find the text. Not sure what happened there, but it doesn’t seem to be able to handle anything other than simple paragraphs. Here are the results from FineReader: Test1 Doc and Test2 Doc.
NewOCR
The next site, NewOCR.com, was OK, but not nearly as good as the first site. For one, it has ads, though thankfully not a ton. You first select your file and then click the Preview button.
You can then rotate the image and adjust the area where you want to scan for text. It’s much like how the scanning process works on a computer with an attached scanner.
If the document has multiple columns, you can check the Page layout analysis button and it will try to split the text up into columns. Click the OCR button, wait a few seconds for it to complete and then scroll down to the bottom when the page refreshes.
In the first test, it got all the text correctly, but for some reason capitalized every T in the document! No idea why it would do that, but it did. In the second test with page analysis enabled, it got most of the text, but the layout was completely off.
Here are the results from NewOCR: Test1 Doc and Test2 Doc.
Conclusion
Unfortunately, as you can see, free services don’t give you very good results most of the time. The first site mentioned is the best by far because not only did it do a great job of recognizing all the text, it also managed to retain the format of the original document.
If you just need text, though, most of the websites above should be able to do that for you. If you have any questions, feel free to comment. Enjoy!