How to Extract Highlighted Text from a PDF as a Text File
Highlighting text in a PDF document is a handy way to mark important passages so that you can find them quickly later. You can highlight a PDF with Microsoft Edge or with any other software that comes with a PDF highlighting feature. Sometimes you may want only the highlighted text, so that you have a summary of the PDF containing just the essential passages. If you are looking for ways to save only the highlighted text from a PDF as a TXT file, this post can help.
Extract Highlighted Text from PDF
Several free programs and one web service can extract highlighted text from a PDF file and save it as a text file:
- PDF Highlight Extractor
- Foxit Reader
- Sumnotes.net
- DyAnnotationExtractor
Let's look at these highlight extraction tools one by one.
1] PDF Highlight Extractor
PDF Highlight Extractor is one of the easiest ways to extract the highlighted text from a PDF file. This open-source tool has two features that stand out. The first is that you can preview the highlighted text of the PDF in the software's interface.
The second is that you can set a start page, an end page, or a page range for the extraction. So, instead of scanning the entire PDF, you can limit the extraction to the pages you specify.
You can also choose to save the text as a plain text file or as an Excel file.
On its interface, add your PDF file using the given option. To set a page range, uncheck the All Pages option; otherwise, leave it checked. Then press the Extract button. Once the text is fetched, you can preview it. Finally, press the Text or Excel button to save the highlighted text.
PDF Highlight Extractor is free to download. It requires Java, so install Java first if it is not already on your computer, and then run the software.
2] Foxit Reader
Foxit Reader is one of the best free PDF readers. You can open multiple PDF files in separate tabs, highlight text, add notes, export comments, add signatures, and more. Its long list of features also includes extracting highlighted text from a PDF. The best part of this feature is that it saves the page numbers along with the extracted text.
To fetch the highlighted text from a PDF, open the PDF file in Foxit Reader and go to the Comment tab. In that tab, click the Export option in the Manage Comments section. You will see the Highlighted Text option. Use that option to save all the highlighted text as a text file.
Foxit Reader is free to download. During installation, choose the custom installation so that only the components you need are installed.
3] Sumnotes.net
Sumnotes.net is a free service that lets you annotate a PDF as well as extract the highlighted text. All the highlighted text is shown separately in the left sidebar. Using that sidebar, you can remove highlighted text that you don't need and then download the rest.
Before downloading the highlighted text, you can also include page numbers and exclude highlighted text of a specific color.
You also have the option to save the highlighted text as an Excel or Word file. With the free plan, you can extract 50 highlights or annotations per download, which is sufficient in most cases.
To extract highlighted text, open the Sumnotes.net homepage and add a PDF from your PC or Google Drive. When the PDF is uploaded, its annotations and highlighted text are visible on the left side. Use the Download Annotations option to save the highlighted text as a TXT, XLSX, or DOC file.
4] DyAnnotationExtractor
DyAnnotationExtractor can extract highlighted text and comments from a PDF document. It is a command-line tool, but it is very simple to use: a single command fetches the text highlighted in the input PDF file.
Download the software's ZIP file and extract it. To make the command easier to run, put the PDF in the same folder where you extracted the software. After that, open a Command Prompt window in that folder. You can do that by typing cmd in the folder's address box and pressing the Enter key.
In the Command Prompt window, type the name of the software's BAT file, then the -input option followed by the path of the input PDF, and then the -output option followed by the name of the output file, including the '.txt' extension. The full command is:
DyAnnotationExtractor.bat -input path of input PDF -output outputfilename.txt
Execute the command. After a few seconds, a plain text file containing all the highlighted text and comments fetched from that PDF will be ready. The output file is saved in the same folder as the input PDF.
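If you have several PDFs to process, the same command can be repeated from a small script. The following Python sketch is only an illustration, not part of DyAnnotationExtractor: the function names build_command and extract_all are hypothetical, and it assumes that the extracted tool and the PDFs sit in the same folder, that you are on Windows, and that the tool accepts file names containing spaces. Only the -input and -output options shown above are used.

```python
import subprocess
from pathlib import Path

# Name of the launcher script, as used in the command above.
BAT_NAME = "DyAnnotationExtractor.bat"


def build_command(pdf_name, output_name):
    """Return the extractor command for one PDF as an argument list."""
    return [BAT_NAME, "-input", pdf_name, "-output", output_name]


def extract_all(folder):
    """Run the extractor once for every PDF in `folder` (Windows only).

    Assumption: the extracted tool and the PDFs share this folder.
    Returns the paths of the text files the tool is expected to write.
    """
    folder = Path(folder)
    outputs = []
    for pdf in sorted(folder.glob("*.pdf")):
        output_name = pdf.stem + ".txt"
        # BAT files are interpreted by cmd.exe, so launch through "cmd /c".
        subprocess.run(
            ["cmd", "/c", *build_command(pdf.name, output_name)],
            cwd=folder,
            check=True,
        )
        outputs.append(folder / output_name)
    return outputs
```

Calling extract_all with the folder where you extracted the ZIP file (for example, a hypothetical C:\Tools\DyAnnotationExtractor) would then produce one text file per PDF, each named after its source PDF.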
These are some of the options you can use to extract highlighted text from a PDF and save the output as a text file. Hopefully, they help.