Poppler (software)

Poppler is a free and open-source software library for rendering Portable Document Format (PDF) documents. Its development is supported by freedesktop.org. Commonly used on Linux systems,[3] it powers the PDF viewers of the GNOME and KDE desktop environments.

Poppler
Developer(s): freedesktop.org
Initial release: 4 March 2005[nb 1]
Written in: C++
Operating system: Linux, Unix, BSD, Windows
Type: Library
License: GPLv2 or GPLv3[2]
Website: poppler.freedesktop.org

History

The project was started by Kristian Høgsberg with two goals:[4] to provide PDF rendering functionality as a shared library, thereby centralizing the maintenance effort, and to go beyond the goals of Xpdf by integrating with functionality provided by modern operating systems.

With the release of version 0.18 in 2011, the Poppler library represented a complete implementation of ISO 32000-1,[3] the PDF format standard, and was the first major free PDF library to support its forms (AcroForms only, not full XFA forms)[5][6] and annotation features.[3]

Poppler is a fork of Xpdf-3.0, a PDF file viewer developed by Derek Noonburg of Glyph and Cog, LLC.[4][7]

The name Poppler comes from "The Problem with Popplers," an episode of the animated series Futurama.[7]

Applications

Notable free software applications using Poppler to render PDF documents include:[8]

Application                  GUI widgets
Evince                       GTK
Inkscape                     GTK
LibreOffice 4.x              GTK[9]
Okular                       Qt
pdftotext, pdftohtml, etc.   none
TeXstudio                    Qt
TeXworks                     Qt
xpopple                      Motif
Zathura                      GTK

Features

Poppler can use two back-ends for drawing PDF documents: Cairo and Splash. Its features may depend on which back-end it employs. A third back-end, based on Qt4's painting framework "Arthur", is available but is incomplete and no longer under active development.[10] Bindings for GLib and Qt5 provide interfaces to the Poppler back-ends, although the Qt5 bindings support only the Splash and Arthur back-ends. A patchset is available that adds support for the Cairo back-end to the Qt5 bindings,[11] but the Poppler project does not currently wish to integrate the feature into the library proper.[12]

Poppler also includes a text-rendering back-end, which can be invoked through the command-line utility pdftotext. It is useful, for instance, for searching for strings in PDFs from the command line with the utility grep.[13]

Example:

pdftotext file.pdf - | grep string
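
The same search can be scripted. The following is a minimal Python sketch, assuming the pdftotext executable is installed and on the PATH. The helper names pdftotext_command, matching_lines and search_pdf are illustrative and not part of Poppler. It runs the command shown above and filters the output in Python instead of piping it to grep.

```python
import subprocess


def pdftotext_command(pdf_path):
    """Build the argv for `pdftotext <file> -`; the trailing "-" sends text to stdout."""
    return ["pdftotext", pdf_path, "-"]


def matching_lines(text, needle):
    """Return the lines of `text` that contain `needle` (plain substring match)."""
    return [line for line in text.splitlines() if needle in line]


def search_pdf(pdf_path, needle):
    """Extract text with pdftotext and return the lines containing `needle`.

    Assumes the pdftotext executable is installed and on the PATH.
    """
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return matching_lines(result.stdout, needle)
```

Unlike grep, which interprets its pattern as a regular expression, this sketch performs a literal substring match, so results can differ when the search string contains regular-expression metacharacters.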

Poppler partially supports annotations and AcroForms. It supports neither JavaScript[14] nor the rendering of full XFA forms.[5]

poppler-utils

poppler-utils is a collection of command-line utilities built on Poppler's library API for managing PDF files and extracting their contents:

  • pdfattach – add a new embedded file (attachment) to an existing PDF
  • pdfdetach – extract embedded documents from a PDF
  • pdffonts – list the fonts used in a PDF
  • pdfimages – extract all embedded images at native resolution from a PDF
  • pdfinfo – list all information of a PDF
  • pdfseparate – extract single pages from a PDF
  • pdftocairo – convert single pages from a PDF to vector or bitmap formats using cairo
  • pdftohtml – convert PDF to HTML format retaining formatting
  • pdftoppm – convert a PDF page to a bitmap
  • pdftops – convert PDF to printable PS format
  • pdftotext – extract all text from PDF
  • pdfunite – merge several PDFs
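
As an illustration of scripting these utilities, the sketch below calls pdfinfo and turns its output into a dictionary. It assumes that pdfinfo prints one "Key: value" pair per line and that the executable is on the PATH. The function names parse_pdfinfo and pdf_info, and the sample text in the usage note, are hypothetical.

```python
import subprocess


def parse_pdfinfo(output):
    """Parse "Key: value" lines into a dict, splitting on the first colon only."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            info[key.strip()] = value.strip()
    return info


def pdf_info(pdf_path):
    """Run pdfinfo on a file and return its fields as a dict of strings.

    Assumes the pdfinfo executable is installed and on the PATH.
    """
    result = subprocess.run(
        ["pdfinfo", pdf_path], capture_output=True, text=True, check=True
    )
    return parse_pdfinfo(result.stdout)
```

For example, parse_pdfinfo("Title: Report\nPages: 3") yields {"Title": "Report", "Pages": "3"}. Values stay strings, so a page count would need an explicit int() conversion.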

Notes

  1. This file-modification date appears on the version 0.1.1 tarball, the "first real release", according to Poppler's release history.[1]

References

  1. "Poppler Releases". Retrieved 7 December 2020.
  2. "Poppler README-XPDF". Retrieved 26 September 2015.
  3. "GNU PDF project leaves FSF High Priority Projects list; mission complete!". 6 October 2011. Retrieved 11 October 2011.
  4. "Poppler README file". Archived from the original on 8 July 2012. Retrieved 21 January 2010.
  5. "Bug 18935 - Form data is not saved for PDF files using XFA forms, will show old values when opened in acroread / Adobe Reader". 7 December 2008.
  6. "PDF v1.7 asks to upgrade Adobe Reader". 27 January 2009.
  7. "Poppler Homepage". Retrieved 3 January 2015.
  8. "Poppler Wiki. Information about Poppler". Retrieved 21 January 2010.
  9. "LibreOffice 4.2 ReleaseNotes". documentfoundation.org.
  10. Albert Astals Cid (15 May 2009). "Re: [poppler] Qt4 Arthur". mail-archive.com.
  11. "giddie/poppler-cairo-backend". GitHub. 8 December 2021.
  12. "Bug 25240 – Cairo backend for Qt4 wrapper". freedesktop.org.
  13. "Searching PDF Files With grep". Retrieved 21 January 2010.
  14. Albert Astals Cid (8 February 2008). "Support JavaScript (#162)". GitLab. Retrieved 3 October 2018.
  • Albert Astals Cid (29 August 2005) The Poppler Library, presentation at the 2005 KDE conference