In Version 4 of PDFOne, we introduced a new method, getPageElements(), in the PdfDocument class.
List getPageElements(int pageNum, int elementTypes)
List getPageElements(String pageRange, int elementTypes)
This method returns a list containing PdfPageElement instances. PdfPageElement is the parent class of the individual element classes, namely PdfPageCompositeElement, PdfPageImageElement, PdfPagePathElement, and PdfPageTextElement. You can access items in the returned list as instances of these derived classes by casting them.
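Because the returned list is untyped, each item has to be cast to the derived class it actually is. The sketch below illustrates that pattern in isolation. PageElement, TextElement, and ImageElement are hypothetical stand-ins defined only for this example, not the real PDFOne classes, so that the code runs without the library. With PDFOne on the classpath, the same instanceof checks would presumably apply to PdfPageTextElement and PdfPageImageElement.

```java
import java.util.ArrayList;
import java.util.List;

public class ElementDispatchSketch {

    // Hypothetical stand-ins for the PDFOne element hierarchy (NOT the real API).
    static class PageElement { }

    static class TextElement extends PageElement {
        final String text;
        TextElement(String text) { this.text = text; }
    }

    static class ImageElement extends PageElement {
        final int width;
        final int height;
        ImageElement(int width, int height) {
            this.width = width;
            this.height = height;
        }
    }

    // Check the runtime type before casting, so a mixed list cannot
    // cause a ClassCastException.
    static String describe(Object element) {
        if (element instanceof TextElement) {
            TextElement textElement = (TextElement) element;
            return "text: " + textElement.text;
        } else if (element instanceof ImageElement) {
            ImageElement imageElement = (ImageElement) element;
            return "image: " + imageElement.width + " x " + imageElement.height;
        }
        return "other element";
    }

    public static void main(String[] args) {
        // A mixed list, similar in spirit to a list of several element types
        List<Object> elements = new ArrayList<>();
        elements.add(new TextElement("Hello"));
        elements.add(new ImageElement(640, 480));
        elements.add(new PageElement());

        for (Object element : elements) {
            System.out.println(describe(element));
        }
    }
}
```

A direct cast without the instanceof check is fine when only one element type was requested. The check matters when a single call retrieves more than one type.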
The derived classes provide much more information about the retrieved page element. For example, with a PdfPageTextElement instance, you can find not only the actual text represented by the text element but also its location, font, rotation (if any), and other details. The following code snippet shows how this is done.
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.List;
import javax.imageio.ImageIO;

import com.gnostice.pdfone.PdfDocument;
import com.gnostice.pdfone.PdfException;
import com.gnostice.pdfone.PdfPageElement;
import com.gnostice.pdfone.PdfPageImageElement;
import com.gnostice.pdfone.PdfPageTextElement;

public class Page_Element_Parsing_Demo {

  public static void main(String[] args)
      throws IOException, PdfException, Exception {
    int i, n;
    PdfPageTextElement textElement;
    PdfPageImageElement imageElement;
    BufferedImage image;

    // Load a PDF document
    PdfDocument doc = new PdfDocument();
    doc.load("sample.pdf");

    // Retrieve image elements from page 1 of the document
    List lstImageElements =
        doc.getPageElements(1, PdfPageElement.ELEMENT_TYPE_IMAGE);

    // Retrieve text elements from page 1 of the document
    List lstTextElements =
        doc.getPageElements(1, PdfPageElement.ELEMENT_TYPE_TEXT);

    // Iterate through retrieved image elements
    n = lstImageElements.size();
    for (i = 0; i < n; i++) {
      // Save image content of the current image element to file
      imageElement = (PdfPageImageElement) lstImageElements.get(i);
      image = imageElement.getImage();
      String fileName = "page1_image" + (i + 1) + ".png";
      try {
        ImageIO.write(image, "png", new File(fileName));
      } catch (Exception e) {
        System.out.println("Sorry, there was an error: " + e.getMessage());
      }

      // Print details of the current image element (width x height)
      System.out.println("Image Element #" + (i + 1) +
                         " saved to: " + fileName + " (" +
                         imageElement.getImageWidth() + " x " +
                         imageElement.getImageHeight() + ")");
    }

    // Close the document. It needs to stay open only while images
    // are being extracted, because image data is read on demand.
    doc.close();

    // Iterate through retrieved text elements
    n = lstTextElements.size();
    for (i = 0; i < n; i++) {
      textElement = (PdfPageTextElement) lstTextElements.get(i);

      // Print details of the current text element
      System.out.println("Text Element #" + (i + 1) + " \"" +
                         textElement.getText() + "\" uses font " +
                         textElement.getTextFontInfo().getFontName());
    }
  }
}
This code snippet parses the text and image elements on page 1 of a document. The text elements are printed to the console, while the image elements are saved to files. Here is the document that was used to test this code.
Here is the output of the program when run with the above document. The output lists the image and text elements that were found on page 1 of the document.
Here is the image element after it was saved to a file.
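One detail of the image-saving step is worth a closer look: ImageIO.write() returns a boolean, and it returns false (without throwing) when no suitable writer is found for the requested format. The sketch below shows one way to check that result. The savePng helper and the blank test image are hypothetical, introduced only for illustration. In the listing above, the BufferedImage would instead come from the image element's getImage() call.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class SaveImageSketch {

    // Hypothetical helper (not part of PDFOne): writes the image as PNG and
    // reports whether an image writer actually handled it.
    static boolean savePng(BufferedImage image, File target) throws IOException {
        // ImageIO.write returns false when no appropriate writer is found
        return ImageIO.write(image, "png", target);
    }

    public static void main(String[] args) throws IOException {
        // A blank 4 x 3 image stands in for one extracted from a PDF page
        BufferedImage image = new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB);

        // Write to a temporary file so the sketch leaves nothing behind
        File target = File.createTempFile("page1_image", ".png");
        target.deleteOnExit();

        if (savePng(image, target)) {
            // Read the file back to confirm that it holds a decodable image
            BufferedImage readBack = ImageIO.read(target);
            System.out.println("Saved " + readBack.getWidth() + " x "
                    + readBack.getHeight() + " image to " + target.getName());
        } else {
            System.out.println("No PNG writer available; nothing was saved");
        }
    }
}
```

Checking the return value keeps the "saved to" message from being printed for a file that was never written.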
© 2002-2024 Gnostice Information Technologies Private Limited. All rights reserved.