This article was inspired by a query from one of our customers.
The customer, a PDFtoolkit VCL user, received a lot of PDF documents containing demographic data. The documents were the output of a process over which he had no control. He had to extract the demographic data from the PDF files and feed it to another process.
The data was in a structured format and occurred at the same locations on the first page of every document. He wanted to know whether, given the location of the data, there was a way to extract it.
The following is a slightly abridged version of the code snippet we sent to the customer.

    var
      PageElements: TgtPDFPageElementList;
      PageItem: TgtPDFTextElement;
      JI: Integer;
      XCord, YCord: Double;
    begin
      // Initialized so that FreeAndNil is safe if loading fails
      PageElements := nil;
      try
        Result := '';
        PDFDoc.LoadFromFile('input.pdf');
        // Gets text elements from page 1
        PageElements := PDFDoc.GetPageElements(1, [etText], muPixels);
        // Parses the text elements in page 1
        for JI := 0 to PageElements.Count - 1 do
        begin
          PageItem := TgtPDFTextElement(PageElements.Items[JI]);
          // Retrieves coordinates of the text element
          XCord := TgtPDFPageElement(PageItem).XCordOrigin;
          YCord := TgtPDFPageElement(PageItem).YCordOrigin;
          // Checks if the text element is at (100, 250)
          if (Trunc(XCord) = 100) and (Trunc(YCord) = 250) then
          begin
            Result := PageItem.Text;
            Break;
          end;
        end;
      finally
        FreeAndNil(PageElements);
      end;
    end;

This method extracts the text located at coordinates (100, 250), measured in pixels, on page 1 of the PDF document input.pdf. It retrieves all text elements on page 1, checks the origin coordinates of each one, and returns the text of the first element whose coordinates match (100, 250). Because the comparison uses Trunc, only the integer parts of the coordinates are compared. If no element matches, the method returns an empty string.
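The coordinates in the snippet are Double values, so comparing their truncated integer parts misses an element whose origin is at, say, (99.7, 250.2). As a minimal sketch of an alternative, the console program below matches within a tolerance instead. It does not use PDFtoolkit at all: the TTextItem record, the TextNear function, and the sample values are hypothetical stand-ins invented for illustration, and the tolerance of 1.0 is an assumption you would tune for your own documents.

```delphi
program FindTextAtPoint;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical stand-in for a text element: origin coordinates plus text.
  TTextItem = record
    X, Y: Double;
    Text: string;
  end;

// Returns the text of the first item whose origin lies within Tolerance
// of (TargetX, TargetY) on both axes, or '' if no item qualifies.
function TextNear(const Items: array of TTextItem;
  TargetX, TargetY, Tolerance: Double): string;
var
  I: Integer;
begin
  Result := '';
  for I := Low(Items) to High(Items) do
    if (Abs(Items[I].X - TargetX) <= Tolerance) and
       (Abs(Items[I].Y - TargetY) <= Tolerance) then
    begin
      Result := Items[I].Text;
      Break;
    end;
end;

var
  Items: array[0..2] of TTextItem;
begin
  // Made-up sample data, not taken from any real PDF document.
  Items[0].X := 72.0;  Items[0].Y := 90.5;  Items[0].Text := 'Header';
  Items[1].X := 99.7;  Items[1].Y := 250.2; Items[1].Text := 'Jane Doe';
  Items[2].X := 100.4; Items[2].Y := 310.0; Items[2].Text := 'Footer';

  // The second item is within 1.0 of (100, 250) on both axes.
  WriteLn(TextNear(Items, 100, 250, 1.0));
end.
```

In the sample data, the second item has Trunc(X) = 99, so an exact integer comparison against 100 would skip it, while the tolerance check accepts it. In real code the same comparison could replace the Trunc test inside the loop over the page elements.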
© 2002-2025 Gnostice Information Technologies Private Limited. All rights reserved.