PDF Processing with Gnostice PDFtoolkit (Part 1)

In the first part of this article, originally published on Codegear.com last month, we will see what Gnostice PDFtoolkit VCL can do for you. We will be using code examples to illustrate the ease with which PDFtoolkit will help you accomplish your PDF-related tasks.
By V. Subhash

Why PDF?

PDF is best known for its ability to retain high fidelity on all platforms. It is also a final-form document format, in that people do not expect PDF documents to undergo further change. That is why PDF is a popular choice for making invoices and user manuals, and also for transmitting documents over the Internet. PDF is also liked for features such as font embedding, bookmarks, thumbnails, attachments, watermarks, annotations, encryption, and digital signatures. Last but not least, PDF is an open format.

For these reasons, PDF has become a part of our technology-oriented lives. From e-books to web forms to sophisticated workflow transports, PDF has seen applications in innumerable ways.

Although the format supports a lot of features, most applications that produce PDF documents make use of only a few of them. Usually, it is just text and images. At other times, it may be just text and form fields. PDF users often require more value, such as encryption, compression, bookmarks, stamps, or watermarks. In a workflow-like environment, the demands to cut, chop, and mince PDF documents are even greater.

To meet this need, there is a flourishing market for PDF processors. In this arena, Gnostice PDFtoolkit has long established its name as a leader.

Gnostice PDFtoolkit VCL

PDFtoolkit works with existing PDF documents. PDFtoolkit helps in:

  1. Manipulation
  2. Content Extraction (text and images)
  3. Transformation (merging and splitting)
  4. Enhancement (adding bookmarks, hyperlinks, comments, stamps and watermarks; encryption; compression)
  5. Forms Processing (adding/editing/deleting/flattening form fields)
  6. Viewing and Printing (visual components)
  7. Text Search (visual component)

Reading a PDF document is straightforward, as shown in the code snippet below. Create a TgtPDFDocument object, load a document, and we are ready to roll.

...

// Create a PDF document object
gtPDFDocument1 := TgtPDFDocument.Create(Nil);
try
 // Load a document
 gtPDFDocument1.LoadFromFile('sample_doc.pdf');

 // Check if the document is loaded
 if gtPDFDocument1.IsLoaded then
  // Display the number of pages
  Writeln('Number of pages: '
          + IntToStr(gtPDFDocument1.PageCount));

...

I. Manipulation of PDF Documents

After a PDF document has been loaded, document contents and their properties can be read and modified using the properties and methods of TgtPDFDocument object.

In the next code snippet, we first specify the measurement unit that will be used when rendering elements on a PDF page. Next, an HTML-formatted string is written on the last page. The number of the last page is obtained from the property TgtPDFDocument.PageCount.

...

// Set document measurement units to pixels
gtPDFDocument1.MeasurementUnit := muPixels;

// Write formatted text at the center of the last page
gtPDFDocument1.TextOut(
  'Hello, World!',         // HTML-formatted string
  IntToStr(gtPDFDocument1.PageCount),    // Page range
  gtPDFDocument1.                        // X-coordinate
     GetPageSize(gtPDFDocument1.PageCount, muPixels).Width/2,
  gtPDFDocument1.                        // Y-coordinate
     GetPageSize(gtPDFDocument1.PageCount, muPixels).Height/2);

// Save the modified document
gtPDFDocument1.SaveToFile('modified_doc.pdf');

...

The formatted string is written at the center of the last page. To obtain the location of the center of the page, we first call TgtPDFDocument.GetPageSize().

function GetPageSize(
    PageNo: Integer;
    MMUnit: TgtMeasurementUnit): TgtPageSize;

TgtMeasurementUnit 
  = (muPixels, muPoints, muInches, muMM, muTwips);

TgtPageSize = record
  Width,
  Height: Double;
end;

This method returns a TgtPageSize record, whose fields TgtPageSize.Width and TgtPageSize.Height provide the dimensions of the specified page. From this information, it is easy to calculate the location of the center of the page.
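
The center calculation itself is simple arithmetic on the two fields. The following is a minimal sketch that can be compiled without PDFtoolkit: TPageSizeSketch is a local stand-in that mirrors the TgtPageSize record quoted above, and TPointSketch, PageCenter, and the page dimensions are hypothetical names and values chosen only for illustration.

```pascal
program PageCenterSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  // Local stand-in mirroring the TgtPageSize record quoted above,
  // so that this sketch compiles without PDFtoolkit installed
  TPageSizeSketch = record
    Width,
    Height: Double;
  end;

  // Hypothetical helper type for a location on the page
  TPointSketch = record
    X,
    Y: Double;
  end;

// Returns the center of a page with the given dimensions
function PageCenter(const Size: TPageSizeSketch): TPointSketch;
begin
  Result.X := Size.Width / 2;
  Result.Y := Size.Height / 2;
end;

var
  Size: TPageSizeSketch;
  Center: TPointSketch;
begin
  // Assumed example values: a US Letter page at 96 pixels per inch
  Size.Width := 816;    // 8.5 in * 96
  Size.Height := 1056;  // 11 in * 96

  Center := PageCenter(Size);
  Writeln(Format('Center: (%.1f, %.1f)', [Center.X, Center.Y]));
end.
```

In a real application, the record would come from GetPageSize() rather than from hard-coded values, and the result would be in whatever measurement unit was passed to that call.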

As you can see, PDFtoolkit provides an elegant interface that hides the complexities imposed by the format specification.

II. Content Extraction from PDF Documents

TgtPDFDocument.GetPageElements() is another example of a very capable PDFtoolkit method.

function GetPageElements(
   APageNo: Integer;
   ElementTypes: TgtElementTypes;
   MMUnit: TgtMeasurementUnit): TgtPDFPageElementList;

TgtPDFElementType = (etText, etImage, etPath, etFormField);

This method returns a list of PDF page elements from a specified page. If a page element returned by the method is a text element, then its properties expose details such as location, font, and color. If it is an image, then you can get hold of its coordinates, scaling factor, and the actual image in the form of a TGraphic object. That’s the level of control you get with PDFtoolkit.

PDFtoolkit offers more than one way of doing the same thing, each one more useful in a special situation. The TgtPDFDocument.SearchAll() method can perform a variety of text searches for a given search string.

function SearchAll(
    const SearchText: String;
    AOptions: TgtSearchTypes;
    SearchList: TStringList): Integer;

TgtPDFSearchTypes = (stCaseSensitive, stWholeWord, stNone);
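
To make the effect of such options concrete, here is a rough sketch of case-sensitive and whole-word matching on a single line of text. It is not PDFtoolkit's implementation and does not call SearchAll(); TSearchOptionSketch, TSearchOptionsSketch, IsWordChar, and CountMatches are hypothetical names, and the whole-word rule (ASCII letters and digits count as word characters) is an assumption made for this example.

```pascal
program SearchOptionsSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  // Hypothetical stand-ins for the search option types quoted above
  TSearchOptionSketch = (soCaseSensitive, soWholeWord);
  TSearchOptionsSketch = set of TSearchOptionSketch;

// Assumption: only ASCII letters and digits are word characters
function IsWordChar(C: Char): Boolean;
begin
  Result := ((C >= 'A') and (C <= 'Z')) or
            ((C >= 'a') and (C <= 'z')) or
            ((C >= '0') and (C <= '9'));
end;

// Counts occurrences of SearchText in Line under the given options
function CountMatches(const SearchText, Line: string;
  Options: TSearchOptionsSketch): Integer;
var
  Needle, Hay: string;
  P, EndPos: Integer;
  BoundaryOk: Boolean;
begin
  Result := 0;
  if SearchText = '' then
    Exit;

  Needle := SearchText;
  Hay := Line;
  if not (soCaseSensitive in Options) then
  begin
    // Case-insensitive search: compare lowercase copies
    Needle := LowerCase(Needle);
    Hay := LowerCase(Hay);
  end;

  for P := 1 to Length(Hay) - Length(Needle) + 1 do
  begin
    if Copy(Hay, P, Length(Needle)) <> Needle then
      Continue;

    EndPos := P + Length(Needle);  // index just past the match
    BoundaryOk := True;
    if soWholeWord in Options then
      // The match must not touch a word character on either side
      BoundaryOk :=
        ((P = 1) or not IsWordChar(Hay[P - 1])) and
        ((EndPos > Length(Hay)) or not IsWordChar(Hay[EndPos]));

    if BoundaryOk then
      Inc(Result);
  end;
end;

begin
  Writeln(CountMatches('pdf', 'PDF and pdfs', []));                 // 2
  Writeln(CountMatches('pdf', 'PDF and pdfs', [soCaseSensitive]));  // 1
  Writeln(CountMatches('pdf', 'PDF and pdfs', [soWholeWord]));      // 1
end.
```

The sketch uses a Pascal set so that options can be combined freely, for example [soCaseSensitive, soWholeWord].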

You can also extract all text in one go.

// Returns a list of text extracted from a specified page
function ExtractText(APageNo: Integer): TStringList;
// Returns formatted text extracted from a specified page
function ExtractTextFormatted(APageNo: Integer): TStringList;

III. Document Transformation

PDFtoolkit makes merging and splitting files a breeze. Here is a code snippet that shows how to merge several documents into one.

...

var
    gtPDFDocument1: TgtPDFDocument;
    StringList1: TStringList;
begin

  // Create a document object
  gtPDFDocument1 := TgtPDFDocument.Create(Nil);
  // Load a list with names of the
  // documents that need to be merged
  StringList1 := TStringList.Create();
  StringList1.Add('sample_doc1.pdf');
  StringList1.Add('sample_doc2.pdf');
  StringList1.Add('sample_doc3.pdf');

  try
    // Merge the documents
    gtPDFDocument1.MergeDocs(StringList1);
    // Save the merged document to file
    gtPDFDocument1.SaveToFile('merged_doc.pdf');

...

IV. Document Enhancement

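PDFtoolkit can enhance a PDF document in a number of useful ways, such as adding bookmarks, hyperlinks, comments, stamps, and watermarks, as well as applying encryption and compression.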

In this code snippet, we see how to add bookmarks for all pages in a document.

...

var
 I: Integer;
 gtPDFDocument1: TgtPDFDocument;
 // Bookmark
 gtPDFOutline1: TgtPDFOutline;
 // Destination linked by a bookmark
 gtPDFDestination1: TgtPDFDestination;
 // Display style of a bookmark in bookmark panel
 gtBookmarkAttribute1: TgtBookmarkAttribute;
begin
 gtPDFDocument1 := TgtPDFDocument.Create(Nil);
 try
  gtPDFDocument1.LoadFromFile('sample_doc.pdf');
  if gtPDFDocument1.IsLoaded then
   begin
   // For each page in the document
   for I := 1 to gtPDFDocument1.PageCount do
   begin
   // Create a bookmark that links to the top-left
   // corner of the page in the current iteration
   gtPDFDestination1 :=
     TgtPDFDestination.Create(
     I,      // Number of the page
     dtXYZ,  // Destination type (use x-y coordinates and zoom)
     0,      // X-coordinate of the destination
     0,      // Y-coordinate of the destination
     100);   // Zoom

   // Create bookmarks with maroon-colored, bold-italic text
   gtBookmarkAttribute1 :=
      TgtBookmarkAttribute.Create([fsBold, fsItalic], clMaroon);

   if I = 1 then
    begin
    // If it's the first page, then create a new bookmark
    gtPDFOutline1 := gtPDFDocument1.CreateNewBookmark(
        'Page #' + IntToStr(I),  // Bookmark title text
        gtPDFDestination1,
        gtBookmarkAttribute1);
    end
   else
    begin
    // For other pages, add a bookmark next to the
    // previously created bookmark
    gtPDFOutline1 := gtPDFOutline1.AddNext(
            'Page #' + IntToStr(I),  // Bookmark title text
            gtPDFDestination1,
            gtBookmarkAttribute1);
    end;
   end;
  end;
  // Save the modified document
  gtPDFDocument1.SaveToFile('modified_doc.pdf');
...

Here is how to encrypt a PDF document.

...
uses
 ...
 gtPDFCrypt,
 gtPDFDoc;

var
 gtPDFDocument1: TgtPDFDocument;
begin
 // Create a document object
 gtPDFDocument1 := TgtPDFDocument.Create(Nil);

 try
  // Load input document
  gtPDFDocument1.LoadFromFile('unencrypted_doc.pdf');

  if gtPDFDocument1.IsLoaded then
   begin
   // Modify the document's encryption settings with
   // the TgtPDFEncryption object returned by
   // TgtPDFDocument.Encryption property
   with gtPDFDocument1.Encryption do
    begin
    Enabled := True;
    Level := el128bit;   // 128-bit encryption level
    OwnerPassword := 'Owner';
    UserPassword := 'User';
    UserPermissions
         := [AllowAccessibility,
             AllowPrint,
             AllowHighResPrint];
   end;
  end;

  // Save the encrypted document to file
  gtPDFDocument1.SaveToFile('encrypted_doc.pdf');
...

This code snippet shows how to mark page numbers on all pages in a PDF document.

...

var
 gtPDFDocument1: TgtPDFDocument;
begin
 gtPDFDocument1 := TgtPDFDocument.Create(Nil);
 try
  gtPDFDocument1.LoadFromFile('sample_doc.pdf');

  if gtPDFDocument1.IsLoaded then
   begin
   gtPDFDocument1.MeasurementUnit := muPixels;
   // Write formatted string on all pages
   // at specified location
   gtPDFDocument1.TextOut(
     'Page <%PageNo%> of <%TotPage%>', // text with placeholders
     gtPDFDocument1.                   // x-coordinate, measured from
        GetPageSize(1, muPixels).Width - 150, // the width of page 1
                                       // (assumes uniform page width)
     100);                             // y-coordinate
   end;
  // Save the modified document
  gtPDFDocument1.SaveToFile('numbered_pages_doc.pdf');
...

The text string is written by an overloaded TgtPDFDocument.TextOut() method. The string contains two built-in placeholders, one for the current page number and one for the total number of pages. PDFtoolkit substitutes built-in placeholders with their values at run time.

You can use placeholders with any TgtPDFDocument method that writes text to a document. You can also create your own placeholders and have them substituted at run time by writing a handler for the TgtPDFDocument.OnCalcVariables event.

property OnCalcVariables: TgtOnCalcVariablesEvent
   read FOnCalcVariables
   write SetOnCalcVariables;
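
Conceptually, placeholder substitution is a per-page string replacement. The sketch below only illustrates that idea with the two built-in placeholders named in this article. It is not how PDFtoolkit implements the feature and it does not use the OnCalcVariables event; ExpandPlaceholders is a hypothetical helper written with the standard SysUtils.StringReplace function.

```pascal
program PlaceholderSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper: replaces the two built-in placeholders
// mentioned in the article with values for a given page
function ExpandPlaceholders(const Template: string;
  PageNo, TotalPages: Integer): string;
begin
  Result := StringReplace(Template, '<%PageNo%>',
    IntToStr(PageNo), [rfReplaceAll]);
  Result := StringReplace(Result, '<%TotPage%>',
    IntToStr(TotalPages), [rfReplaceAll]);
end;

var
  I: Integer;
begin
  // Prints "Page 1 of 3", "Page 2 of 3", and "Page 3 of 3"
  for I := 1 to 3 do
    Writeln(ExpandPlaceholders('Page <%PageNo%> of <%TotPage%>', I, 3));
end.
```

A custom placeholder would follow the same pattern, with the replacement value supplied by the application instead of by the library.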

V. Processing PDF Forms Documents (AcroForms)

PDFtoolkit can add, edit, fill, and flatten PDF form fields. Editing a PDF form field involves changing properties such as its appearance, position, or interactivity. Filling a PDF form field involves specifying a value for the form field and saving the modified form field to the document. Flattening a form field removes all interactivity from it while ensuring that it retains its original appearance.

In this code snippet, we see how to add form fields to a document.

...

var
 gtPDFDocument1: TgtPDFDocument;
 // List box form field
 gtPDFFormListBox1: TgtPDFFormListBox;
 // Push button form field
 gtPDFFormPushButton1: TgtPDFFormPushButton;
 // Rectangles
 gtRect1: TgtRect;
 gtRect2: TgtRect;
begin
 gtPDFDocument1 := TgtPDFDocument.Create(Nil);
 try
  gtPDFDocument1.LoadFromFile('sample_doc.pdf');
  if gtPDFDocument1.IsLoaded then
   begin
   // Set document measurement unit
   gtPDFDocument1.MeasurementUnit := muInches;

   // Specify rectangle position for list box
   gtRect1.Left := 1;
   gtRect1.Right := 2;
   gtRect1.Top := 1;
   gtRect1.Bottom := 2;
 
   // Create a list box form field
   gtPDFFormListBox1 := TgtPDFFormListBox.Create();
   // Specify name for the list box in the document
   gtPDFFormListBox1.FieldName := 'lstCountry';
   // Add options to the list box
   gtPDFFormListBox1.AddItem('India');
   gtPDFFormListBox1.AddItem('USA');
   gtPDFFormListBox1.AddItem('Russia');
   gtPDFFormListBox1.AddItem('Germany');
   gtPDFFormListBox1.AddItem('Japan');
   gtPDFFormListBox1.AddItem('China');
   // Specify background color for the list box
   gtPDFFormListBox1.BackgroundColor := clWindow;
   gtPDFFormListBox1.BorderColor := clWindowFrame;
   gtPDFFormListBox1.DefaultValue := 'Select a country';
 
   // Specify location of the list box
   gtPDFFormListBox1.Rect := gtRect1;
  
   // Add the list box to the last page
   gtPDFDocument1.AddFormField(
    gtPDFFormListBox1,
    gtPDFDocument1.PageCount); // page number

   // Specify rectangle position for submit button
   gtRect2.Left := 1;
   gtRect2.Right := 2;
   gtRect2.Top := 3;
   gtRect2.Bottom := 3.25;

   // Create a push button
   gtPDFFormPushButton1 := TgtPDFFormPushButton.Create();
   gtPDFFormPushButton1.FieldName := 'btnSubmit';
   gtPDFFormPushButton1.NormalCaption := 'Submit';
   gtPDFFormPushButton1.Rect := gtRect2;
   // Set button to submit form contents when
   // it is clicked inside a viewer application
   gtPDFFormPushButton1.Action := pbaSubmit;
   // Specify URL where form contents should be
   // submitted
   gtPDFFormPushButton1.SubmitURL
      := 'http://www.gnostice.com/newsletters' +
         '/demos/200804/forms_test.asp';

   // Add push button to the last page
   gtPDFDocument1.AddFormField(
      gtPDFFormPushButton1,
      gtPDFDocument1.PageCount); // page number

   // Save the modified document to file
   gtPDFDocument1.SaveToFile('forms_doc.pdf');
   end;
...

VI. Viewing and Printing PDF Documents

PDFtoolkit’s viewer is a visual component that can be used to display PDF documents in a VCL forms application. It does not require Adobe® Reader to be installed on the client machine. The viewer’s API provides methods to implement navigation, zooming, and other toolbar-driven functionality.

...

 gtPDFDocument1: TgtPDFDocument;
 gtPDFViewer1: TgtPDFViewer;
 OpenDialog1: TOpenDialog;
 edFilePath: TEdit;
 edNumberOfPages: TEdit;

...

// Select a PDF document
if not OpenDialog1.Execute then
 exit;

// Update text field
edFilePath.Text := OpenDialog1.FileName;

// Unload any previously loaded document
if gtPDFDocument1.IsLoaded then
 gtPDFDocument1.Reset;

try
 // Load the selected PDF document
 gtPDFDocument1.LoadFromFile(edFilePath.Text);

 // Check if document has been successfully loaded
 if gtPDFDocument1.IsLoaded then
  begin
  // Display number of pages
  edNumberOfPages.Text := IntToStr(gtPDFDocument1.PageCount);
  // Specify document that needs to be
  // displayed by the viewer
  gtPDFViewer1.PDFDocument :=  gtPDFDocument1;
  // Activate viewer
  gtPDFViewer1.Active := True;
...

PDFtoolkit’s PDF printer is a non-visual component. It has methods and properties that allow a VCL application to query available printers, select a printer, specify print settings, and print a specified set of pages to the selected printer. The most attractive thing about the printer component is that it can print PDF documents without requiring external components such as GhostScript or Adobe® Reader.

VII. Other Capabilities

PDFtoolkit includes a visual component that provides interactive text search capabilities to VCL forms applications. It needs to be used in conjunction with PDFtoolkit’s viewer component. The functionality of the search panel is similar to the one found in Adobe Reader.

PDFtoolkit has several other components such as the PDFOutlineViewer, which can be used to display a bookmark panel for a PDF document.

In summary, Gnostice PDFtoolkit is a component suite that has well-rounded capabilities in PDF processing.

What’s Next

The next version of PDFtoolkit is currently in beta. Gnostice PDFtoolkit v3.0 will use a whole new PDF processing engine that is separate from the PDFtoolkit API logic. A key objective in writing the new PDF processor was modularization of logic. The advantage of this approach has been a phenomenal increase in speed, scalability, robustness, and scope for optimization. In the next part of this article, we will learn more about this.
