In this article, we will explore various aspects of the new Gnostice PDFtoolkit 3.0. We’ll start with an overview of the PDF format, then cover the design goals and architecture of the new PDF Processor core of PDFtoolkit, some interesting new features of v3.0, and the QC systems, and finally explore the demo of the new product.
As PDFtoolkit is a software library that enables software developers to work with PDF documents, it helps for the developer to have a general understanding of the underlying technology: the Portable Document Format (PDF). This is by no means an in-depth description of the PDF format; it is only intended to equip the software developer to make better use of the technology.
A PDF file, internally, is broadly divided into four parts: the Header, Body, Cross-Reference Table and Trailer.
The Header is expected to contain the version number of the PDF format that the file was written to. At the time of writing, PDF-1.7 is the newest and corresponds to Acrobat 8; 1.6 corresponds to Acrobat 7, and so on. The version number is only a general indication of the version of the PDF specification; PDF readers and processors cannot necessarily rely on it when operating on the file.
The Body holds the content of the PDF that we see on screen, such as text, images, drawings, bookmarks and forms, along with all the information needed to display and make proper use of that content. The parts of the Body are divided up into what are known as objects. For example, there is an object for each page in the document, each font used, each image and so on. The page object holds, along with other page attributes, the drawing commands (stored in content streams) that represent the page we see on screen. The drawing commands reference other objects in the Body, such as fonts and images. A point to note here is that many pages can reference the same font, image or other resource. This means that a PDF document can be made much smaller by reusing resources rather than duplicating them.
The Trailer contains some very important keys to the PDF document, without which the document cannot be read. Since all content in the Body is held in specialized types of objects, and those objects are reusable, storing content this way would make little sense unless the objects could be accessed randomly. The PDF format is mature and well thought out, and it does provide for random access to the objects in the Body.
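
The four-part layout described above can be made concrete with a small sketch. The Python below is an illustration only and is not part of PDFtoolkit; the function name `build_minimal_pdf` is hypothetical. It assembles a tiny two-page PDF in memory using the classic cross-reference table layout, and both page objects reference the same font object (object 5) to show resource reuse.

```python
def build_minimal_pdf() -> bytes:
    """Assemble a tiny two-page PDF: Header, Body, Cross-Reference Table, Trailer."""
    # Body objects, numbered from 1. Both pages (objects 3 and 4) reference
    # the same font (object 5) instead of carrying their own copy.
    page = (b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] "
            b"/Resources << /Font << /F1 5 0 R >> >> >>")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >>",
        page,
        page,
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    # Header: states the PDF version the file claims to follow.
    out = bytearray(b"%PDF-1.7\n")

    # Body: write each object and remember the byte offset where it starts.
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-Reference Table: one fixed-width, 20-byte entry per object.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"          # entry 0 is always the free-list head
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset

    # Trailer: names the root object and records where the table starts.
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


pdf = build_minimal_pdf()
print(pdf.decode("ascii"))
```

Printing the result shows the four parts in order. The pages here are blank (they have no content streams), which keeps the sketch short.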
Random access is achieved by storing the absolute byte offset of each object in a table known as the Cross-Reference Table (or XRef Table); one of the keys that the Trailer contains is the starting point of this XRef Table. A consequence of this design is that when an object in the PDF document is modified, the offsets in the XRef Table need to be updated for all objects that occur after it. There are, of course, optimized mechanisms to handle this scenario.
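
As a rough illustration of this lookup, the sketch below (plain Python, not PDFtoolkit code; `make_pdf` and `read_object` are hypothetical helper names) reads the table's starting point from the end of the file, picks the entry for the requested object, and jumps straight to its byte offset. It assumes a classic cross-reference table with a single subsection; cross-reference streams and incrementally updated files are not handled.

```python
def make_pdf(objects: list) -> bytes:
    """Build a small PDF from a list of object bodies (bytes), numbered from 1."""
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


def read_object(pdf: bytes, number: int) -> bytes:
    """Fetch one object by number without scanning the Body."""
    # 1. The end of the file records where the XRef Table starts.
    marker = pdf.rindex(b"startxref")
    xref_start = int(pdf[marker + len(b"startxref"):].split()[0])

    # 2. Read the subsection header ("first count") and pick the entry.
    lines = pdf[xref_start:].split(b"\n")
    if lines[0] != b"xref":
        raise ValueError("no classic cross-reference table at recorded offset")
    first, _count = map(int, lines[1].split())
    entry = lines[2 + number - first]

    # 3. The first 10 digits of an entry are the object's absolute byte offset.
    offset = int(entry[:10])
    end = pdf.index(b"endobj", offset) + len(b"endobj")
    return pdf[offset:end]


pdf = make_pdf([
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [] /Count 0 >>",
])
second = read_object(pdf, 2)

# Growing object 1 in place moves every later object, so the offsets stored
# in the table (and the recorded table position) would all have to be rewritten.
edited = pdf.replace(b"/Type /Catalog", b"/Type /Catalog /Lang (en)", 1)
shift = edited.index(b"2 0 obj") - pdf.index(b"2 0 obj")
print(second, shift)
```

The last lines show why modification is costly with a naive approach: inserting a few bytes into an early object shifts everything after it, leaving the stored offsets stale until the table is rebuilt.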
As the PDF format evolved, it acquired many useful qualities and technologies that make files more compact and faster to read, and that support some of the standard document control and verification technologies. The following is a short description of a selection of those technologies and techniques:
Now that we have a good understanding of the PDF format, we should be better able to appreciate the implications of performing operations on a PDF file. More importantly, we can see the possibilities that open up when working with PDF documents. This is exactly the aim of the new Gnostice PDF Processor: to enable the software developer to harness the power of PDF.
The new PDF Processor core is the engine that powers PDFtoolkit 3.0. It has been designed and built from the ground up with the following key objectives:
The PDF Processor, in its design and organization, closely reflects the PDF format: each PDF object type, starting from the base type, has a corresponding class in the PDF Processor, and the class hierarchy mirrors the hierarchy of the object types. It is also modular, so that only the parts and layers needed to achieve a given task have to be used. For example, there is one layer to read a PDF document and its objects, another to edit, another to view and print, and so on. The advantages of this design and organization are exactly those we set out to achieve through our objectives.
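
To give a feel for what "a class per PDF object type" can look like, here is a minimal sketch in Python. It is not the PDF Processor's actual class hierarchy (PDFtoolkit is a Delphi/C++Builder library, and its class names are not given in this article); every name below is hypothetical, and only a handful of the PDF object types are covered.

```python
from dataclasses import dataclass, field


class PdfObject:
    """Hypothetical base type: every PDF object can write itself out as bytes."""

    def serialize(self) -> bytes:
        raise NotImplementedError


@dataclass
class PdfInteger(PdfObject):
    value: int

    def serialize(self) -> bytes:
        return str(self.value).encode("ascii")


@dataclass
class PdfName(PdfObject):
    value: str

    def serialize(self) -> bytes:
        return b"/" + self.value.encode("ascii")


@dataclass
class PdfReference(PdfObject):
    """Indirect reference to another object in the Body, e.g. '2 0 R'."""
    number: int
    generation: int = 0

    def serialize(self) -> bytes:
        return b"%d %d R" % (self.number, self.generation)


@dataclass
class PdfArray(PdfObject):
    items: list = field(default_factory=list)

    def serialize(self) -> bytes:
        return b"[" + b" ".join(item.serialize() for item in self.items) + b"]"


@dataclass
class PdfDictionary(PdfObject):
    entries: dict = field(default_factory=dict)

    def serialize(self) -> bytes:
        pairs = (
            b"/" + key.encode("ascii") + b" " + value.serialize()
            for key, value in self.entries.items()
        )
        return b"<< " + b" ".join(pairs) + b" >>"


# A page dictionary built from the classes above.
page = PdfDictionary({
    "Type": PdfName("Page"),
    "Parent": PdfReference(2),
    "MediaBox": PdfArray([PdfInteger(0), PdfInteger(0),
                          PdfInteger(200), PdfInteger(200)]),
})
print(page.serialize())
```

Because each class knows only how to represent its own object type, a reading layer, an editing layer and a viewing layer could each be built on top of the same small set of types, which is the kind of modularity the paragraph above describes.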
So far, most of the core PDF Processor’s implementation is complete, and the team is now integrating the core into PDFtoolkit 3.0. Further down in this article is a pre-release demonstration of the capabilities of the new PDF Processor.
Right through the development of the new PDF Processor, we have placed great emphasis on the quality of the product: testing, tuning, optimizing and documenting all the parts. One example of the systems that we implemented (in addition to extensive unit test automation through DUnit) is an automated testing framework that runs allocation and performance tests of all the features across a large set of scenarios, using the several thousand PDF documents we have been able to collect so far. The testing framework uses AutomatedQA’s award-winning AQtime testing tool and its SDK to perform this testing and generate reports that the team can act on.
The PDFtoolkit version 3.0 EXE demo program showcases the new, optimized PDF Processor core of PDFtoolkit 3.0.
This demo enables us to try out some of the complex functions of PDFtoolkit that require heavy computation and file manipulation, and experience:
For example, files such as the Acrobat PDF Specification (31.7 MB, 1310 pages, latest PDF 1.7 format with cross-reference streams) load almost instantly, a 100x improvement over the earlier version.
Functions exposed in this EXE demo:
Updates to the EXE demo, with more functions exposed, will be provided shortly.
The zipped EXE demo can be downloaded from this link. For more information regarding download, features, purchase and other topics, please follow the links listed below.
© 2002-2024 Gnostice Information Technologies Private Limited. All rights reserved.