Digitization is the process of converting analog content to a digital form, which makes the content amenable to further processing. Digitizing printed matter consists of two steps. The first step is the acquisition of the printed page(s) as a set of images, using a scanner or a high-resolution camera. The second step is the optical recognition and digitization of the content present in the acquired images.
Going further, the recognized text and the original image can be combined so that the image is retained as-is while the document also becomes searchable. The PDF file format allows such a composition: the original image forms the main page content, and the recognized text is superimposed on the image as an invisible layer. The placement and sizing of the invisible text is matched as closely as possible to the original content in the image, so that the selection highlight of the invisible text lines up with the corresponding printed text. The searchable PDF thus produced remains legible for human readers while also supporting further processing such as content search and copy/extraction.
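To make the invisible-layer idea concrete, here is a minimal Java sketch of how one recognized word could be positioned over the page image. It is an illustration only and does not describe how StarDocs works internally. The class `InvisibleTextLayer`, its methods, the font resource name `/F1`, and the choice of using the bounding-box height as the font size are all assumptions made for this example. What it relies on from the PDF format is the text rendering mode `3 Tr` (text is neither filled nor stroked, so it is invisible) and the horizontal scaling operator `Tz`, which stretches the word to the width of its box.

```java
import java.util.Locale;

/** Sketch: place one recognized word as invisible text over a scanned page image. */
class InvisibleTextLayer {

    /** Converts a length in image pixels to PDF points (1 point = 1/72 inch). */
    static double toPoints(double pixels, double dpi) {
        return pixels * 72.0 / dpi;
    }

    /** Escapes the characters that are special inside a PDF literal string. */
    static String escapePdfString(String text) {
        return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }

    /**
     * Builds the content-stream operators for one word.
     * The box is given in image pixels with the origin at the top-left corner;
     * PDF user space has its origin at the bottom-left, so y is flipped.
     * naturalWidthAt1pt is the word's width in points at font size 1. It would
     * normally come from font metrics; here the caller supplies it (assumption).
     */
    static String wordOperators(String word, double left, double top,
                                double width, double height,
                                double imageHeightPx, double dpi,
                                double naturalWidthAt1pt) {
        // Simplification: use the box height as the font size and the box
        // bottom as the text position.
        double fontSize = toPoints(height, dpi);
        double x = toPoints(left, dpi);
        double y = toPoints(imageHeightPx - (top + height), dpi);
        // Horizontal scaling (percent) that makes the word as wide as its box.
        double targetWidth = toPoints(width, dpi);
        double scalePercent = 100.0 * targetWidth / (naturalWidthAt1pt * fontSize);
        return String.format(Locale.ROOT,
                "BT /F1 %.2f Tf 3 Tr %.2f Tz %.2f %.2f Td (%s) Tj ET",
                fontSize, scalePercent, x, y, escapePdfString(word));
    }

    public static void main(String[] args) {
        // A word found at (300, 150) px, 175 x 50 px, on a 3300 px tall page scanned at 300 dpi.
        System.out.println(wordOperators("Invoice", 300, 150, 175, 50, 3300, 300, 3.5));
    }
}
```

With the sample values in `main`, the word box maps to a 12-point font positioned 72 points from the left edge and 744 points from the bottom edge of the page, which is where the printed word sits in the image. A real implementation would use the font's actual metrics and baseline rather than the box height.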
StarDocs provides APIs to create a searchable PDF from a scanned image or a scanned PDF. These APIs are part of the Document Converter APIs. They accept the scanned content either as one or more image files or as a PDF file, and produce a searchable PDF file.
The screenshot below shows a scanned image being viewed in the StarDocs HTML viewer after it has been acquired.
The screenshot below shows the same image after it has been converted to a searchable PDF file. The user can now search for content in the viewer.
Let's look at the API for converting scanned content to searchable PDF. The code snippets below show the same conversion in JavaScript, C#, Delphi, and Java, in that order.
After authenticating and uploading the scanned document (image or PDF file) that you want to make searchable, you will get the document URL or a list of URLs. We pass this URL (or list) to the document converter API as shown below.
JavaScript:

    // Set up connection details
    var stardocs = new Gnostice.StarDocs(
      new Gnostice.ConnectionInfo(
        'https://api.gnostice.com/stardocs/v1', '<API Key>', '<API Secret>'),
      new Preferences(
        // Whether to force full permissions on PDF files protected
        // with a permissions/owner/master password
        new DocPasswordSettings(true))
    );

    // Authenticate
    stardocs.auth.loginApp()
      .done(function(response) {
        // Upload file
        var selectedFile = document.getElementById('input').files[0];
        stardocs.storage.upload(selectedFile)
          .done(function(response) {
            var documentUrl = response.documents[0].url;

            // Set up the digitizer settings
            var digitizerSettings = {
              // Supported values are "off" (default), "allImages"
              digitizationMode: "allImages",
              // Array of strings listing the languages of the text present in
              // the scanned document. "eng" is default.
              documentLanguages: ["eng", "deu"],
              // The type of elements that need to be recognized and digitized.
              // Currently only "text" is supported
              recognizeElements: ["text"],
              // Whether any skew correction should be performed (default is true)
              skewCorrection: true,
              // Which image enhancement techniques (if any) should be applied to
              // the input image before attempting to recognize the elements
              imageEnhancementSettings: {
                // Supported values are "off" (default), "auto" and "useSpecified"
                enhancementMode: "auto"
              }
            };

            // Convert to searchable PDF
            stardocs.docOperations.convertToPDF("convertToSingleFile",
                [documentUrl], null, null, null, digitizerSettings)
              .done(function(response) {
                var newDocUrl = response.documents[0].url;
                // Do something with resultant document (newDocUrl)
                // ...
              });
          });
      });
C#:

    // Set up connection details
    StarDocs starDocs = new StarDocs(
      new ConnectionInfo(
        new Uri("https://api.gnostice.com/stardocs/v1"),
        "<API Key>", "<API Secret>"),
      new Preferences(
        // Force full permissions on PDF files protected
        // with a permissions/owner/master password
        new DocPasswordSettings(true))
    );

    // Authenticate
    starDocs.Auth.loginApp();

    // Input file
    FileObject fileObjectInput = new FileObject(@"C:\Documents\Statement.pdf");
    List<FileObject> fileObjectInputs = new List<FileObject>() { fileObjectInput };

    // Set up the digitizer settings
    ConverterDigitizerSettings digitizerSettings = new ConverterDigitizerSettings();
    digitizerSettings.DigitizationMode = DigitizationMode.AllImages;
    // Array of strings listing the languages of the text present in
    // the scanned document. "eng" is default.
    digitizerSettings.DocumentLanguages = new string[] { "eng", "deu" };
    // The type of elements that need to be recognized and digitized.
    // Currently only text is supported
    digitizerSettings.RecognizeElements = RecognizableElementType.Text;
    // Which image enhancement techniques (if any) should be applied to
    // the input image before attempting to recognize the elements
    digitizerSettings.ImageEnhancementSettings.ImageEnhancementMode =
      ImageEnhancementMode.Auto;
    // Whether any skew correction should be performed (default is true)
    digitizerSettings.SkewCorrection = true;

    // Convert to searchable PDF
    List<DocObject> outFiles = starDocs.DocOperations.ConvertToPDF(
      fileObjectInputs, null, null, null,
      ConversionMode.ConvertToSingleFile, digitizerSettings);
    DocObject docObjectOutput = outFiles[0];
    // Do something with resultant document (docObjectOutput)
    // ...
Delphi:

    var
      StarDocs: TgtStarDocsSDK;
      LInFiles: TObjectList<TgtFileObject>;
      LOutFiles: TObjectList<TgtDocObject>;
      FileObjectInput: TgtFileObject;
      DocObjectOutput: TgtDocObject;
      DocumentLanguages: TArray<string>;
    begin
      StarDocs := nil;
      LInFiles := nil;
      LOutFiles := nil;
      DocObjectOutput := nil;
      try
        // Set up connection details
        StarDocs := TgtStarDocsSDK.Create(nil);
        StarDocs.ConnectionSettings.ApiServerUri :=
          'https://api.gnostice.com/stardocs/v1';
        StarDocs.ConnectionSettings.ApiKey := '<API Key>';
        StarDocs.ConnectionSettings.ApiSecret := '<API Secret>';
        // Force full permissions on PDF files protected
        // with a permissions/owner/master password
        StarDocs.Preferences.DocPasswordSettings.ForceFullPermission := True;

        // Authenticate
        StarDocs.Auth.loginApp;

        // Input file
        LInFiles := TObjectList<TgtFileObject>.Create;
        LInFiles.Add(TgtFileObject.Create
          ('D:\Work\Demos\build2016\demos\SampleFiles\OCR\Deutsch.png'));

        // Set up the digitizer settings
        StarDocs.DocOperations.ConverterDigitizerSettings.DigitizationMode :=
          dmoAllImages;
        // Array of strings listing the languages of the text present in
        // the scanned document. "eng" is default.
        DocumentLanguages := TArray<string>.Create();
        SetLength(DocumentLanguages, 2);
        DocumentLanguages[0] := 'eng';
        DocumentLanguages[1] := 'deu';
        StarDocs.DocOperations.ConverterDigitizerSettings.DocumentLanguages :=
          DocumentLanguages;
        // The type of elements that need to be recognized and digitized.
        // Currently only text is supported
        StarDocs.DocOperations.ConverterDigitizerSettings.RecognizeElements :=
          [retText];
        // Which image enhancement techniques (if any) should be applied to
        // the input image before attempting to recognize the elements
        StarDocs.DocOperations.ConverterDigitizerSettings.
          ImageEnhancementSettings.ImageEnhancementMode := iemAuto;
        // Whether any skew correction should be performed (default is true)
        StarDocs.DocOperations.ConverterDigitizerSettings.SkewCorrection := True;

        // Convert to searchable PDF
        LOutFiles := StarDocs.DocOperations.ConvertToPDF(LInFiles, nil, nil);
        DocObjectOutput := LOutFiles[0];
        // Do something with resultant document (DocObjectOutput)
        // ...
      finally
        // Free objects
        if Assigned(LOutFiles) then
          FreeAndNil(LOutFiles);
        if Assigned(LInFiles) then
          FreeAndNil(LInFiles);
        if Assigned(StarDocs) then
          FreeAndNil(StarDocs);
      end;
    end;
Java:

    // Set up connection details
    StarDocs starDocs = new StarDocs(
      new ConnectionInfo(
        new java.net.URI("https://api.gnostice.com/stardocs/v1"),
        "<API Key>", "<API Secret>"),
      new Preferences(
        // Force full permissions on PDF files protected
        // with a permissions/owner/master password
        new DocPasswordSettings(true))
    );

    // Authenticate
    starDocs.auth.loginApp();

    // Input file
    FileObject fileObjectInput = new FileObject("C:\\Documents\\Statement.pdf");
    ArrayList<FileObject> fileObjectInputs = new ArrayList<FileObject>(
      Arrays.asList(new FileObject[] {fileObjectInput}));

    // Set up the digitizer settings
    ConverterDigitizerSettings digitizerSettings = new ConverterDigitizerSettings();
    digitizerSettings.setDigitizationMode(DigitizationMode.AllImages);
    // Array of strings listing the languages of the text present in
    // the scanned document. "eng" is default.
    digitizerSettings.setDocumentLanguages(new String[] { "eng", "deu" });
    // The type of elements that need to be recognized and digitized.
    // Currently only text is supported
    digitizerSettings.setRecognizeElements(
      EnumSet.of(RecognizableElementType.Text));
    // Which image enhancement techniques (if any) should be applied to
    // the input image before attempting to recognize the elements
    digitizerSettings.getImageEnhancementSettings().setImageEnhancementMode(
      ImageEnhancementMode.Auto);
    // Whether any skew correction should be performed (default is true)
    digitizerSettings.setSkewCorrection(true);

    // Convert to searchable PDF
    List<DocObject> outFiles = starDocs.docOperations.convertToPDF(
      fileObjectInputs, null, null, null,
      ConversionMode.ConvertToSeparateFiles, digitizerSettings);
    DocObject docObjectOutput = outFiles.get(0);
    // Do something with resultant document (docObjectOutput)
    // ...
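Because the digitizer settings are passed as plain strings in some of the SDKs, a wrong value (for instance, a language code placed in the list of elements to recognize) may only surface as a failed or unexpected conversion. The following Java sketch shows one way to check the values before calling the converter. It is a hypothetical helper, not part of the StarDocs SDK: the class `DigitizerSettingsCheck` and its `validate` method are invented for this example. The supported values are taken from the comments in the snippets above, and the rule that a language code is three lowercase letters (like "eng" and "deu") is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical pre-flight check for digitizer settings; not part of the StarDocs SDK. */
class DigitizerSettingsCheck {

    // Supported values as listed in the comments of the snippets in this article.
    static final List<String> DIGITIZATION_MODES = Arrays.asList("off", "allImages");
    static final List<String> RECOGNIZABLE_ELEMENTS = Arrays.asList("text");
    static final List<String> ENHANCEMENT_MODES = Arrays.asList("off", "auto", "useSpecified");

    /** Returns a list of problems found; an empty list means the values look valid. */
    static List<String> validate(String digitizationMode,
                                 List<String> documentLanguages,
                                 List<String> recognizeElements,
                                 String enhancementMode) {
        List<String> problems = new ArrayList<>();
        if (!DIGITIZATION_MODES.contains(digitizationMode)) {
            problems.add("unsupported digitizationMode: " + digitizationMode);
        }
        for (String language : documentLanguages) {
            // Assumption: language codes are three lowercase letters, like "eng".
            if (!language.matches("[a-z]{3}")) {
                problems.add("suspicious language code: " + language);
            }
        }
        for (String element : recognizeElements) {
            if (!RECOGNIZABLE_ELEMENTS.contains(element)) {
                problems.add("unsupported recognizeElements entry: " + element);
            }
        }
        if (!ENHANCEMENT_MODES.contains(enhancementMode)) {
            problems.add("unsupported enhancementMode: " + enhancementMode);
        }
        return problems;
    }

    public static void main(String[] args) {
        // A language code was put where an element type belongs.
        System.out.println(validate("allImages",
                Arrays.asList("eng", "deu"), Arrays.asList("eng"), "auto"));
    }
}
```

Returning a list of problems rather than throwing on the first one lets the caller report every mistake at once, which is convenient when the settings come from a configuration file or user input.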
That's it! This article showed how to use the Gnostice StarDocs Document Converter API to convert scanned documents to searchable PDF files.
© 2002-2025 Gnostice Information Technologies Private Limited. All rights reserved.