PDFOne (for Java)
Create, edit, view, print & enhance PDF documents and forms in Java SE/EE
Compatibility
J2SE J2EE Windows Linux Mac (OS X)

PDF Text Redaction Using PDFOne (for Java)

Removing unwanted text from a document.
By V. Subhash

In Version 4 of PDFOne, you will find a new method, redactText(), in the PdfDocument class. The method has four overloads. It lets you redact all instances of a specified text on a single page or in a specified set of pages. You also have the option to stroke and fill the redacted region. The search text can be a simple text string or a regular expression. Text redaction removes the text from the page's contents, while the space that the text occupied is left intact. In this article, you will learn how these text redaction methods work.

Text Redaction Methods

void 	redactText(int pageNum,          // number of the page
                   String searchString,  // text that needs to be redacted
                   int searchMode,       // whether searchString is a text literal or a regular expression
                   int searchOptions     // other search options
                  )

void 	redactText(int pageNum,
                   String searchString,
                   int searchMode,
                   int searchOptions,
                   PdfPen pen,           // pen used to stroke the redacted region
                   PdfBrush brush,       // brush used to fill the redacted region
                   boolean isStroke,     // whether to outline the region using the pen
                   boolean isFill        // whether to fill the region using the brush
                   )

void 	redactText(String pageRange,     // pages where text needs to be redacted
                   String searchString,
                   int searchMode,
                   int searchOptions)

void 	redactText(String pageRange,
                   String searchString,
                   int searchMode,
                   int searchOptions,
                   PdfPen pen,
                   PdfBrush brush,
                   boolean isStroke,
                   boolean isFill)
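
The pageRange overloads take the page selection as a string. As a rough illustration of what such a string stands for, here is a minimal sketch that expands a range string into individual page numbers. The class PageRangeSketch and its expand() method are hypothetical and are not part of PDFOne. The sketch also assumes a syntax of comma-separated pages and hyphenated ranges; this article only uses the form "1-2", so please check the PDFOne documentation for the exact syntax that the library accepts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; PDFOne parses page ranges internally.
public class PageRangeSketch {

    // Expands a range string such as "1-2" or "1,3,5-7" into page numbers.
    // Assumed syntax: comma-separated entries, each a page or a first-last range.
    public static List<Integer> expand(String pageRange) {
        List<Integer> pages = new ArrayList<>();
        for (String part : pageRange.split(",")) {
            part = part.trim();
            int dash = part.indexOf('-');
            if (dash < 0) {
                // A single page number
                pages.add(Integer.parseInt(part));
            } else {
                // A range: add every page from first to last, inclusive
                int first = Integer.parseInt(part.substring(0, dash).trim());
                int last = Integer.parseInt(part.substring(dash + 1).trim());
                for (int p = first; p <= last; p++) {
                    pages.add(p);
                }
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(expand("1-2"));      // prints [1, 2]
        System.out.println(expand("1,3,5-7"));  // prints [1, 3, 5, 6, 7]
    }
}
```

Under this reading, calling a pageRange overload with "1-2" has the same effect as calling the matching pageNum overload once for page 1 and once for page 2.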

Simple Text Redaction

Here is a simple code snippet that shows how to perform text redaction on a PDF document. It seeks to redact all instances of the word "gnostice" from pages 1 and 2 of the loaded document.

import java.awt.Color;
import java.io.IOException;

import com.gnostice.pdfone.PDFOne;
import com.gnostice.pdfone.PdfDocument;
import com.gnostice.pdfone.PdfException;
import com.gnostice.pdfone.PdfSearchMode;
import com.gnostice.pdfone.PdfSearchOptions;
import com.gnostice.pdfone.graphics.PdfBrush;
import com.gnostice.pdfone.graphics.PdfPen;

public class Text_Redaction_Demo
{
    public static void main(String[] args) throws IOException, PdfException, Exception {
        
        // Create brush to fill the redacted regions
        PdfBrush pbRedactBrush = new PdfBrush();
        pbRedactBrush.fillColor = Color.YELLOW;
        
        // Create pen to stroke the redacted regions
        PdfPen pnRedactPen = new PdfPen();
        pnRedactPen.strokeColor = Color.MAGENTA;

        // Load a PDF document
        PdfDocument doc = new PdfDocument();
        doc.load("sample.pdf");       
        
        // Redact all instances of the text "gnostice" in pages 1 and 2
        doc.redactText("1-2",
                      "gnostice", 
                      PdfSearchMode.LITERAL, 
                      PdfSearchOptions.NONE, 
                      pnRedactPen, 
                      pbRedactBrush, 
                      true, 
                      true);
        
        // Save the redacted document to specified file
        doc.setOpenAfterSave(true);
        doc.save("redacted_doc.pdf");        
        doc.close();      
    }       
}

To test this code snippet, we used this sample document.

Sample Input PDF Document

Here it is after redaction was performed.

Redacted Version

Text Redaction Using A Regular Expression

PDFOne supports regular expressions for text redaction. When you use a regular expression instead of a simple text search string, you need to set the searchMode parameter of the redactText() method to the constant PdfSearchMode.REGEX. (Earlier, for a simple text search, we used the constant PdfSearchMode.LITERAL.) To find URLs, I am using a simple regular expression - "http://{1}\\S+". Here is the code that performs text redaction using the regular expression.

import java.awt.Color;
import java.io.IOException;

import com.gnostice.pdfone.PDFOne;
import com.gnostice.pdfone.PdfDocument;
import com.gnostice.pdfone.PdfException;
import com.gnostice.pdfone.PdfSearchMode;
import com.gnostice.pdfone.PdfSearchOptions;
import com.gnostice.pdfone.graphics.PdfBrush;
import com.gnostice.pdfone.graphics.PdfPen;

public class Advanced_Text_Redaction_Demo
{

    public static void main(String[] args) throws IOException, PdfException, Exception {

        // Create brush to fill the redacted regions
        PdfBrush pbRedactBrush = new PdfBrush();
        pbRedactBrush.fillColor = Color.BLACK;

        // Create pen to stroke the redacted regions
        PdfPen pnRedactPen = new PdfPen();
        pnRedactPen.strokeColor = Color.WHITE;

        // Load a PDF document
        PdfDocument doc = new PdfDocument();
        doc.load("sample_doc2.pdf");

        // Redact all URLs on page 2
        doc.redactText(2,
                       "http://{1}\\S+",
                       PdfSearchMode.REGEX,
                       PdfSearchOptions.NONE,
                       pnRedactPen,
                       pbRedactBrush,
                       true,
                       true);

        // Save the redacted document to specified file
        doc.setOpenAfterSave(true);
        doc.save("redacted_doc.pdf");
        doc.close();
    }
}

To test this code snippet, I used a document that had lots of URLs. Here is page 2 of that document.

Original PDF Document

Here is how the URLs got redacted.

Redacted PDF Document
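
Before you run a regular expression against a real document, it can help to check what it matches on plain text. The sketch below does this with the standard java.util.regex classes. It assumes that PdfSearchMode.REGEX follows java.util.regex syntax, which this article does not state, so treat the results as a preview rather than a guarantee. The class RegexPreview and its findMatches() method are hypothetical and are not part of PDFOne, and the sample text and URLs are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; it does not use PDFOne.
public class RegexPreview {

    // Returns every substring of text that the regular expression matches,
    // in order of appearance.
    public static List<String> findMatches(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "See http://www.example.com/docs and https://example.org/page for details.";

        // The expression used in this article: "http://" followed by
        // one or more non-whitespace characters.
        System.out.println(findMatches("http://{1}\\S+", text));
        // prints [http://www.example.com/docs]

        // In java.util.regex, "/{1}" means exactly one "/",
        // so dropping the quantifier gives the same matches.
        System.out.println(findMatches("http://\\S+", text));
        // prints [http://www.example.com/docs]

        // A variant that also covers "https" URLs.
        System.out.println(findMatches("https?://\\S+", text));
        // prints [http://www.example.com/docs, https://example.org/page]
    }
}
```

Note that \S+ keeps matching until the next whitespace character, so punctuation that directly follows a URL, such as a full stop at the end of a sentence, becomes part of the match.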

---o0O0o---
