Redaction of sensitive information in PDF documents

Version 4.3 of XFINIUM.PDF adds enhanced redaction of text and images in PDF files, along with support for redaction annotations.

Redaction is the process of removing sensitive information from a document.

The XFINIUM.PDF library can redact both text and images from a PDF file. You define a region on the page, and the text and images that fall inside the region are removed. If an image falls only partially inside a redaction area, only the part of the image inside the area is redacted.
Note: XFINIUM.PDF 4.3 cannot redact JPEG2000 images or images used as part of a brush that fills/strokes a path.

Redacting PDF content with the XFINIUM.PDF library is simple. You create a content redactor object for the page you want to redact and then redact each region of the page.
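
A minimal sketch of these two steps follows. `PdfContentRedactor`, `RedactArea` and `PdfVisualRectangle` are the names used in the reader code further down this page. The file names, the coordinates, the `PdfFixedDocument` load and save calls, and the namespaces are assumptions to check against the API reference.

```csharp
using Xfinium.Pdf;            // PdfFixedDocument, PdfVisualRectangle (assumed namespace)
using Xfinium.Pdf.Redaction;  // PdfContentRedactor (assumed namespace)

class RedactSample
{
    static void Main()
    {
        // Load an existing document; "input.pdf" is a placeholder file name.
        PdfFixedDocument document = new PdfFixedDocument("input.pdf");

        // Create one content redactor for the page that will be redacted.
        PdfContentRedactor redactor = new PdfContentRedactor(document.Pages[0]);

        // Remove the text and images inside a region that starts at (50, 50)
        // and is 200 points wide and 100 points high.
        redactor.RedactArea(new PdfVisualRectangle(50, 50, 200, 100));

        // Write the redacted document to a new placeholder file.
        document.Save("output.pdf");
    }
}
```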

If you want to redact multiple regions on a page, batch redaction is recommended because it performs better: the page content is parsed only once and the redactions are applied in a single step.

Without batch redaction, the page content is parsed and processed every time the RedactArea method is called. This can cause a performance problem if the page content is very large and complex and you want to redact multiple regions on the page.
The code below shows how to use batch redaction:
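
The original listing is not available here, so what follows is only a sketch of the batch pattern. The method names `BeginRedaction` and `ApplyRedaction` are assumptions recalled from the library samples and may differ in your version; the coordinates and the namespaces are also assumptions.

```csharp
using Xfinium.Pdf;            // PdfPage, PdfVisualRectangle (assumed namespace)
using Xfinium.Pdf.Redaction;  // PdfContentRedactor (assumed namespace)

static class BatchRedactionSketch
{
    public static void RedactRegions(PdfPage page)
    {
        PdfContentRedactor redactor = new PdfContentRedactor(page);

        // Assumed method name: start collecting regions without
        // touching the page content yet.
        redactor.BeginRedaction();

        // Each call only registers a region while the batch is open.
        redactor.RedactArea(new PdfVisualRectangle(50, 50, 200, 100));
        redactor.RedactArea(new PdfVisualRectangle(50, 350, 200, 100));

        // Assumed method name: parse the page content once and
        // apply all registered redactions in a single step.
        redactor.ApplyRedaction();
    }
}
```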

When redacting an area on the page you have the option to fill that area with a color or leave it blank. For text content, leaving the area blank is not a problem because you can see the background that remains after the text is removed. With images the situation is different. Images are redacted by setting the corresponding bits to 0. Depending on the color space used by the image, the redacted area can appear solid black or a different color.
So if you want all the redactions to look the same, fill the redacted area with a specific color.
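
As an illustration, the sketch below fills every redacted region with the same black so that text and image redactions look alike. It assumes `RedactArea` has an overload that takes a fill color as a second argument and that `PdfRgbColor` lives in `Xfinium.Pdf.Graphics`; both should be verified against the API reference.

```csharp
using Xfinium.Pdf;            // PdfPage, PdfVisualRectangle (assumed namespace)
using Xfinium.Pdf.Graphics;   // PdfRgbColor (assumed namespace)
using Xfinium.Pdf.Redaction;  // PdfContentRedactor (assumed namespace)

static class FilledRedactionSketch
{
    public static void RedactWithFill(PdfPage page)
    {
        PdfContentRedactor redactor = new PdfContentRedactor(page);

        // One fill color shared by all redactions on the page.
        PdfRgbColor fill = new PdfRgbColor(0, 0, 0);

        // Assumed overload: redact the region, then paint it with the fill color.
        redactor.RedactArea(new PdfVisualRectangle(50, 50, 200, 100), fill);
        redactor.RedactArea(new PdfVisualRectangle(50, 350, 200, 100), fill);
    }
}
```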

Redaction annotations are rectangular annotations that define areas on the page that will be later redacted. You can create redaction annotations with XFINIUM.PDF and then have them redacted with Adobe Acrobat or vice-versa.
A redaction annotation can be created like this:
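
The original listing is not available here. The sketch below only shows the general shape, assuming the annotation class is named `PdfRedactionAnnotation`, that it is added through `page.Annotations`, and that it is positioned through a `VisualRectangle` property; all three are assumptions to confirm in the API reference.

```csharp
using Xfinium.Pdf;              // PdfPage, PdfVisualRectangle (assumed namespace)
using Xfinium.Pdf.Annotations;  // PdfRedactionAnnotation (assumed namespace)

static class RedactionAnnotationSketch
{
    public static void MarkForRedaction(PdfPage page)
    {
        // Assumed class name for the redaction annotation.
        PdfRedactionAnnotation annotation = new PdfRedactionAnnotation();

        // Attach the annotation to the page, then position it.
        page.Annotations.Add(annotation);

        // The rectangle marks the area that will be redacted later;
        // nothing is removed from the page at this point.
        annotation.VisualRectangle = new PdfVisualRectangle(50, 50, 200, 100);
    }
}
```

The annotation only marks the area. The content stays in the file until the redaction is actually applied.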

Many people believe that drawing an opaque rectangle over text will cause that text to be redacted. This is not true: the text is still there, it is just covered. Many PDF editors let you remove graphic objects from the PDF file, so this overlay rectangle can be removed. Encrypting the file does not help very much. Also, because the text is still in the document, any PDF viewer will let you select and copy it.

XFINIUM.PDF does not just mask the content with an opaque rectangle, it truly removes the content from the PDF file.

4 thoughts on “Redaction of sensitive information in PDF documents”

  1. Hi, I’m trying to replace a snippet of text in a PDF. My idea is:

    – Search for the text to replace.
    – Redact the area of the found text. This works fine and the text is removed.
    – DrawString with the new text at the redacted area. In this step something goes wrong and all the page content disappears.

    My code looks like this:

    PdfContentRedactor crText = new PdfContentRedactor(page);

    for (int i = 0; i < searchResults.Count; i++)
    {
        // Bounding box of the current search result.
        double minX = double.MaxValue;
        double minY = double.MaxValue;

        double maxX = double.MinValue;
        double maxY = double.MinValue;

        PdfTextFragmentCollection tfc = searchResults[i].TextFragments;
        for (int j = 0; j < tfc.Count; j++)
        {
            // Grow the box to include every corner of the fragment.
            foreach (var corner in tfc[j].FragmentCorners)
            {
                if (minX > corner.X)
                    minX = corner.X;

                if (maxX < corner.X)
                    maxX = corner.X;

                if (minY > corner.Y)
                    minY = corner.Y;

                if (maxY < corner.Y)
                    maxY = corner.Y;
            }
        }

        // Remove the matched text.
        crText.RedactArea(new PdfVisualRectangle(minX, minY, maxX - minX, maxY - minY));

        PdfStringAppearanceOptions sao = new PdfStringAppearanceOptions();
        sao.Brush = brush;
        sao.Font = font;

        PdfStringLayoutOptions slo = new PdfStringLayoutOptions();
        slo.HorizontalAlign = PdfStringHorizontalAlign.Center;
        slo.VerticalAlign = PdfStringVerticalAlign.Bottom;
        slo.X = minX;
        slo.Y = maxY;
        slo.Width = maxX - minX;

        // Draw the replacement text over the redacted area.
        page.Graphics.DrawString("New text", sao, slo);
    }

    Any ideas?
    Thanks in advance.

      1. Thanks for your reply.

        Yesterday I finally found the problem. I was generating the PDF and doing the redaction on the same PDF file. That fails.

        If I open a previously generated PDF file, redact it and then DrawString the new text, it works fine.

        However, is there a better way to do this? That is, open a PDF template with some fields that I have to replace or fill, like name, phone number, etc. I tried form fields, but that does not work because the PDF template is generated with Microsoft Word and I have no idea how to include these form fields.

        Thank you very much!

      2. The best solution for this scenario is to use form fields. Design the template in Microsoft Word, save it as PDF, then load it with Adobe Acrobat and add the fields. If Adobe Acrobat is not available you can use your code above but instead of drawing text over the redacted zone you create a textbox form field at that location. You run the code only once to create the initial template. Then at application runtime you use the template and fill the form fields with actual data.
