Create and extract optional content in PDF documents in .NET

Today we released XFINIUM.PDF 3.4, which adds support for extracting PDF optional content. With this version the library fully supports optional content in PDF files, covering both creation and extraction.

The PDF specification describes optional content as “sections of content in a PDF document that can be selectively viewed or hidden by document authors or consumers. This capability is useful in items such as CAD drawings, layered artwork, maps, and multi-language documents.”

An optional content section is started in the page graphics with the BeginOptionalContentGroup method and ended with a call to the EndOptionalContentGroup method. BeginOptionalContentGroup takes a PdfOptionalContentGroup object as its parameter, which associates that section of the page graphics with the optional content group. The optional content group object specifies the name and the visibility, print and export states. A group can also be locked.

A simple optional content group is created like this:
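A minimal sketch of such a group follows. BeginOptionalContentGroup, EndOptionalContentGroup and the PdfOptionalContent* types are named in this post; the namespaces, PdfFixedDocument, PdfBrush, PdfRgbColor, the DrawRectangle overload, the Name property, the node constructor and Save are assumptions about the XFINIUM.PDF API and should be checked against the product documentation.

```csharp
using Xfinium.Pdf;
using Xfinium.Pdf.Graphics;
// Assumption: the namespace that holds the optional content types.
using Xfinium.Pdf.OptionalContent;

class SimpleGroupSketch
{
    static void Main()
    {
        // Assumed document, page and brush types.
        PdfFixedDocument document = new PdfFixedDocument();
        PdfPage page = document.Pages.Add();
        PdfBrush greenBrush = new PdfBrush(PdfRgbColor.Green);

        // Assumption: the group name is exposed as a Name property.
        PdfOptionalContentGroup ocgSimple = new PdfOptionalContentGroup();
        ocgSimple.Name = "Simple group";

        // Everything drawn between Begin and End belongs to ocgSimple.
        page.Graphics.BeginOptionalContentGroup(ocgSimple);
        page.Graphics.DrawRectangle(greenBrush, 20, 50, 150, 50); // assumed overload
        page.Graphics.EndOptionalContentGroup();

        // Add the group to the visual tree so a viewer can list and toggle it.
        document.OptionalContentProperties = new PdfOptionalContentProperties();
        // Assumption: the node constructor takes the group it represents.
        PdfOptionalContentDisplayTreeNode node =
            new PdfOptionalContentDisplayTreeNode(ocgSimple);
        document.OptionalContentProperties.DisplayTree.Nodes.Add(node);

        document.Save("simple-group.pdf"); // assumed Save(string) overload
    }
}
```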

Multipart PDF optional content groups

The PDF specification allows an optional content group to span multiple content sections (a non-contiguous optional content group), and XFINIUM.PDF supports this feature. Passing the same optional content group to several BeginOptionalContentGroup calls links all of those content sections to that group.

The code below shows how to create an optional content group that consists of 2 content sections with some other content between them.
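A sketch of the idea, under the same assumptions as any XFINIUM.PDF drawing code not spelled out in this post (namespaces, PdfBrush, PdfRgbColor, the DrawRectangle overload and the Name property are assumed, not confirmed here):

```csharp
using Xfinium.Pdf;
using Xfinium.Pdf.Graphics;
using Xfinium.Pdf.OptionalContent; // assumed namespace

static class MultipartGroupSketch
{
    // Draws two sections of one group with always-visible content between them.
    public static PdfOptionalContentGroup Draw(PdfPage page)
    {
        PdfBrush red = new PdfBrush(PdfRgbColor.Red);
        PdfBrush black = new PdfBrush(PdfRgbColor.Black);

        PdfOptionalContentGroup ocg = new PdfOptionalContentGroup();
        ocg.Name = "Multipart group"; // assumed property

        // First section of the group.
        page.Graphics.BeginOptionalContentGroup(ocg);
        page.Graphics.DrawRectangle(red, 20, 50, 150, 50);
        page.Graphics.EndOptionalContentGroup();

        // Not part of any group, so it is always shown.
        page.Graphics.DrawRectangle(black, 20, 110, 150, 50);

        // Second section, linked to the same group object.
        page.Graphics.BeginOptionalContentGroup(ocg);
        page.Graphics.DrawRectangle(red, 20, 170, 150, 50);
        page.Graphics.EndOptionalContentGroup();

        return ocg;
    }
}
```

Hiding the returned group in a viewer would hide both red rectangles and leave the black one in place.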

Nested PDF optional content groups

Optional content groups support multiple levels of nesting (layers and sub-layers). Nesting is achieved by nesting BeginOptionalContentGroup calls. When optional content groups are nested, the outer group's visibility affects the inner groups' visibility: if the outer group is hidden, the inner groups are hidden too, while if the outer group is visible, the visibility of each inner group is dictated by its own attributes.

Nested optional content groups can be created like this:
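The following sketch nests one group inside another. As before, the namespaces, brush and color types, the DrawRectangle overload and the Name property are assumptions about the library rather than details taken from this post.

```csharp
using Xfinium.Pdf;
using Xfinium.Pdf.Graphics;
using Xfinium.Pdf.OptionalContent; // assumed namespace

static class NestedGroupSketch
{
    // Draws an outer layer that contains an inner sub-layer.
    public static void Draw(PdfPage page,
                            PdfOptionalContentGroup outer,
                            PdfOptionalContentGroup inner)
    {
        PdfBrush blue = new PdfBrush(PdfRgbColor.Blue);
        PdfBrush yellow = new PdfBrush(PdfRgbColor.Yellow);

        page.Graphics.BeginOptionalContentGroup(outer);
        page.Graphics.DrawRectangle(blue, 20, 50, 200, 100);

        // Opened while the outer group is still open, so it is nested:
        // it can only be seen when both outer and inner are visible.
        page.Graphics.BeginOptionalContentGroup(inner);
        page.Graphics.DrawRectangle(yellow, 40, 70, 160, 60);
        page.Graphics.EndOptionalContentGroup(); // closes inner

        page.Graphics.EndOptionalContentGroup(); // closes outer
    }
}
```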

Multi-page PDF optional content groups

Another interesting feature of optional content groups is that they can span multiple pages. This is done by associating a single optional content group object with content sections on several pages. In this way you can show or hide content sections across multiple pages in a single operation.
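As a sketch, one group object can be reused while pages are added. The page count, coordinates and group name are illustrative, and the types and members other than BeginOptionalContentGroup, EndOptionalContentGroup and PdfOptionalContentGroup are assumed.

```csharp
using Xfinium.Pdf;
using Xfinium.Pdf.Graphics;
using Xfinium.Pdf.OptionalContent; // assumed namespace

static class MultiPageGroupSketch
{
    // Adds three pages whose rectangles all belong to one group.
    public static PdfOptionalContentGroup Draw(PdfFixedDocument document)
    {
        PdfBrush blue = new PdfBrush(PdfRgbColor.Blue);

        PdfOptionalContentGroup ocg = new PdfOptionalContentGroup();
        ocg.Name = "Multi-page group"; // assumed property

        for (int i = 0; i < 3; i++)
        {
            PdfPage page = document.Pages.Add(); // assumed API
            // The same group object is used on every page.
            page.Graphics.BeginOptionalContentGroup(ocg);
            page.Graphics.DrawRectangle(blue, 20, 50, 150, 50);
            page.Graphics.EndOptionalContentGroup();
        }

        return ocg;
    }
}
```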

PDF optional content visual tree

The PDF specification defines several structures for presenting the optional content of a PDF document as a visual tree in which each node corresponds to an optional content group. This visual tree does not necessarily show the optional content structure as it exists physically in the PDF document; it can be built to show the author’s view of that structure. For example, nested optional content groups can be shown as sibling nodes in the tree, and sibling optional content groups in the page content can be shown as parent and child nodes.

Also you can choose not to show some optional content groups in the tree. This technique is used when you want content to appear only when the document is printed (a watermark for example) and you do not want the end-user to change the print status of the optional content group.

The optional content visual tree is built by first creating a PdfOptionalContentProperties object and assigning it to the document’s OptionalContentProperties property. The DisplayTree property of the PdfOptionalContentProperties object represents the optional content visual tree. You create a PdfOptionalContentDisplayTreeNode object for each optional content group you want to appear in the tree and add the node object to the tree. Each node object has a Nodes property that holds the child nodes of that node.
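A sketch of a two-level tree, assuming the node constructor accepts the group it represents and that Nodes supports Add (neither detail is stated in this post):

```csharp
using Xfinium.Pdf;
using Xfinium.Pdf.OptionalContent; // assumed namespace

static class DisplayTreeSketch
{
    // Shows 'child' under 'parent' in the viewer's layer panel,
    // regardless of how the two groups are laid out in the page content.
    public static void Build(PdfFixedDocument document,
                             PdfOptionalContentGroup parent,
                             PdfOptionalContentGroup child)
    {
        document.OptionalContentProperties = new PdfOptionalContentProperties();

        PdfOptionalContentDisplayTreeNode parentNode =
            new PdfOptionalContentDisplayTreeNode(parent); // assumed constructor
        PdfOptionalContentDisplayTreeNode childNode =
            new PdfOptionalContentDisplayTreeNode(child);

        // Child nodes hang off the Nodes property of their parent node.
        parentNode.Nodes.Add(childNode);
        document.OptionalContentProperties.DisplayTree.Nodes.Add(parentNode);

        // A group that gets no node here (a print-only watermark, say)
        // stays out of the tree, so the end user cannot toggle it.
    }
}
```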

The last lines of code in each of the code sections above show how to create the optional content visual tree for different scenarios.

PDF optional content extraction

As noted above, an optional content group is a section of page content. This section is not self-contained in terms of graphics properties: the content before it affects it, and it in turn affects the content that follows.

When a page with optional content is displayed, the optional content is still executed even if it is not visible. For example, a stroke color set inside the optional content affects the content after it even when the optional content is hidden. In the same way, a stroke color set before the optional content starts is used by the optional content. This makes optional content extraction quite difficult, because the content in the group is affected by content outside it. Simply extracting the content that appears between the BeginOptionalContentGroup and EndOptionalContentGroup calls works only in a small number of situations.
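The effect can be illustrated with a toy model written in plain C#. It does not use XFINIUM.PDF or real PDF operators; the operation names ("set-stroke", "stroke") and the group name are invented for the illustration.

```csharp
using System;
using System.Collections.Generic;

static class StateLeakDemo
{
    // Runs the operations and returns "shape:color" for every painted shape.
    // Group == "" means the operation is outside any optional content group.
    static List<string> Render(
        IEnumerable<(string Kind, string Arg, string Group)> ops,
        ISet<string> hiddenGroups)
    {
        string strokeColor = "black";
        var painted = new List<string>();
        foreach (var op in ops)
        {
            if (op.Kind == "set-stroke")
            {
                // State changes are applied even inside a hidden group.
                strokeColor = op.Arg;
            }
            else if (op.Kind == "stroke" &&
                     (op.Group == "" || !hiddenGroups.Contains(op.Group)))
            {
                // Only painting is suppressed for hidden groups.
                painted.Add(op.Arg + ":" + strokeColor);
            }
        }
        return painted;
    }

    static void Main()
    {
        var ops = new (string Kind, string Arg, string Group)[]
        {
            ("set-stroke", "red", ""),              // set before the group
            ("stroke", "line-in-group", "Layer 1"), // uses the outside color
            ("set-stroke", "blue", "Layer 1"),      // set inside the group
            ("stroke", "line-after-group", ""),     // uses the inside color
        };

        // Layer hidden: prints "line-after-group:blue"
        Console.WriteLine(string.Join(", ",
            Render(ops, new HashSet<string> { "Layer 1" })));

        // Layer visible: prints "line-in-group:red, line-after-group:blue"
        Console.WriteLine(string.Join(", ",
            Render(ops, new HashSet<string>())));
    }
}
```

In both runs the line after the group is blue, and the line inside the group is red only because of an operation that sits outside it. Content cut out between the begin and end markers would lose that red setting.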

We implemented optional content extraction that works with almost any kind of optional content group. It relies on an analyzer of the content outside the group: when the optional content is extracted, we recreate all the additional graphics properties that affect the group, so the group is displayed properly when drawn on a new page.

Optional content can be extracted in 2 ways:

  1. using the PdfFile.ExtractPageOptionalContent(int pageNumber, string ocgName) method, which extracts only the required objects from the PDF file, or
  2. if you already have the page in your application as a PdfPage object, by creating a PdfContentExtractor object for the page and calling the PdfContentExtractor.ExtractPageOptionalContent(string ocgName) method.

The optional content group is extracted as a PdfPageOptionalContent object. The PdfPageOptionalContent class inherits from PdfFormXObject so the extracted optional content group can be drawn on another page using the PdfGraphics.DrawFormXObject method.
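Both routes, followed by drawing the result on a new page, might look like the sketch below. The method and type names come from this post; the namespaces, the PdfFile and PdfContentExtractor constructors, zero-based page numbering, the DrawFormXObject overload, the page Width and Height properties and Save are assumptions to verify against the library documentation.

```csharp
using System.IO;
using Xfinium.Pdf;
using Xfinium.Pdf.Graphics;
using Xfinium.Pdf.Content; // assumed namespace of PdfContentExtractor

static class ExtractionSketch
{
    // Way 1: extract straight from the file without loading the whole page.
    public static void CopyLayer(string inputPath, string ocgName, string outputPath)
    {
        using (FileStream input = File.OpenRead(inputPath))
        {
            PdfFile file = new PdfFile(input); // assumed: constructor takes a stream
            // Assumption: 0 refers to the first page.
            PdfPageOptionalContent layer = file.ExtractPageOptionalContent(0, ocgName);
            DrawOnNewDocument(layer, outputPath);
        }
    }

    // Way 2: the page is already available as a PdfPage object.
    public static PdfPageOptionalContent FromPage(PdfPage page, string ocgName)
    {
        PdfContentExtractor extractor = new PdfContentExtractor(page); // assumed constructor
        return extractor.ExtractPageOptionalContent(ocgName);
    }

    static void DrawOnNewDocument(PdfPageOptionalContent layer, string outputPath)
    {
        PdfFixedDocument target = new PdfFixedDocument();
        PdfPage page = target.Pages.Add();
        // PdfPageOptionalContent inherits from PdfFormXObject,
        // so it can be passed to DrawFormXObject directly.
        page.Graphics.DrawFormXObject(layer, 0, 0, page.Width, page.Height); // assumed overload
        target.Save(outputPath);
    }
}
```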

All the code above is included in our Samples Explorer applications, Optional Content and Optional Content Extraction samples, available for download in our Samples page.
