Simple HTML to PDF conversion

Formatted content, introduced in XFINIUM.PDF 4.4, makes it possible to implement simple HTML to PDF conversion with the XFINIUM.PDF library.

Formatted content lets you create complex text layouts on a PDF page by combining paragraphs, text blocks with various fonts and colors, links, and bullet lists. However, creating a complex layout can require a lot of code.
Wouldn’t it be simpler to describe the content in a markup language such as HTML?

This article shows how to parse an HTML fragment (actually XHTML, since it uses the XML parser included in .NET), create the corresponding formatted content objects, and draw them on the page. The sample implements only a few HTML tags for basic text formatting, but more tags can be added. Full HTML to PDF conversion is not possible because not all HTML tags can be translated into formatted content objects.

The main method of the sample is
public PdfFixedDocument Convert(Stream html)
which takes the HTML in the given stream and converts it to a PdfFixedDocument object.

This method has two parts: the conversion of the HTML content to a PdfFormattedContent object, and the rendering of that object on the document’s pages.

The ConvertHtmlToFormattedContent method uses the XmlReader class to parse the HTML content. For each supported tag, the corresponding objects are created or properties are set. A stack of fonts and colors keeps track of the current font and color. The supported tags in the sample are p, font, a, b, strong, i, em, u, ul, and li, but the sample can be extended with other tags (h1, h2, code, span, etc.).
The source code of this method is too long to post here, but the sample project is available for download.
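
To give a feel for the stack-based approach without reproducing the sample, here is a minimal sketch that relies only on System.Xml. It is not the sample's ConvertHtmlToFormattedContent code: the StyledRun record and the XhtmlStyleWalker class are hypothetical stand-ins for the XFINIUM.PDF formatted content objects, and only bold, italic, and the color attribute of the font tag are tracked. It assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical stand-in for a formatted text block: text plus the style in effect.
public record StyledRun(string Text, bool Bold, bool Italic, string Color);

public static class XhtmlStyleWalker
{
    // Style state tracked while walking the markup (illustrative, not an XFINIUM type).
    private record Style(bool Bold, bool Italic, string Color);

    public static List<StyledRun> Parse(TextReader xhtml)
    {
        var runs = new List<StyledRun>();
        var styles = new Stack<Style>();
        styles.Push(new Style(false, false, "black")); // default style

        using var reader = XmlReader.Create(xhtml);
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    // Self-closing elements have no end tag, so nothing is pushed.
                    if (reader.IsEmptyElement) break;
                    var current = styles.Peek();
                    switch (reader.Name)
                    {
                        case "b":
                        case "strong":
                            styles.Push(current with { Bold = true });
                            break;
                        case "i":
                        case "em":
                            styles.Push(current with { Italic = true });
                            break;
                        case "font":
                            styles.Push(current with
                            {
                                Color = reader.GetAttribute("color") ?? current.Color
                            });
                            break;
                        default:
                            // Unhandled tags inherit the enclosing style.
                            styles.Push(current);
                            break;
                    }
                    break;

                case XmlNodeType.EndElement:
                    // Leaving a tag restores the style of its parent.
                    styles.Pop();
                    break;

                case XmlNodeType.Text:
                    var s = styles.Peek();
                    runs.Add(new StyledRun(reader.Value, s.Bold, s.Italic, s.Color));
                    break;
            }
        }
        return runs;
    }
}
```

Calling XhtmlStyleWalker.Parse(new StringReader("<p>Hello <b>bold</b> and <font color=\"red\">red <i>both</i></font></p>")) returns five runs, the last one being the italic red text "both". Because every non-empty start tag pushes exactly one entry and every end tag pops one, nested tags restore the enclosing style automatically. Whitespace-only nodes are skipped in this sketch.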

The DrawFormattedContent method splits the formatted content over multiple pages and draws each part on its page.

The page margins are set to half an inch. From the initial formatted content, the part that fits the given box is extracted and drawn on the page. The procedure is repeated until no more formatted content is available.
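
The margin arithmetic and the split-and-draw loop described above can be modelled without the library. In the sketch below, content is reduced to a queue of line heights, which is an assumption made purely for illustration; in the sample, the splitting is done by the formatted content object itself (the comments below mention its SplitByBox method). Paginator, ContentBox, TakeFragment, and Paginate are hypothetical names, and the sketch assumes the default PDF unit of 72 points per inch.

```csharp
using System;
using System.Collections.Generic;

public static class Paginator
{
    // Assumption: coordinates are in points, 72 per inch.
    public const double PointsPerInch = 72.0;

    // Box left on a page after removing equal margins on all four sides.
    public static (double Width, double Height) ContentBox(
        double pageWidth, double pageHeight, double marginInches)
    {
        double margin = marginInches * PointsPerInch;
        return (pageWidth - 2 * margin, pageHeight - 2 * margin);
    }

    // Removes from the front of 'pending' the lines that fit in boxHeight
    // and returns them; what does not fit stays in the queue.
    public static List<double> TakeFragment(Queue<double> pending, double boxHeight)
    {
        var fragment = new List<double>();
        double used = 0;
        while (pending.Count > 0 && used + pending.Peek() <= boxHeight)
        {
            used += pending.Peek();
            fragment.Add(pending.Dequeue());
        }
        // A line taller than the box would never fit; place it alone
        // so that the caller's loop always makes progress.
        if (fragment.Count == 0 && pending.Count > 0)
        {
            fragment.Add(pending.Dequeue());
        }
        return fragment;
    }

    // Repeats the extraction until no content is left, one fragment per page.
    public static List<List<double>> Paginate(IEnumerable<double> lineHeights, double boxHeight)
    {
        var pending = new Queue<double>(lineHeights);
        var pages = new List<List<double>>();
        while (pending.Count > 0)
        {
            pages.Add(TakeFragment(pending, boxHeight));
        }
        return pages;
    }
}
```

For a US Letter page (612 x 792 points), ContentBox(612, 792, 0.5) gives a 540 x 720 point box, and 100 lines of 15 points each fill three pages (48, 48, and 4 lines). The guard for oversized lines is a design choice of this sketch: without it, a single element taller than the box would make the loop spin forever.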

The full sample project can be downloaded here. It is a Windows console application, but the SimpleHtmlToPdf.cs file, which contains all the conversion logic, can be compiled on any supported platform.

23 thoughts on “Simple HTML to PDF conversion”

  1. Jim, Jul 19, 2014 12:00 pm

    Trying this on Xamarin for Android. There seems to be a problem in the SplitByBox method. The height I give the method seems to split too soon…if I multiply the height by a factor of 1.8 it appears to work.

    • xfinium.pdf, Jul 21, 2014 7:33 am

      Please send us a sample project. It will help us investigate the problem because it depends very much on the HTML text you use and the values for the split box.

  2. Sanya, Feb 9, 2015 10:28 pm

    I want to try rendering an HTML table, but I can’t understand how to add lines to a PdfFormattedContent object. Are there any examples available?

    • xfinium.pdf, Feb 10, 2015 9:48 am

      The PdfFormattedContent object cannot draw lines. In theory you would have to handle each cell as a PdfFormattedContent object and draw each one separately. Support for tables will be available during the following months.

  3. christian massironi, May 13, 2015 6:05 am

    Is it possible to use the font size attribute? If yes, how? I’ve tried size="18" but it is not working.

  4. Dnyanesh, May 28, 2015 8:49 am

    How do I convert HTML to PDF in Xamarin.Forms?

    • xfinium.pdf, May 28, 2015 8:58 am

      The code shown in the article also works in Xamarin.Forms; the XFINIUM.PDF API is the same across all supported platforms.
      The article shows how to implement conversion of simple HTML tags to PDF; it is not intended to convert any HTML page to PDF.

  5. Ha Duyen Hoa, Jun 10, 2015 3:57 pm

    hello,

    I have to draw a long string on my PdfPage. Then I also have to draw a box outside this text. My problem is: when I use PdfFormattedTextBlock & PdfFormattedParagraph to draw the text (setting the right font and color), the .SplitByBox() method does not work. The text is truncated on the screen.
    Here is my code:

    //I have a PdfFixedDocument and a PdfPage added to that document
    //PdfFixedDocument pdfDoc, PdfPage currentPage

    var fc = new PdfFormattedContent();
    var paragraph = new PdfFormattedParagraph ();
    fc.Paragraphs.Add (paragraph);

    //add textblock
    var textFont = new PdfStandardFont();
    textFont.Size = 20;

    string text = "a very long string here …";

    var textBlock = new PdfFormattedTextBlock (text, textFont);
    paragraph.Blocks.Add (textBlock); //add textblock to paragraph

    PdfFormattedContent fragment = fc.SplitByBox(300,20); //here, the fragment is not null but fragment.Paragraphs is empty

    //display the first fragment (just for testing).
    currentPage.Graphics.DrawFormattedContent (fragment, 40, 20); //I see nothing in the pdf file.

  6. Ha Duyen Hoa, Jun 11, 2015 8:41 am

    I’ve sent it. Thanks for your support.

  7. Beate, Sep 16, 2015 12:51 pm

    Hi,
    I use the SplitByBox method to split the formatted content onto several pages. Is it possible to get some lines to stay on the same page? I have a name on one line and a title on the next, and I don’t want these lines to be split across different pages.

    I now create one paragraph with one textblock inside for both name and title and add the paragraphs to the formatted content.

    • xfinium.pdf, Sep 17, 2015 2:05 pm

      At this moment we do not support this feature. We plan to add support for this feature (keep 2 or more paragraphs on the same page) in the near future.

  8. Beate, Sep 23, 2015 12:28 pm

    Thank you for your answer!

    I have another problem. I want to save my pdf-document as PdfAFormat.PdfA1b. When I do, I get the message “{“Page 0: Page content uses CMYK colors but the document Output Profile is not set to CMYK.”}”. I tried to change to Rgb, but I got the same exception.

    How/where can I set the output profile for my document? I have tried the example code I found here: http://www.xfiniumpdf.com/samples/xfinium-pdf-samples-explorer-aspnet-mvc/ pdf/a, but Adobe will not open the generated document, so something must be wrong.

    • xfinium.pdf, Sep 23, 2015 12:41 pm

      The PDF/A sample shows how to set an output profile on a document. The profile used in the sample is RGB, so you also have to use RGB colors in the document. If you could send us (support@xfiniumpdf.com) a sample project that we can run, it would help us identify the problem and give you a solution.

  9. leodc, Feb 9, 2017 10:50 am

    How do I convert HTML tables to PDF?

    • xfinium.pdf, Feb 9, 2017 11:38 am

      The code in the article is simple and it uses only the PdfFormattedContent object, which does not support tables. We’re working to update the code to use the FlowDocument API, which supports a more flexible layout including tables.

  10. ICGC, Sep 6, 2017 4:54 pm

    How do I convert HTML with SVG to PDF?

  11. Krzysztof Brozek, Sep 22, 2017 10:01 am

    Is there a chance for a simple Convert method which will take HTML (including CSS, tables, images and anything possible in HTML) and produce a nice PDF document? Right now conversion is limited to simple tags.

    • xfinium.pdf, Sep 22, 2017 2:21 pm

      The conversion code is provided as source code so that it can be extended as needed. We plan to update it in the future to support tables and other tags but full HTML to PDF conversion is a long road.
