The arrival of formatted content in XFINIUM.PDF 4.4 makes it possible to implement simple HTML to PDF conversion with the XFINIUM.PDF library.
Formatted content lets you create complex text layouts on a PDF page by combining paragraphs, text blocks with various fonts and colors, links, and bullet lists. However, creating a complex layout can require a lot of code.
Wouldn’t it be simpler to have the content described using a markup language such as HTML?
This article shows how to parse an HTML fragment, create the corresponding formatted content objects and draw them on the page. Strictly speaking, the input must be XHTML, because the sample uses the XML parser included in .NET. The sample implements only a few HTML tags for basic text formatting, but more tags can be added. Full HTML to PDF conversion is not possible because not all HTML tags can be translated into formatted content objects.
The main sample method is
public PdfFixedDocument Convert(Stream html)
which takes the HTML in the given stream and converts it to a PdfFixedDocument object.
This method has two parts: the conversion of the HTML content to a PdfFormattedContent object, and the rendering of the PdfFormattedContent object on the document's pages.
public PdfFixedDocument Convert(Stream html)
{
    PdfFixedDocument document = new PdfFixedDocument();
    PdfFormattedContent fc = ConvertHtmlToFormattedContent(html);
    DrawFormattedContent(document, fc);
    return document;
}
Public Function Convert(html As Stream) As PdfFixedDocument
    Dim document As New PdfFixedDocument()
    Dim fc As PdfFormattedContent = ConvertHtmlToFormattedContent(html)
    DrawFormattedContent(document, fc)
    Return document
End Function
The ConvertHtmlToFormattedContent method uses the XmlReader class to parse the HTML content. For each supported tag, the corresponding objects are created or properties are set. A stack of fonts and colors is used to keep track of the current font and color. The supported tags in the sample are p, font, a, b, strong, i, em, u, ul and li, but the sample can be extended with other tags (h1, h2, code, span, etc.).
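The stack idea can be illustrated without the library. The following is a minimal sketch, not the sample's actual code: TextStyle, StyleStackSketch and Parse are hypothetical names introduced here, and the style record merely stands in for the font and color objects that the real method would create. It uses only the .NET XmlReader and assumes C# 10 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical stand-in for the current font and color; not an XFINIUM.PDF type.
public readonly record struct TextStyle(bool Bold, bool Italic, bool Underline, string Color);

public static class StyleStackSketch
{
    // Returns one (text, style) pair per text node, in document order.
    public static List<(string Text, TextStyle Style)> Parse(Stream xhtml)
    {
        var runs = new List<(string Text, TextStyle Style)>();
        var styles = new Stack<TextStyle>();
        styles.Push(new TextStyle(false, false, false, "black")); // default style

        using XmlReader reader = XmlReader.Create(xhtml);
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                {
                    // Start from the enclosing style and adjust it for this tag.
                    TextStyle style = styles.Peek();
                    switch (reader.Name)
                    {
                        case "b": case "strong": style = style with { Bold = true }; break;
                        case "i": case "em": style = style with { Italic = true }; break;
                        case "u": style = style with { Underline = true }; break;
                        case "font":
                            style = style with { Color = reader.GetAttribute("color") ?? style.Color };
                            break;
                    }
                    // Self-closing elements never produce an EndElement node,
                    // so they must not push anything.
                    if (!reader.IsEmptyElement) styles.Push(style);
                    break;
                }
                case XmlNodeType.EndElement:
                    styles.Pop(); // restore the style of the enclosing element
                    break;
                case XmlNodeType.Text:
                    runs.Add((reader.Value, styles.Peek()));
                    break;
            }
        }
        return runs;
    }
}
```

Every element pushes a style, even one that changes nothing, so each closing tag can pop unconditionally. Whitespace-only nodes are reported by XmlReader as a separate node type and are ignored in this sketch; a real converter would have to decide how to treat them.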
The source code of this method is too long to post here, but the sample project is available for download.
The DrawFormattedContent method splits the formatted content over multiple pages and draws each part on its own page.
private void DrawFormattedContent(PdfFixedDocument document, PdfFormattedContent fc)
{
    double leftMargin, topMargin, rightMargin, bottomMargin;
    leftMargin = topMargin = rightMargin = bottomMargin = 36;

    PdfPage page = document.Pages.Add();
    // Extract the part of the content that fits inside the page margins.
    PdfFormattedContent fragment = fc.SplitByBox(page.Width - leftMargin - rightMargin, page.Height - topMargin - bottomMargin);
    while (fragment != null)
    {
        page.Graphics.DrawFormattedContent(fragment, leftMargin, topMargin, page.Width - leftMargin - rightMargin, page.Height - topMargin - bottomMargin);
        page.Graphics.CompressAndClose();

        // Extract the next part; add a new page only if content remains.
        fragment = fc.SplitByBox(page.Width - leftMargin - rightMargin, page.Height - topMargin - bottomMargin);
        if (fragment != null)
        {
            page = document.Pages.Add();
        }
    }
}
Private Sub DrawFormattedContent(document As PdfFixedDocument, fc As PdfFormattedContent)
    Dim leftMargin As Double = 36
    Dim topMargin As Double = 36
    Dim rightMargin As Double = 36
    Dim bottomMargin As Double = 36

    Dim page As PdfPage = document.Pages.Add()
    ' Extract the part of the content that fits inside the page margins.
    Dim fragment As PdfFormattedContent = fc.SplitByBox(page.Width - leftMargin - rightMargin, page.Height - topMargin - bottomMargin)
    While fragment IsNot Nothing
        page.Graphics.DrawFormattedContent(fragment, leftMargin, topMargin, page.Width - leftMargin - rightMargin, page.Height - topMargin - bottomMargin)
        page.Graphics.CompressAndClose()

        ' Extract the next part; add a new page only if content remains.
        fragment = fc.SplitByBox(page.Width - leftMargin - rightMargin, page.Height - topMargin - bottomMargin)
        If fragment IsNot Nothing Then
            page = document.Pages.Add()
        End If
    End While
End Sub
The page margins are set to 36 points, which is half an inch. From the initial formatted content, the part that fits the given box is extracted and drawn on the page. The procedure is repeated until no more formatted content is available.
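As a worked example of the box arithmetic, here is a small sketch. PageBox and ContentBox are hypothetical helper names introduced here, and the 612 x 792 point US Letter page is an assumption made for illustration; the article does not state the default page size.

```csharp
using System;

public static class PageBox
{
    // PDF measures in points: 72 points per inch, so 36 points is half an inch.
    public const double PointsPerInch = 72.0;

    // Size of the area left for content after removing equal margins on all sides.
    public static (double Width, double Height) ContentBox(double pageWidth, double pageHeight, double margin)
        => (pageWidth - 2 * margin, pageHeight - 2 * margin);
}
```

Under that assumption, PageBox.ContentBox(612, 792, 36) gives a box of 540 x 720 points, which is the width and height that would be passed to both SplitByBox and DrawFormattedContent.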
The full sample project can be downloaded here. It is a Windows console application, but the SimpleHtmlToPdf.cs file, which contains all the conversion logic, can be compiled on any supported platform.
Trying this on Xamarin for Android. There seems to be a problem in the SplitByBox method. The height I give the method seems to split too soon…if I multiply the height by a factor of 1.8 it appears to work.
Please send us a sample project. It will help us investigate the problem because it depends very much on the HTML text you use and the values for the split box.
I want to try rendering an HTML table, but I can’t understand how to add lines to a PdfFormattedContent object. Are there any examples available?
The PdfFormattedContent object cannot draw lines. In theory you would have to handle each cell as a PdfFormattedContent object and draw each one separately. Support for tables will be available during the following months.
Is it possible to use the font size attribute? If yes, how? I’ve tried size="18" but it is not working.
Sorry. I’ve just found the way in your html example.
How can I convert HTML to PDF in Xamarin.Forms?
The code shown in the article also works in Xamarin.Forms; the XFINIUM.PDF API is the same across all supported platforms.
The article shows how to implement conversion of simple HTML tags to PDF. It is not intended to convert arbitrary HTML pages to PDF.
hello,
I have to draw a long string on my PdfPage. Then I also have to draw a box around this text. My problem is: when I use PdfFormattedTextBlock and PdfFormattedParagraph to draw the text (setting the right font and color), the SplitByBox() method does not work. The text is truncated on the screen.
Here is my code:
//I have a PdfFixedDocument and a PdfPage added to that document
//PdfFixedDocument pdfDoc, PdfPage currentPage
var fc = new PdfFormattedContent();
var paragraph = new PdfFormattedParagraph ();
fc.Paragraphs.Add (paragraph);
//add textblock
var textFont = new PdfStandardFont();
textFont.Size = 20;
string text = "a very long string here …";
var textBlock = new PdfFormattedTextBlock (text, textFont);
paragraph.Blocks.Add (textBlock); //add textblock to paragraph
PdfFormattedContent fragment = fc.SplitByBox(300,20); //here, the fragment is not null but fragment.Paragraphs is empty
//display the first fragment (just for testing).
currentPage.Graphics.DrawFormattedContent (fragment, 40, 20); //I see nothing in the pdf file.
Can you send a sample project to support at xfiniumpdf.com? It will help us investigate this problem.
I don’t know how to upload a file to your site. I have the .zip of my sample project. Its size is under 200 KB.
Or do you have an email address where I can send this file?
Please send it to support@xfiniumpdf.com
I’ve sent it. Thanks for your support.
Hi,
I use the SplitByBox method to split the formatted content onto several pages. Is it possible to make some lines stay on the same page? I have a name on one line and a title on the next, and I don’t want these lines to be split across different pages.
I now create one paragraph with one textblock inside for both name and title and add the paragraphs to the formatted content.
At this moment we do not support this feature. We plan to add support for this feature (keep 2 or more paragraphs on the same page) in the near future.
Thank you for your answer!
I have another problem. I want to save my PDF document as PdfAFormat.PdfA1b. When I do, I get the message {"Page 0: Page content uses CMYK colors but the document Output Profile is not set to CMYK."}. I tried to change to RGB, but I got the same exception.
How/where can I set the output profile for my document? I have tried the example code I found here: http://www.xfiniumpdf.com/samples/xfinium-pdf-samples-explorer-aspnet-mvc/ pdf/a, but Adobe will not open the generated document, so something must be wrong.
The PDF/A sample shows how to set an output profile on a document. The profile used in the sample is RGB, so you also have to use RGB colors in the document. If you could send us (support@xfiniumpdf.com) a sample project that we can run, it would help us identify the problem and give you a solution.
How to convert html tables to pdf?
The code in the article is simple and uses only the PdfFormattedContent object, which does not support tables. We’re working to update the code to use the FlowDocument API, which supports a more flexible layout including tables.
How to convert HTML with SVG to pdf?
Conversion of full HTML with SVG to PDF is not supported.
Is there a chance of a simple Convert method which will take HTML (including CSS, tables, images and anything else possible in HTML) and produce a nice PDF document? Right now conversion is limited to simple tags.
The conversion code is provided as source code so that it can be extended as needed. We plan to update it in the future to support tables and other tags but full HTML to PDF conversion is a long road.
Can you provide updated code with support for images, tables and CSS?
The updated sample is not available yet.