How to convert HTML to PDF with Azure Functions and wkhtmltopdf

In this article, we will use Azure Functions and the wkhtmltopdf tool to generate PDF files from HTML.

You might want to create a PDF file for a variety of reasons, such as generating sales invoices, medical reports for your patients, or insurance forms for your customers. There are several ways to do this.

First, you can use Adobe’s Fill & Sign tool to fill out forms. But this mostly requires human interaction, so it’s neither scalable nor convenient.

The second option is that you create a PDF file directly. Based on the platform you’re working on, you’ll have the tools to do this. If it’s a very simple PDF you can do it this way.

Which brings us to our final and most convenient option, wkhtmltopdf. This is a really great tool that allows you to convert HTML to PDF. Since it’s free, open source, and can be compiled for almost any platform, it’s our best choice.

Prerequisites

  • VS Code editor installed
  • An account on Azure Portal
  • Linux Basic (B1) App Service Plan. If you already have a Windows Basic (B1) App Service Plan, you can use it.
  • Azure Storage Account

How to use Azure Functions

Since converting HTML to PDF is a time-consuming job, we shouldn’t run it on our main web server. Otherwise, it may start blocking other important requests. Azure Functions is a good way to delegate such tasks.

To create a function, you first need to install Azure Functions Core Tools for your operating system.

Once installed, open your command line tool and run the command below. html2pdf is the name of the project here, but you can replace it with any name.

func init html2pdf

When you execute the command, it will ask for the worker runtime. Choose option 1, dotnet: Azure Functions is a Microsoft product and has great support for dotnet.

This will create a folder named html2pdf in your current directory. Since Visual Studio Code allows direct publishing to Azure Functions, we will use it to code and deploy.

After you open your project in VS Code, create a file named Html2Pdf.cs. Azure Functions offers many types of triggers to execute a function. For now we’ll start with an HTTP trigger, which is a function that can be called directly over the HTTP protocol.

In our newly created file, paste the below content:

using System;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;

namespace Html2Pdf
{
    public class Html2Pdf
    {
        // The name of the function
        [FunctionName("Html2Pdf")]
        
        // The first argument says that the function can be triggered by an HTTP POST request.
        // The second argument is mainly used for logging information, warnings or errors
        public void Run([HttpTrigger(AuthorizationLevel.Function, "POST")] Html2PdfRequest Request, ILogger Log)
        {
        }
    }
}

We have created the skeleton and now we will fill in the details. As you may have noticed, the type of the Request parameter is Html2PdfRequest. So let’s create a model class in Html2PdfRequest.cs as below:

namespace Html2Pdf
{
    public class Html2PdfRequest
    {
        // The HTML content that needs to be converted.
        public string HtmlContent { get; set; }
      
        // The name of the PDF file to be generated
        public string PDFFileName { get; set; }
    }
}
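
When an HTTP trigger is bound to a custom type like this, the Functions runtime builds the object from the JSON body of the request, so a POST body needs HtmlContent and PDFFileName properties. The sketch below is not part of the function project: it uses System.Text.Json (an assumption, any JSON serializer would do) to show the round trip, and RequestSketch with its IsValid helper is a hypothetical addition you might call before converting anything.

```csharp
using System;
using System.Text.Json;

// Same shape as the model above, repeated so the sketch is self-contained.
public class Html2PdfRequest
{
    public string HtmlContent { get; set; }
    public string PDFFileName { get; set; }
}

// Hypothetical helper class, not part of the original project.
public static class RequestSketch
{
    // Rejects missing HTML and file names that do not end in ".pdf".
    public static bool IsValid(Html2PdfRequest request) =>
        request != null
        && !string.IsNullOrWhiteSpace(request.HtmlContent)
        && !string.IsNullOrWhiteSpace(request.PDFFileName)
        && request.PDFFileName.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase);

    // Serializes the request into the JSON body a client would POST.
    public static string ToJson(Html2PdfRequest request) =>
        JsonSerializer.Serialize(request);

    // Parses a JSON body back into the model.
    public static Html2PdfRequest FromJson(string json) =>
        JsonSerializer.Deserialize<Html2PdfRequest>(json);
}
```

Rejecting a bad request early avoids spending conversion time on input that cannot produce a usable file.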

How to add DinkToPdf to your project

To call wkhtmltopdf from our managed code, we use a technology called P/Invoke.

In short, P/Invoke allows us to access structs, callbacks, and functions in unmanaged libraries. There is a nice P/Invoke wrapper named DinkToPdf that abstracts away the technicalities.

You can add DinkToPdf to your project via NuGet. Just run the command below from your project’s root directory.

dotnet add package DinkToPdf --version 1.0.8

Time to add some code at the top of our class Html2Pdf:

// Read more about converter on: https://github.com/rdvojmoc/DinkToPdf
// For our purposes we are going to use SynchronizedConverter
IPdfConverter pdfConverter = new SynchronizedConverter(new PdfTools());

// A function to convert html content to pdf based on the configuration passed as arguments
// Arguments:
// HtmlContent: the html content to be converted
// Width: the width of the pdf to be created. e.g. "8.5in", "21.59cm" etc.
// Height: the height of the pdf to be created. e.g. "11in", "27.94cm" etc.
// Margins: the margins around the content
// DPI: The dpi is very important when you want to print the pdf.
// Returns a byte array of the pdf which can be stored as a file
private byte[] BuildPdf(string HtmlContent, string Width, string Height, MarginSettings Margins, int? DPI = 180)
{
  // Call the Convert method of SynchronizedConverter "pdfConverter"
  return pdfConverter.Convert(new HtmlToPdfDocument()
            {
                // Set the html content
                Objects =
                {
                    new ObjectSettings
                    {
                        HtmlContent = HtmlContent
                    }
                },
                // Set the configurations
                GlobalSettings = new GlobalSettings
                {
                    // PaperKind.A4 can also be used instead of width & height
                    PaperSize = new PechkinPaperSize(Width, Height),
                    DPI = DPI,
                    Margins = Margins
                }
            });
}

I’ve added inline comments so the code is self-explanatory. If you have any questions, you can ask me on Twitter. Let’s call the function created above from the Run method.

// PDFByteArray is a byte array of pdf generated from the HtmlContent 
var PDFByteArray = BuildPdf(Request.HtmlContent, "8.5in", "11in", new MarginSettings(0, 0, 0,0));
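
The call above passes US Letter dimensions in inches, and the comments on BuildPdf list "21.59cm" and "27.94cm" as the centimeter equivalents. As a small sketch of that arithmetic (1 in = 2.54 cm), the hypothetical PaperUnits helper below, which is not part of DinkToPdf, produces the centimeter strings from inch values:

```csharp
using System.Globalization;

// Hypothetical helper, not part of DinkToPdf or the original project.
public static class PaperUnits
{
    private const double CentimetersPerInch = 2.54;

    // Formats a length given in inches as a centimeter string, rounded to
    // at most two decimals, using "." as the decimal separator.
    public static string InchesToCm(double inches) =>
        (inches * CentimetersPerInch).ToString("0.##", CultureInfo.InvariantCulture) + "cm";
}
```

PaperUnits.InchesToCm(8.5) yields "21.59cm" and PaperUnits.InchesToCm(11) yields "27.94cm". The invariant culture matters here because a server locale that formats decimals with a comma would otherwise produce a string like "21,59cm".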

Once the byte array is created, store it as a blob in Azure Storage. Before you upload the blob, make sure you’ve created a container. After you do that, add the code below after PDFByteArray.

// The connection string of the Storage Account to which our PDF file will be uploaded
// Make sure to replace with your connection string.
var StorageConnectionString = "DefaultEndpointsProtocol=https;AccountName=<YOUR ACCOUNT NAME>;AccountKey=<YOUR ACCOUNT KEY>;EndpointSuffix=core.windows.net";

// Generate an instance of CloudStorageAccount by parsing the connection string
var StorageAccount = CloudStorageAccount.Parse(StorageConnectionString);

// Create an instance of CloudBlobClient to connect to our storage account
CloudBlobClient BlobClient = StorageAccount.CreateCloudBlobClient();

// Get the instance of CloudBlobContainer which points to a container name "pdf"
// Replace your own container name
CloudBlobContainer BlobContainer = BlobClient.GetContainerReference("pdf");

// Get the instance of the CloudBlockBlob to which the PDFByteArray will be uploaded
CloudBlockBlob Blob = BlobContainer.GetBlockBlobReference(Request.PDFFileName);

// Upload the pdf blob
await Blob.UploadFromByteArrayAsync(PDFByteArray, 0, PDFByteArray.Length);

You will see some errors and warnings after adding this code. To fix those, first add the missing import statements. Second, change the return type of the Run function from void to async Task. Here’s what the final Html2Pdf.cs file will look like:

using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;
using DinkToPdf;
using IPdfConverter = DinkToPdf.Contracts.IConverter;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using System.Threading.Tasks;

namespace Html2Pdf
{
    public class Html2Pdf
    {
        // Read more about converter on: https://github.com/rdvojmoc/DinkToPdf
        // For our purposes we are going to use SynchronizedConverter
        IPdfConverter pdfConverter = new SynchronizedConverter(new PdfTools());

        // A function to convert html content to pdf based on the configuration passed as arguments
        // Arguments:
        // HtmlContent: the html content to be converted
        // Width: the width of the pdf to be created. e.g. "8.5in", "21.59cm" etc.
        // Height: the height of the pdf to be created. e.g. "11in", "27.94cm" etc.
        // Margins: the margins around the content
        // DPI: The dpi is very important when you want to print the pdf.
        // Returns a byte array of the pdf which can be stored as a file
        private byte[] BuildPdf(string HtmlContent, string Width, string Height, MarginSettings Margins, int? DPI = 180)
        {
            // Call the Convert method of SynchronizedConverter "pdfConverter"
            return pdfConverter.Convert(new HtmlToPdfDocument()
            {
                // Set the html content
                Objects =
                {
                    new ObjectSettings
                    {
                        HtmlContent = HtmlContent
                    }
                },
                // Set the configurations
                GlobalSettings = new GlobalSettings
                {
                    // PaperKind.A4 can also be used instead of width & height
                    PaperSize = new PechkinPaperSize(Width, Height),
                    DPI = DPI,
                    Margins = Margins
                }
            });
        }

        // The name of the function
        [FunctionName("Html2Pdf")]

        // The first argument says that the function can be triggered by an HTTP POST request.
        // The second argument is mainly used for logging information, warnings or errors
        public async Task Run([HttpTrigger(AuthorizationLevel.Function, "POST")] Html2PdfRequest Request, ILogger Log)
        {
            // PDFByteArray is a byte array of pdf generated from the HtmlContent 
            var PDFByteArray = BuildPdf(Request.HtmlContent, "8.5in", "11in", new MarginSettings(0, 0, 0, 0));

            // The connection string of the Storage Account to which our PDF file will be uploaded
            var StorageConnectionString = "DefaultEndpointsProtocol=https;AccountName=<YOUR ACCOUNT NAME>;AccountKey=<YOUR ACCOUNT KEY>;EndpointSuffix=core.windows.net";
            
            // Generate an instance of CloudStorageAccount by parsing the connection string
            var StorageAccount = CloudStorageAccount.Parse(StorageConnectionString);

            // Create an instance of CloudBlobClient to connect to our storage account
            CloudBlobClient BlobClient = StorageAccount.CreateCloudBlobClient();

            // Get the instance of CloudBlobContainer which points to a container name "pdf"
            // Replace your own container name
            CloudBlobContainer BlobContainer = BlobClient.GetContainerReference("pdf");
            
            // Get the instance of the CloudBlockBlob to which the PDFByteArray will be uploaded
            CloudBlockBlob Blob = BlobContainer.GetBlockBlobReference(Request.PDFFileName);
            
            // Upload the pdf blob
            await Blob.UploadFromByteArrayAsync(PDFByteArray, 0, PDFByteArray.Length);
        }
    }
}
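
The listing above hard-codes the storage connection string, which is easy to leak through source control. Function App application settings are exposed to the code as environment variables, so one alternative is to read the value at run time. The following is a minimal sketch under that assumption: StorageSettings is a hypothetical helper, and the default name "AzureWebJobsStorage" is the setting the Functions host uses for its own storage account, so pass a different name if your PDFs live elsewhere.

```csharp
using System;

// Hypothetical helper, not part of the original project.
public static class StorageSettings
{
    // Reads a connection string from an application setting (environment variable)
    // and fails loudly when the setting is missing or blank.
    public static string GetConnectionString(string settingName = "AzureWebJobsStorage")
    {
        var value = Environment.GetEnvironmentVariable(settingName);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"App setting '{settingName}' is not configured.");
        }
        return value;
    }
}
```

In Run, the hard-coded assignment could then become var StorageConnectionString = StorageSettings.GetConnectionString();.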

This concludes the coding portion of this tutorial!

How to add wkhtmltopdf to your project

We still need to add the wkhtmltopdf library to our project. There are some caveats depending on the Azure App Service Plan you choose, because the plan determines which build of the wkhtmltopdf library we need.

For our purposes, we chose the Linux Basic (B1) App Service Plan because the Windows Basic (B1) App Service Plan is five times more expensive.

At the time of this writing, the Linux App Service Plan runs Debian 10 on the amd64 architecture. Luckily for us, DinkToPdf provides precompiled libraries for Linux, Windows, and macOS.

Download the .so library for Linux and place it in your project’s root directory. I’m working on macOS, so I also downloaded libwkhtmltox.dylib.

If you are using Windows, or if you have hosted Azure Functions on a Windows App Service Plan, you must download libwkhtmltox.dll. This is how our project structure will now look:

Project structure
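
Which native file is needed depends on the operating system the code runs on: .so on Linux, .dylib on macOS, .dll on Windows. The sketch below encodes that mapping with RuntimeInformation and adds a presence check you could log at startup. NativeLibraryName is a hypothetical helper, and the directory to check is left as a parameter because where the host looks for the library depends on your setup.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of DinkToPdf or the original project.
public static class NativeLibraryName
{
    // Returns the wkhtmltopdf library file name for the OS this code runs on.
    public static string ForCurrentPlatform()
    {
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux)) return "libwkhtmltox.so";
        if (RuntimeInformation.IsOSPlatform(OSPlatform.OSX)) return "libwkhtmltox.dylib";
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows)) return "libwkhtmltox.dll";
        throw new PlatformNotSupportedException("No wkhtmltopdf binary is listed for this OS.");
    }

    // True when the expected library file exists in the given directory.
    public static bool IsPresentIn(string directory) =>
        File.Exists(Path.Combine(directory, ForCurrentPlatform()));
}
```

A check like this turns a missing library into an explicit log message instead of a load failure at the first conversion.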

When creating a build, we need to include the .so library. To do that, open your csproj file and add the content below to an ItemGroup.

<None Update="./libwkhtmltox.so">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
    <CopyToPublishDirectory>Always</CopyToPublishDirectory>
</None>

Here is the entire csproj file:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>netcoreapp3.1</TargetFramework>
    <AzureFunctionsVersion>v3</AzureFunctionsVersion>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="DinkToPdf" Version="1.0.8" />
    <PackageReference Include="Microsoft.NET.Sdk.Functions" Version="3.0.11" />
  </ItemGroup>
  <ItemGroup>
    <None Update="host.json">
      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
    </None>
    <None Update="local.settings.json">
      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
      <CopyToPublishDirectory>Never</CopyToPublishDirectory>
    </None>
    <None Update="./libwkhtmltox.so">
      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
      <CopyToPublishDirectory>Always</CopyToPublishDirectory>
    </None>
  </ItemGroup>
</Project>

How to create an Azure Functions app

Before we deploy Azure Functions, we have to create the Function App in Azure Portal. Go to Azure Portal and start creating an Azure Functions resource. Follow the screenshots below for more clarity.

Version details

In the screenshot below, make sure to select or create at least a Basic plan. Second, under Operating System, select Linux.

Plan details

It’s good to have Application Insights, as you will be able to see logs and monitor functions. Besides, it costs almost nothing. As shown in the screenshot below, select Yes if you want to enable it.

Application Insights details

Select Next: Tags, click Next again, and then click Create to create your resource. It may take a few minutes to create the Azure Functions resource.

How to Deploy Azure Functions

Once it is created, we will deploy our code directly to Azure Functions from VS Code. For that, go to Extensions and install the Azure Functions extension. With its help we will be able to log in and manage Azure Functions.

Azure Functions in the Marketplace

Once installed, you will see an Azure icon in the sidebar. When you click on it, it opens a panel with the option Sign in to Azure.

Azure Function Extensions

The Sign in to Azure option opens a browser where you can sign in with your account. Once logged in, return to VS Code and you will see the list of Azure Functions in your side panel.

List of Azure Functions

For me, there are four function apps. If you have created only one, the list will show just that one. Now it’s time to deploy the application.

Press F1 to open a menu with a list of actions. The option Azure Functions: Deploy to Function App… will open a list of Function Apps that you can deploy to.

Select our newly created Azure Functions App. A confirmation popup will appear, so go ahead and confirm it. It will take a few minutes to deploy your application.

How to configure wkhtmltopdf

Once you’ve deployed Azure Functions, there’s still one last thing to do. We need to move libwkhtmltox.so to the appropriate location in our Azure Functions App.

Sign in to the Azure portal and navigate to our Azure Functions App. In the side panel, search for SSH and click the Go button.

SSH Search for Azure Functions

This will open the SSH console in a new tab. Our site is located at /home/site/wwwroot. Navigate to its bin directory by entering the command below:

cd /home/site/wwwroot/bin

When you run the ls command to list the contents of the directory, you won’t see the libwkhtmltox.so file. It is actually located in /home/site/wwwroot.

That is not the correct location. We need to copy it to the bin directory. To do that, execute the command below:

cp ../libwkhtmltox.so libwkhtmltox.so

If you know a better way to put files in the bin folder, please let me know.

That’s it! You have a fully functional Azure Functions application. Time to call it from our demo dotnet project.

How to call the Azure Function

With all that done, we still need to test and call our function. Before we do that, we need to get the Code that is required to call the Function.

The Code is a secret that must be included in order to call the Function securely. To get the Code, navigate to Azure Portal and open your Function App. In the side panel, search for Functions.

Search function

You will see Html2Pdf in the list. Clicking on that function opens a detailed view. In the side panel there will be an option for Function keys. Select that option to see the hidden default Code that has already been added for you.


Copy the Code and keep it handy, as we will need it in the client. To test the function, I have created a sample console application for you. Replace the base URL and the Code as shown below:

using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;
using Newtonsoft.Json;

namespace Demo.ConsoleApp
{
    public class Program
    {
        public static async Task Main(string[] args)
        {
            string AzureFunctionsUrl = "https://<Your Base Url>/api/Html2Pdf?code=<Replace with your Code>";


            using (HttpClient client = new HttpClient())
            {
                var Request = new Html2PdfRequest
                {
                    HtmlContent = "<h1>Hello World</h1>",
                    PDFFileName = "hello-world.pdf"
                };
                string json = JsonConvert.SerializeObject(Request);
                var buffer = System.Text.Encoding.UTF8.GetBytes(json);
                var byteContent = new ByteArrayContent(buffer);

                byteContent.Headers.ContentType = new MediaTypeHeaderValue("application/json");


                using (HttpResponseMessage res = await client.PostAsync(AzureFunctionsUrl, byteContent))
                {
                    if (res.StatusCode != HttpStatusCode.NoContent)
                    {
                        throw new Exception("There was an error uploading the pdf");
                    }
                }
            }
        }
    }

    public class Html2PdfRequest
    {
        // The HTML content that needs to be converted.
        public string HtmlContent { get; set; }

        // The name of the PDF file to be generated
        public string PDFFileName { get; set; }
    }

}

Again, the code should be fairly easy to understand. If you have any feedback or questions, let me know. After you run the console application above, it will create a hello-world.pdf file in your pdf container in Azure Storage.
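
The console application passes the Code in the query string. Azure Functions also accepts the key in an x-functions-key HTTP header, which keeps the secret out of URLs that may end up in logs. The sketch below only builds the request; FunctionRequestSketch and its parameter names are hypothetical, and the URL and key are placeholders you would replace with your own.

```csharp
using System;
using System.Net.Http;
using System.Text;

// Hypothetical helper, not part of the original sample.
public static class FunctionRequestSketch
{
    // Builds a POST request that carries the function key in a header
    // instead of the "code" query-string parameter.
    public static HttpRequestMessage Build(string functionUrl, string functionKey, string jsonBody)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, functionUrl)
        {
            Content = new StringContent(jsonBody, Encoding.UTF8, "application/json")
        };
        request.Headers.Add("x-functions-key", functionKey);
        return request;
    }
}
```

With an HttpClient in hand, the request would be sent with await client.SendAsync(FunctionRequestSketch.Build(url, key, json)), and the response checked for HttpStatusCode.NoContent just as in the sample above.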

Conclusion

That concludes our tutorial on how to convert HTML to PDF using Azure Functions. While it can be a bit difficult to set up, it is one of the cheapest serverless solutions.

Hope this helps!