Convert PDF to Microsoft Word Documents in .NET

This article explains how to convert PDF to Microsoft Word Documents using C#. It covers these topics.

Format: DOC

Format: DOCX

Format: Word

The following code snippet also works with Aspose.PDF.Drawing library.

PDF to DOC and DOCX Conversion

One of the most popular features is the PDF to Microsoft Word DOC conversion, which makes content management more effortless. Aspose.PDF for .NET allows you to convert PDF files to DOC and DOCX format quickly and efficiently.

Convert PDF to DOC (Microsoft Word 97-2003) file

Convert PDF files to DOC format with ease and complete control. Aspose.PDF for .NET is flexible and supports a wide variety of conversions. Converting pages from PDF documents to images, for example, is a very popular feature.

Many of our customers have requested a conversion from PDF to DOC: converting a PDF file to a Microsoft Word document. Customers want this because PDF files cannot easily be edited, whereas Word documents can. Some companies want their users to be able to manipulate text, tables and images in files that started as PDFs.

Keeping alive the tradition of making things simple and understandable, Aspose.PDF for .NET lets you transform a source PDF file into a DOC file with two lines of code. To accomplish this feature, we have introduced an enumeration named SaveFormat and its value .Doc lets you save the source file to Microsoft Word format.

The following C# code snippet shows converting a PDF file into DOC format.

  1. Create an instance of Document object with the source PDF document.
  2. Save it to SaveFormat.Doc format by calling Document.Save() method.
public static void ConvertPDFtoWord()   // Open the source PDF document  Document pdfDocument = new Document(_dataDir + "PDFToDOC.pdf");  // Save the file into MS document format  pdfDocument.Save(_dataDir + "PDFToDOC_out.doc", SaveFormat.Doc);  > 

Using the DocSaveOptions Class

The DocSaveOptions class provides numerous properties that improve converting PDF files to DOC format. Among these properties, Mode enables you to specify the recognition mode for PDF content. You can select any value from the RecognitionMode enumeration for this property. Each of these values has specific benefits and limitations:

The RelativeHorizontalProximity property can be used to control the relative proximity between textual elements. It means that distance is normed by the font size. Larger fonts may have bigger spaces between syllables and still be considered a single whole. It is specified as a percentage of the font size; for example, 1 = 100%. This means that two characters of 12pt placed 12 pt apart are proximal.

public static void ConvertPDFtoWordDocAdvanced()   var pdfFile = Path.Combine(_dataDir, "PDF-to-DOC.pdf");  var docFile = Path.Combine(_dataDir, "PDF-to-DOC.doc");  Document pdfDocument = new Document(pdfFile);  DocSaveOptions saveOptions = new DocSaveOptions    Format = DocSaveOptions.DocFormat.Doc,  // Set the recognition mode as Flow  Mode = DocSaveOptions.RecognitionMode.Flow,  // Set the Horizontal proximity as 2.5  RelativeHorizontalProximity = 2.5f,  // Enable the value to recognize bullets during the conversion process  RecognizeBullets = true  >;  pdfDocument.Save(docFile, saveOptions); > 

Try to convert PDF to DOC online

Aspose.PDF for .NET presents you online free application “PDF to DOC”, where you may try to investigate the functionality and quality it works.

Convert PDF to DOC

Convert PDF to DOCX (Microsoft Word 2007-2021) file

Aspose.PDF for .NET API lets you read and convert PDF documents to DOCX using C# and any .NET language. DOCX is a well-known format for Microsoft Word documents whose structure was changed from plain binary to a combination of XML and binary files. Docx files can be opened with Word 2007 and later versions but not with earlier versions of MS Word, which support DOC file extensions.

The following C# code snippet shows converting a PDF file into DOCX format.

  1. Create an instance of Document object with the source PDF document.
  2. Save it to SaveFormat.DocX format by calling Document.Save() method.
public static void ConvertPDFtoWord_DOCX_Format()   // Open the source PDF document  Document pdfDocument = new Document(_dataDir + "PDFToDOC.pdf");  // Save the resultant DOC file  pdfDocument.Save(_dataDir + "saveOptionsOutput_out.doc", SaveFormat.DocX); > 

Convert PDF to DOCX in Enhanced Mode

To get better results of PDF to DOCX conversion, you can use EnhancedFlow mode. The main difference between Flow and Enhanced Flow is that tables (both with and without borders) are recognized as real tables, not as text with a picture in the background. There is also recognition of numbered lists and many other minor things.

public static void ConvertPDFtoWord_Advanced_DOCX_Format()   // Open the source PDF document  Document pdfDocument = new Document(_dataDir + "PDFToDOC.pdf");   // Instantiate DocSaveOptions object  DocSaveOptions saveOptions = new DocSaveOptions    // Specify the output format as DOCX  Format = DocSaveOptions.DocFormat.DocX  // Set other DocSaveOptions params  Mode = DocSaveOptions.RecognitionMode.EnhancedFlow  >;  // Save document in docx format  pdfDocument.Save("ConvertToDOCX_out.docx", saveOptions); > 

Try to convert PDF to DOCX online

Aspose.PDF for .NET presents you online free application “PDF to Word”, where you may try to investigate the functionality and quality it works.

Aspose.PDF Convertion PDF to Word Free App

See Also

This article also covers these topics. The codes are the same as above.

Format: Word

Format: DOC

Format: DOCX