Convert PDF to MS Word DOCX in Python Without Acrobat Throws Error

RoryJudith · September 10, 2021, 3:32am

When I try to convert a document from PDF to DOCX, the conversion request to the API fails with the error "'property' object has no attribute 'swagger_types'". I haven't done anything unusual with the requests. Can anyone advise me as to what's going wrong and how to fix it?

tilal.ahmad · September 10, 2021, 5:05am

@RoryJudith

I tested converting a PDF to MS Word DOCX with the GroupDocs.Conversion Cloud API using a sample PDF document and was unable to reproduce the issue. Please share your sample input document along with your code. We will look into them and guide you accordingly.

Convert PDF to Word with cURL

curl -X POST "https://api.groupdocs.cloud/v2.0/conversion" \
-H "accept: application/json" \
-H "authorization: Bearer [Access_Token]" \
-H "Content-Type: application/json" \
-H "x-aspose-client: Containerize.Swagger" \
-d "{ \"FilePath\": \"3amazon.pdf\", \"Format\": \"docx\", \"OutputPath\": \"3amazon_output.docx\"}"
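For reference, the same request can be sketched in Python using only the standard library. The endpoint, headers, and body fields are copied from the cURL command above; build_conversion_request is a hypothetical helper name used for illustration, and the access token is a placeholder you would need to supply. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

# Endpoint taken from the cURL command above.
API_URL = "https://api.groupdocs.cloud/v2.0/conversion"


def build_conversion_request(access_token, file_path, output_format, output_path):
    """Build (but do not send) the POST request shown in the cURL example.

    Hypothetical helper for illustration; the body field names mirror
    the JSON passed with -d in the cURL command.
    """
    body = {
        "FilePath": file_path,
        "Format": output_format,
        "OutputPath": output_path,
    }
    headers = {
        "accept": "application/json",
        "authorization": "Bearer " + access_token,
        "Content-Type": "application/json",
        "x-aspose-client": "Containerize.Swagger",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    req = build_conversion_request(
        "[Access_Token]", "3amazon.pdf", "docx", "3amazon_output.docx"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it (requires a valid access token):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the exact JSON body before it reaches the API.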

RoryJudith · September 12, 2021, 11:05pm

The code is

import groupdocs_conversion_cloud as gcc

gcc_id = "xxxx-xxxx-xxxx-xxx-xxxxxxx"
gcc_sec = "xxxxxxxxxxxxxxxxxxxxxxx"

convertAPI = gcc.ConvertApi.from_keys(gcc_id, gcc_sec)
fileAPI = gcc.FileApi.from_keys(gcc_id, gcc_sec)

inputName = "MIT - 205 - 2011-6086 - MPW - PMP - Appendix B - MPW RA v.4 (pp 66-78).pdf"
outputName = "EP1.docx"
remote_name = "MIT - 205 - 2011-6086 - MPW - PMP - Appendix B - MPW RA v.4 (pp 66-78).pdf"
outputFormat = "docx"

request_upload = gcc.UploadFileRequest(remote_name, inputName)
response_upload = fileAPI.upload_file(request_upload)

gcc_settings = gcc.ConvertSettings()
gcc_settings.FilePath = remote_name
gcc_settings.format = outputFormat
gcc_settings.OutputPath = outputName

loadOptions = gcc.PdfLoadOptions
loadOptions.hide_pdf_annotations = True
loadOptions.remove_embedded_files = False
loadOptions.flatten_all_fields = True
gcc_settings.load_options = loadOptions

convertOptions = gcc.DocxConvertOptions()
convertOptions.pages = range(66,79)
gcc_settings.convert_options = convertOptions

request = gcc.ConvertDocumentRequest(gcc_settings)
response = convertAPI.convert_document(request)

I have attached the document I’ve been testing it on:
MIT - 205 - 2011-6086 - MPW - PMP - Appendix B - MPW RA v.4 (pp 66-78).pdf (3.0 MB)

tilal.ahmad · September 13, 2021, 5:27am

@RoryJudith

Please check the following sample code, which converts PDF to MS Word DOCX in Python with the GroupDocs.Conversion Cloud SDK for Python. Use the from_page and pages_count properties, as shown, instead of range.

How to Convert PDF to DOCX in Python

Sign up with groupdocs.cloud to get credentials
Install the GroupDocs.Conversion Cloud SDK for Python with pip
Import the groupdocs_conversion_cloud Python module
Upload the PDF document to cloud storage
Convert PDF to DOCX using the convert_document method
Download the output Word DOCX from cloud storage

Python Code to Convert PDF to DOCX

# Import module
import groupdocs_conversion_cloud as gcc
from shutil import copyfile

# Get your Client ID and Client Secret at https://dashboard.groupdocs.cloud
# (free registration is required).
gcc_id = "xxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
gcc_sec = "xxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Create instance of the API
convert_api = gcc.ConvertApi.from_keys(gcc_id, gcc_sec)
fileAPI = gcc.FileApi.from_keys(gcc_id, gcc_sec)

try:
    # Upload the source PDF file to cloud storage
    filename = 'MIT - 205 - 2011-6086 - MPW - PMP - Appendix B - MPW RA v.4 (pp 66-78).pdf'
    remote_name = 'MIT - 205 - 2011-6086 - MPW - PMP - Appendix B - MPW RA v.4 (pp 66-78).pdf'
    output_name = 'EPI.docx'
    outputFormat = 'docx'

    request_upload = gcc.UploadFileRequest(remote_name, filename)
    response_upload = fileAPI.upload_file(request_upload)

    # Convert PDF to DOCX in Python
    gcc_settings = gcc.ConvertSettings()
    gcc_settings.file_path = remote_name
    gcc_settings.format = outputFormat
    gcc_settings.output_path = output_name

    loadOptions = gcc.PdfLoadOptions()
    loadOptions.hide_pdf_annotations = True
    loadOptions.remove_embedded_files = False
    loadOptions.flatten_all_fields = True

    gcc_settings.load_options = loadOptions

    convertOptions = gcc.DocxConvertOptions()
    # Instead of convertOptions.pages = range(66, 79), give the first page
    # and the number of pages: 13 pages starting at page 66 (pages 66-78).
    convertOptions.from_page = 66
    convertOptions.pages_count = 13

    gcc_settings.convert_options = convertOptions

    convertPDFtoDOCXRequest = gcc.ConvertDocumentRequest(gcc_settings)
    convertPDFtoDOCXResponse = convert_api.convert_document(convertPDFtoDOCXRequest)
    print("Document converted successfully: " + str(convertPDFtoDOCXResponse))

    # Download the Word document from cloud storage
    request_download = gcc.DownloadFileRequest(output_name)
    response_download = fileAPI.download_file(request_download)

    # Copy the downloaded file to a local path
    copyfile(response_download, 'EPI_copy.docx')
    print("Result {}".format(response_download))

except gcc.ApiException as e:
    print("Exception while converting PDF to DOCX: {0}".format(e.message))

EPI_copy.docx (5.6 MB)
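
The sample above replaces range(66, 79) with from_page = 66 and pages_count = 13. To make that arithmetic explicit, here is a minimal sketch in plain Python that turns an inclusive page range, or a contiguous range object, into those two values. The helper names page_window and page_window_from_range are hypothetical and are not part of the SDK.

```python
def page_window(first_page, last_page):
    """Return (from_page, pages_count) for an inclusive page range."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    return first_page, last_page - first_page + 1


def page_window_from_range(pages):
    """Same as page_window, starting from a contiguous range object."""
    if len(pages) == 0 or pages.step != 1:
        raise ValueError("expected a non-empty range with step 1")
    # range.stop is exclusive, so the last page is stop - 1.
    return page_window(pages.start, pages.stop - 1)


if __name__ == "__main__":
    from_page, pages_count = page_window_from_range(range(66, 79))
    print(from_page, pages_count)  # 66 13
```

The two values would then be assigned to convertOptions.from_page and convertOptions.pages_count, as in the sample code.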

RoryJudith · September 13, 2021, 5:51am

@tilal.ahmad I tried the code you gave me and I’m still getting the same error message.

tilal.ahmad · September 13, 2021, 1:00pm

@RoryJudith

It is quite strange, as the conversion is working fine at my end. Please ensure that you are using the latest version of the GroupDocs.Conversion Cloud SDK for Python.
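
One way to check which SDK version is installed is sketched below, using the standard library's importlib.metadata (Python 3.8 or later). It assumes the SDK's distribution name on PyPI is groupdocs-conversion-cloud; installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(distribution="groupdocs-conversion-cloud"):
    """Return the installed version string, or None if it is not installed.

    The default distribution name is an assumption about how the SDK
    is published on PyPI.
    """
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("SDK not installed; try: pip install groupdocs-conversion-cloud")
    else:
        print("Installed SDK version:", version)
        print("To upgrade: pip install --upgrade groupdocs-conversion-cloud")
```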

RoryJudith · September 13, 2021, 11:19pm

It’s working now. Thank you.