Python Page formatting issue in PDF to Word Document Conversion

Gitika_Srivastava · August 30, 2022, 10:52am

Also, @tilal.ahmad, I have a PDF that is laid out for A4, but when I convert it to Word using GroupDocs the output does not fit the A4 page: about 20% of the page is cut off.
How can I adjust the conversion so that everything fits on the A4 page and nothing gets cut off? Is that possible? If so, could you please share sample code for it as well?
Thanks for the support.

tilal.ahmad · August 30, 2022, 4:51pm

@Gitika_Srivastava

Please share your input and output document. We will look into the issue and will guide you accordingly.

Gitika_Srivastava · September 1, 2022, 8:02am

Input.PNG (63.7 KB)
Output.PNG (49.4 KB)

Hi @tilal.ahmad, thanks for getting back. I have attached two screenshots. The first is the input PDF that needs to be converted to a Word document, and the second is the downloaded Word output, which does not fit the page: about 20% of it is cut off.

Could you please look into this and help me get output that matches the input exactly (nothing should be cut off)?

If there is a way to resolve this, please share sample code and explain how and where to use it. At the moment I am using the following code to convert PDF to Word:

import glob
from shutil import copyfile

import groupdocs_conversion_cloud

# Get your client_id and client_key at https://dashboard.groupdocs.cloud (free registration is required).
client_id = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
client_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Create instance of the API
convert_api = groupdocs_conversion_cloud.ConvertApi.from_keys(client_id, client_key)

# mypath (the folder containing the input PDFs) is assumed to be defined earlier
for filename1 in glob.glob(mypath + "/*.pdf"):
    try:
        # Convert PDF to DOCX
        # Prepare request
        request = groupdocs_conversion_cloud.ConvertDocumentDirectRequest("docx", filename1)
        output_name = filename1[0:-4] + '.docx'
        output_name = output_name.replace("Multiple PDF's", "Multiple PDF's\\Group_Docs")

        # Convert
        result = convert_api.convert_document_direct(request)
        # Note: every iteration writes to the same 'output.docx'; output_name is not used yet
        copyfile(result, 'output.docx')
        print("Result {}".format(result))

    except groupdocs_conversion_cloud.ApiException as e:
        print("Exception when calling convert_document_direct: {0}".format(e.message))

Thanks again!

tilal.ahmad · September 1, 2022, 4:41pm

@Gitika_Srivastava

As requested above we need your input document for the investigation. If it has some confidential data then you may share it with me via a private message. We honor customers’ privacy.

Gitika_Srivastava · September 1, 2022, 4:59pm

Sure, @tilal.ahmad. Allow me some time and I’ll share it with you. It’s just that I need my leadership’s approval before sharing.
Thanks

Gitika_Srivastava · September 1, 2022, 5:23pm

Hey @tilal.ahmad, PFA the input PDF and the Word output produced from it.

Test final.docx (39.4 KB)
Test final.pdf (563.3 KB)

Please let me know what can be done here to resolve this issue…

Thanks

tilal.ahmad · September 2, 2022, 5:18am

@Gitika_Srivastava

Thanks for sharing the source documents. We are looking into the requirements and will share the update with you shortly.

tilal.ahmad · September 2, 2022, 12:17pm

@Gitika_Srivastava

You can use DocxConvertOptions to set the page width and height of the output DOCX. Please note that the width and height parameters are not working as expected at the moment; we have logged ticket CONVERSIONCLOUD-482 to fix the issue. As a workaround, you can use the dpi and zoom parameters instead, as follows.

import groupdocs_conversion_cloud
from shutil import copyfile

# Create instance of the API (client_id and client_key as in your existing code)
convert_api = groupdocs_conversion_cloud.ConvertApi.from_keys(client_id, client_key)

try:
    # Convert PDF to DOCX
    # Prepare request
    load_options = groupdocs_conversion_cloud.PdfLoadOptions()
    load_options.format = "pdf"

    convert_options = groupdocs_conversion_cloud.DocxConvertOptions()
    convert_options.width = '1920'
    convert_options.height = '1080'
    convert_options.dpi = '96'
    convert_options.zoom = '100'

    request = groupdocs_conversion_cloud.ConvertDocumentDirectRequest(
        format="docx", file="Test final.pdf",
        load_options=load_options, convert_options=convert_options)

    # Convert
    result = convert_api.convert_document_direct(request)
    copyfile(result, 'output.docx')
    print("Result {}".format(result))

except groupdocs_conversion_cloud.ApiException as e:
    print("Exception when calling convert_document_direct: {0}".format(e.message))
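For reference, the arithmetic behind an A4-sized page can be sketched as below. A4 is 210 mm × 297 mm, and the size in pixels follows from the dpi. Whether the width and height options are interpreted as pixels at the given dpi is an assumption here and is not confirmed in this thread; the helper name a4_size_in_pixels is made up for illustration.

```python
MM_PER_INCH = 25.4
A4_WIDTH_MM = 210
A4_HEIGHT_MM = 297


def a4_size_in_pixels(dpi):
    """Return (width, height) of a portrait A4 page in whole pixels at the given dpi."""
    width = round(A4_WIDTH_MM / MM_PER_INCH * dpi)
    height = round(A4_HEIGHT_MM / MM_PER_INCH * dpi)
    return width, height


if __name__ == "__main__":
    # Print the A4 pixel size for a few common resolutions
    for dpi in (72, 96, 300):
        print(dpi, a4_size_in_pixels(dpi))
```

At 96 dpi this gives 794 × 1123, which could be tried for width and height once CONVERSIONCLOUD-482 is resolved.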

Gitika_Srivastava · September 20, 2022, 5:52pm

Hi @tilal.ahmad, hope you are doing great.
Getting back to you on this: for converting PDF to Word I want to use Google Drive as the storage. When I choose Google Drive as the storage option, it asks for a Client ID, Client Secret and Refresh Token, and I am not sure how to get these.

Also, do you have sample code that converts PDF to Word with Google Drive as the storage instead of the internal storage? In the existing code I pass the client ID and key; what changes are needed in the code for Google Drive?

It would be really helpful if you could share sample code for converting PDF to Word using Google Drive as storage.

Thanks!

tilal.ahmad · September 21, 2022, 3:42am

@Gitika_Srivastava

To use Google Drive storage with GroupDocs Cloud APIs, you need to configure it as follows. After that, you just need to pass the storage name in the API call.
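The Refresh Token asked about above is issued by Google rather than by GroupDocs. One common way to obtain it is Google's OAuth installed-app flow. Here is a minimal sketch, assuming the google-auth-oauthlib package and an OAuth client of type "Desktop app" created in the Google Cloud console. The Drive scope used here is an assumption (the scope GroupDocs requires is not stated in this thread), and build_client_config and fetch_refresh_token are illustrative names.

```python
DRIVE_SCOPE = "https://www.googleapis.com/auth/drive"  # assumed scope


def build_client_config(client_id, client_secret):
    """Build the client configuration dict for an installed (desktop) OAuth client."""
    return {
        "installed": {
            "client_id": client_id,
            "client_secret": client_secret,
            "auth_uri": "https://accounts.google.com/o/oauth2/auth",
            "token_uri": "https://oauth2.googleapis.com/token",
            "redirect_uris": ["http://localhost"],
        }
    }


def fetch_refresh_token(client_id, client_secret):
    """Open a browser consent page and return the refresh token Google issues."""
    # Imported here so build_client_config works without the package installed
    from google_auth_oauthlib.flow import InstalledAppFlow

    flow = InstalledAppFlow.from_client_config(
        build_client_config(client_id, client_secret), scopes=[DRIVE_SCOPE]
    )
    # prompt="consent" asks Google to show the consent screen again,
    # which is when a refresh token is returned
    credentials = flow.run_local_server(port=0, prompt="consent")
    return credentials.refresh_token
```

Calling fetch_refresh_token with the Client ID and Client Secret from the Google Cloud console should return a value that can be entered as the Refresh Token in the storage settings.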

tilal.ahmad · September 21, 2022, 4:38am

@Gitika_Srivastava

Please find below sample code that converts a document from Cloud storage. You can amend it as per your requirements.

# Import module
import groupdocs_conversion_cloud
from shutil import copyfile

# Get your app_sid and app_key at https://dashboard.groupdocs.cloud (free registration is required).
app_sid = "xxxxx-xxxx-xxxx-xxxxxxxxxxx"
app_key = "xxxxxxxxxxxxxxxxxxxxxxxxxx"

# Create instance of the API
convert_api = groupdocs_conversion_cloud.ConvertApi.from_keys(app_sid, app_key)
file_api = groupdocs_conversion_cloud.FileApi.from_keys(app_sid, app_key)


try:

        # Upload source file to storage
        filename = 'Book1.xlsx'
        remote_name = 'Book1.xlsx'
        output_name= 'Book1_LoadOptions.png'
        strformat='png'
        storage='GD_google'

        request_upload = groupdocs_conversion_cloud.UploadFileRequest(remote_name,filename,storage)
        response_upload = file_api.upload_file(request_upload)
        
        #Convert XLSX to PNG
        settings = groupdocs_conversion_cloud.ConvertSettings()
        settings.file_path =remote_name
        settings.format = strformat
        settings.output_path = output_name
        settings.storage_name = storage

        loadOptions = groupdocs_conversion_cloud.SpreadsheetLoadOptions()
        loadOptions.hide_comments = True
        loadOptions.one_page_per_sheet = True
        loadOptions.convert_range = 'I5:K6'

        settings.load_options = loadOptions

        convertOptions = groupdocs_conversion_cloud.PngConvertOptions()
        convertOptions.from_page = 1
        convertOptions.pages_count = 1
            
        settings.convert_options = convertOptions
                
        requestConvertXLSXtoPngPython = groupdocs_conversion_cloud.ConvertDocumentRequest(settings)
        responseConvertXLSXtoPngPython = convert_api.convert_document(requestConvertXLSXtoPngPython)
        print("Document converted successfully: " + str(responseConvertXLSXtoPngPython))
        
        #Download Document from Storage        
        request_download = groupdocs_conversion_cloud.DownloadFileRequest(output_name,storage)
        response_download = file_api.download_file(request_download)
       
        copyfile(response_download, output_name)
        print("Result {}".format(response_download))
        
except groupdocs_conversion_cloud.ApiException as e:
        print("Exception when calling the GroupDocs Conversion Cloud API: {0}".format(e.message))
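Since the original question was about PDF to Word, here is a minimal sketch of the same flow adapted to DOCX output. It reuses only the SDK calls shown in the sample above. The storage name, the file names, and the helper names docx_name_for and convert_pdf_in_storage are placeholders for illustration, and the SDK import sits inside the function only so that the name helper can be used on its own.

```python
from pathlib import Path
from shutil import copyfile


def docx_name_for(pdf_name):
    """Return the output name for a PDF, e.g. 'Test final.pdf' -> 'Test final.docx'."""
    return str(Path(pdf_name).with_suffix(".docx"))


def convert_pdf_in_storage(app_sid, app_key, local_pdf, storage):
    """Upload a PDF to the named storage, convert it to DOCX there, download the result."""
    import groupdocs_conversion_cloud as gcc  # requires the GroupDocs SDK

    convert_api = gcc.ConvertApi.from_keys(app_sid, app_key)
    file_api = gcc.FileApi.from_keys(app_sid, app_key)

    remote_name = Path(local_pdf).name
    output_name = docx_name_for(remote_name)

    # Upload the source file to the configured storage
    file_api.upload_file(gcc.UploadFileRequest(remote_name, local_pdf, storage))

    # Convert PDF to DOCX inside that storage
    settings = gcc.ConvertSettings()
    settings.file_path = remote_name
    settings.format = "docx"
    settings.output_path = output_name
    settings.storage_name = storage
    convert_api.convert_document(gcc.ConvertDocumentRequest(settings))

    # Download the converted file and copy it into the working directory
    downloaded = file_api.download_file(gcc.DownloadFileRequest(output_name, storage))
    copyfile(downloaded, output_name)
    return output_name
```

For example, convert_pdf_in_storage(app_sid, app_key, "Test final.pdf", "GD_google") would be expected to leave "Test final.docx" in the working directory, provided a storage named GD_google is configured in the dashboard.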

tilal.ahmad · October 19, 2022, 2:50pm

A post was split to a new topic: Python Bulk Find and Replace in Word Document