Error converting HTML to DOCX: unexpected Unicode characters in the output

ans11 · November 25, 2020, 10:01pm

I have a project where I need to convert a PDF to HTML, make some formatting changes, and then convert that HTML to DOCX. I can't convert directly from PDF to DOCX because of the intermediate formatting changes. PDF to HTML works perfectly, but I get unexpected Unicode characters in the DOCX when I convert the HTML to DOCX. Can you tell me why this happens?

tilal.ahmad · November 26, 2020, 3:02am

@ans11

We would appreciate it if you could share your input HTML and output DOCX as a ZIP file. This will help us understand the exact issue and address it.

ans11 · November 26, 2020, 8:01am

demo11.zip (3.3 MB)

I took a PDF, converted its first 10 pages to HTML, and then converted that HTML to DOCX. The ZIP contains both the HTML and the resulting DOCX. We are interested in purchasing the premium plan if this works well.

tilal.ahmad · November 26, 2020, 4:54pm

@ans11

Thanks for sharing the sample HTML file. We converted the HTML to DOCX with both GroupDocs.Conversion Cloud and MS Word, and both produce the same problem, which appears to be font-related. We would appreciate it if you could also share your input PDF document and any custom fonts it uses. We will investigate the issue further and share our findings with you.

ans11 · November 30, 2020, 10:16am

I have attached the PDF. Regarding the fonts, we expected the HTML-to-DOCX conversion to retain them, since the HTML renders nicely in the browser.

We checked the fonts in the DOCX: it lists the base font plus a subset of it, named something like Nudi04e + RSTYGF. Can you tell us why the fonts are not loading?

4th-language-savikannada-1-1-51-1-10.pdf (1.7 MB)
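One way to build the list of custom fonts requested in this thread is to scan the HTML for the font families it references. The following is a minimal Python sketch using only the standard library. It is not part of GroupDocs.Conversion Cloud, and `FontCollector` and `referenced_fonts` are hypothetical helper names. It assumes the HTML declares fonts through CSS `font-family` rules (inline `style` attributes or `<style>` blocks) or legacy `<font face>` attributes, and that subset fonts follow the PDF naming convention of a six-letter uppercase tag joined with `+` before the base name (for example `RSTYGF+Nudi04e`).

```python
import re
import sys
from html.parser import HTMLParser

# Captures the value of a CSS font-family declaration up to ';' or '}'.
FONT_FAMILY_RE = re.compile(r"font-family\s*:\s*([^;}]+)", re.IGNORECASE)
# Assumed PDF subset convention: six uppercase letters + '+', e.g. "RSTYGF+Nudi04e".
SUBSET_TAG_RE = re.compile(r"^[A-Z]{6}\+")


class FontCollector(HTMLParser):
    """Hypothetical helper: collects font names referenced by an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.fonts: set[str] = set()
        self._in_style = False

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._in_style = True
        for name, value in attrs:
            if name == "style" and value:
                self._collect_css(value)  # inline style="font-family: ..."
            elif tag == "font" and name == "face" and value:
                self._add_names(value)  # legacy <font face="...">

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        if self._in_style:
            self._collect_css(data)  # rules inside <style> blocks

    def _collect_css(self, css: str) -> None:
        for match in FONT_FAMILY_RE.finditer(css):
            self._add_names(match.group(1))

    def _add_names(self, value: str) -> None:
        # A font-family value is a comma-separated list of (optionally quoted) names.
        for raw in value.split(","):
            name = raw.strip().strip("'\"").strip()
            name = SUBSET_TAG_RE.sub("", name)  # drop the subset tag, keep the base font
            if name:
                self.fonts.add(name)


def referenced_fonts(html: str) -> list[str]:
    """Return the sorted, de-duplicated font names referenced in the HTML."""
    collector = FontCollector()
    collector.feed(html)
    collector.close()
    return sorted(collector.fonts)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (script name is hypothetical): python list_fonts.py page.html
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for font in referenced_fonts(fh.read()):
            print(font)
```

Each non-generic name it prints (for example `Nudi04e`, but not `serif`) is a candidate font file to install on the machine doing the conversion or to send to support alongside the PDF.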

tilal.ahmad · November 30, 2020, 4:23pm

@ans11

Thanks for sharing your input PDF document. We have logged ticket CONVERSIONCLOUD-396 for further investigation and a fix. We will complete the investigation as soon as possible and update you.