Extract Specific Data from PDF using Python

DorukEmek · November 28, 2022, 8:31am

When I use the API here, I manage to get the data, but only the first line. Is there a way to get more than a single line?

tilal.ahmad · November 28, 2022, 4:04pm

@DorukEmek

Please share your sample document along with the code. We will look into it and guide you.

DorukEmek · November 28, 2022, 9:05pm

https://drive.google.com/drive/folders/1BfqS-9-0BjE9UwILPsLGMiLY1nmtu8tC?usp=sharing
You can find my project and the article I used for testing here.

tilal.ahmad · November 29, 2022, 7:32am

@DorukEmek

Thanks for sharing the code and input file. We are looking into the issue and will update you shortly.

DorukEmek · November 30, 2022, 8:03pm

Hi, sorry for bothering you, but is there any update on this?

DorukEmek · November 30, 2022, 10:57pm

Hi, I managed to fix it by removing autoscale. Now I’m trying to find out whether there’s a way to stop extracting data when the program finds a specific keyword. For example, it would start extracting from “Abstract” and stop once it reaches “keywords”.

tilal.ahmad · December 1, 2022, 9:52am

@DorukEmek

It is good to know that you have managed to resolve the issue on your own.

We have logged a ticket (PARSERCLOUD-331) to investigate your requirement and will post the results as soon as possible.

DorukEmek · January 29, 2023, 11:53pm

Sorry for bothering you, but is there any progress on this?

tilal.ahmad · January 30, 2023, 4:04am

@DorukEmek

Please note that the requirement above is a new feature and is still unresolved. For your information, the feature will first be implemented in GroupDocs.Parser for .NET, the parent API of GroupDocs.Parser Cloud, and will later be ported to GroupDocs.Parser Cloud.
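
Until such a feature exists, the start/stop behaviour asked about in this thread (begin at “Abstract”, stop at “keywords”) can probably be approximated on the client side, after the full text has already been extracted. The following is only a minimal sketch in plain Python using the standard `re` module. It does not call any GroupDocs API, and the function name `extract_between`, its parameters, and the sample text are all hypothetical and chosen for illustration.

```python
import re
from typing import Optional


def extract_between(text: str, start: str = "Abstract",
                    stop: str = "Keywords") -> Optional[str]:
    """Return the text after the first `start` marker and before the
    next `stop` marker. Hypothetical helper, not part of any GroupDocs SDK.

    Markers are matched case-insensitively and as whole words.
    Returns None when the start marker is missing; when the stop marker
    is missing, everything up to the end of the text is returned.
    """
    start_pattern = re.compile(rf"\b{re.escape(start)}\b", re.IGNORECASE)
    stop_pattern = re.compile(rf"\b{re.escape(stop)}\b", re.IGNORECASE)

    start_match = start_pattern.search(text)
    if start_match is None:
        return None

    # Look for the stop marker only after the start marker.
    stop_match = stop_pattern.search(text, start_match.end())
    end = stop_match.start() if stop_match else len(text)

    section = text[start_match.end():end]
    # Drop separators that typically follow a heading (e.g. "Abstract:").
    return section.lstrip(" \t\r\n:.-").rstrip()


if __name__ == "__main__":
    # Assumed sample text standing in for the already-extracted PDF text.
    sample = (
        "Some Article Title\n"
        "Abstract: This paper studies X.\n"
        "It has two lines.\n"
        "keywords: parsing, pdf\n"
        "1. Introduction\n"
    )
    print(extract_between(sample))
```

Because this works on text that has already been extracted, it depends on the parser returning the whole page text in reading order. If a marker word also occurs earlier in the document (for example in a title), the first occurrence wins, so the markers may need to be made more specific.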