Using the viewer with a pre-existing Amazon S3 bucket

Greetings!

I am in the process of evaluating GroupDocs’ Cloud-based Viewer (and, later, Annotation) product for integration with my company’s web applications. I was pleased to see that there is a working online demo that I can play with, even allowing me to integrate it with our pre-existing Amazon S3 buckets. Very impressive!

I do have some questions, however. As I said above, we have an Amazon S3 bucket that we want to use as our storage solution. This bucket has literally millions of objects (PDFs, RTFs, JPGs, you name it). Through the Dashboard, I am able to navigate these objects, open them with the viewer, annotate them, embed them in toy HTML pages, the works. My question is this: using the Cloud APIs, how can I correlate my S3 objects’ keys (which, in the dashboard, are called “File Name”) to GroupDocs GUIDs? I see how to find it based on your documentation, but for my company’s purposes we need to be able to get the GUID procedurally, without relying upon the UI. (Again, we have literally millions of pre-existing documents that we’d like to integrate with the embedded viewer; being able to procedurally link the unique file names to the GUIDs is absolutely critical to our process if we use these tools!)

I have been digging into the documentation for a while and haven’t found the answer. Please advise!

Thanks in advance!

Hello,

Thank you for your inquiry.

Yes, you can get the document GUID programmatically via our Cloud API. For this, use the ListEntities method. Its response contains information about all of your files; loop over the response object to find the required document by its file name. You can also check our demo sample for the PHP SDK.
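
Here is a minimal sketch in Python of that lookup loop. The endpoint path, paging parameters, and response field names are illustrative assumptions; please verify the exact ListEntities signature in the API reference, and note that real requests must also be authenticated:

import requests

API_BASE = "https://api.groupdocs.com/v2.0"  # assumed base URL
CLIENT_ID = "your-client-id"                 # placeholder credential

def find_guid_by_name(file_name, path=""):
    """Page through ListEntities results and return the GUID for file_name."""
    page = 0
    while True:
        # Assumed ListEntities route and paging parameters; verify in the docs.
        resp = requests.get(
            f"{API_BASE}/storage/{CLIENT_ID}/folders/{path}",
            params={"page": page, "count": 100},
        )
        resp.raise_for_status()
        files = resp.json().get("result", {}).get("files", [])
        for entity in files:
            if entity.get("name") == file_name:
                return entity.get("guid")
        if not files:  # ran out of pages without a match
            return None
        page += 1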

If you have any more questions, please feel free to contact us.

Best regards,
Evgen Efimov

http://groupdocs.com
Your Document Collaboration APIs
Follow us on LinkedIn, Twitter, Facebook and Google+

Hello,

Thank you for getting back to us.

We have checked this on our test account, and the indexing of files from the Amazon bucket appears to be working well for us; the ListEntities method can index all of our test files. It seems that you are hitting the issue because you have a few million files in your bucket.

We have discussed your issue with our product team, and they will investigate it. We have logged the problem in our issue tracking system as CORE-2165 and linked the ticket to this forum thread; you will be notified here once we have any results.

Thanks for your patience.


Best regards,
Evgen Efimov

Thanks for the quick response! I understand how to use the ListEntities method now and have been able to use it to fulfill some of my requirements.

However, in the process of testing further, I have discovered a few more problems that I am unable to find the solution to. Two problems, to be precise. Before I describe the two problems in detail, let me first clarify the structure of my company’s S3 bucket. The bucket is shallow (essentially no sub-folders) and contains a few million files. These files all have UUIDs for file names, so there are files named things like “00001234-5678-49ab-8def-0123456789ab”, “fffffedc-ba98-4765-b432-10fedcba9876”, and everything in-between.

The first problem that I’ve discovered is that only files whose names start with “0” are searchable. That is, I can easily find the GUID for the file “00001234-5678-49ab-8def-0123456789ab”, but when I search for any file whose name starts with the characters 1-9 or a-f, the ListEntities method returns an empty result with no files. That’s rather disconcerting, and the upshot is that about 90-95% of my data is unsearchable. Can you explain why this would be the case? Is it because I’m using a trial account and there is some limit on the amount of data that’s indexed for filtering?

The second problem that I’ve encountered is that files uploaded to my test S3 bucket after linking my GroupDocs client to my S3 storage engine are similarly not searchable via the ListEntities API method. It appears that they are not indexed for filtering. I wonder if it’s related to the first problem I listed above. Does GroupDocs offer real-time indexing of the content as it exists in the S3 bucket? Or is there a separate method that I can use to index new files programmatically? This is crucial functionality for my use-case, because my company’s clients upload content to this S3 bucket through a number of external APIs (i.e., not GroupDocs); we want the content to be searchable and viewable regardless of how it got into the bucket. How can I accomplish this?

Again, thank you for answering my first post so swiftly and accurately. I look forward to your further replies!


Hello,

We have good news for you.

Our product team has fixed your issue; please try again to retrieve and index all of your files via the ListEntities method.

Please let us know your testing results.


Best regards,
Evgen Efimov

Hmm,

No, I’m afraid it does not appear to be fixed. I am still only able to use the ListEntities API to find entities whose names start with “0”. Do I have to re-synchronize my bucket or something?

At present, I have developed a workaround where our application re-uploads any file that is not searchable through the GroupDocs API. It’s slow and obviously a nasty hack, but at this time it appears to be the only thing that works.
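
For reference, here is roughly what the workaround does, sketched in Python. It reuses API_BASE, CLIENT_ID, and find_guid_by_name from the sketch earlier in this thread, and the upload route is my own assumption; adjust it to match the Storage Upload API documentation:

import boto3
import requests

s3 = boto3.client("s3")
BUCKET = "my-company-bucket"  # placeholder bucket name

def ensure_searchable(key):
    """Re-upload an S3 object through GroupDocs if it is not yet indexed."""
    if find_guid_by_name(key) is not None:
        return  # already searchable; nothing to do
    body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    # Assumed upload route; real calls must also be authenticated.
    resp = requests.post(f"{API_BASE}/storage/{CLIENT_ID}/folders/{key}", data=body)
    resp.raise_for_status()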

Hello Nick,

We are sorry that you are still getting the issue with our API method. In our previous investigation we found that not all files from Amazon were retrieved due to the large number of objects. We fixed that issue, but it looks like some problems still remain after the fix, and we will definitely fix them.

We will notify you when we have some results.


Best regards,
Evgen Efimov

Thanks for looking into this. As I pointed out in my previous message, I have devised a workaround that, in theory, works for us right now.

My company has decided to move forward with GroupDocs’ cloud viewer and has purchased a “Startup Basic” plan in order to move forward with our development process. Shortly after doing this, however, I discovered that I am no longer able to upload files through the GroupDocs Storage Upload API. When I try to, I get the following error:

{result: class UploadRequestResult {
    adj_name: null
    url: null
    type: Cells
    file_type: Doc
    size: 0
    version: 0
    view_job_id: null
    thumbnail: null
    upload_time: -62135596800000
    id: 0.0
    guid: null
  }
  status: StorageLimitExceeded
  error_message: There is no enough space in the storage.
  composedOn: 1443052673749
}

And when I review the Usage tab under my GroupDocs Profile, it says that I’ve used up 12GB of my 2GB allotted storage.

At this point, I have a few questions, but I feel that this one is the most important: if my company is using a privately-owned Amazon S3 bucket for storage, why is GroupDocs apparently tracking a storage quota? We are already paying Amazon to host our files, so it doesn’t make sense to me that there would be a limitation on how much I can upload. I understand that there are limitations on the number of operations we can perform per month, and I would even understand if there were bandwidth restrictions… Can you explain what this means?

Hello Nick,

Thank you for your request.

We investigated why you cannot upload more files. The issue is related to how space is calculated for external storage and how it is compared against the free storage available for your plan.

To fix the issue for you, we have increased your extra storage space, and you can now continue your operations with files. Please let us know whether the issue is fixed for you. We will also work to improve the space calculation in a new version of the platform.

Our product team has also investigated your issue with the Amazon sync and has not found any problems with it; it simply takes a long time in your case because you have a lot of files. If you call the ListEntities method before the sync has completed, you cannot yet get all of your files from it.
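
Until the sync completes, one way to gauge its progress is to compare the number of objects in the bucket with the number of entities that ListEntities returns. A minimal sketch of the S3 side, using boto3 (the bucket name is a placeholder):

import boto3

def s3_object_count(bucket):
    """Count all objects in a bucket via paginated ListObjectsV2 calls."""
    s3 = boto3.client("s3")
    count, token = 0, None
    while True:
        kwargs = {"Bucket": bucket}
        if token:
            kwargs["ContinuationToken"] = token
        page = s3.list_objects_v2(**kwargs)
        count += page.get("KeyCount", 0)
        if not page.get("IsTruncated"):
            return count
        token = page["NextContinuationToken"]

When this count matches the total that ListEntities reports, the sync has most likely caught up.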

We continue to work on improving our service, but in your case, at this moment, it would be better to keep using your workaround.


Best regards,
Evgen Efimov

Hello,

This is happening once again; however, now it is happening with our production account (client ID is ‘65301fb48084ce7f’). I thought you said this was fixed.

This is an extremely dire problem; it is blocking our company’s clients from using our application. The error that we are getting is “StorageLimitExceeded” whenever our users try to upload something through the GroupDocs Upload API, which (again) doesn’t make sense since we’re providing our own S3 storage.

Hello,

We are sorry that you are having the issue again with your other account.

We have reported the issue to our product team, and they have now fixed it. Could you please check this functionality again and let us know the results?


Best regards,
Evgen Efimov