Storage API credits when hosting on private, self-managed infra

From the API usage I can see that I’m charged one credit for each upload/download.

I have a few questions about API billing under the cloud plan; I’d appreciate it if someone could help answer them:

  1. I’m confused as to what exactly the difference is between self-hosting and cloud hosting with respect to storage credits, specifically this sentence in the docs: “The only exception is with on-premise Docker storage API calls are not billed.” More specifically, if I deploy the ‘groupdocs.merger’ Docker image in our infra, would we still be charged for storage APIs?

  2. Can the Docker image (groupdocs.merger) connect to AWS S3 rather than the default internal storage? I see the docs mention connecting to Google Cloud.

  3. Due to certain security/contract concerns, we are more interested in storing data on our infra only. With the Docker image option, can that be integrated (specifically for the groupdocs.merger image)?

Thanks

Hi, @osdevv ! Thank you for your interest in GroupDocs.Merger Cloud. Let me try to answer your questions:

1. Difference Between Self-hosting and Cloud Hosting Regarding Storage Credits

  • The key difference is that storage-related API calls (uploads and downloads) are not billed when using the self-hosted Docker image, including when the image is deployed in your own infrastructure.
  • In contrast, if you use the cloud-hosted option (GroupDocs.Cloud), storage API calls (such as uploads/downloads) are charged credits as part of the API usage.

Example:

  • If you deploy the groupdocs.merger Docker image in your infrastructure, you will not be charged for storage API calls.
  • This aligns with the statement in the documentation: “The only exception is with on-premise Docker storage API calls are not billed.”

2. Does the Docker Image Connect to AWS S3 or Google Cloud Storage?

  • By default, the Docker image uses local storage within the container for file operations.
  • However, it can be configured to connect to Google Cloud Storage by providing the following environment variables:
    • GOOGLE_APPLICATION_CREDENTIALS: Path to the JSON file containing Google Cloud Storage credentials.
    • GOOGLE_STORAGE_BUCKET: Name of the Google Cloud Storage bucket.

Note: The readme does not mention AWS S3 as a direct integration option. If AWS S3 is required, you may need to handle it externally or through custom integration.
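For example, a container start with Google Cloud Storage configured might look like this (a sketch only: the bucket name and the in-container credentials path are placeholders, and the license variables are omitted):

docker run -it --rm -e "GOOGLE_APPLICATION_CREDENTIALS=/data/gcs-credentials.json" -e "GOOGLE_STORAGE_BUCKET=<my-bucket>" -v "c:/data:/data" -p 8080:80 groupdocs/merger-cloud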


3. Using the Docker Image with Internal Infrastructure for Data Storage

  • Yes, the Docker image (groupdocs.merger) can be configured to use local storage within your infrastructure. This is the default behavior if no external storage configuration (e.g., Google Cloud Storage) is provided.
  • For organizations with strict security requirements, this ensures that all data remains within your controlled infrastructure.

Steps:

  • Deploy the Docker container in your infrastructure.
  • Mount a local folder from your infrastructure to the container’s /data path using the -v option in the Docker command.
    • Example: -v $(pwd)/data:/data.
  • This ensures all files are stored locally within your infrastructure, adhering to your security requirements (see the example call below).
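For illustration, here is a minimal Python sketch of a call against such a deployment. The host, port, and file names are placeholders (it assumes the container was started with -p 8080:80 and the -v mount above); note that FilePath values are relative to the mounted /data folder, not absolute container paths:

import requests

# Minimal sketch: with the container started with `-v $(pwd)/data:/data`,
# a file saved at ./data/input/a.pptx is referenced by its path relative
# to the mount, i.e. "input/a.pptx".
payload = {
    "JoinItems": [
        {"FileInfo": {"FilePath": "input/a.pptx"}},
        {"FileInfo": {"FilePath": "input/b.pptx"}},
    ],
    "OutputPath": "output/merged.pptx",  # lands in ./data/output/merged.pptx
}

response = requests.post(
    "http://localhost:8080/v1.0/merger/join",  # placeholder host/port
    json=payload,
    headers={"accept": "application/json", "Content-Type": "application/json"},
)
response.raise_for_status()
print(response.json())  # e.g. {"path": "output/merged.pptx"}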

Summary

  1. Storage Billing: Storage API calls (uploads/downloads) are not billed when using the self-hosted Docker image.
  2. Cloud Storage Integration: The Docker image supports Google Cloud Storage for external storage but does not mention AWS S3 integration in the provided documentation.
  3. Local Storage: The Docker image can use local storage, ensuring all data remains within your infrastructure, which is suitable for security-sensitive environments.

Got it. That clears up all my questions. Thanks @sergei.terentev

A few questions though, now that I’ve deployed the Docker image and set the license keys:

  1. I’ve done my POC with the Python package functions. Can I assume the POC findings (merging behaviour, functionality) still hold true with the image?
  2. The Docker page instructions say ‘Note: Authentication is required in case you’re going to use SDK.’ What does this mean? My use case is to call ‘POST /merger/join’ as described in Swagger, and I don’t see those requests going out with any ID/secret, so my assumption is that authentication is not required. Can you please confirm? And if I do have to use it, can I set my existing Client ID & Secret keys in the Docker image envs?
  3. All APIs in Swagger refer to storageName. Do I need to create it first? I don’t see an API to create one. And how does it logically translate to the volume I attached; is it an actual folder? More specifically, I’ve attached AWS EFS to the deployed image’s container. I can create folders in EFS separately, but how does it link to the GroupDocs storageName?

@sergei.terentev can you please help with the above questions?

Swagger is up at: http://dev.groupdocs-merger/swagger/index.html
Health checks return ‘Healthy’

But specifically, when I try the curl below:

curl -X 'POST' \
  'http://dev.groupdocs-merger/v1.0/merger/join' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "JoinItems": [
    {
      "FileInfo": {
        "FilePath": "/pptx_data/1/ppt1.pptx",
        "StorageName": ""
      },
      "Pages": [
        1, 4, 5
      ]
    },
    {
      "FileInfo": {
        "FilePath": "/pptx_data/1/ppt2.pptx",
        "StorageName": ""
      },
      "Pages": [
        2, 4, 7
      ]
    }

  ],
  "OutputPath": "/pptx_data/1/merged_output.pptx"
}'

I get

{
    "code": "errorItemNotFound",
    "message": "Can't find file located at '/pptx_data/1/ppt1.pptx'.",
    "description": "Operation Failed. Item Not Found.",
    "dateTime": "2025-04-17T16:14:30.7163186Z",
    "innerError": null
}

Also, when I try the same request from my internal app via Python, I get a 404:
'404 Client Error: Not Found for url: http://dev.groupdocs-merger/v1.0/merger/join'


It’s also weird that I get true for any storageName in this Swagger API:

curl 'http://dev.groupdocs-merger/v1.0/merger/storage/pptx_data/exist' \
  -H 'Accept-Language: en-GB,en-US;q=0.9,en;q=0.8' \
  -H 'Connection: keep-alive' \
  -H 'Referer: http://dev.groupdocs-merger/swagger/index.html' \
  -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36' \
  -H 'accept: application/json' \
  --insecure

Response:
{"exists":true}

I suspect it could be due to either:

  1. storageName not being set, but I’m not sure how to set this?
  2. For the second issue (the Python 404), maybe a different service URL is required?

Hi, @osdevv , answering your questions:

1. Does the POC with Python Package Functions Hold True for the Docker Image?

  • Yes, the behavior and functionality of the Docker image are consistent with the Python package. The Docker image uses the same GroupDocs.Merger Cloud API backend, so your POC results (e.g., merging behavior and functionality) will remain valid when using the Docker image.

2. What Does “Authentication is Required in Case You’re Going to Use SDK” Mean?

  • SDKs (e.g., Python, .NET, etc.) are designed to work with GroupDocs Cloud APIs, where authentication is required. Before making API calls, SDKs automatically obtain an authentication token using the CLIENT_ID and CLIENT_SECRET provided. This is why authentication is mandatory when using SDKs.
  • However, when using the Docker image and directly calling API endpoints (e.g., POST /merger/join) without an SDK, authentication is not required unless explicitly enabled in the Docker container by setting CLIENT_ID and CLIENT_SECRET environment variables.

Confirmation for Your Use Case:

  • Since you are directly using POST /merger/join as mentioned in Swagger, and you do not see requests requiring CLIENT_ID or CLIENT_SECRET, your assumption is correct: authentication is not required in this case.
  • If you want to enable authentication, you can set the CLIENT_ID and CLIENT_SECRET environment variables in the Docker container; a sketch of the resulting call flow is shown below.
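For completeness, here is a rough Python sketch of what the authenticated flow might look like. Treat it as an assumption: it presumes the self-hosted image exposes the same OAuth2 client-credentials endpoint (connect/token) as the cloud API once CLIENT_ID and CLIENT_SECRET are set; the host and credentials are placeholders, so please verify the exact route against your container’s Swagger page:

import requests

BASE = "http://localhost:8080"  # placeholder container address

# Obtain a bearer token via OAuth2 client credentials. The /connect/token
# route mirrors the cloud API and is an assumption for the self-hosted image.
token_response = requests.post(
    f"{BASE}/connect/token",
    data={
        "grant_type": "client_credentials",
        "client_id": "<CLIENT_ID>",
        "client_secret": "<CLIENT_SECRET>",
    },
)
token_response.raise_for_status()
access_token = token_response.json()["access_token"]

# Subsequent API calls carry the token in the Authorization header.
response = requests.post(
    f"{BASE}/v1.0/merger/join",
    json={
        "JoinItems": [{"FileInfo": {"FilePath": "input/a.pptx"}}],
        "OutputPath": "output/merged.pptx",
    },
    headers={"Authorization": f"Bearer {access_token}"},
)
response.raise_for_status()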

3. Do I Need to Create storageName First?

  • The storageName parameter originates from the GroupDocs Cloud APIs, where it is used to specify different storage providers (e.g., AWS S3, Google Cloud Storage, etc.).
  • In the self-hosted Docker image, the storageName parameter is not applicable. The Docker image uses local storage, AWS S3, or Google Cloud Storage, depending on the container configuration.

How It Works with Your Setup:

  • Since you have attached AWS EFS to the container, that storage is mapped to the /data directory inside the container.
  • You can reference files and folders within this storage using the path parameters in your API calls; the storageName can be left empty or ignored (see the path-mapping sketch below).
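To illustrate the mapping, a small sketch (the paths are hypothetical; it assumes EFS is mounted at the container’s /data directory):

# Hypothetical mapping, assuming EFS is mounted at the container's /data:
efs_path = "/data/1/input/ppt1.pptx"        # where your app writes the file
api_path = efs_path.removeprefix("/data/")  # -> "1/input/ppt1.pptx"

# FilePath in API calls is relative to the mount; StorageName stays empty.
file_info = {"FilePath": api_path, "StorageName": ""}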

Hi, @osdevv , to help you with the errorItemNotFound error, can you please share your container configuration parameters (replacing secrets with placeholders)?

And to help you with the Python code, can you please share that as well? Are you using the GroupDocs.Merger Cloud SDK for Python?

“It’s also weird that I get true for any storageName in this Swagger API” - this is because storageName is not applicable in self-hosted versions; it is only used with the cloud APIs.

@sergei.terentev thanks for answering each of the questions.

I’ve tried with the data folder, and merger/join now works correctly from Swagger.

The only remaining issue is that the API call from Python still returns

{
    "error": "404 Client Error: Not Found for url: http://dev.groupdocs-merger/v1.0/merger/join"
}

However, the same API works from Swagger.

Here’s the information you asked for:

  • We are using NO library as of now, just a plain API call. Earlier, for cloud, we were using groupdocs-merger-cloud==24.11 in requirements.txt

Function to call groupdocs.merger API:

import requests

def join_documents(logger, payload):
    url = "http://dev.groupdocs-merger/v1.0/merger/join" # Todo: Pick from ENV
    headers = {
        "accept": "application/json",
        "Content-Type": "application/json"
    }

    try:
        response = requests.post(url, json=payload, headers=headers)
        response.raise_for_status()
        res = response.json()
        logger.debug(f"Response from groupdocs.merger: {res}")
        return res
    except requests.RequestException as e:
        logger.error(f"Groupdocs merger: Error occurred: {e}")
        raise e

Code calling above API:

import glob
import os
import shutil
import tempfile
from uuid import uuid4

import falcon

@with_logger('pptx-merge-docker')
class PptxDockerMerge:
    def on_post(self, req: falcon.Request, resp: falcon.Response) -> None:
        self.logger.info("Received pptx merge request")
        try:
            body = req.media
            S3 = S3Utility()
            tenant_id = body.get("tenant_id", "")
            efs_input_dir = f"/data/{tenant_id}/input"
            efs_output_dir = f"/data/{tenant_id}/output"
            os.makedirs(efs_input_dir, exist_ok=True)
            os.makedirs(efs_output_dir, exist_ok=True)
            files = body.get("files", [])
            merged_file_name = body.get("merged_file_name", f"merged_{uuid4().hex}.pptx")
            merged_file_path = f"{efs_output_dir}/{merged_file_name}"
            upload_path = body.get("upload_path", None)

            self.logger.info(f"Merging {len(files)} slides.")
            temp_dir = tempfile.mkdtemp()
            downloaded_files = []
            input_join_req = {
                "JoinItems": [],
                "OutputPath": merged_file_path
            }

            for file in files:
                s3_path = file.get('s3_path')
                file_name = os.path.basename(s3_path)
                local_path = os.path.join(temp_dir, file_name)
                S3.download_file_from_s3(s3_path, local_path)
                # Move to EFS /data/input/
                efs_target_path = os.path.join(efs_input_dir, file_name)
                shutil.copyfile(local_path, efs_target_path)
                downloaded_files.append(local_path)
                input_join_req["JoinItems"].append({
                    "Pages": file.get('pages', []),
                    "FileInfo": {
                        "FilePath": efs_target_path,
                        "StorageName": ""
                    }

                })
                self.logger.info(f"Downloaded and moved {file_name} to {efs_target_path}")
           
            # Check files in EFS once
            pptx_files = glob.glob(f"/data/{tenant_id}/input/*.pptx")
            self.logger.debug(f"Files in /data/{tenant_id}/input: {pptx_files}")

            # Call groupdocs.merger API
            join_documents(logger=self.logger, payload=input_join_req)
            self.logger.debug(f"Merged docs at: {efs_target_path}. Uploading to S3")
            S3.upload_to_s3(efs_target_path, upload_path)
            self.logger.info(f"Uploaded merged pptx on {upload_path}")

            # Todo: Remove files from EFS
            resp.status = falcon.HTTP_200
            resp.media = {
                "message": "Merged PPTX uploaded successfully",
                "upload_path": upload_path
            }

        except Exception as e:
            self.logger.exception("Failed to merge PPTX slides")
            resp.status = falcon.HTTP_500
            resp.media = {"error": str(e)}

Hi, I have tried to run your join_documents method and it works fine.

  • First, I ran the container using this command:
    docker run -it --rm -e "LICENSE_PUBLIC_KEY=<my_key>" -e "LICENSE_PRIVATE_KEY=<my_private_key>" -v "c:/data:/data" -p 8080:80 groupdocs/merger-cloud
  • I checked it by opening the URL http://localhost:8080/swagger/index.html in a browser.
  • I put two files into the c:\data folder: test.pptx and three-slides.pptx.
  • I wrote and ran the following Python code, only changing the url in the join_documents method:
import requests
import logging

def join_documents(logger, payload):
    url = "http://localhost:8080/v1.0/merger/join" # Todo: Pick from ENV
    headers = {
        "accept": "application/json",
        "Content-Type": "application/json"
    }

    try:
        response = requests.post(url, json=payload, headers=headers)
        response.raise_for_status()
        res = response.json()
        logger.debug(f"Response from groupdocs.merger: {res}")
        return res
    except requests.RequestException as e:
        logger.error(f"Groupdocs merger: Error occurred: {e}")
        raise e

if __name__ == "__main__":

    # Configure logger
    logging.basicConfig(level=logging.DEBUG)
    logger = logging.getLogger(__name__)

    # Define payload
    payload = {
        "JoinItems": [
            {
                "FileInfo": {
                    "FilePath": "test.pptx"
                }
            },
            {
                "FileInfo": {
                    "FilePath": "three-slides.pptx"
                }
            }
        ],
        "OutputPath": "output.pptx"
    }

    # Call the join_documents function
    try:
        result = join_documents(logger, payload)
        logger.info(f"Document join successful. Result: {result}")
    except Exception as e:
        logger.error(f"Failed to join documents: {e}")

Result I got:

>python test.py
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost:8080
DEBUG:urllib3.connectionpool:http://localhost:8080 "POST /v1.0/merger/join HTTP/1.1" 200 22
DEBUG:__main__:Response from groupdocs.merger: {'path': 'output.pptx'}
INFO:__main__:Document join successful. Result: {'path': 'output.pptx'}
  • I checked the result file c:\data\output.pptx and it was OK.

The error you get looks like something is wrong with the container deployment. Maybe I could reproduce the issue if you share step-by-step instructions for triggering it.

It is working now.
Thanks @sergei.terentev for your quick help.

I was additionally prepending /data/ to the file paths; they should be relative to the mount (e.g. 1/input/ppt1.pptx rather than /data/1/input/ppt1.pptx).
The error was never really that Python was getting a 404 for the URL. When I added additional logging, I saw the response text gave:

{
    "code": "errorItemNotFound",
    "message": "Can't find file located at '/data/1/input/ppt1.pptx'.",
    "description": "Operation Failed. Item Not Found.",
    "dateTime": "2025-04-18T08:17:04.4040601Z",
    "innerError": null
}

Glad you have figured out the issue!

@sergei.terentev

For license consumption,
I tried the API below (v1.0 is not supported).
Response from the v1.0 API:

{
  "error": {
    "code": "UnsupportedApiVersion",
    "message": "The HTTP resource that matches the request URI 'http://dev.groupdocs-merger/v1.0/merger/consumption' does not support the API version '1.0'.",
    "innerError": null
  }
}

However, I tried with v2 and it worked (http://dev.groupdocs-merger/v2/merger/consumption)

It gave the result below:
{"credit":30.00000,"quantity":224551487.29045}

I see 2 credits consumed for each merge operation I’m doing. The code is almost the same as provided earlier (operation-wise). Can you help me understand what the other operation is, besides the merge, that adds the extra credit?

Also, is there some API where I can check exact API usage, like the cloud dashboard?

  • I have checked: yes, license consumption has only a v2.0 endpoint. This will be fixed in the next version; a v1.0 endpoint will also be supported, as with the other methods.

  • About credits: this is about the join method, right? The thing is that the join method first extracts pages from the 1st document, if pages or a page range are specified, and then performs the merge for the 2nd and subsequent documents. I assume each operation consumes a credit. In any case, please share your request (payload) for more information.

  • Is there some API where I can check exact API usage like the cloud dashboard?
    As far as I know, there is no dashboard for this, but you may request detailed logs from a sales manager, at the purchase forum, or by email: sales@groupdocs.cloud

Here’s the sample request which takes 2 credits:

{
  "JoinItems": [
    {
      "FileInfo": {
        "FilePath": "/1/input/ppt1.pptx",
        "StorageName": ""
      },
      "Pages": [
        1, 4, 5
      ]
    },
    {
      "FileInfo": {
        "FilePath": "/1/input/ppt2.pptx",
        "StorageName": ""
      },
      "Pages": [
        2, 4, 7
      ]
    }

  ],
  "OutputPath": "/1/output/merged_output.pptx"
}

I was under the impression that join would take 1 credit overall across multiple documents. (This is based on how the cloud API showed API usage in its logs; see the breakdown below.) There, only 1 credit was charged for the join operation; the rest were all storage APIs.

Here’s the breakdown from the API usage logs for one overall merge operation:

  1. http://api.groupdocs.cloud/v1.0/merger/storage/exist/input/ppt1.pptx [ObjectExists]
  2. http://api.groupdocs.cloud/v1.0/merger/storage/exist/input/ppt2.pptx [ObjectExists]
  3. http://api.groupdocs.cloud/v1.0/merger/join [Join]
  4. http://api.groupdocs.cloud/v1.0/merger/storage/file/output/temp22.pptx [DownloadFile]

My expectation was that, now with self-hosted infra, only step 3 above would be charged one credit.