Memory leak

piyush.rajput · June 24, 2026, 1:59pm

Hi @sergei.terentev
We tried the options below:

livenessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 10
  periodSeconds: 30
  failureThreshold: 3

# Env

env:
  - name: DOTNET_GCConserveMemory
    value: "9"

This caused more restarts with OOM. What might be the issue?

sergei.terentev · June 25, 2026, 6:14am

Thanks for the update — sorry to hear the restarts got worse, that’s not what we expected.

To figure out what’s going on, a few questions:

Do you have a memory limit set on the pod? Something like resources.limits.memory: "1Gi" in your deployment spec? If so, what’s the value? If not, do you know roughly how much memory the node has available?

What does kubectl describe pod <pod-name> show for the last restart? Specifically, look at the Last State section under the container; it should show whether the termination reason was OOMKilled or something else.

What’s the typical workload? Rough doc sizes and how many concurrent requests you’re sending would help a lot.

My suspicion is that the memory limit might be set too low for the conversion workload, and the more aggressive GC is actually exposing that by being less tolerant of spikes. But I’d rather confirm before suggesting a fix.
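
For reference, this is roughly where the memory limit sits in a deployment spec. It is only a sketch: the container name and the values are placeholders, not a recommendation for your workload.

```yaml
# Sketch only: the container name and all values are placeholders.
spec:
  template:
    spec:
      containers:
        - name: my-container      # hypothetical name
          resources:
            requests:
              memory: "512Mi"     # what the scheduler reserves for the pod
            limits:
              memory: "1Gi"       # the container is OOM-killed if it exceeds this
```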

piyush.rajput · June 25, 2026, 11:09am

Hi @sergei.terentev
Here are the details:

Pod Resource config and last state

    State:          Running
      Started:      Thu, 25 Jun 2026 10:21:26 +0530
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Thu, 25 Jun 2026 07:44:30 +0530
      Finished:     Thu, 25 Jun 2026 10:21:24 +0530
    Ready:          True
    Restart Count:  7
    Limits:
      cpu:     8
      memory:  5Gi
    Requests:
      cpu:     1
      memory:  1Gi

We have a document size limit of 400 MB.
There is not much concurrency: in the last 24 hours we received about 2K requests.

sergei.terentev · June 25, 2026, 11:57am

@piyush.rajput,
Thanks, this is helpful.

The issue is likely DOTNET_GCConserveMemory=9 working against you. A 400MB source file can expand to 1–2GB in working memory during conversion, and with the GC running aggressively on top of that, you’re hitting the 5Gi ceiling.

Try dropping it to 5 instead of 9:

env:
  - name: DOTNET_GCConserveMemory
    value: "5"

Also, your liveness probe has no timeoutSeconds set (defaults to 1s) — under load that can cause false restarts. Add timeoutSeconds: 5 to be safe.
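
Put together with the probe settings from your first post, that would look roughly like this. It is a sketch: only timeoutSeconds is new, and everything else is copied from your config.

```yaml
livenessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 10
  periodSeconds: 30
  timeoutSeconds: 5      # new: the default is 1 second
  failureThreshold: 3
```

With these values the container is restarted only after three consecutive failed probes, each of which now gets up to 5 seconds to answer.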

In addition, from our experience with the cloud version, multiple restarts are normal; we see hundreds of them. We also run multiple instances (pods), so the load is distributed between them: while some are restarting, the others continue handling requests.
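
As a rough illustration of the multi-pod setup, the replica count is set on the Deployment. The value 3 below is only an example, not what we run in the cloud version, and the rest of your spec stays as it is.

```yaml
# Fragment of a Deployment spec; the replica count is an example value.
spec:
  replicas: 3   # several pods, so one restarting does not interrupt service
```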