Hi @sergei.terentev
We tried the options below:
livenessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 10
  periodSeconds: 30
  failureThreshold: 3
# Env
env:
  - name: DOTNET_GCConserveMemory
    value: "9"
This caused more OOM restarts. What might be the issue?
Thanks for the update, and sorry to hear the restarts got worse; that’s not what we expected.
To figure out what’s going on, a few questions:
Do you have a memory limit set on the pod? Something like resources.limits.memory: "1Gi" in your deployment spec? If so, what’s the value? If not, do you know roughly how much memory the node has available?
What does kubectl describe pod <pod-name> show for the last restart? Specifically, look at the Last State section under the container: its Reason should say whether it was OOMKilled or something else, along with the exit code.
What’s the typical workload? Rough doc sizes and how many concurrent requests you’re sending would help a lot.
My suspicion is that the memory limit might be set too low for the conversion workload, and the more aggressive GC is actually exposing that by being less tolerant of spikes. But I’d rather confirm before suggesting a fix.
Hi @sergei.terentev
Here are the details:
- Pod resource config and last state:
  State:          Running
    Started:      Thu, 25 Jun 2026 10:21:26 +0530
  Last State:     Terminated
    Reason:       OOMKilled
    Exit Code:    137
    Started:      Thu, 25 Jun 2026 07:44:30 +0530
    Finished:     Thu, 25 Jun 2026 10:21:24 +0530
  Ready:          True
  Restart Count:  7
  Limits:
    cpu:     8
    memory:  5Gi
  Requests:
    cpu:     1
    memory:  1Gi
- We have a document size limit of 400 MB.
- There is not much concurrency: we received about 2K requests in the last 24 hours.
@piyush.rajput,
Thanks, this is helpful.
The issue is likely DOTNET_GCConserveMemory=9 working against you. A 400MB source file can expand to 1–2GB in working memory during conversion, and with the GC running aggressively on top of that, you’re hitting the 5Gi ceiling.
Try dropping it to 5 instead of 9:
env:
  - name: DOTNET_GCConserveMemory
    value: "5"
Also, your liveness probe has no timeoutSeconds set, so it defaults to 1s; under load, that can cause false restarts. Add timeoutSeconds: 5 to be safe.
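For reference, here is roughly how the liveness probe you posted earlier would look with the timeout added. This is a sketch: path, port and the other values are copied from your original probe, and 5 seconds is only a starting point to tune against how long the endpoint actually takes to respond while a large conversion is running.

```yaml
livenessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 10
  periodSeconds: 30
  # Allow up to 5s per check instead of the 1s default
  timeoutSeconds: 5
  # The container is restarted only after 3 consecutive failed checks (one every 30s)
  failureThreshold: 3
```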
In addition, from our experience with the cloud version, multiple restarts are normal; we see hundreds of them. However, we also run multiple instances (pods), so the load is distributed between them: while some are restarting, the others continue to handle requests.
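In Kubernetes terms, running several instances usually means setting replicas above 1 on the Deployment, so that while one pod is restarting (and therefore not Ready), its Service routes requests to the others. A minimal sketch follows, assuming a plain Deployment: the name converter, the image, and the replica count of 3 are hypothetical placeholders, and the resources are copied from the describe output above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: converter                        # hypothetical name
spec:
  replicas: 3                            # placeholder; size to your load and node capacity
  selector:
    matchLabels:
      app: converter
  template:
    metadata:
      labels:
        app: converter
    spec:
      containers:
        - name: converter
          image: example/converter:latest  # hypothetical image
          resources:
            requests:
              cpu: "1"
              memory: 1Gi
            limits:
              cpu: "8"
              memory: 5Gi                # each replica gets its own 5Gi limit
```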