“A manual certificate renewal is not a fix. It is a scheduled outage waiting to happen.”
In a Senior SRE troubleshooting interview, the prompt will sound incredibly simple: “At 00:00 UTC, traffic to our main API dropped by 90%. CPU and memory are completely idle. Go.”
If you check the database, the queues, or the application code, you are wasting time. The network edge has severed the connection.
When candidates finally suspect a certificate issue, they make a fatal diagnostic mistake that reveals a lack of deep network experience.
- The Failing Move (The "Frontend Dev")
- "I will check if our SSL certificate expired today by looking at our internal dashboard or checking the expiry date of the domain's primary cert."
+ The Passing Move (The "Reliability Architect")
+ "I will immediately test the TLS handshake from an external network using `openssl`. I am not just checking if the leaf certificate expired; I am checking if an Intermediate CA in the chain was rotated or expired, which causes clients to silently drop the connection."
Checking internal metrics is useless if the issue is that external browsers no longer trust your certificate chain. You must test from the outside in.
NET::ERR_CERT_DATE_INVALID errors.grpc calls fail with transport security errors.You must bypass the application and ask the network layer what certificate it is actually serving.
1. The Ultimate Source of Truth (Inspect the Handshake)
echo | openssl s_client -showcerts -servername api.example.com -connect api.example.com:443 2>/dev/null | openssl x509 -inform pem -noout -text | grep -E "Not After|Issuer"Not After) and the Issuer.2. The Quick HTTP Check
curl -vI https://api.example.com-v (verbose) flag prints the TLS handshake process. If it fails here, you know it’s a crypto/cert issue, not an application 500 error.3. Check Local Disk Certificates (If you are on the LB/Proxy)
openssl x509 -enddate -noout -in /etc/ssl/certs/api.crts_client shows an expired cert, your proxy (e.g., Nginx/Envoy) hasn’t been reloaded to pick up the new file in memory.4. Check Load Balancer Logs
grep "SSL_do_handshake() failed" /var/log/nginx/error.log | tail -n 20You need to restore trust immediately.
Why did the certificate expire in the first place?
cert-manager pod crashed, or the ACME (Let’s Encrypt) challenge failed due to a recent DNS change, preventing auto-renewal.SIGHUP signal to Nginx/Envoy to reload the config into memory. It kept serving the old, now-expired cert.To score “Exceptional” (L5/L6), you must prove you will never let a human track an expiration date again.
cert_expiry_days_remaining as a Prometheus metric. Alert at 30 days (Warning) and 14 days (Critical).In an interview, identifying an expired certificate is easy.
Knowing how to verify the intermediate chain, why the proxy needs a reload, and how to architect synthetic probers is what separates the hires from the rejects.
Google SRE interviews test your Execution Sequencing under pressure. If your sequence is wrong, your technical knowledge won’t save you.
I built The Complete Google SRE Career Launchpad to simulate these exact, high-stakes infrastructure failures.
👉 Get The Complete Google SRE Interview Career Launchpad (Gumroad)
The Full Training System Includes:
Don’t let an expired certificate freeze your interview. Train your reflexes.