Read Part 1: Why Your Application Logs Are Your Best Friend (Or Worst Enemy) →
In Part 1, we explored why logging matters and what happens when you get it wrong. Now let’s get practical. How do you implement logging that actually helps you debug faster and build more reliable systems?
The strategies in this post come from years of 3 AM debugging sessions, production incidents, and hard-learned lessons. These aren’t theoretical best practices—they’re battle-tested approaches that work in real applications under real pressure.
The Anatomy of Good Logging
Log the Right Things
Not everything deserves a log entry, but these things definitely do:
Application Lifecycle Events:
- Startup and shutdown
- Configuration loading
- Database connections established/lost
- External service connections
User Actions:
- Authentication attempts (successful and failed)
- Major business operations (placing orders, updating profiles)
- Permission changes
- Data exports or sensitive operations
Errors and Exceptions:
- All exceptions with full stack traces
- Validation failures
- External service failures
- Timeout events
Performance Markers:
- Slow database queries
- Long-running operations
- API response times
- Resource usage spikes
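As a rough illustration of two items from the lists above (exceptions with stack traces and slow operations), here is a minimal sketch using Python's standard logging module. The `timed_call` helper and the `SLOW_THRESHOLD_MS` value are hypothetical names chosen for this example, not part of any library.

```python
import logging
import time

logger = logging.getLogger("app")

SLOW_THRESHOLD_MS = 500  # hypothetical threshold; tune it per operation


def timed_call(name, func, *args, **kwargs):
    """Run func, logging failures with a stack trace and slow calls as warnings."""
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    except Exception:
        # logger.exception logs at ERROR level and appends the current traceback
        logger.exception("%s failed", name)
        raise
    duration_ms = (time.perf_counter() - start) * 1000
    if duration_ms > SLOW_THRESHOLD_MS:
        logger.warning("%s was slow: %.0f ms", name, duration_ms)
    return result
```

The exception is re-raised after logging so the caller's own error handling still runs.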
Structure Your Logs
Treat logs as data, not just text. Structured logging uses a consistent format (usually JSON) that makes logs searchable and analyzable:
{
  "timestamp": "2025-01-05T14:30:22.123Z",
  "level": "ERROR",
  "service": "payment-processor",
  "event": "payment_failed",
  "user_id": "user_12345",
  "order_id": "ord_987654",
  "payment_method": "credit_card",
  "error_code": "GATEWAY_TIMEOUT",
  "retry_count": 2,
  "response_time_ms": 30000
}
This beats the hell out of parsing text like:
ERROR: Payment failed for user user_12345 order ord_987654 - gateway timeout after 30000ms (retry 2)
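One way to produce JSON entries like the structured example above is a custom formatter on top of Python's standard logging module. This is a sketch under assumptions: the `JsonFormatter` class and the convention of passing fields through `extra={"fields": {...}}` are choices made for this example, and a dedicated structured-logging library may suit you better.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": created.isoformat(timespec="milliseconds"),
            "level": record.levelname,
            "service": record.name,
            "event": record.getMessage(),
        }
        # Fields passed as extra={"fields": {...}} are merged into the entry
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry, default=str)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("payment-processor")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.error(
    "payment_failed",
    extra={"fields": {"order_id": "ord_987654", "error_code": "GATEWAY_TIMEOUT", "retry_count": 2}},
)
```

Using the logger name as the service name keeps the call sites short, at the cost of tying the two together.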
Include Context, Not Just Events
Every log entry should answer: Who did what, when, where, and why?
- Who: User ID, session ID, or system component
- What: The specific action or event
- When: Precise timestamp
- Where: Service name, function, or module
- Why: The context that led to this event
Use Correlation IDs
In distributed systems, a single user action might trigger dozens of service calls. Correlation IDs let you trace a single request across your entire system:
{ "correlation_id": "req_abc123", "service": "user-service", "event": "profile_updated", "user_id": "user_456" }
{ "correlation_id": "req_abc123", "service": "notification-service", "event": "email_sent", "recipient": "user_456", "template": "profile_update_confirmation" }
Now when something goes wrong, you can follow the entire journey of that request.
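How the ID gets attached to every entry depends on your stack. As one possible sketch in Python, a context variable can hold the ID for the current request and a logging filter can stamp it onto each record. The names `start_request`, `get_correlation_id`, and `CorrelationFilter` are assumptions made for this example; passing the ID between services (for instance in a request header) is left out.

```python
import contextvars
import logging
import uuid

# Holds the correlation ID for the request currently being handled
_correlation_id = contextvars.ContextVar("correlation_id", default=None)


def start_request(incoming_id=None):
    """Reuse the caller's ID if one was passed along, otherwise mint a new one."""
    cid = incoming_id or f"req_{uuid.uuid4().hex[:12]}"
    _correlation_id.set(cid)
    return cid


def get_correlation_id():
    return _correlation_id.get()


class CorrelationFilter(logging.Filter):
    """Attach the current correlation ID to every record passing through."""

    def filter(self, record):
        record.correlation_id = get_correlation_id()
        return True


handler = logging.StreamHandler()
handler.addFilter(CorrelationFilter())
handler.setFormatter(logging.Formatter("%(correlation_id)s %(name)s %(message)s"))
logger = logging.getLogger("user-service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

start_request("req_abc123")
logger.info("profile_updated")
```

A context variable is used rather than a global so that concurrent requests in threads or async tasks each see their own ID.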
Common Logging Mistakes (And How to Avoid Them)
Logging Sensitive Data
Never, ever log:
- Passwords or API keys
- Credit card numbers
- Social security numbers
- Personal addresses or phone numbers
- Authentication tokens
Instead, log references:
// Bad
{"event": "payment_processed", "credit_card": "4532-1234-5678-9012"}
// Good
{"event": "payment_processed", "payment_method": "visa_ending_9012"}
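A safety net at the logging boundary can catch what code review misses. The sketch below assumes a hypothetical `redact` helper that you would call on a field dictionary before logging it. The key names and the card-number pattern are rough heuristics for illustration, not a complete detector for sensitive data.

```python
import re

SENSITIVE_KEYS = {"password", "api_key", "token", "ssn"}  # hypothetical key names
# Rough heuristic: 16 to 19 digits, optionally separated by spaces or hyphens
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}(\d{4})\b")


def redact(fields):
    """Return a copy of fields that is safer to log."""
    clean = {}
    for key, value in fields.items():
        if key.lower() in SENSITIVE_KEYS:
            clean[key] = "[REDACTED]"
        elif isinstance(value, str):
            # Keep only the last four digits of anything shaped like a card number
            clean[key] = CARD_PATTERN.sub(r"card_ending_\1", value)
        else:
            clean[key] = value
    return clean


print(redact({"event": "payment_processed", "credit_card": "4532-1234-5678-9012"}))
```

Redaction like this is a backstop. The better fix is still to avoid passing sensitive values to the logger in the first place.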
Inconsistent Log Levels
Don’t be random with log levels. Here’s a practical guide:
- ERROR: Something broke and needs immediate attention
- WARN: Something unexpected happened but the system is still working
- INFO: Important business events worth tracking
- DEBUG: Detailed information for troubleshooting (disabled in production)
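To keep DEBUG output out of production without code changes, the level can come from configuration. This sketch assumes an environment variable named `LOG_LEVEL` (a common convention, but an assumption here) and a hypothetical `resolve_level` helper. The log messages are made-up examples of each level.

```python
import logging
import os


def resolve_level(env=None):
    """Pick a log level from LOG_LEVEL, defaulting to INFO."""
    env = os.environ if env is None else env
    name = env.get("LOG_LEVEL", "INFO").upper()
    level = logging.getLevelName(name)
    # getLevelName returns a string such as "Level FOO" for unknown names
    return level if isinstance(level, int) else logging.INFO


logging.basicConfig(level=resolve_level())
logger = logging.getLogger("orders")

logger.debug("cart totals recalculated")  # hidden unless LOG_LEVEL=DEBUG
logger.info("order_placed")  # business event worth tracking
logger.warning("inventory cache miss, fell back to database")  # degraded but working
logger.error("payment gateway unreachable")  # needs attention
```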
Logging Without Purpose
Every log entry should have a reason to exist. Ask yourself: “If this log entry appeared in an alert, would I know what to do about it?”
Not Logging Enough Context
This log entry is useless:
ERROR: Database query failed
This one helps you fix the problem:
{
  "level": "ERROR",
  "event": "database_query_failed",
  "query": "SELECT * FROM orders WHERE user_id = ?",
  "user_id": "user_123",
  "error": "Connection timeout after 30s",
  "retry_count": 2,
  "database": "orders_replica_2"
}
Building a Logging Strategy
Start with the Basics
Don’t try to build the perfect logging system from day one. Start with:
- Choose a consistent format (JSON recommended)
- Log application startup/shutdown
- Log all errors with context
- Add correlation IDs for request tracing
- Set up centralized log collection
Evolve Based on Pain Points
As you encounter production issues, ask: “What log entry would have helped me solve this faster?” Then add that logging.
Had a performance issue? Add timing logs. Debugging user workflow problems? Add business event logs. Dealing with external service failures? Add integration logs.
Choose the Right Tools
For most applications, you need:
- Log Collection: Something to gather logs from all your services
- Log Storage: A searchable datastore for your logs
- Log Analysis: Tools to query and visualize log data
- Alerting: Notifications when important events occur
The complexity depends on your scale. A small application might use simple file-based logging with log rotation. A distributed system might need something more sophisticated.
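For the small-application case, file-based logging with rotation can be set up with Python's standard library alone. The sketch below wraps `RotatingFileHandler` in a hypothetical `build_file_logger` helper. The size limit and backup count are illustrative defaults, not recommendations.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_file_logger(name, path, max_bytes=5_000_000, backups=5):
    """Create a logger that writes to path and rotates when the file grows too large."""
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Calling `build_file_logger("app", "app.log")` would write to `app.log` and keep older data in `app.log.1`, `app.log.2`, and so on, up to the backup count.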
For teams that want to focus on building features instead of managing logging infrastructure, platforms like Trailonix provide simple APIs for structured logging with built-in search, alerting, and analytics. The key is choosing tools that match your team’s size and expertise—you want to spend time analyzing logs, not configuring log management systems.
Monitor Your Logs
Your logging system needs monitoring too. Track:
- Log volume trends
- Error rate patterns
- Performance impact of logging
- Storage usage and costs
Set up alerts for unusual patterns, like sudden spikes in error logs or complete absence of logs from a service.
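Detecting a service that has gone quiet is mostly a matter of remembering when each service last logged. The sketch below assumes you already track a last-seen timestamp per service. The `find_silent_services` function and the five-minute default are hypothetical choices for this example.

```python
from datetime import datetime, timedelta, timezone


def find_silent_services(last_seen, now, max_silence=timedelta(minutes=5)):
    """Return services whose most recent log entry is older than max_silence."""
    return sorted(
        service
        for service, seen_at in last_seen.items()
        if now - seen_at > max_silence
    )


now = datetime(2025, 1, 5, 14, 30, tzinfo=timezone.utc)
last_seen = {
    "user-service": now - timedelta(seconds=20),
    "notification-service": now - timedelta(minutes=12),
}
print(find_silent_services(last_seen, now))  # ['notification-service']
```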
Making Logs Actionable
Design for Your Future Self
When you’re writing log entries, imagine you’re debugging an issue at 2 AM six months from now. What information would you need to quickly understand what happened?
Create Runbooks from Log Patterns
Document common log patterns and their solutions:
- “If you see error code XYZ, check the third-party API status”
- “High response times for endpoint ABC usually mean the cache needs clearing”
- “Database connection errors followed by recovery indicate network flakiness”
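Runbook entries like these can also live next to the code so an alert can carry its own first step. A minimal sketch, assuming hypothetical error codes and a hypothetical `suggest_next_step` helper:

```python
# Hypothetical error codes mapped to the first thing to check
RUNBOOK_HINTS = {
    "GATEWAY_TIMEOUT": "Check the third-party API status",
    "DB_CONNECTION_LOST": "Look for recovery entries; loss followed by recovery suggests network flakiness",
}


def suggest_next_step(entry):
    """Return the runbook hint for a log entry, or a prompt to write one."""
    code = entry.get("error_code")
    return RUNBOOK_HINTS.get(code, "No runbook entry yet; add one after resolving this")


print(suggest_next_step({"event": "payment_failed", "error_code": "GATEWAY_TIMEOUT"}))
```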
Use Logs for Proactive Monitoring
Don’t wait for things to break completely. Set up alerts for:
- Increasing error rates
- Degrading performance
- Unusual user behavior patterns
- Resource exhaustion warnings
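An increasing error rate can be caught with a sliding window over recent entries. The `ErrorRateMonitor` class below is a hypothetical sketch with illustrative window and threshold values. A real alerting tool would typically work over time windows rather than entry counts.

```python
from collections import deque


class ErrorRateMonitor:
    """Track the error rate over the most recent N log entries."""

    def __init__(self, window=100, threshold=0.05):
        # deque with maxlen discards the oldest entry once the window is full
        self.levels = deque(maxlen=window)
        self.threshold = threshold

    def record(self, level):
        """Record one entry's level; return True when the rate exceeds the threshold."""
        self.levels.append(level == "ERROR")
        return self.error_rate() > self.threshold

    def error_rate(self):
        return sum(self.levels) / len(self.levels) if self.levels else 0.0
```

Feeding each entry's level into `record` and alerting when it returns True gives an early warning before errors dominate the stream.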
Practical Implementation Tips
Log Entry Templates
Create templates for common scenarios:
Error Template:
{
  "level": "ERROR",
  "event": "{operation}_failed",
  "user_id": "{user_id}",
  "correlation_id": "{correlation_id}",
  "error_code": "{error_code}",
  "error_message": "{error_message}",
  "retry_count": "{retry_count}",
  "context": {
    "additional": "relevant_data"
  }
}
Performance Template:
{
  "level": "INFO",
  "event": "{operation}_completed",
  "user_id": "{user_id}",
  "correlation_id": "{correlation_id}",
  "duration_ms": "{duration}",
  "result_count": "{count}",
  "cache_hit": "{boolean}"
}
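Templates are easier to follow when they exist as functions rather than as text to copy. The sketch below turns the error template into a hypothetical `error_event` builder. The field names come from the template, while the function itself is an assumption of this example. Numeric fields such as `retry_count` stay numbers instead of quoted placeholders.

```python
import json


def error_event(operation, user_id, correlation_id, error_code, error_message,
                retry_count=0, **context):
    """Build an error entry that follows the error template's field names."""
    return {
        "level": "ERROR",
        "event": f"{operation}_failed",
        "user_id": user_id,
        "correlation_id": correlation_id,
        "error_code": error_code,
        "error_message": error_message,
        "retry_count": retry_count,
        "context": context,
    }


entry = error_event(
    "payment", "user_12345", "req_abc123",
    "GATEWAY_TIMEOUT", "gateway timeout after 30000ms",
    retry_count=2, order_id="ord_987654",
)
print(json.dumps(entry))
```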
Sampling High-Volume Events
For events that happen thousands of times per minute, consider sampling:
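import json
import logging
import random

# Standard-library logging is an assumption here; swap in your own structured logger.
logger = logging.getLogger(__name__)

def log_page_view(user_id, page):
    # Log roughly 1% of page views
    if random.random() < 0.01:
        logger.info(json.dumps({
            "event": "page_view",
            "user_id": user_id,
            "page": page,
            "sampled": True
        }))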
Context Managers for Automatic Logging
Use language features to automatically log entry/exit with timing:
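import json
import logging
import time
from contextlib import contextmanager

# Standard-library logging is an assumption here; swap in your own structured logger.
logger = logging.getLogger(__name__)

@contextmanager
def log_operation(operation_name, **context):
    # get_correlation_id() is assumed to be provided by your request-handling code
    fields = {"correlation_id": get_correlation_id(), **context}
    start_time = time.perf_counter()
    logger.info(json.dumps({"event": f"{operation_name}_started", **fields}, default=str))
    try:
        yield
    except Exception as e:
        # Log the failure with its duration, then re-raise so callers still see it
        duration = (time.perf_counter() - start_time) * 1000
        logger.error(json.dumps({
            "event": f"{operation_name}_failed",
            "duration_ms": duration,
            "error": str(e),
            **fields
        }, default=str))
        raise
    else:
        duration = (time.perf_counter() - start_time) * 1000
        logger.info(json.dumps({
            "event": f"{operation_name}_completed",
            "duration_ms": duration,
            **fields
        }, default=str))

# Usage (process_payment is assumed to exist in your application)
user_id, amount = "123", 99.99
with log_operation("payment_processing", user_id=user_id, amount=amount):
    process_payment(user_id, amount)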
Getting Your Team on Board
Make It Easy
The easier logging is, the more likely people will do it well. Provide:
- Helper functions for common log patterns
- IDE snippets for log templates
- Documentation with examples
- Code review checklists that include logging
Lead by Example
Start logging comprehensively in your own code. When others see how it helps with debugging, they’ll adopt it naturally.
Share Success Stories
When good logging helps solve a production issue quickly, share that story with the team. Nothing convinces people like seeing real benefits.
The Bottom Line
Good logging is like insurance—you don’t think about it until you desperately need it. But unlike insurance, logging helps you every day by providing insights into user behavior, performance trends, and system health.
The investment in proper logging pays dividends:
- Faster debugging when issues occur
- Better understanding of user behavior
- Proactive problem detection before users are affected
- Confidence in deployments because you can see what’s happening
Start simple, be consistent, and remember that logs are for humans. Write them like you’re leaving notes for a colleague who needs to understand what your application is doing.
Your future self (and your on-call rotation) will thank you.
Quick Reference: Logging Checklist
Before You Deploy:
- [ ] All errors logged with full context
- [ ] Business events tracked consistently
- [ ] Correlation IDs implemented for request tracing
- [ ] No sensitive data in logs
- [ ] Log levels used appropriately
- [ ] Performance markers in place for slow operations
For Your Team:
- [ ] Logging standards documented
- [ ] Helper functions/templates provided
- [ ] Code review includes logging checks
- [ ] Monitoring and alerting configured
- [ ] Runbooks updated with log patterns
Remember: Perfect is the enemy of good. Start with basic structured logging and improve incrementally. The most important thing is to start logging thoughtfully and consistently.
Ready to implement better logging without building infrastructure? Trailonix provides developer-friendly APIs for structured logging with built-in search, alerting, and analytics. Start with 10,000 free daily logs and focus on your application, not your logging infrastructure.