Good evening! Here's your data engineer interview problem for today. This question was asked by Palantir.
You are tasked with developing a function that efficiently processes log data from a web server. Each log entry is represented as a string in the format "YYYY-MM-DD HH:MM:SS,ERROR_LEVEL,Message". Your function should parse these logs and return a summary that includes the following information:
The total number of log entries.
A count of log entries per error level (e.g., ERROR, WARNING, INFO).
The earliest and latest timestamps in the logs.
Write a function parse_log_entries(log_entries), where log_entries is a list of log entry strings. Ensure your solution is efficient and follows best practices, as you might need to handle a large volume of log data.
Example Input:
log_entries = [
    "2024-01-16 09:30:00,ERROR,Failed to connect to database",
    "2024-01-16 10:15:30,WARNING,Slow response time detected",
    "2024-01-16 11:00:00,INFO,Server health check OK"
]
Example Output:
{
    'total_entries': 3,
    'error_counts': {'ERROR': 1, 'WARNING': 1, 'INFO': 1},
    'earliest_timestamp': '2024-01-16 09:30:00',
    'latest_timestamp': '2024-01-16 11:00:00'
}
Constraints:
→ All timestamps are in the format "YYYY-MM-DD HH:MM:SS".
→ Error levels are always uppercase: ERROR, WARNING, INFO.
→ Log messages are non-empty strings.
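One consequence of the first constraint is worth spelling out: because the timestamps are fixed-width and zero-padded, ordering them as plain strings should give the same result as parsing them into datetime objects first. The short check below illustrates this; the sample values in `stamps` and the name `FORMAT` are made up for the demonstration.

```python
from datetime import datetime

# strptime pattern matching "YYYY-MM-DD HH:MM:SS" (name chosen for this sketch).
FORMAT = "%Y-%m-%d %H:%M:%S"

# Illustrative timestamps, deliberately out of order.
stamps = [
    "2024-01-16 11:00:00",
    "2024-01-16 09:30:00",
    "2023-12-31 23:59:59",
]

# Sort once as plain strings and once as parsed datetimes.
by_string = sorted(stamps)
by_datetime = sorted(stamps, key=lambda s: datetime.strptime(s, FORMAT))

# Both orderings agree, so min/max on the raw strings is chronological.
assert by_string == by_datetime
```

Skipping the datetime parse saves work on every entry, which matters for large log volumes. It only holds as long as every timestamp really follows the stated format.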
Solution
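The post's own solution is not reproduced here. What follows is a minimal sketch of one possible approach, assuming every entry is well-formed as the constraints describe. It makes a single pass over the input, counts levels with collections.Counter, and compares timestamps as strings. Returning None for both timestamps on an empty list is an assumption of this sketch, since the problem statement does not cover that case.

```python
from collections import Counter


def parse_log_entries(log_entries):
    """Summarize log lines of the form 'YYYY-MM-DD HH:MM:SS,LEVEL,Message'."""
    counts = Counter()
    earliest = latest = None
    total = 0

    for entry in log_entries:
        # Split at most twice so commas inside the message stay intact.
        timestamp, level, _message = entry.split(",", 2)
        total += 1
        counts[level] += 1
        # Fixed-width, zero-padded timestamps compare correctly as strings,
        # so no datetime parsing is needed under the stated constraints.
        if earliest is None or timestamp < earliest:
            earliest = timestamp
        if latest is None or timestamp > latest:
            latest = timestamp

    return {
        "total_entries": total,
        "error_counts": dict(counts),
        # Assumption: None when the input is empty (not specified in the problem).
        "earliest_timestamp": earliest,
        "latest_timestamp": latest,
    }
```

Because the loop only iterates over log_entries once and keeps just a few running values, it should also work with a generator or an open file object, which avoids loading a large log into memory. Malformed lines would raise a ValueError at the split; a production version would likely want to skip or report them instead.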