Troubleshooting App Failures
What do you do when an app fails in production?
App failures can occur at various levels, including infrastructure, configuration, codebase, external sources, and human error. First, identify the different layers of your app and where to search for potential problems.
Infrastructure issues can arise if OS services are misconfigured or if the server runs short of resources such as RAM. Configuration issues can occur if asynchronous job daemons are not running correctly or if scheduled jobs are not set up properly. Codebase issues can arise if resources are not released properly, which can set off a chain of errors.
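To make the codebase case concrete, here is a minimal Python sketch of how an unreleased resource can be avoided. The `Pool` class, `borrowed` helper, and `handle_request` function are hypothetical names invented for illustration; they stand in for whatever limited resource your app uses, such as database connections or file handles. The point is only that the release happens in a `finally` block, so a failing request does not leak a slot.

```python
import contextlib


class Pool:
    """Hypothetical fixed-size pool standing in for a limited resource."""

    def __init__(self, size):
        self.size = size
        self.in_use = 0

    def acquire(self):
        # Refuse to hand out more slots than the pool holds.
        if self.in_use >= self.size:
            raise RuntimeError("pool exhausted")
        self.in_use += 1

    def release(self):
        self.in_use -= 1


@contextlib.contextmanager
def borrowed(pool):
    """Acquire a slot and always give it back, even if the body raises."""
    pool.acquire()
    try:
        yield pool
    finally:
        pool.release()


def handle_request(pool, fail=False):
    """Hypothetical request handler that may raise while holding a slot."""
    with borrowed(pool):
        if fail:
            raise ValueError("request failed")
        return "ok"
```

Without the `try`/`finally`, every failed request would keep its slot, and after `size` failures each later request would raise `pool exhausted`: one error in one place turning into errors everywhere.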
External sources can also cause app failures, for example when the app fails to load because of problems with the user’s or administrator’s ISP.
Finally, consider the possibility of human error, such as when the administrator or developer incorrectly sets up the project’s runtime configuration.
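One way to put this layer-by-layer approach into practice is to run a quick check per layer, in order, and stop at the first one that fails. The sketch below is only an illustration under assumptions: `first_failing_layer` and `disk_has_space` are hypothetical helpers, the 10% free-disk threshold is arbitrary, and the real checks for your configuration, codebase, and external dependencies would replace the placeholders.

```python
import shutil
from typing import Callable, List, Optional, Tuple

# A check is a layer name paired with a function that returns True when healthy.
Check = Tuple[str, Callable[[], bool]]


def disk_has_space(path: str = "/", min_free_fraction: float = 0.10) -> bool:
    """Example infrastructure check: is enough of the disk still free?"""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_fraction


def first_failing_layer(checks: List[Check]) -> Optional[str]:
    """Run checks in order and return the name of the first unhealthy layer."""
    for layer, check in checks:
        try:
            healthy = check()
        except Exception:
            # A check that crashes is treated as a failure of that layer.
            healthy = False
        if not healthy:
            return layer
    return None


if __name__ == "__main__":
    checks = [
        ("infrastructure", disk_has_space),
        ("configuration", lambda: True),  # placeholder: e.g. job daemon is running
        ("codebase", lambda: True),       # placeholder: e.g. app health endpoint
        ("external", lambda: True),       # placeholder: e.g. upstream reachable
    ]
    print(first_failing_layer(checks) or "all layers look healthy")
```

Ordering the checks from infrastructure upward is a design choice: a problem in a lower layer often shows up as symptoms in the layers above it, so it is usually worth ruling out first.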
By recognizing the layer where the failure is happening and considering these potential issues, you can take steps to solve the problem and prevent future losses.
Establish a proper on-call rotation for your team to mitigate any potential issues in the future.