API Error Design in Production vs Development Environments #828

johnwroge · 2024-11-13T15:52:25Z

This is a current action item in Porting Flask to FastAPI #789.

Overview

Currently, API handles errors for production and development environments in the same way. This issue is meant to be a point of discussion to track how the team wants to keep a consistent API response for production and development environments while also bringing up logging as a potential solution to centralize development errors.

Research

The production API should be limiting information sent to the client for security purposes. Sending specific server errors and response codes to the client can open the system to vulnerabilities by exposing detailed information to bad actors. At the same time, using specific error codes with detailed messages for exceptions/errors allows for developers to maintain the existing system more efficiently. While the current MVP system is not completely functional, agreeing on how these errors can be propagated to the client will help mitigate some of the complexity as the backend continues to scale and more developers are added to the team.

One approach, is to distinguish different environments using environment variables while maintaining a consistent error response for the production environment.

Example Error Response for Production:

raise HTTPException(
    status_code=500,
    detail={
        "code": "INTERNAL_SERVER_ERROR",
        "message": "An unexpected error occurred. Please try again later."
    }

This is a basic example of how this would look in the controllers that return specific error messages.

except Exception as e:
    if ENVIRONMENT == "production":
        # Optionally log the error with detailed information on the server side
        logger.error("An error occurred: %s", e, exc_info=True)

        # Send a generic error message to the frontend
        raise HTTPException(
            status_code=500,
            detail={
                "code": "INTERNAL_SERVER_ERROR",
                "message": "An unexpected error occurred. Please try again later."
            }
        )
    else:
        # Optionally log detailed error info for debugging in development
        logger.debug("An error occurred: %s", e, exc_info=True)

        # Send the actual error details to the frontend for easier debugging 
        raise HTTPException(
            status_code=500,
            detail={
                "code": type(e).__name__,
                "message": str(e)
            }
        )

Logging (Optional)

Including a basic logging setup will help centralize server side errors for developers as well. These logs can be visible in development and also accessible through EC2 console. Since the MVP of the application is not complete and the number of users will be limited this may not be necessary at the moment, but would be a nice to have in the future.

Logging Resources

Python Logging How To
Python Logging API
Logging levels

Action Items

Introduce environment variable for production, and development environments.
Update server endpoints to respond with generic error messages in the case of exceptions in production environment.
Review controller responses to verify http status codes and messages are relevant for debugging.

Optional Logging Items

Create a global logging module that sets up logging and handles exceptions
Integrate logging in exception clauses

paulespinosa · 2024-11-14T04:13:58Z

@johnwroge This is great! Thanks for documenting how we can approach errors.

For clarification, is the research done here and our decisions focused only on server errors ("HTTP Status 5xx") or will it also cover "domain"/client errors.

For example:

technical server error: A KeyError exception is thrown and the 500 Internal Server Errors in the examples.
"domain"/client error: "Guest was already invited."

Does FastAPI have a global exception handler that can check for server errors and perform the same environment logic in the example code snippet? Or, would we have to include this code in each request handler?

johnwroge · 2024-11-14T15:53:21Z

Thanks @paulespinosa! Yes, those are all really good points and I should have been more explicit in those cases.

We would be able to use FastAPI's built in exception handler. We can register this globally for all the routes in the main.py file after the other routers.

In main.py

from global_error_module import global_exception_handler

app = FastAPI(lifespan=lifespan)
app.include_router(api_router, prefix="/api")
app.include_router(health_router, prefix="/api/health", tags=["health"])

app.add_exception_handler(Exception, global_exception_handler)

Exception Handler

I think integrating a global exception handler that handles both client and server errors would be ideal because it would centralize how errors are handled. We could use guard clauses to check the environment in this section and then depending on the status code respond with limited information in production or the regular response in development. This message information when the exception is raised could be passed to the exception handler in the endpoints so we don't need to write out the logic above. I'll post an example below.

# Function to get the current environment
def get_environment():
    return os.getenv("ENVIRONMENT", "development")

def configure_logging():
    environment = get_environment()
    log_level = logging.DEBUG if environment == "development" else logging.ERROR
    logging.basicConfig(
        level=log_level,
        format="%(asctime)s [%(levelname)s] %(message)s",
        handlers=[
            logging.StreamHandler()
        ]
    )

configure_logging()

@app.exception_handler(Exception)
async def global_exception_handler(request: Request, exc: Exception):

    environment = get_environment()
    status_code = 500
    error_response = {
        "error": "Internal Server Error"
    }
# we should send a more generic message so it doesn't say anything is wrong. 
# We are unable to respond to your request at this time

    # Handle HTTPException differently based on environment
   if isinstance(exc, HTTPException):
        status_code = exc.status_code
        if environment == "production":
            error_response = {
                "error": "An error occurred. Please try again later."
            }
            logging.error(f"HTTPException occurred: {exc.detail}")
        else:
            error_response = {
                "error": exc.detail
            }
            logging.debug(f"HTTPException occurred: {exc.detail}")
    else:
        # Customize the response based on the environment
        if environment == "production":
            if status_code == 401:
                error_response = {
                    "error": "You are not authorized to perform this action."
                }
            elif status_code == 403:
                error_response = {
                    "error": "You are not allowed to access this resource."
                }
            elif status_code == 404:
                error_response = {
                    "error": "The requested resource was not found."
                }
            elif status_code >= 500:
                error_response = {
                    "error": "An internal server error occurred. Please try again later."
                }
            # Return generic error responses for all other cases in production
            elif 400 <= status_code < 500:
                status_code = 400
                error_response = {
                    "error": "Bad request. Please check your input and try again."
                }
             else # do something else here in case nothing else is invoked. 

        else:
            # Return more detailed error responses for development/staging
            if hasattr(exc, "detail"):
                error_response = {
                    "error": str(exc.detail)
                }
            else:
                error_response = {
                    "error": str(exc)
                }
           logging.debug(f"Exception occurred: {exc}")

    return JSONResponse(status_code=status_code, content=error_response)

There are a few cons in this approach, one being the global exception handler may need to be updated for every case which means this function could become very large and more difficult to manage for error specific handling. The other downside is if we were to implement logs in this file, it could result in latency for the application every time an exception is raised.

Do you think it would be better if we only use the global error handler in certain cases and handle the rest inside of the individual functions? All the other projects I've worked on used a global error handler in all cases, but there may be a better approach.

lola3736 · 2024-11-14T21:58:20Z

@johnwroge please add the applicable points label for this issue, let me know if you have any questions

lola3736 · 2024-11-14T22:14:14Z

@johnwroge thanks for adding the points label. Confirming that you are currently working on this issue and if so, can move to "in progress". Do you need to have this issue reviewed and approved by Dev lead?

johnwroge · 2024-11-14T23:29:38Z

Thanks @lola3736. I moved this to in progress. I wrote this for documentation purposes and I will update the points once we know more detailed action items and how much time it will take to implement.

johnwroge added the Complexity: Medium Issue is clearly defined and documentation is provided. Solution has room for interpretation. label Nov 13, 2024

johnwroge self-assigned this Nov 13, 2024

johnwroge added Role: Back End Feature: Refactoring Feature: Infrastructure Changes to site technical Architecture labels Nov 13, 2024

johnwroge added this to P: HUU: Project Board Nov 13, 2024

github-project-automation bot moved this to New Issue Approval in P: HUU: Project Board Nov 13, 2024

johnwroge added this to the 07.03.01 - Migrate from Flask to FastAPI milestone Nov 13, 2024

lola3736 added the points: missing label Nov 14, 2024

johnwroge added the points: 1 Can be done in 4-6 hours label Nov 14, 2024

lola3736 removed the points: missing label Nov 14, 2024

johnwroge removed the points: 1 Can be done in 4-6 hours label Nov 14, 2024

johnwroge added the points: 1 Can be done in 4-6 hours label Nov 15, 2024

paulespinosa mentioned this issue Nov 27, 2024

Convert/Port Flask code to FastAPI #789

Open

10 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API Error Design in Production vs Development Environments #828

API Error Design in Production vs Development Environments #828

johnwroge commented Nov 13, 2024 •

edited

Loading

paulespinosa commented Nov 14, 2024

johnwroge commented Nov 14, 2024 •

edited

Loading

lola3736 commented Nov 14, 2024

lola3736 commented Nov 14, 2024 •

edited

Loading

johnwroge commented Nov 14, 2024

API Error Design in Production vs Development Environments #828

API Error Design in Production vs Development Environments #828

Comments

johnwroge commented Nov 13, 2024 • edited Loading

Overview

Research

Logging (Optional)

Logging Resources

Action Items

Optional Logging Items

paulespinosa commented Nov 14, 2024

johnwroge commented Nov 14, 2024 • edited Loading

lola3736 commented Nov 14, 2024

lola3736 commented Nov 14, 2024 • edited Loading

johnwroge commented Nov 14, 2024

johnwroge commented Nov 13, 2024 •

edited

Loading

johnwroge commented Nov 14, 2024 •

edited

Loading

lola3736 commented Nov 14, 2024 •

edited

Loading