A Tale of Two Intersecting Quality Attributes: Security and Performance
“I want to build a system that is highly secure, scalable, reliable, performant, compliant, robust, resilient, and durable.” Add more adjectives to that and you have the quintessential dream solution. Is that even possible? Where do aspirations meet reality, and what is the right intersection point? To answer that, we can group the attributes above (and maybe a few more quality attributes) into two major cross-cutting concerns, security and performance, and then see where and how to strike a balance between the two. This balancing act is often referred to as making architectural or design trade-offs.
Security is a requirement for each and every component in the overall system, which may include devices, networks, data, services, applications, storage, etc. However, the security of each component is not fully independent: depending on how components are configured, securing one component can partly or fully secure others. For example, we often offload SSL/TLS at the gateway level because traffic beyond that point travels within the internal network, which is deemed secure to a certain extent. Likewise, the performance of the overall system depends on how each individual component behaves. We may have a fast-rendering UI but a slow API response that ruins the experience, or vice versa.
It is often not possible to address security and performance concerns in isolation, as one impacts the other. For example, we may want requests to stay encrypted on every hop as they traverse the system for higher security, but that may impact performance because intermediate components would need to decrypt and re-encrypt the payload at different stages to extract the parameters they need (e.g., tokens). We need to identify the right intersection point to strike a balance between security and performance. For that, it is important to look at each individual component and identify what security and performance mean for it. There is no set way; it really depends on the non-functional requirements identified for the product.
While performance is critical from a user experience perspective, the need for stronger cybersecurity has become much more pronounced given the larger attack surface of distributed yet connected ecosystems. Though we have to consider security and performance at the component level, we can broadly categorize them as follows.
Security
Data Security
Data security means securing data according to its state: data can be in transit, at rest, or in use by applications.
Data at Rest
Encrypt data at rest with strong cryptographic algorithms so that it can only be decrypted with the corresponding key. Apply role-based access control to the data to prevent unauthorized access.
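To make role-based access control on data a bit more concrete, here is a minimal Python sketch. The roles, the `sales` dataset, and the `ROLE_PERMISSIONS` mapping are all hypothetical; in practice you would usually lean on the RBAC built into your database or cloud platform rather than hand-rolled application code.

```python
from enum import Enum, auto
from typing import Iterable


class Action(Enum):
    READ = auto()
    WRITE = auto()
    DELETE = auto()


# Hypothetical policy: each role maps to the (dataset, action) pairs it may perform.
ROLE_PERMISSIONS = {
    "analyst": {("sales", Action.READ)},
    "engineer": {("sales", Action.READ), ("sales", Action.WRITE)},
    "admin": {("sales", Action.READ), ("sales", Action.WRITE), ("sales", Action.DELETE)},
}


def is_allowed(roles: Iterable[str], dataset: str, action: Action) -> bool:
    """Deny by default: allow only if at least one of the caller's roles grants the action."""
    return any((dataset, action) in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

For example, `is_allowed(["analyst"], "sales", Action.WRITE)` is `False`, while a caller holding both `analyst` and `engineer` may write. Unknown roles get an empty permission set, so they are denied.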
Data in Transit
Use TLS (the successor to SSL) with CA-issued certificates to encrypt data in motion and prevent interception and unauthorized access. Use secure communication protocols such as HTTPS, SSH, etc.
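As a rough illustration of a client that insists on encrypted, CA-verified connections, the sketch below uses Python's standard `ssl` module. It assumes the operating system's trusted CA store is acceptable for your environment; the TLS 1.2 floor is an example policy choice, not a universal requirement.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    # create_default_context loads the system's trusted CA certificates and
    # turns on certificate verification and hostname checking.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Refuse legacy protocol versions (SSLv3, TLS 1.0/1.1).
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def fetch_peer_certificate_subject(host: str, port: int = 443) -> tuple:
    """Open a verified TLS connection and return the server certificate's subject."""
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        # The handshake fails if the certificate chain or hostname does not verify.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["subject"]
```

Calling `fetch_peer_certificate_subject("example.com")` raises `ssl.SSLCertVerificationError` instead of silently connecting if the server presents an untrusted or mismatched certificate.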
Data in Use
Use fine-grained access control to protect data in use. Mask sensitive data when displaying it, and avoid keeping it in memory longer than necessary.
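Masking at display time can be as simple as the Python sketch below. The function names and masking rules (keep the last four characters, keep the first letter of an email's local part) are illustrative assumptions; the right rules depend on your data classification and any applicable regulations.

```python
def mask(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Mask all but the last `visible` characters of a sensitive value."""
    # Fully mask short values (or visible <= 0) so nothing meaningful leaks.
    if visible <= 0 or len(value) <= visible:
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the domain; mask the rest."""
    local, sep, domain = email.partition("@")
    if not sep:
        return mask(email, visible=0)
    return local[:1] + "*" * max(len(local) - 1, 0) + "@" + domain
```

For instance, a 16-digit card number becomes twelve asterisks followed by its last four digits, and `alice@example.com` is shown as `a****@example.com`.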
Regulatory Data Compliance
Network Security
- Use network security groups, access control lists, and firewalls
- Avoid exposing services to the public internet unnecessarily. E.g., if you are using Azure Application Gateway, API Management, and AKS, only Application Gateway needs to be exposed to the public internet to accept ingress traffic; API Management and AKS can stay within a private network
- Secure communication between on-premises and cloud networks using VPN, secure communication protocols (HTTPS, MQTT over TLS, SSH, SFTP, etc.), or dedicated connectivity services offered by cloud providers, such as AWS Direct Connect or Azure ExpressRoute
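The allow/deny logic behind access control lists and network security groups can be sketched in a few lines of Python. This is loosely modeled on how NSG rules are evaluated (lowest priority number first, first match wins, deny if nothing matches); the `Rule` class, subnets, and ports are hypothetical, and a real NSG also matches on destination, protocol, and direction.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    priority: int   # lower number = evaluated first
    source: str     # source CIDR range
    port: int       # destination port
    allow: bool


# Hypothetical rules protecting an internal API subnet.
RULES = [
    Rule(100, "10.0.1.0/24", 443, True),   # gateway subnet may call HTTPS
    Rule(200, "10.0.0.0/16", 22, False),   # block SSH from the rest of the network
]


def evaluate(source_ip: str, port: int, rules=RULES) -> bool:
    """Return True if traffic from source_ip to port is allowed."""
    try:
        addr = ipaddress.ip_address(source_ip)
    except ValueError:
        return False                        # unparseable source: deny
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.port == port and addr in ipaddress.ip_network(rule.source):
            return rule.allow               # first matching rule wins
    return False                            # implicit deny
```

Traffic from the gateway subnet to port 443 is allowed, while anything from the public internet (for example `203.0.113.7`) falls through to the implicit deny.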
Application Security
- Authentication and authorization (OAuth 2.0, JWT, basic auth, etc.)
- Role-based access control
Device Security (in IoT scenario)
- Allow only authorized devices to connect to the IoT network. For example, in Azure IoT Hub, each device is registered with its own identity (which also gets a device twin), and it can authenticate using symmetric keys or X.509 certificates.
- Use trusted, signed firmware with secure boot, and apply regular firmware updates to fix vulnerabilities.
- Physical security of devices is equally important to avoid tampering and unauthorized access.
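The idea behind symmetric-key device authentication can be sketched as follows: each registered device holds its own key, signs its identity plus an expiry time with HMAC-SHA256, and the hub recomputes the signature. This is a hypothetical scheme for illustration only; the token layout, `DEVICE_KEYS` registry, and function names are assumptions and do not reproduce Azure IoT Hub's actual token format.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical device registry: device id -> base64-encoded symmetric key.
# In practice keys are provisioned per device and kept in a secure store.
DEVICE_KEYS = {
    "sensor-001": base64.b64encode(b"demo-key-for-sensor-001").decode("ascii"),
}


def _sign(device_id: str, expiry: int, key_b64: str) -> str:
    message = f"{device_id}\n{expiry}".encode("utf-8")
    digest = hmac.new(base64.b64decode(key_b64), message, hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def issue_device_token(device_id: str, key_b64: str, ttl_seconds: int = 3600,
                       now: Optional[float] = None) -> str:
    """Runs on the device: prove possession of its key by signing id + expiry."""
    issued_at = int(time.time() if now is None else now)
    expiry = issued_at + ttl_seconds
    return f"{device_id}|{expiry}|{_sign(device_id, expiry, key_b64)}"


def authenticate_device(token: str, now: Optional[float] = None) -> bool:
    """Runs on the hub side: accept only registered devices with a valid, unexpired token."""
    try:
        device_id, expiry_text, signature = token.split("|")
        expiry = int(expiry_text)
    except ValueError:
        return False                      # malformed token
    key_b64 = DEVICE_KEYS.get(device_id)
    if key_b64 is None:
        return False                      # unregistered device
    current = int(time.time() if now is None else now)
    if current >= expiry:
        return False                      # expired token
    expected = _sign(device_id, expiry, key_b64)
    return hmac.compare_digest(expected.encode("ascii"), signature.encode("utf-8"))
```

Short-lived tokens limit the damage if one leaks, at the cost of devices re-signing more often, a small example of the security/performance trade-off this article is about.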
Performance
- Frontend performance: How fast can the UI render the data?
- Backend performance: How fast can backend APIs/services respond?
- Database performance: How fast can database operations/queries be executed?
- Data ingestion performance: How fast can data pipelines comprising ETL/ELT processes be executed?
- Network performance: How fast is the network when it comes to data movement across components?
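Each of these questions is usually answered with latency percentiles rather than averages, since a few slow requests can hide behind a good mean. The Python sketch below times any callable and reports p50/p95/p99; the function name and the choice of percentiles are assumptions, and real measurements would normally come from load-testing or monitoring tools rather than an in-process loop.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(operation: Callable[[], object], runs: int = 100) -> Dict[str, float]:
    """Invoke `operation` `runs` times and summarize its latency in milliseconds."""
    if runs < 2:
        raise ValueError("need at least two runs to compute percentiles")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns the 99 cut points p1..p99; the "inclusive"
    # method keeps every percentile within the observed range of samples.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "max_ms": max(samples),
    }
```

Running the same measurement with and without a security control in the path (say, token verification or payload encryption) gives a concrete number for what that control costs.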
The above is not an exhaustive list of security and performance considerations. The key question is how to balance the two, because improving one can adversely affect the other. For example, as we add network security controls, traffic is inspected at more points before it moves further, which adds latency. Similarly, encryption and decryption are expensive operations in terms of both time and cost. Do we need to secure every piece of information/data? It’s often not easy to find that sweet spot. However, a methodical approach that weighs both functional and non-functional requirements can get us there. Below are a few high-level guidelines (not exhaustive) we may follow:
- Classify data based on its sensitivity and encrypt only what really needs to be encrypted.
- Consider SSL/TLS termination at the gateway level if the traffic beyond it stays within the internal network.
- Cache data for a limited period to avoid passing through all the security gates every time the same data is retrieved.
- Reduce data transmission: Transfer only the data that is really needed between networks/systems/applications. Avoid sending additional/unnecessary data.
- Reduce the attack surface: Involve only the services and components needed to build the system, and avoid redundant/overlapping services. Redundant services not only increase the attack surface but also add complexity.
- Reduce network hops by clearly demarcating where data needs to be processed.
- Use asynchronous processing wherever possible so that security checks can run off the critical path without adding to real-time response times.
- Use rate limiting and request throttling to reduce the risk of DDoS attacks and avoid overloading the system.
- Perform regular security and performance testing to derive insights into where the system is lacking and where to focus.
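Rate limiting is often implemented with a token bucket: each client may burst up to a fixed capacity, and tokens refill at a steady rate. The Python sketch below is a single-process illustration; the class names and parameters are assumptions, it is not thread-safe, and in a real deployment this usually lives in the API gateway or a shared store rather than in application memory.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill based on elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should reject or delay (e.g., respond with HTTP 429)


class RateLimiter:
    """Keep one bucket per client so a single noisy client cannot starve others."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity, self.rate, self.clock = capacity, rate, clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.rate, self.clock)
            self.buckets[client_id] = bucket
        return bucket.allow()
```

With `capacity=3` and `rate=1.0`, a client can send three requests back to back, the fourth is rejected, and one more is allowed after a second has passed. The injectable `clock` exists only to make the behavior easy to test.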
While I have talked about the balance between security and performance, cost is another major attribute that is directly impacted by the choices we make around security and performance. This leads us to a golden triangle (Security, Performance, Cost) from an architectural standpoint, similar to the triangle (Scope, Timeline, Cost) we have on the project management side.