My previous blog post, Exploring the Data/Information Architecture Domain, established what Data/Information Architecture is and touched on the benefits and issues surrounding the topic. In this blog post, I provide a mini-case study summarizing how ABC Bank was able to turn data into actionable insights and information that helped improve the stability of the organization’s Application Architecture.
ABC Bank is a fictitious regional bank operating in multiple East Coast states. The bank’s Lines of Business rely on several monolithic applications to drive the business forward. Over the years, these applications have suffered numerous critical outages, leaving customers upset and costing the bank business.
To rectify this issue, ABC Bank needs to understand the root cause of the critical outages and find a way to prevent them before they even occur or drastically reduce the mean time to resolution (MTTR). Being able to predict/prevent these outages would yield significant benefits for ABC Bank, but where should they start?
The answer lies in data and the bank’s ability to convert data into useful insights and information, commonly referred to as “telemetry.” By default, the information systems used by ABC Bank collect raw data regarding the system’s performance. Whether the system is server infrastructure-based, application architecture-based, or reporting-based, IT practitioners in ABC Bank need access to the data and need to be innovative to make the information useful and actionable. Refer to Table 1 to see examples of data sources, data points, and how ABC Bank could potentially use the data to help reduce the number of critical outages experienced.
Using the data points in Table 1, IT practitioners could create alerting and automated self-healing mechanisms to address the problem. Of the three data sources listed, Server Infrastructure and Application Architecture contained the data points that, once acted on, had the most significant positive impact on the organization.
Using data retrieved from the server infrastructure, IT practitioners could create automated mechanisms to detect and correct issues before they become problems. For example, if the primary disk drive on a Microsoft Windows Server (the “C” drive) became full, the server would crash, potentially making the entire system unavailable. IT practitioners were able to structure server disk data and automate the analysis of disk drive usage across the bank’s servers, searching for drives that were more than 95% utilized. The automated process analyzed the data and produced a list of servers in imminent danger of crashing due to a full disk drive. That list was then fed into a second automated process that deleted pre-approved temporary system files and log files on the affected servers. Together, the automated data analysis and self-healing actions essentially eliminated the problem of ABC Bank’s disk drives becoming full and crashing the systems.
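The disk-space workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ABC Bank’s actual tooling: the `DiskReading` structure, the function names, the server names, and the cleanup paths are all hypothetical, and the cleanup step only builds a plan rather than deleting anything. The 95% threshold is the one figure taken from the case study.

```python
from dataclasses import dataclass

THRESHOLD_PERCENT = 95.0  # utilization threshold from the case study


@dataclass
class DiskReading:
    """One structured disk-usage data point (hypothetical shape)."""
    server: str
    drive: str
    used_bytes: int
    total_bytes: int

    @property
    def percent_used(self) -> float:
        # Treat a missing or zero capacity as 0% rather than dividing by zero.
        if self.total_bytes <= 0:
            return 0.0
        return 100.0 * self.used_bytes / self.total_bytes


def servers_at_risk(readings, threshold=THRESHOLD_PERCENT):
    """Return readings strictly above the threshold, fullest drive first."""
    at_risk = [r for r in readings if r.percent_used > threshold]
    return sorted(at_risk, key=lambda r: r.percent_used, reverse=True)


def plan_cleanup(at_risk, approved_paths):
    """Map each at-risk server to the pre-approved paths to purge.

    This is a dry run: it only describes the work and deletes nothing.
    """
    return {r.server: list(approved_paths) for r in at_risk}


# Sample data; server names and paths are made up for illustration.
readings = [
    DiskReading("app-01", "C:", 96, 100),
    DiskReading("app-02", "C:", 50, 100),
    DiskReading("db-01", "C:", 99, 100),
    DiskReading("web-01", "C:", 95, 100),  # exactly 95% is not "over 95%"
]
at_risk = servers_at_risk(readings)
plan = plan_cleanup(at_risk, [r"C:\Windows\Temp", r"C:\Logs\archive"])
```

Keeping detection and remediation as separate steps mirrors the two automated processes in the case study: the first produces a list, and the second consumes it. That separation also makes it easy to review the list before any files are removed.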
A similar use case can be found in application architecture. Common application-related issues can often be identified before they become problems by analyzing data collected within the application architecture. For example, IT practitioners can establish baseline execution times for processes and workflows within applications. One early warning sign of an application issue is a process whose execution time starts to exceed its historical average. IT practitioners at ABC Bank have been able to analyze this data and create mechanisms to alert team members to potential issues, and some mechanisms have been developed to fix issues without human intervention. Additionally, data consistency errors have long plagued the organization’s application architecture. Creating data structures and automated quality management validation processes has reduced the number of application-based critical outages.
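A baseline check of this kind might look like the following minimal sketch. The case study only says that execution time exceeding the historical average is a warning sign; the tolerance of a few standard deviations used here is an added assumption to avoid alerting on ordinary variation, and the function name and sample timings are hypothetical.

```python
from statistics import mean, stdev


def is_slower_than_baseline(history, latest, tolerance=3.0):
    """Return True when `latest` is well above the historical average.

    history   -- past execution times for one process, in seconds
    latest    -- the most recent execution time, in seconds
    tolerance -- how many standard deviations above the mean to allow
                 (an assumption; tune per process)
    """
    if len(history) < 2:
        # Not enough data to establish a baseline yet.
        return False
    baseline = mean(history)
    spread = stdev(history)
    return latest > baseline + tolerance * spread


# Hypothetical timings for a nightly batch job, in seconds.
past_runs = [10.0, 11.0, 9.0, 10.0, 10.0]
normal_run_flagged = is_slower_than_baseline(past_runs, 11.0)
slow_run_flagged = is_slower_than_baseline(past_runs, 15.0)
```

Setting `tolerance=0` reduces the check to the plain "greater than the historical average" rule, which would flag roughly half of all normal runs, so some margin is usually worth having.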
Lastly, IT practitioners were able to find valuable data within the organization’s IT Support Ticketing System. ABC Bank was able to identify two key data points from this system: (1) support tickets and their metadata, and (2) metrics regarding the frequency and types of system/application issues that occur within the organization. With this information, ABC Bank could determine when problems were more likely to occur and which systems they were likely to occur in, and gain a deeper understanding of each problem and how to resolve it. This new capability allowed ABC Bank to better plan staff members’ time allocations and on-call shifts.
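As a rough illustration of how ticket metadata can answer the "when" and "where" questions, the sketch below counts tickets by system and by hour of day. The ticket fields (`system`, `opened_at`) and the sample records are assumptions made for the example; a real ticketing system would expose its own schema.

```python
from collections import Counter
from datetime import datetime


def outage_hotspots(tickets):
    """Count tickets per system and per hour of day (0-23).

    Each ticket is a dict with a 'system' name and an ISO 8601
    'opened_at' timestamp (hypothetical field names).
    """
    by_system = Counter(t["system"] for t in tickets)
    by_hour = Counter(
        datetime.fromisoformat(t["opened_at"]).hour for t in tickets
    )
    return by_system, by_hour


# Made-up sample tickets for illustration only.
tickets = [
    {"system": "core-banking", "opened_at": "2024-03-01T02:15:00"},
    {"system": "core-banking", "opened_at": "2024-03-02T02:40:00"},
    {"system": "online-portal", "opened_at": "2024-03-02T14:05:00"},
]
by_system, by_hour = outage_hotspots(tickets)
busiest_system, _ = by_system.most_common(1)[0]
busiest_hour, _ = by_hour.most_common(1)[0]
```

Counts like these are the kind of input that could inform on-call scheduling, for example by staffing the hours and systems where tickets cluster.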
As discussed in my previous blog post, Exploring the Data/Information Architecture Domain, the Data/Information Architecture domain supports all other domains within Enterprise Architecture (EA). Figure 1 below summarizes how the Data/Information Architecture domain supports all other EA domains.
Although this is a simplified case study, it showcases the value of the Data/Information Architecture domain well. By identifying data sources and data points and structuring the data, ABC Bank was able to create mechanisms that reduced the number of critical system outages by as much as 28% in one year. In this way, the Data/Information Architecture domain contributed to ABC Bank’s goal of reducing critical outages, which directly supports the organization’s “customer first”/“supplying the best customer experience possible” strategies.