Blog 06: Part 03 – Models, Models, and More Models

I have this memory from way back as an undergrad completing the IST program of being confused about intermediary levels of abstraction. Specifically, we learn about bits, bytes, ones and zeros as the building blocks of all things computer. A little later comes a lot of the math behind why this is so powerful… IST 230 I think, with truth tables, Boole & De Morgan, and propositional logic concepts. In my case, I actually got a little more practical experience with this stuff: I took a few CSE courses my first semester, and one of them was a digital logic course where we built out logic circuits on breadboards. But then, all of a sudden, it’s coding classes, jumping a couple of levels of abstraction. Yeah, they mention assembly and how the code is compiled and linked into machine code… but they never really EXPLAINED how that works. (As an aside, if you enjoy puzzle games, check out TIS-100 by Zachtronics. It’s a puzzle game where you solve problems by coding in a pseudo-assembly language on functional 8088-era hardware, and it’s a blast! http://www.zachtronics.com/tis-100/)
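For anyone who never took that course: those truth-table exercises are easy to sanity-check in code. Here’s a minimal sketch (plain Python, nothing course-specific) that brute-forces De Morgan’s laws over every combination of inputs:

```python
from itertools import product

# Check De Morgan's laws over every combination of truth values:
#   not (A and B)  ==  (not A) or (not B)
#   not (A or B)   ==  (not A) and (not B)
for a, b in product([False, True], repeat=2):
    assert (not (a and b)) == ((not a) or (not b))
    assert (not (a or b)) == ((not a) and (not b))
    print(a, b, not (a and b), (not a) or (not b))
```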

Now, in my case, I had completed CSC 120 prior to switching to IST, so that counted as my first three programming credits. That course was in C, and we learned about pointers and memory management, in contrast with IST 240 a few years later, which was programming in C#; C# has a garbage collector, so memory management didn’t come up once. So where am I going with all of this? I sometimes feel the same way about EA models… and to a lesser extent, about IT road maps/blueprints.

I’m sick of sitting in meetings and having people speak in such high-level terms that they’re not actually SAYING anything. Oh, IT is moving from a vertical to a horizontal organization! Ok… so what does that mean? We’re doing the exact same things now that we’ve been pulled out of the various divisions into a centralized shared service, except now I have “two” bosses. Or on the EA side of things, you can look at a high-level model all you want, but a lot of times you can’t tell whether a process is a robust IT system or a literal “swivel chair” person reading data from one system and entering it into another. Models are great tools for documenting systems precisely because they omit levels of granularity that decision makers simply don’t need or don’t care about… but it is absolutely critical that someone, somewhere, is still aware of the details and involved enough to raise the alarm if things go too far off the rails.

I read through the Gartner job description for an Enterprise Solutions Architect and it fits my day-to-day responsibilities to a T. The thing I struggle with is that everyone wants to deal with the high-level concepts and skip over the details of how the work actually gets done or how the system actually needs to function.

Blog 06: Part 02 – Capability Modeling in Action

Again, this week’s topic dovetails nicely with a lot of what was covered back in EA 873. Last year I was actually able to put my modeling skills to good use at work. I was working as an infrastructure IT project manager for GE’s Power business at the time. In the months right after GE’s acquisition of Alstom Power and Grid, the integration of the two massive groups was not going so well. GE Power at the time was ~80K employees globally and Alstom was right in that same neighborhood. We especially had difficulty with a lot of the service delivery components for basic IT services, e.g. ordering a PC and using it to access certain applications. The trouble was, while the networks were ostensibly integrated, because they both used overlapping private IP ranges it was really two networks with some really fancy DNS scripting. To make matters worse, a lot of the legacy Alstom resources were not running in strategic data centers, but sitting at remote sites in a closet, on some goofy domain that was causing issues. Like I said, things were a bit of a kludge.
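To make the overlapping-range problem concrete, here’s a quick sketch using Python’s ipaddress module. The address blocks are invented purely for illustration; I’m obviously not going to publish either company’s real addressing plan here:

```python
import ipaddress

# Invented example: both legacy environments carved their space out of the
# same RFC 1918 block, so the same address can legitimately exist on both sides.
legacy_ge = ipaddress.ip_network("10.0.0.0/9")        # illustrative only
legacy_alstom = ipaddress.ip_network("10.64.0.0/10")  # illustrative only

print(legacy_ge.overlaps(legacy_alstom))  # True -> plain routing can't disambiguate,
                                          # hence the "really fancy DNS scripting"
```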

The end user support folks were getting hammered because of the nature of a lot of the issues. Brand new PCs were arriving misconfigured, or employees were only able to access legacy Alstom or legacy GE resources, but not both at once. As a stopgap, I was tapped and put on a special project to get to the root cause of the issues, since IT at this point was looking pretty bad. (My first role with GE was with the client team, so I had some experience in the service management area.) Things got off to a rocky start. Basically everyone was using the team as an excuse to skip over the service desk. The service desk was having trouble with integration issues, but we found very quickly that EVERYTHING all of a sudden was an “integration issue.” The sheer volume was overwhelming, but the powers that be, in this case the non-IT business leaders who were hearing the complaints and sending folks to us, simply didn’t care. They couldn’t, or wouldn’t, understand what was going on, and everyone was concentrating on fixing symptoms rather than fixing the causes of all the issues. I created the following capability model, which very clearly outlined the various roles of all those involved and detailed all responsibilities from start to finish. Got a nice pat on the back.

Stakeholders

  • End User – The actual end users with the issue.  In practice, these are often executive level employees
  • Integration Team War Room – The first attempt at a collaborative helpdesk containing contract resources from both GE and Alstom service desk vendors, specializing in integration related issues
  • Corp Shared Services Resources – This includes the regular GE service desk as well as the level 2/3/4 domain specific resources, e.g. email, AD, single sign on, Identity Management/HR, etc
  • PT SWAT Team – My group, comprised of various project managers from GE Power’s HQ IT function, with expertise in End User/Client and Network.  We have a higher level (in scope) of technical understanding and a much better professional network within GE Corporate IT
  • Business Integration Leadership – These are business and IT leaders from both GE and GELA who are fielding calls from angry executives and put the SWAT team together

Process

This process STARTS after the regular service desk process has failed. An end user who is not getting resolution to their issue escalates to the integration war room. If the war room is unable to resolve it, they create a ticket in a support queue spun up just for the SWAT team. Often, the information provided from the war room is incomplete or incorrect, so the SWAT team works directly with the end user as needed to triage the issue.

Once the issue is understood, the SWAT team engages L2/L3/L4 teams as required while retaining ownership of the case and acting as facilitator until the issue is resolved. Once resolved, the steps to resolution are documented and provided to the regular service desk, and an RCA/status update is provided to the business integration leaders. The team follows up with the end user a few days later to confirm the issue stays resolved.
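Sketched in code purely as shorthand (the stage names below are mine, not an official process artifact), the escalation path boils down to a ticket walking through a fixed set of stages while the SWAT team keeps ownership:

```python
from dataclasses import dataclass, field

# My own shorthand for the stages described above -- not an official workflow.
STAGES = [
    "service_desk_failed",     # normal process, assumed already exhausted
    "integration_war_room",
    "swat_triage",             # SWAT re-triages directly with the end user
    "l2_l3_l4_engaged",        # specialists work it, SWAT retains ownership
    "resolved_and_documented", # fix goes to the service desk, RCA to leadership
]

@dataclass
class Ticket:
    summary: str
    stage: int = 0
    notes: list = field(default_factory=list)

    def escalate(self, note: str):
        """Move the ticket to the next stage and record what happened."""
        if self.stage < len(STAGES) - 1:
            self.stage += 1
        self.notes.append(f"{STAGES[self.stage]}: {note}")

t = Ticket("New PC can reach legacy GE apps but not legacy Alstom apps")
t.escalate("war room could not resolve; ticket info incomplete")
t.escalate("re-triaged with end user; root cause is the legacy domain's DNS suffix")
```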

The SWAT team’s stated internal mission is to make ourselves redundant and unneeded as quickly as possible, by ensuring solutions are discovered quickly and documented sufficiently, so we can get back to actually doing our real jobs.


Blog 06: Part 01 – Business Architecture’s Place in the Organization

William Ulrich does a nice job of defining business architecture in the reading “The Essence of Business Architecture,” but I want to take it another step forward. From my perspective as a career IT guy, BA and the resultant modeling that comes with it operate at a higher level of abstraction than IT systems. Rather than dealing with information and data flows, it deals in business capabilities. If you think of the IT OSI model, BA would sit on top of it all, off the page. In fact, going back to the Scott Bernard “EA Cube” from way back in EA 871, BA is really synonymous with the “Business” portion of the model. The technology may be the underlying infrastructure for how information flows and data is stored, but BA is the unifying glue that holds the entire thing together, unless your organization is doing IT simply for the sake of IT rather than generating business value. Which, to be fair, is a mistake I’ve seen made at some organizations where Enterprise Architecture is dominated by IT personnel.

Starbucks Process Map

Last fall when I took EA 873, I had the opportunity to generate some actual business process capability models, so I’ll discuss a simple one here. As an undergrad, I worked for Starbucks as a barista. The model to the right is how we would end up processing a beverage transaction. Note that things are only dealt with in the abstract: we’re talking about transactions, people, and products. Completely missing from this diagram are all the supporting functions and processes that enable these higher-level functions. For example, the registers transmit drink orders directly to the espresso bar. Based on the number, type, and size of those drinks, the inventory system knows how much milk, coffee beans, cups, etc. were used and can order more accordingly in the next delivery shipment.
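As a toy illustration of the kind of supporting function the model deliberately hides, the inventory draw-down could be sketched like this (the recipe numbers are made up; the real POS/inventory integration is obviously far more involved):

```python
# Made-up recipe data purely for illustration.
RECIPES = {
    ("latte", "grande"): {"milk_oz": 12, "espresso_shots": 2, "grande_cups": 1},
    ("drip", "tall"): {"coffee_oz": 12, "tall_cups": 1},
}

def consumption(orders):
    """Aggregate ingredient usage from a list of (drink, size) orders sent by the registers."""
    totals = {}
    for order in orders:
        for item, qty in RECIPES[order].items():
            totals[item] = totals.get(item, 0) + qty
    return totals

print(consumption([("latte", "grande"), ("latte", "grande"), ("drip", "tall")]))
# {'milk_oz': 24, 'espresso_shots': 4, 'grande_cups': 2, 'coffee_oz': 12, 'tall_cups': 1}
```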

The key takeaway here, for the business analyst, is to not get caught up in the details of the inside of a process step (unless that particular step itself is being modeled). Instead, BAs should focus on the inputs and outputs of a particular process, along with any constraints involved.

Blog 05: Part 03 – The Psychology of Security

IT security has always interested me, and while I’m not directly responsible for any IT security related decisions, it’s a bit of a hobby for me. One thing that I’ve observed over the years, from experience working in multiple different industries and environments (IT and non-IT), is that there is a tendency for employees to prefer the course of action which requires the least amount of work. It sounds silly when said like that, because of COURSE people in the aggregate are going to prefer working “smarter” not harder, but this has some pretty profound implications for IT security architecture. There was a specific graphic in the Fujitsu security architecture document in the readings this week that reminded me of this concept. The graphic (displayed here) shows three desirable traits of ESA: Cost, Security, and Convenience (usability), where maximizing one of those points comes at the expense of one or more of the others. You may have a very secure system, but if it’s not easy to use, you will have a user population actively subverting security (either intentionally or unintentionally), netting lower security overall. Now for a couple of stories:

Figure 6-1 from the Fujitsu Enterprise Security Architecture document

Story 01: At my organization, the vast majority of internal web applications and services are authenticated via SAML and single sign-on accounts in a browser. You even need to authenticate to the proxy to visit external websites. One of the biggest user complaints was the “chore” of having to type in their password each time they opened a new browser session. How long could this take? We’re talking the time it takes to tap out an extremely familiar set of characters; this SSO account is, after all, used for EVERYTHING. But the complaints about having to type in passwords every few hours or on a new browser session grew so strong that IT introduced a concept known as “reduced sign on,” where the password would only be required once every 12 hours and the session would persist even if the browser was closed. End user response to such a minor change was incredibly positive.
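A minimal sketch of the policy change, assuming the 12-hour lifetime above (in reality this lives in the SAML IdP/SSO infrastructure, not in application code like this):

```python
from datetime import datetime, timedelta

REDUCED_SIGN_ON_LIFETIME = timedelta(hours=12)

# Hypothetical persisted token record; the point is that it survives the browser closing.
token = {"user": "jsmith", "issued_at": datetime.utcnow()}

def needs_reauthentication(tok, now=None):
    """Old behavior: every new browser session re-prompts. New behavior: only after 12 hours."""
    now = now or datetime.utcnow()
    return now - tok["issued_at"] > REDUCED_SIGN_ON_LIFETIME

print(needs_reauthentication(token))  # False until 12 hours have elapsed
```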

Story 02: Passwords again, sort of! Remote employees must first log in to a VPN service in order to access network resources while off network. Authentication happens with a physical RSA hardware token and a PIN known to the user. However, again, there was a huge uproar over the imposition of having to enter the randomly generated digits from the token, to the point where we had users sharing tokens and PINs with each other, which is obviously against policy, but they claimed they NEEDED to in order to work. The solution was a new connectivity service which, rather than prompting the user to enter a token+PIN combo, creates an application-specific encrypted SSL tunnel and authenticates via a certificate installed on the PC, all in the background. It’s still VPN… just… hidden from the user. Again, this new service was received very, very positively, and everyone now talks about the hellish days back when six digits had to be entered from a token.
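Very roughly, the difference between the two experiences looks like this (the file path and checks are simplified assumptions; a real client also validates the certificate chain, revocation, and much more):

```python
import datetime
from cryptography import x509

CERT_PATH = "/etc/vpn/machine-cert.pem"  # hypothetical location of the installed certificate

def can_connect_silently(cert_path=CERT_PATH):
    """New flow: authenticate with the installed machine cert. Old flow: prompt the user."""
    try:
        with open(cert_path, "rb") as f:
            cert = x509.load_pem_x509_certificate(f.read())
        return cert.not_valid_after > datetime.datetime.utcnow()
    except FileNotFoundError:
        return False

if not can_connect_silently():
    pin_plus_token = input("Enter PIN + token code: ")  # the old, much-hated experience
```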

The takeaway from these two stories is to not discount the user experience when architecting systems. I also think security can be improved at organizations that find ways to shepherd users toward good security practices by making those practices easier or more desirable.

I’m reminded of the concept of a Desire Path, which I’m sure everyone has seen. This is a worn spot in the grass from foot traffic on campuses, in parks, etc. It’s where enough people take the shortcut and cut the corner by walking where they shouldn’t, rather than staying on the pavement. Administrators there have a decision to make too: they could put up ropes and “Keep off the Grass” signs, probably to little avail… or they could simply pave the desired path!


Desire Paths – a path created as a consequence of erosion caused by human or animal foot-fall or traffic.

Blog 05: Part 02 – Software Defined Hardware

Ahh software.  Soon, everything will be software.  It’s great.  Last August I got myself a ham radio license and have been wading into the hobby over the last year, trying out different operating modes, playing with different antennas, buying super expensive radios.  I joined the local club and I am about 30 years under the average age.  And like a lot of old folks, they LOVE to complain about the state of things now and how great everything was in the past.  In this case, they do not like the fact that modern radio transceivers are essentially software running on a PC, rather than crystals, tubes, and oscillators.  It’s the same mentality I find a lot of the older solutions architects I work with have:  New tech is to be feared.

But you cannot deny the flexibility virtualization gives you. If I pulled up any physical server that’s dedicated to a specific application, I would find 90% of the resources are never used. You could run five virtual servers on that same hardware and the applications wouldn’t even notice. The next big virtualization push, IMO, is going to be in the network hardware space. Software defined networking is going to bring a huge amount of flexibility and THEORETICALLY an increased level of security, as various hosts will be logically segmented and firewalled where today they are not, due to the expense of purchasing that extra hardware or running those physical cables.

I say theoretically, because it will be critical that architects design secure networks and network administrators implement everything correctly. Too many times I have seen lazy network admins do some highly questionable things simply to get the heat off themselves when an application was down. (The problem was a firewall rule blocking a specific application. Strike 01. Of course, the application guys in charge of it couldn’t tell the network guys which ports/protocols needed to be passed through the firewall. Strike 02. So because this was a production outage and everyone was yelling and pointing fingers, the network admin set the firewall to allow all traffic on any port to get it working. And since it was working, everyone stopped caring about it and the “work around” became permanent. Strike 03, you’re out!)

Networks that are defined by rules are only going to be as good as those rules.
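As a trivial sketch of what “only as good as those rules” means in practice (the rule format and values are invented for illustration; a real SDN controller would expose policies through its own API or config export), even a few lines of review tooling would have flagged the “work around” above:

```python
# Invented rule representation purely for illustration.
rules = [
    {"src": "10.1.0.0/16", "dst": "10.2.5.10", "port": 443, "action": "allow"},
    {"src": "any", "dst": "any", "port": "any", "action": "allow"},  # the production-outage "fix"
]

def is_risky(rule):
    """Flag broad allow-everything rules that should never become permanent."""
    return rule["action"] == "allow" and rule["src"] == "any" and rule["port"] == "any"

for i, rule in enumerate(rules):
    if is_risky(rule):
        print(f"Rule {i} is an any/any allow -- strike three, review it.")
```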

At least on the hardware side of things, it’s a bit more difficult to screw up. In GE’s Aviation business, because they have DoD contracts, they actually have two physically separate networks in their facilities, one for employees and one for contractors. Contractors are completely forbidden from connecting to the primary network. They’re effectively air-gapped. In my previous example, it is literally impossible for an unwary network admin to change a firewall rule to allow access across networks, because they are physically different. Not so in the case of SDNs.

Software defined networks are going to proliferate rapidly due to the low cost and it will be critical from a security architecture standpoint to ensure 1) Robust architecture and best practices are created as standards and 2) That those standards are enforced and periodically reevaluated.

Blog 05: Part 01 – Venting

I’m going to depart from my usual format for this entry to vent a little about IT security and how it is often perceived in organizations.  While this is mostly going to be me whining, I think some of what I’m going to say is germane to this week’s topic of security architecture:  Today’s corporate culture enables poor IT security practices. 

If you ask any executive, IT or otherwise, of course they will say that IT security is critically important. After all, data today is an enterprise resource and it makes good business sense to safeguard it as such. And then there are the more nebulous moral or ethical considerations about the obligation to safeguard other individuals’ PII. (I’m looking at you, Equifax!) Nobody sets out to get hacked; security breaches are simply the culmination of years’ worth of poor security practices in the aggregate. And bad luck.


Being 100% secure is an impossibility. There is a point of diminishing returns beyond which further investment of time or resources provides poor return. The point where organizations actually stop investing, though, is usually well before that, because of risk tolerances and financial pressure to reduce IT spend. In my relatively short IT career, I have seen countless times IT executives making decisions that minimize short-term costs at the expense of the long term, knowing full well they won’t be around to be responsible once the entire setup becomes unsustainable. It is very difficult to justify spending money on intangible things like IT risk.

Oh, we need to pay $100K to fix Critical System X because there is a defect an attacker might leverage?  Hmm, I sometimes forget to lock my front door and I’ve never been robbed because of it.  I think instead we’ll not pay the money so I get a nice bonus.

Hopefully these recent hacks will make it easier for IT leaders to do the right thing. My fear, though, is that with each new hack, people will become desensitized to it. And it won’t be long until someone starts calculating that it might actually be cheaper to deal with the fallout of a breach than to spend money preventing it in the first place.

Blog 04: Part 03 – IT = OT

There is quite a buzz recently about the addition of “operations technology” to the purview of IT at many organizations. I say recent, even though the Gartner article (G00214810) included with this section’s readings is dated 2011. But I guess it’s their job to predict future trends. Good job!

So, the idea with OT with respect to manufacturing environments is threefold:

  • 1) Integration – This one is not a new idea. SCADA architecture has been around for a while now; however, a recent trend I have noticed is the ability of machinery and equipment to be networked “out of the box” without needing to purchase additional hardware or software licenses. Much like the proliferation of IoT devices, the thinking seems to be “When in doubt, put a NIC in it!”
  • 2) Risk Reduction – Hand in hand with the explosion of all these new hosts on the network comes dealing with the associated risks. In my experience, these shop floor devices interface with the network in one of two ways. A) Indirectly – via a dedicated PC running proprietary interface software, sometimes equipped with a specialty hardware interface. The PC often “comes with” the machine and is supported by the vendor. B) Directly – the piece of equipment has an embedded version of Windows or Linux built in and can interface with the network itself. In both scenarios the risk is derived from having unmanaged hosts on your network. Security tools designed for consumer versions of Windows don’t work on the embedded flavors of the OS and are non-existent on the Linux side. (This would be fine if it weren’t always a horribly outdated and unpatched Linux distro on the machine.) So, because of this additional risk, it is good practice to segment these devices from the rest of the network, either logically or physically.
  • 3) Standardization – This one is currently the weakest of the three and hopefully will be fleshed out better in the near future. Most of the risks I mentioned in #2 would be minimized if all of these devices shared a common operating system or even a common communications protocol. I feel like in the IoT space manufacturers are still jockeying to have THEIR protocol become the next standard, so it kind of feels like a “walled garden” situation where, in a sense, you’re locked into a specific manufacturer. There already is a (fairly new) standard, ISO/IEC 20922:2016 (MQTT); now vendors will actually need to adopt it. (A minimal sketch of what publishing over that protocol looks like follows this list.)
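For a sense of what adopting that standard looks like on the shop floor, here’s a minimal publish/subscribe sketch using the paho-mqtt 1.x client API. The broker address and topic names are made up, and a real deployment would also use TLS and authentication, which I’m omitting for brevity:

```python
import paho.mqtt.client as mqtt  # paho-mqtt 1.x style; 2.x also wants a CallbackAPIVersion argument

BROKER = "mqtt.plant.example.com"           # hypothetical broker
TOPIC = "plant1/line3/press07/temperature"  # hypothetical topic naming scheme

def on_message(client, userdata, msg):
    print(f"{msg.topic}: {msg.payload.decode()}")

client = mqtt.Client()
client.on_message = on_message
client.connect(BROKER, 1883)
client.subscribe(TOPIC)
client.publish(TOPIC, "72.5")  # a sensor on the machine would publish readings like this
client.loop_forever()
```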

My advice to EAs working on OT architectures in the near future is to concentrate on a robust and secure network. Dedicated, heavily firewalled, and segmented is the way to go. Keep the entire shop floor firewalled off from the corporate network. Keep equipment from different manufacturers separated in their own VLANs. And within those VLANs, segment different communications protocols into their own subnets.
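A quick sketch of that carve-up using Python’s ipaddress module; the address ranges are arbitrary examples, not a recommendation:

```python
import ipaddress

# Arbitrary example: one block for the whole shop floor, a /20 VLAN per equipment
# vendor, and /24 subnets within each VLAN per communications protocol.
shop_floor = ipaddress.ip_network("10.200.0.0/16")

vendor_vlans = list(shop_floor.subnets(new_prefix=20))           # 16 vendor VLANs
protocol_subnets = list(vendor_vlans[0].subnets(new_prefix=24))  # 16 subnets inside the first VLAN

print(vendor_vlans[0])       # 10.200.0.0/20 -> e.g. "Vendor A" VLAN
print(protocol_subnets[:2])  # [10.200.0.0/24, 10.200.1.0/24] -> e.g. MQTT vs. other protocol traffic
```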


Blog 04: Part 02 – I&O Leader by Committee

In my last post a few days ago I alluded to being assigned to a new project, and the Gartner reading on the things you should do during your first 100 days as a new Infrastructure & Operations leader (G00201291) is turning out to be pretty topical. GE is an interesting organization to work within due to its size and the number of industries in which it operates. Building gas or steam turbines may be similar to building an aircraft engine, but very different from building a locomotive, MRI machine, or water treatment plant. Not to mention the sales and service aspects; supporting these products is very different as well. Because of this, EA initiatives were handled mainly at the level of these so-called “Tier 1 Businesses.” This meant that GE Healthcare could tailor their architecture toward their specific business needs and goals, and GE Power could do the same. The net effect, though, is a loose confederacy of architectures when looking at GE as a whole. And I’m being kind. There are certain systems, e.g. HR and some IT shared services such as email, collaboration, and end user support services, that are leveraged across the enterprise, but for the most part each of the high-level businesses operates within its own silo.

So why is this important? Well, within the last two years GE Digital was created, and since then all IT personnel formerly working for the tier 1, tier 2, and tier 3 businesses have been reorganized into this new Information Technology/Operations Technology business unit. IT now is a completely horizontal function. And the transformation, which also came with a voluntary job reduction package that a lot of employees took advantage of, was a bit disruptive in terms of the current ability to support a lot of these legacy systems. There are now a few cases where the only folks who were familiar with an application’s architecture are gone… and knowledge transfer was minimal or non-existent. So what does all of this rambling have to do with this week’s topic?

With the creation of the new centralized, horizontal IT organization, coupled with the removal of IT personnel from the “businesses,” the main goal is to create one single architecture across all these previously siloed tier 1 businesses. But while GE Digital does have a CIO (and a CEO!), the actual architecture designs are being left to the EA team to design, the solutions architecture team to build, and the business folks to just kind of… accept, I guess. It is going to be an interesting next few months as this gets fleshed out. I know for a fact that the businesses that formerly had IT teams of their own are feeling that loss, so this is just as much a shift in culture as it will be a shift in technical architecture. Rereading this post, I didn’t do a very good job of tying my story back to the reading, but the key takeaway I want to leave here is that there is not always a single IT/OT leader in a position to make decisions all on their own.

Blog 04: Part 01 – Getting Stuck in the Weeds

I want to remind everyone that my background is in IT, specifically the infrastructure (boring!) side of IT, so when the Gartner reading in this section (Robertson, G00160635) stresses the importance of not concentrating solely on enterprise technology architecture, I can’t help but agree that this is a pretty common mistake of EA teams. Part of it isn’t our fault. In every professional organization that I’ve been a part of, the EA team is part of the IT organization. The architects are usually highly technical people with education in IT disciplines. As such, while acknowledging that they are creating a holistic architecture, there is a tendency to overly focus on the “lower abstraction” levels of the design. Arguing over an ISP circuit versus an MPLS circuit, for example. Yeah, they’re different, cost different amounts, and have implications for the rest of the design… but at the end of the day each only represents a communications channel. Choosing one or the other is a stupid hill to die on, in the grand scheme of things, especially if you are going to need that political capital later on.

This is all exacerbated by overly technical conversations driving away stakeholders from the business side. Literally. I have been on calls where, over a few sessions, the non-IT folks simply stopped coming because nothing within their scope was being discussed. Even if there is a highly technical facet of the architecture that MUST be discussed in a cross-functional meeting, you have to put the technology in terms that your business partners can understand, which is usually time and money.

Any “moratorium on ETA” approach will make people angry. Get used to it. EA isn’t about taking the path of least resistance. Persevere. But also challenge yourself and others to do things differently than before.

I had this situation happen to me just this morning. I’ve just been assigned a new project, to assist in migrating several applications off infrastructure that is owned by a business unit that has been divested. The TSA clock is ticking. On the call were a few IT project managers, solutions architects, enterprise architects, and all of the executive sponsors from the business side for the ~15 applications that have to move. The conversation very quickly got into the topic of how Chef scripts can be written to move some of these apps over to AWS… and you could just tell that 75% of the people on the call were mentally checking out. Fact is, those folks DO NOT CARE about how their applications work and what needs to happen to migrate them, so long as they are moved without disruption or incident and without breaking the bank.

Blog 03: Part 03 – Disaster Recovery

Going to depart from my original plan for entry three this week to talk about disaster recovery, since I experienced a hard drive failure on my work laptop on Wednesday, and DR is an important aspect of data architecture, so it still fits thematically with the other entries. First though, some context. I would say that my organization has an unstructured data problem. We have tons and tons of unclassified data sitting on end user hard drives, network file shares, and cloud storage environments. Tons of it. Duplicate data. Incorrect data. Corrupt data. End users who bother to back up their personal data do it poorly: it’s not uncommon for someone’s personal file share to be filled with manual redundant copies, e.g. Jan, Jan-Feb, Jan-Mar, Jan-Apr, etc. Some of them even encrypted their data, which sounds good on the face of it, but they didn’t use an appropriate managed encryption system, so not if but WHEN they lose/forget their password, nobody is able to unlock it for them. The rest of the user base simply doesn’t back up at all.

Nobody cares about backups, until they need them…then it’s always IT’s fault that they don’t exist and everyone is scrambling to and paying a lot of money to recover the data.  I think my favorite story was a sales guy who spilled wine on his laptop.  Hard drive was toast, but we have an agreement with data recovery vendors who can actually recover data from fairly destroyed drives, as long as we pay through the nose.  The sales guy insisted he had important data that needed recovery, so away the drive was sent.  When the ~$8,000 bill arrived, it prompted some questions:  he had to justify the expense.  Turns out, the data he was after was for his fantasy football team.  Whoops.

For the last several years, end user PCs have been backed up to the cloud (much like our servers). It happens automatically, many times a day, and incrementally: only files that have changed are backed up. And that’s great for the end user, but all of these backups of (questionably useful) data take up bandwidth, and bandwidth isn’t free. In fact, due to the expense of bandwidth plus the cost of the backup service itself increasing from $4 per user per month to $9.50 per user per month, the company made the decision to end the cloud backup offering. At the same time, for data loss prevention reasons, all writing to externally mounted volumes is now blocked as well. The only means for end users to back up their data is by manually copying it to internal cloud storage services designed for collaboration, not archival purposes. This has generated a great deal of animosity towards IT, however it has saved quite literally millions of dollars, since we have ~300K employees globally.
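For what it’s worth, the “incremental” part of that old service is conceptually simple. Here’s a toy sketch, with the change-detection rule reduced to file modification times (real backup products track changes far more efficiently than walking the whole disk every run):

```python
import os
import shutil

def incremental_backup(source_dir, backup_dir):
    """Copy only files that are new or modified since the last backup copy was made."""
    copied = 0
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(backup_dir, os.path.relpath(src, source_dir))
            if not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst):
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)
                copied += 1
    return copied

# e.g. incremental_backup("C:/Users/me/Documents", "//fileshare/backups/me")
```

And the economics behind killing it are simple enough: at $9.50 per user per month across ~300K employees, the service alone was on the order of $2.85M a month before you even count the bandwidth.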