
Chaotic Intersection of 5G, Edge, and Cloud

New architecture paradigms always bring new challenges for adopters, and edge computing and 5G are no different. Carriers, network equipment providers, enterprises, ISVs, hyperscalers, and startups are all eyeing new market opportunities enabled by 5G and edge computing, but several barriers and inter-dependencies across ecosystem players are slowing adoption. The 5G Open Innovation Lab was created with a mission to identify new use cases, remove market and technology barriers, and accelerate the adoption of 5G and edge computing.

The Problem

Telecom ecosystem players hold a network-centric view of the edge: they provide 5G network infrastructure and services and leave applications to others. Hyperscalers are building and leveraging their own edge locations and partnering with operators as needed to leverage telco central offices. Each hyperscaler pursues its own edge computing model and strategy, separate from the others for competitive reasons. Multiple 5G consortia have been spun up, creating opportunistic alliances among operators across the world. In addition, the recent opening of the 3.5 GHz CBRS spectrum now allows enterprises to set up their own private 5G networks.

Where does this leave the enterprises, startups, and developers?

Enterprises, as developers and end customers, are forced to deal with the complexity of a fragmented ecosystem and a myriad of options. Investment in 5G and edge computing applications is risky unless they can future-proof those applications by abstracting them from the underlying infrastructure.

Developers want to build applications that are not tied to infrastructure and can work across all networks regardless of who owns or operates the network or whether it is public or private.

Startups are trying to address many of the gaps but have the arduous task of breaking into a legacy industry that has not quite figured out how to embrace modern technologies and development practices.

Cloud native technologies such as Kubernetes have seen massive adoption in the last few years. The core idea of Kubernetes is to let developers build, maintain, and deploy apps in a scalable and reliable way without dealing with underlying infrastructure complexity. There have been attempts to use cloud native technologies to build 5G network infrastructure; Rakuten Mobile is worth mentioning as the first operator to launch a 5G network built on a cloud native stack.

A hyperscaler like AWS can own and manage all of the infrastructure, facilities, supply chain, and business model, and provide cloud services with SLAs. However, it is much harder to provide services with SLAs at the edge. Imagine the complexity of the provider ecosystem on the edge (telcos, cable providers, co-los, enterprise DCs, hyperscaler-owned PoPs, etc.), changing network conditions, varying footprint and power constraints, and other logistical complexities, and you’ll see that building an “edge cloud” is an order of magnitude harder than the “cloud” as we know it. Therefore, hyperscalers are applying the proven cloud model, striking partnerships with telcos and cable companies to put their own gear in edge locations and connect it back to the cloud.

From a software perspective, what worked well in the mainstream computing world starts to break down as we move to the edge. Microservices, containers, Kubernetes, serverless, and others are seeing rapid adoption in the enterprise, but they do not work the same way when you are dealing with new application requirements at the edge. A new framework is needed.

The Solution

Innovation demands a uniform, vendor-neutral way to build, deploy, and manage applications that are tiered across edge and cloud.

There have been multiple efforts in the industry to solve this problem, many of them centered around extending the success of Kubernetes from the cloud to the edge. Worth mentioning are cloud-vendor-agnostic efforts such as Rancher K3s and KubeEdge, as well as cloud-vendor-specific projects that tie proprietary device management (such as AWS Greengrass and Azure IoT) to application deployment using standard K8s.

Here are some of the tenets of the solution that we believe is right for the 5G/Edge community:

1. Unlike the “cloud” owned by a few hyperscalers, the edge would be a single federated cloud where developers build and deploy once to a federated control plane. Hyperscalers, telcos, and cable providers need to be good citizens of the federated world along with other providers.

2. Compute would be network driven. In other words, network events will drive “on-demand” application deployment (like event-driven “Functions-as-a-Service” in the cloud world).

3. New challenges in security would need to be addressed by the providers to build trust in the edge ecosystem. The 1:1 trust model of the cloud world won’t work, requiring a federated trust approach.

4. Edge deployments need a new software framework for application deployment that balances infrastructure capacity, latency, and the cost/monetization model. For instance, providers may have “spot pricing” of resources depending on available infrastructure capacity. Applications should be able to describe their resource needs and cost constraints to the framework, and the framework should take care of the rest.
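To make the last tenet concrete, here is a minimal Python sketch of what declaring resource needs and cost constraints to a federated control plane might look like. The names (`EdgeOffer`, `AppSpec`, `place`), the providers, and the spot prices are all hypothetical illustrations, not an existing API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EdgeOffer:
    """A provider's spot-priced capacity at one edge location (hypothetical)."""
    provider: str
    latency_ms: float   # expected latency from this site to the target users
    free_vcpus: int     # capacity available right now
    spot_price: float   # $/vCPU-hour at this moment

@dataclass
class AppSpec:
    """What an application declares to the federated control plane (hypothetical)."""
    vcpus: int
    max_latency_ms: float
    max_price: float    # $/vCPU-hour ceiling the app will pay

def place(app: AppSpec, offers: list) -> Optional[EdgeOffer]:
    """Return the cheapest offer meeting the app's latency, capacity, and cost constraints."""
    feasible = [o for o in offers
                if o.latency_ms <= app.max_latency_ms
                and o.free_vcpus >= app.vcpus
                and o.spot_price <= app.max_price]
    return min(feasible, key=lambda o: o.spot_price, default=None)

offers = [
    EdgeOffer("telco-A", latency_ms=8, free_vcpus=16, spot_price=0.09),
    EdgeOffer("cable-B", latency_ms=5, free_vcpus=4, spot_price=0.12),
    EdgeOffer("hyperscaler-C", latency_ms=40, free_vcpus=64, spot_price=0.03),
]
app = AppSpec(vcpus=8, max_latency_ms=20, max_price=0.10)
choice = place(app, offers)
```

In this toy run the framework, not the developer, decides that the telco site is the best fit: the hyperscaler region is cheaper but too far away, and the closest cable site lacks capacity.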

We don’t yet have a set of standards that makes “write once, deploy on any edge or any cloud” a reality, although CNCF-driven projects are seeing wide adoption across cloud vendors. Given the complexity of the edge, unique application requirements, and competition among operators, cable providers, and hyperscalers, it may be a long time before standards emerge that cover both edge and cloud.

A new set of players in the ecosystem may emerge and address the lack of standards by mediating across providers while providing a uniform interface to the developers.

At the 5G Open Innovation Lab, we nurture innovators so that together we can deliver a new generation of 5G/Edge-enabled applications to the world. By building a truly open ecosystem of partners – from telcos to cloud and edge computing leaders – we hope to accelerate the adoption of this federated approach to 5G application development.


Cloud Financial Management

Over the last several years working at Microsoft and AWS, I have seen large enterprise customers struggle with adopting the cloud. A lack of cloud education and skills across finance, procurement, engineering, security, and other functions has contributed to a state where people are learning by making mistakes, often costing enterprises a lot of time and money. For instance, a recent study by RightScale (acquired by Flexera) estimates that 35% of cloud spend (~$10B) is wasted.

While the complexity of cloud adoption spans multiple functional areas, in this post I’ll focus on the financial aspects. Cloud Financial Management (CFM) is both a system and a discipline that maximizes returns for its stakeholders as enterprises transition from the traditional capex model to the opex model enabled by the cloud. The system part of CFM includes measurement, optimization, and forecasting of spend. The discipline part includes the automated and transparent flow of information across functions. Together they drive a collaborative culture focused on a single goal – maximizing the return on cloud spend.
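As a toy illustration of the forecasting leg of the system, here is a short Python sketch that extrapolates monthly spend with a least-squares trend line. The spend figures are invented, and real CFM forecasting would also account for commitments, seasonality, and planned launches; this captures only the “measure and extrapolate” core of the idea.

```python
def forecast_spend(monthly_spend, horizon=3):
    """Project future monthly cloud spend with a simple least-squares trend line."""
    n = len(monthly_spend)
    xs = list(range(n))
    x_mean = sum(xs) / n
    y_mean = sum(monthly_spend) / n
    # Slope and intercept of the ordinary least-squares fit y = intercept + slope * x
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, monthly_spend))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    # Extend the fitted line `horizon` months past the end of the history
    return [round(intercept + slope * (n + k), 2) for k in range(horizon)]

history = [100_000, 112_000, 121_000, 135_000, 144_000, 158_000]  # illustrative $/month
projection = forecast_spend(history)  # next three months on the current trend
```

Even a crude projection like this gives finance and engineering a shared baseline to argue against, which is the point of the discipline.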

Most vendors and studies I have seen focus on cost optimization strategies and miss the bigger picture of CFM. Don’t get me wrong – these studies are great, as they point to common pitfalls and missed opportunities (in AWS’s case: use Reserved Instances or Savings Plans, right-size instances, measure utilization, and shut down unused or underutilized assets, leveraging AWS Cost Explorer, the Cost and Usage Report, and other reporting).

My intent in this post is to bring your attention to a key point often missed by others – associate your business metrics with your cloud spend and create an organizational culture where maximizing the return on spend is everyone’s job – engineering, finance, procurement, operations, and other functions. Once you have this culture in place, smart folks in your organization will collaborate to identify what is good spend (spend that drives the business metrics) and what is waste. For the wasted spend, they’ll chase down every opportunity to eliminate it using internal measures (via revised architectures and operational controls) as well as external measures (e.g. negotiating with cloud vendors).
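One simple way to associate a business metric with cloud spend is a unit-economics view: spend divided by the metric the business actually cares about (orders, rides, active users). The figures below are purely illustrative.

```python
def unit_cost(spend_by_month, metric_by_month):
    """Cloud spend per unit of a business metric, month by month."""
    return [round(s / m, 4) for s, m in zip(spend_by_month, metric_by_month)]

spend  = [120_000, 130_000, 150_000]   # illustrative monthly cloud spend ($)
orders = [400_000, 520_000, 750_000]   # illustrative monthly orders

per_order = unit_cost(spend, orders)   # $ of cloud spend per order
```

Here total spend grew 25%, but cost per order fell steadily – a sign of good spend that is driving the business metric, rather than waste.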

As an example, Lyft went all-in on AWS, committing $300M over 2 years, and presented at AWS re:Invent 2019 on how they are establishing CFM at Lyft. While Lyft’s is a great story, what worked for them may not work for you.

For what it’s worth, here is my humble guidance:

If you are just starting your cloud journey, then just as you’d put a security framework in place to comply with your organization’s security and compliance policies, put a CFM and governance model in place before you unleash your DevOps teams. Setting up a cloud center of excellence (CCoE) at the outset is a good way to accomplish this.

If you are already on the cloud and are scaling, you need to think about the readiness of your organization and devise a plan to establish CFM. Here are some basic questions to get you started:

  1. If you are in the C-suite, do you have a good understanding of the cloud as a potential vehicle to help your business grow faster and provide a competitive advantage?
  2. Is your finance team formally educated on Cloud 101, and does it have a good understanding of the controls, guardrails, and levers offered by the cloud provider (in AWS’s case, at a minimum: AWS Cost Explorer, Cost and Usage Reports, AWS Budgets, AWS Organizations, cost differentials across regions, data transfer costs, on-demand instances, spot instances, reserved instances, savings plans, and resource tagging)?
  3. Are your engineering teams educated on the cost aspects of their architectures, and are they following the “Well-Architected” practices published by cloud vendors?
  4. Do you have tools and technology in place that allow your functional teams to collaborate in context? (Emails and spreadsheet exchanges aren’t the most efficient ways.) For instance, what is the approval process for an architectural change that will exceed the forecasted spend? And wait – do you even have a well-defined process for forecasting cloud spend?
  5. How do product managers or business leaders create a view of COGS and margins when your cloud spend is spread across multiple accounts spanning dev, test, staging, and production? Can that process be improved and automated?
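The COGS question above can be partially automated once resources are tagged consistently. Below is a hedged Python sketch of rolling production spend up by a product tag across accounts; the account names, tags, and amounts are invented, and real line items would come from billing exports (e.g. AWS Cost and Usage Reports) across all linked accounts.

```python
from collections import defaultdict

# Hypothetical cost line items pulled from several accounts' billing exports.
line_items = [
    {"account": "prod-1",  "tags": {"product": "checkout", "env": "production"}, "cost": 9_000},
    {"account": "prod-2",  "tags": {"product": "search",   "env": "production"}, "cost": 6_500},
    {"account": "dev-1",   "tags": {"product": "checkout", "env": "dev"},        "cost": 1_200},
    {"account": "stage-1", "tags": {"product": "search",   "env": "staging"},    "cost": 800},
]

def cogs_by_product(items):
    """Roll up production spend by product tag - a simple COGS view.
    Treating non-production environments as R&D rather than COGS is an
    assumption here; your finance team may classify them differently."""
    totals = defaultdict(float)
    for item in items:
        if item["tags"].get("env") == "production":
            totals[item["tags"]["product"]] += item["cost"]
    return dict(totals)

cogs = cogs_by_product(line_items)
```

The hard part in practice is not this rollup but the tagging discipline that makes it trustworthy.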

Regardless of where you are in your cloud journey, establishing a robust CFM practice as core to your growing business will serve you well, maximizing your return on cloud spend.


Cloud-led Disruption in Software Systems

Paradigm shifts such as mainframe, client-server, and cloud have led to disruptive changes in software systems, which must adapt to fit the prevailing architectural patterns. In most cases, software systems are first retrofitted into the new architecture and later re-designed to take advantage of it.

In this post, I discuss how relational databases were initially retrofitted into the cloud architecture (many customers still use databases in this mode) but now face serious competition from databases designed for the cloud. I leave open the question of which other software systems are being redesigned for the cloud, especially when cloud-native innovations such as serverless are happening at a pace that is hard for software systems to match.

Amazon Aurora is a database designed specifically for the cloud architecture, and Amazon Aurora Serverless was added as an on-demand, auto-scaling configuration. Microsoft recently announced Azure SQL Database Hyperscale, also built for the cloud.

So what’s different about the cloud that warrants re-architecting software systems such as relational databases?

In the cloud, compute is decoupled from storage, allowing cloud services to be resilient and scalable: hosts can fail and be replaced independent of storage. In this environment, for a database workload, the network becomes the I/O bottleneck. This bottleneck manifests itself in many ways, impacting database performance and adding complexity to commit protocols. For instance, multi-phase sync protocols such as two-phase commit are intolerant of failures (which are common in cloud systems), requiring complex recovery mechanisms.

There are examples of highly reliable, scalable, and available systems (such as Google’s GFS) built on top of components where failure is the norm. Amazon Aurora’s key design idea was to minimize network I/O (the new bottleneck in a cloud system where compute is decoupled from storage) between the compute and storage systems by offloading the storage-related functions of a relational database to the cloud storage layer. In Aurora, query processing, transactions, concurrency, buffer cache, and access methods are decoupled from logging, storage, and recovery, which are implemented as a scale-out storage service.

This SIGMOD paper covers Amazon Aurora’s design in detail, but a few things are worth noting:

  1. The log is the database. There is no checkpointing, and no pages are ever written from the database to storage. The only thing that crosses the network is redo log records.
  2. Durability is achieved at the storage tier by a 4/6 quorum model across three AWS Availability Zones (AZs). Log writes are issued in parallel, and the first 4 out of 6 write completions satisfy the quorum requirement, avoiding the outlier performance penalty.
  3. Foreground tasks are negatively correlated with background tasks (unlike in traditional databases). Aurora trades CPU for disk to prioritize foreground tasks.
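To see why the 4/6 quorum sidesteps the outlier penalty, consider this toy Python model (my own illustration, not Aurora’s implementation): a parallel write commits when the fourth-fastest acknowledgement arrives, so the two slowest replicas never gate the commit.

```python
import heapq

def quorum_commit_latency(replica_latencies_ms, write_quorum=4):
    """Latency of a quorum write: the log record is issued to all copies in
    parallel, and the commit is acknowledged once `write_quorum` replicas
    respond - i.e. when the write_quorum-th fastest ack arrives."""
    assert len(replica_latencies_ms) >= write_quorum
    return heapq.nsmallest(write_quorum, replica_latencies_ms)[-1]

# Six copies across three AZs; one replica is having a very bad day (90 ms).
latencies = [2.1, 2.4, 3.0, 3.2, 5.8, 90.0]
commit_ms = quorum_commit_latency(latencies)
```

With a wait-for-all scheme the 90 ms straggler would set the commit latency; with the 4/6 quorum, the fourth-fastest ack (3.2 ms) does.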

The net result is a cloud-scale RDBMS that is up to five times faster than MySQL at one-tenth of the cost. Amazon.com has moved completely away from Oracle to Amazon Aurora and other purpose-built databases on the cloud.

This raises the question: like relational databases, what other major software systems (e.g. SAP), if any, are being redesigned for the cloud to deliver business benefits (lower cost, improved performance)? Or are they waiting to be challenged by new systems (from startups or cloud vendors) purpose-built for the cloud?

We live in a world where the cloud is the new normal, and the pace of innovation is at an all-time high.


Customer Success vs. Customer Obsession

In this post, I talk about the challenges traditional “on-prem” software companies face in transitioning to the cloud and subscription-based business models. While subscription business models are impacting every industry, I’ll focus this post on B2B software companies only.

As traditional on-prem companies such as VMware, Microsoft, SAP, and Oracle evolve toward cloud and subscription-based business models, they have been creating new “Customer Success” functions in line with the prevalent model at SaaS companies such as Salesforce. In these companies, I have seen people use the terms customer success and customer obsession interchangeably. The way I look at it: Customer Success is a function, while Customer Obsession is a culture. Having a customer success function in an organization doesn’t make you customer obsessed until you boot the culture.

Nobody knows this better than Jeff Bezos, whose formula for success, I’d argue, is rooted in the “customer obsessed” culture he has built across Amazon. More on Jeff and Amazon later.

Apple is another company that has built “emotional loyalty” among its customer base through how it builds, designs, and packages products, and how it drives the experience through its retail stores and the Genius Bar.

Let’s start with the basics. If you aren’t familiar with customer success as a function prevalent in SaaS companies, or with the genesis of customer success at Salesforce, the Gainsight folks have written a book on Customer Success that is an easy read. The key idea behind customer success for SaaS platforms with subscription-based business models is to drive active usage and adoption of new features. The assumption is that the more the usage, the lower the risk of churn and the more value the vendor can capture by upselling new features and capabilities. The book does a pretty good analysis of why on-prem models didn’t need a “customer success” function, and therefore why companies operating in the on-prem model optimized their culture and activities differently to best serve shareholders.

For companies that started with a SaaS model, it was relatively easy to make customer success a foundational building block of the corporate culture; there was no cultural baggage to carry over from the on-prem world. However, for traditional on-prem companies, it isn’t going to be an easy transition, as acknowledged by Satya Nadella, who sees the cultural transformation of Microsoft as his job #1.

So the real question is: if traditional on-prem companies are to achieve greatness, what do they need to do?

There is a lot of material out there about “sustainable competitive advantage”. In one of my MBA classes many years ago, we reviewed the HBR case study of Southwest Airlines. The thing that stuck with me from that study was that Southwest Airlines’ sustainable competitive advantage doesn’t stem from a single plane type, short legs, fast gate turnarounds, etc. Anyone can copy that model, and many airlines actually did try. Yet Southwest is still going strong. Why? The sustainable advantage, as the study concludes, was their HR system – the type of people they hire and their HR practices.

Jim Collins, in his book Good to Great, discusses the importance of “first who, then what”: get the right people on the bus and the wrong people off the bus. It is the people, and the mindset they bring, that drive the company.

Now let’s look at Amazon, one of the most successful companies of this decade. What’s their sustainable competitive advantage? Many articles, such as this one, will tout the delivery network, technology-powered warehouses, and the size and scale of the user base as pillars of sustainable competitive advantage. I’d argue it is the “customer obsessed” culture that Jeff has built from the ground up that is their sustainable advantage. This applies to all the businesses Amazon is in, as the same leadership principle #1 – “start with the customer and work backwards” – applies to everything they do. They don’t need a “customer success” function, because customer obsession is ingrained in the culture.

So the first challenge leaders need to address is the people.

Besides people, leaders need to think about other elements of the culture. Just as DevOps is a cultural change in how software is built and deployed, breaking the silos between developers and operations, there is a new “FieldEng” culture (I just coined this term) that needs to be built, in which field people and engineering collaborate to define, build, and deploy new capabilities in rapid iterations. Everyone in this new culture is measured on one thing – how they drive customers to succeed – and has a similar or well-aligned compensation plan. Quota-driven incentive models, which made sense in the on-prem world, have to give way to a culture driven by measurable customer outcomes. This will motivate field folks to share ideas for improving products, and engineering folks to drive adoption of features, in a collaborative innovation cycle that starts with the customer. This is the second challenge leaders need to address.

Why is having customer success as a function not good enough to make you a “great company” in the subscription economy? Because customer success metrics are internal indicators of success and don’t necessarily map 100% to how customers measure their own success. As a leader of the company, you need to boot the culture and reward systems to truly measure success by how customers measure their success (or the outcomes they aspire to) through your product or service. That is the only path to greatness.