harshc_ | Product & Growth

My goal with architecture discussions is always the same. I want the platform to scale without constant intervention from infrastructure teams. I want engineers to ship services quickly while still maintaining governance, cost visibility, and security. When organizations move from traditional on-prem environments to cloud platforms, architecture decisions become product decisions.

The diagrams below reflect how I typically frame that journey. I start by understanding the current state deeply. Then I define a target platform architecture that removes operational friction while introducing strong platform guardrails. Finally, I design a cost attribution and chargeback model so teams understand how their infrastructure usage translates into spend.

$ platform-transformation

┌──────────────────────────────────────────────┐
│ 1. CURRENT STATE DISCOVERY                   │
│----------------------------------------------│
│ VMware VMs        OpenShift Clusters         │
│ OpenStack Cloud   Hadoop / Legacy Data       │
│                                              │
│ Symptoms:                                    │
│ - Low utilization                            │
│ - Slow provisioning                          │
│ - Platform fragmentation                     │
└───────────────────────────┬──────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────┐
│ 2. TARGET CLOUD PLATFORM                     │
│----------------------------------------------│
│ Google Cloud Platform                        │
│                                              │
│  VPC Hub-Spoke Network                       │
│        │                                     │
│   ┌────┴───────────────┐                     │
│   │                    │                     │
│ GKE Platform     Managed Data Services       │
│ CI/CD Pipelines  BigQuery / Storage          │
│ IAM & Policies   Observability               │
│                                              │
│ Outcome: Standardized developer platform     │
└───────────────────────────┬──────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────┐
│ 3. COST ATTRIBUTION & FINOPS                 │
│----------------------------------------------│
│ GCP Billing Export                           │
│        │                                     │
│ Cost Attribution Engine                      │
│        │                                     │
│ Team / Service Mapping                       │
│        │                                     │
│ Chargeback Dashboards                        │
│                                              │
│ Outcome: Transparent infrastructure spend    │
└──────────────────────────────────────────────┘

pipeline:  discovery  →  platform design  →  cost accountability

Project context and vision

This platform transformation usually begins with a clear business goal. In this case, the objective is to evolve a fragmented on-prem infrastructure into a unified cloud data platform that supports self-service analytics, real-time insights, and scalable product development.

The current environment reflects a typical enterprise pattern. Infrastructure has grown over time through multiple platforms, manual operational processes, and isolated data environments across many departments. As a result, infrastructure cost increases while developer productivity and data accessibility decline.

Current Challenge	Impact
Infrastructure cost exceeding $2M annually	High operational overhead and inefficient capacity usage
Average compute utilization around 35%	Significant idle infrastructure capacity
Software licensing consuming ~25% of IT budget	Vendor lock-in and high fixed costs
Data silos across 15+ departments	Limited cross-functional insights

Beyond cost, operational complexity also becomes a limiting factor. Environment provisioning can take multiple weeks, infrastructure scaling requires manual intervention, and platform operations depend on a small infrastructure team.

Operational Constraint	Typical Outcome
9 FTEs managing infrastructure	High operational burden
2-4 weeks environment provisioning	Slower product delivery
Manual scaling and backups	Increased operational risk

These issues create the foundation for the platform redesign.

The current state: fragmented infrastructure layers

Most enterprises I work with operate a hybrid on-prem environment that grew organically over time. Different teams adopt different infrastructure stacks depending on their needs.

Platform Layer	Technologies	Operational Reality
Virtualization	VMware (vCenter, ESXi hosts)	Traditional VM workloads and legacy services
Container Platform	OpenShift bare metal clusters	Modern containerized services
Private Cloud	OpenStack	Internal IaaS workloads

Each of these platforms solves a real problem, but they introduce fragmentation. Networking models differ. Provisioning workflows differ. Cost visibility is limited. Infrastructure teams spend significant time integrating systems rather than enabling developers.

This environment usually produces several systemic inefficiencies that become visible once we analyze the infrastructure platform by platform.

Cross-platform infrastructure utilization

A detailed assessment across VMware, OpenShift bare metal, and OpenStack environments typically reveals similar patterns of over-provisioning and inefficient resource usage.

Platform	Typical Observation
VMware	Large number of underutilized virtual machines
OpenShift	Containers requesting more CPU and memory than they use
OpenStack	Instance flavors larger than required for workloads

The result is consistent across environments. CPU utilization remains low, memory is frequently over-allocated, and infrastructure capacity remains idle while still generating cost.
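To make the over-provisioning pattern concrete, here is a minimal sketch of how an assessment could flag underutilized workloads across the three platforms. The `Workload` fields, the sample inventory, and the use of 35% as a flagging threshold are all assumptions for illustration; a real assessment would pull these numbers from each platform's monitoring data.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    platform: str         # e.g. "vmware", "openshift", "openstack"
    cpu_allocated: float  # cores allocated (VM size, container request, flavor)
    cpu_used_avg: float   # average cores actually consumed


def utilization(w: Workload) -> float:
    """Fraction of allocated CPU that is actually used."""
    if w.cpu_allocated <= 0:
        return 0.0
    return w.cpu_used_avg / w.cpu_allocated


def flag_overprovisioned(workloads, threshold=0.35):
    """Return workloads whose CPU utilization falls below the threshold."""
    return [w for w in workloads if utilization(w) < threshold]


# Hypothetical inventory; none of these numbers come from a real assessment.
inventory = [
    Workload("billing-vm-01", "vmware", 8, 1.2),
    Workload("orders-api", "openshift", 4, 3.0),
    Workload("etl-worker", "openstack", 16, 4.0),
]

flagged = flag_overprovisioned(inventory)
for w in flagged:
    print(f"{w.platform:10s} {w.name:15s} {utilization(w):.0%}")
```

The same calculation can be repeated for memory. The value of the exercise is that all three platforms are measured with one definition of utilization.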

Infrastructure platform fragmentation

Platform	Primary Role	Operational Characteristics
VMware	Legacy virtualization workloads	Traditional VM lifecycle management
OpenShift	Containerized application platform	Bare metal cluster operations
OpenStack	Internal private cloud	Self-managed infrastructure services

Operating multiple infrastructure control planes increases operational complexity significantly. Each platform requires separate expertise, monitoring tools, provisioning workflows, and capacity planning processes.

From a platform engineering perspective, the organization is effectively maintaining three separate infrastructure platforms instead of one unified cloud platform.

These architectural conditions lead to three common symptoms.

Symptom	What I observe
Slow environment provisioning	Teams wait days or weeks for infrastructure requests
Limited cost visibility	Costs exist but are difficult to attribute to teams or services
Platform duplication	Similar capabilities implemented multiple times

From a product perspective, the platform itself becomes difficult to operate and difficult to evolve.

The target state: a structured GCP platform

The goal of the target architecture is not simply cloud migration. The goal is to establish a clear platform model that developers can rely on.

At the infrastructure level, Google Cloud becomes the control plane for networking, compute, identity, and security.

Platform Layer	Key Components	Platform Outcome
Networking and Security	VPC hub-spoke architecture, Cloud Armor, IAM, Org Policies	Centralized network and policy control
Compute and Containers	GKE clusters and managed compute	Standardized runtime for applications
Data and Storage	Cloud SQL, BigQuery, object storage	Managed data infrastructure

This structure gives us several advantages immediately.

First, networking becomes predictable. The hub-spoke VPC model creates a centralized networking layer where shared services such as security inspection, connectivity, and logging can be managed once instead of repeatedly.

Second, identity and policy enforcement become consistent across the platform. IAM and organization policies allow governance to be expressed as platform rules rather than operational processes.

Third, infrastructure becomes programmable. Developers interact with the platform through infrastructure as code and automated pipelines instead of ticket-based workflows.
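As an illustration of governance expressed as rules rather than review meetings, the sketch below validates a resource request against a few guardrails before anything is provisioned. It is not the GCP Org Policy or IAM API. The rule set, the request shape, and the names `REQUIRED_LABELS`, `ALLOWED_REGIONS`, and `check_request` are assumptions made up for this example. In practice, organization policies would enforce such constraints on the platform side, and a check like this would run earlier, in a pipeline.

```python
# Assumed labeling standard and region allowlist, for illustration only.
REQUIRED_LABELS = {"team", "environment", "service"}
ALLOWED_REGIONS = {"europe-west1", "us-central1"}


def check_request(request: dict) -> list:
    """Return a list of guardrail violations for a resource request."""
    violations = []

    missing = REQUIRED_LABELS - set(request.get("labels", {}))
    if missing:
        violations.append(f"missing labels: {sorted(missing)}")

    if request.get("region") not in ALLOWED_REGIONS:
        violations.append(f"region not allowed: {request.get('region')}")

    if request.get("public_ip", False):
        violations.append("public IPs are not permitted")

    return violations


# Hypothetical request submitted through a pipeline instead of a ticket.
request = {
    "region": "asia-east1",
    "labels": {"team": "payments"},
    "public_ip": True,
}
for problem in check_request(request):
    print("DENY:", problem)
```

Because the rules are data, changing a guardrail becomes a reviewed code change rather than a change to an operational process.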

Designing the platform around developer experience

When I design a cloud platform, I treat the developer workflow as the primary interface.

Developer Need	Platform Capability
Fast environment creation	Automated project and namespace provisioning
Secure service deployment	Preconfigured networking and identity policies
Reliable runtime	Managed Kubernetes clusters and autoscaling

The goal is not simply to run workloads in the cloud. The goal is to create a platform where teams can build and operate services without needing to understand every infrastructure layer underneath.
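A small sketch of what automated namespace provisioning could look like from the developer's side: the team supplies three values and the platform produces a Kubernetes Namespace manifest with standard labels attached. The naming scheme (`team-service-environment`) and the label keys are assumptions for this example. The manifest structure itself is the standard Kubernetes Namespace object.

```python
import json
import re

# Kubernetes namespace names must be valid DNS labels (max 63 characters).
DNS_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def namespace_manifest(team: str, service: str, environment: str) -> dict:
    """Build a Namespace manifest with the platform's default labels."""
    name = f"{team}-{service}-{environment}"  # assumed naming convention
    if len(name) > 63 or not DNS_LABEL.match(name):
        raise ValueError(f"invalid namespace name: {name!r}")
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            # The same labels can later feed cost attribution.
            "labels": {
                "team": team,
                "service": service,
                "environment": environment,
            },
        },
    }


manifest = namespace_manifest("payments", "checkout", "staging")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be applied by a pipeline. The developer never files a ticket and never has to choose the labels.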

Cost attribution and chargeback

Once infrastructure moves into the cloud, cost transparency becomes essential. Without a clear cost model, cloud adoption quickly leads to uncontrolled spending and difficult conversations with finance teams.

The model starts with GCP billing exports.

Step	Description
Billing export	Raw usage data exported from GCP billing
Cost attribution engine	Logic that maps resource usage to services, teams, or environments
Chargeback reporting	Internal dashboards showing team-level cost consumption

The cost attribution layer becomes the bridge between infrastructure usage and financial accountability.
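The attribution step can be sketched in a few lines. The example below assumes billing rows have already been read from the export and flattened into plain dictionaries. The row shape, the `team` label key, and the `project_owners` fallback mapping are simplifications chosen for illustration, not the actual export schema.

```python
from collections import defaultdict


def attribute_costs(rows, project_owners):
    """Sum cost per team: use the 'team' label first, then the project mapping."""
    totals = defaultdict(float)
    for row in rows:
        team = row.get("labels", {}).get("team")
        if not team:
            # Fall back to the owning team of the project, if one is known.
            team = project_owners.get(row["project_id"], "unattributed")
        totals[team] += row["cost"]
    return dict(totals)


# Hypothetical, already-flattened billing rows.
rows = [
    {"project_id": "pay-prod", "service": "Compute Engine",
     "labels": {"team": "payments"}, "cost": 120.0},
    {"project_id": "pay-prod", "service": "BigQuery",
     "labels": {}, "cost": 30.0},
    {"project_id": "sandbox-1", "service": "Cloud Storage",
     "labels": {}, "cost": 5.5},
]
project_owners = {"pay-prod": "payments"}  # assumed project-to-team mapping

costs_by_team = attribute_costs(rows, project_owners)
print(costs_by_team)
```

Keeping an explicit `unattributed` bucket is a deliberate choice in this sketch: spend that cannot be mapped stays visible instead of silently disappearing from the dashboards.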

Instead of treating cloud costs as a single operational expense, we distribute costs based on actual usage patterns.

Cost Dimension	Example Attribution
Project	Application or product team
Environment	Production, staging, development
Service	Individual microservice or platform capability

This model enables internal chargeback or showback depending on the organization's financial maturity.
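Some costs, such as a shared GKE cluster, cannot be tagged to a single team. One common way to distribute them by usage is proportional allocation, sketched below. The function name, the even-split fallback, and the sample figures are assumptions for illustration.

```python
def allocate_shared_cost(total_cost: float, usage_by_team: dict) -> dict:
    """Split a shared cost across teams in proportion to their usage."""
    if not usage_by_team:
        return {}
    total_usage = sum(usage_by_team.values())
    if total_usage == 0:
        # No recorded usage: fall back to an even split (an assumption).
        share = total_cost / len(usage_by_team)
        return {team: share for team in usage_by_team}
    return {
        team: total_cost * usage / total_usage
        for team, usage in usage_by_team.items()
    }


# Hypothetical: a 1000-unit monthly cluster bill and CPU core-hours per team.
cluster_cost = 1000.0
core_hours = {"payments": 60, "search": 30, "data": 10}

allocation = allocate_shared_cost(cluster_cost, core_hours)
for team, cost in allocation.items():
    print(f"{team:10s} {cost:8.2f}")
```

The same numbers serve showback and chargeback. The only difference is whether the amounts are just reported to teams or actually billed to them.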

Product delivery model

Large platform transformations require a clear product delivery structure. Instead of treating this purely as an infrastructure migration, I structure the initiative as a platform product with defined stakeholders and delivery teams.

Stakeholder model

Stakeholder	Responsibility
CFO	Financial oversight and ROI validation
VP Engineering	Execution ownership and delivery alignment
Product Owner	Strategy, prioritization, and roadmap management
Cloud Architect	Architecture design and migration patterns
Data Engineers	Data pipelines and modeling
FinOps Lead	Cost attribution and optimization
SRE / DevOps	Platform reliability and automation

The platform ultimately supports more than a thousand internal users across multiple business functions, including network operations, finance, marketing, and product teams.

Product roadmap and epics

Once the architecture direction is defined, the work is broken into epics and features across infrastructure, data platform, governance, and developer enablement tracks.

Epic Category	Example Capabilities
Cloud foundation	Organization structure, networking, IAM
Platform services	Kubernetes platform, CI/CD, observability
Data platform	Data pipelines, warehouse, governance
FinOps	Cost attribution, chargeback dashboards
Developer experience	Self-service environments and automation

Implementation roadmap

The delivery roadmap typically progresses through multiple maturity stages as the platform capabilities expand.

Phase	Objective	Key Deliverables
Phase 1	Cloud foundation	Networking, IAM, project structure
Phase 2	Platform compute layer	GKE clusters and workload migration
Phase 3	Data platform	Analytics infrastructure and pipelines
Phase 4	FinOps and governance	Cost attribution and chargeback
Phase 5	Platform maturity	Self-service platform capabilities

How I roll this out

Large platform transformations rarely succeed if we try to redesign everything at once. I approach the rollout in structured phases.

Phase	Objective	Platform Focus
Phase 1	Establish cloud foundation	Networking, IAM, and project structure
Phase 2	Standardize compute platform	GKE clusters and workload migration
Phase 3	Introduce cost attribution	Billing export and attribution engine
Phase 4	Platform maturity	Automation, developer self-service, governance

Each phase delivers immediate value while setting up the next layer of the platform.

What I measure

Architecture only matters if it improves how the platform operates. I evaluate success through operational metrics rather than architectural diagrams.

Metric	Why it matters
Environment provisioning time	Measures developer productivity
Infrastructure utilization	Indicates platform efficiency
Cost visibility by team	Ensures financial accountability
Deployment frequency	Signals platform usability

If those metrics improve, the architecture is working.
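Two of these metrics are easy to compute once request and deployment timestamps are recorded. The sketch below assumes provisioning requests are stored as (requested, completed) timestamp pairs and deployments as a list of timestamps. The data shapes and sample values are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median


def median_provisioning_hours(requests):
    """Median time from environment request to completion, in hours."""
    durations = [
        (completed - requested).total_seconds() / 3600
        for requested, completed in requests
    ]
    return median(durations)


def deployments_per_week(deploy_times, window_start, window_end):
    """Average number of deployments per week inside the time window."""
    in_window = [t for t in deploy_times if window_start <= t < window_end]
    weeks = (window_end - window_start) / timedelta(weeks=1)
    return len(in_window) / weeks


# Hypothetical sample data.
requests = [
    (datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 11)),  # 2 hours
    (datetime(2024, 1, 2, 9), datetime(2024, 1, 3, 9)),   # 24 hours
    (datetime(2024, 1, 5, 9), datetime(2024, 1, 5, 15)),  # 6 hours
]
deploys = [
    datetime(2024, 1, 2), datetime(2024, 1, 4),
    datetime(2024, 1, 9), datetime(2024, 1, 12),
]

provisioning = median_provisioning_hours(requests)
frequency = deployments_per_week(
    deploys, datetime(2024, 1, 1), datetime(2024, 1, 15)
)
print(f"median provisioning time: {provisioning:.1f} h")
print(f"deployments per week:     {frequency:.1f}")
```

I use the median rather than the mean here so that one stalled request does not hide an otherwise healthy trend.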

Closing

Platform architecture should not be designed purely as infrastructure diagrams. It should be designed as an operational product that developers rely on every day.

By moving from fragmented on-prem environments to a structured cloud platform, we reduce operational complexity, improve developer velocity, and introduce clear cost accountability.

That combination ultimately turns infrastructure from a bottleneck into an enabler of product development.

How I think about moving from on-prem infrastructure to a GCP platform