How I think about moving from on-prem infrastructure to a GCP platform

Cloud, Platform Engineering, GCP, Architecture, FinOps

My goal with architecture discussions is always the same. I want the platform to scale without constant intervention from infrastructure teams. I want engineers to ship services quickly while still maintaining governance, cost visibility, and security. When organizations move from traditional on-prem environments to cloud platforms, architecture decisions become product decisions.

The diagrams below reflect how I typically frame that journey. I start by understanding the current state deeply. Then I define a target platform architecture that removes operational friction while introducing strong platform guardrails. Finally, I design a cost attribution and chargeback model so teams understand how their infrastructure usage translates into spend.

$ platform-transformation

┌──────────────────────────────────────────────┐
│ 1. CURRENT STATE DISCOVERY                   │
│----------------------------------------------│
│ VMware VMs        OpenShift Clusters         │
│ OpenStack Cloud   Hadoop / Legacy Data       │
│                                              │
│ Symptoms:                                    │
│ - Low utilization                            │
│ - Slow provisioning                          │
│ - Platform fragmentation                     │
└───────────────────────────┬──────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────┐
│ 2. TARGET CLOUD PLATFORM                     │
│----------------------------------------------│
│ Google Cloud Platform                        │
│                                              │
│  VPC Hub-Spoke Network                       │
│        │                                     │
│   ┌────┴───────────────┐                     │
│   │                    │                     │
│ GKE Platform     Managed Data Services       │
│ CI/CD Pipelines  BigQuery / Storage          │
│ IAM & Policies   Observability               │
│                                              │
│ Outcome: Standardized developer platform     │
└───────────────────────────┬──────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────┐
│ 3. COST ATTRIBUTION & FINOPS                 │
│----------------------------------------------│
│ GCP Billing Export                           │
│        │                                     │
│ Cost Attribution Engine                      │
│        │                                     │
│ Team / Service Mapping                       │
│        │                                     │
│ Chargeback Dashboards                        │
│                                              │
│ Outcome: Transparent infrastructure spend    │
└──────────────────────────────────────────────┘

pipeline:  discovery  →  platform design  →  cost accountability

Project context and vision

This platform transformation usually begins with a clear business goal. In this case, the objective is to evolve a fragmented on-prem infrastructure into a unified cloud data platform that supports self-service analytics, real-time insights, and scalable product development.

The current environment reflects a typical enterprise pattern. Infrastructure has grown over time through multiple platforms, manual operational processes, and isolated data environments across many departments. As a result, infrastructure cost increases while developer productivity and data accessibility decline.

| Current Challenge | Impact |
|---|---|
| Infrastructure cost exceeding $2M annually | High operational overhead and inefficient capacity usage |
| Average compute utilization around 35% | Significant idle infrastructure capacity |
| Software licensing consuming ~25% of IT budget | Vendor lock-in and high fixed costs |
| Data silos across 15+ departments | Limited cross-functional insights |

Beyond cost, operational complexity also becomes a limiting factor. Environment provisioning can take multiple weeks, infrastructure scaling requires manual intervention, and platform operations depend on a small infrastructure team.

| Operational Constraint | Typical Outcome |
|---|---|
| 9 FTEs managing infrastructure | High operational burden |
| 2-4 weeks environment provisioning | Slower product delivery |
| Manual scaling and backups | Increased operational risk |

These issues create the foundation for the platform redesign.

The current state: fragmented infrastructure layers

Most enterprises I work with operate a hybrid on-prem environment that grew organically over time. Different teams adopt different infrastructure stacks depending on their needs.

| Platform Layer | Technologies | Operational Reality |
|---|---|---|
| Virtualization | VMware (vCenter, ESXi hosts) | Traditional VM workloads and legacy services |
| Container Platform | OpenShift bare metal clusters | Modern containerized services |
| Private Cloud | OpenStack | Internal IaaS workloads |

Each of these platforms solves a real problem, but they introduce fragmentation. Networking models differ. Provisioning workflows differ. Cost visibility is limited. Infrastructure teams spend significant time integrating systems rather than enabling developers.

This environment usually produces several systemic inefficiencies, which become visible once we analyze the infrastructure platform by platform.

Cross-platform infrastructure utilization

A detailed assessment across VMware, OpenShift bare metal, and OpenStack environments typically reveals similar patterns of over-provisioning and inefficient resource usage.

| Platform | Typical Observation |
|---|---|
| VMware | Large number of underutilized virtual machines |
| OpenShift | Containers requesting more CPU and memory than they use |
| OpenStack | Instance flavors larger than required for workloads |

The result is consistent across environments. CPU utilization remains low, memory is frequently over-allocated, and infrastructure capacity remains idle while still generating cost.
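To make the over-provisioning pattern concrete, here is a minimal Python sketch of how I might flag right-sizing candidates during an assessment. The `Workload` fields, the 40% threshold, and the sample values are all illustrative assumptions, not measurements from a real environment; in practice the inputs would come from whatever monitoring each platform already exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """One VM, container, or instance with requested and observed usage."""
    name: str
    platform: str              # e.g. "vmware", "openshift", "openstack"
    cpu_requested: float       # cores allocated or requested
    cpu_used_p95: float        # 95th percentile of cores actually used
    mem_requested_gib: float
    mem_used_p95_gib: float


def utilization(used: float, requested: float) -> float:
    """Return used/requested as a fraction, or 0.0 when nothing is requested."""
    return used / requested if requested > 0 else 0.0


def find_overprovisioned(workloads, threshold=0.4):
    """Return workloads whose CPU and memory utilization are both below threshold."""
    return [
        w for w in workloads
        if utilization(w.cpu_used_p95, w.cpu_requested) < threshold
        and utilization(w.mem_used_p95_gib, w.mem_requested_gib) < threshold
    ]


# Illustrative sample data (hypothetical workloads and numbers).
SAMPLE = [
    Workload("billing-db", "vmware", 8, 2.0, 64, 20.0),
    Workload("api-gateway", "openshift", 4, 3.2, 8, 6.5),
    Workload("batch-report", "openstack", 16, 4.0, 128, 32.0),
]

flagged = find_overprovisioned(SAMPLE)
for w in flagged:
    cpu = utilization(w.cpu_used_p95, w.cpu_requested)
    print(f"{w.platform}/{w.name}: cpu utilization {cpu:.0%}")
```

Using a high percentile rather than the average is a deliberate choice in this sketch: it avoids flagging workloads that are idle most of the time but genuinely need their peak capacity.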

Infrastructure platform fragmentation

| Platform | Primary Role | Operational Characteristics |
|---|---|---|
| VMware | Legacy virtualization workloads | Traditional VM lifecycle management |
| OpenShift | Containerized application platform | Bare metal cluster operations |
| OpenStack | Internal private cloud | Self-managed infrastructure services |

Operating multiple infrastructure control planes increases operational complexity significantly. Each platform requires separate expertise, monitoring tools, provisioning workflows, and capacity planning processes.

From a platform engineering perspective, the organization is effectively maintaining three separate infrastructure platforms instead of one unified cloud platform.

These architectural conditions lead to three common symptoms.

| Symptom | What I observe |
|---|---|
| Slow environment provisioning | Teams wait days or weeks for infrastructure requests |
| Limited cost visibility | Costs exist but are difficult to attribute to teams or services |
| Platform duplication | Similar capabilities implemented multiple times |

From a product perspective, the platform itself becomes difficult to operate and difficult to evolve.

The target state: a structured GCP platform

The goal of the target architecture is not simply cloud migration. The goal is to establish a clear platform model that developers can rely on.

At the infrastructure level, Google Cloud becomes the control plane for networking, compute, identity, and security.

| Platform Layer | Key Components | Platform Outcome |
|---|---|---|
| Networking and Security | VPC hub-spoke architecture, Cloud Armor, IAM, Org Policies | Centralized network and policy control |
| Compute and Containers | GKE clusters and managed compute | Standardized runtime for applications |
| Data and Storage | Cloud SQL, BigQuery, object storage | Managed data infrastructure |

This structure gives us several advantages immediately.

First, networking becomes predictable. The hub-spoke VPC model creates a centralized networking layer where shared services such as security inspection, connectivity, and logging can be managed once instead of repeatedly.

Second, identity and policy enforcement become consistent across the platform. IAM and organization policies allow governance to be expressed as platform rules rather than operational processes.

Third, infrastructure becomes programmable. Developers interact with the platform through infrastructure as code and automated pipelines instead of ticket-based workflows.
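One small part of making hub-spoke networking predictable is planning address space up front so that spokes never overlap. The sketch below uses Python's standard `ipaddress` module to carve spoke ranges out of a shared supernet. The supernet, the /20 spoke size, and the spoke names are hypothetical, and the code only plans ranges; it does not create any GCP resources.

```python
import ipaddress


def plan_spokes(supernet, spoke_names, spoke_prefix=20):
    """Carve one non-overlapping subnet per spoke out of a shared supernet.

    The first subnet is reserved for the hub; spokes take the following ones.
    """
    network = ipaddress.ip_network(supernet)
    subnets = network.subnets(new_prefix=spoke_prefix)
    plan = {"hub": next(subnets)}
    for name in spoke_names:
        try:
            plan[name] = next(subnets)
        except StopIteration:
            raise ValueError(
                f"supernet {supernet} is too small for spoke {name!r}"
            ) from None
    return plan


# Hypothetical supernet and spoke names, for illustration only.
plan = plan_spokes(
    "10.0.0.0/16",
    ["payments-prod", "payments-dev", "analytics-prod"],
)
for name, subnet in plan.items():
    print(f"{name:15} {subnet}")
```

Keeping a plan like this in version control alongside the infrastructure code means a new spoke is a reviewed change rather than a ticket.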

Designing the platform around developer experience

When I design a cloud platform, I treat the developer workflow as the primary interface.

| Developer Need | Platform Capability |
|---|---|
| Fast environment creation | Automated project and namespace provisioning |
| Secure service deployment | Preconfigured networking and identity policies |
| Reliable runtime | Managed Kubernetes clusters and autoscaling |

The goal is not simply to run workloads in the cloud. The goal is to create a platform where teams can build and operate services without needing to understand every infrastructure layer underneath.
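As a rough illustration of what automated project and namespace provisioning can look like from the developer's side, the sketch below turns a small request into validated names and labels. The request fields, the naming convention, and the allowed environments are assumptions I made for the example; the only rule taken from GCP itself is the project ID format (6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen).

```python
import re
from dataclasses import dataclass

# Hypothetical guardrail: the environments this platform offers.
ALLOWED_ENVIRONMENTS = {"dev", "staging", "prod"}

# GCP project IDs: 6-30 chars, lowercase letters, digits, hyphens,
# starting with a letter and not ending with a hyphen.
PROJECT_ID_PATTERN = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")


@dataclass(frozen=True)
class EnvironmentRequest:
    """What a developer supplies; everything else is derived by the platform."""
    team: str
    service: str
    environment: str
    cost_center: str


def render_environment(request: EnvironmentRequest) -> dict:
    """Validate a request and return the names and labels the platform would apply."""
    if request.environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {request.environment!r}")
    project_id = f"{request.team}-{request.service}-{request.environment}"
    if not PROJECT_ID_PATTERN.match(project_id):
        raise ValueError(f"invalid project id: {project_id!r}")
    return {
        "project_id": project_id,
        "namespace": f"{request.service}-{request.environment}",
        "labels": {
            "team": request.team,
            "service": request.service,
            "environment": request.environment,
            "cost_center": request.cost_center,
        },
    }


# Hypothetical request, for illustration only.
result = render_environment(
    EnvironmentRequest("payments", "ledger", "dev", "cc-1042")
)
print(result["project_id"], result["namespace"])
```

The point of the design is that labels are applied by the platform at creation time, so teams never have to remember to tag resources for cost attribution later.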

Cost attribution and chargeback

Once infrastructure moves into the cloud, cost transparency becomes essential. Without a clear cost model, cloud adoption quickly leads to uncontrolled spending and difficult conversations with finance teams.

The foundation of the model starts with GCP billing exports.

| Step | Description |
|---|---|
| Billing export | Raw usage data exported from GCP billing |
| Cost attribution engine | Logic that maps resource usage to services, teams, or environments |
| Chargeback reporting | Internal dashboards showing team-level cost consumption |

The cost attribution layer becomes the bridge between infrastructure usage and financial accountability.
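At its core, the attribution engine is a mapping step. The following sketch shows one possible version of that logic over in-memory rows: it prefers a `team` label on the row, falls back to a project-to-team mapping, and puts everything else in an explicit unattributed bucket. The row shape and field names are simplified stand-ins that I chose for the example; the real billing export schema is richer, and the project names and costs are made up.

```python
from collections import defaultdict

UNATTRIBUTED = "unattributed"


def attribute_costs(rows, project_owners):
    """Sum cost per team, preferring the row's team label over the project mapping."""
    totals = defaultdict(float)
    for row in rows:
        team = (
            row.get("labels", {}).get("team")
            or project_owners.get(row["project_id"])
            or UNATTRIBUTED
        )
        totals[team] += row["cost"]
    return dict(totals)


# Simplified, hypothetical billing rows.
rows = [
    {"project_id": "payments-ledger-prod", "labels": {"team": "payments"}, "cost": 120.0},
    {"project_id": "analytics-core-prod", "labels": {}, "cost": 300.0},
    {"project_id": "sandbox-1234", "labels": {}, "cost": 15.5},
    {"project_id": "payments-ledger-dev", "labels": {"team": "payments"}, "cost": 30.0},
]
# Hypothetical fallback mapping for projects without a team label.
owners = {"analytics-core-prod": "analytics"}

totals = attribute_costs(rows, owners)
for team, cost in sorted(totals.items()):
    print(f"{team:14} {cost:8.2f}")
```

Keeping the unattributed bucket visible is intentional: its size is a direct measure of how much labeling work remains.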

Instead of treating cloud costs as a single operational expense, we distribute costs based on actual usage patterns.

| Cost Dimension | Example Attribution |
|---|---|
| Project | Application or product team |
| Environment | Production, staging, development |
| Service | Individual microservice or platform capability |

This model enables internal chargeback or showback depending on the organization's financial maturity.
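To illustrate how the dimensions above feed a showback report, here is a small sketch that rolls costs up by any one dimension and then spreads a shared platform cost across consumers in proportion to their direct spend. Proportional allocation is just one possible policy that I picked for the example; the row fields, names, and amounts are hypothetical.

```python
from collections import defaultdict


def rollup(rows, dimension):
    """Sum cost by one dimension, e.g. "project", "environment", or "service"."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row["cost"]
    return dict(totals)


def allocate_shared(direct_costs, shared_cost):
    """Spread a shared cost across consumers in proportion to their direct cost."""
    total = sum(direct_costs.values())
    if total <= 0:
        raise ValueError("direct costs must sum to a positive amount")
    return {
        name: round(cost + shared_cost * cost / total, 2)
        for name, cost in direct_costs.items()
    }


# Illustrative rows; the values are made up for the example.
ROWS = [
    {"project": "payments", "environment": "production", "service": "ledger", "cost": 150.0},
    {"project": "payments", "environment": "staging", "service": "ledger", "cost": 50.0},
    {"project": "analytics", "environment": "production", "service": "reports", "cost": 300.0},
]

by_project = rollup(ROWS, "project")
by_environment = rollup(ROWS, "environment")
# Hypothetical shared cost, e.g. the hub network or a shared cluster control plane.
showback = allocate_shared(by_project, shared_cost=100.0)
print(by_project, by_environment, showback, sep="\n")
```

Whether the resulting numbers are merely reported (showback) or actually billed to budgets (chargeback) is an organizational decision; the calculation is the same.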

Product delivery model

Large platform transformations require a clear product delivery structure. Instead of treating this purely as an infrastructure migration, I structure the initiative as a platform product with defined stakeholders and delivery teams.

Stakeholder model

| Stakeholder | Responsibility |
|---|---|
| CFO | Financial oversight and ROI validation |
| VP Engineering | Execution ownership and delivery alignment |
| Product Owner | Strategy, prioritization, and roadmap management |
| Cloud Architect | Architecture design and migration patterns |
| Data Engineers | Data pipelines and modeling |
| FinOps Lead | Cost attribution and optimization |
| SRE / DevOps | Platform reliability and automation |

The platform ultimately supports more than a thousand internal users across multiple business functions including network operations, finance, marketing, and product teams.

Product roadmap and epics

Once the architecture direction is defined, the work is broken into epics and features across infrastructure, data platform, governance, and developer enablement tracks.

| Epic Category | Example Capabilities |
|---|---|
| Cloud foundation | Organization structure, networking, IAM |
| Platform services | Kubernetes platform, CI/CD, observability |
| Data platform | Data pipelines, warehouse, governance |
| FinOps | Cost attribution, chargeback dashboards |
| Developer experience | Self-service environments and automation |

Implementation roadmap

The delivery roadmap typically progresses through multiple maturity stages as the platform capabilities expand.

| Phase | Objective | Key Deliverables |
|---|---|---|
| Phase 1 | Cloud foundation | Networking, IAM, project structure |
| Phase 2 | Platform compute layer | GKE clusters and workload migration |
| Phase 3 | Data platform | Analytics infrastructure and pipelines |
| Phase 4 | FinOps and governance | Cost attribution and chargeback |
| Phase 5 | Platform maturity | Self-service platform capabilities |

How I roll this out

Large platform transformations rarely succeed if we try to redesign everything at once. I approach the rollout in structured phases.

| Phase | Objective | Platform Focus |
|---|---|---|
| Phase 1 | Establish cloud foundation | Networking, IAM, and project structure |
| Phase 2 | Standardize compute platform | GKE clusters and workload migration |
| Phase 3 | Introduce cost attribution | Billing export and attribution engine |
| Phase 4 | Platform maturity | Automation, developer self-service, governance |

Each phase delivers immediate value while setting up the next layer of the platform.

What I measure

Architecture only matters if it improves how the platform operates. I evaluate success through operational metrics rather than architectural diagrams.

| Metric | Why it matters |
|---|---|
| Environment provisioning time | Measures developer productivity |
| Infrastructure utilization | Indicates platform efficiency |
| Cost visibility by team | Ensures financial accountability |
| Deployment frequency | Signals platform usability |

If those metrics improve, the architecture is working.
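Two of these metrics are easy to compute once the underlying events are recorded. The sketch below derives a median provisioning time from request and ready timestamps, and a weekly deployment frequency from deployment timestamps. The event shapes and the sample dates are assumptions for the example; in a real platform they would come from the provisioning pipeline and the CI/CD system.

```python
from datetime import datetime
from statistics import median


def median_provisioning_hours(requests):
    """Median hours between a request being filed and the environment being ready.

    `requests` is a list of (requested_at, ready_at) datetime pairs.
    """
    durations = [
        (ready - requested).total_seconds() / 3600
        for requested, ready in requests
    ]
    return median(durations)


def deployments_per_week(deploy_times, period_start, period_end):
    """Average number of deployments per week within [period_start, period_end)."""
    weeks = (period_end - period_start).total_seconds() / (7 * 24 * 3600)
    if weeks <= 0:
        raise ValueError("period_end must be after period_start")
    in_period = [t for t in deploy_times if period_start <= t < period_end]
    return len(in_period) / weeks


# Hypothetical events, for illustration only.
requests = [
    (datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 11)),
    (datetime(2024, 1, 2, 9), datetime(2024, 1, 2, 15)),
    (datetime(2024, 1, 3, 9), datetime(2024, 1, 3, 13)),
]
deploys = [
    datetime(2024, 1, 2), datetime(2024, 1, 3), datetime(2024, 1, 5),
    datetime(2024, 1, 9), datetime(2024, 1, 10), datetime(2024, 1, 12),
    datetime(2024, 1, 20),  # outside the measured period
]

provisioning = median_provisioning_hours(requests)
frequency = deployments_per_week(deploys, datetime(2024, 1, 1), datetime(2024, 1, 15))
print(f"median provisioning: {provisioning:.1f} h, deployments/week: {frequency:.1f}")
```

I use the median rather than the mean for provisioning time so that a single stuck request does not hide an otherwise healthy trend.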

Closing

Platform architecture should not be designed purely as infrastructure diagrams. It should be designed as an operational product that developers rely on every day.

By moving from fragmented on-prem environments to a structured cloud platform, we reduce operational complexity, improve developer velocity, and introduce clear cost accountability.

That combination ultimately turns infrastructure from a bottleneck into an enabler of product development.