Open to new opportunities

Hi, my name is

Suryateja Chalapati.

I build data platforms, ML & GenAI systems.

Data platform & GenAI engineer building scalable pipelines, LLM-powered agents, and cloud-native analytics on GCP.

01 — About

Data platforms, GenAI & cloud.

I'm a data platform and GenAI engineer with 8+ years of experience designing and building scalable data systems, cloud-native analytics, and LLM-powered applications.

My work spans end-to-end data platforms on GCP: from ingestion pipelines and orchestration to modeling, observability, and productionizing multimodal LLM workloads. I'm currently at Ford Motor Company, building production AI agents and large-scale data infrastructure.

Focus Areas

  • // Data platform & pipeline design
  • // GenAI / LLM agents on GCP
  • // Batch + streaming ingestion

Currently

  • // Building production AI agents
  • // Automating data ingestion workflows
  • // Exploring multimodal LLM systems

02 — Projects

Selected work in data platforms & GenAI.

A few representative projects across LLM agents, automated data ingestion, and production NLP pipelines.

AI Agents

Autonomous data ingestion agent @ Ford

Built a fully autonomous GCP-deployed agent that ingests batch files (CSV, fixed-width, TXT) end-to-end — raises its own PRs, monitors Tekton CI/CD pipelines, tracks BigQuery load status, and notifies the team. Reduced ingestion setup time from months to minutes.

GCP · Vertex AI · LLM Agents · BigQuery · Tekton
95% time reduction · View deliverables →

AI Agents

ServiceNow incident routing agent @ Ford

Deployed an agentic AI system that triages ServiceNow incidents, enriches context autonomously, and delivers actionable recommendations to engineers via Microsoft Teams — with a live embedded chat interface for real-time resolution of complex issues.

GCP · LLM Agents · ServiceNow · MS Teams
200+ tickets/week automated · View deliverables →

NLP

spaCy NLP pipeline @ XSELL — 93% accuracy

Built and productionized spaCy Transformer NLP models achieving 93% accuracy for customer conversation analysis. Delivered precision/recall metrics that enabled $500K in new business, and designed an MLOps pipeline that improved model operational efficiency by 60%.

spaCy · Transformers · MLOps · AWS SageMaker
$500K new business · View deliverables →

03 — Tools

Tools & technologies I work with.

My day-to-day stack is centered on Python, GCP, and modern data / MLOps tooling — comfortable moving across ingestion, orchestration, and GenAI applications.

Languages

Python
SQL
Java

Cloud & Data

GCP
BigQuery
Dataflow
Apache Spark
Kafka

Orchestration & CI/CD

Airflow
Terraform
Tekton
Docker

ML / GenAI

LLM Agents
Google ADK
HuggingFace
LangChain
AWS SageMaker

04 — Experience

Building data platforms & GenAI systems in the real world.

A snapshot of my recent roles working across data engineering, GenAI, and NLP—mostly on GCP.

  • Software Engineer – Data Platforms & GenAI

    Ford Motor Company · Detroit, MI

    Sep 2023 – Present

    • Built and deployed production AI agents on GCP — a ServiceNow incident routing agent (Google ADK, Cloud Run, Apigee, Power Automate) that triages 150–250 tickets/week across 4 business verticals, routing directly to engineers via Microsoft Teams with live chat. Zero missed SLAs since deployment.
    • Built a fully autonomous data ingestion agent handling batch formats (CSV, fixed-width, TXT) end-to-end — schema inference, ETL generation, Terraform provisioning, Tekton CI/CD, BigQuery load tracking. Reduced ingestion setup from 2–4 months to minutes across 35+ sources.
    • Developed scalable batch and streaming ingestion pipelines moving ~20 TB of data into BigQuery via Dataflow, Airflow, GCS, Kafka, and Pub/Sub for supply chain, manufacturing, and finance analytics.
    • Designed LLM and multimodal pipelines on GCP (Hugging Face, Google ADK) for entity recognition, summarization, and document classification at scale.
    GCP · BigQuery · Dataflow · Airflow · Kafka · Pub/Sub · Tekton · Cloud Run · Google ADK · LLM Agents
  • Data Scientist – NLP

    XSELL Technologies · Chicago, IL

    Jun 2022 – May 2023

    • Built spaCy Transformer NLP models and pipelines achieving 93% accuracy with custom components for entity extraction and semantic similarity on customer conversation data.
    • Delivered precision/recall metrics that drove process improvements, directly enabling $500K in new business for a major client.
    • Fine-tuned ALBERT Transformer models using AWS SageMaker and HuggingFace across 7 TB of de-identified transcript data.
    • Designed end-to-end MLOps pipeline that improved model operational efficiency by 60%.
    spaCy · Transformers · AWS SageMaker · HuggingFace · MLOps
  • Data Scientist / Data Engineer

    University of South Florida · Tampa, FL

    Jan 2021 – May 2022

    • Deployed NLP models for similarity and semantic analysis on unstructured data — estimated Kickstarter page consistency at 7% using Cosine Similarity and identified that 36% of Reddit posts in 2020 related to health issues.
    • Built ETL pipelines on GCP using Airflow, migrated on-prem data to a cloud data warehouse, and applied topic modeling (LDA, TF-IDF), yielding a highest correlation of 38%.
    GCP · Airflow · Python · NLP · LDA / TF-IDF
  • Machine Learning Engineer

    Tampa General Hospital · Tampa, FL

    Feb 2020 – Dec 2020

    • Built a Random Forest model on patient data that improved clinic operating efficiency by 38%.
    • Automated model training in AWS SageMaker and built Tableau dashboards for operational insights.
    • Extracted data from S3 and Redshift; created multiple databases in AWS Glue Catalog using Glue Crawlers.
    AWS SageMaker · S3 · Redshift · Glue · Python · Tableau
  • Junior Data Scientist

    Amazon · Hyderabad, India

    Sep 2017 – Jul 2019

    • Reduced freight delay risk by 15.3% via automation tools and risk assessment metrics.
    • Predicted labor productivity with 82% precision (ALPS) leading to EU freight optimization.
    • Increased annual shipments by 34% over 6 months via A/B Testing and process development.
    Python · Tableau · SVM · A/B Testing · AWS
  • Data Engineer

    Mahindra and Mahindra · Hyderabad, India

    Jun 2015 – Aug 2017

    • Optimized data imputation in S3, reducing costs by $90K and rollout times by 15.3% using AWS Athena.
    • Built ETL jobs on AWS Glue to load vendor data from multiple sources with cleaning and transformation.
    • Worked across AWS services: S3, EC2, Glue, Athena, Redshift, EMR, Kinesis.
    AWS Glue · S3 · Athena · Redshift · Kinesis · ETL

05 — Certifications

AWS Machine Learning Specialty

Amazon Web Services · Dec 2022 – Dec 2025

GCP Professional Data Engineer

Google Cloud Platform · Dec 2022 – Dec 2024 · Renewal in progress

06 — Contact

Let's talk data platforms & GenAI.

I'm open to roles and collaborations around data platform engineering, LLM/GenAI systems, and cloud-native analytics. If you'd like to discuss an opportunity or a project, feel free to reach out.