cv

Phone: +55 (11) 95841-7956 | Email: rodrigohmorais@proton.me | LinkedIn: linkedin.com/in/rodrigohmorais

Data Engineer with 4 years of experience and a background spanning network science, NLP, and cloud data infrastructure. Currently building LLM-powered information extraction and medallion architecture pipelines on Azure with Databricks, Airflow, and Fabric at CEJAM. Previously developed business-critical ETL systems with Airflow at Star Parks and conducted research on LLM applications for qualitative data analysis at the University of Illinois Urbana-Champaign.


Experience

Data Engineer | CEJAM | São Paulo, Brazil | January 2026 – Present

  • Architect LLM governance and access through a centralized Bifrost LLM gateway on Azure Cloud
  • Ensure LGPD compliance for data infrastructure and LLM operations
  • Deploy LLMs to Azure Foundry, integrating them into Databricks pipelines for NLP (information extraction) on unstructured data
  • Create agentic workflows for automation, using Docker to isolate agents and the OpenAI API for model access
  • Implement and maintain a medallion data architecture that ingests and transforms data from multiple sources into BI-ready databases
  • Create ETL pipelines with Databricks, Azure Data Factory, and Azure Data Lake Storage Gen2
  • Manage Azure cloud infrastructure costs and resources

Data Scientist | Star Parks | São Paulo, Brazil | October 2024 – January 2026

  • Designed and maintained ETL data pipelines on Airflow that handled business-critical processes
  • Implemented modular SQL queries with SQLAlchemy and PyODBC in Python for HTTP API data integrations
  • Managed Airflow configuration and DAGs through Git and GitHub
  • Upgraded and deployed the DAG codebase from Airflow 2 to 3 on Azure
  • Created Power BI dashboards for data analysis, enabling crucial business insights

Researcher | University of Illinois Urbana-Champaign | Champaign, Illinois | August 2022 – August 2024

  • Researched methods for analyzing multimodal educational data
  • Developed LLM-based NLP methods for qualitative coding of textual data
  • Employed the Ollama API with Python to communicate with LLMs

Computational Mathematics Researcher | Soka University of America | Aliso Viejo, California | September 2021 – May 2022

  • Developed TypeScript software to mine and transform data from currency market APIs
  • Created novel algorithms for finding unbalanced trade cycles in currency exchange markets

Software Developer | PushStart Studio | São Paulo, Brazil | March 2021 – June 2021

  • Developed web-based educational software using JavaScript and PixiJS
  • Used Git, Slack, Discord and JIRA to coordinate with a development team

Work and Other Publications

Research

  • Machine Learning for Supernovae Age Detection | Python, Pandas, NumPy, Machine Learning, scikit-learn, Jupyter Notebooks
      • Research in partnership with the Astronomy department at the University of Illinois
      • Developed novel machine learning applications to detect the age of supernovae from light band readings
  • Graph Simulation of Food Banks | Python, Pandas, NumPy, NetworkX
      • Monte Carlo simulation on real data from food banks in the state of Illinois, used to understand the impact of catastrophic events on food distribution
  • Unbalanced Trade Cycles in Currency Trade Networks | Network Science, TypeScript
      • Research and development of novel cycle-finding algorithms in split graphs

Open Source Software

  • NeTS | TypeScript, Network Science, Deno
      • The first Network Science library published on Deno
  • PyDBCon | Python, Pandas, SQL
      • Python library to facilitate API integrations to SQL databases
  • ZenDBCon | Python, Pandas, SQL
      • Python library to integrate the Zendesk API into a SQL database
  • TIC-80 Injector | Lua
      • Tiny Lua script to inject code into a TIC-80 cartridge. Makes it easier to develop for the TIC-80 using an IDE
  • VecTIC | Lua
      • Vector maths library in Lua. Made for use with TIC-80 projects

Relevant Graduate-Level Courses (4.0 GPA)

Course | Topics
Computational Math | Machine Learning, Python, NumPy, scikit-learn
Educational Data Mining | Data Mining, Classical Machine Learning
Data Structures & Algorithms | Data Structures, Data Science, Data Analytics
Progr. & Quality in Analytics | SQL, Python, Pandas, Data Analytics

Certifications & Honors

DS4A 2.0 Honors | Python, SQL, Data Science, Pandas, Business Intelligence | Sept 2021

  • Merit fellowship for a highly selective data science program (4% acceptance) with Professor Natesh Pillai (Harvard)
  • Worked with an EdTech company to draw insights on user behavior using their dataset of over 200 thousand users

SoftBank Endowed Scholar | Sept 2021

  • Awarded based on exceptional work and notable professional promise

Skills

Languages and Tools: Python, Jupyter Notebooks, SQL, Airflow, Databricks, Apache Spark, Azure Data Factory, Azure Data Lake, Power BI, DAX, Polars, Pandas, NumPy, Azure, Virtual Machines (VMs), Linux, Git, LaTeX, Lua, R, JavaScript, TypeScript, FastAPI

Knowledge: Data-Driven Decision-Making, Problem-Solving, Critical Thinking, Data Pipeline Architecture and Design, ETL, Medallion Architecture, LLM Integration, NLP, Classical Machine Learning, Clustering, Data Storytelling, Data Cleansing and Quality, Data Science, Data Analytics, Data Mining, Business Intelligence and Analytics, Research


Education

University of Illinois Urbana-Champaign | August 2022 – May 2024

  • PhD in Informatics (withdrawn)

Soka University of America | Aliso Viejo, California | August 2018 – May 2022

  • Bachelor's degree, Liberal Arts and Sciences, concentrations in Economics and International Studies

GitHub: github.com/rodigu | Personal website: rodigu.github.io