Hi, I'm Daniel! If you're reading this, there's a good chance you're hiring a data analyst, project manager, or one of the many roles in between. If so, scroll down to find out why, in my humble opinion, you should consider hiring me!
Email: dmindlin824@gmail.com | Phone: 818-665-8871
Discover My Work
Below is a chronological look at my experience, covering key data roles, small-scale project coordination responsibilities, and an overall perspective on how analytics can transform practical problems into real solutions.
Senior Data Analyst
Venturing into the healthcare realm, I constructed SQL/Python pipelines that cleansed and modeled large-scale data for Elevance and Aetna, boosting data accuracy by ~20%. In my junior capacity, I shouldered day-to-day planning responsibilities—coordinating minor sprints, aligning tasks with the business team, and ensuring that each data milestone was appropriately validated. My Power BI and Tableau dashboards supported multi-million-dollar strategic initiatives, improving efficiency by ~15%.
I also implemented iterative transformations with dbt (a data build tool) to streamline data extraction, raising processing efficiency another ~15–20%. During this period, I learned to blend standard regression and hypothesis testing (pandas, scikit-learn, SciPy) into everyday workflows, accelerating data-driven decisions at the C-level. Although I was an associate-level contributor, I actively planned pipeline enhancement tasks to ensure on-time delivery and a coherent data journey from ingestion to final insights.
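To give a flavor of that regression-and-testing workflow, here is a minimal, self-contained sketch using synthetic data; every number and variable name is hypothetical, not drawn from the actual healthcare datasets:

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical example: claim-processing times before/after a pipeline change
before = rng.normal(10.0, 2.0, 200)  # minutes, old pipeline
after = rng.normal(9.0, 2.0, 200)    # minutes, new pipeline

# Two-sample t-test: is the difference in means statistically significant?
t_stat, p_value = stats.ttest_ind(before, after)
significant = p_value < 0.05  # conventional 5% threshold

# Simple regression: does record volume predict processing time?
volume = rng.uniform(100, 1000, 200).reshape(-1, 1)
time_taken = 5 + 0.004 * volume.ravel() + rng.normal(0, 0.5, 200)
model = LinearRegression().fit(volume, time_taken)
print(f"p-value: {p_value:.4f}, slope: {model.coef_[0]:.4f}")
```

Pairing a quick hypothesis test with a fitted slope like this is often enough to turn a stakeholder hunch into a quantified answer.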
Senior Data Analyst (Contract)
As a contract Senior Data Analyst working alongside a tight-knit team, I focused on forecasting solutions using Python libraries like pandas, scikit-learn, and XGBoost, cutting forecast errors by about 10%. I introduced SAS-based accuracy checks to refine existing forecasting processes, and used dbt in conjunction with Airflow on a modest AWS/Snowflake environment for ~20–25% overall workflow efficiency gains.
My responsibilities included running short weekly stand-ups, setting near-term deadlines, and verifying that deliverables matched stakeholder expectations. This experience taught me how iterative improvements and consistent communication can elevate forecast fidelity—even in a local or mid-scale environment.
Senior Data Analyst (Contract)
At Royal Caribbean Group, I devised SQL/Python revenue models for a fleet of 26 ships, elevating demand projection accuracy by ~15–20%. I also scripted Python-based ticket pricing updates, reducing manual input by ~30–40% and enhancing operational efficiency by ~10–15%. In my associate role, I tracked each deliverable's progress (like implementing advanced window functions) in short sprints, ensuring code quality and data integrity before deployment.
This environment prioritized fast decisions, so I coordinated with operations to confirm pipeline readiness each week. By bridging data engineering tasks and simple project management duties, I helped deliver near real-time fare adjustments and quickly capitalized on new revenue opportunities.
Analytics Consultant
Taking on a consulting role at CVS Health, I worked with SQL (including advanced window functions, CTEs) and Python-based libraries (pandas, scikit-learn) to optimize multi-team data workflows on local or minimal hardware setups. My duties involved clarifying each ingestion or transformation task in a backlog and verifying final outputs within specified deadlines.
By deploying scalable data pipelines (Airflow, dbt), I reduced manual data handling, allowing teams to focus on strategic data usage. I further integrated ML models (TensorFlow, XGBoost) for forecasting. While others determined the vision, I collaborated closely with them, ensuring my small-scale management approach kept daily tasks on target, culminating in a better overall pipeline for timely analytics.
Consulting Analyst
At Qvest.US, I stepped into an associate-level data analyst role supporting SQL-driven data pipelines for cross-department Tableau/Power BI dashboards. These pipelines fostered real-time insights for sales and operations. My self-organized “micro-projects” for each enhancement ensured tasks remained bite-sized and trackable, so stakeholders saw incremental gains every two weeks.
I also ran market and competitive analyses with Python scripts, enabling data-backed strategy formation. Although I juggled typical junior analytics tasks, I found that minimal but structured project planning (like short stand-ups and Gantt charts) significantly boosted visibility and maintained progress across concurrent tasks.
University of California, Santa Barbara
Graduated: August 2020
University of San Diego
Graduated: May 2021
SQL, Python (pandas, NumPy, scikit-learn, TensorFlow)
Tableau, Power BI, Plotly, Matplotlib, Seaborn
Spark, Snowflake, dbt, Airflow, AWS, Azure, BigQuery
ETL/ELT, API Integration, Web Scraping
Classification Models, Forecasting, NLP
Salesforce
Amazon Web Services
Project Overview: In this personal exercise, I aimed to uncover emotional trends in a massive collection of tweets. By meticulously cleaning text data and employing logistic regression, I explored how sentiment fluctuates over time in response to key events or viral topics.
Process: After normalizing tweets (removing special characters, tokenizing words, applying TF-IDF), I trained the model with cross-validation to optimize accuracy. This approach highlighted how strongly negative tweets often spiked around controversial subjects. Additionally, an overlay of retweet volume revealed that viral negativity often garners outsized engagement.
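The cleaning-and-classification approach can be sketched in a few lines of scikit-learn. This toy version uses a tiny hypothetical corpus in place of the real tweet dataset, so the scores are illustrative only:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical mini-corpus standing in for the cleaned tweets
tweets = [
    "love this product so much", "what a great day",
    "absolutely fantastic news", "really happy with the results",
    "this is terrible and disappointing", "worst experience ever",
    "so angry about this decision", "awful service, never again",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = positive, 0 = negative

# TF-IDF features feeding a logistic regression, scored via cross-validation
pipeline = make_pipeline(
    TfidfVectorizer(lowercase=True),
    LogisticRegression(max_iter=1000),
)
scores = cross_val_score(pipeline, tweets, labels, cv=4)
print(f"mean CV accuracy: {scores.mean():.2f}")
```

On the real dataset the vectorizer and regularization strength were tuned within the same cross-validation loop, so the reported accuracy reflects held-out tweets rather than training data.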
Results & Insights: The final model exceeded 90% accuracy on validation data, suggesting it reliably gauges overall sentiment. These insights pointed to a correlation between polarizing content and higher user engagement, underscoring how emotional language shapes social media dynamics.
Visualization: Average Sentiment & Retweet Volume
The left axis displays sentiment (-1 to +1), while the right axis captures total retweet volume. Spikes in negativity consistently align with heavy retweet activity, illustrating how emotive content proliferates faster online.
Project Overview: Using detailed sales data covering multiple product lines over two years, I developed a forecasting system to anticipate monthly demand shifts. The goal was to reduce last-minute rush shipping and better align promotional timing.
Process: I cleaned the dataset (removing anomalies), introduced features like promotional flags and day-of-week indicators, and tested multiple modeling strategies. An ensemble of Prophet (for seasonality) and RandomForestRegressor (for non-linear interactions) outperformed baseline ARIMA, validated via rolling window back-testing.
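The rolling-window back-test is the piece most worth illustrating. The sketch below uses synthetic monthly sales and a lone RandomForestRegressor (Prophet and the full ensemble are omitted for brevity); every number here is made up:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)

# Synthetic 24 months of sales with seasonality and a promo effect (hypothetical)
months = pd.date_range("2022-01-01", periods=24, freq="MS")
promo = rng.integers(0, 2, 24)
sales = 100 + 20 * np.sin(2 * np.pi * months.month / 12) + 15 * promo + rng.normal(0, 5, 24)
df = pd.DataFrame({"month_num": months.month, "promo": promo, "sales": sales})

# Rolling-window back-test: train on everything before month t, predict month t
errors, naive_errors = [], []
for t in range(12, 24):
    train, test = df.iloc[:t], df.iloc[t : t + 1]
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(train[["month_num", "promo"]], train["sales"])
    pred = model.predict(test[["month_num", "promo"]])[0]
    errors.append(abs(pred - test["sales"].iloc[0]))
    # Naive baseline: repeat last observed month's sales
    naive_errors.append(abs(train["sales"].iloc[-1] - test["sales"].iloc[0]))

mae, naive_mae = np.mean(errors), np.mean(naive_errors)
print(f"model MAE: {mae:.1f}, naive MAE: {naive_mae:.1f}")
```

Walking the training window forward one month at a time mimics how the model would actually be used in production, which is why the ~17% error reduction cited above was measured this way rather than on a single random split.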
Results & Insights: The final model cut mean absolute error by ~17% over naive methods, enabling managers to reorder more precisely. Detailed error analysis revealed certain categories exhibited irregular surges around holidays or marketing pushes, highlighting the need for real-time forecast updates during those periods.
Visualization: Actual & Forecasted Monthly Sales (Confidence Bands)
Bars show actual sales across three product categories, while lines depict predicted volumes. The shaded areas around Category A’s forecast illustrate a 95% confidence interval, highlighting potential variance in demand.
Project Overview: Merging session logs with transaction histories, I aimed to cluster users based on their buying patterns, frequency, and average order values. The resulting segments guided marketing teams toward more effective loyalty and upsell strategies.
Process: I engineered features like recency, cart abandonment rates, and total spend, then compared algorithms (K-Means, DBSCAN, hierarchical) using silhouette scores. K-Means with five clusters offered the best interpretability. Each segment was labeled by characteristic behaviors—for instance, “Mid-Freq, Mid-Spend” or “High-Freq, High-Spend,” enabling targeted approaches.
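A condensed version of the cluster-count comparison might look like this, using synthetic frequency and spend features in place of the real session and transaction data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)

# Hypothetical features: purchase frequency (per month) and average spend ($)
low = np.column_stack([rng.normal(1, 0.3, 100), rng.normal(20, 5, 100)])
mid = np.column_stack([rng.normal(4, 0.5, 100), rng.normal(60, 8, 100)])
high = np.column_stack([rng.normal(9, 1.0, 100), rng.normal(150, 20, 100)])
X = StandardScaler().fit_transform(np.vstack([low, mid, high]))

# Compare cluster counts by silhouette score (higher = better separation)
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(f"best k by silhouette: {best_k}")
```

Scaling the features first matters: without it, the dollar-denominated spend axis would dominate the distance metric and frequency would barely influence the clusters.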
Results & Insights: Marketing campaigns tailored to each cluster boosted email open rates by ~18%. Observing user migration across segments also helped detect churn risk or ascendant buying patterns. The net outcome: a more nuanced view of consumer habits, fueling data-driven retargeting and retention efforts.
Visualization: Five Clusters by Purchase Frequency & Average Spend
A color-coded scatter plot reveals the distinct groups, with each cluster demonstrating unique spending and frequency metrics. Labeled centroids help pinpoint typical buyer behaviors, informing more nuanced marketing.
Project Overview: I developed a pipeline to identify and visualize anomalies in IoT sensor data in near real-time. By proactively spotting erratic readings, the system provided early warnings of potential hardware failures or hazardous conditions in connected devices (e.g., smart thermostats or industrial machinery).
Process: I configured a local environment using Python for ingestion and quick ETL tasks, alongside Kafka for data queuing. Each sensor feed transmitted temperature, vibration, and humidity metrics. An isolation forest model (an algorithm that flags outliers in multi-dimensional data) identified unusual spikes or dips. A threshold-based approach triggered Slack alerts when sensor values breached normal operating ranges.
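The isolation-forest step can be sketched as follows, with synthetic temperature and vibration readings standing in for the live Kafka feeds (the queuing and Slack-alert layers are omitted):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)

# Hypothetical sensor readings: temperature (deg C) and vibration (mm/s)
normal = np.column_stack([rng.normal(70, 2, 500), rng.normal(1.0, 0.1, 500)])
spikes = np.array([[95.0, 3.5], [40.0, 0.2], [72.0, 6.0]])  # injected anomalies
readings = np.vstack([normal, spikes])

# contamination ~= expected anomaly fraction; tuned per deployment
model = IsolationForest(contamination=0.01, random_state=0).fit(readings)
flags = model.predict(readings)  # -1 = anomaly, 1 = normal

anomaly_idx = np.where(flags == -1)[0]
print(f"flagged {len(anomaly_idx)} of {len(readings)} readings")
```

Because the model scores temperature and vibration jointly, it can flag combinations that look unremarkable on either axis alone, which is exactly the correlated-spike pattern noted in the results below.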
Results & Insights: The pipeline flagged incremental deviations that manual checks would likely miss, cutting mean response time to abnormal events by 40%. Cluster-based analysis revealed that anomalies often involved simultaneous temperature and vibration jumps, reinforcing the importance of correlating sensor variables. Overall, this system helped detect mechanical issues earlier, reducing downtime and repair costs.
Visualization: Temperature & Vibration with Threshold Lines
The chart above shows both temperature (left axis) and vibration (right axis), with dashed lines marking upper and lower thresholds. Any data points that cross these thresholds or exhibit extreme combined behaviors are flagged as anomalies, highlighting critical events in real time.
Project Overview: In this self-directed project, I explored classification techniques to identify potentially fraudulent transactions. Since fraud typically accounts for a tiny percentage of all credit card activity, my primary challenge was dealing with this highly imbalanced dataset to ensure legitimate transactions weren’t frequently misclassified while still catching true fraud.
Process: First, I split the transaction data into training and test sets. I then introduced specialized methods for imbalanced learning, such as SMOTE (Synthetic Minority Over-Sampling Technique) to replicate fraud examples more evenly, and undersampling of legitimate transactions to maintain a workable ratio. I ran multiple experiments using RandomForestClassifier and XGBoost with a heavy focus on the ROC–AUC (Receiver Operating Characteristic–Area Under the Curve) as my primary evaluation metric, because it illustrates how well the model distinguishes fraud from legitimate transactions at various thresholds.
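A minimal stand-in for the resampling experiment, using scikit-learn only: random oversampling duplicates minority examples, whereas SMOTE (from the separate imbalanced-learn package) synthesizes new ones, but the balancing idea is the same. The data here is synthetic:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic imbalanced data: ~1% "fraud" (hypothetical stand-in for transactions)
X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.99, 0.01], flip_y=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Random oversampling: duplicate minority rows until classes are balanced
n_legit = (y_train == 0).sum()
fraud_up = resample(X_train[y_train == 1], n_samples=n_legit, random_state=0)
X_bal = np.vstack([X_train[y_train == 0], fraud_up])
y_bal = np.concatenate([np.zeros(n_legit), np.ones(n_legit)])

# Evaluate with ROC-AUC, which is threshold-independent
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_bal, y_bal)
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"ROC-AUC: {auc:.3f}")
```

Note that resampling is applied only to the training split; evaluating on a resampled test set would inflate the metric.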
Results & Insights: My final ensemble model achieved ~98% ROC–AUC, reducing missed fraud (false negatives) significantly. I also analyzed false positives (legitimate transactions flagged as fraud), finding patterns like legitimate overseas travel or sporadic big-ticket purchases. By applying domain knowledge to these borderline cases, I refined the pipeline to avoid inconveniencing users who legitimately exhibit “abnormal” patterns. Overall, this approach showcased how careful handling of minority classes can deliver high-confidence fraud alerts without overwhelming call centers or negatively impacting customer experiences.
Visualization: Confusion Matrix & ROC Curve
Here you can see a confusion matrix (a table comparing predicted vs. actual classes), which helps measure how many transactions were correctly or incorrectly categorized. A false positive occurs when a legitimate transaction is wrongly flagged as fraud, while a false negative happens when a fraudulent transaction is incorrectly labeled as legitimate. On the right is the ROC curve (Receiver Operating Characteristic curve), plotting the true positive rate (the fraction of all fraud that is correctly caught) against the false positive rate (the fraction of all legitimate transactions that are incorrectly flagged). The further the ROC curve pushes toward the top-left, the better the model distinguishes fraud from legitimate transactions across different thresholds. The curve’s Area Under the Curve (AUC) is a consolidated measure of the model’s overall separating power.
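For readers who prefer numbers to prose, the confusion-matrix quantities and the two ROC rates can be computed directly (the ten labels below are hypothetical):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical predictions for 10 transactions (1 = fraud, 0 = legitimate)
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 0])

# For binary labels, ravel() yields (tn, fp, fn, tp)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tpr = tp / (tp + fn)  # true positive rate: fraction of fraud caught
fpr = fp / (fp + tn)  # false positive rate: legit transactions wrongly flagged
print(f"TP={tp} FP={fp} FN={fn} TN={tn}, TPR={tpr:.2f}, FPR={fpr:.2f}")
```

Sweeping the classification threshold and recomputing (TPR, FPR) at each setting traces out the ROC curve described above.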
I appreciate your interest in my portfolio. If you have a data challenge that could benefit from local, Python-based development and thorough analytics, please reach out to learn more or to discuss possible collaborations.
Email: dmindlin824@gmail.com
Phone: 818-665-8871
LinkedIn: linkedin.com/in/daniel-mindlin