Publications
2023
Published Book:
Modern Data Pipelines Testing Techniques: A Visual Guide (book page)
Conference Talks:
Building Low Latency ML Systems for Real-Time Model Predictions at Xandr. (P99 Conference 2023)
Modern Data Pipelines Testing Techniques: A Visual Guide. (PyData NYC 2023)
Blog posts:
Evolving a Data Pipeline Testing Plan (blog post)
2022
Work-in-progress Book:
Modern Data Pipelines Testing Techniques: A Visual Guide (book page)
Blog posts and Conference Talks:
PyData NYC 2022: ML Latency No More: Common Ways to Reduce ML Prediction Latency to Sub X ms
2021
Video Course:
Clean Machine Learning Code. (video course page) (book page)
2020
Book:
Clean Machine Learning Code. (book page)
2019
MRR vs MAP vs NDCG: Rank-Aware Evaluation Metrics And When To Use Them. (blog post)
Clean Machine Learning Code: Practical Software Engineering Principles for ML Craftsmanship. Towards Data Science Publication. (blog post)
Testing ML Code: How Scikit-learn Does It. Analytics Vidhya Publication. (blog post)
Avoiding the “Automatic Hand-off” Syndrome in Data Science Products. Towards Data Science Publication. (blog post)
Deep Learning for Recommendation Systems circa 2018–19: A Navigation Map. (blog post)
k8s-workqueue: Simplified Kubernetes Batch Jobs For Data Science Use Cases. Xandr Tech Publication. (blog post)
Presented at the 2019 International Conference on Machine Learning, Predictive Applications and APIs (PAPIs 2019)
2018
Taifi, Moussa, et al. "Lessons Learned from Building Scalable Machine Learning Pipelines", 2018, International Conference on Machine Learning, Predictive Applications and APIs (blog post, recording) (to appear in PMLR)
Structuring a “Docker for Data Science” Training Journey. AppNexus Tech Publication. (blog post)
Introduction to PyTorch Model Compression Through Teacher-Student Knowledge Distillation (blog post)
Individual Team Contribution paper:
Sanzgiri, Ashutosh, et al. "Classifying Sensitive Content in Online Advertisements with Deep Learning", 2018, The 5th IEEE International Conference on Data Science and Advanced Analytics.
2016
Individual Team Contribution paper:
Austin, Daniel, et al. "Reserve price optimization at scale." 2016 IEEE 3rd International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 2016.
2015
Moussa Taifi, Justin Y. Shi, Yasin Celik, "JENERGY: A Fault Tolerant Stateless Architecture for High Performance Computing", in Proc. of the 9th IEEE International Symposium on Service Oriented System Engineering (SOSE 15), March 2015.
PhD thesis
M. Taifi, "Stateless Parallel Processing Architecture for Exascale and Auction-based HPC Clouds", PhD Thesis, Temple University, 2013
2013
M. Taifi, "Taking the Elephant to the Market: Improving Hadoop Market Awareness for Auction-based Clouds", 22nd ACM Symposium on High-Performance Parallel and Distributed Computing, Poster, (HPDC 2013), June 2013.
M. Taifi, "Banking on Decoupling: Budget-driven Sustainability for HPC Applications on EC2 Spot Instances", accepted to appear in ACM Journal of Operating Systems Reviews (OSR 13).
M. Taifi and J. Y. Shi, "Performance and Reliability Effects of Multi-tier Bidding on MapReduce in Auction-based Clouds", accepted to appear in Proc. of the 7th IEEE International Symposium on Service Oriented System Engineering (SOSE 13), March 2013.
M. Taifi, A. Khreishah, and J. Y. Shi, "Building a Private HPC Cloud for Compute and Data-intensive Applications", accepted to appear in the International Journal on Cloud Computing: Services and Architecture, (IJCCSA), 2013.
2012
J. Y. Shi, M. Taifi, A. Khreishah, and J. Wu, "Tuple Switching Network--When Slower May be Better", in Elsevier Journal of Parallel and Distributed Computing (JPDC), 2012.
M. Taifi, "Banking on Decoupling: Budget-driven Sustainability for HPC Applications on EC2 Spot Instances", (Workshop version) in Proceedings of 31st IEEE International Symposium on Reliable Distributed Systems (SRDS), IEEE Workshop on Dependability Issues in Cloud Computing (DISCCO 2012), October 2012.
M. Taifi and J. Y. Shi, "MapReduce Performance Evaluation on a Private HPC Cloud", in Proceedings of The 41st International Conference on Parallel Processing, Poster, (ICPP 2012), September 2012.
M. Taifi, J. Y. Shi, A. Khreishah, "Towards Auction-Based HPC Computing in the Cloud", in the Journal of Computer Technology and Application, Invited Paper, (JCTA), 2012.
J. Y. Shi, M. Taifi and A. Khreishah, "Program Scalability Analysis for HPC Cloud: Applying Amdahl’s Law to NAS Benchmarks", in Proceedings of 24th Supercomputing Conference (SC12), IEEE International Workshop on Sustainable HPC Computing (SHPCLOUD12)
M. Taifi, "HPC Cloud: Pros, Cons, Challenges and Big Data Potentials", in Proceedings of 1st Annual Computational Research on Owlsnest (CROO 2012) Symposium.
2011
M. Taifi, J. Y. Shi and A. Khreishah, "SpotMPI: A Framework for Auction-based HPC Computing Using Amazon Spot Instances", in Proc. of the International Symposium on Advances of Distributed Computing and Networking (ADCN 2011/ICA3PP), October 2011.
M. Taifi, "Auction-based High Performance Cloud Computing", Finalist Poster in the ACM Student Research Competition at Supercomputing Conference 2011 (SC11), November 2011.
J. Y. Shi, M. Taifi, and A. Khreishah, "Resource Planning for Parallel Processing in the Cloud", in Proc. of the 1st International Workshop on Sustainable High Performance Cloud Computing (SHPCC 2011), September 2011.
M. Taifi, A. Khreishah, J. Y. Shi, and J. Wu, "Sustainable GPU Computing at Scale," in Proc. of the 14th IEEE International Conference on Computational Science and Engineering (CSE 2011), August 2011.
M. Taifi, A. Khreishah and J. Y. Shi, "Natural HPC Substrate: Exploitation of Mixed Multicore CPU and GPUs", in Proc. of the 14th IEEE International Conference on High Performance Computing & Simulation (HPCS 2011), July 2011.
2010
M. Taifi and Y. Shi, "GPU-CPU High Performance Computing Through Fault Tolerant Decoupling: Preliminary Results", Poster, Future of Computing Symposium, Temple University, March 2010.
2009
M. Taifi and Y. Shi, "How to achieve a 47000x speed up on the GPU/CUDA using matrix multiplication," Technical Report, Amax corporation, June 2009.
M. Taifi and Y. Shi, "Performance Prediction and Evaluation of a Solution Space Compact Parallel Program using the Steady State Timing Model", Poster, CST Student Research Symposium, Temple University, March 2009.
2008
Master's Thesis
M. Taifi, "Opinion Mining", Master's Thesis, Lappeenranta University of Technology, 2008
Journals
J. Y. Shi, M. Taifi, A. Khreishah, and J. Wu, "Tuple Switching Network--When Slower May be Better", in Elsevier Journal of Parallel and Distributed Computing (JPDC), 2012.
M. Taifi, "Banking on Decoupling: Budget-driven Sustainability for HPC Applications on EC2 Spot Instances", in ACM Journal of Operating Systems Reviews (OSR 13).
M. Taifi, J. Y. Shi, A. Khreishah, "Towards Auction-Based HPC Computing in the Cloud", in the Journal of Computer Technology and Application, (JCTA), 2012.
M. Taifi, A. Khreishah, and J. Y. Shi, "Building a Private HPC Cloud for Compute and Data-intensive Applications", in the International Journal on Cloud Computing: Services and Architecture, (IJCCSA), 2013.
Conferences
Taifi, Moussa, et al. "Lessons Learned from Building Scalable Machine Learning Pipelines", 2018, International Conference on Machine Learning, Predictive Applications and APIs (blog post, recording) (to appear in PMLR)
Sanzgiri, Ashutosh, et al. "Classifying Sensitive Content in Online Advertisements with Deep Learning", 2018, The 5th IEEE International Conference on Data Science and Advanced Analytics.
Austin, Daniel, et al. "Reserve price optimization at scale." 2016 IEEE 3rd International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 2016.
M. Taifi, J. Y. Shi, Y. Celik, "JENERGY: A Fault Tolerant Stateless Architecture for High Performance Computing", in Proc. of the 9th IEEE International Symposium on Service Oriented System Engineering (SOSE 15), March 2015.
M. Taifi and J. Y. Shi, "Performance and Reliability Effects of Multi-tier Bidding on MapReduce in Auction-based Clouds", accepted to appear in Proc. of the 7th IEEE International Symposium on Service Oriented System Engineering (SOSE 13), March 2013.
M. Taifi, A. Khreishah, J. Y. Shi, and J. Wu, "Sustainable GPU Computing at Scale," in Proc. of the 14th IEEE International Conference on Computational Science and Engineering (CSE 2011), August 2011.
M. Taifi, A. Khreishah and J. Y. Shi, "Natural HPC Substrate: Exploitation of Mixed Multicore CPU and GPUs", in Proc. of the 14th IEEE International Conference on High Performance Computing & Simulation (HPCS 2011), July 2011.
Workshops
M. Taifi, "Banking on Decoupling: Budget-driven Sustainability for HPC Applications on EC2 Spot Instances (Workshop version)", in Proceedings of 31st IEEE International Symposium on Reliable Distributed Systems (SRDS), IEEE Workshop on Dependability Issues in Cloud Computing (DISCCO 2012), October 2012.
J. Y. Shi, M. Taifi and A. Khreishah, "Program Scalability Analysis for HPC Cloud: Applying Amdahl’s Law to NAS Benchmarks", in Proceedings of 24th Supercomputing Conference (SC12), IEEE International Workshop on Sustainable HPC Computing (SHPCLOUD12)
M. Taifi, J. Y. Shi and A. Khreishah, "SpotMPI: A Framework for Auction-based HPC Computing Using Amazon Spot Instances", in Proc. of the International Symposium on Advances of Distributed Computing and Networking (ADCN 2011/ICA3PP), October 2011.
J. Y. Shi, M. Taifi, and A. Khreishah, "Resource Planning for Parallel Processing in the Cloud", in Proc. of the 1st International Workshop on Sustainable High Performance Cloud Computing (SHPCC 2011), September 2011.
Extended abstracts
M. Taifi, "Taking the Elephant to the Market: Improving Hadoop Market Awareness for Auction-based Clouds", 22nd ACM Symposium on High-Performance Parallel and Distributed Computing, Poster, (HPDC 2013), June 2013.
M. Taifi and J. Y. Shi, "MapReduce Performance Evaluation on a Private HPC Cloud", in Proceedings of The 41st International Conference on Parallel Processing, Poster, (ICPP 2012), September 2012.
M. Taifi, "Auction-based High Performance Cloud Computing", Finalist Poster in the ACM Student Research Competition at Supercomputing Conference 2011 (SC11), November 2011.
M. Taifi, "HPC Cloud: Pros, Cons, Challenges and Big Data Potentials", in Proceedings of 1st Annual Computational Research on Owlsnest (CROO 2012) Symposium.
M. Taifi and Y. Shi, "GPU-CPU High Performance Computing Through Fault Tolerant Decoupling", Poster, Future of Computing Symposium, Temple University, March 2010.
M. Taifi and Y. Shi, "Performance Prediction and Evaluation of a Solution Space Compact Parallel Program using the Steady State Timing Model", Poster, CST Student Research Symposium, Temple University, March 2009.