Research and Development

Where the fun stuff happens...

2023

  • Data Science Engineering in AdTech@Microsoft

    • Data Science Development/workflows:

      • Scientific Python Stack, Pandas, Numpy, PyCharm, Jupyter.

    • Machine Learning System Design and Optimization

    • Databases, CI/CD and Cluster Runtimes:

      • Docker, Kubernetes, Airflow, Concourse, Spark, HDFS, Yarn, Hive, Kafka, Presto, Vertica, MySQL, Postgres, Jenkins, AWS, Azure.

    • Azure Cloud Platform

  • Publications:

    • Work in progress Book:

      • Modern Data Pipelines Testing Techniques: A Visual Guide. (book page)

    • Blog posts:

      • Modern Data Pipelines Testing Techniques: Why Bother? 1/3 (blog post)

      • Modern Data Pipelines Testing Techniques: Why Bother? 2/3 (blog post)

      • Modern Data Pipelines Testing Techniques: Why Bother? 3/3 (blog post)

2022

  • Data Science Engineering in AdTech@Microsoft

    • Azure Cloud Platform

    • Data Science Development/workflows:

      • Scientific Python Stack, Pandas, Numpy, PyCharm, Jupyter.

    • Machine Learning System Design and Optimization

    • Databases, CI/CD and Cluster Runtimes:

      • Docker, Kubernetes, Concourse, Spark, HDFS, Yarn, Hive, Kafka, Presto, Vertica, MySQL, Postgres, Jenkins, AWS, Azure.

  • Publications:

    • Work in progress Book:

      • Modern Data Pipelines Testing Techniques: A Visual Guide. (book page)

    • Blog posts:

      • ML Latency No More: Common Ways to Reduce ML Prediction Latency to Sub X ms (blog post)

  • Tech Conferences:

    • PyData NYC 2022: ML Latency No More: Common Ways to Reduce ML Prediction Latency to Sub X ms (conference talk)

  • Service:

    • Program committee member:

      • Workshop on advances in artificial intelligence for computational advertising 2022. (AdKDD 2022)

2021

  • Data Science Engineering in AdTech@Xandr-ATT

    • Data Science Development/workflows:

      • Scientific Python Stack, Pandas, Numpy, PyCharm, Jupyter.

    • Machine Learning System Design and Optimization

    • Databases, CI/CD and Cluster Runtimes:

      • Docker, Kubernetes, Concourse, Spark, HDFS, Yarn, Hive, Kafka, Presto, Vertica, MySQL, Postgres, Jenkins, AWS, Azure.

  • Publications:

    • Other writings:

      • Data Mesh: The On-Going Evolution (blog post)

      • Co-author: Ad Tech Defi (Ad Tech on Crypto/Blockchain) (blog post)

  • Service:

    • Program committee member:

      • Applied Data Science track at the Knowledge Discovery and Data Mining Conference 2021. (KDD 2021)

      • Workshop on advances in artificial intelligence for computational advertising 2021. (AdKDD 2021)

  • Certifications:

    • Microsoft Certified: Azure Fundamentals AZ-900

    • Microsoft Certified: Azure Data Fundamentals DP-900

2020

  • Data Science Engineering in AdTech

    • Data Science Development/workflows:

      • Scientific Python Stack, Pandas, Numpy, PyCharm, Jupyter.

    • Machine learning:

      • PyTorch, Tensorflow, Keras, Scikit-Learn.

      • Deep learning in recommender systems.

      • Applications of feature embeddings in Ad Tech.

    • Databases, CI/CD and Cluster Runtimes:

      • Docker, Kubernetes, Concourse, Spark, HDFS, Yarn, Hive, Kafka, Presto, Vertica, MySQL, Postgres, Jenkins, AWS.

  • Publications:

    • Book: Clean Machine Learning Code. (book page)

    • Other writings:

      • ML Feature Stores: A Casual Tour. Part 1: (blog post)

      • ML Feature Stores: A Casual Tour. Part 2: (blog post)

      • ML Feature Stores: A Casual Tour. Part 3: (blog post)

      • Contributor to: DataSketches for Fast Computation. (blog post)

      • Seven Signs You Might Be Creating ML Technical Debt (blog post)

      • KDD 2020 Conference Highlights: (blog post)

  • Service:

    • Program committee member:

      • Applied Data Science track at the Knowledge Discovery and Data Mining Conference 2020. (KDD 2020)

      • Workshop on advances in artificial intelligence for computational advertising 2020. (AdKDD 2020)

2019

  • Data Science Engineering in AdTech

    • Data Science Development/workflows:

      • Scientific Python Stack, Pandas, Numpy, PyCharm, Jupyter.

    • Machine learning:

      • PyTorch, Tensorflow, Keras, Scikit-Learn.

      • Deep learning in recommender systems.

      • Applications of feature embeddings in Ad Tech.

    • Databases, CI/CD and Cluster Runtimes:

      • Docker, Kubernetes, Concourse, Spark, HDFS, Yarn, Hive, Kafka, Presto, Vertica, MySQL, Postgres, Jenkins, AWS.

  • Certifications:

  • Publications:

      • MRR vs MAP vs NDCG: Rank-Aware Evaluation Metrics And When To Use Them. (blog post)

      • Clean Machine Learning Code: Practical Software Engineering Principles for ML Craftsmanship. Towards Data Science Publication. (blog post)

      • Testing ML Code: How Scikit-learn Does It. Analytics Vidhya Publication. (blog post)

      • Avoiding the “Automatic Hand-off” Syndrome in Data Science Products. Towards Data Science Publication. (blog post)

      • Deep Learning for Recommendation Systems circa 2018–19: A Navigation Map. (blog post)

      • k8s-workqueue: Simplified Kubernetes Batch Jobs For Data Science Use Cases. Xandr Tech Publication. (blog post)

        • Presented in the 2019 International Conference on Machine Learning, Predictive Applications and APIs (PAPIs 2019)

  • Service:

    • Program committee member:

      • Applied Data Science track at the Knowledge Discovery and Data Mining Conference 2019. (KDD 2019)

      • Workshop on advances in artificial intelligence for computational advertising 2019. (AdKDD 2019)

2017-2018

  • Data Science Engineering in AdTech

    • Data Science Development/workflows:

      • Scientific Python Stack, Jupyter.

    • Machine learning libraries:

      • SparkML, Scikit-Learn, PyTorch, Tensorflow, Keras, Logistic-regression-L1, R-GLM, L-BFGS, XGBoost.

    • Databases, CI/CD and Cluster Runtimes:

      • Docker, Kubernetes, Concourse, Spark, HDFS, Yarn, Hive, Kafka, Presto, Vertica, MySQL, Postgres, Jenkins, AWS-GPU.

  • Publications:

    • Taifi, Moussa, et al. "Lessons Learned from Building Scalable Machine Learning Pipelines", 2018, International Conference on Machine Learning, Predictive Applications and APIs, (blog post, recording) (to appear in PMLR)

    • Structuring a “Docker for Data Science” Training Journey (blog post)

    • Introduction to PyTorch Model Compression Through Teacher-Student Knowledge Distillation (blog post)

  • Individual Contribution to team publications:

    • Sanzgiri, Ashutosh, et al. "Classifying Sensitive Content in Online Advertisements with Deep Learning", 2018, The 5th IEEE International Conference on Data Science and Advanced Analytics.

  • Completed deeplearning.ai 5 Course Specialization:

  • Completed the National Research University Higher School of Economics Course:

Fall-Spring 2016-2017

Spring 2016

  • Data Science Engineering in AdTech

    • Data Science Development/workflows:

      • Scientific Python Stack, Scala, Jupyter, R, SQL

    • Machine learning libraries:

      • SparkML, Scikit-Learn, Logistic-regression-L1 library, XGBoost

    • Databases and Cluster Runtimes:

      • Spark, Hadoop, Yarn, Hive, Kafka, Vertica, MySQL, Postgres

  • Recommended Books:

    • Scala for Data Science (Bugnion)

    • Test-Driven Machine Learning (Bozonier)

    • Mastering Machine Learning with Scikit-learn (Gavin Hackeling)

  • Johns Hopkins University Design and Interpretation of Clinical Trials:

Summer-Fall 2015

  • Data Science Engineering in AdTech

    • Data Science Development/workflows:

      • Scientific Python Stack, Jupyter, R, SQL

    • Machine learning libraries:

      • SparkML, Scikit-Learn, XGBoost

    • Databases and Cluster Runtimes:

      • Spark, Hadoop, Yarn, Hive, Vertica, MySQL, Postgres

  • Recommended Books:

    • Mastering Apache Spark (Frampton)

    • Mastering Object-oriented Python (Lott)

    • Machine Learning with Spark (Pentreath)

Spring 2015

  • Cloud resource recommendation and cost optimization engine:

    • Java, SQL, Python, Linux.

    • PostgreSQL, Hibernate.

    • AWS EC2, S3, EBS, CloudWatch

    • Spark Core, PySpark, Spark SQL, Yarn, Hadoop HDFS, CDH 5.

  • Data analysis tools and references:

    • Scikit-Learn, R, Rshiny.

    • Recommended books:

        • Learning Spark: Lightning-Fast Big Data Analysis

        • Mastering Machine Learning with scikit-learn

        • Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2

        • Building Machine Learning Systems with Python

  • The occasional blog post:

  • Publications: Moussa Taifi, Justin Y. Shi, Yasin Celik, "JENERGY: A Fault Tolerant Stateless Architecture for High Performance Computing", in Proc. of the 9th IEEE International Symposium on Service Oriented System Engineering (SOSE 15), March 2015.

Fall 2014

  • Scalable monitoring of virtual infrastructures:

    • Java, SQL, Python, Linux.

    • PostgreSQL, Hibernate, SQLAlchemy

    • EC2, S3, EBS, CloudWatch

    • VMware vCenter, ESXi hypervisor performance monitoring

  • Data analysis:

    • R, Rshiny, Kaggle

    • Recommended books:

      • Practical Data Science with R

      • MapReduce Design Patterns

  • Certification: Cloudera Certified Developer for Apache Hadoop (CCDH): License: 100-011-295

Summer 2014

  • Data analysis:

    • Recommended books:

      • Machine learning with R

      • An Introduction to Statistical Learning: with Applications in R

      • Hadoop in Practice

      • Head first Object-oriented Analysis and Design

Spring 2014

  • Cloud resource recommendation engine:

    • VMware vCenter, ESXi hypervisor performance monitoring and recommendations

    • Java, Python, Pandas, SQLAlchemy, Hibernate, PostgreSQL

    • EC2, S3, EBS, PIOPS, Elastic IP, CloudFormation, Custom enterprise networking and storage analysis/recommendation

    • Hyper-V performance monitoring and cloud migration recommendations.

    • System Center Virtual Machine Manager, Operations Manager

    • Active Directory (DS, CS) and SQL Server 2012 administration

    • PowerShell, C#.

    • Data analysis:

      • Recommended books:

        • An Introduction to Statistical Learning: with Applications in R

        • Data Smart: Using Data Science to Transform Information into Insight

    • AWS Detailed billing and forecasting

  • Certification: MCSA SQL Server - 70-461 Querying Microsoft SQL Server 2012

Fall 2013

  • Scalable monitoring of virtual infrastructures:

    • Java, Hibernate, SQL, pgsql, PostgreSQL.

    • EC2, S3, EBS, PIOPS, Elastic IP, Custom enterprise networking.

    • AWS Java SDK.

    • VMware vCenter, ESXi hypervisor performance monitoring

    • Time series and Pandas.

    • Python, SQLAlchemy.

    • Elasticsearch, Logstash.

  • Certification: Amazon Web Services Solution Architect - Associate Level: License AWS-ASA-1803

Spring/Summer 2013


Fall 2012

Spring/Summer 2012

Fall/Summer 2011

  • Publication: M. Taifi, J. Y. Shi and A. Khreishah, "SpotMPI: A Framework for Auction-based HPC Computing Using Amazon Spot Instances", in Proc. of the International Symposium on Advances of Distributed Computing and Networking (ADCN 2011/ICA3PP), October 2011.

  • Publication: M. Taifi, "Auction-based High Performance Cloud Computing", Finalist poster in the ACM Student Research Competition at Supercomputing 2011 (SC11), November 2011.

  • HPC on the cloud

    • Xen 4.0

    • GPU computing research

    • Nimbus/Cumulus cloud setup

    • Eucalyptus, OpenStack evaluation

    • Hadoop, HDFS, MapReduce research

    • Fault tolerance in the cloud

    • Amazon spot instances

  • Administration of a 12-node HPC private cloud, part of the TCLOUD project

  • Service: Session chair at the International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP11)

  • Service: Student volunteer at SuperComputing Conference (SC11)

  • Student travel grant to SC11 sponsored by Microsoft Research

Spring 2011

  • Publication: J. Y. Shi, M. Taifi, and A. Khreishah, "Resource Planning for Parallel Processing in the Cloud", in Proc. of the 1st International Workshop on Sustainable High Performance Cloud Computing (SHPCC 2011), September 2011.

  • Publication: M. Taifi, A. Khreishah, J. Y. Shi, and J. Wu, "Sustainable GPU Computing at Scale," in Proc. of the 14th IEEE International Conference on Computational Science and Engineering (CSE 2011), August 2011.

  • Publication: M. Taifi, A. Khreishah and J. Y. Shi, "Natural HPC Substrate: Exploitation of Mixed Multicore CPU and GPUs", in Proc. of the 14th IEEE International Conference on High Performance Computing & Simulation (HPCS 2011), July 2011.

  • HPC on the cloud, resource planning

  • Amazon EC2

  • EMC and SAN storage configuration and optimization

  • Received the Bronze Award for our research poster on fault-tolerant GPU computing at the Future of Computing 2011 conference, Temple University.

  • Service: Organizing committee for the SHPCC 2011 conference http://monitor.cis.temple.edu/SHPCC11/

Fall 2010

  • Multi-GPU failure tolerance through failure containment (CheCuda and VCuda)

  • Reliability of HPC software (Open MPI and BLCR)

  • Amazon EC2 compute cloud with GPU instances

  • StarCluster MIT project for cloud cluster construction on EC2

  • MapReduce and Hadoop introduction

Summer 2010

  • I was part of the Temple team that won first place in the TeraGrid 10 conference programming competition, with NSF student support. The article about the event is here.

  • Participated in the HPDC conference and the NCSI workshop at Kean University, with Northwestern University and NCSI Shodor student support

  • Participated in the UIUC parallel programming summer school

Spring 2010

  • Harnessing the power of mixed GPU CPU environments through fault tolerant decoupling: Experimenting with the D2P2 substrate.

  • My newest poster at the Future of Computing 2010:

    • Publication: M. Taifi and Y. Shi, "GPU-CPU High Performance Computing Through Fault Tolerant Decoupling: Preliminary Results", Poster, Future of Computing, Temple University, March 2010

Fall 09

  • Automatic code parallelization using OpenCL and the Pml tagging technology

Summer 09

  • Participated in the IEEE Cluster 2009 conference in New Orleans with NSF student travel support.

  • I am researching the new GPU/CUDA technology, and we just submitted a paper to the Cluster 2009 conference.

  • Publication: M. Taifi and Y. Shi, "How to achieve a 47000x speed up on the GPU/CUDA using matrix multiplication," Technical Report, Amax Corporation, June 2009.

Spring 09

  • Publication: M. Taifi and Y. Shi, "Performance Prediction and Evaluation of a Solution Space Compact Parallel Program using the Steady State Timing Model", Poster, Future of Computing, Temple University, March 2009.

  • Investigating the validity of the Timing Model for predicting parallel program performance

  • I am participating in two competitions with a new research effort that deals with the prediction of parallel speedups using the Timing Model (link)

  • Our research poster (link) was accepted and presented at the Future of Computing 09 conference and the CST Student Research Symposium

Fall 08

  • Parallel processing research

  • Parallel algorithm classification and evaluation

  • Introduction to challenges and research directions for parallel processing

Previous Graduate Research

I have worked as a research scholar on a number of very interesting projects :)

Summer 2008

    • I worked with Prof. Juha Puustjarvi in the Communication Engineering lab of Lappeenranta University of Technology on improving a company's sales using opinion mining on their system. You can find the write-up here: https://oa.doria.fi/handle/10024/42452

Spring 2008

  • I worked on my master's thesis, Opinion Mining (check the attachments at the bottom of the page for a brief presentation), under the supervision of Prof. Juha Puustjarvi and Prof. Jari Porras at Lappeenranta University of Technology.

July 2007

    • I worked with Prof. Yuan Shi, Chair of the CS department at Temple University. I helped with an ongoing project focused on the performance evaluation of Stateless Parallel Processing.

June 2007

    • I worked with Dr. Ville Kyrki and doctoral student Olli Alkkiomäkki in the machine vision and pattern recognition laboratory of the University of Lappeenranta, Finland. My project consisted of updating the Linux driver for the Matrox II frame grabber.

June-July 2006

    • I worked for Prof. Steven Lindell in the CS department at Haverford College. I assisted with technical support in the Summer Cascade Mentoring Program, which gives Philadelphia high school teachers and students the opportunity to participate in an active research lab during the summer months.

August 2006