Research and Development

Where the fun stuff happens...

2024

Exchange Demand Quality and Data Science Engineering in AdTech@Microsoft
- Leadership, Coaching, and Management.
- Data Science Development/workflows:
  - Scientific Python Stack, Pandas, Numpy, PyCharm, Jupyter.
- Machine Learning System Design and Optimization
- Databases, CI/CD and Cluster Runtimes:
  - Docker, Kubernetes, Airflow, Concourse, Spark, HDFS, Yarn, Hive, Kafka, Presto, Vertica, MySQL, Postgres, Jenkins, AWS, Azure.
- Azure Cloud Platform
- Service:
  - Program committee member:
    - Workshop on advances in artificial intelligence for computational advertising 2024. (AdKDD 2024)

2023

Data Science Engineering in AdTech@Microsoft
- Data Science Development/workflows:
  - Scientific Python Stack, Pandas, Numpy, PyCharm, Jupyter.
- Machine Learning System Design and Optimization
- Databases, CI/CD and Cluster Runtimes:
  - Docker, Kubernetes, Airflow, Concourse, Spark, HDFS, Yarn, Hive, Kafka, Presto, Vertica, MySQL, Postgres, Jenkins, AWS, Azure.
- Azure Cloud Platform
Publications:
- Industry Talks:
  - Building Low Latency ML Systems for Real-Time Model Predictions at Xandr. (P99 Conference 2023)
  - Modern Data Pipelines Testing Techniques: A Visual Guide. (PyData NYC 2023)
- Published Book:
  - Modern Data Pipelines Testing Techniques: A Visual Guide. (book page)
- Blog posts:
  - Evolving a Data Pipeline Testing Plan (blog post)
  - Modern Data Pipelines Testing Techniques: Why Bother? 1/3 (blog post)
  - Modern Data Pipelines Testing Techniques: Why Bother? 2/3 (blog post)
  - Modern Data Pipelines Testing Techniques: Why Bother? 3/3 (blog post)
- Service:
  - Program committee member:
    - Workshop on advances in artificial intelligence for computational advertising 2023. (AdKDD 2023)

2022

Data Science Engineering in AdTech@Microsoft
- Azure Cloud Platform
- Data Science Development/workflows:
  - Scientific Python Stack, Pandas, Numpy, PyCharm, Jupyter.
- Machine Learning System Design and Optimization
- Databases, CI/CD and Cluster Runtimes:
  - Docker, Kubernetes, Concourse, Spark, HDFS, Yarn, Hive, Kafka, Presto, Vertica, MySQL, Postgres, Jenkins, AWS, Azure.
Publications:
- Work in progress Book:
  - Modern Data Pipelines Testing Techniques: A Visual Guide. (book page)
- Blog posts:
  - ML Latency No More: Common Ways to Reduce ML Prediction Latency to Sub X ms (blog post)
Tech Conferences:
- PyData NYC 2022: ML Latency No More: Common Ways to Reduce ML Prediction Latency to Sub X ms (conference talk)

Service:
- Program committee member:
  - Workshop on advances in artificial intelligence for computational advertising 2022. (AdKDD 2022)

2021

Data Science Engineering in AdTech@Xandr-ATT
- Data Science Development/workflows:
  - Scientific Python Stack, Pandas, Numpy, PyCharm, Jupyter.
- Machine Learning System Design and Optimization
- Databases, CI/CD and Cluster Runtimes:
  - Docker, Kubernetes, Concourse, Spark, HDFS, Yarn, Hive, Kafka, Presto, Vertica, MySQL, Postgres, Jenkins, AWS, Azure.
Publications:
- Video Course: Clean Machine Learning Code. (Video Course Page) (book page)

Other writings:

- Data Mesh: The On-Going Evolution (blog post)
- Co-author: Ad Tech Defi (Ad Tech on Crypto/Blockchain) (blog post)

Service:
- Program committee member:
  - Applied Data Science track at the Knowledge Discovery and Data Mining Conference 2021. (KDD 2021)
  - Workshop on advances in artificial intelligence for computational advertising 2021. (AdKDD 2021)
Certifications:
- Microsoft Certified: Azure Fundamentals AZ-900
- Microsoft Certified: Azure Data Fundamentals DP-900

2020

Data Science Engineering in AdTech
- Data Science Development/workflows:
  - Scientific Python Stack, Pandas, Numpy, PyCharm, Jupyter.
- Machine learning:
  - PyTorch, Tensorflow, Keras, Scikit-Learn.
  - Deep learning in recommender systems.
  - Applications of feature embeddings in Ad Tech.
- Databases, CI/CD and Cluster Runtimes:
  - Docker, Kubernetes, Concourse, Spark, HDFS, Yarn, Hive, Kafka, Presto, Vertica, MySQL, Postgres, Jenkins, AWS.
Publications:
- Book: Clean Machine Learning Code. (book page)

Other writings:

- ML Feature Stores: A Casual Tour. Part 1: (blog post)
- ML Feature Stores: A Casual Tour. Part 2: (blog post)
- ML Feature Stores: A Casual Tour. Part 3: (blog post)
- Contributor to: DataSketches for Fast Computation. (blog post)
- Seven Signs You Might Be Creating ML Technical Debt (blog post)
- KDD 2020 Conference Highlights: (blog post)

Service:
- Program committee member:
  - Applied Data Science track at the Knowledge Discovery and Data Mining Conference 2020. (KDD 2020)
  - Workshop on advances in artificial intelligence for computational advertising 2020. (AdKDD 2020)

2019

Data Science Engineering in AdTech
- Data Science Development/workflows:
  - Scientific Python Stack, Pandas, Numpy, PyCharm, Jupyter.
- Machine learning:
  - PyTorch, Tensorflow, Keras, Scikit-Learn.
  - Deep learning in recommender systems.
  - Applications of feature embeddings in Ad Tech.
- Databases, CI/CD and Cluster Runtimes:
  - Docker, Kubernetes, Concourse, Spark, HDFS, Yarn, Hive, Kafka, Presto, Vertica, MySQL, Postgres, Jenkins, AWS.
Certifications:
- Recommendation Systems Specialization from the University of Minnesota
  - Introduction to Recommender Systems: Non-Personalized and Content-Based
  - Nearest Neighbor Collaborative Filtering
  - Recommender Systems: Evaluation and Metrics
  - Matrix Factorization and Advanced Techniques
  - Recommender Systems Capstone
Publications:
- MRR vs MAP vs NDCG: Rank-Aware Evaluation Metrics And When To Use Them. (blog post)
- Clean Machine Learning Code: Practical Software Engineering Principles for ML Craftsmanship. Towards Data Science Publication. (blog post)
  - Presented at PyData 2019 Conference.
- Testing ML Code: How Scikit-learn Does It. Analytics Vidhya Publication. (blog post)
- Avoiding the “Automatic Hand-off” Syndrome in Data Science Products. Towards Data Science Publication. (blog post)
- Deep Learning for Recommendation Systems circa 2018–19: A Navigation Map. (blog post)
- k8s-workqueue: Simplified Kubernetes Batch Jobs For Data Science Use Cases. Xandr Tech Publication. (blog post)
  - Presented in the 2019 International Conference on Machine Learning, Predictive Applications and APIs (PAPIs 2019)
Service:
- Program committee member:
  - Applied Data Science track at the Knowledge Discovery and Data Mining Conference 2019. (KDD 2019)
  - Workshop on advances in artificial intelligence for computational advertising 2019. (AdKDD 2019)

2017-2018

Data Science Engineering in AdTech
- Data Science Development/workflows:
  - Scientific Python Stack, Jupyter.
- Machine learning libraries:
  - SparkML, Scikit-Learn, PyTorch, Tensorflow, Keras, Logistic-regression-L1, R-GLM, L-BFGS, XGBoost.
- Databases, CI/CD and Cluster Runtimes:
  - Docker, Kubernetes, Concourse, Spark, HDFS, Yarn, Hive, Kafka, Presto, Vertica, MySQL, Postgres, Jenkins, AWS-GPU.
Publications:
- Taifi, Moussa, et al. "Lessons Learned from Building Scalable Machine Learning Pipelines", 2018, International Conference on Machine Learning, Predictive Applications and APIs, (blog post, recording) (to appear in PMLR)
- Structuring a “Docker for Data Science” Training Journey (blog post)
- Introduction to PyTorch Model Compression Through Teacher-Student Knowledge Distillation (blog post)
Individual Contribution to team publications:
- Sanzgiri, Ashutosh, et al. "Classifying Sensitive Content in Online Advertisements with Deep Learning", 2018, The 5th IEEE International Conference on Data Science and Advanced Analytics.
Completed the deeplearning.ai 5-Course Specialization.
Completed the National Research University Higher School of Economics Course:
- Practical Reinforcement Learning

Fall-Spring 2016-2017

Data Science Engineering in AdTech
- Data Science Development/workflows:
  - Scientific Python Stack, Scala, Jupyter, R
- Machine learning libraries:
  - SparkML, Scikit-Learn, Tensorflow, Keras, Logistic-regression-L1, R-GLM, L-BFGS, XGBoost
- Databases and Cluster Runtimes:
  - Spark, Hadoop, Yarn, Hive, Kafka, Presto, Vertica, Mysql, Postgres, Airflow
Recommended Books:
- Numerical Optimization (Nocedal et al)
- Convex Optimization (Boyd et al)
- Practical Methods of Optimization (Fletcher)
Kaggle Competitions:
- https://www.kaggle.com/farmiurl:
- Top 14% Dstl Satellite Imagery Feature Detection
- Top 12% Avito Duplicate Ads Detection
Completed the University of Washington Machine Learning 4-Course Specialization.
Keras and Tensorflow Contributor
Individual Contribution to team publications:
- Austin, Daniel, et al. "Reserve price optimization at scale." 2016 IEEE 3rd International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 2016.

Spring 2016

Data Science Engineering in AdTech
- Data Science Development/workflows:
  - Scientific Python Stack, Scala, Jupyter, R, SQL
- Machine learning libraries:
  - SparkML, Scikit-Learn, Logistic-regression-L1 library, XGBoost
- Databases and Cluster Runtimes:
  - Spark, Hadoop, Yarn, Hive, Kafka, Vertica, Mysql, Postgres
Recommended Books:
- Scala for Data Science (Bugnion)
- Test-Driven Machine Learning (Bozonier)
- Mastering Machine Learning with Scikit-learn (Gavin Hackeling)
Johns Hopkins University Design and Interpretation of Clinical Trials:
- Course Certificate

Summer-Fall 2015

Data Science Engineering in AdTech
- Data Science Development/workflows:
  - Scientific Python Stack, Jupyter, R, SQL
- Machine learning libraries:
  - SparkML, Scikit-Learn, XGBoost
- Databases and Cluster Runtimes:
  - Spark, Hadoop, Yarn, Hive, Vertica, Mysql, Postgres
Recommended Books:
- Mastering Apache Spark (Frampton)
- Mastering Object-oriented Python (Lott)
- Machine Learning with Spark (Pentreath)

Spring 2015

Cloud resource recommendation and cost optimization engine:
- Java, SQL, Python, Linux.
- Postgresql, Hibernate.
- AWS EC2, S3, EBS, Cloudwatch
- Spark Core, PySpark, Spark SQL, Yarn, Hadoop HDFS, CDH 5.
Data analysis tools and references:
- SciKit-Learn, R, Rshiny.
- Recommended books:
  - Learning Spark: Lightning-Fast Big Data Analysis
  - Mastering Machine Learning with scikit-learn
  - Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2
  - Building Machine Learning Systems with Python
The occasional blog post:
- datamize.wordpress.com
Publications: Moussa Taifi, Justin Y. Shi, Yasin Celik, "JENERGY: A Fault Tolerant Stateless Architecture for High Performance Computing", in Proc. of the 9th IEEE International Symposium on Service Oriented System Engineering (SOSE 15), March 2015.

Fall 2014

Scalable monitoring of virtual infrastructures:
- Java, SQL, Python, Linux.
- Postgresql, Hibernate, SqlAlchemy
- EC2, S3, EBS, Cloudwatch
- VMware vCenter, ESXi Hypervisor performance monitoring
Data analysis:
- R, Rshiny, Kaggle
- Recommended books:
  - Practical Data Science with R
  - MapReduce Design Patterns
Certification: Cloudera Certified Developer for Apache Hadoop (CCDH): License: 100-011-295

Summer 2014

Data analysis:
- Recommended books:
  - Machine learning with R
  - An Introduction to Statistical Learning: with Applications in R
  - Hadoop in Practice
  - Head first Object-oriented Analysis and Design

Spring 2014

Cloud resource recommendation engine:
- VMware vCenter, ESXi Hypervisor performance monitoring and recommendations
- Java, Python, Pandas, SqlAlchemy, Hibernate, Postgresql
- EC2, S3, EBS, PIOPS, Elastic IP, Cloudformation, Custom enterprise networking and storage analysis/recommendation
- Hyper-V performance monitoring and cloud migration recommendations.
- System Center Virtual Machine Manager, Operations Manager
- Active Directory (DS, CS) and SQL Server 2012 administration
- Powershell, C#.
- Data analysis:
  - Recommended books:
    - An Introduction to Statistical Learning: with Applications in R
    - Data Smart: Using Data Science to Transform Information into Insight
- AWS Detailed billing and forecasting
Certification: MCSA SQL Server - 70-461 Querying Microsoft SQL Server 2012

Fall 2013

Scalable monitoring of virtual infrastructures:
- Java, Hibernate, Sql, pgsql, Postgresql.
- EC2, S3, EBS, PIOPS, Elastic IP, Custom enterprise networking.
- AWS Java SDK.
- VMware vCenter, ESXi Hypervisor performance monitoring
- Timeseries and Pandas.
- Python, SqlAlchemy.
- Elasticsearch, Logstash.
Certification: Amazon Web Services Solution Architect - Associate Level: License AWS-ASA-1803

Spring/Summer 2013

Publications: M. Taifi, "Taking the Elephant to the Market: Improving Hadoop Market Awareness for Auction-based Clouds", 22nd ACM Symposium on High-Performance Parallel and Distributed Computing, Poster, (HPDC 2013), June 2013.
Publications: M. Taifi, "Banking on Decoupling: Budget-driven Sustainability for HPC Applications on EC2 Spot Instances", accepted to appear in ACM Journal of Operating Systems Reviews (OSR 13).
Publications: M. Taifi, A. Khreishah, and J. Y. Shi, "Building a Private HPC Cloud for Compute and Data-intensive Applications", in the International Journal on Cloud Computing: Services and Architecture, (IJCCSA), 2013.

Publications: M. Taifi and J. Y. Shi, "Performance and Reliability Effects of Multi-tier Bidding on MapReduce in Auction-based Clouds", in Proc. of the 7th IEEE International Symposium on Service Oriented System Engineering (SOSE 13), March 2013
HPC and Big Data on the cloud
- Market-aware and Auction-based Cloud Infrastructures
- Fault tolerance of HPC applications in the cloud
- Reliability of Cloud storage
- Cassandra Cloud Database reliability evaluation
- Hadoop MapReduce fault tolerance
Github projects pages: https://github.com/moutai
HPC Private Cloud administration part of the TCLOUD project
- More information available at http://tec.hpc.temple.edu or https://sites.google.com/a/temple.edu/tcloud/home

Fall 2012

Publications: J. Y. Shi, M. Taifi and A. Khreishah, "Program Scalability Analysis for HPC Cloud: Applying Amdahl’s Law to NAS Benchmarks", in Proceedings of 24th Supercomputing Conference (SC12), IEEE International Workshop on Sustainable HPC Computing (SHPCLOUD12)
Publications: M. Taifi, "Banking on Decoupling: Budget-driven Sustainability for HPC Applications on EC2 Spot Instances", in Proceedings of 31st IEEE International Symposium on Reliable Distributed Systems (SRDS), IEEE Workshop on Dependability Issues in Cloud Computing (DISCCO 2012)
Publications: M. Taifi, "HPC Cloud: Pros, Cons, Challenges and Big Data Potentials", in Proceedings of 1st Annual Computational Research on Owlsnest (CROO 2012) Symposium.
HPC and Big Data on the cloud
- Amazon spot instances
- Fault tolerance of HPC applications in the cloud
- Reliability of Cloud storage
- Xen/KVM/QEMU, Eucalyptus, OpenStack evaluation
- Hadoop/Pig/Hbase fault tolerance
- Hpcfy project on github: Automatic Clustering of Virtual Machine: https://github.com/moutai/hpcfy
Github projects pages: https://github.com/moutai
HPC Private Cloud administration part of the TCLOUD project
- More information available at http://tec.hpc.temple.edu or https://sites.google.com/a/temple.edu/tcloud/home

Spring/Summer 2012

Publications: J. Y. Shi, M. Taifi, A. Khreishah, and J. Wu, "Tuple Switching Network--When Slower May be Better", accepted to appear in Elsevier Journal of Parallel and Distributed Computing (JPDC), 2012
Publications: M. Taifi and J. Y. Shi, "MapReduce Performance Evaluation on a Private HPC Cloud", in Proceedings of The 41st International Conference on Parallel Processing, Poster, (ICPP 2012), September 2012.
Publications: M. Taifi, J. Y. Shi, A. Khreishah, "Towards Auction-Based HPC Computing in the Cloud", in the Journal of Computer Technology and Application, Invited Paper, (JCTA), 2012.
HPC and Big Data on the cloud
- Puppet configuration management
  - Hpcfy project on github: Automatic Clustering of Virtual Machine: https://github.com/moutai/hpcfy
- Fault tolerance of HPC applications in the cloud
- Xen/KVM/QEMU, Eucalyptus, OpenStack evaluation
- Amazon spot instances
- Hadoop/Pig/Hbase fault tolerance
Github projects pages: https://github.com/moutai
HPC Private Cloud administration part of the TCLOUD project
- More information available at http://tec.hpc.temple.edu or https://sites.google.com/a/temple.edu/tcloud/home
Service: Program committee for the MICBECT '12 conference.
- www.masaumnet.com/micbect12/index.html
Service: Program committee for the InterCloud-HPC 2012 symposium.
- http://hpcs2012.cisedu.info/2-conference/symposia/symposium-01-intercloudhpc

Fall/Summer 2011

Publication: M. Taifi, J. Y. Shi and A. Khreishah, "SpotMPI: A Framework for Auction-based HPC Computing Using Amazon Spot Instances", in Proc. of the International Symposium on Advances of Distributed Computing and Networking (ADCN 2011/ICA3PP), October 2011.
Publication: M. Taifi, "Auction-based High Performance Cloud Computing", Finalist poster in the ACM Student Research Competition at Supercomputing 2011 (SC11), November 2011.
HPC on the cloud
- Xen 4.0
- GPU computing research
- Nimbus/cumulus cloud set up
- Eucalyptus, OpenStack evaluation
- Hadoop, HDFS, MapReduce research
- Fault tolerance in the cloud
- Amazon spot instances
Administration of a 12-node HPC private cloud, part of the TCLOUD project
- More information available at https://sites.google.com/a/temple.edu/tcloud/home
Service: Session chair at the International Conference of Algorithms and Architectures for Parallel Processing (ICA3PP11)
Service: Student volunteer at SuperComputing Conference (SC11)
Student travel grant to SC11 sponsored by Microsoft Research

Spring 2011

Publication: J. Y. Shi, M. Taifi, and A. Khreishah, "Resource Planning for Parallel Processing in the Cloud", in Proc. of the 1st International Workshop on Sustainable High Performance Cloud Computing (SHPCC 2011), September 2011.
Publication: M. Taifi, A. Khreishah, J. Y. Shi, and J. Wu, "Sustainable GPU Computing at Scale," in Proc. of the 14th IEEE International Conference on Computational Science and Engineering (CSE 2011), August 2011.
Publication: M. Taifi, A. Khreishah and J. Y. Shi, "Natural HPC Substrate: Exploitation of Mixed Multicore CPU and GPUs", in Proc. of the 14th IEEE International Conference on High Performance Computing & Simulation (HPCS 2011), July 2011.
HPC on the cloud, resource planning
Amazon EC2
EMC and SAN storage configuration and optimization
Received the Bronze Award for our research poster on fault-tolerant GPU computing at the Future of Computing 2011 conference, Temple University.
Service: Organizing committee for the SHPCC 2011 conference http://monitor.cis.temple.edu/SHPCC11/

Fall 2010

Multi-GPU failure tolerance through failure containment (CheCuda and VCuda)
Reliability of HPC software (Open MPI and BLCR)
Amazon EC2 compute cloud with GPU instances
StarCluster MIT project for cloud cluster construction on EC2
MapReduce and Hadoop introduction

Summer 2010

I was part of the Temple team that won first place in the programming competition at the TeraGrid 10 conference, with NSF student support. The article about the event is here.
Participated in the HPDC conference and the NCSI workshop at Kean University, with student support from Northwestern University and NCSI/Shodor.
Participated in the UIUC parallel programming summer school.

Spring 2010

Harnessing the power of mixed GPU CPU environments through fault tolerant decoupling: Experimenting with the D2P2 substrate.
My newest poster at the Future of Computing 2010:
- Publication: M. Taifi and Y. Shi, "GPU-CPU High Performance Computing Through Fault Tolerant Decoupling: Preliminary Results", Poster, Future of Computing, Temple University, March 2010

Fall 09

Automatic code parallelization using OpenCL and the Pml tagging technology

Summer 09

Participated in the IEEE Cluster 2009 conference in New Orleans with NSF student travel support.
I am doing research on the new GPU/CUDA platform, and we just submitted a paper to the Cluster 2009 conference.
Publication: M. Taifi and Y. Shi, "How to achieve a 47000x speed up on the GPU/CUDA using matrix multiplication," Technical Report, Amax corporation, June 2009.

Spring 09

Publication: M. Taifi and Y. Shi, "Performance Prediction and Evaluation of a Solution Space Compact Parallel Program using the Steady State Timing Model", Poster, Future of Computing, Temple University, March 2009.
Investigating the validity of the Timing Model for predicting the performance of parallel programs
I am participating in two competitions with a new research effort on predicting parallel speedups using the Timing Model (link)
Our research poster (link) was accepted and presented at the Future of Computing 09 conference and the CST Student Research Symposium

Fall 08

Parallel processing research
Parallel algorithm classification and evaluation
Introduction to challenges and research directions for parallel processing

Previous Graduate Research

I have worked as a research scholar on a number of very interesting projects :)

Summer 2008

- I worked with Prof. Juha Puustjarvi in the Communication Engineering lab of Lappeenranta University of Technology on improving a company's sales using opinion mining on their system. You can find the write-up here: https://oa.doria.fi/handle/10024/42452

Spring 2008

I worked on my master's thesis, Opinion Mining (check the attachments at the bottom of the page for a brief presentation), under the supervision of Prof. Juha Puustjarvi and Prof. Jari Porras at Lappeenranta University of Technology.

July 2007

- I worked with Prof. Yuan Shi, Chair of the CS department at Temple University. I helped with an ongoing project focused on the performance evaluation of Stateless Parallel Processing.

June 2007

- I worked with Dr. Ville Kyrki and doctoral student Olli Alkkiomäkki in the machine vision and pattern recognition laboratory of the University of Lappeenranta, Finland. My project consisted of updating the Linux driver for the Matrox II framegrabber.

June-July 2006

- I worked for Prof. Steven Lindell in the CS department at Haverford College. I assisted with technical support in the Summer Cascade Mentoring Program, which provides opportunities for Philadelphia high school teachers and high school students to participate in an active research lab during the summer months.

August 2006

- Undergraduate Research Assistantship:
  - I worked with Dr. David Kasunic in the Music Department at Princeton University. My research focused on Mozart and the history of the concept of "prodigy".
- Developed Online Arabic Courses at AUI.
- Participated in the development of the Website: www.alakhawayn.ma
- School work and projects from the past (the links stopped working, so contact me if you need more information). Update: some of the files can be found at the bottom of this page.

Marketing projects:
