PAKDD2020 Alibaba AI Ops Competition

Program Schedule

Session Chair: Pinghui Wang
2:00pm - 2:05pm
Overview Introduction
Pinghui Wang (Chair)
2:05pm - 2:20pm
An Introduction to PAKDD CUP 2020 Dataset
Yi Liu (Alibaba Organizer)
2:20pm - 2:35pm
First place solution of PAKDD Cup 2020
Jie Zhang (Competitor)
2:35pm - 2:50pm
Large-scale Disk Failure Prediction: Third Place
Bo Zhou (Competitor)
2:50pm - 3:05pm
Characterizing and Modeling for Proactive Disk Failure Prediction to Improve Reliability of Data Centers
Xinhui Liang (Competitor)
3:05pm - 3:20pm
SHARP: SMART HDD Anomaly Risk Prediction
Wei Liu (Competitor)
3:30pm - 3:50pm
Toward Adaptive Disk Failure Prediction via Stream Mining
Shujie Han (Invited talk)
3:50pm - 4:05pm
PAKDD2020 Alibaba AI Ops Competition: Large-Scale Disk Failure Prediction
Run-Qing Chen (Competitor)
4:05pm - 4:20pm
Anomaly Detection of Hard Disk Drives based on Multi-scale Feature
Xiandong Ran (Competitor)
4:20pm - 4:35pm
A Voting-based Robust Model for Disk Failure Prediction
Manjie Li (Competitor)
4:35pm - 4:55pm
Summary of PAKDD CUP 2020-From Organizers' Perspective
Cheng He (Alibaba Organizer)


In large-scale data centers, the number of hard disk drives (HDDs) and solid-state drives (SSDs) has reached millions. According to statistics, disk failures account for the largest proportion of all failures. Frequent disk failures affect the stability and reliability of servers and even of the entire IT infrastructure, which has a negative impact on business SLAs (Service-Level Agreements). Disk failure prediction has therefore become an important topic for IT and big data companies.

However, the task involves several challenging data characteristics, such as high data noise, extremely imbalanced classes, and time-varying features, and contestants should expect to spend considerable effort on these problems. Meanwhile, since stability is paramount, both the effectiveness and the stability of the prediction model will be considered. If contestants can find reasonable approaches to predicting disk failures in such a large-scale system, IT and big data companies may be able to adopt these approaches to advance cloud computing.

This contest consists of three phases: the qualification, the semi-finals, and the finals. Specific timelines and regulations are as follows:

1.1. The Qualification (7 February, 2020 - 18 March, 2020 UTC+8)
  1. After successful registration, contestants can download the data from the Tianchi platform, debug their algorithms locally, and submit results online. If a contestant submits results multiple times within a day, new results will overwrite old ones.

  2. From 12 February 2020 to 25 February 2020, the system will carry out one evaluation and ranking every day. The evaluation starts at 12:00 a.m., and contestants will be ranked by evaluation score. The ranking list will show each contestant's best result within the present phase.

  3. From 26 February 2020 onwards, the system will carry out two evaluations per day, starting at 10:00 a.m. and 6:00 p.m., respectively.

  4. The test set is divided into two sets, A and B. The leaderboard will initially show rankings and scores on set A to help contestants tune their models. In the last two days, 17 and 18 March 2020, the ranking will switch to set B, and the final result on test set B shall prevail. Please note that contestants are required to submit new results for test set B during 17 and 18 March 2020. Contestants will have two evaluation opportunities each day, starting at 10:00 a.m. and 6:00 p.m., respectively. Set B will be released on 16 March 2020. Note that test set A focuses on model 1, and test set B focuses on model 2.

  5. The top 100 teams on test set B will qualify for the next phase.

  6. All contestants must complete identity verification to qualify for the next phase. International participants who do not use Alipay should upload an ID card for verification. (Verification Guide:

  7. Test dataset A is available.

1.2. The Semi-Finals (20 March, 2020 - 12 April, 2020 UTC+8)
  1. The test data for the semi-finals cannot be downloaded; results must be submitted as Docker images. Instructions for container image submission will be announced at the start of the semi-finals.

  2. The evaluation system runs evaluations twice a day. The leaderboard is sorted by the evaluation metric from high to low. During the semi-finals, evaluation takes place at 12:00 every day. The leaderboard displays each team's best historical result in this phase.

  3. At the end of the semi-finals, the organizing committee will review the models and code (including data processing and model training) of the top 20 teams. Teams that do not submit, whose results cannot be reproduced, or that fail the review will be disqualified from the finals and from competition rewards. The top 10 teams that pass the final review will be invited to the finals. The deadline for code review submission is 12:00 on the 12th, and the final notice will be issued on the 13th.

1.3. The Finals (17 April, 2020 - 18 April, 2020 UTC+8)
  1. The finals will be conducted as an online or offline defense. Teams promoted to the finals should prepare defense materials in advance, including defense slides (PPT), a competition summary, and core code.

  2. The first draft of the defense PPT should be submitted before 17:00 on April 15th; the organizing committee will review it and provide suggested modifications. The final draft of the defense PPT should be submitted before 17:00 on April 16th. The specific arrangements for the final defense will be announced separately.

  3. Each team's defense consists of a 15-minute presentation followed by a 15-minute question-and-answer session. The judges will give a comprehensive score based on each team's technical ideas, theoretical depth, and defense performance.

  4. The final score is a weighted combination of each team's algorithm score and defense score: 70% from semi-final performance and 30% from the final defense. Competition awards will be determined by final score, and no fewer than 5 outstanding contestants will be selected to participate in PAKDD 2020.
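To make the 70/30 weighting concrete, here is a minimal Python sketch of how a final score could be computed and used to rank teams. It assumes both component scores are on the same scale (for example 0-100), which the rules do not specify; the function names and all team scores below are hypothetical and for illustration only.

```python
# Hypothetical sketch of the 70/30 final-score weighting.
# Assumption: semi-final and defense scores are on the same scale (e.g. 0-100);
# the competition rules do not state how (or whether) they are normalized.

SEMIFINAL_WEIGHT = 0.7  # weight of semi-final (algorithm) performance
DEFENSE_WEIGHT = 0.3    # weight of the final defense score


def final_score(semifinal_score: float, defense_score: float) -> float:
    """Combine the two component scores using the 70/30 weighting."""
    return SEMIFINAL_WEIGHT * semifinal_score + DEFENSE_WEIGHT * defense_score


def rank_teams(scores: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Return (team, final score) pairs sorted from highest to lowest."""
    combined = {team: final_score(s, d) for team, (s, d) in scores.items()}
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up (semi-final, defense) scores for three hypothetical teams.
    example = {
        "team_a": (90.0, 70.0),
        "team_b": (85.0, 85.0),
        "team_c": (80.0, 95.0),
    }
    for team, score in rank_teams(example):
        print(f"{team}: {score:.1f}")
```

With these made-up numbers, team_b (85.0) finishes ahead of team_c (84.5) and team_a (84.0), even though team_a has the best semi-final score, which shows how the 30% defense weight can change the final order.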

Important Dates


The Competition is open to all individuals, companies, colleges, and research institutions. Each participant may hold only one account. Teams may have at most 3 members.

  • Advisory board members (and their immediate families and members of the same household) of the Competition Sponsor, Tianchi and their respective affiliates, subsidiaries, contractors, agents, judges and advertising and promotion agencies are not eligible to participate in the Competition.
  • Members of Alibaba Group can participate in the Competition, but are excluded from the Ranking and the Awards.
  • Registration

    1. Registration time: from February 7, 2020 to March 17, 2020 UTC+8. Registration will close and team changes will be disabled at 10:00 a.m. on April 17, 2020 UTC+8.

    2. Registration rules: A team may include one to three members. A contestant can join ONLY one team. Registration information must be correct and valid. Any false information or cheating will lead to disqualification from ranking and awards.

    3. Registration method: Contestants can log in to the Tianchi platform with their Alibaba Cloud account.

    4. The committee has set up a discussion group on DingTalk to keep in touch with contestants. Please scan the QR code to join.



    The following prizes will be awarded to the top 5 teams of this contest:
    First prize: one team, USD 15,000;
    Second prize: one team, USD 8,000;
    Third prize: one team, USD 5,000;
    Winning award: two teams, USD 1,000 each.

    Travel Support: Top teams will also receive travel support on the condition that they present their solutions at PAKDD2020, which will be held in Singapore in May 2020.

    Priority recruitment
    The top 20 contestants will receive priority in Alibaba Campus Recruitment.


    Xiaoxue Zhao, Principal Engineer/Researcher, Alibaba
    Patrick P. C. Lee, Associate Professor, CUHK
    Pinghui Wang, Professor, Xi'an Jiaotong University
    Shiwen Wang, Staff Engineer, Alibaba
    Zengyi Lu, Staff Engineer, Alibaba
    Cheng He, Staff Algorithm Engineer, Alibaba