I am

Dongqing Xiao

A Ph.D. graduate of Worcester Polytechnic Institute
Working on metadata management and privacy protection for uncertain graphs.

About Me

  • Name: Dongqing Xiao
  • Date of birth: 30 Jan 1989
  • Address: FL 319, 100 Institute Road,
    Worcester, MA
  • Nationality: Chinese
  • Email: dxiao@wpi.edu

What I Do

I work under the supervision of Professor Mohamed Y. Eltabakh. My research interests include privacy protection for uncertain graph data, metadata management, and distributed analytic algorithms.

  • Metadata Management
  • Query Optimization
  • Distributed Graph Mining

Education



  • 2012-2017

    Doctor of Philosophy

    Computer Science, Worcester Polytechnic Institute, MA, USA

    Focused on metadata management, query optimization, distributed graph mining, and privacy-preserving graph publishing.

  • 2010-2012

    Master of Engineering (MEng)

    Computer Science, Harbin Institute of Technology, China

    Focused on search result diversification in information retrieval based on user logs, and on machine translation evaluation based on post-editing.

  • 2006-2010

    Bachelor of Science

    Education Science, East China Normal University, China

Recent Projects

  • 2016-2017


    Privacy-Preserving Uncertain Graph Publishing

    In revision

  • 2015-2016


    Exploiting Soft and Hard Correlations in Big Data Query Optimization

    Unlike in relational databases, where discovering and exploiting correlations for query optimization has been studied extensively, big data infrastructures have largely neglected these important data properties and their use. The key reason is that domain experts may know many correlations, but only with a degree of uncertainty (fuzziness or softness). Because the data is big, it is very challenging to validate such correlations, judge their worthiness, and devise strategies for using them in query optimization. We propose the EXORD system to fill this gap by exploiting data correlations in big data query optimization. EXORD supports two types of correlations: hard correlations, which are guaranteed to hold for all data records, and soft correlations, which are expected to hold for most, but not all, data records. We introduce a new three-phase approach for (1) validating and judging the worthiness of soft correlations, (2) selecting and preparing the soft correlations for deployment by specially handling the violating data records, and (3) deploying and exploiting the correlations in query optimization.
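
The difference between hard and soft correlations can be made concrete with a small sketch. This is not EXORD's implementation: the function name `validate_correlation`, the `soft_threshold` parameter, and the majority-value rule for deciding which records violate a dependency are all assumptions chosen for illustration. It checks a hypothesized dependency between two fields, reports the fraction of records that agree with it, and returns the violating records so they could be handled separately.

```python
from collections import Counter, defaultdict


def validate_correlation(records, lhs, rhs, soft_threshold=0.8):
    """Classify a hypothesized dependency lhs -> rhs (illustrative only).

    Returns (kind, support, violations) where kind is "hard", "soft",
    or "rejected", support is the fraction of records that agree with
    the dependency, and violations lists the records that do not.
    """
    if not records:
        return "rejected", 0.0, []

    # Count how often each rhs value co-occurs with each lhs value.
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec[lhs]][rec[rhs]] += 1

    # Assumption: the most frequent rhs value per lhs value is the expected one.
    expected = {key: c.most_common(1)[0][0] for key, c in counts.items()}

    violations = [rec for rec in records if rec[rhs] != expected[rec[lhs]]]
    support = (len(records) - len(violations)) / len(records)

    if not violations:
        kind = "hard"      # holds for every record
    elif support >= soft_threshold:
        kind = "soft"      # holds for most records; violators kept aside
    else:
        kind = "rejected"  # too many violations to be worth exploiting
    return kind, support, violations
```

In this sketch the violating records are returned rather than discarded, mirroring the idea that a soft correlation is only usable if its exceptions are tracked.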

  • 2014-2015


    An Efficient MapReduce Triangle Listing Algorithm for Web-Scale Graphs

    The triangle listing problem has been studied in several distributed infrastructures, including MapReduce. However, existing algorithms generate and shuffle huge amounts of intermediate data, a large percentage of which is redundant. Inspired by this observation, we present the “Bermuda” method, an efficient MapReduce-based triangle listing technique for massive graphs. Unlike existing approaches, Bermuda reduces the size of the intermediate data by eliminating redundancy and sharing messages whenever possible. As a result, Bermuda achieves orders-of-magnitude speedups and can process larger graphs that other techniques fail to process under the same resources.
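
For readers unfamiliar with the triangle listing problem itself, here is a minimal single-machine sketch. It is not Bermuda and does not use MapReduce; the function name `list_triangles` and the degree-based vertex ordering are illustrative assumptions. It only shows the problem being solved and one common way to make sure each triangle is reported exactly once.

```python
from collections import defaultdict


def list_triangles(edges):
    """List every triangle of an undirected graph exactly once.

    `edges` is an iterable of (u, v) pairs with comparable vertex ids.
    """
    # Build an undirected adjacency structure, ignoring self-loops.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Order vertices by (degree, id) so that every edge has one direction.
    rank = {v: (len(neighbors), v) for v, neighbors in adj.items()}

    # Keep only the neighbors that rank higher than the vertex itself.
    higher = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}

    triangles = []
    for u in adj:
        for v in higher[u]:
            # A common higher-ranked neighbor of u and v closes a triangle.
            for w in higher[u] & higher[v]:
                triangles.append(tuple(sorted((u, v, w))))
    return sorted(triangles)
```

Because each triangle is discovered only from its lowest-ranked vertex, no duplicate results are produced. Distributed algorithms face the same duplication concern at the level of intermediate messages.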

  • 2012-2014


    Large-Scale Annotation Management

    We address the challenges that arise from the growing scale of annotations in scientific databases. On the one hand, end-users and scientists cannot analyze and extract knowledge from the large number of reported annotations; for example, one tuple may have hundreds of annotations attached to it over time. On the other hand, current annotation management techniques fall short of providing advanced processing over the annotations beyond just propagating them to end-users. To address this limitation, we propose the InsightNotes system, a summary-based annotation management engine in relational databases. InsightNotes integrates data mining and summarization techniques into annotation management in novel ways, with the objective of creating and reporting concise representations (summaries) of the raw annotations.

Publications


  1. Dongqing Xiao, Mohamed Y. Eltabakh, Xiangnan Kong.
    Bermuda: An Efficient MapReduce Triangle Listing Algorithm for Web-Scale Graphs. SSDBM 2016.
  2. Hai Liu, Dongqing Xiao, Pankaj Didwania, Mohamed Y. Eltabakh.
    Exploiting Soft and Hard Correlations in Big Data Query Optimization. PVLDB 9(12): 1005-1016 (2016).
  3. Karim Ibrahim, Dongqing Xiao, Mohamed Y. Eltabakh.
    Elevating Annotation Summaries to First-Class Citizens in InsightNotes. EDBT 2015: 49-60.
  4. Dongqing Xiao, Armir Bashllari, Tyler Menard, Mohamed Y. Eltabakh.
    Even Metadata is Getting Big: Annotation Summarization using InsightNotes. SIGMOD Conference 2015: 1409-1414.
  5. Dongqing Xiao, Mohamed Y. Eltabakh.
    InsightNotes: Summary-Based Annotation Management in Relational Databases. SIGMOD Conference 2014: 661-672.






Data Mining




Distributed Systems




Get in touch

Address/Street: 100 Institute Road,
Worcester, USA
Phone Number: 508-XXX-0377