Mingzhu Wei

Ph.D. candidate
Department of Computer Science
Worcester Polytechnic Institute

Office: Fuller Lab 319
Phone: 508-831-5857
Fax: 508-831-5776
E-Mail: samanwei 'at' wpi.edu

[HOME] [RESEARCH INTERESTS] [DISSERTATION] [RESEARCH EXPERIENCE] [INTERN EXPERIENCE] [PUBLICATIONS]

Biography

I am a Ph.D. candidate in Department of Computer Science at Worcester Polytechnic Institute. My advisor is Prof. Elke A. Rundensteiner. I received my B.S. and M.S. degree in Computer Science from  University of Science and Technology Beijing. I joined Department of Computer Science at WPI for a Ph.D. in 2004.

My research field is in stream query processing, especially in approximate query processing in XML streams.  I have also worked in the area of recursive XQuery processing over XML streams and complex event processing. My Ph.D. dissertation title is Continuously Providing Approximate Results under Limited Resources: Load Shedding and Data Spilling in XML Streams


Research interests:
 Dissertation

Because of the high volume and unpredictable arrival rates, stream processing systems may not always be able to keep up with the input data streams - resulting in buffer overflow and uncontrolled loss of data. To tackle this problem,  my dissertation is focusing on continuously supplying online results through load shedding and load spilling in the context of XML stream systems. 

First, I explore the new opportunities of load shedding in XML streams. Since XML data is hierarchical, subelements, extracted from different positions of the XML tree structure, may vary in their importance. Further, dropping different subelements may save different amounts of storage and computation. Hence I propose to design structure-oriented load shedding, which provides a new opportunity for selectively shedding XML subelements. I propose to develop a preference model that enables users to specify the relative importance of preserving different subpatterns within the XML result structure. This transforms shedding into the problem of rewriting the user query into shed queries that return approximate answers with utility as measured by the given user preference model. The goal is to find the appropriate shed queries to maximize the output utility driven by our structure-based preference model under the limitation of available computation resources when load shedding is needed. This part of work has been published in WWW 2008.

 Second, I address load spilling in XML streams. I introduce ``structure-based spilling'', a spilling technique customized for XML streams by considering the partial spillage of their possibly complex XML elements. The processing of the disk resident data would then be postponed until a later time. Such structure-based spilling brings a lot of new challenges. When a path is spilled, multiple paths in the query may be affected. I describe possible spilling effects on the query paths and how to execute the “reduced” query to produce partial results. We formulate our structure-based spilling problem into an optimization problem, namely, to find the reduced query that maximizes the output quality based on our structure-based quality model and cost model for XML streams. To select the reduced query that maximizes output quality, we develop three optimization strategies. In addition, we also examine the clean-up stage to guarantee that an entire result set is eventually generated by producing supplementary results. The experimental results demonstrate that our proposed solutions consistently achieve higher quality results compared to the state-of-the-art techniques. This part of work has been submitted to EDBT2010.

Research Experience

Proposing a structure-based spilling framework for XML streams. Transforming our structure-based spilling problem into an optimization problem to find the reduced query that maximizes the output quality. Designing an output quality model for evaluating partial results for different reduced queries. Examining the side-effect on different paths in the query for a particular spilled path. Developing three optimization strategies to select the reduced query that maximizes output quality.  Implementing this work in RAINDROP system developed by WPI.

Worked on multiple sequence patterns containing complex negative events. Instead of building a runtime stack for each sequence pattern query individually, we exploited sharing common positive components in runtime stack among multiple queries to reduce the memory usage. To facilitate complex negative event pattern detection, we proposed a PMark approach to recording the negation evaluation results for candidate results.

Worked on utility-driven load shedding in XML stream systems under CPU constraints.  Proposed a generalized CPU cost model for XML streams.  Designed a preference model for XQuery which allows users to specify preferences on different XML elements.  Proposed two algorithms on choosing an approximate query to achieve maximum utility on data.  Implemented this work in RAINDROP system developed by WPI.

Worked on how to handle recursive XQuery for XML stream systems.  Proposed a new class of stream algebra operators for efficient recursive XQuery stream processing.  In particular, I designed two strategies for implementing structural joins.  This work was published on XSDM workshop joined with ICDE2006.

 

Intern Experience

 

Publications

Book chapters

Papers

samanwei at wpi.edu
Last modified: Sep. 10th, 2009