![]() |
Mingzhu Wei Ph.D. candidate Department of Computer Science Worcester Polytechnic Institute Office: Fuller Lab 319 Phone: 508-831-5857 Fax: 508-831-5776 E-Mail: samanwei 'at' wpi.edu |
| [HOME] | [RESEARCH INTERESTS] | [DISSERTATION] | [RESEARCH EXPERIENCE] | [INTERN EXPERIENCE] | [PUBLICATIONS] |
Biography
I am a Ph.D. candidate in Department of Computer Science at Worcester Polytechnic Institute. My advisor is Prof. Elke A. Rundensteiner. I received my B.S. and M.S. degree in Computer Science from University of Science and Technology Beijing. I joined Department of Computer Science at WPI for a Ph.D. in 2004.
My research field is in stream query processing, especially in approximate query processing in XML streams. I have also worked in the area of recursive XQuery processing over XML streams and complex event processing. My Ph.D. dissertation title is Continuously Providing Approximate Results under Limited Resources: Load Shedding and Data Spilling in XML Streams
| Research interests: |
| Dissertation |
Because of the high volume and unpredictable arrival rates, stream processing systems may not always be able to keep up with the input data streams - resulting in buffer overflow and uncontrolled loss of data. To tackle this problem, my dissertation is focusing on continuously supplying online results through load shedding and load spilling in the context of XML stream systems.
First, I explore the new opportunities of load shedding in XML streams. Since XML data is hierarchical, subelements, extracted from different positions of the XML tree structure, may vary in their importance. Further, dropping different subelements may save different amounts of storage and computation. Hence I propose to design structure-oriented load shedding, which provides a new opportunity for selectively shedding XML subelements. I propose to develop a preference model that enables users to specify the relative importance of preserving different subpatterns within the XML result structure. This transforms shedding into the problem of rewriting the user query into shed queries that return approximate answers with utility as measured by the given user preference model. The goal is to find the appropriate shed queries to maximize the output utility driven by our structure-based preference model under the limitation of available computation resources when load shedding is needed. This part of work has been published in WWW 2008.
Second, I address load spilling in XML streams. I introduce ``structure-based spilling'', a spilling technique customized for XML streams by considering the partial spillage of their possibly complex XML elements. The processing of the disk resident data would then be postponed until a later time. Such structure-based spilling brings a lot of new challenges. When a path is spilled, multiple paths in the query may be affected. I describe possible spilling effects on the query paths and how to execute the “reduced” query to produce partial results. We formulate our structure-based spilling problem into an optimization problem, namely, to find the reduced query that maximizes the output quality based on our structure-based quality model and cost model for XML streams. To select the reduced query that maximizes output quality, we develop three optimization strategies. In addition, we also examine the clean-up stage to guarantee that an entire result set is eventually generated by producing supplementary results. The experimental results demonstrate that our proposed solutions consistently achieve higher quality results compared to the state-of-the-art techniques. This part of work has been submitted to EDBT2010.
| Research Experience |
Load Spilling in XML Streams Jan. 2008 – Present
Proposing a structure-based spilling framework for XML streams. Transforming our structure-based spilling problem into an optimization problem to find the reduced query that maximizes the output quality. Designing an output quality model for evaluating partial results for different reduced queries. Examining the side-effect on different paths in the query for a particular spilled path. Developing three optimization strategies to select the reduced query that maximizes output quality. Implementing this work in RAINDROP system developed by WPI.
Multiple Sequence Detection in Complex Event Processing Oct. 2007 – Dec. 2007
Worked on multiple sequence patterns containing complex negative events. Instead of building a runtime stack for each sequence pattern query individually, we exploited sharing common positive components in runtime stack among multiple queries to reduce the memory usage. To facilitate complex negative event pattern detection, we proposed a PMark approach to recording the negation evaluation results for candidate results.
Load Shedding in XML Streams May 2006 – Oct. 2007
Worked on utility-driven load shedding in XML stream systems under CPU constraints. Proposed a generalized CPU cost model for XML streams. Designed a preference model for XQuery which allows users to specify preferences on different XML elements. Proposed two algorithms on choosing an approximate query to achieve maximum utility on data. Implemented this work in RAINDROP system developed by WPI.
Recursive XQuery Processing in XML Streams May 2005 – Dec. 2005
Worked on how to handle recursive XQuery for XML stream systems. Proposed a new class of stream algebra operators for efficient recursive XQuery stream processing. In particular, I designed two strategies for implementing structural joins. This work was published on XSDM workshop joined with ICDE2006.
| Intern Experience |
Research Intern at HP Labs, Palo Alto, CA May 2007 – Aug. 2007
Built a Complex Event Processing (CEP) engine to detect complex events over data streams. The developed system is able to support multiple continuous running queries including aggregation queries and sequence pattern detection. We exploit operator sharing among different queries and propose a new split operator to optimize query plan. The system showed flexibility and reusability by demonstrating different uses cases in retail and banking environments.
| Publications |
Book chapters
Papers
Mingzhu Wei, Elke A. Rundensteiner, Murali Mani. Achieving High Output Quality under Limited Resources through Structure-based Spilling in XML Streams. Submitted to EDBT 2010. ( pdf )
Mingzhu Wei, Elke A. Rundensteiner, Murali Mani. Achieving High Output Utility under Limited Resources through Structure-based Shedding and Spilling in XML Streams. SIGMOD IDAR 2009. ( pdf )
Mingzhu Wei, Mo Liu, Denis Golovnya, Ming Li, Elke A. Rundensteiner, Kajal Claypool. Supporting Conservative to Aggressive Methodology for Out-of-order Event Processing. SIGMOD 2009. pp.1031-1034. ( pdf )
Mingzhu Wei, Ming Li, Elke A. Rundensteiner, Murali Mani. Utility-driven Load Shedding for XML Stream Processing. WWW 2008. pp.855-864. ( pdf )
Mingzhu Wei, Ming Li, Elke A. Rundensteiner, Murali Mani. Processing Recursive XQuery over XML Streams: The Raindrop Approach. Data and Knowledge Engineering 65(2): 243-265. 2008. ( pdf )
Mingzhu Wei, Ismail Ari, Jun Li, Mohamed Dekhil. ReCEPtor: Sensing Complex Events in Data Streams for Service-Oriented Architectures. HPL Tech Report, HPL-2007-176.
Song Wang, Hong Su, Ming Li, Mingzhu Wei, Shoushen Yang, Drew Ditto, Elke A. Rundensteiner and Murali Mani, R-SOX: Runtime Semantic Query Optimization over XML Streams. VLDB 2006. pp.1207-1210.( pdf )
Mingzhu Wei, Ming Li, Elke A. Rundensteiner, Murali Mani: Processing Recursive XQuery over XML Streams: The Raindrop Approach. XSDM Workshop joint with ICDE2006. ( pdf )
| samanwei at wpi.edu |
| Last modified: Sep. |