Worcester Polytechnic Institute (WPI)

Computer Science Department
------------------------------------------

DS504/CS586 - Big Data Analytics - Fall 2016

Version: Aug 24th 2016

------------------------------------------


Class Information:

When/where: THUR, 6:00pm - 8:50pm, AK 232
Web: http://wpi.edu/~yli15/courses/DS504Fall16/

Mailing list:

    cs586-staff@cs.wpi.edu (reaches instructor)
    cs586-all@cs.wpi.edu (reaches students and instructor)

Instructor:

    Prof. Yanhua Li
    Office: AK130
    Email: yli15 at wpi.edu
    Website: http://wpi.edu/~yli15/
    Office hours: M, T, R, & F, 10:30-11:00am; others by appointment

    TA: Zhou, Chong
    Office: AK 013
    Email: czhou2@wpi.edu
    Office hours: Friday 2:00-4:00pm; others by appointment

Course Description:

    [Topics.] We are living in the age of big data, where data is measured in terabytes and zettabytes, streamed in real time, and derived at unprecedented speeds in diverse forms. Big data promises to impact the world as we know it, from increased productivity at our workplace to how we live our daily social lives. However, it also presents tremendous challenges as entities ranging from individuals, companies, organizations, and political groups to governments strive to gain insights from vast torrents of complex data. This course covers computational techniques and algorithms for measuring, analyzing, and mining patterns in large-scale datasets. Techniques studied may include data analysis issues related to large-scale data sampling and estimation, data cleaning, management, clustering, etc. Real-world applications of these techniques, for instance urban computing, social media analysis, and recommender systems, are selectively discussed. As part of this course, we will read the literature and try our hand at this technology by conducting course projects.

    [Recommended background.] This is an *advanced* graduate course targeted primarily at second-year (or higher) Ph.D/MS graduate students. Priority for enrollment will be given to CS/DS Ph.D students who are working in big data analytics and related areas, then to other Ph.D or MS students who have taken course(s) in databases and/or data mining, or who have equivalent knowledge. Sufficient programming experience and knowledge of data analytics (e.g., data mining, machine learning, optimization, or control theory) are expected so that you are comfortable undertaking a course project. The course will focus on developing skills to solve real-world big data / data-driven problems, rather than introducing the basics of data mining/machine learning techniques. If you are in doubt, please talk to the instructor.

    [Course structure.] Please note that this is not a regular lecture-based course, but rather a seminar- and project-oriented course, with student presentations, classroom discussions, and student research projects. More specifically, we will run the course in two "parallel" tracks. In one track, we will read, study, and discuss research papers on Big Data Analytics. This track will consist of some presentations by the instructor, but also by the students. Active participation in class discussion is required! The presenter functions primarily as the lead who facilitates discussion! In parallel, we will group students into "research teams" to study and investigate a selected research topic of interest. With the help of the instructor, each research team will identify the research problems they want to study, write up a project proposal, make their case in front of the class, and, throughout the course, make presentations on their "findings", proposed solutions, etc. Ample time will be reserved for these purposes. Each research team is required to submit a final project report in the format of a research paper by the end of the course. In addition to the presentation and term project, each student is expected to read every research paper, write up paper reviews/critiques, occasionally answer a few questions or solve problems related to the papers, participate in classroom discussion, and take part in the "peer review" process.

Textbook:

    The topic is evolving, so no single comprehensive textbook contains the material we will study in this course. Instead, we will use a variety of sources, including publications from the primary literature and book chapters. These manuscripts will be provided to the class and/or linked into our schedule.

Coursework and Evaluation:

    The grading system for this course is A,B,C,D,F (without +/-).
    Oral Work: 30%.
    Written Work: 30%.
    Class projects: 40% (Project 1 for 10% and Project 2 for 30%)
    Note: Please see the grading page for a more detailed breakdown of each part, and the "Important Dates" on the projects page for the timing of critiques, presentation slides, and projects.
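
    As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes each component is scored on a 0-100 scale, which this page does not state; the names WEIGHTS and weighted_score are hypothetical. It does not assign a letter grade, since no cutoffs are given here.

```python
# Weights come from the breakdown above.
# The 0-100 scale for each component is an assumption made for illustration.
WEIGHTS = {
    "oral": 0.30,      # Oral Work
    "written": 0.30,   # Written Work
    "project1": 0.10,  # Class projects: Project 1
    "project2": 0.30,  # Class projects: Project 2
}


def weighted_score(scores):
    """Return the weighted course percentage for component scores (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical scores, not real data.
    example = {"oral": 90, "written": 80, "project1": 100, "project2": 85}
    print(round(weighted_score(example), 2))
```

    The grading page remains the authoritative source for how each component is actually scored.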

Course Objectives:

    The course is organized as a series of presentations and discussions on state-of-the-art techniques in various topics. The professor as well as each of the students (in teams) will present seminars. For a given seminar, all non-presenting students are expected to read the presented paper before class, to submit at the start of class a critique demonstrating that they have read and thought about the paper, and then to actively engage in the class discussion of the material. Specific objectives include:

    Gain knowledge in fundamental principles, algorithms and technological advances in the field of big data analytics.
    Develop skills needed to critically read and make use of technical literature.
    Get practice designing a project or research agenda related to big data analytics.
    Learn to identify and acquire new knowledge on a chosen subject of interest.
    Practice communicating your ideas to an audience in a presentation or a scientific discussion.

Learning Outcomes:

    Upon completion of this course, students should be able to:

    Explain challenges and advances in the state of the art in big data analytics.
    Design, develop and fully execute a big data analytics project.
    Demonstrate skills to critically review technical literature and assess technological advances in big data analytics.
    Communicate their ideas effectively in the form of a presentation and written documents to a technical audience.


