

DS504/CS586 - Big Data Analytics - Fall 2016
Version:
Aug 24th 2016


Self-and-cross-evaluation form.
Project 1:
Project 1: Collecting and Measuring Online Data
Projects will be in groups!
Week 3 (9/8 R), Starting date
Week 4 (9/15 R), Proposal Due: 2 pages roughly (upload it to discussion board)
Week 5 (9/22 R), Methodology due (upload it to discussion board)
Week 6 (9/29 R), Results due (upload it to discussion board)
Week 7 (10/6 R), Conclusion due (upload it to the class discussion board)
Week 8 (10/13 R), Final Report (roughly 8 pages) due at 11:59pm EST & Self and Cross-evaluation due at 11:59pm EST
Week 8 (10/13 R and 10/27 R), In-class Presentation (20 min including Q&A)
Programming in C/C++ (Java, Python, etc. are also acceptable).
Students will read a previously published research paper. The students will then replicate the basic project architecture, with occasional modifications or improvements, and perform an analysis similar to that of the paper's authors. However, students must make at least one significant design change in the architecture and be able to defend this decision.
1. Estimating Online Site Statistics
Required Readings: [ACM IMC 2011] Counting YouTube Videos via Random Prefix Sampling.
Please feel free to choose one online site/service with APIs to download data. You can propose to analyze the collected data to (1) estimate site statistics, or (2) apply machine learning methods to predict future trends, or (3) perform time-series analysis to capture dynamic patterns, or something else, as long as your work can potentially bring research value to the community.
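The required reading estimates the size of a site by querying random ID prefixes. A minimal sketch of that idea on synthetic data is given here, assuming (hypothetically) a 16-character ID alphabet, 8-character IDs, and a lookup that returns the exact number of IDs under a prefix. Real sites impose result caps and rate limits that this sketch ignores, and all names below are illustrative rather than taken from the paper.

```python
import random
from collections import Counter

ALPHABET = "0123456789abcdef"  # hypothetical ID alphabet (assumption)
ID_LENGTH = 8                  # hypothetical ID length (assumption)


def make_synthetic_site(num_items, rng):
    """Generate a set of distinct random IDs standing in for a real site."""
    ids = set()
    while len(ids) < num_items:
        ids.add("".join(rng.choice(ALPHABET) for _ in range(ID_LENGTH)))
    return ids


def build_prefix_index(ids, prefix_len):
    """Count IDs per prefix; stands in for a site's prefix-search API."""
    return Counter(item_id[:prefix_len] for item_id in ids)


def estimate_total(prefix_counts, prefix_len, num_samples, rng):
    """Estimate the number of IDs from counts under uniformly random prefixes."""
    num_prefixes = len(ALPHABET) ** prefix_len
    total_hits = 0
    for _ in range(num_samples):
        prefix = "".join(rng.choice(ALPHABET) for _ in range(prefix_len))
        total_hits += prefix_counts.get(prefix, 0)  # one simulated "query"
    # Each prefix is drawn with probability 1/num_prefixes, so scaling the
    # mean hit count by num_prefixes gives an unbiased estimate of the total.
    return num_prefixes * total_hits / num_samples


rng = random.Random(0)
site = make_synthetic_site(20000, rng)
index = build_prefix_index(site, prefix_len=3)
estimate = estimate_total(index, prefix_len=3, num_samples=500, rng=rng)
print(f"true={len(site)} estimated={estimate:.0f}")
```

Only 500 of the 4096 possible prefixes are queried, which is the point of sampling: the estimate trades a small error for far fewer requests. A project would replace the synthetic index with real API calls and report the variance of the estimate.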
Each project will culminate in a term paper that is fashioned like the papers found in the big data analytics research literature. To make this problem tractable, it has been deconstructed into the following set of required deliverables (all page count estimates assume dense page formatting). These deliverables must be completed in order (note, the "Introduction" section comes toward the end):
Proposal: The proposal will describe the work to be performed along with a detailed NABC (Needs, Approach, Benefits, Competition) analysis of the work. As part of the proposal, students are expected to ensure that the work is novel or that it includes the required design deviation from the example paper. In the proposal, students are expected to describe their basic methodology and the resources needed to complete the work. The research proposal will likely be at least two pages.
Methodology: The methodology write-up will describe the experiments, in detail, that the students will perform as part of the project. The methodology must be articulated to the extent that another researcher in the field could replicate the methodology without prior knowledge of the project. The methodology section will likely be at least one page.
Empirical Results: Students should describe the results of conducting their research experiments. This section should identify what exactly the outcomes are and whether the results are significant.
Conclusion: Students should write a conclusion for the work, summarizing the contributions, the impact, and potential for follow-on work. This section will likely be at least half a page.
Introduction: Once the project is nearly finished, the students should write an introduction to the work, describing the motivations, the intended goals (from the proposal), highlights of the methodology (from the methodology section) and the key results of the work (from the results section). The introduction will likely be at least a page.
Abstract: The abstract will summarize the motivation, contributions, and key results of the work in a concise manner. The abstract will likely be at most three paragraphs.
Project 2:
Information concerning the semester-long course projects for CS586/DS504 is given below. The intent of the project is for students to select from a wide spectrum of possible projects depending on their interests and skills. You are highly encouraged to take on a project in a topic that you are not yet familiar with. This ensures that you learn something new. The project typically corresponds to an implementation of an idea, application of a technology to solve a big data problem, or to basic research in big data analytics. That is, projects may involve the utilization of existing big data analytics tools to extract knowledge from data sets; the development of new or the extension of existing tools with advanced features; the realization of a research idea or data mining algorithms that you find in the literature into software as proof of concept; or a comparative study of multiple alternative approaches for tackling a given problem.
You are expected to produce and submit a professional report on your project, complete with a clear motivation, full description with examples, and careful evaluation of the effectiveness of the solution.
Teaming:
Typically, you would work in groups of two to four students. In order to find teammates for your project, you are free to prepare a short 2-minute presentation on your proposed project (to be given in class). Start thinking now about what interests you most. You can also advertise your project idea on the CS586 discussion board on mywpi.wpi.edu to find potential partners. The teams will be asked to present their projects to the instructor at various stages throughout the project development.
Some Project Types:
There are no restrictions on the 'type' of project as long as you can justify that it is related to big data analytics. You need to get your project approved by the instructor. Project types include:
- Software development of a particular subcomponent or software tool of a big data analytics system, such as better scale-out of a data mining function to support big data, or the addition of another machine learning method to the system. This should allow you to gain exposure to existing systems, and to experience the challenges of extending them.
- Software development of some open-ended idea or algorithms into a working system. This could be to implement an advanced strategy you have identified in the literature. You would then experiment with your chosen algorithms to test their effectiveness. Comparison of your results to those in the literature is encouraged.
- Development of an application utilizing big data analytics technology for solving a practical problem. Here, your focus would be to select a technology, to locate useful (real) data sources (say online weather feeds for the area, news reports, Twitter data, and so on), and to transform such data into a format that your tool can consume. Thereafter, you would use the tool to analyze your data to extract meaning relevant to your application. The main work would go into understanding the requirements of your chosen application, working with real data, analyzing the data, and extracting relevant knowledge. You may also develop task-specific interfaces and other components so that the application is useful at the end. Based on your experience, you should comment on whether the current technology has the features required to solve your targeted problem, and on which features it lacks.
- A thorough comparative study of two or more mining or machine learning methods applied to the same data set to compare their relative effectiveness, quality of results found, speed, robustness, etc.
Example Projects
- One example is a traffic pattern learner that would apply data mining techniques to both traffic archives (taxi dataset) and online traffic patterns (California traffic) to learn about typical routes of taxis, congestion in the form of clusters, open parking spaces, outliers in traffic, or to recommend routes.
- Another example is a social media barometer tool that performs analytics on Twitter data, aggregated by location and time, to determine periods or places with heavy flu patterns. Heterogeneous data from mobile phones or other sensors that also provide information about the number of people in a given space could be utilized to further enhance the quality of the predicted results.
- A third example is a movie recommender that, based on the UCI movie data set, perhaps together with a second data set on what is currently playing in theatres, would perform analytics to learn which features tend to lead to blockbusters, e.g., certain combinations of actors, certain topics, certain age groups, etc.
- Lastly, a fourth example is an application that analyzes the DEBS 2013 soccer sensor data set to learn about different features prevalent in such a real-life game, e.g., what situations lead to a goal, who had the most chances for a goal, which team dominated play, etc.
- Instead, do put your own much cooler project idea here that really rocks ...
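As a rough illustration of the "aggregated by location and time" step in the social media barometer example, here is a minimal sketch. The records, the keyword list, and the function names are all hypothetical and stand in for real collected data and a properly validated flu classifier.

```python
from collections import Counter
from datetime import datetime

# Hypothetical keyword list; a real project would need a validated classifier.
FLU_KEYWORDS = ("flu", "fever", "cough")

# Hypothetical records: (location, ISO timestamp, text). Not real data.
posts = [
    ("Worcester", "2016-10-03T08:15:00", "Stuck home with the flu"),
    ("Worcester", "2016-10-03T19:40:00", "fever and cough all day"),
    ("Boston", "2016-10-03T12:00:00", "Great weather for a run"),
    ("Boston", "2016-10-04T09:30:00", "This cough will not go away"),
]


def mentions_flu(text):
    """Return True if the text contains any of the assumed flu keywords."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in FLU_KEYWORDS)


def flu_counts_by_place_and_day(records):
    """Count flu-related posts per (location, calendar day)."""
    counts = Counter()
    for location, timestamp, text in records:
        if mentions_flu(text):
            day = datetime.fromisoformat(timestamp).date()
            counts[(location, day)] += 1
    return counts


counts = flu_counts_by_place_and_day(posts)
for (location, day), n in sorted(counts.items()):
    print(location, day.isoformat(), n)
```

At real scale the same group-and-count step would run in a distributed framework, and raw counts would need to be normalized by the total number of posts per place and day before any comparison.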
Project Stages and Deliverables.
The project has four stages:
- STAGE 1: Project Intent.
The key is that you will need to find a team partner. Each group should come up with an idea of what they may want to pursue. The written project intent can be very brief. Mostly, it lists the intended type of project and its goals. A single paragraph defining the project should be sufficient. The instructor will provide you with a go, no-go, or other feedback. Do not forget to list your team name, your member names, and project title.
Note: Do discuss your ideas on a project with me before finalizing your proposal. Your project intent will be approved, disapproved or some modifications of direction will be suggested. You will not receive a grade for this stage.
- STAGE 2: Project Proposal.
Each group will turn in a typed proposal document (about 2 to 5 pages) defining the finalized project. You should have thought about your proposed work, and possibly tried to solve some initial part of it. Make sure to have dug in enough to understand the feasibility of your proposed direction, and/or to have looked at the necessary background knowledge and skills needed to pursue the project. This proposal should explain the proposed work to be done. You should also list the relevant environment (tools) you have set up or papers you have read in order to complete the project successfully. Do provide an expected schedule for your planned completion, including a list of tasks to be undertaken week by week, and the deliverables for the end of the course.
Note: This will count for 10% of your final project score.
- STAGE 3: Project Progress.
This progress report should clearly state the current status of the project. This report must be typed. Typically it is between 5 and 10 pages long. By this time, I would expect you to be already halfway towards completing your project. Teams routinely lose points for this stage if they have not made sufficient progress. I would expect you to have conducted all necessary background work including establishing a bibliography and reading relevant manuals and literature, installing and testing all necessary software, resolving specific design issues, and refining the project plan or possibly re-directing the effort based on your background studies. Also, there needs to be a clear plan of what will be accomplished by the end of the project. You must develop a precise schedule and task list for the remainder of the project.
Note: This report will carry 30% of your final project score.
- STAGE 4: Final Project.
Finally, the complete project is delivered. You are expected to produce and submit a professional report on your project, complete with a clear motivation, full description with examples, and careful evaluation of the effectiveness of the solution. In addition, the presentation and demonstration of your solution, as appropriate, will be done in class by each team. We'll determine the exact time allowed per project later. See below to learn about what aspects should be stressed in this presentation.
This presentation must be supplemented with a final project report due at the same time. This report should be a well-written technical report describing your project. It is fine and in fact likely that this report will be a direct extension of your progress report. Depending on the nature of your project, it will need to contain a detailed description of your data set, of your system, key technologies used, experimental charts, sample runs, and a detailed analysis of the results. The report typically is between 10 and 20 pages.
Note: This part of the course project will count towards 60% of the project score.
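The stage weights stated in the notes (10% proposal, 30% progress report, 60% final project; the project intent is ungraded) combine into a single project score as a weighted sum. The sketch below assumes each stage is scored on a 0-100 scale, which this page does not specify, and the example scores are hypothetical.

```python
# Stage weights as stated in the notes: proposal 10%, progress 30%, final 60%.
WEIGHTS = {"proposal": 0.10, "progress": 0.30, "final": 0.60}


def project_score(stage_scores):
    """Combine per-stage scores (assumed 0-100) into one weighted score."""
    return sum(WEIGHTS[stage] * stage_scores[stage] for stage in WEIGHTS)


# Hypothetical stage scores, for illustration only.
example = {"proposal": 90, "progress": 80, "final": 85}
score = project_score(example)
print(round(score, 1))
```

With these example numbers the contributions are 9, 24, and 51 points, so the final stage dominates the result.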
Grading
By default, one score is assigned to the team as a whole. If this is not appropriate in your particular context because members are not contributing equal effort, then you must communicate this concern to the instructor as early as you can so that there is time to address it. All team members will also be grading everyone else on their team.
Grading of Project Progress
The grade for this assignment will be given using the following as a guideline:
- the progress made towards your project thus far (i.e., are you about half-way into the project by now?),
- the difficulty and size of the project you have chosen to work on,
- your overall solution approach and techniques utilized,
- the project presentation (was it well-organized, clear, and informative?),
- the written documentation of your project progress in the form of a report (which you will make accessible to other students in the class),
- the time plan for the remainder of the project.
Project Grading in General
The final grade of the team project (which may not necessarily be the same for each member of the team) will be based on:
- the difficulty of the project you have chosen to work on, and the effort you have put into formulating your problem and identifying relevant techniques to solve your problems,
- your solution approach and techniques utilized and the quality and effort of execution,
- the oral project presentation (covering the problem your group tackled, your solution approach, and your results at the end of the course),
- the written documentation of your project in the form of a report (which you will make accessible to other students in the class),
- the demonstration of your system (including successful example runs), as applicable, and
- the understanding of each team member of his or her part of the project, as well as of the overall group product.
yli15 at wpi.edu