BDS
Big Data Science
Course Description
This course will take a practical approach to solving challenges in the public and private sectors using data analytics. A number of different themes will be explored as case studies in order to demonstrate how data-driven decision-making has widespread applications. The course will examine how the question being posed, the available data and the selected modelling approach all come together to arrive at a feasible solution. A range of quantitative techniques, involving both linear and nonlinear methods, will be presented for dealing with structured numerical datasets. Substantial emphasis will be placed on the process of delivering data analytics via a dashboard to facilitate decision-making and policy-making. The course content will be structured to provide a roadmap for carrying out the necessary procedures and will be illustrated using case studies, reading material and previously published models. Participants will obtain hands-on experience by working on specific challenges with real-world data through a carefully structured set of assignments. The full course syllabus contains additional information and can be found in PDF format HERE.
Outcomes
After completing this course, students should be able to:
Design a data analytics project in response to a specific challenge
Download and organize data for addressing the challenge
Explore the dataset using visualization techniques
Apply a range of quantitative techniques
Discuss the advantages and disadvantages of different models
Select an approach that is optimal for meeting the objective
Present conclusions and recommendations
Communicate model output to decision-makers
Textbooks
There are no required textbooks, though we suggest the following to help you study (all available online):
(ESL): The Elements of Statistical Learning, Trevor Hastie, Robert Tibshirani and Jerome Friedman.
(TM): Machine Learning, Tom Mitchell.
(PRML): Pattern Recognition and Machine Learning, Christopher M. Bishop.
(MJ): Machine Learning Algorithms, The SAS Data Science Blog.
Piazza
We will use Piazza for class discussions. Please go to the course Piazza site to join the course forum (note: you must use a cmu.edu email account to join the forum). We strongly encourage students to post on this forum rather than emailing the course staff directly (this will be more efficient for both students and staff). Students should use Piazza to:
Ask clarifying questions about the course material.
Share useful resources with classmates (so long as they do not contain homework solutions).
Look for students to form study groups.
Answer questions posted by other students to solidify your own understanding of the material.
Academic Integrity
The course Academic Integrity Policy must be followed at all times, both when doing assignments and when posting on the message boards. Details on ECE's Academic Integrity Policy can be found in the course syllabus and HERE.
Grading
Grades for this course will be based on students' performance on the homework assignments, a final exam and class participation. Homework assignments must be done individually and turned in via Canvas by the designated due date. Late work will be accepted until 24 hours past the deadline, but it will lose 10%. Each assignment will be graded on both a written report and the code used to produce the results presented in the report. Class participation will be evaluated based on each student's contributions to discussions, both in class and on the Piazza Discussion Board. When posting or reacting to online discussion threads, students are expected to use their own words, and each post should be relevant to the topic under discussion. If you share an article, make sure to introduce, summarize and explain it in your own words so that readers understand the point the article is making.
The following is the weight distribution of the grades:
Class participation: 5%
Kahoot Quizzes: 2.5%
Piazza Participation: 2.5%
Homework Assignment 1: 20%
Homework Assignment 2: 25%
Homework Assignment 3: 30%
Final Exam (Multiple Choice): 15%
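As an illustration of how these weights combine, here is a minimal Python sketch. It is not an official grade calculator. The weights are copied from the list above; the function names, the 0-100 score scale, and the reading of the late policy (10% of the earned score deducted, and no credit after 24 hours) are assumptions made for this example only.

```python
# Weights copied from the grade distribution above (percent of the final grade).
WEIGHTS = {
    "Class participation": 5.0,
    "Kahoot Quizzes": 2.5,
    "Piazza Participation": 2.5,
    "Homework Assignment 1": 20.0,
    "Homework Assignment 2": 25.0,
    "Homework Assignment 3": 30.0,
    "Final Exam (Multiple Choice)": 15.0,
}

# Assumption: "lose 10%" means 10% of the earned score is deducted.
LATE_PENALTY = 0.10


def apply_late_penalty(score, hours_late):
    """Return a homework score after the late policy (hypothetical reading)."""
    if hours_late <= 0:
        return score
    if hours_late <= 24:
        return score * (1 - LATE_PENALTY)
    # Assumption: work submitted more than 24 hours late earns no credit.
    return 0.0


def final_grade(scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100.0


if __name__ == "__main__":
    # Made-up scores for illustration; Homework 2 is 12 hours late.
    scores = {name: 90.0 for name in WEIGHTS}
    scores["Homework Assignment 2"] = apply_late_penalty(90.0, hours_late=12)
    print(f"Final grade: {final_grade(scores):.2f}")
```

Storing the weights in a single dictionary makes it easy to check that they sum to 100 and to spot a missing component before computing a total.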