Programming Pluralism: Using Learning Analytics to Detect Patterns in the Learning of Computer Programming

Paulo Blikstein, School of Education and (by courtesy) Computer Science Department, Stanford University. Correspondence: paulob@stanford.edu

Marcelo Worsley, School of Education, Stanford University

Chris Piech, Computer Science Department, Stanford University

Mehran Sahami, Computer Science Department, Stanford University

Steven Cooper, Computer Science Department, Stanford University

Daphne Koller, Computer Science Department, Stanford University

Abstract

New high-frequency, automated data collection and analysis algorithms could offer new insights into complex learning processes, especially for tasks in which students have opportunities to generate unique open-ended artifacts such as computer programs. These approaches should be particularly useful because the need for scalable project-based and student-centered learning is growing considerably. In this article, we present studies focused on how students learn computer programming, based on data drawn from 154,000 code snapshots of computer programs under development by approximately 370 students enrolled in an introductory undergraduate programming course. We use methods from machine learning to discover patterns in the data and try to predict final exam grades. We begin with a set of exploratory experiments that use fully automated techniques to investigate how much students change their programming behavior throughout all assignments in the course. The results show that students’ change in programming patterns is only weakly predictive of course performance. We subsequently home in on a single assignment, trying to map students’ learning processes and trajectories and automatically identify productive and unproductive (sink) states within these trajectories. Results show that our process-based metric has better predictive power for final exams than the midterm grades. We conclude with recommendations about the use of such methods for assessment, real-time feedback, and course improvement.

Notes

¹ Modification of a line was defined to have taken place anytime a new line was 30% different from the same line in the previous snapshot. The designation "characters modified" refers to the absolute value of the difference between the original and the modified lines. The 30% value was selected to strike a balance between flagging lines that were not truly modifications and overlooking lines that were. When we looked at the data, using values closer to 40% resulted in the inclusion of lines that did not appear to truly be modifications (low precision). In contrast, using 20% resulted in poor recall, such that many modifications were overlooked. Hence, we chose 30% as the criterion.
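The thresholding idea in this note can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the note does not say how the difference between two lines was measured, so the sketch substitutes Python's `difflib.SequenceMatcher` ratio, and it reads the criterion as "a changed line counts as a modification when it differs by at most the threshold". The names `line_difference` and `classify_line` are hypothetical.

```python
from difflib import SequenceMatcher


def line_difference(old_line: str, new_line: str) -> float:
    """Fraction by which two lines differ (0.0 = identical, 1.0 = nothing shared).

    Assumption: difflib's ratio stands in for the unspecified metric.
    """
    return 1.0 - SequenceMatcher(None, old_line, new_line).ratio()


def classify_line(old_line: str, new_line: str, threshold: float = 0.30) -> str:
    """Label a line pair as 'unchanged', 'modified', or 'replaced'.

    Assumption: a changed line is a modification when its difference from the
    previous snapshot's line is at most `threshold`; a larger difference is
    treated as a different line altogether.
    """
    diff = line_difference(old_line, new_line)
    if diff == 0.0:
        return "unchanged"
    return "modified" if diff <= threshold else "replaced"


if __name__ == "__main__":
    print(classify_line("int x = 0;", "int x = 1;"))
    print(classify_line("int x = 0;", "while (frontIsClear()) {"))
```

Under this reading, raising the threshold admits more line pairs as modifications (hurting precision), and lowering it rejects more of them (hurting recall), which matches the trade-off the note describes.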

² We attempted to use k-means as well, and the algorithms converged to the same clusters.
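One way to check that two clustering runs "converged to the same clusters" is to test whether their label assignments describe the same grouping of items, ignoring how the clusters happen to be numbered. The sketch below is illustrative only: the function name `same_partition` is hypothetical, and it says nothing about which algorithms or features were actually compared.

```python
def same_partition(labels_a, labels_b) -> bool:
    """Return True if two label lists group the items identically.

    Cluster numbering is arbitrary, so the check looks for a one-to-one
    mapping between the labels used in `labels_a` and those in `labels_b`.
    """
    if len(labels_a) != len(labels_b):
        return False
    a_to_b, b_to_a = {}, {}
    for a, b in zip(labels_a, labels_b):
        # setdefault records the first pairing and returns it afterwards,
        # so any later conflicting pairing is detected.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


if __name__ == "__main__":
    # Same grouping, different cluster numbers.
    print(same_partition([0, 0, 1, 2], [2, 2, 0, 1]))
    # Second item moved to another cluster.
    print(same_partition([0, 0, 1], [0, 1, 1]))
```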

³ The normalization was meant to prevent the absolute values of the changes from skewing the analysis or erasing assignment-specific characteristics that we would want to account for (instead of attributing them to individual differences).

⁴ The teaching team varied between the spring and summer quarters, but the curriculum remained unchanged.

⁵ See the supplemental material for details.

⁶ Because the similarity metric could be biased toward assigning low similarity scores to snapshots of assignments by the same student, we modified our algorithms to never use the similarity value computed from the same student.
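The exclusion described in this note can be sketched as a nearest-neighbor lookup that skips every snapshot written by the same student. This is a minimal illustration under stated assumptions: the snapshot representation (a list of `(student_id, code)` pairs), the function name `most_similar_other_student`, and the use of `difflib.SequenceMatcher` as a stand-in similarity are all hypothetical and not taken from the article.

```python
from difflib import SequenceMatcher


def text_similarity(code_a: str, code_b: str) -> float:
    """Stand-in similarity in [0, 1]; the article's actual metric is not shown here."""
    return SequenceMatcher(None, code_a, code_b).ratio()


def most_similar_other_student(index, snapshots, similarity=text_similarity):
    """Find the snapshot most similar to snapshots[index], ignoring the same student.

    `snapshots` is a list of (student_id, code) pairs (an assumed layout).
    Returns (other_index, score), or None if no other student has a snapshot.
    """
    student, code = snapshots[index]
    best = None
    for j, (other_student, other_code) in enumerate(snapshots):
        if other_student == student:
            continue  # never use a similarity value computed from the same student
        score = similarity(code, other_code)
        if best is None or score > best[1]:
            best = (j, score)
    return best


if __name__ == "__main__":
    snapshots = [
        ("s1", "move();"),
        ("s1", "move();"),
        ("s2", "move(); turnLeft();"),
        ("s3", "putBeeper();"),
    ]
    print(most_similar_other_student(0, snapshots))
```

In the example, the identical snapshot at index 1 is skipped because it belongs to the same student, so the match comes from another student's work.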

⁷ A support vector machine is a supervised classifier that learns from labeled examples how to sort new cases into categories. See the supplemental material for details.

⁸ For space considerations and because the data were more robust and well-defined, we show only the Alpha and Gamma clusters.

FIGURE 10 Visualization of finite state machines for the Alpha and Gamma clusters of students. The size of the circles is proportional to the approximate number of code snapshots in the various states. Note that this is not equivalent to the number of students in each state, as the same student could have multiple snapshots within any given state.

⁹ This means that our approximation of using a save/compile event as a marker for when students conclude a unit of work was correct.
