How These Analyses Work
This page documents exactly how each analysis is computed, derived directly from the source code.
Use it to interpret results correctly and spot anomalies.
1. Building a Student Group
File: R/branches/population.R
Functions: build_population() →
get_focal_programs(),
get_ongoing_ids(), get_graduated_ids(),
get_switched_out_ids(), get_never_declared_ids(),
get_entry_pathways(), classify_origin(),
classify_entry_method(), classify_entry_status(),
build_demographic_population()
A student population is built in three stages: (1) identify candidates
— any student who ever appeared in the focal major; (2) classify outcomes
— determine what happened to each candidate relative to the major; (3) filter
and label — include the desired outcome groups and assign labels.
The result (a tibble with student_id, population_label,
outcome, entry_pathway, origin,
entry_method, entry_status, relevant_until)
is passed to every downstream analysis.
Outcomes
- ongoing — still declared in the focal major in the most recent data term.
- graduated — received a degree in the focal major in their last focal term.
- switched_out — left the focal major but remained at UNM. Detected two ways: (1) a formal declaration of another major after their last focal term in
cedar_programs; (2) any enrollment record in cedar_students after their last focal term, even without a re-declaration.
- stopped_out — all declared candidates not accounted for by ongoing, graduated, or switched_out. No UNM enrollment or major record after their last focal term.
- chose_elsewhere — appeared only as a pre-major; never declared the focal major, but did declare a different major afterward.
- left_undeclared — appeared only as a pre-major; never declared any major. Left without committing.
Entry pathway (entry_pathway)
How the student arrived at the focal major, computed by get_entry_pathways():
- direct — first major at UNM was the focal major (no prior declared major or pre-major).
- switched_in — had a non-focal declared major before declaring the focal major.
- pre_major — appeared as a focal pre-major before (or instead of) declaring.
Entry classification columns
- entry_method (classify_entry_method()): first_program means no prior major record of any kind before this unit; switched_in means at least one prior major record; unclear means the first unit record is at the earliest available term, so prior history is unobservable.
- entry_status (classify_entry_status()): whether the student’s first record in this unit was as a pre_major or a declared major.
Enrollment window (relevant_until)
Each non-ongoing population student carries a relevant_until term: their
last_declared_term (last term with a declared, non-pre-major focal record).
Course enrollments after that term are excluded from all analyses. A student
who was History for 2 terms, then switched to Business for 8 terms, contributes only
the 2 History terms to the analysis. Ongoing students have
relevant_until = NA (no restriction).
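The window rule above can be sketched in a few lines of base R. This is an illustration only, not the code in population.R: the data frames, course names, and term values are invented, and only the column names student_id and relevant_until come from the text.

```r
# Hypothetical data; only the column names follow the text above.
population <- data.frame(
  student_id     = c("S001", "S002"),
  outcome        = c("switched_out", "ongoing"),
  relevant_until = c(201980L, NA),   # NA = ongoing, no restriction
  stringsAsFactors = FALSE
)

enrollments <- data.frame(
  student_id = c("S001", "S001", "S001", "S002", "S002"),
  term       = c(201910L, 201980L, 202010L, 201980L, 202310L),
  course     = c("HIST 1110", "HIST 1120", "MGMT 1110", "HIST 1110", "HIST 300"),
  stringsAsFactors = FALSE
)

# Attach each student's window, then keep rows at or before it.
joined <- merge(enrollments,
                population[, c("student_id", "relevant_until")],
                by = "student_id")
in_window <- is.na(joined$relevant_until) |
  joined$term <= joined$relevant_until
relevant <- joined[in_window, c("student_id", "term", "course")]
```

Here S001's 202010 enrollment falls after the window and is dropped, while every S002 row is kept because relevant_until is NA.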
Worked example — dept = HIST, default scope (declared majors)
| student_id | program_name | program_type | is_pre_major | outcome | result |
| S001 | History | Major | FALSE | ongoing | ✓ included |
| S002 | History | Second Major | FALSE | ongoing | ✓ included (Second Major counts) |
| S003 | History | Major | TRUE | chose_elsewhere / left_undeclared | ✗ excluded by default (pre-major only) |
| S004 | English | Major | FALSE | n/a | ✗ excluded (different dept) |
| S005 | History | Minor | FALSE | n/a | ✗ excluded (Minor) |
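The default-scope filter behind this worked example might look like the sketch below. It is an assumption about the shape of the logic, not the body of build_population(); the dept_code value "ENGL" for the English row is invented for illustration.

```r
# Rows mirror the worked example; dept_code values are assumed.
programs <- data.frame(
  student_id   = c("S001", "S002", "S003", "S004", "S005"),
  program_name = c("History", "History", "History", "English", "History"),
  program_type = c("Major", "Second Major", "Major", "Major", "Minor"),
  is_pre_major = c(FALSE, FALSE, TRUE, FALSE, FALSE),
  dept_code    = c("HIST", "HIST", "HIST", "ENGL", "HIST"),
  stringsAsFactors = FALSE
)

# Default scope: focal dept, Major or Second Major, declared (not pre-major).
keep <- programs$dept_code == "HIST" &
  programs$program_type %in% c("Major", "Second Major") &
  !programs$is_pre_major

included_ids <- programs$student_id[keep]
```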
What programs belong to a department? The lookup uses
cedar_programs$dept_code, which is populated during transformation
via a three-tier lookup: major_dept_map → subj_dept_map → major_code identity.
If a program’s dept_code is missing or wrong, it won’t appear in the dropdown.
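A three-tier fallback of this kind can be sketched as nested lookups. Everything below is hypothetical: the page does not say what keys major_dept_map and subj_dept_map use, so the sketch assumes a program-name key and a subject-code key, and the function name resolve_dept_code() is invented.

```r
# Hypothetical lookup tables; the real maps and their keys may differ.
major_dept_map <- c("History" = "HIST")
subj_dept_map  <- c("ENGL" = "ENGL")

# Try tier 1, then tier 2, then fall back to the major code itself.
resolve_dept_code <- function(program_name, subject_code, major_code) {
  tier1 <- unname(major_dept_map[program_name])  # NA when no match
  tier2 <- unname(subj_dept_map[subject_code])   # NA when no match
  ifelse(!is.na(tier1), tier1,
         ifelse(!is.na(tier2), tier2, major_code))
}

resolved <- resolve_dept_code(
  program_name = c("History", "English Studies", "Nursing"),
  subject_code = c("HIST", "ENGL", "NURS"),
  major_code   = c("HIST", "ENGS", "NURS")
)
```

The identity fallback in the last tier is why a wrong or missing map entry silently yields an unexpected dept_code rather than an error.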
2. Roadblocks — DFW as a Predictor of Leaving
File: R/cones/stopout.R
Functions: get_stopout(),
classify_outcomes(), compute_stopout_for_group()
For each course, compares two rates among group students: the fraction who did not return
the following term after a DFW grade, and the same fraction after a passing grade.
The gap between those rates is the key signal.
Step 1: Classify outcomes (per student per course per term)
| registration_status_code | final_grade | classified as |
| DR (early drop) | any | dfw (non-completion regardless of grade) |
| RE / RS / RR | D, D+, D–, F, W, RD, RF | dfw |
| RE / RS / RR | A–C, CR, P, S, RA–RC, RCR | pass |
| RE / RS / RR | I, AUD, NR, or other | excluded (ungraded, no signal) |
| WL / other | any | excluded |
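The table can be read as a small classification function. The sketch below is not classify_outcomes() itself. It assumes grades are stored with an ASCII hyphen (D-), and it assumes the ranges A–C and RA–RC include plus and minus variants (so C- counts as a pass); check both against the source before relying on it.

```r
# Returns "dfw", "pass", or NA (excluded) per row.
classify_outcome <- function(status, grade) {
  completed <- status %in% c("RE", "RS", "RR")
  is_dfw    <- grade %in% c("D", "D+", "D-", "F", "W", "RD", "RF")
  # Assumption: A/B/C with optional +/- and their R-prefixed forms pass.
  is_pass   <- grepl("^R?[ABC][+-]?$", grade) |
    grade %in% c("CR", "P", "S", "RCR")

  out <- rep(NA_character_, length(status))
  out[completed & is_pass] <- "pass"
  out[completed & is_dfw]  <- "dfw"
  out[status %in% "DR"]    <- "dfw"   # early drop: dfw regardless of grade
  out
}

classified <- classify_outcome(
  status = c("DR", "RE", "RS", "RR", "WL", "RE"),
  grade  = c("A",  "F",  "B+", "I",  "A",  "RCR")
)
```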
Step 2: Determine whether each student returned the following term
For each student in each term, we check whether they appear in
cedar_students in the next fall or spring.
Summer is not counted — skipping summer is normal and not a stop-out.
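A minimal sketch of the "next fall or spring" check follows. It assumes six-digit term codes of the form YYYYTT with 10 = spring, 60 = summer, and 80 = fall; the 10 and 80 suffixes match the codes in the worked example later on this page, while 60 for summer is an assumption. The function name next_regular_term() is hypothetical.

```r
# Next fall/spring term code, skipping summer (assumed YYYYTT codes).
next_regular_term <- function(term) {
  year   <- term %/% 100L
  suffix <- term %% 100L
  # Fall rolls to next year's spring; spring and summer roll to fall.
  ifelse(suffix == 80L, (year + 1L) * 100L + 10L, year * 100L + 80L)
}

# Invented enrollment records: one row per student per enrolled term.
enrolled <- data.frame(
  student_id = c("S001", "S001", "S002"),
  term       = c(202310L, 202380L, 202310L),
  stringsAsFactors = FALSE
)

# A student "returned" if a record exists for their next regular term.
seen <- paste(enrolled$student_id, enrolled$term)
enrolled$returned <-
  paste(enrolled$student_id, next_regular_term(enrolled$term)) %in% seen
```

S001's final row is marked as not returned only because 202380 is the last term in this toy data, which is the same most-recent-term artifact listed under the known anomalies for this analysis.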
Graduate correction
Students who earned a degree in term T are not counted as stopped out
for that term, even though they don’t appear in term T+1. Without this correction,
every graduate who finished their program would be misclassified as a stop-out.
The correction uses cedar_degrees$term to identify graduation terms.
⚠ Partial coverage:
Graduate correction only applies to degrees recorded in CEDAR. Students who
transferred out or completed credentials not in cedar_degrees will still appear as stop-outs.
Step 3: Compute rates and gap
| student_id | BIOL 2310 outcome | returned next term? |
| S001 | pass (A) | yes |
| S002 | pass (B) | no |
| S003 | dfw (F) | yes |
| S004 | dfw (W) | no |
| S005 | dfw (W) | no |
pass_stopout_rate = 1/2 = 0.500 (S002 didn’t return)
dfw_stopout_rate = 2/3 = 0.667 (S004, S005 didn’t return)
stopout_gap = 0.667 − 0.500 = 0.167
p_value: chi-squared test on the 2×2 contingency table (outcome × returned).
Skipped if either group has fewer than 5 students — result is NA.
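The arithmetic above can be reproduced with the five example students. This is a sketch, not compute_stopout_for_group(): variable names other than the three rate columns are invented, and whether the source applies a continuity correction is not stated here (base R's chisq.test() applies Yates' correction to 2×2 tables by default).

```r
# The five students from the worked example.
obs <- data.frame(
  student_id = c("S001", "S002", "S003", "S004", "S005"),
  outcome    = factor(c("pass", "pass", "dfw", "dfw", "dfw"),
                      levels = c("pass", "dfw")),
  returned   = c(TRUE, FALSE, TRUE, FALSE, FALSE)
)

# Stop-out rate = share of a group that did not return.
stopout_rate <- function(group) mean(!obs$returned[obs$outcome == group])

pass_stopout_rate <- stopout_rate("pass")   # 1 of 2
dfw_stopout_rate  <- stopout_rate("dfw")    # 2 of 3
stopout_gap       <- dfw_stopout_rate - pass_stopout_rate

# Skip the test when either group is smaller than the minimum.
min_group_n <- 5
if (any(table(obs$outcome) < min_group_n)) {
  p_value <- NA_real_
} else {
  p_value <- chisq.test(table(obs$outcome, obs$returned))$p.value
}
```

With only two pass and three dfw students, p_value is NA here, which matches the skip rule.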
⚠ Known anomalies to watch for:
- The most recent term in the data has no visible ‘next term,’ so all students
in that term appear as stopped out. This inflates stop-out rates for recently
active courses.
- Rows where pop_n_dfw is very small (1–4) produce unreliable rates. The Min group DFW students filter (default 5) removes these.
- The baseline is ALL non-group students in the same courses.
- Stop-out is measured as ‘returned to UNM,’ not ‘continued in the program.’
4. Course Timing — When Students Take Each Course
File: R/cones/pathway.R
Functions: get_course_timing(),
plot_curriculum_map()
Computes the fraction of group students who took each course in their
1st, 2nd, 3rd… enrolled term. Uses relative terms so students who
started in different calendar years are aligned on the same axis.
How “term 1” is defined
Relative term 1 is the first term in which the student has a registered
course record in cedar_students — not their first semester
at UNM, not their first semester in the program, and not any self-reported
start date. It is row_number() over their distinct enrolled terms,
sorted chronologically by UNM term code.
Skipped semesters
The counter only increments for terms with actual registered enrollment.
Gaps are invisible. A student enrolled in Fall, absent in Spring, enrolled in Fall
has relative terms 1 and 2 — not 1 and 3. There is no concept of
“missed term 2” in this model.
Summer terms
By default, summer does not advance the counter.
Summer courses are pinned to the relative term of the immediately preceding
fall or spring. A student taking a summer course between their 2nd and 3rd
fall/spring semesters has those summer courses recorded as relative term 2.
If “Include summer” is enabled, summer gets its own slot in the sequence.
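The counting rules for gaps and summers can be sketched for a single student as follows. This is not get_course_timing(): it assumes YYYYTT term codes with suffix 60 for summer and that numeric order equals chronological order, and the handling of a summer term that precedes any fall or spring (pinned to term 1) is a guess, since the page does not cover that case.

```r
# Relative term index for one student's enrolled terms.
relative_terms <- function(terms, include_summer = FALSE) {
  terms <- sort(unique(terms))
  if (include_summer) {
    # Every enrolled term, summer included, gets its own slot.
    return(data.frame(term = terms, rel_term = seq_along(terms)))
  }
  is_summer <- terms %% 100 == 60
  # Only fall/spring advance the counter; summer inherits the prior value.
  rel <- cumsum(!is_summer)
  rel[rel == 0] <- 1L   # assumption: a leading summer is pinned to term 1
  data.frame(term = terms, rel_term = rel)
}

# Fall 2018, (Spring 2019 skipped), Fall 2019, Spring 2020,
# Summer 2020, Fall 2020.
student_terms <- c(201880L, 201980L, 202010L, 202060L, 202080L)
default_axis  <- relative_terms(student_terms)
summer_axis   <- relative_terms(student_terms, include_summer = TRUE)
```

In default_axis the skipped spring leaves no hole (Fall 2019 is term 2) and Summer 2020 shares term 3 with Spring 2020.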
Denominator
For each relative term, the denominator is the number of group students who
reached that term — i.e., students whose enrollment record extends
to at least that relative term. Students with only 3 terms of data are excluded
from relative terms 4–8. This prevents the percentage from being artificially
deflated for later terms.
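A small sketch of the "reached that term" denominator, using invented records and a hypothetical helper timing_pct(); the real function may structure this differently.

```r
# One row per student x relative term x course (invented data).
records <- data.frame(
  student_id = c("S001", "S001", "S001", "S002", "S002", "S003"),
  rel_term   = c(1, 2, 3, 1, 2, 1),
  course     = c("HIST 1110", "HIST 1120", "HIST 300",
                 "HIST 1110", "HIST 300", "HIST 1110"),
  stringsAsFactors = FALSE
)

# Furthest relative term each student reached.
max_term <- tapply(records$rel_term, records$student_id, max)

# Share of students who reached term k and took `course` in term k.
timing_pct <- function(course, k) {
  reached <- names(max_term)[max_term >= k]
  took <- unique(records$student_id[records$course == course &
                                      records$rel_term == k])
  length(intersect(took, reached)) / length(reached)
}

pct_term2 <- timing_pct("HIST 300", 2)  # S002 of {S001, S002}
pct_term3 <- timing_pct("HIST 300", 3)  # S001 of {S001}
```

S003 has only one term of data, so S003 is left out of the denominators for terms 2 and 3 instead of counting as a student who did not take the course.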
⚠ Left-truncation artifact — Freshman filter is applied automatically:
Students who were already enrolled when CEDAR data begins (Fall 2018) have
relative term 1 set to Fall 2018, regardless of how long they had actually
been at UNM. A senior in Fall 2018 looks like a first-semester student, which
makes the chart meaningless. This is called left truncation.
To prevent this, the app automatically restricts the relative-term axis
to first-time freshmen — students whose first enrollment record in CEDAR
is classified as Freshman. These are the only students whose term 1 is genuinely
their first semester. You can override this by selecting a different
Starting Classification in the filters.
This filter does not apply to the Classification, Inst. Credits, or
Overall Credits x-axis modes — those use actual Banner values recorded at the
time of enrollment and are unaffected by when the data window starts.
5. Course Pairs — Common Sequences
File: R/cones/pathway.R
Function: get_course_pairs()
Finds ordered pairs (A → B) where group students took Course A in one
relative term and Course B in a later term, within a configurable term gap.
Exact computation
- Self-join enrolled records on student_id where term_B > term_A and term_B − term_A ≤ max_term_gap and course_A ≠ course_B.
- Count distinct students per (course_A, course_B) pair.
- pct_a_to_b = students who took both ÷ students who took A.
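The three steps above can be sketched with a base R self-join. This is an illustration rather than get_course_pairs(): the data is invented, and it assumes the term columns hold relative term indices, so that term_B − term_A is a count of terms.

```r
# Invented records: one row per student x relative term x course.
enrolled <- data.frame(
  student_id = c("S001", "S001", "S002", "S002", "S003"),
  rel_term   = c(1, 2, 1, 4, 1),
  course     = c("A", "B", "A", "B", "A"),
  stringsAsFactors = FALSE
)
max_term_gap <- 2

# Step 1: self-join on student_id, then apply the three conditions.
pairs <- merge(enrolled, enrolled, by = "student_id",
               suffixes = c("_A", "_B"))
gap   <- pairs$rel_term_B - pairs$rel_term_A
pairs <- pairs[gap > 0 & gap <= max_term_gap &
                 pairs$course_A != pairs$course_B, ]

# Step 2: count distinct students per (course_A, course_B).
pairs   <- unique(pairs[, c("student_id", "course_A", "course_B")])
pairs$n <- 1L
n_pair  <- aggregate(n ~ course_A + course_B, data = pairs, FUN = sum)

# Step 3: divide by the number of students who took course A.
n_took_a <- tapply(enrolled$student_id, enrolled$course,
                   function(x) length(unique(x)))
n_pair$pct_a_to_b <-
  n_pair$n / as.numeric(n_took_a[as.character(n_pair$course_A)])
```

S002 took B three terms after A, which exceeds max_term_gap, so only S001 counts toward A → B and pct_a_to_b is 1/3.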
⚠ This is correlation, not causation.
A high pct_a_to_b means students who took A commonly went on to take B.
It does not mean A is a prerequisite for B or that taking A causes students to take B.
6. Major Changes
Files:
R/cones/major-changes.R (detection and summarization),
R/branches/population.R (group building),
R/modules/pathways.R (focal program derivation and display)
Key functions: detect_major_changes(),
major_change_pathways()
Detects when a student’s primary declared major changed from one term to the next,
then summarizes those transitions for the selected student group.
Step 1: Detect change events
Source: detect_major_changes() in R/cones/major-changes.R.
- Filter cedar_programs to program_type == "Major" rows for the population students only.
- Sort by student_id, term. Use lag() to get each student’s program in the prior term (prev_major) and their prior academic level (prev_level).
- Flag a change when program_name != prev_major AND (is.na(prev_level) | student_level == prev_level). The level check excludes transitions between undergraduate and graduate programs: a History BA student enrolling in Law School is not a “major change” in the undergraduate sense. is.na(prev_level) lets a row with no prior level pass the level check. A student’s first record is still not flagged, because it has no prior major to differ from (see the first row of the worked example).
- Each flagged row becomes one change event with: student_id, change_term, from_major, to_major, credits_at_change (institutional credits attempted at the time).
Worked example — History student program history
| student_id | term | program_name | student_level | prev_major | result |
| S001 | 202310 | Psychology | Undergraduate | (none) | no change (first term) |
| S001 | 202380 | Psychology | Undergraduate | Psychology | no change (same major) |
| S001 | 202410 | History | Undergraduate | Psychology | ✓ change event: Psych → History |
| S001 | 202480 | History | Undergraduate | History | no change (same major) |
| S001 | 202710 | Juris Doctor | Graduate/GASM | History | ✗ excluded: level changed (UG→GR) |
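The worked example can be replayed in base R. This sketch is not detect_major_changes(): it builds the lag by hand instead of calling lag(), and it relies on which() to drop the NA comparison produced by a student's first record, which is one plausible way to get the "first term, no change" behavior shown in the table.

```r
# S001's program history from the worked example.
history <- data.frame(
  student_id    = "S001",
  term          = c(202310L, 202380L, 202410L, 202480L, 202710L),
  program_name  = c("Psychology", "Psychology", "History",
                    "History", "Juris Doctor"),
  student_level = c("Undergraduate", "Undergraduate", "Undergraduate",
                    "Undergraduate", "Graduate/GASM"),
  stringsAsFactors = FALSE
)
history <- history[order(history$student_id, history$term), ]

# Hand-rolled per-student lag: NA on each student's first row.
n <- nrow(history)
same_student <- c(FALSE, history$student_id[-1] == history$student_id[-n])
prev_major <- ifelse(same_student, c(NA, history$program_name[-n]), NA)
prev_level <- ifelse(same_student, c(NA, history$student_level[-n]), NA)

# Change = different major, and level unchanged (or no prior level).
changed <- history$program_name != prev_major &
  (is.na(prev_level) | history$student_level == prev_level)

idx <- which(changed)   # which() drops the NA from the first record
events <- data.frame(
  student_id  = history$student_id[idx],
  change_term = history$term[idx],
  from_major  = prev_major[idx],
  to_major    = history$program_name[idx],
  stringsAsFactors = FALSE
)
```

Only the 202410 row survives: the first record has no prior major, and the move to Juris Doctor fails the level check.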
Step 2: Derive focal majors
Source: mc_data reactive in R/modules/pathways.R.
Focal majors are the majors that define the selected student group —
not all majors ever held by group members. A History cohort student who also
declared Political Science should not make PolSci a focal major.
- Dept mode (e.g., HIST): all majors where dept_code == "HIST" and program_type %in% c("Major", "Second Major") in cedar_programs.
- Specific majors mode: exactly the majors the user selected in the population filters.
- Preset mode: the program_names list from the population opt.
Step 3: Filter to focal changes
From the full set of change events, keep only rows where
from_major %in% focal_programs OR to_major %in% focal_programs.
This means a History cohort sees:
- Psychology → History (arriving to History) ✓
- History → Political Science (leaving History) ✓
- Political Science → Law (made by a History student, but neither side is History) ✗ excluded
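The same three transitions, expressed as a filter. The data frame is invented for illustration; only from_major, to_major, and focal_programs are names taken from the text.

```r
focal_programs <- c("History")

# The three example transitions listed above.
changes <- data.frame(
  from_major = c("Psychology", "History", "Political Science"),
  to_major   = c("History", "Political Science", "Law"),
  stringsAsFactors = FALSE
)

# Keep a change when either side is a focal program.
is_focal <- changes$from_major %in% focal_programs |
  changes$to_major %in% focal_programs
focal_changes <- changes[is_focal, ]
```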
Step 4: Build summary outputs
Source: major_change_pathways() in R/cones/major-changes.R.
- Inflow / Outflow table: count distinct students per to_major (arrivals) and per from_major (departures) in focal changes, then filter to rows where the major is in focal_programs. Net = arrivals − departures.
- Common Pathways table: group focal changes by (from_major, to_major), count events, compute avg_credits = average inst_credits_attempted at the moment of the switch. Minimum threshold (default 3) removes rare pairs.
- Avg credits is a proxy for timing: roughly 30 credits marks the end of freshman year, 60 the end of sophomore year, and 90 the end of junior year. A History → Political Science pair at 75 credits means students are switching partway through their junior year on average.
- Trend sparkline: per-term count of arrivals (to_major %in% focal, green) and departures (from_major %in% focal, red).
- Donuts: “Leaving for” = top non-focal to_major values among departures. “Arriving from” = top non-focal from_major values among arrivals. Students cycling between focal majors are excluded from the donuts to avoid self-referential loops.
Worked example — Inflow / Outflow for a History dept cohort
| major | students arriving to | students leaving for elsewhere | net |
| History | 47 | 31 | +16 |
| History / Pre-Law | 5 | 12 | −7 |
Only History-dept programs appear. The 47 arriving students came from other majors;
the 31 departures went to other majors (shown in the “Leaving for” donut).
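A toy version of the Inflow / Outflow computation is sketched below. It assumes the table counts distinct students (as the column headers suggest) and uses invented change events, so the numbers do not match the table above; count_students() is a hypothetical helper, not part of major_change_pathways().

```r
focal_programs <- c("History", "History / Pre-Law")

# Invented focal change events; S004 leaves History and later returns.
focal_changes <- data.frame(
  student_id = c("S001", "S002", "S003", "S004", "S004", "S005"),
  from_major = c("Psychology", "History", "History / Pre-Law",
                 "History", "Political Science", "Biology"),
  to_major   = c("History", "Political Science", "Economics",
                 "Political Science", "History", "History"),
  stringsAsFactors = FALSE
)

# Distinct students per focal major on one side of the change.
count_students <- function(major_col) {
  sapply(focal_programs, function(m) {
    length(unique(
      focal_changes$student_id[focal_changes[[major_col]] == m]
    ))
  })
}

flow <- data.frame(
  major      = focal_programs,
  arrivals   = count_students("to_major"),
  departures = count_students("from_major"),
  row.names  = NULL
)
flow$net <- flow$arrivals - flow$departures
```

S004 appears in both the arrivals and departures for History, which is the churn that a net figure alone can hide.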
⚠ Known edge cases:
- A student who switched History → PolSci → History generates two change events.
Both appear in the tables. The net can mask churn.
- Pre-major → declared transitions within the same program are not flagged
as changes (same program_name, different is_pre_major flag).
- The minimum event threshold filter removes pairs with fewer than N events.
Rare pathways that may still be meaningful are hidden. Lower the threshold to see them.
Reading this with Claude or GitHub Copilot:
Each section above names the exact file and function that implements it.
To go deeper, open the file in your editor, select the function body, and ask
“explain this function” or “what does this do step by step?”
All functions have parameter descriptions in the header comment.
For a fuller picture, paste the function into Claude along with a specific question —
for example: “Why does get_switched_out_ids() use last_focal_term + 100
as an upper bound?” or “What edge cases does the enrollment-based switch detection handle
that the program-record check misses?” The code is designed to be readable; the AI fills
in the reasoning.
Methodology reflects: R/branches/population.R (group builder),
R/cones/stopout.R,
R/cones/pathway.R, R/cones/major-changes.R,
and R/modules/pathways.R (display logic).
Update this panel when cone logic changes.