# Methodology: Sources of Formative Influence on American Children, 1900–2025

## 1. What this dataset is (and isn't)

The dataset assigns each of seven channels — Algorithm/Influencers, Internet (Non-Algorithmic), TV/Broadcast, Peers, Teachers/School, Parents, Church/Religion — a percentage share for each five-year period from 1900 through 2025, normalized to sum to 100.

It is **not** a direct empirical measurement. No instrument has continuously sampled the share of children's formative influence over a 125-year span. It is a **composite influence-allocation index**: a transparent, reproducible estimate built from documented proxy variables, period anchors, and an explicit weighting rule. Treating it as anything else is indefensible; treating it as a structured estimate with stated assumptions is defensible and useful.

The unit of analysis is the typical American child aged roughly 5–17. The geographic frame is the United States. Each row is labeled by the midpoint of its five-year window (the "1950" row represents 1948–1952).

## 2. What is being allocated

"Influence" is operationalized as the **relative share of a child's value, belief, behavior, and cognitive formation that can be attributed to a given channel**. It is not synonymous with hours of exposure. A parent's bedtime conversation and a TikTok scroll session of equal duration do not carry equal formative weight.

Each channel's score is therefore the product of two components:

- **E (Exposure)**: the share of a child's waking, formation-relevant hours spent in contact with that channel, drawn from time-use proxies.
- **W (Weight)**: a relational-primacy multiplier reflecting how much formative power the channel carries per unit of time in the period. Parents and clergy carry high W per hour because of emotional bond and authority; passive broadcast carries lower W per hour; algorithmic feeds carry elevated W because they are personalized and selected for engagement.

Raw scores S = E × W are then normalized across the seven channels to sum to 100. The 100-point sum is a heuristic, not a claim that the channels are perfectly substitutable.
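
The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not the dataset's actual build script: the function name `normalize_scores` is hypothetical, and the E and W values are placeholders chosen only to make the arithmetic easy to follow.

```python
def normalize_scores(exposure, weight):
    """Compute S = E * W per channel, then rescale so the shares sum to 100."""
    raw = {ch: exposure[ch] * weight[ch] for ch in exposure}
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("at least one channel needs a positive raw score")
    return {ch: 100.0 * s / total for ch, s in raw.items()}


# Placeholder inputs for a single period (NOT values from the dataset).
exposure = {
    "Algorithm/Influencers": 0.0,
    "Internet (Non-Algorithmic)": 0.0,
    "TV/Broadcast": 0.20,
    "Peers": 0.20,
    "Teachers/School": 0.25,
    "Parents": 0.30,
    "Church/Religion": 0.05,
}
weight = {
    "Algorithm/Influencers": 1.5,
    "Internet (Non-Algorithmic)": 1.0,
    "TV/Broadcast": 0.5,
    "Peers": 1.0,
    "Teachers/School": 1.0,
    "Parents": 2.0,
    "Church/Religion": 2.0,
}

shares = normalize_scores(exposure, weight)
```

With these placeholders, Parents has the largest share even though its exposure is only slightly above Teachers/School, because its W is twice as large.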

## 3. Proxy variables, by channel

Each channel is grounded in a time series of real-world proxies. Where data is thin (especially pre-1950), the construction is explicit interpolation between documented anchors.

**Parents**
- American Time Use Survey (2003–present) for parent-child contact hours.
- Bianchi, Robinson & Milkie, *Changing Rhythms of American Family Life* (2006) for retrospective parental time estimates, 1965–2000.
- Female labor force participation rate (BLS), used inversely as a proxy for maternal home-presence pre-1965.
- Average household size and multi-generational household share (Census).

**Church/Religion**
- Gallup self-reported weekly worship attendance, 1939–present.
- Pew Religious Landscape Studies, 2007, 2014, 2024.
- Finke & Stark, *The Churching of America 1776–2005* for denominational adherence rates pre-1939.
- Sunday school and parochial school enrollment series (NCES, denominational reports).

**Teachers/School**
- NCES historical series: school enrollment rates, average days in session per year, kindergarten participation.
- Spread of compulsory schooling laws by state (Goldin & Katz, *The Race Between Education and Technology*) for pre-1918 calibration.
- High school graduation rate as a proxy for upper-grade exposure.

**Peers**
- Larson & Verma (1999), *How Children and Adolescents Spend Time Across the World*, for cross-period peer-time benchmarks.
- Csikszentmihalyi & Larson, *Being Adolescent* (1984) for late-20th-century peer hours.
- Post-2010, peer time is partly displaced by mediated peer contact, which is allocated to Algorithm/Influencers rather than Peers (see §5).

**TV/Broadcast**
- TV household penetration, 1948–present (FCC, Nielsen).
- Nielsen average daily viewing for children 2–11 and 12–17, 1960–present.
- Pre-1948: radio listening hours from BBC/CBS audience research, treated as the "broadcast" channel.

**Internet (Non-Algorithmic)**
- Pew Internet & American Life Project, 1995–present.
- Kaiser Family Foundation *Generation M2* (2010) for 8–18 year-old media use.
- This category captures email, web browsing, search, gaming, and forum use *before* the engagement-optimized recommender feed becomes the dominant interaction mode.

**Algorithm/Influencers**
- Common Sense Media *Census of Media Use by Tweens and Teens*, 2015, 2019, 2021, 2024.
- Pew *Teens, Social Media and Technology*, 2018, 2022, 2024.
- App-level engagement (TikTok, YouTube Shorts, Instagram Reels) from Sensor Tower / data.ai.
- Treated as zero through 2009 because, while social media existed earlier, feed personalization driven by engagement-optimized ML did not dominate child-facing platforms until roughly 2010–2012 (YouTube recommendations) and 2016–2018 (TikTok's For You Page).

## 4. Anchor years and interpolation

Five anchor years are used to calibrate the whole series. Between anchors, intermediate cells are linearly interpolated unless a known event justifies a non-linear curve (e.g., TV adoption follows a logistic curve 1948–1965).

| Anchor | Reason |
|---|---|
| 1900 | Pre-broadcast, pre-universal-schooling baseline. Parents and church dominate by default. |
| 1950 | TV household penetration crosses ~9% and begins its logistic rise. |
| 1980 | Cable maturity, dual-earner household norm, "latchkey" cohort. |
| 2007 | iPhone launch — start of always-on personal mediated access. |
| 2018 | TikTok For You Page becomes the dominant short-form feed for U.S. teens (Pew). |
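
The two interpolation modes can be sketched as follows. This is an illustration under stated assumptions: the function names, the `steepness` parameter, and the anchor values in the usage lines are hypothetical and are not taken from the dataset. The logistic variant is rescaled so that it passes exactly through both endpoint values.

```python
import math


def linear_interpolate(year, y0, v0, y1, v1):
    """Straight-line value at `year` between anchors (y0, v0) and (y1, v1)."""
    if not y0 <= year <= y1:
        raise ValueError("year must lie between the two anchors")
    t = (year - y0) / (y1 - y0)
    return v0 + t * (v1 - v0)


def logistic_interpolate(year, y0, v0, y1, v1, steepness=8.0):
    """S-shaped path between the same anchors: slow start, fast middle, slow finish."""
    if not y0 <= year <= y1:
        raise ValueError("year must lie between the two anchors")
    t = (year - y0) / (y1 - y0)

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-steepness * (x - 0.5)))

    lo, hi = sigmoid(0.0), sigmoid(1.0)
    s = (sigmoid(t) - lo) / (hi - lo)  # rescaled so s(0) = 0 and s(1) = 1
    return v0 + s * (v1 - v0)


# Placeholder anchor values (NOT from the dataset).
linear_mid = linear_interpolate(1965, 1950, 10.0, 1980, 40.0)
early_logistic = logistic_interpolate(1952, 1948, 2.0, 1965, 20.0)
early_linear = linear_interpolate(1952, 1948, 2.0, 1965, 20.0)
```

Linear interpolation between two rows that each sum to 100 yields rows that also sum to 100. Once any channel follows a non-linear curve, the interpolated row generally needs to be renormalized.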

## 5. Decision rules that shape the numbers

Three modeling decisions materially affect the output and should be stated openly:

1. **Mediated peer contact is allocated to Algorithm/Influencers, not Peers, post-2015.** When a teen spends two hours in a TikTok DM thread, the medium is shaping the interaction (compression to short-form, algorithmic resurfacing, public-performance dynamics). The Peer share therefore plateaus rather than rising in the smartphone era.
2. **Broadcast radio (pre-1948) is folded into the TV/Broadcast column.** This is why the "TV/Broadcast" series shows non-zero values starting in 1920 even though television was not yet in homes.
3. **Influence is zero-sum.** The 100-point normalization forces a tradeoff. In reality, channels can be complementary (a teacher reinforcing a parent). This framing is appropriate for showing *relative shift over time* and inappropriate for claims about absolute formative quantity.
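
Rule 1 can be expressed as a small allocation function. This is a sketch only: the function name, the hour inputs, and the choice to treat "post-2015" as strictly later than the 2015 row are all assumptions made for illustration.

```python
def allocate_peer_hours(year, in_person_hours, mediated_hours, cutoff_year=2015):
    """Assign peer contact hours to exposure channels under decision rule 1.

    After the cutoff, mediated peer contact counts toward Algorithm/Influencers;
    up to and including the cutoff, it stays with Peers.
    """
    if year > cutoff_year:
        return {"Peers": in_person_hours, "Algorithm/Influencers": mediated_hours}
    return {"Peers": in_person_hours + mediated_hours, "Algorithm/Influencers": 0.0}


# Placeholder hours per day (NOT from the dataset).
before_cutoff = allocate_peer_hours(2010, in_person_hours=2.0, mediated_hours=1.0)
after_cutoff = allocate_peer_hours(2020, in_person_hours=2.0, mediated_hours=1.0)
```

The same total contact time produces a lower Peers exposure after the cutoff, which is the mechanism behind the plateau in the Peer share.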

## 6. Falsifiability checks

A defensible model has to fail loudly if its assumptions are wrong. The current series passes the following checks:

- TV/Broadcast rise from 1950–1970 tracks Nielsen TV-household penetration (≈9% → 95%).
- Church/Religion decline from 1960–2025 tracks Pew/Gallup attendance and the rise of religiously unaffiliated ("nones") from ~5% to ~28% of U.S. adults.
- Algorithm/Influencers post-2015 tracks Common Sense Media's reported teen daily screen time on short-form feeds (from <1 hour/day to ~4+ hours/day, 2015 → 2024).
- Parents' share decline 1950–2000 tracks female labor force participation (≈33% → ≈60%) and the corresponding reduction in parent-child contact hours in Bianchi et al.
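
One way to make such a check "fail loudly" is to compare a channel's series against its proxy and raise when they do not move together. The sketch below uses a Pearson correlation with a hypothetical threshold of 0.9. The function name, the threshold, the model shares, and the intermediate proxy values are illustrative placeholders; only the 1950 and 1970 penetration endpoints (≈9% and ≈95%) come from the text above.

```python
import math


def check_tracks_proxy(model, proxy, min_corr=0.9):
    """Return the Pearson correlation over shared years; raise if below min_corr."""
    years = sorted(set(model) & set(proxy))
    if len(years) < 3:
        raise ValueError("need at least three shared years")
    xs = [model[y] for y in years]
    ys = [proxy[y] for y in years]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series cannot be correlated")
    r = cov / math.sqrt(var_x * var_y)
    if r < min_corr:
        raise AssertionError(f"model does not track proxy (r = {r:.2f})")
    return r


# Placeholder series (NOT from the dataset), keyed by period label.
model_tv_share = {1950: 3, 1955: 10, 1960: 15, 1965: 17, 1970: 18}
tv_penetration = {1950: 9, 1955: 65, 1960: 87, 1965: 93, 1970: 95}

r = check_tracks_proxy(model_tv_share, tv_penetration)
```

A correlation test only confirms co-movement; it says nothing about whether the level of the share is right.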

Checks the model **does not** pass, which should be flagged:

- Pre-1930 cells for Peers and TV/Broadcast are reconstructive estimates; no continuous time-use data exists. Confidence is low for any cell before 1950.
- The single-percentage-point precision in cells like "Parents = 22" implies an accuracy the underlying proxies cannot support. A defensible presentation would round to the nearest 5 or display ±5 error bars on every cell.
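
Rounding each cell to the nearest 5 independently can leave a row that no longer sums to 100. A minimal sketch of one workaround is a largest-remainder allocation in steps of 5. The function name is hypothetical, and apart from "Parents = 22" the row values are placeholders rather than dataset cells.

```python
import math


def round_shares_to_step(shares, step=5, total=100):
    """Round non-negative shares to multiples of `step` while keeping the sum at `total`."""
    units = total // step
    scaled = {ch: value / step for ch, value in shares.items()}
    floors = {ch: math.floor(s) for ch, s in scaled.items()}
    leftover = units - sum(floors.values())
    # Hand the remaining steps to the channels with the largest fractional parts.
    by_remainder = sorted(scaled, key=lambda ch: scaled[ch] - floors[ch], reverse=True)
    for ch in by_remainder[:leftover]:
        floors[ch] += 1
    return {ch: n * step for ch, n in floors.items()}


# Placeholder row summing to 100 (only "Parents = 22" echoes the text).
row = {
    "Algorithm/Influencers": 24,
    "Internet (Non-Algorithmic)": 9,
    "TV/Broadcast": 6,
    "Peers": 18,
    "Teachers/School": 17,
    "Parents": 22,
    "Church/Religion": 4,
}
rounded = round_shares_to_step(row)
```

The trade-off is that a cell can occasionally move by slightly more than half a step so that the row total is preserved.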

## 7. Known limitations

- **U.S.-only.** Cross-national application requires re-estimating every series.
- **Average child only.** Variance by region, race, class, urban/rural, and religious tradition is real and material. A Mormon child in Utah in 2025 has a very different allocation than the national average; this model estimates only the average and says nothing about such subgroups.
- **"Influence" is not directly observed.** All defenses of the model rest on the quality of the proxy mapping, not on direct measurement.
- **Recency bias risk.** Recent categories (algorithm, influencer) are easier to measure precisely and may be over-weighted relative to older, harder-to-measure categories.
- **Normalization artifact.** Because the sum is forced to 100, any one channel's growth mechanically depresses the others, which can overstate the apparent collapse of older channels.
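
The normalization artifact is easy to reproduce with toy numbers. In the sketch below, the raw scores for Parents and Peers are held fixed while a new channel grows. The three-channel setup and all values are placeholders chosen for readability, not dataset values.

```python
def to_shares(raw):
    """Rescale raw scores so they sum to 100."""
    total = sum(raw.values())
    return {ch: 100.0 * s / total for ch, s in raw.items()}


# Placeholder raw scores (NOT from the dataset).
raw_before = {"Parents": 6.0, "Peers": 4.0, "Algorithm/Influencers": 0.0}
# Only the new channel changes; Parents and Peers keep the same raw scores.
raw_after = {"Parents": 6.0, "Peers": 4.0, "Algorithm/Influencers": 10.0}

shares_before = to_shares(raw_before)
shares_after = to_shares(raw_after)
parents_drop = shares_before["Parents"] - shares_after["Parents"]
```

Here the Parents share falls from 60 to 30 although nothing about the Parents raw score changed, which is why share declines should not be read as absolute declines.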

## 8. How to defend it in conversation

If someone challenges the data, the defensible answers are:

- "It is a composite index built from documented proxies, not a measurement. The construction rule is published and you can disagree with the weights."
- "The post-1950 series is grounded in time-use data (BLS ATUS, Nielsen, Pew, Common Sense Media). The pre-1950 series is interpolated between documented anchors and should be read with wider error bars."
- "The shape of the curves matches independent series — TV penetration, female labor force participation, religious attendance, teen screen time. If those curves are wrong, this model is wrong in the same direction."
- "It is not a claim that algorithmic feeds *cause* a 53% share of formative influence. It is a claim that, under a documented allocation rule, they account for an estimated 53% of the influence budget."

What is **not** defensible: presenting this as survey data, claiming single-percentage-point precision, or extending the same numbers to non-U.S. populations without re-estimation.
