Mental Health Apps: An Evaluation Framework Based on What the Evidence Actually Shows

73 popular mental health apps audited; 67% transmit user data to third parties. The APA framework for app evaluation. What 'evidence-based' actually means in marketing copy vs RCT data.

By Maya Reyes

Most app reviews you’ll find compare features. We don’t think that’s the right frame. The mental-health app category is unusual in that the implementation quality and the privacy practices matter at least as much as the feature list — and the marketing claims usually run several steps ahead of the actual evidence base.

This page is the framework we use to evaluate apps in this niche. The framework is built from three sources: the comparative-efficacy meta-analyses, the APA’s evaluation rubric (Torous et al., 2018), and the privacy audit literature. If you only read one section, read the privacy one.

What Apps Actually Do for Outcomes

Linardon and colleagues’ 2019 meta-analysis is the cleanest aggregate evidence we have on mental-health app effects. Across 66 RCTs, n=8,665, comparing app-supported smartphone interventions to control conditions:

Depression: Hedges’ g = 0.28 (small-to-moderate effect)
Anxiety: g = 0.30
Stress: g = 0.35

Effects were larger when the app delivered explicit CBT content (g~0.40+), included a human support element (g~0.49), and was used over ≥4 weeks. Pure self-guided apps without human contact showed smaller effects (g~0.15–0.20) [Source: https://pubmed.ncbi.nlm.nih.gov/31333840/].
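For readers unfamiliar with the statistic, here is a minimal sketch of how Hedges' g is computed for a single two-arm trial: the difference in group means divided by the pooled standard deviation, times a small-sample correction. The function name and the trial numbers are hypothetical, chosen only to land near the depression estimate above. They are not taken from any study in the meta-analysis.

```python
import math


def hedges_g(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Standardized mean difference (a minus b) with Hedges' small-sample correction."""
    df = n_a + n_b - 2
    # Pooled standard deviation across the two arms.
    pooled_sd = math.sqrt(((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df)
    cohens_d = (mean_a - mean_b) / pooled_sd
    # Common approximation of the correction factor J.
    correction = 1 - 3 / (4 * df - 1)
    return cohens_d * correction


# Hypothetical trial: symptom scores where lower is better, so we take
# control minus app to get a positive number when the app helps.
g = hedges_g(mean_a=12.0, mean_b=10.6, sd_a=5.0, sd_b=5.0, n_a=100, n_b=100)
print(round(g, 2))  # 0.28
```

In this made-up example, g = 0.28 corresponds to a 1.4-point difference on a scale whose scores have a standard deviation of 5. That is roughly the size of gap the pooled depression estimate describes.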

Firth and colleagues’ 2017 meta-analysis focused specifically on apps for depressive symptoms (18 RCTs, n=3,414). Same pattern: g = 0.38 overall; standalone self-guided g = 0.27; with human feedback g = 0.49. Apps featuring CBT components outperformed apps using only mindfulness or pure mood tracking [Source: https://pubmed.ncbi.nlm.nih.gov/28645133/].

The headline interpretation: apps work, modestly, and they work substantially better when there’s a human in the loop or when they implement a specific evidence-based protocol (CBT structured modules, scheduled exposure exercises, behavioral activation calendars). They don’t work, or don’t work durably, as standalone treatment for moderate-to-severe presentations.

The other binding constraint is engagement. Lecomte and colleagues’ 2020 umbrella review of mental-health app meta-analyses noted that median 4-week active retention across consumer apps falls below 5% — meaning the real-world effect sizes are likely smaller than the RCT effects (where adherence is artificially boosted by trial structure) [Source: https://pubmed.ncbi.nlm.nih.gov/32213470/]. App reviews should weight long-term engagement design heavily: notification discipline, modular short sessions, gamification quality.

The APA Evaluation Framework

Torous and colleagues (2018) — writing for the American Psychiatric Association — established a 5-level evaluation framework that’s now the standard in academic app reviews [Source: https://pubmed.ncbi.nlm.nih.gov/29565049/]. Most consumer apps fail at level 2 or 3.

Level 1: Access, cost, data ownership. Free apps are very rarely actually free in the data sense. Paid apps usually have less misaligned business models. Open-source / cooperative apps are rare but exist (Cura, Daylio’s predecessors).

Level 2: Privacy, security, data sharing. Where most apps fail. (See next section.)

Level 3: Clinical foundation, evidence base. This is where the marketing-vs-reality gap is widest. Wasil and colleagues (2020) audited the 10 most-downloaded mental-health apps for depression/anxiety against published RCT evidence. 0 of 10 had RCT-grade evidence supporting their core mechanism in their actual app form; 3 had RCTs of related but not identical components; 7 made therapeutic claims unsupported by direct app-version trials [Source: https://pubmed.ncbi.nlm.nih.gov/32066997/]. “Evidence-based” in marketing copy frequently means “uses CBT terminology” rather than “RCT of this specific product.”

Level 4: Usability, engagement. Low friction is the killer feature. The app you actually open at month four is the one that opens fastest and asks for the least input. Apps with onboarding flows or daily check-in screens before you can log a mood get abandoned by week two. If you can’t open it and complete a useful interaction in 15 seconds, you won’t open it at month four.

Level 5: Data integration with care system. The strongest evidence-based pattern in mental health apps right now: human therapist + scaffolding app + clinician sees the data on a shared dashboard. The combination is qualitatively different from either alone. Apps like Lyra Health, Headway, and the NHS Talking Therapies digital pathway implement this. Standalone consumer apps without this loop are inherently limited.
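One way to record an evaluation against the five levels is sketched below. It treats the levels as ordered gates and reports the first one an app fails. That gating rule is our simplifying assumption for illustration, not something the framework prescribes, and the names `APA_LEVELS` and `first_failed_level` are hypothetical.

```python
from typing import Dict, Optional

# Level names as listed in this section.
APA_LEVELS = [
    "Access, cost, data ownership",
    "Privacy, security, data sharing",
    "Clinical foundation, evidence base",
    "Usability, engagement",
    "Data integration with care system",
]


def first_failed_level(passed: Dict[int, bool]) -> Optional[int]:
    """Return the lowest level (1-5) the app fails, or None if it passes all.

    A level missing from `passed` counts as a failure (assumption: an
    unassessed level should not be waved through).
    """
    for level in range(1, len(APA_LEVELS) + 1):
        if not passed.get(level, False):
            return level
    return None


# Hypothetical app: fine on access, fails on privacy.
review = {1: True, 2: False, 3: True, 4: True, 5: False}
failed = first_failed_level(review)
if failed is not None:
    print(f"Fails at level {failed}: {APA_LEVELS[failed - 1]}")
```

Reporting the first failure rather than a summed score keeps a polished interface from offsetting a privacy problem lower in the stack.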

Privacy: The Unmentioned Variable

This is where consumer mental-health apps are genuinely underregulated in 2026, and where the typical app review fails its readers. If you’re going to type your worst thoughts into a piece of software, you should know what happens to that data.

Marshall and colleagues (2020) audited 73 popular depression/anxiety apps for privacy compliance:

67% transmitted user data (mood entries, journal text, location, device identifiers) to third-party servers for analytics or advertising
Only 21% had a clear data-deletion option
Only 36% encrypted user data at rest and in transit
Free apps were significantly more likely to share sensitive mental-health data with advertisers than paid apps [Source: https://pubmed.ncbi.nlm.nih.gov/30876818/]

Mozilla’s “Privacy Not Included” annual report continues to find similar patterns; the situation has slowly improved since 2020 but the floor is still low.

A practical 10-minute due diligence checklist before downloading:

Does the privacy policy say data stays on-device? Look for explicit language. Vague “we may share with partners” is a red flag.
Is data shared with advertising or analytics partners? Search the privacy policy for “advertising,” “marketing partners,” “third party,” “Facebook,” “Google Analytics,” “AppsFlyer,” “Mixpanel.”
Is there a data-deletion option, and how long does it take? Privacy policies that don’t mention deletion are not GDPR-compliant for EU users and are unsafe in any jurisdiction.
Free or freemium? If free, the business model relies on either ads (data as inventory) or upselling (engagement metrics over outcomes). Paid apps are usually less misaligned.
Has a researcher or journalist independently audited the app? Mozilla’s Privacy Not Included database and journalism in outlets like The Markup are useful sources.
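The keyword search in the checklist can be partly automated once you have pasted a privacy policy into a text file or string. The sketch below counts case-insensitive matches for the terms listed above. The function name `find_red_flags` and the sample policy text are hypothetical. A hit is a prompt to read that passage closely, not a verdict, and a clean result does not show that an app is safe.

```python
import re

# Search terms taken from the checklist above.
RED_FLAG_TERMS = [
    "advertising",
    "marketing partners",
    "third party",
    "Facebook",
    "Google Analytics",
    "AppsFlyer",
    "Mixpanel",
]


def find_red_flags(policy_text):
    """Return {term: match count} for each term found in the policy text."""
    hits = {}
    for term in RED_FLAG_TERMS:
        # Let multi-word terms match across spaces or hyphens ("third-party").
        pattern = r"[\s-]+".join(re.escape(word) for word in term.split())
        matches = re.findall(pattern, policy_text, flags=re.IGNORECASE)
        if matches:
            hits[term] = len(matches)
    return hits


# Hypothetical policy excerpt.
sample = "We may share usage data with third-party advertising partners and Mixpanel."
print(find_red_flags(sample))
# {'advertising': 1, 'third party': 1, 'Mixpanel': 1}
```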

Apps we generally rate well on privacy: those with on-device processing only (Daylio, Reflectly’s local-only mode), apps from researcher-led groups (Sleepio, Wysa’s CBT modules), and NHS-approved digital therapeutics (the bar there is meaningfully higher).

Apps we rate poorly: most “free” emotional support chatbots without explicit privacy commitments, apps that integrate with Facebook/Apple Health by default, anything that prompts for journal-style free text without explaining server-side storage.

For specific app comparisons — Headspace vs Calm, Daylio vs Moodfit, Sleepio for CBT-I, Wysa for CBT-style chat — see the comparison and long-tail articles indexed from this site’s homepage. None of this is medical advice. If you’re in crisis, call 988 (US) or 116 123 (UK Samaritans).