Research at Parts Companion · An Evolve.org company

The IFS evidence base deserves real measurement.
We intend to build it.

Parts Companion is developing the first validated, observer-rated, automatable process measure for Internal Family Systems therapy — built on what we believe is the largest corpus of de-identified and research-released IFS session recordings in existence, and validated in the open, with the field.

Why

We start from a conviction, not a market.

Parts Companion is built by Evolve, an organization dedicated to the evolution of human consciousness — to the reduction of inner suffering and the facilitation of genuine personal transformation. We believe the great inner technologies — disciplined, teachable methods for working with the interior life — deserve the same infrastructural seriousness the world grants its outer technologies.

Internal Family Systems is among the most promising of these. It gives people a workable map of their own inner world, and it gives clinicians a humane, non-pathologizing method that practitioners and clients love. But a modality only reaches the people who need it when institutions trust it — and institutions trust evidence.

So our research program is not a side project or a marketing exercise. It is the most direct contribution we can make to the mission: help IFS earn the institutional standing its clinical reality deserves, by building the measurement infrastructure that every future trial, grant application, and training program will need.

The state of the field

IFS is widely practiced, deeply loved — and absent from every list that allocates legitimacy.

0
peer-reviewed IFS studies identified by the 2025 scoping review
2
randomized controlled trials of IFS, ever
0
appearances on current clinical guidelines or EST lists
17
items in the field's only fidelity scale — volunteer-built in 2014
Where IFS stands with the institutions that matter
VA/DoD PTSD Guideline (2023): absent; first-line treatments are CPT, PE, and EMDR
APA PTSD Guideline (2025): absent; recommends CPT, PE, CT, and TF-CBT
NICE (UK): absent from PTSD and depression guidance
APA Division 12 EST list: not listed at any tier
SAMHSA NREPP: listed in 2015, but the registry was discontinued in 2018, leaving the designation orphaned
Evidence accumulated since each modality's first RCT
ACT · First RCT 1986 · ~1,000 RCTs by 2022
EMDR · First RCT 1989 · ~44 adult RCTs · first-line since 2017–18
DBT · First RCT 1991 · program-level adoption across systems
IFS · First RCT 2013 · 2 RCTs total
IFS today is roughly where EMDR stood in 1992 — a decade before its first guideline listing. EMDR closed that gap with trials. Every credible trial requires validated process and fidelity measurement.
The measurement gap
IFS has a therapist-fidelity checklist (the 17-item IFS Adherence Scale, 2014) and client self-report instruments. It has no validated observer-rated, session-level process measure of the constructs the model itself defines as its mechanism of change — unblending, Self-energy, the Self-to-part relationship, progression through the 6 Fs, unburdening and its durability. No research group anywhere has published NLP- or transcript-based process measurement for IFS. The niche is empty, and the field's own institutions have said so.
The precedent

Fields move when measurement arrives.

Motivational Interviewing did not earn its trial base by charisma. The MITI coding system became the de facto standard for MI trial integrity, and the modality's evidence co-evolved with its measurement. CBT's competence measures played the same standardizing role. Fidelity reporting is now effectively required for a fundable psychotherapy trial.

And observational coding is the documented pain point: slow, expensive, dependent on extensively trained human raters, and "impractical for routine mental health care," in the literature's own words. The last five years have shown that automation can close that gap: LLM- and ML-based fidelity scoring, benchmarked against human consensus, published in venues from Implementation Science to JAMA Psychiatry.

IFS never got its MITI. We intend to build it, automated from day one and validated to the standards of the process-research literature: human inter-rater reliability first, machine agreement second, outcome linkage last, with each claim gated on the evidence behind it.

What we bring

A data asset and an instrument the field has never had.

Parts Companion records, transcribes, and structures real IFS sessions as its core product. Years of that work have produced what we believe is the largest repository of de-identified and research-released IFS therapy session recordings anywhere — longitudinal, naturalistic arcs of real clients over months of work, not lab vignettes. On top of it, we have built the apparatus a measurement program needs.

A longitudinal corpus
Full courses of therapy in sequence — the substrate for trajectory research no cross-sectional dataset can support. Research use restricted to consented or Safe-Harbor de-identified sessions, always.
A three-track instrument
An event schema descending from the Experiencing Scale, APES, and NEPCS traditions: experiential movement, IFS process, and clinical-functional change — with required disconfirmers and durability tracking.
An annotation portal
Working infrastructure where IFS clinicians review transcripts turn-by-turn, verify or reject machine-detected events, and produce the blind human ground truth that reliability studies require.
An automated pipeline
LLM detectors covering the full adherence-scale construct space and beyond — capable of 100% session coverage at a per-session cost that makes fidelity measurement routine instead of prohibitive.
The research program

Four studies, in the order the literature demands.

Our protocol follows the fifty-year playbook of psychotherapy process research — and the published sequencing of the groups that did this credibly for other modalities: measurement validity first, outcomes later. Nothing is claimed ahead of its study.

01 · Human reliability
Can trained IFS clinicians code these process markers consistently?
Independent double-coding of a stratified session set by IFS-trained clinicians, trained to criterion against a shared codebook, with pre-specified agreement thresholds (κ ≥ .60, ICC ≥ .60).
02 · AI-rater validation
Does the machine measure what clinicians measure?
LLM extraction benchmarked against blind human consensus on a sealed, held-out session set never used in development — the Atkins/Imel and ieso design, applied to IFS for the first time.
03 · Longitudinal trajectories
Do reliably-coded markers show systematic change across a course of therapy?
Within-client trajectories across long naturalistic arcs: client-initiated unblending, somatic access, enacted between-session agency — with pre-registered directional predictions.
04 · Criterion validity
Does measured process track independent clinical outcomes?
Gated, by design, on independent outcome anchors — the study we intend to run with institutional partners whose trials carry outcome measurement we cannot and should not produce alone.
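To make Study 01's pre-specified gate concrete, here is a minimal sketch of Cohen's kappa for two clinicians who each assign one nominal code per transcript turn, checked against the κ ≥ .60 threshold. The marker labels and the function names (`cohens_kappa`, `passes_reliability_gate`) are hypothetical illustrations, not the program's codebook or tooling. The ICC criterion for continuous ratings is a different statistic and is not shown here.

```python
from collections import Counter
import math

KAPPA_THRESHOLD = 0.60  # pre-specified threshold stated for Study 01


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters who each assign one nominal code per unit."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of units")
    n = len(rater_a)
    # Observed agreement: share of units where both raters chose the same code.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement implied by each rater's marginal code frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    codes = freq_a.keys() | freq_b.keys()
    p_e = sum(freq_a[c] * freq_b[c] for c in codes) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical code everywhere: kappa is undefined.
        return math.nan
    return (p_o - p_e) / (1 - p_e)


def passes_reliability_gate(rater_a: list[str], rater_b: list[str],
                            threshold: float = KAPPA_THRESHOLD) -> bool:
    # NaN compares False, so an undefined kappa fails the gate.
    return cohens_kappa(rater_a, rater_b) >= threshold


# Hypothetical turn-level codes from two clinicians on the same six turns.
clinician_1 = ["unblending", "none", "unblending", "somatic", "none", "none"]
clinician_2 = ["unblending", "none", "somatic", "somatic", "none", "unblending"]
print(round(cohens_kappa(clinician_1, clinician_2), 3))  # 0.5
print(passes_reliability_gate(clinician_1, clinician_2))  # False
```

The two clinicians agree on four of six turns, but kappa discounts the agreement expected by chance from their code frequencies, so the pair lands at 0.5 and would not clear the pre-specified threshold.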
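Study 03's "pre-registered directional predictions" can be illustrated with a deliberately simple check: fit an ordinary least-squares slope of one marker's per-session rate against session order, then ask whether its sign matches the direction fixed before analysis. This is a sketch under stated assumptions. The per-session marker rate and the names `check_trajectory` and `TrajectoryCheck` are hypothetical, and a real analysis across many clients would more likely use a multilevel model than per-client OLS.

```python
from dataclasses import dataclass


@dataclass
class TrajectoryCheck:
    client_id: str
    slope: float              # change in marker rate per session (OLS)
    matches_prediction: bool  # slope sign agrees with the pre-registered direction


def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys on xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def check_trajectory(client_id: str, marker_rates: list[float],
                     predicted_direction: int) -> TrajectoryCheck:
    """marker_rates: per-session rate of one coded marker, in session order.
    predicted_direction: +1 (expected increase) or -1 (expected decrease),
    fixed in the pre-registration before any data are examined."""
    if predicted_direction not in (1, -1):
        raise ValueError("predicted_direction must be +1 or -1")
    sessions = list(range(1, len(marker_rates) + 1))
    slope = ols_slope(sessions, marker_rates)
    return TrajectoryCheck(client_id, slope, slope * predicted_direction > 0)


# Hypothetical client whose client-initiated unblending rate rises over four sessions.
result = check_trajectory("P-01", [0.10, 0.15, 0.20, 0.30], predicted_direction=+1)
print(round(result.slope, 3), result.matches_prediction)  # 0.065 True
```

A flat trajectory (slope exactly zero) counts as not matching either direction, which keeps the check conservative.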
Where we are today — stated precisely
We built an extraction instrument, hardened it across three internal validation rounds, and established convergent consistency on a multi-client pilot spanning more than a hundred sessions. We are now beginning formal reliability and validity studies. We do not claim a validated measure of therapeutic progress, and we will not until the studies that could support that claim are done.

We publish what didn't work

Our pilot's most instructive results were negative. Dramatic in-session "breakthroughs" did not corroborate against clients' own later reports — while quiet signals did: enacted between-session behavior change, growing somatic access, sustained experiential depth. Raw "Self-energy" estimates inflate for IFS-fluent clients. We regard findings like these as assets, and they will appear in anything we publish.

Fluent clients can't fool the good signals

The hardest problem in IFS process measurement is that every marker is forgeable in a single session by a fluent, Self-like managerial part. Our instrument is built around that problem: claims are coded as claims, confirmation happens only against the arc, and our pilot's most promising session-level measure discriminated genuine deepening from fluent performance in both low- and high-fluency test clients.
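One way to express "claims are coded as claims, confirmation happens only against the arc" as a data rule is sketched below. It assumes a hypothetical event schema with four evidence types (claim, observed, null, disconfirmer) and a simplified corroboration rule: a claim counts only if a strictly later session shows the behavior itself and no later session disconfirms it. The enum values, marker names, and `score_claims` are illustrative, not the instrument's actual codebook.

```python
from dataclasses import dataclass
from enum import Enum


class Evidence(Enum):
    CLAIM = "claim"                # a stated shift ("I'm totally in Self now")
    OBSERVED = "observed"          # the transcript shows the behavior itself
    NULL = "null"                  # "nothing moved", recorded as a first-class code
    DISCONFIRMER = "disconfirmer"  # evidence against the marker


@dataclass(frozen=True)
class Event:
    session: int        # ordinal position in the client's arc
    marker: str         # hypothetical marker name, e.g. "unblending"
    evidence: Evidence


def score_claims(arc: list[Event]) -> list[tuple[int, str, bool]]:
    """Return (session, marker, corroborated) for every CLAIM in the arc.

    Simplified rule: a claim is corroborated only if a strictly later session
    contains an OBSERVED event for the same marker and no later session
    contains a DISCONFIRMER for it. Same-session evidence never counts.
    """
    results = []
    for claim in (e for e in arc if e.evidence is Evidence.CLAIM):
        later = [e for e in arc
                 if e.marker == claim.marker and e.session > claim.session]
        observed = any(e.evidence is Evidence.OBSERVED for e in later)
        disconfirmed = any(e.evidence is Evidence.DISCONFIRMER for e in later)
        results.append((claim.session, claim.marker, observed and not disconfirmed))
    return results


arc = [
    Event(1, "unblending", Evidence.CLAIM),
    Event(1, "self_energy", Evidence.CLAIM),
    Event(2, "unblending", Evidence.OBSERVED),
    Event(2, "self_energy", Evidence.OBSERVED),
    Event(3, "self_energy", Evidence.NULL),
    Event(4, "self_energy", Evidence.DISCONFIRMER),
]
print(score_claims(arc))  # [(1, 'unblending', True), (1, 'self_energy', False)]
```

Because same-session evidence is ignored, a fluent part that both claims and performs a shift within one session cannot self-corroborate; only the later arc can.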

How we hold the data

Therapy transcripts are among the most sensitive data that exist. We act like it.

The research program runs under a formal protocol whose ethics section is binding on every study. These are its load-bearing commitments:

Consent or de-identification — nothing else
Research data enters through exactly two lawful paths: an affirmative client-level research release, or HIPAA Safe Harbor de-identification (45 CFR §164.514) with pseudonymous longitudinal linkage and no retained re-link key. A practitioner's toggle is never treated as a client's authorization.
No positive bias, by construction
Disconfirming and null events are required codes. "Nothing moved" is a first-class observation. A client claiming progress is coded as a claim, not a demonstration, unless the transcript corroborates it.
Insider exclusion
Arcs belonging to project insiders are excluded from all validation samples, and the dual role of company personnel as investigators and data subjects is disclosed, not buried.
Pre-registration and held-out sets
Confirmatory hypotheses are pre-registered on OSF before analysis. Held-out evaluation sets are sealed before human consensus coding begins. Post-hoc detector changes void the set.
Conflict-of-interest mitigations, stated up front
We are a commercial entity evaluating measurements our own product produces. Mitigations: blind human ground truth, independent raters, pre-registration, published negative findings, and a clinical co-lead with claims sign-off.
Safety before annotation
Every session is screened before clinical annotators see it; a ratified escalation policy for high-acuity historical content is a precondition of the reliability studies, not an afterthought.
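The "sealed before human consensus coding begins; post-hoc detector changes void the set" commitment can be made mechanically checkable. A minimal sketch, assuming the seal is a SHA-256 digest over the sorted held-out session IDs plus a frozen detector version string, and that the digest is recorded somewhere public such as the pre-registration. The version string and function names are hypothetical; in practice one would likely hash the detector prompts, model identifiers, and configuration themselves rather than a label.

```python
import hashlib
import hmac
import json


def seal_heldout_set(session_ids: list[str], detector_version: str) -> str:
    """SHA-256 digest over the held-out session IDs and the frozen detector version."""
    payload = json.dumps(
        {"sessions": sorted(session_ids), "detector": detector_version},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def set_still_valid(published_seal: str, session_ids: list[str],
                    detector_version: str) -> bool:
    """False if the session list or the detector version changed after sealing."""
    current = seal_heldout_set(session_ids, detector_version)
    return hmac.compare_digest(current, published_seal)


seal = seal_heldout_set(["s-017", "s-003", "s-042"], "detectors-v1.4")
print(set_still_valid(seal, ["s-003", "s-017", "s-042"], "detectors-v1.4"))  # True
print(set_still_valid(seal, ["s-003", "s-017", "s-042"], "detectors-v1.5"))  # False
```

Sorting the IDs makes the seal independent of listing order, while any change to the membership or the detector version produces a different digest and voids the set.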
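"Pseudonymous longitudinal linkage and no retained re-link key" describes a specific trade-off, sketched minimally below. The assumption is that a whole arc is released in a single export pass: each client receives a random pseudonym (not a hash of the client ID, so the label is not derived from an identifier), the mapping exists only inside the function call, and nothing is persisted that could map a pseudonym back to a client. The record keys and `pseudonymize_export` are hypothetical; removal of Safe Harbor identifiers from the transcript text itself is assumed to happen upstream and is not shown.

```python
import secrets


def pseudonymize_export(sessions: list[dict]) -> list[dict]:
    """Replace client IDs with random per-client pseudonyms for one export.

    Each input dict uses hypothetical keys: "client_id", "session_index"
    (ordinal position in the arc, standing in for calendar dates), and
    "transcript" (assumed already scrubbed of Safe Harbor identifiers).
    """
    # Lives only inside this call and is never persisted: once the function
    # returns, no key exists to map pseudonyms back to clients.
    mapping: dict[str, str] = {}
    released = []
    for s in sessions:
        if s["client_id"] not in mapping:
            # Random, not a hash of the ID, so the pseudonym is not derived from it.
            mapping[s["client_id"]] = "P-" + secrets.token_hex(8)
        released.append({
            "pseudonym": mapping[s["client_id"]],
            "session_index": s["session_index"],
            "transcript": s["transcript"],
        })
    return released


export = pseudonymize_export([
    {"client_id": "c-9", "session_index": 1, "transcript": "[scrubbed transcript]"},
    {"client_id": "c-4", "session_index": 1, "transcript": "[scrubbed transcript]"},
    {"client_id": "c-9", "session_index": 2, "transcript": "[scrubbed transcript]"},
])
```

The cost of discarding the key is that sessions released in a later, separate export cannot be joined to an earlier arc; under this sketch, longitudinal linkage holds only within one export.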
For research institutions

We cannot — and should not — do this alone.

Analytical validation is something we can produce; clinical validation is something only the field's trial-running institutions can anchor. The organizations that fund, run, and govern IFS research hold exactly what this program needs, and we hold exactly what their trials have lacked. That is a trade worth a conversation.

Process measurement for outcome-anchored trials
Trials in the IFS line currently lack a session-level process measure. Our pipeline can code recorded sessions turn-by-turn, enabling the mediation and mechanism analyses the published trials could not run — while their outcome anchors supply the criterion validity no one can produce alone.
Co-development and co-publication
The measurement paper this field needs should not be written by a company alone. We bring the codebook, the corpus, and the annotation infrastructure; partners bring clinical-scientific governance and methodological authority.
Hosted sub-studies
A research fellow or graduate researcher can run a bounded reliability or validity sub-study on our annotation portal and corpus — low commitment, real data, publishable scope.
A validation dossier built to the V3 standard
Verification, analytical validation, clinical validation (Goldsack et al., npj Digital Medicine 2020) — structured so any trial's PI and IRB can evaluate fitness-for-purpose without taking our word for anything.
Go deeper

The full papers behind this page.

Everything summarized above is developed at length in a set of internal research documents. We share them with research institutions, prospective collaborators, and serious colleagues — available upon request.

If you are working on the IFS evidence base, we would like to meet you.

Researchers, research funders, training institutions, trial teams: the whitepapers, the codebook, and an honest account of where the program stands are yours for the asking.

Write to us directly: hello@partscompanion.org