Contents
Part One: the method
Part Two: one applied case
Part Three: references and close
How JourneyOS reads your funnel differently.
Your analytics tells you the step where users leave. It does not tell you which kind of customer left, or what they were trying to do. JourneyOS reads the same funnel data and answers those two questions, for any consumer-facing business with a funnel and customer voice.
What the method reads, and the honest limit of each signal.
Four kinds of signal carry most of the story on a consumer funnel. None is complete alone. The method reads them together and weights each against its known bias.
A · DID
What customers did. Purchase events, enrollments, cart behavior, returns, cancellations. Hard data and low bias, but the absence of action hides intent: a user who left is not the same as a user who chose to leave. The signal looks different across verticals but reads the same underneath. In fashion e-commerce, a return can tell a story about fit, style confidence, or buyer regret. In subscription fintech, a plan downgrade or churn event is the same class of signal; a downgrade marks a user who stayed but scaled back their commitment. In consumer health, an unfinished program or opt-out carries the same shape. The method reads the action, not the industry label.
B · SAID
What customers said. Reviews, surveys, support tickets, NPS responses. Direct voice but narrow: roughly 4 to 6 percent of buyers write reviews, and they lean toward extremes. Support tickets carry a different skew; users who ask for help are users who expected the product to work. Surveys over-sample already-engaged users. The method tags each source by type, not by sentiment, and reads sentiment against that source type. A five-star review from a self-selected vocal user carries different weight than a support ticket from a user who invested effort to ask for help. Flattening both into “customer feedback” hides the different truths each one can tell.
C · LOOKED
What customers looked at. Product detail page time, search terms, navigation paths, session sequences. Reveals curiosity and comparison, but not meaning. A long dwell can be deep consideration or confusion. The method reads dwell in context of other signals, never alone. Search queries are especially readable because they are verbalized intent; a user who typed a competitor’s name into your search bar has told you something a pure session path cannot. Navigation patterns are noisier; the method looks for repeated patterns across cohorts rather than drawing conclusions from single-session trails.
D · STUCK
Where customers got stuck. Session depth, re-entries, field-level friction, form abandonment. Shows where the flow breaks, but cannot separate stress from patience. The method looks for repeated friction across similar users, not one-off stuck sessions. A single user abandoning at a field might be interrupted; twenty users abandoning at the same field is a signal. Re-entries carry particular weight: a user who bounces off a page and returns within minutes tells a different story than one who returns a week later, and different again from one who never returns at all.
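The re-entry reading above can be made concrete with a toy bucketing function. This is an illustrative sketch only: the time cutoffs are assumptions drawn from the prose (minutes, a week, never), not product thresholds, and the function name is invented for the example.

```python
from datetime import timedelta
from typing import Optional

def classify_reentry(gap: Optional[timedelta]) -> str:
    """Bucket the gap between leaving a page and coming back to it."""
    if gap is None:
        return "never returned"        # hard exit: a different story entirely
    if gap <= timedelta(minutes=30):
        return "bounce-and-return"     # likely friction; the user is still trying
    if gap <= timedelta(days=7):
        return "delayed return"        # deferred decision, not a broken flow
    return "cold return"               # closer to re-engagement than continuation
```

The point of the buckets is the STUCK caveat itself: the same abandonment event lands in different buckets, and only the bucket, read against a cohort, carries meaning.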
From signals to simulation.
Reading the signals. The method does not aggregate the four signal types into one number. Each signal contributes evidence weighted by its known bias; the output is a confidence-scored distribution over customer segments, not a single winner. When the signals agree, the method reports high confidence. When they disagree, the method names the tension rather than hiding it behind an average. A disagreement is often the most useful output; it means there are at least two distinct stories happening in your funnel that were previously invisible.
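The weighting step can be sketched in a few lines. This is an illustrative toy, not JourneyOS internals: the signal names come from the four categories above, but the bias weights and the first-place agreement rule are assumptions made for the example.

```python
# Each signal contributes evidence scaled by a bias weight; disagreement
# between signals is reported as a flag, not averaged away.
SIGNAL_WEIGHT = {"did": 1.0, "said": 0.5, "looked": 0.7, "stuck": 0.8}

def segment_distribution(evidence: dict) -> tuple:
    """evidence maps signal -> {segment: raw score}.

    Returns a normalized distribution over segments plus an agreement flag:
    True when every signal ranks the same segment first.
    """
    totals: dict = {}
    for signal, scores in evidence.items():
        weight = SIGNAL_WEIGHT[signal]
        for segment, score in scores.items():
            totals[segment] = totals.get(segment, 0.0) + weight * score
    z = sum(totals.values()) or 1.0
    distribution = {segment: v / z for segment, v in totals.items()}
    # Agreement check: which segment does each signal put first?
    leaders = {max(scores, key=scores.get) for scores in evidence.values()}
    return distribution, len(leaders) == 1
```

When the flag is False, the honest output is the named tension (for example, DID points at one segment while SAID points at another), not the blended distribution alone.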
Building the customer-type ground. Validated behavioral research instruments calibrate the underlying customer-type distribution. SCOPE [1] provides the socially-grounded customer-type framework and the finding that demographics alone explain only 1.5 percent of behavioral variance. The 15-facet structure means that a segment is not reduced to a single dimension; it carries Trust, Compliance, and Altruism as separable facets. The calibration is public and auditable; the method cites its ground, and a reader of this page can follow each paper link to the source [2].
Abstaining on thin evidence. When VoC coverage for a segment falls below a set threshold, the method abstains rather than extrapolating. It reports lower confidence and names the thin segment, rather than silently filling the gap with majority-segment patterns. PersonaCite [3] frames this as a protocol: no evidence, no inference. The method flags thin segments as “low confidence”, reports what evidence is available, and recommends where the client can add coverage (longer VoC time windows, additional language segments, specific geographies) before the method can report reliably.
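The no-evidence-no-inference gate is simple enough to sketch. The 50-artifact floor is the threshold used in the applied case on this page; the function and field names below are assumptions made for the example, not JourneyOS code.

```python
VOC_ABSTENTION_THRESHOLD = 50  # artifact floor from the applied case

def assess_segment(segment: str, voc_artifacts: list) -> dict:
    """Gate inference on evidence volume: abstain below the floor."""
    n = len(voc_artifacts)
    if n < VOC_ABSTENTION_THRESHOLD:
        return {
            "segment": segment,
            "status": "abstained",
            "reason": f"{n} VoC artifacts, below the "
                      f"{VOC_ABSTENTION_THRESHOLD}-artifact floor",
            "recommendation": "extend the VoC window or add coverage "
                              "before simulating this segment",
        }
    return {"segment": segment, "status": "simulate", "evidence_count": n}
```

The key design choice is that the abstained branch still returns a structured record naming the gap and a remedy, rather than silently dropping the segment.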
Confidence scoring and output shape. Every inference the method returns carries a confidence score and a named source of that confidence. A segment-level recovery estimate looks like this: “Type A customers, estimated recovery of 22 to 43 percent, confidence 0.78, based on 312 VoC artifacts and session data from 4,800 funnel instances.” A reader sees two numbers and knows how much to trust them. Low-confidence outputs are not hidden; they are foregrounded with their caveats. The method is designed so that a head of growth reviewing the output can argue with it on specifics, rather than accepting or rejecting it as a black box.
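An output record of that shape is easy to make concrete. A minimal sketch, assuming hypothetical field names (the real output schema is not published on this page):

```python
from dataclasses import dataclass

@dataclass
class SegmentEstimate:
    segment: str            # e.g. "Type A customers"
    recovery_low: float     # lower bound of the recovery estimate
    recovery_high: float    # upper bound
    confidence: float       # 0..1, with its evidence basis alongside
    voc_artifacts: int      # VoC evidence count behind the score
    funnel_instances: int   # session evidence count behind the score

    def summary(self) -> str:
        """Render the estimate with its evidence basis attached."""
        return (f"{self.segment}, estimated recovery of "
                f"{self.recovery_low:.0%} to {self.recovery_high:.0%}, "
                f"confidence {self.confidence:.2f}, based on "
                f"{self.voc_artifacts} VoC artifacts and session data from "
                f"{self.funnel_instances:,} funnel instances.")
```

Because the evidence counts travel with the score, a reviewer can argue with a specific number ("312 artifacts is thin for this segment") instead of accepting or rejecting the estimate whole.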
The method reads more like forensic reconstruction than live observation. It rebuilds who was there from the traces they left, names where the traces are too thin to read with confidence, and gives you back a segmented population you can interrogate.
The same read across verticals.
The underlying read is vertical-agnostic. A fashion e-commerce funnel gives the method purchases, reviews, product detail page behavior, and cart friction. A subscription fintech gives it enrollment events, onboarding survey responses, plan-comparison browsing, and field-level abandonment. A consumer health or patient-education onboarding flow gives it sign-up events, intake responses, article dwell patterns, and form re-entry. The signal categories differ on the surface; the method reads them the same way, and the same limitations above apply to each.
Applied case · Anonymized Indian fashion marketplace
A 40 percent drop-off at size selection, read four ways.
The marketplace’s analytics flagged a 40 percent drop-off at size selection. The operating hypothesis was a size-chart UX issue. Here is what the method surfaced when the founding team asked it to stress-test that hypothesis against VoC data and funnel events.
Survival rate per journey step, per customer type in this funnel. Type E abstained because VoC coverage fell below the 50-artifact threshold; PersonaCite protocol applies [3].
| Type | Landing | Category | Product detail | Size selection | Add to cart | Checkout start | Shipping choice | Payment method | Review | Submit | Confirmation | Fit review | Return initiate | Return complete | Re-engagement |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Type A · The Fit-Anxious First-Timer | 92% | 87% | 80% | 72% | 41% | 38% | 35% | 33% | 30% | 28% | 26% | 24% | 22% | 20% | 19% |
| Type B · The Price-Watcher | 95% | 92% | 88% | 84% | 70% | 64% | 58% | 54% | 50% | 47% | 44% | 42% | 40% | 38% | 36% |
| Type C · The Refund-Reader | 94% | 90% | 84% | 78% | 60% | 56% | 52% | 49% | 47% | 45% | 43% | 41% | 39% | 37% | 35% |
| Type D · The Payment-Drop-Off | 96% | 93% | 89% | 85% | 79% | 73% | 68% | 62% | 56% | 50% | 44% | 40% | 36% | 33% | 30% |
| Type E · abstained (insufficient VoC coverage; method declined to simulate) | abstained | abstained | abstained | abstained | abstained | abstained | abstained | abstained | abstained | abstained | abstained | abstained | abstained | abstained | abstained |
The Fit-Anxious First-Timer
The Fit-Anxious First-Timer reaches size selection after browsing the category page twice. She re-opens the size chart three times, switches between cm and inch units, and then opens the competitor’s site in a new tab. From a real review artifact in her own words: “I was not sure my size was the same across brands, and the chart did not say.” She returns twice that week without completing checkout. The funnel step where she disappears is size selection. The behavior pattern is recheck followed by exit, not exit followed by regret. The method reads that difference, and the recovery estimate reflects what specifically a fit-confidence intervention would shift, not a generic UX hypothesis applied across the cohort.
Fit uncertainty reads as risk, and risk reads as the decision to come back tomorrow. She rarely does.
Intervention estimate
Recovery potential is probability-weighted; LTV is not modeled.
The other three types in this funnel surfaced with distinct triggers and recovery ranges: one with a price-sensitivity pattern, one with a trust-on-refunds pattern, one with a pre-purchase search-intent mismatch. Full per-type breakdowns, the complete emotional trajectory at each funnel step, and the deeper behavioral layers are walked through in a booked walkthrough rather than exhausted on this page.
| Axis (P50) | Landing | Category | Product detail | Size selection | Add to cart | Checkout start | Shipping choice | Payment method | Review | Submit | Confirmation | Fit review | Return initiate | Return complete | Re-engagement |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Valence | +0.40 | +0.50 | +0.40 | -0.10 | -0.30 | -0.20 | -0.10 | -0.20 | -0.30 | -0.20 | +0.10 | -0.10 | -0.20 | -0.10 | 0.00 |
| Arousal | +0.30 | +0.40 | +0.50 | +0.70 | +0.80 | +0.60 | +0.50 | +0.50 | +0.60 | +0.60 | +0.40 | +0.50 | +0.40 | +0.30 | +0.30 |
| Trust | +0.60 | +0.60 | +0.50 | +0.30 | +0.30 | +0.30 | +0.40 | +0.40 | +0.40 | +0.40 | +0.50 | +0.40 | +0.30 | +0.30 | +0.40 |
The method composes these published findings.
8 peer-reviewed papers across customer-type simulation, abstention protocols, and reliability assessment
- [1] SCOPE (2026). Socially-Grounded Persona Framework.
- Used for personality facet calibration and the 1.5 percent demographic variance finding that prevents demographics-first customer-type modeling. Without SCOPE, the method would over-weight demographic signals that empirically carry little behavioral information.
- [2] DeepPersona (2025). Generative Engine for Scaling Deep Synthetic Personas.
- Used to set the attribute count ceiling and the hierarchical taxonomy structure. DeepPersona shows that customer-type accuracy inverts past 250 to 300 attributes, which is why the method caps at a deliberate count rather than expanding without limit.
- [3] PersonaCite (2026). VoC-Grounded Interviewable Agentic Synthetic AI Personas.
- Used for the abstention protocol when evidence is thin. PersonaCite defines the no-evidence-no-inference rule that the method follows for under-represented segments.
- [4] PersonaFuse (2025). Personality Activation-Driven Framework.
- Used for activating personality dimensions in simulation runs. PersonaFuse allows the method to switch between personality expressions within the same session simulation, rather than fixing one static customer-type model per segment.
- [5] Park et al. (2024). Generative Agent Simulations of 1,000 People.
- Used for scaled customer-type simulation design. Park et al. established the empirical baseline for simulation scale and validated that large synthetic populations can approximate observed behavior.
- [6] NN/g Studies (2025). Three Studies on Digital Twins and Synthetic Users.
- Used for the variance compression benchmark and the individual versus population correlation finding. NN/g measured that 93.9 percent of outcome measures compressed in variance when synthetic customer types replaced real users, and that individual correlation (r = 0.105 to 0.20) is far weaker than population correlation (r = 0.98).
- [7] Reliability Assessment (2026). Assessing Reliability of Persona-Conditioned LLMs.
- Used for the 27x subgroup accuracy gap and segment-level abstention thresholds. Reliability Assessment provides the evidence for why customer-type conditioning can decrease accuracy on under-represented segments if applied naively.
- [8] Persona Generators (2026). Generating Diverse Synthetic Personas at Scale.
- Used for the customer-type diversity sampling approach. Persona Generators defines how to generate synthetic customer-type profiles that preserve population variance rather than collapsing toward the average.
Customer-level data stays inside your first-party boundary. JourneyOS does not request, purchase, or broker customer PII. The only third-party inputs are public behavioral research findings used for calibration. Details of the data handling contract and retention rules are on the security and legal pages linked below.
- Inputs
- Reviews, NPS, support tickets, funnel events you already collect.
- Storage
- Processed inside the client environment when possible; no cross-client aggregation.
- Outputs
- Confidence-scored recommendations and a named evidence source on every claim.
Book a walkthrough.
JourneyOS runs the method on a sample funnel and steps through what it surfaced. You decide what happens next.