Total Queries
132
Total Responses
792
Products Found
-
Price Violations
38
Google AI Mode
ChatGPT
Category Distribution
Source Domains
Deep Dive Analysis Available
This interactive dashboard only scratches the surface. We've published a comprehensive analysis of why AI product recommendations keep changing, and have made all our data and methodology available as an open dataset.
Semantic Consistency Analysis
What This Measures
Semantic consistency analyzes how AI systems respond to different phrasings of the same question within a query set (e.g., "best laptop" vs. "which laptop should I buy").
A query set is marked as "consistent" if the same product appears as the top recommendation in at least 70% of responses across all phrasings of the same intent.
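The 70% rule above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the representation of responses as lists of product names (top pick first) and the function name are assumptions.

```python
from collections import Counter

def is_semantically_consistent(responses, threshold=0.7):
    """Return True if the most common top recommendation appears as the
    #1 product in at least `threshold` of the responses.

    `responses` is a list of recommendation lists, one per response,
    with the top pick first (hypothetical representation)."""
    top_picks = [r[0].lower() for r in responses if r]  # case-insensitive #1 picks
    if not top_picks:
        return False
    _, count = Counter(top_picks).most_common(1)[0]  # most frequent top pick
    return count / len(top_picks) >= threshold

# Example: 4 of 5 responses lead with the same laptop -> 80% >= 70%
runs = [["MacBook Air"], ["macbook air"], ["MacBook Air"],
        ["Dell XPS 13"], ["MacBook Air"]]
print(is_semantically_consistent(runs))  # True
```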
Overall Results (Both Models Combined)
Results by Model
Google AI Mode
ChatGPT
Cross-Model Agreement Analysis
What This Measures
Cross-model agreement checks if Google AI Mode and ChatGPT recommend the same top product for identical queries. Agreement means both models suggest the same product as their #1 recommendation. This analysis uses case-insensitive product name matching to determine agreement.
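The agreement check described above reduces to a case-insensitive comparison of each model's #1 product. A sketch under the same assumed response representation (lists of product names, top pick first):

```python
def models_agree(google_response, chatgpt_response):
    """Agreement = both models name the same #1 product for an identical
    query, compared case-insensitively (hypothetical helper)."""
    if not google_response or not chatgpt_response:
        return False
    return google_response[0].strip().lower() == chatgpt_response[0].strip().lower()

print(models_agree(["iPhone 15", "Pixel 8"], ["iphone 15", "Galaxy S24"]))  # True
print(models_agree(["Pixel 8", "iPhone 15"], ["iPhone 15", "Pixel 8"]))     # False
```

Note that only the top slot matters here; two models that recommend the same products in a different order still count as disagreeing.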
Within-Model Consistency Analysis
What This Measures
Within-model consistency analyzes how often the same model gives identical product recommendations when asked the exact same question multiple times. It uses Jaccard similarity to measure product overlap between responses:
- 0% overlap = Completely different products in each response
- <50% overlap = Some products match but majority are different
- ≥50% overlap = Majority of products are the same across responses
- 100% overlap = Exact same set of products (Jaccard similarity compares sets, so ordering is not considered)
Product names are compared case-insensitively to ensure "iPhone 15" and "iphone 15" are treated as the same product.
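The Jaccard measure above is the size of the intersection of the two product sets divided by the size of their union. A minimal sketch, again assuming responses are lists of product names:

```python
def jaccard_similarity(products_a, products_b):
    """Jaccard similarity between two recommendation lists:
    |A ∩ B| / |A ∪ B| over case-insensitive product-name sets."""
    a = {p.lower() for p in products_a}
    b = {p.lower() for p in products_b}
    if not a and not b:
        return 1.0  # two empty responses are trivially identical
    return len(a & b) / len(a | b)

run1 = ["iPhone 15", "Pixel 8", "Galaxy S24"]
run2 = ["iphone 15", "Pixel 8", "OnePlus 12"]
print(jaccard_similarity(run1, run2))  # 0.5 -> 2 shared products, 4 total
```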
Google AI Mode
Average Consistency: -
Jaccard similarity score
Top Product Changes: -
Different #1 product between runs
Product Mix Changes: -
Different top 5 products between runs
Product Count Varies: -
Different number of products recommended
ChatGPT
Average Consistency: -
Jaccard similarity score
Top Product Changes: -
Different #1 product between runs
Product Mix Changes: -
Different top 5 products between runs
Product Count Varies: -
Different number of products recommended
Response Source Breakdown
Google AI Mode
Google AI Mode always provides external sources
ChatGPT
ChatGPT relies heavily on training data for recommendations
Consistency by Source Type
Google AI Mode
ChatGPT
Key Findings
Google Shows Better Within-Model Consistency
Google AI Mode shows significantly higher average consistency compared to ChatGPT. Google also has a much higher percentage of repeated queries with high product overlap (≥50%). Specific percentages are loaded dynamically from the analysis data.
Models Agree More Than ChatGPT Agrees With Itself
Cross-model agreement is actually higher than ChatGPT's self-consistency, suggesting ChatGPT's recommendations are highly volatile between runs. Specific percentages are loaded dynamically from the analysis data.
Both Models Change Top Products Frequently
The #1 recommended product changes frequently for both models, with ChatGPT showing much higher volatility than Google when the same query is repeated. Over 90% of queries see changes in their top 5 products.
Semantic Understanding is Poor
A small percentage of query sets show semantic consistency. Different phrasings of the same question (e.g., "best laptop" vs "recommend a laptop") typically yield different products.
Inconsistency Examples
Query Set Jaccard Consistency Analysis
What This Measures
This analysis uses Jaccard similarity to measure how consistent each model is when responding to different phrasings of the same question within a query set. It is a true superset of the within-model consistency analysis: it includes all the same individual response comparisons, plus additional comparisons across different phrasings.
Methodology: For each query set, we collect ALL individual responses from ALL queries (different phrasings), then calculate pairwise Jaccard similarities between every pair of responses. This includes both within-query comparisons (identical text, multiple runs) and cross-phrasing comparisons.
Superset Relationship: Individual zero-overlap rates are higher here than in within-model analysis because cross-phrasing comparisons introduce additional opportunities for complete disagreement. The actual percentages are loaded dynamically from the analysis data.
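The methodology above (pool every response from every phrasing, then average pairwise Jaccard similarities) can be sketched as follows; the function name and data layout are assumptions for illustration:

```python
from itertools import combinations

def query_set_consistency(all_responses):
    """Average pairwise Jaccard similarity over every pair of responses
    in one query set (all runs of all phrasings pooled together).
    Returns None if there are fewer than two responses to compare."""
    def jaccard(a, b):
        a, b = {p.lower() for p in a}, {p.lower() for p in b}
        return len(a & b) / len(a | b) if a | b else 1.0

    pairs = list(combinations(all_responses, 2))  # every pair, within and across phrasings
    if not pairs:
        return None
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Three responses, each sharing exactly one product with each other response
responses = [["A", "B"], ["a", "C"], ["B", "C"]]
print(round(query_set_consistency(responses), 3))  # 0.333
```

Because the pairing includes cross-phrasing comparisons that the within-model analysis never makes, this average can only pick up additional zero-overlap pairs, which is why its zero-overlap rates run higher.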
Google AI Mode - Query Set Consistency
Average Jaccard Similarity: -
Across different phrasings of same intent
Query Sets Analyzed: -
Sets with multiple phrasings
ChatGPT - Query Set Consistency
Average Jaccard Similarity: -
Across different phrasings of same intent
Query Sets Analyzed: -
Sets with multiple phrasings
Key Findings
Google Shows Better Semantic Understanding
Google AI Mode achieves higher average Jaccard similarity across different phrasings compared to ChatGPT. Google also shows a higher percentage of query sets with high consistency (≥50% overlap). Specific percentages are loaded dynamically from the analysis data.
Superset Analysis Confirms Mathematical Relationship
The individual zero-overlap rates correctly show this analysis as a true superset, with higher rates than within-model analysis for both models. Cross-phrasing comparisons introduce additional opportunities for complete disagreement, revealing that semantic variations significantly impact recommendation consistency.