Total Queries
132
Total Responses
792
Products Found
-
Price Violations
38
Google AI Mode
ChatGPT
Category Distribution
Source Domains
Deep Dive Analysis Available
This interactive dashboard only scratches the surface. We've published a comprehensive analysis of why AI product recommendations keep changing, and have made all our data and methodology available as an open dataset.
Semantic Consistency Analysis
What This Measures
Semantic consistency analyzes how AI systems respond to different phrasings of the same question within a query set (e.g., "best laptop" vs. "which laptop should I buy").
A query set is marked as "consistent" if the same product appears as the top recommendation in at least 70% of responses across all phrasings of the same intent.
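The 70% rule above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the representation of responses as lists of product names (top pick first) and the function name are assumptions.

```python
from collections import Counter

def is_semantically_consistent(responses, threshold=0.7):
    """Return True if the most common top recommendation appears as the
    #1 product in at least `threshold` of the responses.

    `responses` is a list of recommendation lists, one per response,
    with the top pick first (hypothetical representation)."""
    top_picks = [r[0].lower() for r in responses if r]  # case-insensitive #1 picks
    if not top_picks:
        return False
    _, count = Counter(top_picks).most_common(1)[0]  # most frequent top pick
    return count / len(top_picks) >= threshold

# Example: 4 of 5 responses lead with the same laptop -> 80% >= 70%
runs = [["MacBook Air"], ["macbook air"], ["MacBook Air"],
        ["Dell XPS 13"], ["MacBook Air"]]
print(is_semantically_consistent(runs))  # True
```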
Overall Results (Both Models Combined)
Results by Model
Google AI Mode
ChatGPT
Cross-Model Agreement Analysis
What This Measures
Cross-model agreement checks if Google AI Mode and ChatGPT recommend the same top product for identical queries. Agreement means both models suggest the same product as their #1 recommendation. This analysis uses case-insensitive product name matching to determine agreement.
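The agreement check described above reduces to a case-insensitive comparison of each model's #1 product. A sketch under the same assumed response representation (lists of product names, top pick first):

```python
def models_agree(google_response, chatgpt_response):
    """Agreement = both models name the same #1 product for an identical
    query, compared case-insensitively (hypothetical helper)."""
    if not google_response or not chatgpt_response:
        return False
    return google_response[0].strip().lower() == chatgpt_response[0].strip().lower()

print(models_agree(["iPhone 15", "Pixel 8"], ["iphone 15", "Galaxy S24"]))  # True
print(models_agree(["Pixel 8", "iPhone 15"], ["iPhone 15", "Pixel 8"]))     # False
```

Note that only the top slot matters here; two models that recommend the same products in a different order still count as disagreeing.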
Within-Model Consistency Analysis
What This Measures
Within-model consistency analyzes how often the same model gives identical product recommendations when asked the exact same question multiple times. It uses Jaccard similarity to measure product overlap between responses:
- 0% overlap = Completely different products in each response
- <50% overlap = Some products match but majority are different
- ≥50% overlap = Majority of products are the same across responses
- 100% overlap = Exact same set of products (Jaccard similarity compares sets, so ordering is not considered)
Product names are compared case-insensitively to ensure "iPhone 15" and "iphone 15" are treated as the same product.
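The Jaccard measure above is the size of the intersection of the two product sets divided by the size of their union. A minimal sketch, again assuming responses are lists of product names:

```python
def jaccard_similarity(products_a, products_b):
    """Jaccard similarity between two recommendation lists:
    |A ∩ B| / |A ∪ B| over case-insensitive product-name sets."""
    a = {p.lower() for p in products_a}
    b = {p.lower() for p in products_b}
    if not a and not b:
        return 1.0  # two empty responses are trivially identical
    return len(a & b) / len(a | b)

run1 = ["iPhone 15", "Pixel 8", "Galaxy S24"]
run2 = ["iphone 15", "Pixel 8", "OnePlus 12"]
print(jaccard_similarity(run1, run2))  # 0.5 -> 2 shared products, 4 total
```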
Google AI Mode
Average Consistency: -
Jaccard similarity score
Top Product Changes: -
Different #1 product between runs
Product Mix Changes: -
Different top 5 products between runs
Product Count Varies: -
Different number of products recommended
ChatGPT
Average Consistency: -
Jaccard similarity score
Top Product Changes: -
Different #1 product between runs
Product Mix Changes: -
Different top 5 products between runs
Product Count Varies: -
Different number of products recommended
Response Source Breakdown
Google AI Mode
Google AI Mode always provides external sources
ChatGPT
ChatGPT relies heavily on training data for recommendations
Consistency by Source Type
Google AI Mode
ChatGPT
Key Findings
Google Shows Better Within-Model Consistency
Google AI Mode shows significantly higher average consistency compared to ChatGPT. Google also has a much higher percentage of repeated queries with high product overlap (≥50%). Specific percentages are loaded dynamically from the analysis data.
Models Agree More Than ChatGPT Agrees With Itself
Cross-model agreement is actually higher than ChatGPT's self-consistency, suggesting ChatGPT's recommendations are highly volatile between runs. Specific percentages are loaded dynamically from the analysis data.
Both Models Change Top Products Frequently
The #1 recommended product changes frequently for both models, with ChatGPT showing much higher volatility than Google when the same query is repeated. Over 90% of queries see changes in their top 5 products.
Semantic Understanding is Poor
A small percentage of query sets show semantic consistency. Different phrasings of the same question (e.g., "best laptop" vs "recommend a laptop") typically yield different products.
Inconsistency Examples
Query Set Jaccard Consistency Analysis
What This Measures
This analysis uses Jaccard similarity to measure how consistent each model is when responding to different phrasings of the same question within a query set. It is a true superset of the within-model consistency analysis: it includes all the same individual response comparisons, plus additional comparisons across different phrasings.
Methodology: For each query set, we collect ALL individual responses from ALL queries (different phrasings), then calculate pairwise Jaccard similarities between every pair of responses. This includes both within-query comparisons (identical text, multiple runs) and cross-phrasing comparisons.
Superset Relationship: Individual zero-overlap rates are higher here than in within-model analysis because cross-phrasing comparisons introduce additional opportunities for complete disagreement. The actual percentages are loaded dynamically from the analysis data.
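The methodology above (pool every response from every phrasing, then average pairwise Jaccard similarities) can be sketched as follows; the function name and data layout are assumptions for illustration:

```python
from itertools import combinations

def query_set_consistency(all_responses):
    """Average pairwise Jaccard similarity over every pair of responses
    in one query set (all runs of all phrasings pooled together).
    Returns None if there are fewer than two responses to compare."""
    def jaccard(a, b):
        a, b = {p.lower() for p in a}, {p.lower() for p in b}
        return len(a & b) / len(a | b) if a | b else 1.0

    pairs = list(combinations(all_responses, 2))  # every pair, within and across phrasings
    if not pairs:
        return None
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Three responses, each sharing exactly one product with each other response
responses = [["A", "B"], ["a", "C"], ["B", "C"]]
print(round(query_set_consistency(responses), 3))  # 0.333
```

Because the pairing includes cross-phrasing comparisons that the within-model analysis never makes, this average can only pick up additional zero-overlap pairs, which is why its zero-overlap rates run higher.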
Google AI Mode - Query Set Consistency
Average Jaccard Similarity: -
Across different phrasings of same intent
Query Sets Analyzed: -
Sets with multiple phrasings
ChatGPT - Query Set Consistency
Average Jaccard Similarity: -
Across different phrasings of same intent
Query Sets Analyzed: -
Sets with multiple phrasings
Key Findings
Google Shows Better Semantic Understanding
Google AI Mode achieves higher average Jaccard similarity across different phrasings compared to ChatGPT. Google also shows a higher percentage of query sets with high consistency (≥50% overlap). Specific percentages are loaded dynamically from the analysis data.
Superset Analysis Confirms Mathematical Relationship
The individual zero-overlap rates correctly show this analysis as a true superset, with higher rates than within-model analysis for both models. Cross-phrasing comparisons introduce additional opportunities for complete disagreement, revealing that semantic variations significantly impact recommendation consistency.