
How We Rank

Creator: Calorie Rankings
Published: 2026-05-17

Every Calorie Rankings review and ranking is scored on the same 100-point rubric. The protocol is published below in enough detail that an outside party could replicate it.

The 100-point rubric

Scoring rubric
Criterion	Weight	What we measure
Accuracy & Database	25%	Per-entry verification, coverage, freshness, noise resilience
Logging Ease	20%	Median time-to-log across a 20-task battery; friction; recall efficiency
AI Photo Recognition	15%	Top-1/top-3 identification, portion MAPE, plate segmentation
Macro & Goal Tracking	15%	Macro depth, target flexibility, adaptive coaching algorithms
Insights & Reports	10%	Trend analysis, exportability, biometric/lab data integration
Value & Price	10%	Real 12-month cost vs feature delivery; free-tier usefulness
Privacy & Transparency	5%	Data handling, disclosure clarity, cancellation friction
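The page lists the seven weights but does not spell out how they are combined. Here is a minimal sketch that assumes each criterion is first scored on its own 0 to 100 scale and then multiplied by its weight. The dictionary keys and the `overall_score` function are hypothetical names used for illustration, not the published scoring code.

```python
# Hypothetical sketch: assumes every criterion is scored 0-100, then weighted.
RUBRIC_WEIGHTS = {
    "accuracy_database": 0.25,
    "logging_ease": 0.20,
    "ai_photo": 0.15,
    "macro_goal": 0.15,
    "insights_reports": 0.10,
    "value_price": 0.10,
    "privacy_transparency": 0.05,
}

# The weights must account for the full 100 points.
assert abs(sum(RUBRIC_WEIGHTS.values()) - 1.0) < 1e-9


def overall_score(subscores: dict[str, float]) -> float:
    """Combine per-criterion 0-100 sub-scores into a 0-100 total (assumed aggregation)."""
    missing = RUBRIC_WEIGHTS.keys() - subscores.keys()
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    return sum(weight * subscores[name] for name, weight in RUBRIC_WEIGHTS.items())
```

Under this assumption, an 80 on Accuracy & Database contributes 20 of the 100 overall points.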

How we measure accuracy

The accuracy criterion (25% of the 100-point total) is anchored to Mean Absolute Percentage Error (MAPE) against weighed reference meals. Each reference meal is built from USDA FoodData Central composition values, with every ingredient weighed on a calibrated kitchen scale (0.1 g precision). For each app, we then compute MAPE across the battery: the mean of the absolute difference between the app's predicted kcal and the reference kcal, expressed as a percentage of the reference value.
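Below is a minimal sketch of the per-battery MAPE calculation. It assumes the battery is represented as two paired lists, one of app-predicted kcal and one of reference kcal. The function name and input shape are illustrative, not the published scoring code.

```python
def mape(predicted_kcal: list[float], reference_kcal: list[float]) -> float:
    """Mean Absolute Percentage Error, in percent, of app kcal vs weighed reference kcal."""
    if len(predicted_kcal) != len(reference_kcal) or not reference_kcal:
        raise ValueError("need equal-length, non-empty lists")
    if any(ref <= 0 for ref in reference_kcal):
        raise ValueError("reference kcal must be positive")
    # Per-meal error as a fraction of the reference value.
    errors = [abs(pred - ref) / ref for pred, ref in zip(predicted_kcal, reference_kcal)]
    return 100.0 * sum(errors) / len(errors)
```

Take three meals with per-meal errors of 10%, 5% and 0%: `mape([550, 380, 720], [500, 400, 720])` returns approximately 5.0. Over- and under-estimates count equally because only the absolute error is used.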

Scoring anchor: accuracy_points = clamp(100 − MAPE × 4, 0, 100), with MAPE expressed in percentage points. A 5% MAPE earns 80 points, 15% earns 40, and 25% or more earns zero. The slope was chosen so that an app at the boundary of clinical usefulness (~5% MAPE, per Schoeller 1995) gets a strong but not perfect score.
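The anchor translates directly into code. This sketch assumes MAPE is passed in percentage points (5% is `5.0`), which is what makes the worked values come out as stated. For example, 100 - 4 × 5 = 80. The function name is illustrative.

```python
def accuracy_points(mape_percent: float) -> float:
    """Map MAPE (in percent) to the 0-100 anchor: clamp(100 - 4 * MAPE, 0, 100)."""
    return max(0.0, min(100.0, 100.0 - 4.0 * mape_percent))
```

The clamp floors every MAPE at or above 25% to zero. It also caps a perfect 0% MAPE at 100.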

Sample size, equipment model numbers, and the full reference-meal list will be published as a downloadable CSV alongside the first batch of benchmark reviews. The scoring code will be on GitHub.

How we score database quality

Database quality is measured on three sub-dimensions: coverage (a sampled 50-item probe across single ingredients, composed plates, and regional dishes), verification (4-tier grading: USDA / manufacturer label / verified user / unverified user), and noise resilience (how often a common-foods search surfaces a usable result in the top three hits).
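The noise-resilience check is a hit rate at 3: the share of probe searches with at least one usable result among the first three hits. Below is a hypothetical sketch. It assumes each common-food query maps to the app's ranked result labels, and that a tester has already marked which labels count as usable for that query. All names are illustrative.

```python
def noise_resilience(
    search_results: dict[str, list[str]],
    usable: dict[str, set[str]],
    k: int = 3,
) -> float:
    """Share of common-food queries whose top-k hits include at least one usable result."""
    if not search_results:
        raise ValueError("no queries")
    hits = sum(
        1
        for query, results in search_results.items()
        if any(label in usable.get(query, set()) for label in results[:k])
    )
    return hits / len(search_results)
```

A query whose only usable match sits in fourth place counts as a miss, because only the first `k` hits are inspected.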

How we score logging ease

Logging Ease (20% weight) is measured as the median time-to-log across a standardized 20-task battery covering five input modes:

Barcode scan → logged: target ≤ 10 seconds
Search common food → logged: target ≤ 20 seconds
Photo AI → logged (where supported): target ≤ 15 seconds
Custom food entry (first-time): target ≤ 60 seconds
Re-log a recent meal: target ≤ 5 seconds

An entry logged incorrectly (wrong food, wrong portion) counts as infinite time for that task — speed without accuracy doesn't earn points.
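Here is a minimal sketch of that rule. It assumes each task in the battery yields a time in seconds and a correct/incorrect flag, and the function name is illustrative. Python's `statistics.median` treats `math.inf` like any other float, so failed tasks sort to the top.

```python
import math
import statistics


def median_time_to_log(task_seconds: list[float], task_correct: list[bool]) -> float:
    """Median time-to-log over the battery; an incorrect entry counts as infinite time."""
    if len(task_seconds) != len(task_correct) or not task_seconds:
        raise ValueError("need one time and one correctness flag per task")
    effective = [t if ok else math.inf for t, ok in zip(task_seconds, task_correct)]
    return statistics.median(effective)
```

The median limits how much a few failed tasks can shift the result. Once 10 of the 20 tasks are logged incorrectly, though, the median itself becomes infinite.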

How we score AI photo recognition

For each AI-photo-capable app we run a 30-plate photo battery across three lighting conditions, three angles, and three plate sizes. Sub-scoring:

Top-1 identification correctness (40 of 100 AI-subscore points)
Top-3 identification correctness (20)
Portion-size MAPE (30)
Plate segmentation accuracy on multi-item plates (10)
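The point split can be sketched as follows. The page does not say how portion MAPE or segmentation accuracy are converted into points. So this sketch assumes every sub-dimension has already been reduced to a 0 to 1 performance fraction, and it shows top-k identification as the usual top-k accuracy. All names are hypothetical.

```python
# Point allocation for the AI photo sub-score (sums to 100).
AI_POINTS = {"top1": 40, "top3": 20, "portion": 30, "segmentation": 10}


def topk_accuracy(ranked_guesses: list[list[str]], truth: list[str], k: int) -> float:
    """Fraction of plates whose correct label appears in the app's first k guesses."""
    if len(ranked_guesses) != len(truth) or not truth:
        raise ValueError("need one ranked guess list per labeled plate")
    hits = sum(1 for guesses, label in zip(ranked_guesses, truth) if label in guesses[:k])
    return hits / len(truth)


def ai_photo_subscore(fractions: dict[str, float]) -> float:
    """Weight each sub-dimension's 0-1 fraction by its points (assumed conversion)."""
    return sum(points * fractions[name] for name, points in AI_POINTS.items())
```

Under these assumptions, an app with 50% top-1, 100% top-3, a portion fraction of 0.5 and no working segmentation would score 20 + 20 + 15 + 0 = 55 AI-subscore points.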

How we score macro & goal tracking

Macro tracking (15% weight) covers four sub-dimensions: macro display depth (calories, P/C/F, net carbs, fiber as first-class metrics), target-setting flexibility (custom per-macro targets, time-windowed targets), adaptive coaching algorithms (TDEE estimation, weekly target adjustment), and recipe builder quality.

How we score value & price

Value (10% weight) is computed as feature density per dollar of real annual cost. Free-tier usefulness also counts: a genuinely useful free tier raises the value sub-score, while aggressive trial-conversion pricing lowers it.

How we score privacy & transparency

Privacy (5% weight) is graded on the clarity of data-handling disclosures, the transparency of the retention policy, the ease of data export and deletion, cancellation friction, and whether the product's monetization model creates conflicts of interest with the quality of its advice to users.

Test cadence

Top-tier apps are re-tested quarterly. Mid-tier apps are re-tested semi-annually. A vendor release that changes core methodology, database source, or photo-AI model triggers a 30-day re-test window.
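A small sketch of how this cadence could be tracked. It rests on two assumptions not stated on the page: "quarterly" and "semi-annually" are approximated as 91 and 182 days, and the 30-day window means a re-test falls due within 30 days of a qualifying vendor release. The names are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

# Approximate routine intervals (assumed day counts for quarterly / semi-annual).
RETEST_INTERVAL = {"top": timedelta(days=91), "mid": timedelta(days=182)}
# Assumed meaning of the release-triggered window: re-test due within 30 days.
RELEASE_RETEST_WINDOW = timedelta(days=30)


def next_retest_due(
    last_tested: date, tier: str, major_release: Optional[date] = None
) -> date:
    """Earlier of the routine cadence date and the end of a release-triggered window."""
    due = last_tested + RETEST_INTERVAL[tier]
    # Only a qualifying release shipped after the last test can pull the date forward.
    if major_release is not None and major_release > last_tested:
        due = min(due, major_release + RELEASE_RETEST_WINDOW)
    return due
```

For a top-tier app last tested on 2026-01-01, the routine re-test would fall on 2026-04-02. A qualifying release on 2026-02-01 would pull it forward to 2026-03-03.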

Quality control

Until we publish named contributor bios, all writing and scoring is done by the editorial group and reviewed against the test data before publication. Substantive corrections are logged with the date and reason (see our corrections policy).

How we use AI

We use AI tools for research summarization and copy editing. AI does not write reviews, does not generate scores, and is never the source of a factual claim. Full disclosure: how we use AI.

Why we don't take affiliate money

We don't maintain affiliate accounts with any of the apps we cover. Our reasoning is documented in our no-affiliate disclosure.