# Model Benchmarks

Five models tested across reasoning, hallucination resistance, coding, and long-context recall.
Last updated: April 11, 2026 · Hardware: Apple M4 Pro (24GB unified memory, 273 GB/s bandwidth)
| Model | Accuracy (/10) | Long Context (/6) | Avg Response | Size | Type | Cost |
|---|---|---|---|---|---|---|
| Gemma3:27B | 9/10 | 4/6 | 93s | 17GB | Local | Free |
| GPT-4o-mini | 9/10 | 4/6 | 7s | — | Cloud API | Paid |
| Gemini 2.5 Flash | 9/10 | 5/6 | 12s | — | Cloud API | Free tier |
| Gemma4:e4b | 8/10 | 6/6 | 20s | 9.6GB | Local | Free |
| Gemma4:26B (API) | 9/10 | 5/6 | 23s | — | Cloud API | Free tier |
(Charts omitted: accuracy score out of 10, average response time in seconds (lower is better), and an overall-capabilities radar.)
## Methodology

### Reasoning (3 tests)
- Logic puzzle (seat arrangement)
- Ethical dilemma analysis
- Math + code verification
### Hallucination (4 tests)
- False premise rejection
- Fake person detection
- Trick math question
- Pluto reclassification
### Coding (3 tests)
- Debug broken Python code
- Algorithm design (O(n) pair sum)
- SQL query with joins + aggregates
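For reference, the O(n) pair-sum task above has a canonical hash-set solution. This is a sketch of the expected answer, not the exact grading rubric; the function name and signature are assumptions:

```python
def has_pair_with_sum(nums, target):
    """Return True if any two distinct elements of nums sum to target.

    Runs in O(n): a single pass with constant-time set lookups,
    versus O(n^2) for the naive nested-loop comparison.
    """
    seen = set()
    for x in nums:
        if target - x in seen:  # have we already seen x's complement?
            return True
        seen.add(x)
    return False

print(has_pair_with_sum([3, 8, 5, 1], 9))  # True (8 + 1)
print(has_pair_with_sum([2, 4, 6], 5))     # False
```

Models were expected to produce a single-pass solution of roughly this shape rather than the quadratic brute-force approach.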
### Long Context (6 tests)
Six questions over a 3,600-word tech trends document, including:
- Specific fact retrieval
- Cross-section math calculations
- Buried detail (appendix)