Ekanta · TeleTrex

Free Download · macOS 12+

Ekanta

Your conversations stay on your machine. Full stop.

A Mac app that runs large language models entirely on your GPU — no cloud, no API keys, no data ever leaving your device. Apple Silicon and Intel Macs with powerful AMD GPUs are both fully accelerated.

Apple Silicon arm64 · 90 MB Intel Mac AMD GPU x64 · 97 MB

Requires macOS 12 Monterey or later · Models downloaded on first use and cached locally

100% Local GPU-Accelerated 13 Models OpenAI-Compatible API No Subscription

Private by design

Most AI tools route your conversations through remote servers — your questions, your context, your data. Ekanta is different. Every model runs entirely on your Mac using WebGPU acceleration. Nothing is transmitted. There is no server. Your conversations exist only on your device.

This matters for legal teams reviewing contracts, founders stress-testing strategy, clinicians thinking through cases, and anyone who should not be pasting sensitive material into a chat box connected to the internet.

Powerful models, no subscription

Ekanta ships with a curated catalogue of open-weight models optimised for Mac hardware — from fast half-billion-parameter models for quick tasks to 8B reasoning models that rival much larger cloud offerings. Models are downloaded once and cached. After that, they work offline, indefinitely, with no ongoing cost.

Llama 3.1 & 3.2 (Meta) — balanced quality and speed
DeepSeek-R1 distilled — chain-of-thought reasoning at 1.5B, 7B, and 8B
Qwen 2.5 (Alibaba) — strong coding and multilingual support
Mistral 7B — a reliable general-purpose benchmark
Gemma 2 (Google) — efficient instruction-following at 2B and 9B

A local API server, built in

Ekanta runs an OpenAI-compatible API server on your machine the moment it opens. Once a model is loaded, any tool that speaks to OpenAI — coding assistants, automation scripts, Claude Code, custom agents — can be pointed at Ekanta instead. No proxy, no configuration, no cloud bill.

The server also exposes Anthropic's /v1/messages format, so you can point Claude Code directly at Ekanta with a single environment variable and use local models as your coding assistant backend.

Built for the Mac

Ekanta uses WebGPU with Metal to accelerate inference on Apple Silicon and on Intel Macs with dedicated AMD Radeon graphics — the powerful discrete GPUs found in larger MacBook Pros and iMacs give a significant speed boost over CPU-only inference. The interface respects the macOS design language: native title bar, full keyboard navigation, and conversation history that persists across sessions.

Best for

Professionals who handle sensitive information and cannot use cloud AI services.
Developers who want a local inference backend for tools, agents, and coding assistants.
Teams subject to data-residency or compliance requirements that rule out SaaS AI.
Anyone who wants capable AI without a recurring subscription.

What you get

A fast, private chat interface with conversation history and system-prompt control.
GPU-accelerated inference for thirteen open-weight models out of the box.
An always-on local API server compatible with OpenAI and Anthropic client libraries.
Zero ongoing cost after download. No account required.

Download for Apple Silicon Download for Intel-based Mac with GPU