On-device AI Archives

Best Free LLM Models for Mobile & Edge Devices in 2025

August 3, 2025 by TechsWill

Infographic showing lightweight LLM models running on mobile and edge devices, including LLaMA 3, Mistral, and on-device inference engines on Android and iOS.

Large language models are no longer stuck in the cloud. In 2025, you can run powerful, open-source LLMs directly on mobile devices and edge chips — with no internet connection or vendor lock-in.

This post lists the best free and open LLMs available for real-time, on-device use. Each model supports inference on consumer-grade Android phones, iPhones, Raspberry Pi-like edge chips, and even laptops with modest GPUs.

📦 What Makes a Good Edge LLM?

Size: ≤ 3B parameters is ideal for edge use
Speed: inference latency under 300ms preferred
Low memory usage: fits in < 6 GB RAM
Compatibility: runs on CoreML, ONNX, or GGUF formats
License: commercially friendly (Apache, MIT)
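
The size and memory criteria above reduce to simple arithmetic: weight storage is roughly parameter count times bits per weight. The Kotlin sketch below illustrates that estimate. The names (`EdgeCandidate`, `weightGigabytes`, `fitsBudget`) and the 1 GB allowance for KV cache and runtime buffers are illustrative assumptions, not part of any library.

```kotlin
// Illustrative only: these names are hypothetical, not from llama.cpp or any SDK.
data class EdgeCandidate(
    val name: String,
    val paramsBillions: Double, // parameter count in billions
    val bitsPerWeight: Double   // average bits per weight after quantization
)

/** Approximate size of the weights alone, in GB (10^9 bytes). */
fun weightGigabytes(c: EdgeCandidate): Double =
    c.paramsBillions * c.bitsPerWeight / 8.0

/**
 * True if the weights plus a fixed allowance fit in the RAM budget.
 * overheadGb is an assumed allowance for KV cache and runtime buffers.
 */
fun fitsBudget(c: EdgeCandidate, ramBudgetGb: Double, overheadGb: Double = 1.0): Boolean =
    weightGigabytes(c) + overheadGb <= ramBudgetGb
```

For instance, `weightGigabytes(EdgeCandidate("7B at 4-bit", 7.0, 4.0))` evaluates to 3.5. Real GGUF files tend to be somewhat larger, since mixed schemes such as q4_K_M average more than 4 bits per weight.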

🔝 Top 10 Free LLMs for Mobile and Edge

1. Mistral 7B (Quantized)

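Best mix of quality and size. GGUF-quantized versions such as q4_K_M can load on a modern Android phone with 6 GB of RAM, but only barely: the benchmark below measures about 5.7 GB in use, so 8 GB devices are a safer target.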

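2. Llama 3 (8B) and Llama 3.2 (1B, 3B)

Meta’s open-weight model family. Llama 3 shipped in 8B and 70B sizes; the smaller 1B and 3B models arrived with Llama 3.2. Quantized 4-bit versions run well on Apple Silicon with llama.cpp or Core ML. Note that these models use Meta’s community license rather than Apache or MIT.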

3. Phi-2 (by Microsoft)

Compact 2.7B model tuned for reasoning. Excellent for chatbots and local summarizers on devices.

4. TinyLLaMA (1.1B)

Trained from scratch for mobile use. Works in < 2GB RAM and is ideal for micro-agents.

5. Mistral Mini (2.7B, new)

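Community-built variant of Mistral with aggressive quantization, reported at under 300MB. Treat that figure with caution: 2.7B parameters at even 2 bits per weight come to roughly 675 MB of weights.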

6. Gemma 2B (Google)

Open-weight model from Google with fast decoding. Runs on Android through Google’s MediaPipe LLM Inference API.

7. Neural Chat (Intel 3B)

ONNX-optimized. Benchmarks well on NPU-equipped Android chips.

8. Falcon-RW 1.3B

Open license and fast decoding with llama.cpp backend.

9. Dolphin 2.2 (2B, uncensored)

Instruction-tuned for broad dialog tasks. Ideal for offline chatbots.

10. WizardCoder (1.5B)

Code generation LLM for local dev tools. Runs inside VS Code plugin with < 2GB RAM.

🧰 How to Run LLMs on Device

🟩 Android

Use llama.cpp-android or llama-rs JNI wrappers
Build AICore integration using Gemini Lite runner
Convert and quantize to GGUF format with llama.cpp’s tooling; llamafile can then package a GGUF model as a single executable

🍎 iOS / macOS

Use CoreML conversion via `transformers-to-coreml` script
Run in background thread with DispatchQueue
Use Create ML or Hugging Face conversion pipelines

📊 Benchmark Snapshot (on-device)

Model	RAM Used	Avg Latency	Output Speed
Mistral 7B q4	5.7 GB	410ms	9.3 tok/sec
Phi-2	2.1 GB	120ms	17.1 tok/sec
TinyLLaMA	1.6 GB	89ms	21.2 tok/sec
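
The table reports latency and output speed separately; combining them gives a rough end-to-end time for a reply. The sketch below assumes "Avg Latency" means the delay before the first token, which the table does not state, and the function name is hypothetical.

```kotlin
/**
 * Rough end-to-end time for one reply, in seconds.
 * Assumes latencyMs is the delay before the first token (an assumption;
 * the table does not define it) and that decoding runs at a steady rate.
 */
fun estimatedReplySeconds(latencyMs: Double, tokensPerSecond: Double, outputTokens: Int): Double {
    require(tokensPerSecond > 0.0) { "tokensPerSecond must be positive" }
    require(outputTokens >= 0) { "outputTokens must not be negative" }
    return latencyMs / 1000.0 + outputTokens / tokensPerSecond
}
```

Under that assumption, the Mistral 7B q4 row (410ms, 9.3 tok/sec) puts a 93-token reply at about 10.4 seconds, while the TinyLLaMA row (89ms, 21.2 tok/sec) puts the same reply at about 4.5 seconds.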

🔐 Offline Use Cases

Medical apps (no server calls)
Educational apps in rural/offline regions
Travel planners on airplane mode
Secure enterprise tools with no external telemetry

📂 Recommended Tools

llama.cpp — C++ inference engine (Android, iOS, desktop)
transformers.js — Web-based LLM runner
GGUF Format — For quantized model sharing
lmdeploy — Model deployment CLI for edge

Debugging AI Workflows: Tools and Techniques for Gemini & Apple Intelligence

August 2, 2025 by TechsWill

Illustration of developers debugging AI prompts for Gemini and Apple Intelligence, showing token stream logs, latency timelines, and live test panels in Android Studio and Xcode.

As LLMs like Google’s Gemini AI and Apple Intelligence become integrated into mainstream mobile apps, developers need more than good prompts — they need tools to debug how AI behaves in production.

This guide covers the best tools and techniques to debug, monitor, and optimize AI workflows inside Android and iOS apps. It includes how to trace prompt failures, monitor token usage, visualize memory, and use SDK-level diagnostics in Android Studio and Xcode.

📌 Why AI Debugging Is Different

LLM output is non-deterministic — you must debug for behavior, not just bugs
Latency varies with prompt size and model path (local vs cloud)
Prompts can fail silently unless you add structured logging

Traditional debuggers don’t cut it for AI apps. You need prompt-aware debugging tools.
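
One way to keep prompts from failing silently is to write every exchange as a structured log line. The sketch below shows the idea with plain Kotlin and no JSON library. `PromptLogEntry` and its field names are hypothetical and not taken from the Gemini or Apple SDKs.

```kotlin
// Hypothetical log record: field names are illustrative, not from any SDK.
data class PromptLogEntry(
    val sessionId: String,
    val prompt: String,
    val response: String,
    val inputTokens: Int,
    val outputTokens: Int,
    val latencyMs: Long,
    val onDevice: Boolean
)

/** Escapes a string for use inside a JSON string literal. */
fun jsonEscape(s: String): String = buildString {
    for (ch in s) {
        when (ch) {
            '"' -> append("\\\"")
            '\\' -> append("\\\\")
            '\n' -> append("\\n")
            '\r' -> append("\\r")
            '\t' -> append("\\t")
            else -> if (ch < ' ') append("\\u%04x".format(ch.code)) else append(ch)
        }
    }
}

/** Renders one entry as a single JSON line, suitable for line-oriented log files. */
fun PromptLogEntry.toJsonLine(): String {
    val fields = listOf(
        "sessionId" to "\"${jsonEscape(sessionId)}\"",
        "prompt" to "\"${jsonEscape(prompt)}\"",
        "response" to "\"${jsonEscape(response)}\"",
        "inputTokens" to inputTokens.toString(),
        "outputTokens" to outputTokens.toString(),
        "latencyMs" to latencyMs.toString(),
        "onDevice" to onDevice.toString()
    )
    return fields.joinToString(separator = ",", prefix = "{", postfix = "}") { (k, v) -> "\"$k\":$v" }
}
```

One line per exchange keeps the log easy to grep and to load into an analytics table later. A production app would more likely use an established JSON serializer than hand-rolled escaping.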

🛠 Debugging Gemini AI (Android)

1. Gemini Debug Console (Android Studio Vulcan)

Tracks token usage for each prompt
Shows latency across LLM stages: input parse → generation → render
Logs assistant replies and scoring metadata


// Gemini Debug Log
Prompt: "Explain GraphQL to a 10-year-old"
Tokens: 47 input / 82 output
Latency: 205ms (on-device)
Session ID: 38f3-bc2a

2. PromptSession Logs


val session = PromptSession.create(context)
session.enableLogging(true)

Enables JSON export of prompts and responses for unit testing and monitoring.

3. Prompt Failure Types

Empty response: Token budget exceeded or vague prompt
Unstructured output: Format not enforced (missing JSON key)
Invalid fallback: Local model refused → cloud call blocked
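
The three failure types above can be detected mechanically once replies are captured. The following is a minimal sketch under stated assumptions: the enum, the function, and its parameters are hypothetical, and the key check is a naive text match rather than a real JSON parse.

```kotlin
// Hypothetical names mirroring the three failure types listed above.
enum class PromptFailure { EMPTY_RESPONSE, UNSTRUCTURED_OUTPUT, INVALID_FALLBACK }

/**
 * Returns the failure type for a reply, or null if none of the checks trip.
 * The key check is a deliberately naive text match, not a JSON parser.
 */
fun classifyFailure(
    response: String?,
    requiredKeys: List<String>,
    localRefused: Boolean = false,
    cloudAllowed: Boolean = true
): PromptFailure? {
    // Local model refused and the cloud path is blocked: nothing can answer.
    if (localRefused && !cloudAllowed) return PromptFailure.INVALID_FALLBACK
    if (response.isNullOrBlank()) return PromptFailure.EMPTY_RESPONSE
    // Look for each required key written as "key": somewhere in the reply.
    val missingKey = requiredKeys.any { key ->
        !Regex("\"" + Regex.escape(key) + "\"\\s*:").containsMatchIn(response)
    }
    return if (missingKey) PromptFailure.UNSTRUCTURED_OUTPUT else null
}
```

Tagging each logged reply with the returned value makes it possible to count failures by type instead of discovering them from user reports.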

🧪 Testing with Gemini

Use Promptfoo or Langfuse to run prompt tests
Generate snapshots for expected output
Set up replays in Gemini SDK for load testing

Sample Replay in Kotlin


val testPrompt = GeminiPrompt("Suggest 3 snacks for a road trip")
val result = promptTester.run(testPrompt).assertJsonContains("snacks")

🍎 Debugging Apple Intelligence (iOS/macOS)

1. Xcode AI Debug Panel

See input tokenization
Log latency and output modifiers
Monitor fallback to Private Cloud Compute

2. AIEditTask Testing


let task = AIEditTask(.summarize, input: text)
task.enableDebugLog()
let result = await AppleIntelligence.perform(task)

Outputs include token breakdown, latency, and Apple-provided scoring of response quality.

3. LiveContext Snapshot Viewer

Logs app state, selected input, clipboard text
Shows how Apple Intelligence builds context window
Validates whether your app is sending relevant context

✅ Common Debug Patterns

Problem: Model Hallucination

Fix: Use role instructions like “respond only with facts”
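Validate: Add sample inputs with known outputs and assert on the expected facts or structure; because output is non-deterministic, exact string equality is often too brittle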
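
Where exact equality is too strict for non-deterministic output, a looser validation is to check that a reply contains the facts you expect. The helper names below are hypothetical, and substring matching is only one possible strategy.

```kotlin
/**
 * Returns the expected facts that do not appear in the reply.
 * Matching is case-insensitive substring search, which tolerates rewording
 * around the facts but not paraphrases of the facts themselves.
 */
fun missingFacts(reply: String, expectedFacts: List<String>): List<String> =
    expectedFacts.filter { fact -> !reply.contains(fact, ignoreCase = true) }

/** Throws AssertionError listing every missing fact. */
fun assertContainsFacts(reply: String, expectedFacts: List<String>) {
    val missing = missingFacts(reply, expectedFacts)
    if (missing.isNotEmpty()) {
        throw AssertionError("Reply is missing expected facts: $missing")
    }
}
```

A check like this catches a reply that drops or contradicts a known fact, but it will not notice extra invented detail, so it complements rather than replaces human review.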

Problem: Prompt Fallback Triggered

Fix: Reduce token count or simplify nested instructions
Validate: Log sessionMode (cloud vs local) and retry
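
The fix and validation steps for fallback can be combined into one retry loop: log which path answered, and shorten the prompt while the reply keeps coming from the cloud. This is a sketch only. `SessionMode`, `ModelReply`, and `promptPreferLocal` are hypothetical stand-ins for whatever the SDK in use exposes, and the model call is passed in as a function.

```kotlin
// Hypothetical types: a real SDK would supply its own reply and mode objects.
enum class SessionMode { LOCAL, CLOUD }
data class ModelReply(val text: String, val mode: SessionMode)

/**
 * Calls the model, logs which path answered, and retries with a shortened
 * prompt while the reply comes from the cloud path.
 * Returns the first LOCAL reply, or the last reply once attempts run out.
 */
fun promptPreferLocal(
    prompt: String,
    maxAttempts: Int = 3,
    shorten: (String) -> String,
    log: (String) -> Unit = { println(it) },
    call: (String) -> ModelReply
): ModelReply {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    var current = prompt
    var reply = call(current)
    for (attempt in 1..maxAttempts) {
        log("attempt=$attempt sessionMode=${reply.mode} promptChars=${current.length}")
        if (reply.mode == SessionMode.LOCAL || attempt == maxAttempts) break
        current = shorten(current)
        reply = call(current)
    }
    return reply
}
```

Passing the model call and the shortening strategy as parameters keeps the loop testable with a fake model, which is how the fallback threshold can be exercised without a device.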

Problem: UI Delay or Flicker

Fix: Use background thread for prompt fetch
Validate: Profile using Instruments (iOS) or the Android Studio CPU Profiler, which replaced the deprecated Traceview

🧩 Tools to Add to Your Workflow

Gemini Prompt Analyzer (CLI) – Token breakdown + cost estimator
AIProfiler (Xcode) – Swift task and latency profiler
Langfuse / PromptLayer – Prompt history + scoring for production AI
Promptfoo – CLI and CI test runner for prompt regression

🔐 Privacy, Logging & User Transparency

Always log AI-generated responses with audit trail
Indicate fallback to cloud processing visually (badge, color)
Offer “Why did you suggest this?” links for AI-generated suggestions

🔬 Monitoring AI in Production

Use Firebase or BigQuery for structured AI logs
Track top 20 prompts, token overage, retries
Log user editing of AI replies (feedback loop)
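
Once structured logs exist, the metrics listed above are a small aggregation. The sketch below computes top prompts, token overage, and retries from an in-memory list. `PromptEvent`, `PromptStats`, and `summarize` are hypothetical names, and a real pipeline would more likely run the equivalent query in its analytics store.

```kotlin
// Hypothetical record of one production prompt; field names are illustrative.
data class PromptEvent(val prompt: String, val totalTokens: Int, val retries: Int)

data class PromptStats(
    val topPrompts: List<Pair<String, Int>>, // most frequent prompts with counts
    val overBudgetCount: Int,                // events exceeding the token budget
    val totalRetries: Int
)

fun summarize(events: List<PromptEvent>, tokenBudget: Int, topN: Int = 20): PromptStats {
    val top = events.groupingBy { it.prompt }
        .eachCount()
        .entries
        // Sort by count descending, then by prompt text so ties are stable.
        .sortedWith(compareByDescending<Map.Entry<String, Int>> { it.value }.thenBy { it.key })
        .take(topN)
        .map { it.key to it.value }
    return PromptStats(
        topPrompts = top,
        overBudgetCount = events.count { it.totalTokens > tokenBudget },
        totalRetries = events.sumOf { it.retries }
    )
}
```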

Integrating Google’s Gemini AI into Your Android App (2025 Guide)

June 28, 2025 by TechsWill

Illustration of a developer using Android Studio to integrate Gemini AI into an Android app with a UI showing chatbot, Kotlin code, and ML pipeline flow.

Gemini AI represents Google’s flagship approach to multimodal, on-device intelligence. Integrated deeply into Android 17 via the AICore SDK, Gemini allows developers to power text, image, audio, and contextual interactions natively — with strong focus on privacy, performance, and personalization.

This guide offers a step-by-step developer walkthrough on integrating Gemini AI into your Android app using Kotlin and Jetpack Compose. We’ll cover architecture, permissions, prompt design, Gemini session flows, testing strategies, and full-stack deployment patterns.

📦 Prerequisites & Environment Setup

Android Studio Flamingo or later (Vulcan recommended)
Gradle 8+ and Kotlin 1.9+
Android 17 Developer Preview (AICore required)
Compose compiler 1.7+

Configure build.gradle


plugins {
  id 'com.android.application'
  id 'org.jetbrains.kotlin.android'
  id 'com.google.aicore' version '1.0.0-alpha05'
}
dependencies {
  implementation("com.google.ai:gemini-core:1.0.0-alpha05")
  implementation("androidx.compose.material3:material3:1.2.0")
}

🔐 Required Permissions


<uses-permission android:name="android.permission.AI_CONTEXT_ACCESS" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.POST_NOTIFICATIONS" />

Prompt user with rationale screens using ActivityResultContracts.RequestPermission.

🧠 Gemini AI Core Concepts

PromptSession: Container for streaming messages and actions
PromptContext: Snapshot of app screen, clipboard, and voice input
PromptMemory: Maintains session-level memory with TTL and API bindings
AIAction: Returned commands from LLM to your app (e.g., open screen, send message)
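
To make the idea of session memory with a TTL concrete, here is a minimal sketch of an expiring key-value store. It is not the PromptMemory API: `TtlMemory` and its methods are hypothetical, written only to show how per-entry expiry can work.

```kotlin
/**
 * Minimal in-memory store with per-entry expiry.
 * The clock is injectable so expiry can be tested without sleeping.
 * Illustrative only: this is not the PromptMemory API.
 */
class TtlMemory<K, V>(private val nowMs: () -> Long = { System.currentTimeMillis() }) {
    private data class Entry<T>(val value: T, val expiresAtMs: Long)

    private val entries = HashMap<K, Entry<V>>()

    fun put(key: K, value: V, ttlMs: Long) {
        require(ttlMs > 0) { "ttlMs must be positive" }
        entries[key] = Entry(value, nowMs() + ttlMs)
    }

    /** Returns the value, or null if absent or expired (expired entries are evicted). */
    fun get(key: K): V? {
        val entry = entries[key] ?: return null
        if (nowMs() >= entry.expiresAtMs) {
            entries.remove(key)
            return null
        }
        return entry.value
    }
}
```

Evicting on read keeps the sketch short. A long-lived session would also want a periodic sweep so that entries which are never read again do not accumulate.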

Start a Gemini Session


val session = PromptSession.create(context)
val response = session.prompt("What is the best way to explain gravity to a 10-year-old?")
textView.text = response.generatedText

📋 Prompt Engineering in Gemini

Gemini uses structured prompt blocks to guide interactions. Use system messages to set tone, format, and roles.

Advanced Prompt Structure


val prompt = Prompt.Builder()
  .addSystem("You are a friendly science tutor.")
  .addUser("Explain black holes using analogies.")
  .build()
val reply = session.send(prompt)

🎨 UI Integration with Jetpack Compose

Use Gemini inside chat UIs, command bars, or inline suggestions:

Compose UI Example


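// Assumes the PromptSession API from the earlier snippets; prompt() is treated as a blocking call.
import androidx.compose.foundation.layout.Column
import androidx.compose.material3.Button
import androidx.compose.material3.Text
import androidx.compose.material3.TextField
import androidx.compose.runtime.*
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import kotlinx.coroutines.withContext

@Composable
fun ChatbotUI(session: PromptSession) {
  var input by remember { mutableStateOf("") }
  var output by remember { mutableStateOf("") }
  // Scope tied to this composable, so in-flight requests are cancelled when it leaves composition.
  val scope = rememberCoroutineScope()

  Column {
    TextField(value = input, onValueChange = { input = it })
    Button(onClick = {
      scope.launch {
        // Run the prompt off the main thread, then assign the result back on it.
        output = withContext(Dispatchers.IO) { session.prompt(input).generatedText }
      }
    }) { Text("Ask Gemini") }
    Text(output)
  }
}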

📱 Building an Assistant-Like Experience

Gemini supports persistent session memory and chained commands, making it ideal for personal assistants, smart forms, or guided flows.

Features:

Multi-turn conversation memory
State snapshot feedback via PromptContext
Voice input support (STT)
Real-time summarization or rephrasing

📊 Gemini Performance Benchmarks

Text-only prompt: ~75ms on Tensor NPU (Pixel 8)
Multi-turn chat (5 rounds): ~180ms per response
Streaming + partial updates: enabled by default for Compose

Use the Gemini Debugger in Android Studio to analyze tokens, latency, and memory hits.

🔐 Security, Fallback, and Privacy

All prompts processed on-device
Falls back to Gemini Cloud only if session size > 16KB
Explicit user toggle required for external calls
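
The three rules above amount to a small routing decision. The sketch below encodes them in plain Kotlin. `Route` and `routeSession` are hypothetical names, and measuring session size as the UTF-8 byte length of the session text is an assumption, since the rule does not say how size is measured.

```kotlin
enum class Route { ON_DEVICE, CLOUD, BLOCKED }

// 16 KB threshold taken from the fallback rule above.
val CLOUD_FALLBACK_BYTES = 16 * 1024

/**
 * Decides where a session runs.
 * Sessions at or under the threshold stay on-device; larger ones go to the
 * cloud only when the user has opted in, and are blocked otherwise.
 */
fun routeSession(sessionText: String, userAllowsCloud: Boolean): Route {
    // Assumption: session size is measured as UTF-8 bytes of the text.
    val sizeBytes = sessionText.toByteArray(Charsets.UTF_8).size
    return when {
        sizeBytes <= CLOUD_FALLBACK_BYTES -> Route.ON_DEVICE
        userAllowsCloud -> Route.CLOUD
        else -> Route.BLOCKED
    }
}
```

Returning an explicit `BLOCKED` value rather than silently truncating gives the UI a chance to explain why no answer arrived and to offer the cloud toggle.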

Gemini logs only anonymous prompt metadata for training opt-in. Sensitive data is sandboxed in GeminiVault.

🛠️ Advanced Use Cases

Use Case 1: Smart Travel Planner

Prompt: “Plan a 3-day trip to Kerala under ₹10,000 with kids”
Output: Budget, route, packing list
Assistant: Hooks into Maps API + calendar

Use Case 2: Code Explainer

Input: Block of Java code
Output: Gemini explains line-by-line
Ideal for edtech, interview prep apps

Use Case 3: Auto Form Generator

Prompt: “Generate a medical intake form”
Output: Structured JSON + Compose UI builder output
Gemini calls ComposeTemplate.generateFromSchema()

📈 Monitoring + DevOps

Gemini logs export to Firebase or BigQuery
Error logs viewable via Gemini SDK CLI
Prompt caching improves performance on repeated flows

📦 Release & Production Best Practices

Bundle Gemini fallback logic with offline + online tests
Gate Gemini features behind toggle to A/B test models
Use intent log viewer during QA to assess AI flow logic

Android 17 Preview: Jetpack Reinvented, AI Assistant Unleashed

June 27, 2025 by TechsWill

Illustration of Android Studio with Jetpack Compose layout preview, Kotlin code for AICore integration, foldable emulator mockups, and developer icons

Android 17 is shaping up to be one of the most developer-centric Android releases in recent memory. Google has doubled down on Jetpack Compose enhancements, large-screen support, and first-party AI integration via the new AICore SDK. The 2025 developer preview gives us deep insight into what the future holds for context-aware, on-device, privacy-first Android experiences.

This comprehensive post explores the new developer features, Kotlin code samples, Jetpack UI practices, on-device AI security, and use cases for every class of Android device — from phones to foldables to tablets and embedded displays.

🔧 Jetpack Compose 1.7: Foundation of Modern Android UI

Compose continues to evolve, and Android 17 includes the long-awaited Compose 1.7 update. It delivers smoother animations, better modularization, and even tighter Gradle integration.

Key Compose 1.7 Features

AnimatedVisibility 2.0: Includes fine-grained lifecycle callbacks and composable-driven delays
AdaptivePaneLayout: Multi-pane support with drag handles, perfect for dual-screen or foldables
LazyStaggeredGrid: New API for Pinterest-style masonry layouts
Previews-as-Tests: Now you can promote preview configurations directly to instrumented UI tests

Foldable App Sample


@Composable
fun TwoPaneUI() {
  AdaptivePaneLayout {
    pane(0) { ListView() }
    pane(1) { DetailView() }
  }
}

The foldable-first APIs allow layout hints based on screen posture (flat, hinge, tabletop), letting developers create fluid experiences across form factors.

🧠 AICore SDK: Android’s On-Device Assistant Platform

The biggest highlight of Android 17 is the introduction of AICore, Google’s new on-device assistant framework. AICore allows developers to embed personalized AI assistants directly into their apps — with no server dependency, no user login required, and full integration with app state.

AICore Capabilities

Prompt-based AI suggestions
Context-aware calls to action
Knowledge retention within app session
Fallback to local LLMs for longer queries

Integrating AICore in Kotlin


val assistant = rememberAICore()
val reply = assistant.prompt("What does this error mean?")
LaunchedEffect(reply) {
  resultView.text = reply.result
}

Apps can register their own knowledge domains, feed real-time app state into AICore context, and bind UI intents to assistant actions. This enables smarter onboarding, form validation, user education, and troubleshooting.

🛠️ MLKit + Jetpack Compose + Android Studio Vulcan

Google has fully integrated MLKit into Jetpack Compose for Android 17. Developers can now use drag-and-drop machine learning widgets in Jetpack Preview Mode.

MLKit Widgets Now Available:

BarcodeScannerBox
PoseOverlay (for fitness & yoga apps)
TextRecognitionArea
Facial Landmark Overlay

Android Studio Vulcan Canary 2 adds an AICore debugger, foldable emulator, and trace-based Compose previewing — allowing you to see recomposition latency, AI task latency, and UI bindings in real time.

🔐 Privacy and Local Execution

All assistant tasks in Android 17 run locally by default using the Tensor APIs and Android Runtime (ART) sandboxed extensions. Google guarantees:

No persistent logs are saved after prompt completion
No network dependency for basic suggestion/command functions
Explicit permission prompts for calendar, location, microphone use

This new model dramatically reduces battery usage, speeds up AI response times, and brings offline support for real-world scenarios (e.g., travel, remote regions).

📱 Real-World Developer Use Cases

For Productivity Apps:

Generate smart templates for tasks and events
Auto-suggest project summaries
Use MLKit OCR to recognize handwritten notes

For eCommerce Apps:

Offer FAQ-style prompts based on the product screen
Generate product descriptions using AICore + session metadata
Compose thank-you emails and support messages in-app

For Fitness and Health Apps:

Pose analysis with PoseOverlay
Voice-based assistant: “What’s my next workout?”
Auto-track activity goals with notification summaries

🧪 Testing, Metrics & DevOps

AICore APIs include built-in telemetry support. Developers can:

Log assistant usage frequency (anonymized)
See latency heatmaps per prompt category
View prompt failure reasons (token limit, no match, etc.)
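
A latency view per prompt category can be approximated offline from exported samples. The sketch below computes p50 and p95 per category using the nearest-rank method. `LatencySample` and both function names are hypothetical; the post does not show AICore's actual telemetry schema.

```kotlin
// Hypothetical telemetry sample; the real telemetry schema is not shown in this post.
data class LatencySample(val category: String, val latencyMs: Long)

/** Nearest-rank percentile (p in 1..100) of a non-empty list. */
fun percentile(values: List<Long>, p: Int): Long {
    require(values.isNotEmpty()) { "values must not be empty" }
    require(p in 1..100) { "p must be in 1..100" }
    val sorted = values.sorted()
    // Integer ceiling of p * n / 100 gives the 1-based rank without float rounding.
    val rank = (p * sorted.size + 99) / 100
    return sorted[rank - 1]
}

/** Maps each prompt category to its (p50, p95) latency in milliseconds. */
fun latencyByCategory(samples: List<LatencySample>): Map<String, Pair<Long, Long>> =
    samples.groupBy({ it.category }, { it.latencyMs })
        .mapValues { (_, ms) -> percentile(ms, 50) to percentile(ms, 95) }
```

Percentiles are usually more informative than averages here, because a handful of slow prompts can hide behind a healthy mean.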

Everything integrates into Firebase DebugView and Logcat. AICore also works with Espresso test runners and Jetpack Compose UI tests.

✅ Final Thoughts

Android 17 is more than just an update — it’s a statement. Google is telling developers: “Compose is your future. AI is your core.” If you’re building user-facing apps in 2025 and beyond, Android 17’s AICore, MLKit widgets, and foldable-ready Compose layouts should be the foundation of your design system.
