100% on-device · No servers · No subscription

Neo Mambí IA Chat

An AI assistant that runs inside your phone. Gemma 4, Qwen 2.5 and DeepSeek-R1 with no servers, no foreign card and no VPN.

Built for Cuba and for anywhere paying for ChatGPT isn't an option or internet isn't reliable. Privacy by architecture.

Download APK · Android

Available for Android on Google Play or as a direct APK download.

01 · The idea

The AI lives on your phone, not in someone else's cloud

Traditional AI apps send every word you type to a distant server. That costs money, eats mobile data, and routes your conversations through systems you don't control.

IA Chat flips that: you download the model once, and from then on all inference happens on your device. Zero uploads. Zero subscription.

Cloud AI vs. local AI

Cloud

✕ Foreign card needed
✕ Monthly subscription
✕ Always-on internet
✕ Chats leave your country
✕ Geo-blocked

On-device

✓ No card
✓ $0/month
✓ Works offline
✓ Data stays on your phone
✓ Works in Cuba

02 · Available models

Four models: pick the one that fits your phone

All of them run directly on-device with LiteRT-LM. Switch models at any time depending on the task or the available RAM.

Recommended

Google

Gemma 4 E2B

Multimodal

The perfect balance of size and capability. Accepts text, image and audio. Extended 32K-token context. Multi Token Prediction (MTP) for faster decoding.

🖼️ Image · 🎙️ Audio · 🧠 Reasoning

~2.4 GB · 8 GB RAM

Google

Gemma 4 E4B

Multimodal · More powerful

Gemma 4's bigger sibling. More parameters and better answer quality, with the same image, audio and reasoning support. For phones with plenty of RAM.

🖼️ Image · 🎙️ Audio · 🧠 Reasoning

~3.4 GB · 12 GB RAM

Alibaba

Qwen 2.5 1.5B

Text · Lightweight

A small, fast text model from Alibaba, optimized for Spanish-language assistance. Fits on phones with less RAM and responds with low latency.

💬 Chat · ⚡ Fast

~1.5 GB · 6 GB RAM

DeepSeek

R1 Distill Qwen 1.5B

Text · Reasoning

A model distilled from DeepSeek-R1 that shows its chain of thought step by step. Ideal for problems that need analysis or math.

💬 Chat · 🧠 Reasoning

~1.7 GB · 6 GB RAM

Models are downloaded from Hugging Face (litert-community). Once downloaded, they work offline.
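As a rough illustration of "pick the one that fits your phone", the sketch below filters the four models by device RAM and free storage. The figures are copied from the cards above; reading the first number as download size and the second as recommended device RAM is an assumption, and `Model`, `CATALOG` and `models_that_fit` are hypothetical names, not part of the app.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    download_gb: float  # assumed meaning of the "~X GB" figure on each card
    ram_gb: int         # assumed meaning of the "N GB RAM" figure on each card


# Figures copied from the model cards above.
CATALOG = [
    Model("Gemma 4 E2B", 2.4, 8),
    Model("Gemma 4 E4B", 3.4, 12),
    Model("Qwen 2.5 1.5B", 1.5, 6),
    Model("R1 Distill Qwen 1.5B", 1.7, 6),
]


def models_that_fit(device_ram_gb: float, free_storage_gb: float) -> list[str]:
    """Return the names of models whose RAM and storage needs the phone meets."""
    return [
        m.name
        for m in CATALOG
        if m.ram_gb <= device_ram_gb and m.download_gb <= free_storage_gb
    ]
```

For example, `models_that_fit(8, 10)` leaves out only Gemma 4 E4B, which lists 12 GB of RAM.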

03 · Capabilities

More than a chat. A complete assistant.

Persistent chat

Conversations are saved locally. Come back to a session weeks later and pick up where you left off.

Image, audio & PDF

Attach a photo and ask about it. Record audio and summarize it. Upload a PDF and ask questions, all without leaving the phone.

RAG over your documents

Index your PDFs and notes with MediaPipe. Ask a question and get answers that cite the source document. All offline.
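To give a feel for what "RAG over your documents" means, here is a toy sketch of the retrieve-then-prompt idea. It is not the app's pipeline: it replaces MediaPipe embeddings with a simple bag-of-words cosine similarity, and `embed`, `retrieve`, `build_prompt` and the chunk layout are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[dict], top_n: int = 2) -> list[dict]:
    """Return the top_n chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:top_n]


def build_prompt(question: str, chunks: list[dict]) -> str:
    """Put the retrieved chunks, tagged with their source, ahead of the question."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in retrieve(question, chunks))
    return (
        "Answer using only the context and cite the source in brackets.\n"
        f"{context}\nQuestion: {question}"
    )
```

Tagging each chunk with its source file is what lets the model cite the document in its answer; everything here runs locally, in keeping with the offline claim.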

Extended thinking

With DeepSeek-R1 or Gemma 4 (thinking mode) you see the step-by-step reasoning in collapsible blocks. Useful for long problems.

Regenerate & compare

Not happy with the answer? Regenerate with a different seed or switch models, then browse between versions.

Tune the sampler

Top-K, Top-P, temperature and context window. Pro-grade tuning for those who want it.
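For readers wondering what those knobs do, this is a minimal sketch of one common way to combine temperature, Top-K and Top-P when picking the next token. It is illustrative only and is not LiteRT-LM's implementation; `sample_token` and its default values are assumptions. Passing a seeded `rng` also shows why regenerating with another seed can produce a different answer.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=40, top_p=0.95, rng=None):
    """Pick one token index from raw logits using temperature, Top-K and Top-P."""
    rng = rng or random.Random()
    if temperature <= 0:
        # Treat a non-positive temperature as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature: scale logits before softmax; lower values sharpen the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-K: keep only the k most probable tokens (at least one).
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranked = ranked[: max(1, top_k)]

    # Top-P: keep the shortest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # random.choices accepts unnormalised weights, so no explicit renormalisation.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

Lower temperature and smaller Top-K or Top-P make answers more predictable; higher values make them more varied.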

04 · Why trust it

"100% local" isn't marketing

These are four technical guarantees enforced by the architecture, not just by our word.

Inference on the phone's CPU/GPU/NPU

Built on LiteRT-LM (Google AI Edge) and AICore. Every token prediction happens on your phone's chip, with no AI server in the loop.

Network used only for the initial download

Models are downloaded once from Hugging Face. After that you can put the phone in airplane mode and the app keeps working.

Zero telemetry, zero analytics

We send no data anywhere about what you ask, which models you use or how often you open the app. We don't track who uses it.

Your chats live with you

Conversations, imported documents and audio are stored locally in a SQLite database. Delete them at any time from inside the app.
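As a sketch of what local SQLite storage can look like, the example below keeps chat messages in a single table and wipes them on request. The schema and the function names (`open_store`, `add_message`, `history`, `wipe`) are hypothetical; the app's actual database layout is not documented here.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local database file; nothing leaves the device."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY, session TEXT NOT NULL, "
        "role TEXT NOT NULL, content TEXT NOT NULL)"
    )
    return db


def add_message(db: sqlite3.Connection, session: str, role: str, content: str) -> None:
    """Append one message to a session."""
    db.execute(
        "INSERT INTO messages (session, role, content) VALUES (?, ?, ?)",
        (session, role, content),
    )
    db.commit()


def history(db: sqlite3.Connection, session: str) -> list[tuple[str, str]]:
    """Return a session's messages in the order they were written."""
    rows = db.execute(
        "SELECT role, content FROM messages WHERE session = ? ORDER BY id",
        (session,),
    )
    return rows.fetchall()


def wipe(db: sqlite3.Connection) -> None:
    """Delete every stored message."""
    db.execute("DELETE FROM messages")
    db.commit()
```

Because the data sits in one local file, resuming a session weeks later and deleting everything are both plain local operations.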

05 · Available now

The AI that fits in your pocket,
asking nothing in return

Now available on Google Play and as a direct APK for Android. Download it, install it and start chatting with no servers or subscriptions.

Download APK · See the rest of the ecosystem

IA Chat was built on top of the open-source project Google AI Edge Gallery, whose work and open license we gratefully acknowledge.