Designing Privacy-First Architecture
Why local-first processing matters, and how we built Chai.im and Hippo with privacy as a core architectural principle, not an afterthought.
Privacy shouldn't be a feature—it should be fundamental to how systems are built. At GrepLabs, we design architecture with privacy at the core, not as an afterthought. This article explains our approach and why it matters.
The Problem with Traditional Architecture
Most modern applications follow a simple pattern:
- Collect user data
- Send it to servers
- Process it in the cloud
- Return results
This model has serious problems:
- **Data breaches**: Centralized data stores are attractive targets
- **Surveillance**: Companies can analyze and monetize user behavior
- **Lock-in**: Your data lives on someone else's servers
- **Latency**: Every operation requires a network round-trip
Our Core Principles
1. Local-First Processing
Process data on the user's device whenever possible. This isn't just about privacy—it's about performance and resilience.
Benefits:
- **Privacy**: Data never leaves your device
- **Speed**: No network latency
- **Offline**: Works without internet
- **Cost**: Reduced server infrastructure
Implementation in Hippo:
```typescript
// Local vector embedding generation
async function indexFile(file: File) {
  // All processing happens locally
  const content = await extractText(file);
  const chunks = chunkText(content, 512);

  // Local AI model generates embeddings
  const embeddings = await localModel.embed(chunks);

  // Store in local SQLite database
  await localDb.insert('documents', {
    path: file.path,
    embeddings,
    metadata: extractMetadata(file)
  });
}
```

2. End-to-End Encryption
When data must leave the device, encrypt it so only the intended recipient can read it.
Chai.im's Signal Protocol Implementation:
The Signal Protocol provides:
- **Perfect forward secrecy**: Compromising one key doesn't reveal past messages
- **Deniability**: A third party cannot cryptographically prove who sent a specific message
- **Asynchronous**: Works even when recipients are offline
```typescript
// Simplified encryption flow
async function sendMessage(recipient: string, plaintext: string) {
  // Get recipient's public key
  const recipientKey = await keyStore.getPublicKey(recipient);

  // Generate ephemeral key pair for this message
  const ephemeralKey = generateKeyPair();

  // Derive shared secret
  const sharedSecret = deriveSharedSecret(
    ephemeralKey.privateKey,
    recipientKey
  );

  // Encrypt with AES-256-GCM; the nonce is an input to encryption,
  // so it must be generated before the encrypt call
  const nonce = generateNonce();
  const ciphertext = encrypt(plaintext, sharedSecret, nonce);

  // Send encrypted message
  return {
    ephemeralPublic: ephemeralKey.publicKey,
    ciphertext,
    nonce
  };
}
```

3. Minimal Data Collection
Only collect what's absolutely necessary, and be transparent about what you collect.
What we collect:
- Anonymous usage metrics (opt-in)
- Crash reports (opt-in)
- Account email (for authentication only)
What we don't collect:
- Message content
- File contents
- Search queries
- Browsing history
- Contact lists
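Minimal collection is easiest to enforce when it is default-deny in code. The sketch below is illustrative, not our production telemetry pipeline; names like `TelemetryEvent` and `recordMetric` are hypothetical.

```typescript
// Hypothetical sketch: metrics are dropped unless the user has explicitly
// opted in, and only allow-listed fields can ever leave the process.
interface TelemetryEvent {
  name: string;
  // Only coarse, non-identifying values are permitted
  value?: number;
}

function recordMetric(event: TelemetryEvent, optedIn: boolean): TelemetryEvent | null {
  // Default-deny: without explicit opt-in, nothing is recorded at all
  if (!optedIn) return null;
  // Copy only the allow-listed fields, dropping anything else
  return { name: event.name, value: event.value };
}
```

Because the opt-in check sits in the data path rather than in a settings screen, a UI bug cannot silently re-enable collection.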
4. Transparency Through Open Source
Open source code ensures trust through verifiability. Anyone can audit our implementations and verify our privacy claims.
Deep Dive: Chai.im Architecture
Chai.im is our enterprise encrypted messaging platform. Here's how privacy is baked into every layer:
Message Flow
```
┌──────────────┐    E2E Encrypted     ┌──────────────┐
│    Sender    │ ──────────────────→  │    Server    │
│    Device    │                      │   (Relay)    │
└──────────────┘                      └──────────────┘
        │                                     │
        │ Local AI                            │ Encrypted
        │ Processing                          │ Storage
        ▼                                     ▼
┌──────────────┐                      ┌──────────────┐
│   Summary    │                      │  Recipient   │
│  Generated   │                      │    Device    │
└──────────────┘                      └──────────────┘
```

Key Points:
- Server never sees plaintext messages
- AI summaries generated locally
- Server only relays encrypted blobs
- Metadata minimized (no IP logging)
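The relay's core invariant can be sketched in a few lines: the server reads only a routing identifier and queues opaque bytes. The types and class below are illustrative, not Chai.im's actual server code.

```typescript
// Illustrative store-and-forward relay: the server never inspects message
// contents, only the routing identifier. Names here are hypothetical.
interface EncryptedEnvelope {
  recipient: string;       // routing identifier, the only field the server reads
  ciphertext: Uint8Array;  // opaque to the server
  ephemeralPublic: Uint8Array;
  nonce: Uint8Array;
}

class Relay {
  private queues = new Map<string, EncryptedEnvelope[]>();

  // Queue the encrypted blob for the recipient without decrypting it
  accept(envelope: EncryptedEnvelope): void {
    const queue = this.queues.get(envelope.recipient) ?? [];
    queue.push(envelope);
    this.queues.set(envelope.recipient, queue);
  }

  // Recipient drains its queue when it comes online
  drain(recipient: string): EncryptedEnvelope[] {
    const queue = this.queues.get(recipient) ?? [];
    this.queues.delete(recipient);
    return queue;
  }
}
```

Because decryption keys exist only on end devices, even a fully compromised relay yields nothing but ciphertext and routing metadata.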
Authentication: FIDO2/WebAuthn
We use hardware security keys and biometrics instead of passwords:
```typescript
// Registration flow
async function registerDevice() {
  const credential = await navigator.credentials.create({
    publicKey: {
      challenge: serverChallenge,
      rp: { name: "Chai.im" },
      user: { id: userId, name: userEmail },
      pubKeyCredParams: [
        { type: "public-key", alg: -7 },  // ES256
        { type: "public-key", alg: -257 } // RS256
      ],
      authenticatorSelection: {
        authenticatorAttachment: "platform",
        userVerification: "required"
      }
    }
  });
  return sendToServer(credential);
}
```

Benefits:
- No passwords to phish
- Hardware-bound credentials
- Biometric verification
- Resistant to credential stuffing
HIPAA Compliance
For healthcare customers, we provide:
- Audit logging (encrypted)
- Access controls
- Message retention policies
- Business Associate Agreements (BAAs)
- Compliance documentation
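Audit logs are only useful if they are tamper-evident. One common construction, sketched here, is hash-chaining: each entry commits to the hash of the previous one, so edits or deletions break verification. This is an illustrative sketch with hypothetical names, not our production logging code (which is additionally encrypted at rest).

```typescript
import { createHash } from "node:crypto";

// Tamper-evident audit logging via hash-chaining (illustrative sketch)
interface AuditEntry {
  timestamp: string;
  actor: string;
  action: string;
  prevHash: string; // hash of the previous entry, or "genesis" for the first
  hash: string;     // hash over this entry's fields plus prevHash
}

function appendEntry(log: AuditEntry[], actor: string, action: string): AuditEntry[] {
  const prevHash = log.length ? log[log.length - 1].hash : "genesis";
  const timestamp = new Date().toISOString();
  const hash = createHash("sha256")
    .update(`${timestamp}|${actor}|${action}|${prevHash}`)
    .digest("hex");
  return [...log, { timestamp, actor, action, prevHash, hash }];
}

function verifyChain(log: AuditEntry[]): boolean {
  return log.every((entry, i) => {
    const expectedPrev = i === 0 ? "genesis" : log[i - 1].hash;
    const recomputed = createHash("sha256")
      .update(`${entry.timestamp}|${entry.actor}|${entry.action}|${entry.prevHash}`)
      .digest("hex");
    return entry.prevHash === expectedPrev && entry.hash === recomputed;
  });
}
```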
Deep Dive: Hippo Architecture
Hippo is our local-first file organizer with AI-powered semantic search.
Local Processing Pipeline
```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│     File     │ ──→ │     Text     │ ──→ │    Vector    │
│   Watcher    │     │  Extractor   │     │  Embedding   │
└──────────────┘     └──────────────┘     └──────────────┘
                                                 │
                                                 ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│    Search    │ ←── │    Query     │ ←── │    Local     │
│   Results    │     │    Engine    │     │    SQLite    │
└──────────────┘     └──────────────┘     └──────────────┘
```

Everything runs locally:
- File watching (native OS APIs)
- Text extraction (Rust libraries)
- Vector embeddings (ONNX runtime)
- SQLite with vector extensions
- Search ranking algorithms
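At query time, ranking reduces to comparing the query's embedding against each document's. In production this runs inside SQLite's vector extensions; the plain-TypeScript version below just shows the underlying math, with hypothetical names.

```typescript
// Cosine similarity between two embedding vectors of equal length
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every document against the query embedding and keep the top k
function rank(query: number[], docs: { path: string; embedding: number[] }[], k: number) {
  return docs
    .map(d => ({ path: d.path, score: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Since similarity is computed over fixed-size vectors rather than raw text, search cost is independent of document length once indexing is done.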
Performance at Scale
Indexing 100K+ files requires optimization:
```rust
use rayon::prelude::*;

// Batch processing for efficiency
async fn index_batch(db: &Database, files: Vec<PathBuf>) -> Result<()> {
    // Process files in parallel across CPU cores
    let embeddings: Vec<_> = files
        .par_iter()
        .map(|f| extract_and_embed(f))
        .collect();

    // Batch insert into SQLite inside a single transaction
    let mut tx = db.begin_transaction()?;
    for (path, embedding) in files.iter().zip(embeddings) {
        tx.insert_document(path, embedding)?;
    }
    tx.commit()?;
    Ok(())
}
```

Results:
- Initial indexing: ~1000 files/minute
- Incremental updates: <100ms
- Search latency: <50ms for 100K files
Optional Encrypted Sync
For users who want cross-device access:
- **Client-side encryption**: Files encrypted before upload
- **Key derivation**: Master key derived from user password
- **Zero-knowledge**: Server cannot decrypt files
- **Selective sync**: Choose what to sync
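The key-derivation step can be sketched with a memory-hard KDF: the master key is derived on the client from the user's password and a random salt, so the server only ever sees ciphertext. The parameters and names below are illustrative, not our production values.

```typescript
import { scryptSync, randomBytes } from "node:crypto";

// Client-side master-key derivation (illustrative parameters)
function deriveMasterKey(password: string, salt: Buffer): Buffer {
  // scrypt is memory-hard, which slows down offline password guessing
  return scryptSync(password, salt, 32, { N: 16384, r: 8, p: 1 });
}

const salt = randomBytes(16); // random per user, stored alongside the ciphertext
const key = deriveMasterKey("correct horse battery staple", salt);
// The 32-byte key encrypts files locally; the password never leaves the device
```

Because only the salt and ciphertext reach the server, losing the password genuinely means losing access, which is the zero-knowledge trade-off.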
Beyond Privacy: Additional Benefits
Performance
Local processing eliminates network latency:
- Hippo search: <50ms vs 500ms+ for cloud solutions
- Chai.im AI summaries: instant vs seconds of delay
Reliability
Works offline:
- Hippo: Full functionality without internet
- Chai.im: Queue messages for later delivery
Cost Efficiency
Reduced server infrastructure:
- Less cloud compute
- Less storage
- Lower bandwidth costs
- Savings passed to users
Challenges and Trade-offs
Device Storage
Local-first requires storage on user devices. We mitigate this with:
- Efficient compression
- Smart caching
- User-controlled retention
Consistency
Without a central server, syncing is harder. Solutions:
- CRDTs for conflict resolution
- Vector clocks for ordering
- Eventual consistency model
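To make the CRDT idea concrete, here is a minimal last-writer-wins register, one of the simplest CRDTs. Real deployments use richer structures, but the sketch shows why no central server is needed: merge is deterministic and order-independent, so every device converges to the same value.

```typescript
// Last-writer-wins register: a minimal CRDT sketch (hypothetical types)
interface LwwRegister<T> {
  value: T;
  timestamp: number; // logical or wall-clock time of the write
  nodeId: string;    // tie-breaker when timestamps collide
}

function merge<T>(a: LwwRegister<T>, b: LwwRegister<T>): LwwRegister<T> {
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp ? a : b;
  // Deterministic tie-break so all replicas pick the same winner
  return a.nodeId > b.nodeId ? a : b;
}
```

Because `merge(a, b)` and `merge(b, a)` always return the same winner, replicas can exchange states in any order and still agree.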
Recovery
If you lose your device, data could be lost. Options:
- Optional encrypted backup
- Multi-device sync
- Export/import tools
Conclusion
Privacy-first architecture isn't just about protecting data—it's about building better software. By processing locally, encrypting everything, minimizing collection, and being transparent, we create products that are faster, more reliable, and more trustworthy.
The future of software is local-first. Join us in building it.
*Interested in our architecture? Check out our open source repositories for implementation details.*