Architecture

Designing Privacy-First Architecture

Why local-first processing matters, and how we built Chai.im and Hippo with privacy as a core architectural principle, not an afterthought.

GrepLabs Team
December 20, 2025
12 min read

Privacy shouldn't be a feature—it should be fundamental to how systems are built. At GrepLabs, we design architecture with privacy at the core, not as an afterthought. This article explains our approach and why it matters.

The Problem with Traditional Architecture

Most modern applications follow a simple pattern:

  • Collect user data
  • Send it to servers
  • Process it in the cloud
  • Return results

This model has serious problems:

  • **Data breaches**: Centralized data stores are attractive targets
  • **Surveillance**: Companies can analyze and monetize user behavior
  • **Lock-in**: Your data lives on someone else's servers
  • **Latency**: Every operation requires a network round-trip

Our Core Principles

1. Local-First Processing

Process data on the user's device whenever possible. This isn't just about privacy—it's about performance and resilience.

Benefits:

  • **Privacy**: Data stays on your device unless you choose to send or sync it
  • **Speed**: No network latency
  • **Offline**: Works without internet
  • **Cost**: Reduced server infrastructure

Implementation in Hippo:

// Local vector embedding generation
async function indexFile(file: File) {
  // All processing happens locally
  const content = await extractText(file);
  const chunks = chunkText(content, 512);

  // Local AI model generates embeddings
  const embeddings = await localModel.embed(chunks);

  // Store in local SQLite database
  await localDb.insert('documents', {
    path: file.path,
    embeddings,
    metadata: extractMetadata(file)
  });
}
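
The `chunkText` helper is not shown above. A minimal sketch of what it might look like, assuming the second argument is a maximum chunk length in characters (the real implementation may well count tokens instead). The `overlap` parameter is a hypothetical addition, so that text cut at a chunk boundary still appears whole in a neighboring chunk:

```typescript
// Hypothetical helper: split text into fixed-size chunks with optional overlap.
// Sizes are in characters here; a production indexer would more likely count tokens.
function chunkText(text: string, maxLen: number, overlap = 0): string[] {
  if (maxLen <= 0) throw new RangeError("maxLen must be positive");
  if (overlap < 0 || overlap >= maxLen) {
    throw new RangeError("overlap must be in [0, maxLen)");
  }
  const chunks: string[] = [];
  const step = maxLen - overlap; // how far the window advances each iteration
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxLen));
    if (start + maxLen >= text.length) break; // this chunk reached the end of the text
  }
  return chunks;
}
```

Larger overlaps improve recall for phrases that straddle a boundary, at the cost of more embeddings to compute and store.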

2. End-to-End Encryption

When data must leave the device, encrypt it so only the intended recipient can read it.

Chai.im's Signal Protocol Implementation:

The Signal Protocol provides:

  • **Perfect forward secrecy**: Compromising one key doesn't reveal past messages
  • **Deniability**: A third party can't cryptographically prove that a particular person sent a specific message
  • **Asynchronous**: Works even when recipients are offline

// Simplified encryption flow: a single ephemeral key exchange.
// The full Signal Protocol layers X3DH key agreement and the Double Ratchet on top of this idea.
async function sendMessage(recipient: string, plaintext: string) {
  // Get recipient's public key
  const recipientKey = await keyStore.getPublicKey(recipient);

  // Generate ephemeral key pair for this message
  const ephemeralKey = generateKeyPair();

  // Derive shared secret
  const sharedSecret = deriveSharedSecret(
    ephemeralKey.privateKey,
    recipientKey
  );

  // Generate the nonce before encrypting: AES-GCM needs it at encryption time,
  // and the recipient needs the same value to decrypt
  const nonce = generateNonce();

  // Encrypt with AES-256-GCM
  const ciphertext = encrypt(plaintext, sharedSecret, nonce);

  // Return the encrypted envelope for delivery
  return {
    ephemeralPublic: ephemeralKey.publicKey,
    ciphertext,
    nonce
  };
}
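
The helpers in the simplified flow (`generateKeyPair`, `deriveSharedSecret`, `encrypt`) are left abstract. One way they could be filled in is sketched below with Node's built-in `crypto` module, assuming X25519 for the key exchange and HKDF-SHA256 to turn the raw shared secret into an AES key. The names `sealMessage`, `openMessage`, `deriveKey`, and the `"example-message-key"` label are illustrative, not Chai.im's actual code, and this is not the Signal Protocol: there is no ratchet and no sender authentication.

```typescript
import {
  createCipheriv, createDecipheriv, diffieHellman,
  generateKeyPairSync, hkdfSync, randomBytes, KeyObject,
} from "node:crypto";

interface SealedMessage {
  ephemeralPublic: KeyObject;
  nonce: Buffer;
  ciphertext: Buffer;
  tag: Buffer; // AES-GCM authentication tag
}

// Turn the raw Diffie-Hellman output into a 256-bit AES key via HKDF-SHA256.
function deriveKey(privateKey: KeyObject, publicKey: KeyObject): Buffer {
  const shared = diffieHellman({ privateKey, publicKey });
  return Buffer.from(hkdfSync("sha256", shared, Buffer.alloc(0), "example-message-key", 32));
}

function sealMessage(recipientPublic: KeyObject, plaintext: string): SealedMessage {
  const ephemeral = generateKeyPairSync("x25519"); // fresh key pair per message
  const key = deriveKey(ephemeral.privateKey, recipientPublic);
  const nonce = randomBytes(12); // 96-bit nonce, the recommended size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, nonce);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { ephemeralPublic: ephemeral.publicKey, nonce, ciphertext, tag: cipher.getAuthTag() };
}

function openMessage(recipientPrivate: KeyObject, msg: SealedMessage): string {
  const key = deriveKey(recipientPrivate, msg.ephemeralPublic);
  const decipher = createDecipheriv("aes-256-gcm", key, msg.nonce);
  decipher.setAuthTag(msg.tag); // final() throws if the tag does not verify
  return Buffer.concat([decipher.update(msg.ciphertext), decipher.final()]).toString("utf8");
}
```

Because the sender discards the ephemeral private key after sealing, a later compromise of the sender's device does not expose this message. A compromise of the recipient's long-term key still would, which is the gap the Double Ratchet closes.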

3. Minimal Data Collection

Only collect what's absolutely necessary, and be transparent about what you collect.

What we collect:

  • Anonymous usage metrics (opt-in)
  • Crash reports (opt-in)
  • Account email (for authentication only)

What we don't collect:

  • Message content
  • File contents
  • Search queries
  • Browsing history
  • Contact lists
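
An opt-in policy like this can also be enforced in code, not only in a privacy policy. The sketch below assumes a hypothetical telemetry layer with two event kinds and a per-kind allowlist of fields. The type names and field names are invented for illustration.

```typescript
type Consent = { usageMetrics: boolean; crashReports: boolean };
type TelemetryEvent = { kind: "usage" | "crash"; fields: Record<string, unknown> };

// Only these fields may ever be transmitted; anything else is dropped.
const ALLOWED_FIELDS: Record<TelemetryEvent["kind"], readonly string[]> = {
  usage: ["feature", "durationMs"],
  crash: ["appVersion", "errorCode"],
};

// Returns the event to transmit, or null when the user has not opted in.
function filterEvent(event: TelemetryEvent, consent: Consent): TelemetryEvent | null {
  const optedIn = event.kind === "usage" ? consent.usageMetrics : consent.crashReports;
  if (!optedIn) return null;
  const fields: Record<string, unknown> = {};
  for (const name of ALLOWED_FIELDS[event.kind]) {
    if (name in event.fields) fields[name] = event.fields[name];
  }
  return { kind: event.kind, fields };
}
```

An allowlist fails closed: a field a developer adds later is not sent until someone deliberately adds it to `ALLOWED_FIELDS`.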

4. Transparency Through Open Source

Open source code ensures trust through verifiability. Anyone can audit our implementations and verify our privacy claims.

Deep Dive: Chai.im Architecture

Chai.im is our enterprise encrypted messaging platform. Here's how privacy is baked into every layer:

Message Flow

┌──────────────┐     E2E Encrypted      ┌──────────────┐
│   Sender     │ ────────────────────→  │   Server     │
│   Device     │                        │  (Relay)     │
└──────────────┘                        └──────────────┘
       │                                       │
       │ Local AI                              │ Encrypted
       │ Processing                            │ Storage
       ▼                                       ▼
┌──────────────┐                        ┌──────────────┐
│   Summary    │                        │   Recipient  │
│   Generated  │                        │   Device     │
└──────────────┘                        └──────────────┘

Key Points:

  • Server never sees plaintext messages
  • AI summaries generated locally
  • Server only relays encrypted blobs
  • Metadata minimized (no IP logging)
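
To make the relay role concrete, here is a minimal in-memory sketch of a server that only queues opaque ciphertext per recipient. The `Relay` class and `RelayEnvelope` type are hypothetical. A real relay would add persistence, authentication, and expiry.

```typescript
// The blob is ciphertext produced on the sender's device; the relay holds no keys.
interface RelayEnvelope {
  to: string;
  blob: Uint8Array;
}

class Relay {
  private queues = new Map<string, Uint8Array[]>();

  // Accept an envelope and hold it until the recipient connects.
  enqueue(envelope: RelayEnvelope): void {
    const queue = this.queues.get(envelope.to) ?? [];
    queue.push(envelope.blob);
    this.queues.set(envelope.to, queue);
  }

  // Hand over all pending blobs in arrival order, then forget them.
  drain(recipient: string): Uint8Array[] {
    const pending = this.queues.get(recipient) ?? [];
    this.queues.delete(recipient);
    return pending;
  }
}
```

Nothing in this interface accepts or returns plaintext, so the "server never sees plaintext" property follows from the types and does not depend on operator discipline.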

Authentication: FIDO2/WebAuthn

We use hardware security keys and biometrics instead of passwords:

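// Registration flow
// serverChallenge and userId must be binary (ArrayBuffer or Uint8Array), not strings
async function registerDevice() {
  const credential = await navigator.credentials.create({
    publicKey: {
      challenge: serverChallenge,
      rp: { name: "Chai.im" },
      // displayName is a required field of the WebAuthn user entity
      user: { id: userId, name: userEmail, displayName: userEmail },
      pubKeyCredParams: [
        { type: "public-key", alg: -7 },  // ES256
        { type: "public-key", alg: -257 } // RS256
      ],
      authenticatorSelection: {
        // "platform" limits registration to built-in authenticators (e.g. Touch ID,
        // Windows Hello); omit it to also accept roaming hardware security keys
        authenticatorAttachment: "platform",
        userVerification: "required"
      }
    }
  });

  return sendToServer(credential);
}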

Benefits:

  • No passwords to phish
  • Hardware-bound credentials
  • Biometric verification
  • Resistant to credential stuffing
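
Registration is only half of the flow. The sign-in side can be sketched as follows, assuming the server delivers its challenge as a base64url string (a common convention, not something the registration code confirms). `AssertionOptions` is a local type that mirrors only the WebAuthn request fields used here, and both function names are hypothetical.

```typescript
// Minimal local type mirroring the WebAuthn request option fields used in this sketch.
interface AssertionOptions {
  challenge: Uint8Array;
  userVerification: "required" | "preferred" | "discouraged";
  timeout: number; // milliseconds
}

// Decode a base64url string into raw bytes.
function base64urlToBytes(value: string): Uint8Array {
  const base64 = value.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  const binary = atob(padded);
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

function buildAssertionOptions(challengeB64url: string): AssertionOptions {
  return {
    challenge: base64urlToBytes(challengeB64url),
    userVerification: "required", // matches the registration policy
    timeout: 60_000,
  };
}

// In a browser, the options would be passed to the platform API:
//   const assertion = await navigator.credentials.get({
//     publicKey: buildAssertionOptions(challengeFromServer),
//   });
```

The server then verifies the returned signature against the public key stored at registration, so no shared secret ever crosses the network.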

HIPAA Compliance

For healthcare customers, we provide:

  • Audit logging (encrypted)
  • Access controls
  • Message retention policies
  • Business Associate Agreements (BAAs)
  • Compliance documentation

Deep Dive: Hippo Architecture

Hippo is our local-first file organizer with AI-powered semantic search.

Local Processing Pipeline

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   File       │ ──→ │   Text       │ ──→ │   Vector     │
│   Watcher    │     │   Extractor  │     │   Embedding  │
└──────────────┘     └──────────────┘     └──────────────┘
                                                 │
                                                 ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Search     │ ←── │   Query      │ ←── │   Local      │
│   Results    │     │   Engine     │     │   SQLite     │
└──────────────┘     └──────────────┘     └──────────────┘

Everything runs locally:

  • File watching (native OS APIs)
  • Text extraction (Rust libraries)
  • Vector embeddings (ONNX runtime)
  • SQLite with vector extensions
  • Search ranking algorithms
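
The query step at the end of this pipeline comes down to comparing the query's embedding with each stored embedding. A brute-force sketch using cosine similarity is shown below. It is an illustration of the idea only: Hippo's actual ranking and its SQLite vector extension are not described in enough detail here to reproduce, and `IndexedDoc`, `cosineSimilarity`, and `search` are hypothetical names.

```typescript
interface IndexedDoc {
  path: string;
  embedding: number[];
}

// Cosine similarity: 1 for identical direction, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat a zero vector as matching nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every document and return the topK best matches, highest score first.
function search(query: number[], docs: IndexedDoc[], topK: number): { path: string; score: number }[] {
  return docs
    .map((doc) => ({ path: doc.path, score: cosineSimilarity(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

A linear scan like this is simple and exact. At larger scales, an approximate nearest-neighbor index trades a little recall for much lower latency.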

Performance at Scale

Indexing 100K+ files requires optimization:

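// Batch processing for efficiency
use rayon::prelude::*; // provides par_iter()
use std::path::PathBuf;

// `extract_and_embed` and `db` are application-specific and not shown here.
// The parallel section is CPU-bound and blocks the calling thread until it
// finishes, so call this from a blocking context, not directly on an async executor.
async fn index_batch(files: Vec<PathBuf>) -> Result<()> {
    // Process files in parallel
    let embeddings: Vec<_> = files
        .par_iter()
        .map(|f| extract_and_embed(f))
        .collect();

    // Batch insert to SQLite inside a single transaction
    let mut tx = db.begin_transaction()?;
    for (path, embedding) in files.iter().zip(embeddings) {
        tx.insert_document(path, embedding)?;
    }
    tx.commit()?;

    Ok(())
}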

Results:

  • Initial indexing: ~1000 files/minute
  • Incremental updates: <100ms
  • Search latency: <50ms for 100K files

Optional Encrypted Sync

For users who want cross-device access:

  • **Client-side encryption**: Files encrypted before upload
  • **Key derivation**: Master key derived from user password
  • **Zero-knowledge**: Server cannot decrypt files
  • **Selective sync**: Choose what to sync
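
The first three points can be sketched with Node's built-in `crypto` module. This assumes scrypt as the password-based key derivation function and AES-256-GCM for file encryption. The source does not say which algorithms Hippo uses, and the function names are hypothetical.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

interface EncryptedFile {
  nonce: Buffer;
  ciphertext: Buffer;
  tag: Buffer; // AES-GCM authentication tag
}

// Derive a 256-bit master key from the password. The salt is not secret and is
// stored with the account so every device derives the same key.
function deriveMasterKey(password: string, salt: Buffer): Buffer {
  return scryptSync(password, salt, 32);
}

// Encrypt on the client; only the returned fields are uploaded.
function encryptFile(masterKey: Buffer, contents: Buffer): EncryptedFile {
  const nonce = randomBytes(12); // must be unique per encryption under the same key
  const cipher = createCipheriv("aes-256-gcm", masterKey, nonce);
  const ciphertext = Buffer.concat([cipher.update(contents), cipher.final()]);
  return { nonce, ciphertext, tag: cipher.getAuthTag() };
}

function decryptFile(masterKey: Buffer, file: EncryptedFile): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", masterKey, file.nonce);
  decipher.setAuthTag(file.tag); // final() throws if the data was tampered with
  return Buffer.concat([decipher.update(file.ciphertext), decipher.final()]);
}
```

The server stores only the salt, nonce, ciphertext, and tag. Without the password it cannot derive the key, which is what "zero-knowledge" means in this context. The flip side is that a forgotten password makes synced data unrecoverable.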

Beyond Privacy: Additional Benefits

Performance

Local processing eliminates network latency:

  • Hippo search: <50ms vs 500ms+ for cloud solutions
  • Chai.im AI summaries: instant vs seconds of delay

Reliability

Works offline:

  • Hippo: Full functionality without internet
  • Chai.im: Queue messages for later delivery
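
Queuing for later delivery can be as simple as an ordered outbox that retries when connectivity returns. A minimal sketch, assuming a synchronous `send` callback that returns `false` when the network is unavailable. The `Outbox` class is hypothetical, and a real client would persist the queue to disk.

```typescript
// Returns false when the message could not be delivered (for example, offline).
type SendFn = (message: string) => boolean;

class Outbox {
  private pending: string[] = [];
  private send: SendFn;

  constructor(send: SendFn) {
    this.send = send;
  }

  // Queue the message, then try to deliver everything that is waiting.
  submit(message: string): void {
    this.pending.push(message);
    this.flush();
  }

  // Deliver queued messages in order, stopping at the first failure.
  flush(): void {
    while (this.pending.length > 0) {
      if (!this.send(this.pending[0])) return; // still offline: keep the rest queued
      this.pending.shift();
    }
  }

  get size(): number {
    return this.pending.length;
  }
}
```

Calling `flush()` from a connectivity-change handler drains the queue as soon as the device is back online, and stopping at the first failure preserves message order.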

Cost Efficiency

Reduced server infrastructure:

  • Less cloud compute
  • Less storage
  • Lower bandwidth costs
  • Savings passed to users

Challenges and Trade-offs

Device Storage

Local-first requires storage on user devices. We mitigate this with:

  • Efficient compression
  • Smart caching
  • User-controlled retention

Consistency

Without a central server, syncing is harder. Solutions:

  • CRDTs for conflict resolution
  • Vector clocks for ordering
  • Eventual consistency model
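
As an illustration of the ordering piece, here is a small vector clock sketch. Each device increments its own counter on every local change, and comparing two clocks tells you whether one change happened before the other or whether they are concurrent and need conflict resolution. The function names and device IDs are hypothetical.

```typescript
type VectorClock = Record<string, number>;
type Ordering = "before" | "after" | "equal" | "concurrent";

// Record a local change on the given device.
function tick(clock: VectorClock, deviceId: string): VectorClock {
  return { ...clock, [deviceId]: (clock[deviceId] ?? 0) + 1 };
}

// Compare two clocks entry by entry; a missing entry counts as 0.
function compare(a: VectorClock, b: VectorClock): Ordering {
  let aAhead = false;
  let bAhead = false;
  for (const id of Object.keys(a).concat(Object.keys(b))) {
    const av = a[id] ?? 0;
    const bv = b[id] ?? 0;
    if (av > bv) aAhead = true;
    if (bv > av) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent";
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}

// After syncing, each device keeps the entry-wise maximum of both clocks.
function merge(a: VectorClock, b: VectorClock): VectorClock {
  const merged: VectorClock = { ...a };
  for (const id of Object.keys(b)) {
    merged[id] = Math.max(merged[id] ?? 0, b[id]);
  }
  return merged;
}
```

When `compare` returns `"concurrent"`, neither edit saw the other. That is the case a CRDT merge function, or a prompt to the user, has to resolve.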

Recovery

If you lose your device, data could be lost. Options:

  • Optional encrypted backup
  • Multi-device sync
  • Export/import tools

Conclusion

Privacy-first architecture isn't just about protecting data—it's about building better software. By processing locally, encrypting everything, minimizing collection, and being transparent, we create products that are faster, more reliable, and more trustworthy.

The future of software is local-first. Join us in building it.


*Interested in our architecture? Check out our open source repositories for implementation details.*

Tags
Privacy, Architecture, Security, Local-First