The execution story: three AI layers, evals, a prototype, a feature built from user feedback. One shipped app in 12 weeks.
Loft AI is an AI-powered bookmarking app. Save content from anywhere, find it again through natural language. One tap from any app, auto-organized into collections, with an AI chat that retrieves what you saved when you need it.
This case study covers three things: how we validated three AI layers and caught a hallucinating model, how I prototyped a Home Screen redesign that unblocked the team, and how we shipped the first version of the app after multiple iterations, including a share-sheet feature that now drives 82.4% of bookmarks.
The discovery and scoping work (user research, segmentation, the mobile-first call) is in the previous case study. This picks up where that ends.
Layer 1: Content ingestion and enrichment
When a user saves a link, Loft extracts metadata via oEmbed (social platforms) and browser parsing (web), then sends it to an LLM to return a clean title, summary, auto-tags, and image.
We tested across the real breadth of what users save: Instagram, LinkedIn, X, YouTube, news, recipes, product pages. Two things came out of this:
Model selection mattered. One candidate model returned a summary that called a well-known public speaker a comedian, a clear hallucination. That model was out. We landed on GPT-4o mini: accurate enough for the use case, cost-effective at the call volume this feature needed.
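The routing step at the start of this layer (oEmbed for social platforms, browser parsing for everything else) might look roughly like the sketch below. The host list and the function name are illustrative assumptions, not Loft's actual configuration.

```python
from urllib.parse import urlparse

# Hypothetical set of hosts treated as "social platforms" with oEmbed support.
# The real list used by Loft is not stated in this case study.
OEMBED_HOSTS = {"youtube.com", "instagram.com", "x.com"}


def extraction_route(url):
    """Pick a metadata extraction route for a saved link."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Social platforms go through oEmbed; everything else is parsed as a web page.
    return "oembed" if host in OEMBED_HOSTS else "browser_parse"
```

Whichever route is taken, the extracted metadata then goes to the LLM for the clean title, summary, tags, and image.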
Layer 2: Collections clustering
Saved content is vectorized and stored, then auto-clustered into user-specific Collections by semantic similarity. We tested against ~2,000 bookmarks across diverse save types, checking that topically related saves grouped together and that collection names were readable.
One finding the eval surfaced: how aggressively collections get created looks like a model question, but it's really a user preference. Tuning the model harder wouldn't produce a "right" answer when the right answer is personal. The fix is user control, not better clustering. That was parked for a later phase. For launch, we chose defaults that felt sensible for most saves.
Key design decision: if a bookmark can't be confidently clustered, it lands in a "Reading List" fallback with the source URL intact. Users see a bookmark, not an error.
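A minimal sketch of that fallback, assuming each collection is represented by a centroid vector and that "confident" means cosine similarity above a threshold. Both the centroid representation and the 0.75 threshold are assumptions made for illustration; the case study does not say how confidence was actually measured.

```python
import math

FALLBACK = "Reading List"


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_collection(embedding, centroids, min_similarity=0.75):
    """Return the closest collection name, or the fallback when no match is confident.

    centroids: dict mapping collection name -> centroid vector (hypothetical shape).
    min_similarity: assumed confidence threshold, not a value from the source.
    """
    best_name, best_score = FALLBACK, min_similarity
    for name, centroid in centroids.items():
        score = cosine(embedding, centroid)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

The point of the shape is that the function always returns a collection name, so the save path never has to surface a clustering failure to the user.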
Layer 3: Ask Loft (RAG-based chat)
Users ask natural language questions across their saved content. The query runs against the vector store, retrieves the closest matches, and synthesizes a response.
Practical fix from testing: capped retrieval at the 20 most relevant results per query. With unbounded retrieval, responses were slow and sprawling.
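The retrieval cap can be sketched as a top-k selection over similarity scores. This assumes an in-memory list of (bookmark id, embedding) pairs and cosine similarity as the relevance measure; a production vector store would normally expose its own top-k query, so the names here are hypothetical.

```python
import heapq
import math

MAX_RESULTS = 20  # the cap described above


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, store, k=MAX_RESULTS):
    """Return up to k bookmark ids, most similar to the query first.

    store: list of (bookmark_id, embedding) pairs (hypothetical in-memory stand-in).
    """
    scored = ((cosine(query_vec, emb), bookmark_id) for bookmark_id, emb in store)
    return [bookmark_id for _, bookmark_id in heapq.nlargest(k, scored)]
```

Only the ids returned here would be passed on to the synthesis step, which keeps prompt size and latency bounded no matter how large a user's library grows.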
With the AI layers validated and producing the outputs we wanted, we turned to design.
The early home screen and library screen were functionally identical, both showing the list of recently added bookmarks. The initial mockups had envisioned daily cards and a personalized feed, but those were parked for a later release. I felt the home screen, as it stood, wasn't delivering any distinguishable value to the user.
The challenge: make the home screen feel meaningful for users without heavy engineering investment.
I did some brainstorming and then prototyped. A couple of hours in Lovable and v0, building a version of the home screen using only data the app already had for every user: top three collections by save count, recent bookmarks, top tags, and quick actions. No new backend endpoints. No LLM calls.
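To show why this needed no new endpoints or LLM calls, here is a rough sketch of how those sections could be derived from bookmark records alone. The field names (`title`, `collection`, `tags`, `saved_at`) and the limits for recent items and tags are assumptions for illustration; only "top three collections by save count" comes from the description above.

```python
from collections import Counter


def build_home_screen(bookmarks, recent_limit=5, tag_limit=5):
    """Derive home screen sections from existing bookmark data.

    bookmarks: list of dicts with 'title', 'collection', 'tags', 'saved_at'
    (a hypothetical record shape).
    """
    collection_counts = Counter(b["collection"] for b in bookmarks)
    tag_counts = Counter(tag for b in bookmarks for tag in b["tags"])
    newest_first = sorted(bookmarks, key=lambda b: b["saved_at"], reverse=True)
    return {
        # Top three collections by save count.
        "top_collections": [name for name, _ in collection_counts.most_common(3)],
        "recent_bookmarks": [b["title"] for b in newest_first[:recent_limit]],
        "top_tags": [tag for tag, _ in tag_counts.most_common(tag_limit)],
    }
```

A brand-new user with no saves simply gets empty sections back, which is the kind of edge case the PRD had to cover.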
I recorded a short walkthrough and paired it with a lean PRD: the reasoning behind each section and edge cases for new users. Both shared together, so the team saw the idea and the thinking at once.
One round of feedback. One iteration. The designer applied the design system, engineering shipped it. The prototype was adopted essentially as-is. It's live today as Loft's home screen.
With the first version of the app ready, we started sharing TestFlight builds with early users.
What testers told us: saving to Loft meant copying a link, opening the app, and pasting it in. The same number of steps as forwarding something to WhatsApp or saving it anywhere else.
One of the engineers proposed invoking Loft directly from the iOS and Android share sheet: one tap, no opening the app first. We tried it internally, it felt right, and we built it.
Blockers and pivots
Loft launched as a freemium product, which meant integrating a paywall SDK alongside everything else. In parallel, I set up analytics in Amplitude, designing the event schema from scratch so we'd have visibility into user behavior.
The first App Store submission was rejected for insufficient data disclosure.
Mid-sprint, the paywall SDK broke: product IDs stopped syncing with App Store Connect, taking the payment flow down. When engineering decided to switch to RevenueCat, I partnered closely on the integration: tier structure, feature gating, paywall UX.
Launches don't always go as planned. The job is to navigate the roadblocks without losing the thread.
We targeted a tighter window. We shipped in 12 weeks. Slippage came from model optimization, the paywall tool switch, and the App Store review cycle. Every delay had a clear cause and a decision attached to it.
Loft AI is live on iOS and Android. Free with in-app purchase: Loft Pro for power users. The three AI layers shipped as designed. The share sheet accounts for 82.4% of saves.