Why

I decided to do this for the sheer convenience it offers. I often draft the content flow for a blog or a presentation, or take quick notes, in Obsidian, and my vault is synced with the one on my phone via GitHub.

Having a tool that seamlessly helps me plan while running locally is quite beneficial given the spotty internet connection in our hostels. Running locally also brings the added benefit of privacy.

Now one may ask: why use FLM (FastFlowLM)? The short answer is that it lets me put my NPU to good use while conserving my precious battery life. This is especially helpful on those days when I need to carry my laptop to class and don’t want to take the charger along.

What is FastFlowLM?

FastFlowLM is an NPU-first runtime that offers an Ollama-style experience while running inference on the on-device NPU. Currently it’s a Ryzen-only platform, but they seem to have plans to support Qualcomm and Intel hardware soon too. What makes it so good is that it’s just a small executable that abstracts away most of the low-level details and exposes a familiar interface.

Check them out at FastFlowLM, or contribute at FastFlowLM.

When to Use RAG

RAG wins when your knowledge base changes frequently or when you need verifiable, sourced answers:

  • Dynamic content — docs, policies, product catalogs
  • Source attribution — “According to section 3.2 of the manual…”
  • Large knowledge bases — too much to fit in context or training data
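As a toy illustration of the retrieval step behind these criteria (real setups use embeddings and a vector store; the note names and contents below are made up):

```python
# Toy sketch of RAG retrieval: score notes by keyword overlap with the query
# and return the top-k matches. Illustrative only — production systems use
# embedding similarity, not raw word overlap.
def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[str]:
    q = set(query.lower().split())
    # Rank note names by how many query words appear in the note body.
    scored = sorted(docs, key=lambda name: -len(q & set(docs[name].lower().split())))
    return scored[:k]

notes = {
    "manual-3.2": "warranty covers battery replacement within two years",
    "policy": "returns accepted within thirty days of purchase",
    "catalog": "laptop models with npu support and battery life specs",
}
print(retrieve("battery warranty", notes, k=1))  # → ['manual-3.2']
```

The retrieved note(s) would then be stuffed into the model’s context along with the question, which is what makes the “According to section 3.2…” style of sourced answer possible.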

What did I do?

Nothing much, really. It boiled down to installing a couple of plugins in Obsidian and writing a script that starts the FLM server every time I open Obsidian.

The plugins:

  • AI Providers
  • Local GPT

First, we start the FLM server using:

flm serve
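My launcher script isn’t shown here, but a minimal sketch of the idea, assuming `flm` is on your PATH (the Obsidian launch step itself varies per OS and is left out):

```python
# Sketch: spawn the FLM server in the background before opening Obsidian.
# Assumes the `flm` binary is on PATH; returns None if it isn't found.
import shutil
import subprocess

FLM_COMMAND = ["flm", "serve"]

def start_flm_server(command=FLM_COMMAND):
    """Spawn the FLM server if the binary is available; return the process or None."""
    if shutil.which(command[0]) is None:
        return None  # flm not installed or not on PATH
    return subprocess.Popen(
        command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
```

You’d call `start_flm_server()` from whatever hook launches Obsidian for you (a shell alias, a .desktop entry, or a shortcut wrapper), so the server is already up by the time the plugins need it.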

Since the FLM server exposes an OpenAI-style API, we configure the AI providers plugin with the following steps:

  • Go to “Add new provider” in the AI Providers plugin.
  • Set Provider type to OpenAI.
  • Set Provider URL as http://127.0.0.1:52625/v1.
  • Enter any provider name (it’s required).
  • Enter any random text as API Key; it’s ignored by the FLM server but is technically required.
  • In the model field, click refresh (the server must be running for this) and select a model.
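Before (or after) wiring up the plugin, you can sanity-check the endpoint yourself. A minimal sketch, assuming the server is running on port 52625 and exposes the usual OpenAI-style /v1/models route (which is what the plugin’s refresh button relies on):

```python
# Sanity check for the provider URL: list models from the OpenAI-style API.
# Port 52625 matches the Provider URL used in the plugin settings above.
import json
import urllib.request

BASE_URL = "http://127.0.0.1:52625/v1"

def extract_model_ids(payload: dict) -> list[str]:
    """Pull model ids out of an OpenAI-style /models response."""
    return [m["id"] for m in payload.get("data", [])]

def list_models(base_url: str = BASE_URL) -> list[str]:
    # Any API key works here; the FLM server ignores it.
    req = urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": "Bearer dummy"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_model_ids(json.load(resp))
```

Running `list_models()` while `flm serve` is up should print the same model ids the plugin shows in its dropdown; if it errors out, the plugin’s refresh button will fail for the same reason.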

Now, to configure the shortcuts and the LocalGPT plugin, follow these additional steps:

  • Go to the plugin settings.
  • Select the Main AI provider from the list.
  • Once done, scroll down to see its capabilities and set shortcuts for them on the plugin’s page.