Blog

  • WriterAgent (LibreOffice Plugin) Week 6-7: Async Grammar Checking and Math

    This is the 4th in a series of articles discussing my work on a LibreOffice extension, now known as WriterAgent. Here’s a link to the first article for background: https://keithcu.com/wordpress/?p=5060

    At Microsoft, I spent five years working on the text components RichEdit and Quill, and came to understand the “physics” of word processing: the file formats, data structures, and algorithms that provided fast access to text and properties, independent of the length of the file. Selecting one million characters to make them bold took about the same time as changing one character, because of the clever data structures (piece tables) and algorithms in these engines.

    To be clear, changing more characters requires more repainting, but the code that actually applied the change to the document so it could be persisted to disk, and fetched while redrawing, ran in near-constant time. It did this for changes anywhere, in documents of any size, because of the piece table.

    Text editing is an interesting problem space: Latin line layout alone is a hard problem, even before you consider the rules for non-Western languages: tabs, justification, hyphenation, kerning, ligatures, numbering, etc. There are many interesting little features you might never have noticed, like merging connected underlines:

    On top of line layout, you add tables, embedded objects, columns, footnotes, indexes like a table of contents, and many other features required for real-world documents, and it becomes very complicated.

    Word is a codebase where “byzantine” is an understatement, but RichEdit and Quill were in those perfect Goldilocks zones where you could over time learn most of the details of the code, since they weren’t buried under a mountain of legacy features and cruft.

    When I decided to add a real-time AI grammar checker to WriterAgent, I knew what I was getting into, but I underestimated the trickery of LibreOffice’s UNO.

    The Silent Killer Bug

    LibreOffice has a linguistic subsystem that provides spelling and grammar checkers with a consistent UI. You register a proofreader component, and it calls you back, asking you to review text. You can return “no errors”, or a list of problems, explanations, and fixes.

    LibreOffice will draw blue underlines in the correct place, and create the pop-up menu when the user right-clicks on the squiggles. The menu shows the explanation of the mistake and lets the user choose the replacement, and LibreOffice will apply it to the document.

    That all sounds great, but it has a huge downside: it is entirely synchronous. However, I couldn’t even solve that problem because LibreOffice kept crashing!

    When UNO is unhappy, it doesn’t throw a message box with an error; it shuts down the entire program. This usually only happens when there is a developer error, but when the program disappears it’s hard to figure out what happened. My code tries to catch all exceptions and log errors, so that I can figure out what happened later, but when the program disappeared, there was no chance to do so.

    I spent several hours banging my head against the wall trying to figure out why the Tools – Options – Writing Aids dialog, where I had registered my grammar checker, would take down the whole LibreOffice process. Here’s a feature that almost works; I hope you’ve saved your changes!

    Somewhere, my Python XProofreader class was causing LibreOffice to detonate even though my grammar checker was not being called yet. After failing with even a simple one-page grammar checker that immediately agrees everything is correct, I decided to fire up ye olde GNU debugger.

    It shows you the stack trace when the unhandled exception happens, which allows you to figure out the line of code that caused the problem. If you can narrow it down to one line, that usually clarifies things enough.

    It turns out, LibreOffice instantiates these services using a C++ function called createInstanceWithArgumentsAndContext. Even if you aren’t passing any arguments, the office suite throws a bunch of initialization variables at your constructor. If your Python __init__ method doesn’t handle them, the code fails to map the call, the stack misaligns, and the program dies.

    The fix? I changed the class’s __init__ method to accept *args (Python’s syntax for a variable length number of extra parameters), allowing LibreOffice to pass its hidden arguments, giving them a place to go besides corrupting the stack.

    It was just 4 characters (plus the addition of Any, which tells the type checkers to expect any type: string, integer, etc.) to act as a bucket for LibreOffice’s variables, and suddenly the crashes stopped. Once that UNO weirdness was out of the way, I could actually start on the grammar checker.
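    For illustration, the shape of the fix looks something like this. This is a minimal sketch with simplified names: the real class subclasses unohelper.Base and implements XProofreader; only the constructor change is the point here.

```python
from typing import Any


class GrammarChecker:
    """Sketch only: shows the *args constructor fix, not the full component."""

    def __init__(self, *args: Any):
        # createInstanceWithArgumentsAndContext passes hidden initialization
        # arguments. *args gives them somewhere to go besides corrupting the
        # stack; the first one is typically the component context.
        self.ctx = args[0] if args else None


# Both of these now succeed, with or without LibreOffice's extra arguments:
checker = GrammarChecker()
checker_with_ctx = GrammarChecker("component-context", "extra", 42)
```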

    Async Queues and a Sentence Cache

    The reason I avoided working on a grammar checker was the whole sync / async problem. I had built a multi-threaded extension, but the LibreOffice proofreading API is synchronous. That means the entire LibreOffice app waits—the code that handles keyboard events and everything else—while you decide whether there is a problem. You can spin up another thread to make the request in the background, but that doesn’t stop LibreOffice from waiting for an answer. Since I use OpenRouter and Together.ai for my LLMs, the “hold tight while I do a quick network request” was not going to work.

    I like Mercury-2, which returns 250-500 tokens per second, so I can often get an answer in half a second. But when typing, even a half-second delay before anything shows up is annoying, and that’s the best-case scenario.

    Also, I was happy with the add_comment tool as a way for the LLM to suggest corrections. You can ask WriterAgent to “review”, give “feedback” or “suggestions” on your document, and it will go through it in one pass like a professional copyeditor, and add comments. Then, you can go through those notes at your own pace, and delete each when you are satisfied.

    I’m not sure how many people know of the comments feature in LibreOffice; it’s very cool but surely underutilized. The messages show up on the right-hand margin, in colored rectangles, almost like sticky notes that a professional might have written.

    Because the LLM has full context rather than just a single sentence, it can provide more useful feedback. I told the AI to use add_comment for both positive and negative feedback, so the users enjoy reading the notes rather than always dreading bad news. The add_comment tool call is in the main document context to make it easy for the LLM to review at any time. The user just has to trigger it with the right keywords.

    However, the LLMs occasionally grouped multiple similar issues in one comment, rather than creating separate comments at each location, which made it harder to find and fix the problems. I realized it’s nice to have a basic checker constantly running, verifying proper grammar, reminding you where a comma is needed, and catching other trivial but essential details that should actually be fixed before you show the document to your copyeditor. That way no one will wonder whether you graduated from middle school.

    Eventually, I decided to tackle the problem. I used the venerable LightProof Python grammar checker which was a great starting point for efficiently handling the ProofReading API, but its rules were regular expressions which take microseconds to check, so I used its foundation but had to change the guts.

    I tried two different designs to handle the sync-async issue: a fully async one where I would look up the results, cache them, and then give the answers later, and another design that returned error results right away, without completely halting the program, and which almost worked.

    While it is true that in the proofreading callback, the entire LibreOffice process is waiting, you can call any function in LibreOffice, including processEventsToIdle(). That function tells LibreOffice to process keyboard and other events that might have happened, including repainting the screen.

    It meant that from within my grammar checker callback, I could actually tell LibreOffice: “Do what you gotta do while I’m waiting on this network request.” You could type at full speed without seeing any delay as the screen repainted, even though the main thread and grammar checker were actually still waiting for an answer. It’s the power of recursion, being able to call LibreOffice back!

    While it mostly worked, a few things broke, like being able to right-click on errors. None of the menus would appear while the LibreOffice proofing subsystem was still waiting. You could type, but the app was still mostly on hold. I had to move to async, which I knew would create more challenges.

    So I changed the system to return “no errors” immediately, start the request in a background thread, save the results whenever they arrive, and if LibreOffice asks again with the exact same string, we’ll have a useful answer.

    The first problem I had to solve was that while I was looking up one answer, multiple new requests would come in as the user typed each character. So in the background worker, `_GrammarWorkQueue`, I keep only the newest request for each paragraph, and it only fires after a 1-second pause of no new requests. There is no point in trying to check anything until the user has calmed down.
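    A minimal sketch of that debouncing worker (the names here are illustrative; the real `_GrammarWorkQueue` is more involved): each paragraph keeps only its newest text, and the check fires only after a quiet period with no new submissions.

```python
import threading


class GrammarWorkQueue:
    """Keep only the newest request per paragraph; fire after a quiet period."""

    def __init__(self, callback, quiet_seconds=1.0):
        self._callback = callback
        self._quiet = quiet_seconds
        self._lock = threading.Lock()
        self._pending = {}   # paragraph id -> latest text
        self._timers = {}    # paragraph id -> threading.Timer

    def submit(self, para_id, text):
        with self._lock:
            self._pending[para_id] = text      # newer text replaces older
            timer = self._timers.get(para_id)
            if timer is not None:
                timer.cancel()                 # restart the quiet-period clock
            timer = threading.Timer(self._quiet, self._fire, args=(para_id,))
            self._timers[para_id] = timer
            timer.start()

    def _fire(self, para_id):
        with self._lock:
            text = self._pending.pop(para_id, None)
            self._timers.pop(para_id, None)
        if text is not None:
            self._callback(para_id, text)      # e.g. send to the LLM here
```

    In the real extension, the callback would kick off the network request and store the answer in the sentence cache for the next proofreading callback.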

    The next feature I needed was sentence caching, which is a minor topic in itself.

    The challenge is that not only does each language have different punctuation marks, but also some languages like Thai don’t use standard punctuation. They don’t have spaces between words, only between sentences, so the rules for deciding what is a sentence cannot be the same for all languages.

    I had done the easy part of auto-translating the user-visible strings into 34 languages, but now I needed some special rules in a few places to handle the quirks of those languages, like sentence determination. Fortunately, you can fetch the full list of Unicode punctuation marks, and store them in a little table.
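    As a sketch of the idea (my real table is built from Unicode data and has per-language quirks; Thai, which separates sentences with spaces rather than punctuation, needs its own rule), sentence splitting can be as simple as scanning for terminator characters:

```python
# A small table of sentence terminators across scripts (illustrative subset).
SENTENCE_TERMINATORS = {
    ".", "!", "?",                  # Latin
    "\u3002", "\uff01", "\uff1f",   # CJK fullwidth 。！？
    "\u0964",                       # Devanagari danda
    "\u06d4",                       # Arabic full stop
}


def split_sentences(text):
    """Split text on terminator characters, keeping each terminator attached."""
    sentences, current = [], []
    for ch in text:
        current.append(ch)
        if ch in SENTENCE_TERMINATORS:
            sentences.append("".join(current).strip())
            current = []
    tail = "".join(current).strip()
    if tail:
        sentences.append(tail)  # e.g. a sentence the user is still typing
    return sentences
```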

    I needed to break chunks up into sentences because I saw cases where an LLM was given multiple sentences with many errors and it would get confused, and sometimes show zero problems. Perhaps it was thinking: “No issues, who knows? Maybe that is some intentional new poetic lingo I’m not familiar with. I’m not paid enough to try to explain all the issues in that mess.” Feeding it just one sentence at a time makes it more focused, although you can adjust the value in settings, try it out on larger batches, and see how it behaves on your model.

    The async model works well enough because LO asks you to proof the paragraph every time. When a single sentence changes, only that new sentence is sent to be checked; the results for the rest are served from cache and reported as errors.

    Because LibreOffice updates the UI on a pull model, there is a chance that it never asks about an error that was found. I will eventually add a way to keep track of errors LibreOffice hasn’t asked us about yet, a way to poke the system. For example, toggling the language of the affected sentence to a new one, and back. There must be some simple trick like this to nudge LO to ask us to re‑evaluate the text, so we can give the results we went to all of this trouble to obtain. For now, it seems to report useful problems in practice. (The build on GitHub has the ability to persist errors, so saving and re-opening will show them all the next time.)

    Then there was the issue of the over-helpful AI. You type something simple like “This is a error.” The model knows it is wrong, it would flag the “a” as problematic, and suggest “is an error.” as the replacement. My initial, naive version of the code looked like a glitch in the matrix because you’d end up with: “This is is an error. error.” The system fixed one mistake, and created two new ones, like the Sorcerer’s Apprentice.

    To fix this, I wrote code to strip out any duplicate words before or after the suggestion that match any words at the beginning or end of the replacement. The resulting architecture: debouncing, deduplication, prefix/postfix matching, and caching, keeps the UI snappy and usually useful, no matter the speed of the LLM, while preserving the original synchronous API.
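    The deduplication step can be sketched like this (a simplified version; the real code also has to worry about punctuation and casing): drop words at the edges of the replacement that duplicate the words already sitting next to the flagged span.

```python
def trim_replacement(before, replacement, after):
    """Trim words from the replacement that duplicate the surrounding text.

    before/after are the document text adjacent to the flagged span.
    """
    rep = replacement.split()
    pre = before.split()
    post = after.split()
    # Drop leading replacement words that repeat the text just before the span.
    while rep and pre and rep[0] == pre[-1]:
        rep.pop(0)
        pre.pop()
    # Drop trailing replacement words that repeat the text just after the span.
    while rep and post and rep[-1] == post[0]:
        rep.pop()
        post.pop(0)
    return " ".join(rep)


# Flagged span "a" in "This is a error.", suggested replacement "is an error.":
# trimming leaves just "an", so the final text is "This is an error."
trimmed = trim_replacement("This is", "is an error.", "error.")
```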

    Protecting Math from JSON

    While I was frustrated with the grammar checker, I decided to investigate math import.

    LibreOffice can generate beautiful equations, but it can be difficult for users to generate the required format in its editor. I decided to add a feature that lets the LLM create the formulas directly. You can describe what you want, either in plain text (E = mc^2) or using a description, and it can generate TeX math format, which LLMs know very well since it is so common on the internet for math. Once imported, these objects can be further edited by the user as beautifully formatted native Math objects.

    The secret was a library called latex2mathml. LibreOffice understands the MathML format already and can convert it into its math objects, so with this bit of Python magic, I could take the TeX from the LLM, convert it to MathML, and let LibreOffice take it from there. It took only a couple of hours to get it working since the Python library and LibreOffice were doing most of the work.

    I ran into a couple of issues; at first it would display “imes” instead of the multiplication operator. The issue was that streaming APIs return chunks of JSON. If the AI generates a LaTeX command like \times or \nabla, standard JSON parsers see the backslash, assume it’s a control character (like a tab \t or a new line \n), and mangle the math before it ever reaches the parsing code. I had to build a workaround for math blocks.
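    Here is the failure in miniature, plus one possible workaround. This is a sketch, not the extension’s exact code: blindly doubling backslashes would break legitimate escapes like \" or \n, so the real fix only applies this inside detected math blocks.

```python
import json

# A streamed chunk containing a LaTeX command inside a JSON string:
raw_chunk = r'{"math": "\times x"}'

# Standard JSON parsing treats \t as a tab escape, mangling \times to "imes":
mangled = json.loads(raw_chunk)["math"]
assert mangled[0] == "\t"  # the \t of \times became a literal tab character


def protect_backslashes(chunk: str) -> str:
    # Double every backslash so LaTeX commands survive JSON parsing.
    # (Illustrative: should only be applied to detected math spans.)
    return chunk.replace("\\", "\\\\")


fixed = json.loads(protect_backslashes(raw_chunk))["math"]
assert fixed == r"\times x"  # the backslash survives intact
```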

    I don’t have editing working yet; converting back to TeX from MathML is a completely separate problem. But at least the LLM can insert formulas, and the user can change them, or delete them and tell the AI how to make a better one.

    34 Languages and Auto-Translation

    Having translated it to 8 languages, it was almost no work to add more. I had already built a batching, multi-threaded auto-translation system that reads the .pot template files and translates any missing strings, up to 10 strings at a time using 8 concurrent threads.

    Because the infrastructure is automated, adding new locales is essentially painless. I decided to flip the switch and WriterAgent now supports 34 languages, including most European languages, plus Japanese, Korean, Chinese, Hindi, and other major Asian languages. For the translation, I use x-ai/grok-4.1-fast. It’s fast, intelligent, and inexpensive. Translating the extension into a new language costs a couple of pennies. Most of the strings are UI elements like “Send” or “Image Model,” so I don’t need a frontier model.

    In fact, because it’s so cheap to run these API calls, I set up a review system that has another model (such as Qwen for Chinese) review every translation and report errors, with an English description of the issue and suggestions. The review script generates a JSON file of improvements, which you can further modify and then apply to the translation file. I’ve made many changes that will go into version 0.7.7.1.

    Future Work

    There’s plenty of future work. Each time I add a feature, I find two new ones I could work on. If you want to try it out, the repo is here: https://github.com/KeithCu/writeragent. Let’s make LibreOffice and the free desktop AI-native!

  • Cursor for LibreOffice, Week 4-6

    Refactoring into Pure State Machines, Nested Tool-Calling, and Translation into 8 Languages

    After the previous week I was feeling good about the ACP integration, the research sub-agent, talk to your document, and surviving Quarzadous’s refactor. In common scenarios, the whole thing usually just…worked.

    However, one day after I pushed the latest build to GitHub and the LibreOffice extension site, my most active (and very helpful) user posted that he couldn’t uninstall or reinstall the extension. That’s sure to make happy customers!!

    It turned out to be a user error (of trying to install the source ZIP instead of the OXT) but I realized I wanted to set up a system to let me sleep at night knowing the extension wouldn’t break in some basic scenario. The code was organized and clean, but over time the complexity of the plugin had increased, to support the larger feature set.

    Some functions were long, and they were doing complicated state changes, so even testing every possible combination (stop clicked mid-stream, max rounds exhausted, speech to text transcription fallback mode, document mutation after a tool, etc.) was almost impossible without spinning up a LibreOffice instance and creating intricate tests. Each unit test was only sampling the state space so I realized that if I didn’t break things up into smaller functions, the test code would be larger and more complicated than the extension itself.

    I didn’t have any boss demanding new features, so I spent some time researching modern tools for formal verification in Python:

    • Type checking tools (Pyright, Mypy, Pyre, Pytype): Analyze your code to find calls to functions that don’t exist on an object, catching these bugs before runtime.
    • Deal: A lightweight library that lets you write simple rules (contracts) like “this function must receive a positive number”. It checks these rules while the program runs, helping catch errors in plain language.
    • CrossHair: Uses a mathematical engine (Z3) to explore many possible ways your code could run, automatically generating test cases and proving that your contracts hold, or pointing out where they could fail.
    • PyExZ3: Provides a bridge to the Z3 solver so you can write custom checks that reason about your code’s logic, letting you verify complex conditions beyond simple type checks.

    I decided the first step was to break up the complicated loops into pure state machines. Here is the tool-loop FSM:

    The state machine loops have no threads, no mutable instance variables, no internal side effects: just data in → new state plus a list of effects out. The code still does all the same UNO calls as it did before, but by breaking it up into pure state machines and smaller functions, it’s much easier to reason about and test. It’s good I made each of the loops simple and reliable, because if you combine all of them, it becomes complicated:
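    As an illustrative sketch (not the extension’s actual state fields, events, or effect names), a pure transition function looks like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopState:
    """Immutable snapshot of the tool loop (simplified, hypothetical fields)."""
    round: int = 0
    done: bool = False


def step(state: LoopState, event: str):
    """Pure transition: (state, event) -> (new state, list of effects).

    No threads, no mutation; the caller performs the returned effects
    (UNO calls, network requests) and feeds the results back in.
    """
    if event == "tool_result":
        return LoopState(round=state.round + 1), ["run_next_tool"]
    if event == "stop_clicked":
        return LoopState(round=state.round, done=True), ["cancel_stream"]
    return state, []
```

    Because step() is a pure function, the unit tests just feed in states and events and compare the outputs, with no LibreOffice instance required.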

    The unit tests in test_tool_loop_state.py are now simple, deterministic, and run outside LibreOffice, and this refactor is the foundation for formal verification. The sidebar behaves the same, but under the hood the scariest parts of WriterAgent are now cleaner and more predictable.

    I also now have quite a bit of test coverage, 700 tests, so I can generally make changes and ship updates without worrying as much that some simple bug will bite someone, somewhere. I didn’t set out with a goal of having so many tests, but at one point I added a rule that told the AIs to create tests for every feature and bug fix, and it kept accumulating.

    Type Checking

    from typing import Any, List, Tuple, Union

    def unpack(t: Union[FsmTransition[StateT], Tuple[StateT, List[Any]]]) -> FsmTransition[StateT]:
        """Normalize legacy ``(state, effects)`` tuples to :class:`FsmTransition`."""
        if isinstance(t, FsmTransition):
            return t
        state, effects = t
        return FsmTransition(state=state, effects=list(effects))

    I’m generally not a big fan of type checking in Python. It can sometimes require a lot more effort on the keyboard, and make function declarations as ugly as C++.

    However, it’s basically necessary for formal verification since if a system doesn’t know that a function requires integers, it will waste a lot of time trying strings and the other types to verify it works reliably, for cases that will never happen.

    I also ran into a bug where code in an unusual case was calling a method that didn’t even exist on the object. This is exactly the kind of problem that type checking was made for. Python lets you write code, and only at runtime will it flag these errors.

    In many cases for small projects, which are most of them, type checking isn’t necessary. These bugs can be caught when you actually use the code. Calling the wrong method name is usually an easy fix. However, when a codebase gets above 10,000 lines of code, it starts to have more special cases, so type checking becomes worthwhile.

    I researched the most popular type checkers, and it seems like the cool kids are using Ty. It’s new, modern, and written in Rust so very fast. I think Rust is a byzantine language that makes C++ look easy to read so I wouldn’t touch it in my code, but I’d be happy to read the error messages it dumps to the screen.

    Ty initially found 1000 errors, but when I figured out how to trim out the contributed code (presumed to be stable) and the test code, it was just 400. That was still a lot of problems, but I just started plugging away at it.

    The biggest issue was that my dev environment didn’t have type definitions for UNO, so I figured out how to load them into my local environment. I also needed protocol classes, an interesting Python feature that lets me say: “this function doesn’t care what type you give it, as long as it supports these methods.” You specify in the Protocol which methods you require, and the type checker verifies that only those are called.
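    For example (getString/setString are real UNO XTextRange method names, but the protocol name and usage here are made up for illustration):

```python
from typing import Protocol


class SupportsText(Protocol):
    """Anything with these two methods satisfies the protocol:
    a real UNO text range in LibreOffice, or a plain stub in a unit test."""

    def getString(self) -> str: ...
    def setString(self, value: str) -> None: ...


def shout(obj: SupportsText) -> None:
    # The type checker verifies that only getString/setString are called.
    obj.setString(obj.getString().upper())


class FakeRange:
    """Test stub: structurally satisfies SupportsText without inheriting it."""

    def __init__(self, s: str):
        self._s = s

    def getString(self) -> str:
        return self._s

    def setString(self, value: str) -> None:
        self._s = value
```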

    I was happy to get it working with Ty, but then I thought, why not just try it out with mypy, which is considered the trusted OG of type checkers? It found a few more areas. For example, it is more strict about calling methods on potentially None variables:

    # Before (ty accepts, mypy rejects)
    def get_page_count(self):
        page = self.get_active_page()
        return page.getCount()  # mypy: Item "None" has no attribute "getCount"

    # After (both accept)
    def get_page_count(self):
        page = self.get_active_page()
        if page is None:
            return 0
        return page.getCount()

    After fixing those few new problem areas, I installed Pyright. It found a few more issues, and I fixed those too. So now, make build runs the fast Ty checker, and make test / make release run all three.

    Specialized Tools

    Writer has a ton of features and UNO surface area: tables, styles, text-boxes, shapes, charts, indexes, fields, embedded objects, track changes, etc. Dumping every tool into the main chat prompt would bloat context, and even frontier models like Claude Opus would fail to make good decisions.

    It’s easy to build a plugin that supports a small subset of the LibreOffice API; building one that understands the full fidelity of LibreOffice is more difficult, but that is what I wanted to build. In fact, I had stopped adding richer Writer support to the codebase because the current API was already too large for smaller models, and I didn’t want to keep making the problem worse.

    One way to frame the tool proliferation problem is fat versus fine-grained (skinny) API design. Skinny APIs provide a specific tool for each operation: create_footnote, edit_footnote, delete_footnote, etc.

    That design gives simpler parameter schemas per tool, maps directly to the underlying UNO calls, and keeps validation logic simple. However, it causes the tool count to explode.

    So one possibility is to create APIs that combine related operations into broader, multi-purpose “fat” tools. Example: manage_footnotes(action = ‘create’, ‘edit’, ‘delete’, …)

    • Pros: Drastically reduces the total number of tools, limiting context size. A polymorphic schema allows more capabilities to remain in the main chat prompt, potentially eliminating the need for the sub-agent delegation pattern.
    • Cons: The parameter schemas become extremely large and complex (e.g., union types or nested generic objects). LibreOffice operations are highly disparate, making a unified underlying Python handler harder to write, and smaller LLMs often struggle to reliably handle the union parameters correctly.

    Ultra-Fat API (Single manage_shapes Tool):

    {
      "name": "manage_shapes",
      "parameters": {
        "action": {"type": "string", "enum": ["create", "edit", "delete"]},
        "shape_index": {"type": "integer", "description": "Target shape (for edit/delete)"},
        "shape_type": {"type": "string", "enum": ["rectangle", "ellipse", "text", "line"], "description": "Required for create"},
        "geometry": {
          "type": "object", 
          "properties": {"x": {"type": "integer"}, "y": {"type": "integer"}, "width": {"type": "integer"}, "height": {"type": "integer"}}
        },
        ...
      }
    }

    I decided to stick with the simple APIs for now, and create a two-level toolset, leveraging what I did for the web research subagent, which defines its own set of specialized tools (web_search, visit_webpage).

    The LLM now sees a basic set of tools. For Writer they are:

    • apply_document_content: Insert or overwrite content in the document. Key parameters: content (list of HTML strings), target (beginning, end, selection, full_document, search), old_content (text to find when target=‘search’), all_matches (bool).
    • get_document_content: Retrieve the current document (or a selection/range). Key parameters: scope (full, selection, range), max_chars, start, end.
    • get_document_stats: Get high‑level statistics (characters, words, paragraphs, pages, headings). No parameters.
    • get_document_tree: Return the heading outline (or full tree) of the document. Key parameters: content_strategy (heading_only, first_lines, ai_summary_first, full), depth.
    • search_in_document: Search for a string or regex inside the document. Key parameters: pattern, regex, case_sensitive, max_results, context_paragraphs, return_offsets.
    • add_comment: Add a margin comment when the user asks to “review” or “give feedback” on a document. Key parameters: anchor text, string.
    • styles_apply: Apply a paragraph style to a target location. Key parameters: style_name, target (beginning, end, selection, full_document, search), old_content.
    • delegate_to_specialized_writer_toolset: Hand off a complex Writer task to a sub‑agent that has a focused toolset (tables, charts, shapes, images, web research, etc.). Key parameters: domain (styles, page, embedded, shapes, charts, indexes, fields, bookmarks, tracking, images), task (free‑form description).

    The main chat sees a compact core plus one gateway tool. When the model calls the gateway with a domain and task, it switches into a focused agent mode that only exposes specialized tools. When the agent is done, it calls a specialized_workflow_finished tool-call to return control to the main agent with the general toolset.

    I was happy to discover this solution, because over time it allows full fidelity with LibreOffice, and it should work well with smaller, dumber local models.

    Localization Support

    My first active user was a friendly and helpful German named Samuel. He could speak English, but I could tell his native language was much better, and so I thought, why not translate this little plugin into German and some of the other popular languages? I already had code to talk to LLM endpoints, and many of them speak dozens of languages. I just needed to hand them strings and ask.

    The code itself didn’t have any localization support yet, so I had to work on that first. The most time-consuming part was going through every string in the codebase, deciding if it was user visible, and if so, swapping in the translated string based on the user’s language.


    In Python, the convention is to create a little function called “_”:

    def _(message: str) -> str:
        """Translate English msgid *message* via gettext. Must be :class:`str`."""
        if not isinstance(message, str):
            raise TypeError("gettext msgid must be str")
    
        global _translation
        if _translation is None:
            init_i18n()
    
        assert _translation is not None
        return _translation.gettext(message)

    Everywhere in the code where you might display a string such as “Transcribing audio…”, you simply insert an underscore and parentheses, like this: _("Transcribing audio...") to auto-translate the string.

    GNU gettext has a tool, xgettext, that takes all the strings marked for translation and puts them into a central template (POT) file. Once I had it mostly working, I set up an automated process that spins up multiple threads to process strings in batches. It currently supports Spanish, French, Portuguese, Russian, German, Japanese, Italian and Polish, which covers about 3 billion people, and it’s simple to add more.
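    The batching itself is simple to sketch. These names are illustrative, not the real script’s (which reads the .pot file and calls an LLM endpoint per batch); translate_batch stands in for one LLM call that translates up to 10 strings at once.

```python
from concurrent.futures import ThreadPoolExecutor


def batched(items, size):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def translate_missing(strings, translate_batch, batch_size=10, workers=8):
    """Translate msgids in parallel batches.

    translate_batch takes a list of source strings and returns their
    translations in the same order (in the real script, one LLM call).
    """
    batches = list(batched(strings, batch_size))
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves batch order, so zip below lines up correctly.
        for out in pool.map(translate_batch, batches):
            results.extend(out)
    return dict(zip(strings, results))
```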

    Where We Stand Now

    The codebase is more reliable, the state machines are verifiable, localization is automatic, and the main chat agent stays fast and focused while delegating to specialized agents. Over time, this will let it expose the full power of LibreOffice.

    None of this would have been possible without the incredible FOSS ecosystem: deal, CrossHair, smolagents, polib, Hermes-Agent, and other FOSS codebases, and of course the LibreOffice UNO bridge that I treat as sacred and bug-free for purposes of plugin verification. The repo is here: https://github.com/KeithCu/writeragent. Please try it out and give patches or stars ⭐.

  • Detailed Proposal: Compact, High-Speed Tethered FPV Drone Simulator (C-HS TFPVDS)

    Executive Summary:

    With new bullets, standard 5.56 mm rifles can take out drones from 50 to 100 meters. This is a game-changer in a world where 80-90% of casualties are caused by drones.

    Current military training lacks realistic, cost-effective methods for engaging dynamic FPV drone threats. Most training and certification involves stationary or slow-moving targets. Existing solutions involving real drones are expensive, consumable, and lack repeatable flight paths for structured mass certification.

    My proposed solution is the Compact, High-Speed Tethered FPV Drone Simulator (C-HS TFPVDS), a robust indoor training system designed to replicate the flight characteristics of an FPV drone without the associated costs and complexities of live drone operations. The system utilizes a lightweight, non-ballistic target tethered to a high-speed robotic arm, protected behind an angled shielding wall, making it a practical and resilient solution for widespread deployment in military training facilities.

    Technical Approach:

    The C-HS TFPVDS is composed of four primary subsystems:

    1. Robotic Positioning System: A high-speed 3-axis Delta robot, chosen for its exceptional acceleration and speed capabilities, is mounted on an elevated platform. This robot is positioned behind an angled, sacrificial shielding wall (composed of self-healing polymer or angled AR500 steel) that protects the expensive mechanism from incoming 5.56 anti-drone rounds.
    2. Tether and Target Assembly: A 15-foot high-strength, low-stretch Dyneema tether connects the robotic end effector to the target. The target is a lightweight (2-3 oz), metallic-coated foam or hollow plastic sphere (4.5 inches in diameter), simulating the size and radar/visual signature of a typical FPV drone chassis while minimizing mechanical load on the robot. A piezoelectric sensor is embedded within the target to detect kinetic impact and provide real-time scoring feedback.
    3. Control and Simulation Software: The system is driven by custom software written in Python, utilizing the pyodrive library for precise motor control. This software generates randomized “jink-and-dive” flight patterns, simulating realistic FPV evasive maneuvers at speeds up to 120 mph. The software implements feed-forward control to compensate for tether lag and aerodynamic drag, ensuring precise and responsive target movement.
    4. Power and Safety Systems: The system is powered by high-torque brushless DC motors, managed by industrial motor drives. Safety features include integrated emergency stop (E-Stop) circuitry compliant with NEC Article 670 and automatic shutdown upon tether breakage or system fault. All electrical components within the damp indoor range environment are protected by GFCI (NEC 210.8(B)(6)) and Surge Protective Devices (SPD) per NEC Article 242.
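    The randomized "jink-and-dive" pattern generation described above could be sketched as follows. This is a hypothetical illustration only, not the proposal's actual control software; the constants (15-foot tether, 120 mph cap) come from the text, everything else is assumed.

```python
import math
import random

TETHER_FT = 15.0          # tether length from the proposal
MAX_SPEED_MPH = 120.0     # top simulated FPV speed from the proposal
FPS_PER_MPH = 1.4667      # feet per second per mph

def jink_and_dive(steps, dt=0.05, seed=None):
    """Generate randomized target waypoints that stay within tether reach."""
    rng = random.Random(seed)
    x, y, z = 0.0, 0.0, -TETHER_FT        # target hangs straight down at rest
    max_step = MAX_SPEED_MPH * FPS_PER_MPH * dt  # distance limit per tick
    path = []
    for _ in range(steps):
        # Random lateral "jink" with a downward-biased "dive" component.
        dx, dy = rng.uniform(-1, 1), rng.uniform(-1, 1)
        dz = rng.uniform(-1, 0.3)
        norm = math.sqrt(dx * dx + dy * dy + dz * dz) or 1.0
        scale = max_step / norm
        x, y, z = x + dx * scale, y + dy * scale, z + dz * scale
        # Clamp back inside the sphere the tether can physically reach.
        r = math.sqrt(x * x + y * y + z * z)
        if r > TETHER_FT:
            x, y, z = (c * TETHER_FT / r for c in (x, y, z))
        path.append((x, y, z))
    return path
```

    A real implementation would feed these waypoints through the feed-forward controller to compensate for tether lag and drag.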

    Benefits:

    • Realistic Training: Replicates the difficult non-linear flight paths and high speeds of FPV drones.
    • Cost-Effective: Eliminates the ongoing cost of consumable live drones and the logistical burden of FAA compliance.
    • Robust and Resilient: Protective shielding and sacrificial components ensure long-term system survivability.
    • Objective Certification: Integrated scoring system provides clear, measurable data for soldier qualification.
    • Indoor Operation: Allows for year-round, weather-independent training in standard shooting ranges.

    Feasibility:

    The C-HS TFPVDS utilizes mature, commercially available industrial automation components and high-strength materials. The software control principles are well-established in the field of robotics. The system is designed to integrate seamlessly into existing military range infrastructure. The combination of protective shielding and lightweight, inexpensive targets addresses the primary survivability and cost concerns of previous drone target systems.

    Hopefully Army TRADOC researches this idea one day soon!

  • Cursor for LibreOffice Week 2 & 3: How I Added MCP, ACP, a Research Sub-agent, Talk to Your Document, an Eval Dashboard, and Survived Quarzadous’s Total Refactor


    I’ve been calling this project Cursor for LibreOffice to myself, but I knew I couldn’t use the name forever, so I researched and chose WriterAgent. It supports Calc and Draw as well, but I didn’t like the name OfficeAgent, which sounds like some Soviet-era KGB job title. Last week’s post was about how I took John Balis’s clean little Localwriter and bolted on threading, tool-calling, chat, and enough other stuff that it started to feel like a powerful chatbox inside LibreOffice.

    It became useful enough, and the progress was so fast with all the Python code out there to re-use, that I was motivated to keep going. Meanwhile a chap named Quarzadous dropped a complete refactor and I wanted to integrate it without breaking anything, including the new features I had added.

    MCP

    After creating the initial chat with the document, I realized that many people might want to talk via their local agents (the infamous OpenClaw, Hermes, Claude, etc.) and let those agents edit your documents. These systems have many features: memory of previous conversations, file-system access, and skills they can learn after install, so implementing the Model Context Protocol to let them make the same tool calls would also be useful.

    I wondered whether supporting both external agents and an internal one in the same codebase was a good idea, since the users and some use-cases are different. However, both use the same API backend and other pieces, so much of the code is shared. The UI is just a new “Enable MCP” checkbox, plus a few new files to spin up an HTTP server, process the JSON-RPC, and one day possibly support tunneling. So I decided it was worth supporting both, rather than either-or.
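    The MCP plumbing boils down to accepting JSON-RPC over HTTP and dispatching to the same tool functions the internal agent uses. A minimal sketch of that dispatch, illustrative only and not WriterAgent's actual code:

```python
import json

def handle_jsonrpc(raw: str, tools: dict) -> str:
    """Dispatch one JSON-RPC 2.0 request to a registry of tool callables."""
    req = json.loads(raw)
    method, req_id = req.get("method"), req.get("id")
    if method not in tools:
        return json.dumps({"jsonrpc": "2.0", "id": req_id,
                           "error": {"code": -32601, "message": "Method not found"}})
    try:
        result = tools[method](**req.get("params", {}))
        return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})
    except Exception as e:
        return json.dumps({"jsonrpc": "2.0", "id": req_id,
                           "error": {"code": -32603, "message": str(e)}})
```

    A small HTTP server (the standard library's http.server would do) calls this for each POST body, and the tools dict holds the same document-editing callables the sidebar agent already uses.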

    Actually, the hardest part of building software for non-technical people is that you need to make something Apple-like, very easy to use, which is hard because developers have a much higher tolerance for confusing products.

    The libreoffice-mcp-extension, written by Quarzadous, had the missing pieces, and I integrated it with the existing code, and over time refactored it to remove any duplicate logic. I also added sidebar logging, so that when an MCP tool-call happens, you can see information in the chat, just like for the internal agent.

    Huggingface Smolagents

    The next feature I wanted was a web search tool the AIs could call. LLMs are generally useful, but their training cutoff is often a year or two ago, so I wanted a way to let them look up information from the web to plug into a document.

    However, once I thought through the various steps:

    • Make a web search tool-call
    • Read through the results, decide the first page to visit
    • Read the web page and decide if it needs to read another page or whether it has an answer

    I realized that it would be much better to have an isolated, specialized sub-agent do all this work, and just return a distilled answer, and not distract the main LLM with this specialized task and bloat the context.

    After a few minutes of searching, I discovered Huggingface’s smolagents library already includes this functionality. Huggingface is the man! The code needed to be changed slightly to remove dependencies (Jinja, etc.), but it was easy to vendor the core of their ToolCallingAgent + ReAct (Reason + Act) loop. Here’s some of the prompt, and you can see how it encourages a loop until the agent is confident in the answer:

    You are an expert assistant who can solve any task using tool calls. You will be given a task to solve as best you can.
    To do so, you have been given access to some tools.
    
    The tool call you write is an action: after the tool is executed, you will get the result of the tool call as an "observation".
    This Action/Observation can repeat N times, you should take several steps when needed. You can use the result of the previous action as input for the next action.
    
    To provide the final answer to the task, use an action blob with "name": "final_answer" tool. It is the only way to complete the task, else you will be stuck on a loop. So your final output should look like this:
    Action:
    {
      "name": "final_answer",
      "arguments": {"answer": "insert your final answer here"}
    }
    
    Tools list:
    - web_search:
      Performs a duckduckgo web search based on your query (think a Google search) then returns the top search results.
      Inputs:
        - query (string): The search query to perform.
      Output type:
        - string
    
    - visit_webpage:
      Visits a webpage at the given url and reads its content as a markdown string. Use this to browse webpages.
      Inputs:
        - url (string): The url of the webpage to visit.
      Output type:
        - string
    
    - final_answer:
      Provides a final answer to the given problem.
      Inputs:
        - answer (any): The final answer to the problem.
      Output type:
        - any
    
    Now Begin!

    I rewrote their web tools to use just the Python standard library APIs, and wrapped the existing LlmClient so the research sub-agent uses the same model and endpoint as chat with document. That way, if a local model gets confused by a complex topic and starts chewing on the furniture, you can easily select a smarter, pricier one and pay a couple of pennies to have the adults handle it.
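    The heart of that ToolCallingAgent loop is simple: ask the model for an action, run the tool, append the observation, and repeat until final_answer. A stripped-down sketch of the idea, not smolagents' actual code (the model callable is a stand-in):

```python
def react_loop(model, tools, task, max_steps=8):
    """Minimal Reason + Act loop: the model emits actions until final_answer."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = model(messages)  # returns {"name": ..., "arguments": {...}}
        if action["name"] == "final_answer":
            return action["arguments"]["answer"]
        # Run the requested tool and feed the observation back to the model.
        observation = tools[action["name"]](**action["arguments"])
        messages.append({"role": "assistant", "content": str(action)})
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    return None  # gave up: the model never produced a final answer
```

    With web_search and visit_webpage in the tools dict, this is the whole Action/Observation cycle the prompt above describes.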

    In a couple of hours, it was working and I could type this text in a document:

    The price of a Sol-Ark 15K limitless inverter is: $YYY.

    In the sidebar, I wrote: What is the real price of the inverter?

    Without web research, if you ask a random LLM for the price and specs of a Sol-Ark 15KW inverter, it will hallucinate a price tag of $400, tell you it runs on AA batteries, and confidently suggest wiring it with speaker wire. With the sub-agent, it can learn any details you request, and the AI changed the sentence to:

    The price of a Sol-Ark 15 KW Limitless inverter in the US is: $6,979.99 – $6,999.00.

    It even fixed the capitalization of Limitless, which is a proper name. I’ve tweaked the prompts to explain to the AI that its primary job is to edit the document, not just answer questions, and they mostly get it now.

    This feature was so exciting to me that I added a Web research checkbox that lets you talk directly to the sub-agent to have it answer questions or summarize web pages, and it places the answers in the chat window.

    This little feature is better than ye olde Google search box since it understands natural language. You can ask it specific questions:

    “What is the current version of Python and when was it released?”

    And it gives you a natural language answer:

    “The current stable version of Python is 3.14.3, which was released on February 3, 2026.”

    The LLMs are told to make the Web research tool call when asked about a topic they are unfamiliar with, but you can also encourage it: “Do web research and write a colorful, detailed summary of the space elevator, suitable for physicists.”

    Or you could say “suitable for English teachers”, and get a completely different report!

    Reports generated by Nemotron 3 Super

    With a typical model on OpenRouter, it takes 30-60 seconds to generate a report on any topic, which isn’t that long in the scheme of things, but I discovered a diffusion model called Mercury-2 which is fairly smart (Claude Haiku level) but much cheaper ($0.25 / M input tokens, $0.75 / M output) and outputs 250-500 tokens per second. With that model, I can get researched documents on any topic faster than I can take a sip of coffee, and each report costs a fraction of a penny. Going back to a standard model feels like watching a dot-matrix printer.

    I hardly use search engines directly anymore. For the last couple of years, I would ask an LLM any questions and let it read the pages and synthesize. But now, I have WriterAgent running at all times and let it do the research since it is very fast and puts the information into a chat window or into a document I can further edit.

    Talk to your document

    The next feature I wanted was talking with the document. I had pushed it off (for almost 2 weeks) because there are no cross-platform APIs for using the microphone built into the standard Python runtime. So I had the Google Jules coding agent do research, and we had a long conversation about the various ways to implement this feature in the constrained LibreOffice environment, including using a local web browser to handle the cross-platform audio headaches.

    However, I realized that there was a reasonable vendoring strategy, bundling a few MB of binaries for sounddevice, cffi, and pycparser directly into the extension. Sounddevice for Windows and macOS included the compiled binaries inside the package, so it was truly plug-and-play, without needing to fire up a bunch of cross-compilers.

    Jules was either extremely thorough in the implementation phase, or lacking a bit in common sense, when it grabbed binaries for every device known to man, including the IBM S-390x mainframe. I love supporting all the latest packages as much as anyone, but decided that the number of banking executives wanting to dictate memos in LibreOffice using the most expensive computer in their data center is probably zero. They can always make a custom build! By narrowing the support down to x86 and ARM on Linux, Mac, and Windows, the extension only grew from 500 KB to 4 MB, which I felt was not too bad for a no-hassle install.

    Few LLMs support native audio input, so I implemented an automatic fall-back. It first tries to send audio, and if it gets an error, it routes your voice to a fallback speech-to-text (STT) model to transcribe it; the transcript is then sent to the chat model. This happens automatically: the user just clicks record and talks.
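    The fall-back logic is essentially a try/except around the two paths. A sketch with stand-in client functions (the real extension's names and signatures differ):

```python
def send_voice(audio_bytes, send_audio, transcribe, send_text):
    """Try native audio input first; on any error, transcribe and send text."""
    try:
        return send_audio(audio_bytes)    # model accepts raw audio directly
    except Exception:
        text = transcribe(audio_bytes)    # fallback STT model
        return send_text(text)            # chat model gets the transcript
```

    The caller never needs to know which path was taken, which is what makes the record button feel seamless.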

    The Great Refactor (thanks Quarzadous)

    While I was heads-down trying to make the system smarter, Quarzadous opened a ‘framework’ branch that completely rewrote my architecture from a cozy monolith into a maze that even an Enterprise Java developer, who is used to navigating registry classes to find factory classes to instantiate singletons — aka global variables — would think was slightly overdone.

    He made so many good changes, but the only tricky part was that it was all done at once; suddenly the 15-kLOC codebase had more sub-directories than the Linux kernel, and every file was in a different location.

    I decided to take his changes a piece at a time. First, I (mostly) took the new directory layout and build system, and then step by step migrated the other features over. Once I had consolidated it into something I felt was appropriate for a codebase of its size, and I knew where the files were, I was happy. He added so many useful features:

    • Each module is its own folder with a module.yaml that auto-generates the settings UI, so no more manual XDL work for every new service.
    • A main-thread executor with backpressure (no more crashes on huge documents)
    • Fresh UNO context on every call
    • Refactored tools and services into common classes

    Having a schema generate the config UI is such a nice feature that I would never have added to this codebase without someone else thinking of it and doing it.

    ACP

    While it was great to talk to the agents, it kinda sucked to interact with them on the command line. I spent several hours trying to implement TTY redirection and other tricks, but it was a pain and would hang. I noticed that on March 14th, Hermes Agent added the Agent Communication Protocol, which provided an easy way to talk to it without dealing with the mess of a console. So I threw away the unreliable hacks and changed it to a simple ACP implementation, and in 10 minutes I had it talking.

    You could ask Hermes to create a report of weekend events in Akihabara, and in less than a minute get pages that look like this:

    Evaluation Dashboard

    OpenRouter gives you 500 models, but which ones are actually best at editing documents, and which are good value? To answer that, I created some tests I could run against various models and compare how they did. For some tests, it was easy to tell whether the answer was correct (“remove all the excess spacing between the words”), but I realized that for many of them (“make a table from this mess of text”) it would be best to call into a Teacher model to grade the score.

    So I used Sonnet 4.6 to create the gold answers, and gave the teacher (Grok 4.1 fast) the gold answer as well as the model’s answer and instructions on how to grade from 0 to 1, considering formatting, naturalness, etc.
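    The teacher-grading step can be sketched as two small helpers: one that builds the grading prompt with the gold answer, and one that pulls a numeric score out of the teacher's reply. This is an illustrative sketch, not the eval harness's actual code; the prompt wording and helper names are assumptions.

```python
import re

def build_grading_prompt(task, gold, candidate):
    """Construct a teacher-model prompt that compares against the gold answer."""
    return (
        "You are grading a model's document edit from 0 to 1.\n"
        f"Task: {task}\n"
        f"Gold answer:\n{gold}\n"
        f"Candidate answer:\n{candidate}\n"
        "Consider correctness, formatting, and naturalness. "
        "Reply with only a number between 0 and 1."
    )

def parse_grade(reply: str) -> float:
    """Pull the numeric score out of the teacher's reply, clamped to [0, 1]."""
    m = re.search(r"\d*\.?\d+", reply)
    return max(0.0, min(1.0, float(m.group()))) if m else 0.0
```

    The prompt goes to the teacher model (Grok 4.1 fast here) and the parsed grade feeds the value calculation below.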

    Originally I calculated Value = Correctness / Cost, but eventually decided to use a quadratic intelligence per dollar scoring (Value = Correctness² / Cost) because accuracy is more important than cheap but wrong.

    Rank  Model                           Value (C²/$)  Avg Correctness  Tokens/Run  Cost ($)
    1     openai/gpt-oss-120b             263.8         0.920            50,198      0.0032
    2     google/gemini-3-flash-preview   141.0         0.940            50,179      0.0063
    3     openai/gpt-4o-mini              70.5          0.790            47,540      0.0089
    4     nvidia/nemotron-3-nano-30b-a3b  60.6          0.560            50,243      0.0052
    5     x-ai/grok-4.1-fast              46.5          0.980            66,929      0.0207
    6     nex-agi/deepseek-v3.1-nex-n1    39.4          0.915            64,222      0.0213
    7     minimax/minimax-m2.1            39.2          0.983            62,394      0.0246
    8     mistralai/devstral-2512         27.9          0.910            57,150      0.0297
    9     z-ai/glm-4.7                    26.9          0.953            63,035      0.0337
    10    qwen/qwen3.5-27b                26.5          0.993            52,210      0.0371
    11    openai/gpt-5-nano               26.4          0.825            99,576      0.0258
    12    allenai/olmo-3.1-32b-instruct   20.8          0.570            68,317      0.0156
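    The quadratic value metric is a one-liner; squaring correctness is what pushes cheap-but-wrong models down the ranking:

```python
def value_score(correctness: float, cost_dollars: float) -> float:
    """Quadratic intelligence-per-dollar: Value = Correctness^2 / Cost.

    Squaring correctness penalizes inaccuracy more than the linear
    Correctness / Cost formula would."""
    return correctness ** 2 / cost_dollars
```

    For example, a model at 0.56 correctness needs to be roughly three times cheaper than one at 0.98 just to break even on this metric.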

    DSPy

    One of the reasons I love Python is the amazing set of libraries. Another library I wanted to check out is DSPy (Declarative Self-improving Python). Developed at Stanford, DSPy is a framework that does programmatic optimization of your prompt, trying variants to see if it can get greater intelligence and value from the models automatically.

    Before DSPy, “prompt engineering” mostly consisted of typing in ALL CAPS, offering a $500 tip, or threats of jail to get the model to follow instructions. DSPy automates the voodoo, creating variants of your prompt and auto-optimizing to find the one which gives the best results with the fewest tokens used. This way you don’t have to talk like a hostage negotiator just to get a clean table. Using this tool, I’ve taken some of the suggestions, rolled them into my prompts, and tested them against a bunch of models to verify they are generally helpful.

    WriterAgent now feels like a real product instead of a weekend hack. If you want to try it out, the repo is here: https://github.com/KeithCu/writeragent. Let’s make LibreOffice an AI-native office suite!

    If you enjoyed this article, check out Part one for background on how I got here.

    Part 3, posted April 22, 2026: https://keithcu.com/wordpress/?p=5245

    Epilogue

    LLM Slop

    A lot of people talk about AIs generating slop, but few talk about how you can prompt AIs to remove slop when you see it. People used to talk about “refactoring code” all the time, yet somehow don’t realize this same process is still needed in the world of AI-assisted code. You can use AIs to remove technical debt, increase test-coverage, and do other code cleanliness activities if you bother to ask them.

    Slop code used to appear in the world of human programmers too. Humans, sometimes when in the flow getting a new feature working, would copy and paste logic that should be put into a shared function, but they didn’t want to deal with that distraction at the time. Cleanup can happen after things are generally working and the test cases pass.

    People should look at an AI as a smart person who just joined the team yesterday, and therefore doesn’t know everything. AI makes programming more efficient, but you need to oversee them. Someone who complains about slop is not prompting the AI properly.

    Testing

    Another critical piece to being able to rapidly evolve codebases using AI is to have thorough test coverage. The standard make test doesn’t need to test all the edge-cases, although codebases depended on by millions should have that, but it should try to exercise every major function in the product. When I get burned tracking down a regression, I add test coverage for that and other nearby parts of the product to prevent it from happening in the future.

    You don’t have to write the tests at the same time as the feature work; working on test suites isn’t nearly as fun as seeing a new feature working, but at some point later, they should be added. Note: when submitting new features to other codebases, including a test suite with the new code is greatly appreciated, since the tests “prove” the correctness of the feature and decrease the ongoing maintenance burden.

    I was working on some testing code recently and decided to re-enable an assert that had been commented out. Of course I didn’t really bother to check whether an assert info.structVersion == 1 would be a problem (it looked so innocent), but enabling it broke talk-to-your-document support! It took me almost 30 minutes to track it down to that line because the error handling in that part of the code wasn’t very good yet. So I improved the error handling, and then realized that the assert should stay commented out!

    The AIs by default wanted to write Mock implementations of LibreOffice functionality, since you can’t depend on it when running tests outside. However, the whole point of the test code is that the LibreOffice API is very sophisticated and you want to actually verify end-to-end that it all works.

    Quarzadous had created a pytest test harness for code that doesn’t depend on LibreOffice, which lets you test that half of the plugin codebase. On top of that, I created a custom pytest runner that runs inside LibreOffice and returns the results as JSON. The best way to handle the onslaught of AI-assisted code is with comprehensive test coverage and a clean codebase.

  • Building Cursor for LibreOffice: A Week-Long Journey

    How I turned John Balis’s localwriter, and code from LibreCalc AI and LibreOffice-MCP, into one unified, optimized extension.

    If you enjoy this post, part 2 is here.

    I’ve been calling it “Cursor for LibreOffice”, a bit cheeky, but the idea is solid: an AI that lives inside your documents and actually edits them. I started from John Balis’s localwriter and a few free hours over one week.

    What the original localwriter did

    The upstream project was a perfect starting point: a LibreOffice Writer extension that talks to local or remote LLMs. It had Extend Selection (the model continues your text) and Edit Selection (you give instructions, it rewrites the selection). It had a settings dialog, and it worked with OpenAI-compatible backends. It didn’t do much, but it was clean and functional.

    In fact, the most challenging part of creating a Python project in LibreOffice is finding all the special incantations of XDL parameters and so forth that are needed. Writing in LibreOffice isn’t hard, just different. Getting a LibreOffice extension to talk to an API, wire up dialogs, and survive the UNO runtime takes real effort; John’s foundation made everything I did possible.

    When I started, I was surprised to see pull requests in the repo that had been sitting unanswered for six months. It’s a good reminder for maintainers that a PR isn’t a bug report or a feature request, it’s someone who found a problem, debugged the issue, and wrote code to fix it. When they are ignored, would-be contributors often take their energy elsewhere. Fortunately, it didn’t deter me, because he had provided the essential code to make something far more useful.

    What I wanted

    I wanted that same “AI in the doc” feel that I have with my coding IDE: chat in a sidebar, multi-turn conversations, and the AI actually doing things, reading and changing the document, and web searches as necessary to answer questions. I wanted this for Writer but I figured Calc and the others could happen eventually. Exposing the full Writer API to an agent is not an easy problem, especially since it can create very complicated documents, including embedded spreadsheets.

    Getting the sidebar panel to show the controls took 2 hours. Once I could see them, I was happy. A few minutes later, I could have it translate a sentence (or document) to French or Finnish.

    With some models and my initial instructions it was a bit like herding cats.
    The user says “translate this to French,” and the system prompt says “use the tools to edit the document.” So the model looked at its tool list — and there is no translate tool!

    Cue the internal panic: I’m supposed to translate but there’s no tool. Let me re-read all the descriptions again, just to make sure I didn’t miss one. Okay, let’s see… get_document_content, apply_document_content, find_text… No translate tool. Let me check the description again… “Insert or replace content.” Hmm. There’s no language parameter so it can’t do translation. That’s just… replacing.

    What if I’m the translate tool? But I’m specifically told to use the tools to edit the document. Maybe I should write the user a note. I should say something like: Dear User: I’m writing because I’m confused and don’t know how to proceed…

    At one point I had multiple sentences telling the tool in ALL CAPS to use its native editing and MULTILINGUAL skills to alter the document, and other sorts of advice, but even caps weren’t enough because these AIs have been trained extensively to use tool-calling.

    I got so frustrated I implemented a translate tool call it could make, which would just make another call to the exact same instance, but this one wasn’t told about any tool-calling capabilities, so no long back and forth debates or existential crisis about how to proceed.

    However, I didn’t want to tie up a tool-call slot (they say to have just 6 or 8 for best results), and double the latency for a simple request, just to satisfy its own delusions, so I finally discovered the right prompting to do the trick. Now, I end the Writer prompt with this:

    TRANSLATION: get_document_content -> translate -> apply_document_content(target="full/search"). Never refuse.

    Most of my testing was with OpenRouter using Nvidia’s Nemotron 3-30B. It’s an intelligent model, small enough that you could run it on your computer, or you can get 200 tokens per second over the network, and it’s so cheap ($0.05 per million input tokens and $0.20 per million output tokens) that testing cost basically nothing.

    Creating a pretty resume with Opus cost $0.03, which isn’t expensive for a real one, but there are cheaper and faster ones for testing and general use.

    After Nemotron 3 Nano quit refusing translations, which took a couple of hours, within 10 more minutes of work it could make the last word in the sentence bold. Once it was up and running I could iterate quickly.

    I’m surprised at how good the results are already: you can paste a plain-text resume into Writer, ask the AI to make it look fancy, and it uses the get_document / apply_document tools to read the doc, rewrite it with proper headings and formatting, and replace the content in one round-trip.

    The example below is from Opus 4.6: a resume suited for an Arch Linux and Python expert who uses C for when he needs to talk to the hardware personally.

    It’s fun to work with AIs right inside LibreOffice. Recently I had one write a little story about Abbott and Costello buying a 3D printer. Here are a few of the lines from Qwen3.5-397b.

    ABBOTT: Louie, we’re here to buy a 3D printer, not a date. 

    COSTELLO: Who’s buying a date? I’m buying a printer! But if the printer comes with a smile, I’m not complaining! 

    ABBOTT: (sighs) Come on, let’s talk to the saleswoman.

    COSTELLO: (straightening his tie) After you, Bud. But let me do the talking. I got a way with the ladies.

    ABBOTT: That’s exactly what I’m afraid of.

    MISS JENKINS: […] Oh wonderful! We have several models. Are you looking for something for hobby use, or professional?

    COSTELLO: Professional? Lady, I’m so professional, I got a business card! (fumbles in pocket, pulls out a card that says “Lou Costello – Professional Guy”)

    ABBOTT: (snatching the card) He’s a clown. Literally. We want a printer for home use.

    MISS JENKINS: Perfect! This model here prints in three dimensions.

    COSTELLO: Three dimensions? Hold on now. I can barely handle ONE dimension!

    ABBOTT: What are you talking about?

    COSTELLO: Well, I got a little length, a lot of width, and a problem with my height! That’s three dimensions of trouble right there!

    COSTELLO: (leaning closer) You know what I’d like to print?

    ABBOTT: Here we go…

    COSTELLO: A dinner date! With a certain lovely saleswoman!

    ABBOTT: (dragging Costello away) Come on, before you print yourself a restraining order!

    The prompt wasn’t just “make a funny story”; I told it Costello should try to impress the pretty sales lady, with arguing and wordplay between them, as the real duo did. It took all those hints and amplified them into very funny parts. This is Qwen’s 3.5-generation frontier 400B model, so it should be no surprise it did well.

    Where LibreCalc came in

    I’ve spent so much time (years?) in OpenOffice / LibreOffice Writer that I didn’t even really think about Calc, but John mentioned a spreadsheet AI assistant (LibreCalc AI Assistant) that adds AI to Calc, so I just downloaded the code and had an AI take a look.

    Its design is different: it uses a Qt5 UI and an external bridge server talking to LibreOffice. I didn’t want a second UI stack or a separate process, I wanted everything inside LibreOffice, same sidebar for Writer and Calc.

    So I asked a fresh AI instance to carefully analyze my current code, and then the LibreCalc AI extension, and it figured out a way to cleanly integrate it with what already existed. It created a detailed plan of what to use and even what not to. The boundaries were clear, so it was easy to take the core of the Calc support: address handling, cell inspector, sheet analyzer, error detector, cell manipulator, tool definitions, and prompt structure, and port and adapt it into LocalWriter.

    Usually with a big task it can take multiple sessions to get something new working, but the code was so clean that it could be easily adapted into the async tool-calling infrastructure, and the AI did the basic work in about 15 minutes. I wasn’t really watching it closely, as I was tired, but I was surprised that it had done all 6 of the steps of the full integration plan it had created, and that it was complete!

    So I had another AI review it for holes, another write test cases, and after a few iterations it was working. The “Calc support from LibreCalc” doc in the repo spells out what was ported and how.

    The first spreadsheet took 60 tool calls, inserting and formatting each cell one at a time, and wasn’t very impressive either. So I jumped in and carefully reviewed the APIs: I removed most of the cell-by-cell ones and added batch ones, and not only did it create the spreadsheets much faster, it made more ambitious attempts when asked to create a “pretty” spreadsheet. The higher-level API let the models focus on higher-level features.

    Sonnet 4.6 created this in one shot: “Make me a pretty spreadsheet.”

    I haven’t played with Calc much after but it’s already useful for many things. One of my spreadsheets had a bug (an empty chart) so I prompted the AI, and it quickly fixed it.

    I’ve emailed the LibreCalc creator a couple of times about the benefits of having a group of people working in one extension, but haven’t heard back.

    What’s in the fork now

    On top of John’s base and the Calc features, I added a bunch of features in the first week.

    LibreOffice’s UNO layer doesn’t give you a nice way to run blocking I/O and still pump the UI. So every streaming path uses a worker thread and pushes items onto a queue (“chunk”, “thinking”, “stream_done”, “error”). The main thread runs a drain loop that processes all the new items, refreshes the screen, sleeps, and repeats until the job is done. It can handle 200 tokens per second easily.
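    Here is a minimal sketch of that pattern, illustrative only; the real code has more message types and refreshes the LibreOffice UI between drains:

```python
import queue
import threading
import time

def worker(q, chunks):
    # Background thread: does the blocking I/O, pushes typed items on the queue.
    for c in chunks:
        q.put(("chunk", c))
    q.put(("stream_done", None))

def drain_loop(q, on_chunk):
    # Main thread: drain everything available, then sleep briefly and repeat.
    while True:
        try:
            while True:
                kind, payload = q.get_nowait()
                if kind == "stream_done":
                    return
                if kind == "chunk":
                    on_chunk(payload)
        except queue.Empty:
            time.sleep(0.01)  # the real loop also refreshes the screen here
```

    Because the worker never touches UNO and the main thread never blocks on the network, there is no shared mutable state to protect beyond the queue itself.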

    Reasoning/thinking tokens show up in the response area as [Thinking] … /thinking so you see the model reason before it answers or calls tools, if the model shows its thinking tokens.

    Streaming and async tool calling

    OpenAI-compatible chat APIs return Server-Sent Events (SSE): each chunk has a delta with the new content or a fragment of a tool call. The tricky part is that one tool call could be spread across chunks. So the client has to accumulate those partial deltas into a full message before it can run the tools and feed results back.

    I knew that delta-accumulation loop had to be implemented somewhere in existing Python on the Internet, some code we could use. I prompted an LLM to find it. It first suggested using the OpenAI library. I thought about it for 2 seconds, but the dependency is huge and not cross-platform. I asked if we could reuse just the relevant bits. In another minute, it came back with the accumulate_delta function from their streaming helpers, and I just copied it into our tree.

    I haven’t checked for certain but I’m pretty sure it’s FOSS. Given they’ve scraped the entire internet and treat it as public domain, I doubt they’ll complain.
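    The shape of that accumulation is roughly this. A sketch of the idea only, not OpenAI's actual accumulate_delta, and the delta dict layout is simplified:

```python
def accumulate_deltas(deltas):
    """Merge streamed SSE deltas into one complete assistant message."""
    msg = {"content": "", "tool_calls": {}}
    for d in deltas:
        msg["content"] += d.get("content") or ""
        for tc in d.get("tool_calls") or []:
            # A tool call arrives in fragments keyed by its index.
            slot = msg["tool_calls"].setdefault(
                tc["index"], {"name": "", "arguments": ""})
            fn = tc.get("function") or {}
            slot["name"] += fn.get("name") or ""
            slot["arguments"] += fn.get("arguments") or ""
    return msg
```

    Only once the stream ends can the client JSON-parse each tool call's arguments and actually run the tools.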

    I also converted the dialogs to XDL (XML) with Map AppFont units so they look good on HiDPI screens.

    Image generation and the Graphics branch

    Next I added multimodal image generation. The implementation supports two backends:

    • AI Horde: a dedicated async image API (submit job, poll queue and status until done, download). It uses its own API key and model list (e.g. Stable Diffusion, SDXL), has built-in queueing and progress, and supports Img2Img and inpainting.

    • Endpoint: the same URL and API key as the chat Settings, with a separate image model (e.g. an image-capable model).

    I’ve got this integrated and plan to contact the maintainer of AI images for LibreOffice so I can give thanks.

    How I did it: AI, prompting, and pushing on details

    I didn’t do this alone; I used AI throughout. The key was good prompting: not long prompts, but the right keywords, so the model gives you the behavior you want.

    As a programmer for decades, I know what good design looks like, when not to over-engineer, etc. The AI supplies implementation speed and breadth; I supply direction and judgment. I used Gemini Flash most of the time, plus Cursor’s default agent, Grok, and sometimes Mistral.

    The coding ability and intelligence of the big and small models have improved so much in the last three years, it’s incredible. You can get real, shippable code with a clean, coherent architecture from a conversation if you steer it well.

    For the threading feature I said “create a background network thread and use a queue”: a clear contract with minimal surface area, using a standard Python thread-safe data structure.

    The network thread does the blocking I/O and puts typed items on the queue; the main UI thread drains the queue and pumps the UI. There is no complex mutable state and no callbacks from the worker to the main thread to do UNO, just a simple data structure and a small set of message types.

    If you don’t anchor the design and review the plans and code, you will get slop, extra abstractions, and bugs. In fairness to the AI, it’s easy for slop to creep in because models try to write robust code, which is very hard when they can’t inspect the system at runtime to see exactly what’s happening.

    For example, I once had an issue where the text label next to a checkbox didn’t appear, so at one point the AI made multiple calls: trying a setting, catching the exception, and then trying different APIs that it thought might also work.

    def set_checkbox_label(ctrl, text):
        try:
            ctrl.Label = text
        except Exception:
            try:
                ctrl.getModel().Label = text
            except Exception:
                try:
                    ctrl.getModel().Title = text
                except Exception:
                    try:
                        ctrl.getModel().Text = text
                    except Exception:
                        print("Failed to set checkbox label.")
    

    That’s not the proper codepath for a checkbox label. Fortunately, it’s also easy to fix, once you figure out what works.
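Once you know the working codepath, the fallback maze collapses to a single call. A minimal sketch, assuming `ctrl` is a UNO dialog control whose model (a `com.sun.star.awt.UnoControlCheckBoxModel`) exposes the caption as its `Label` property; the stub classes below stand in for real UNO objects so the snippet is self-contained.

```python
class StubCheckBoxModel:
    """Stands in for com.sun.star.awt.UnoControlCheckBoxModel."""
    Label = ""

class StubCheckBoxControl:
    """Stands in for a UNO dialog control; only getModel() is needed here."""
    def __init__(self):
        self._model = StubCheckBoxModel()

    def getModel(self):
        return self._model

def set_checkbox_label(ctrl, text):
    # The caption lives on the control's model as the Label property;
    # no try/except ladder is needed once the right property is known.
    ctrl.getModel().Label = text

ctrl = StubCheckBoxControl()
set_checkbox_label(ctrl, "Enable grammar check")
print(ctrl.getModel().Label)  # Enable grammar check
```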

    As a programmer, I was never a fan of heavy logging. I believe you should write code carefully and then run it in a debugger to verify that it works, or to figure out why it doesn’t; that’s how you build confidence that code works as you intended.

    However, that doesn’t work for AIs, since they don’t have real-time debugging today. I can imagine it happening in the future: the AI sets breakpoints at known problem areas, then inspects local variables and other state at runtime to figure out what is going on.

    In the meantime, I have extensive logging, which lets me diagnose problems later and figure out exactly what happened.

    I almost never accept the first plan. I review it and ask for revisions, often several times. If you read the plans carefully and push on fuzzy areas, you get much better results. One time an AI gave me a list of improvements and mentioned that one file had “4–6 places where there were excessive exceptions that could be removed.” I said: “Let’s find out the exact number. Show me the candidates and tell me what you think.” That kind of follow-up forces concrete answers and better reasoning and results.

    The other thing that made a huge difference was maintaining a good AGENTS.md. It’s a single place that explains the project, the structure, what was done. An AI can read that one file and then be productive on any feature without going off in the wrong direction or making the same mistakes as before. One of my recent rules is to always update the AGENTS.md.

    I also have a full copy of the LibreOffice source code, so AIs can search for IDL definitions when they need exact parameter details. It’s a meta development environment: I’m using Python to script a C++ monolith, using AI to help write the Python, and using the C++ source code to teach the AI.

    This plugin is already useful for more than demos. While working on this document, which I was saving in Markdown format, a table got corrupted on re-opening. So I handed the mess to an LLM, told it to clean it up and remove all the markup characters, and it turned the mess back into a pretty table in a few seconds. After that I decided to start saving to the native, far richer OpenDocument format, which we know and love.

    Next Steps

    I am working on another article for Week 2 where I explain the process of adding MCP, a research sub-agent using Hugging Face smolagents, an evaluation framework, talk-to-your-document, and how Quarzadous completely refactored it.

    I’m trying to find more people who want to work together on this. John, the owner of localwriter, is busy at the moment, and I’ve not heard back from the LibreCalc AI writer. Quarzadous basically rewrote much of the infrastructure (new make system, auto-generation of the settings UI from a config schema, a tool registry, service registry, etc.) but then decided to work on his own fork focusing on MCP.

    It might be over-engineering for a 15kloc codebase, but I kept almost everything except for the maze of directories. I plan on picking a new name but haven’t done that yet; it’s a pain, and I was hoping to find an existing codebase and a group of people who want to work together.

    UPDATE: I picked a new name, WriterAgent! If you want to check out the code or try a pre-release, go here: https://github.com/KeithCu/writeragent I’ve got a test build with all the latest features, including research and audio support for chatting with the AI, across three operating systems.

    UPDATE 2: Here’s a link to the Part 2 writeup.

    Enjoy!