
Exploring Local LLM Workflows

A Practical Digest of Tools, Models, and Use Cases

Local AI Workstation Setup


With recent improvements in consumer GPUs, tooling, and open-weight models, running large language models locally has become not only feasible but genuinely useful. I set up my PC as a local AI workstation and tested several real-world LLM-related use cases, focusing on productivity, automation, multimodal generation, and developer workflows.

This post is a high-level digest of what I tried: the applications, models, use cases, and required software. Each topic will be expanded into a separate, detailed post covering installation, configuration, and concrete generation examples.

Base Environment Overview

Before diving into individual applications, here are the PC specs and the foundational software stack used across all experiments.

| Part Type | Product Name | Manufacturer | Main Specifications |
| --- | --- | --- | --- |
| CPU | Core Ultra 7 265K | Intel | Arrow Lake-S architecture, unlocked, high-performance desktop CPU |
| CPU Cooler | Peerless Assassin 120 Black | Thermalright | Dual-tower air cooler, 120 mm fan, 6 heat pipes |
| Motherboard | PRO Z890-S WIFI | MSI | Intel Z890 chipset, LGA1851 socket, Wi-Fi, Intel 200S Boost support |
| Memory | CP2K32G60C40U5W | Corsair | 64 GB (2 × 32 GB) DDR5-6000, CL40, Intel XMP 3.0 and AMD EXPO support |
| Storage | CT2000T500SSD8JP | Crucial | 2 TB NVMe SSD, PCIe Gen4, high-speed M.2 storage |
| Graphics Card | GeForce RTX 5060 Ti 16G VENTUS 2X OC PLUS | MSI | NVIDIA GeForce RTX 5060 Ti, 16 GB GDDR7, factory overclocked, dual-fan design |
| PC Case | North Charcoal Black TG Dark | Fractal Design | Mid-tower case, tempered glass side panel, airflow-focused design |
| Power Supply | AG-650M-JP *1 | Apexgaming | 650 W, 80 PLUS Gold certified, fully modular PSU |

System Notes / Conditions

  1. Graphics Card Power Headroom (*1): A 650 W power supply is sufficient for normal operation, but under simultaneous heavy CPU and GPU load, stability may suffer compared with a higher-wattage configuration; an upgrade to a 750 W or larger unit is planned.
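As a sanity check on that headroom claim, here is a rough steady-state budget using nominal spec-sheet figures (the CPU's maximum turbo power and the GPU's total graphics power). The allowance for the rest of the system is an assumption, and transient spikes can briefly exceed all of these numbers.

```python
# Rough steady-state power budget for this build, using nominal spec-sheet
# figures; transient spikes on modern CPUs/GPUs can briefly exceed these.
CPU_PL2_W = 250   # Core Ultra 7 265K, maximum turbo power
GPU_TGP_W = 180   # GeForce RTX 5060 Ti 16 GB, total graphics power
REST_W    = 100   # motherboard, RAM, SSD, fans (assumed, generous estimate)

load_w = CPU_PL2_W + GPU_TGP_W + REST_W
psu_w = 650
print(f"worst-case steady load: {load_w} W ({load_w / psu_w:.0%} of PSU rating)")
# -> worst-case steady load: 530 W (82% of PSU rating)
```

At roughly 82% of the rated capacity under worst-case steady load, there is little margin left for transients, which is why the 750 W upgrade is on the list.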

Installed and Required Software

1. LM Studio – Core Local LLM Runtime

Primary Role: Local inference engine and model manager for LLMs and multimodal models.

Models Tested

Key Features & Use Cases

LM Studio exposes an OpenAI-compatible API endpoint that multiple downstream applications can share, making it straightforward to integrate local LLMs into workflows. It also supports web search integration via MCP (Model Context Protocol), enabling inference grounded in online information even in an otherwise local environment. Because each internet search request can be approved individually, the core advantage of local LLMs, control over data leakage, is preserved.
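As a concrete illustration, here is a minimal sketch of how a downstream application can talk to LM Studio's server through the standard OpenAI Python client. The port is LM Studio's default (1234); the model name and API key are placeholders, since LM Studio routes requests to whatever model is loaded and does not require a real key.

```python
# Minimal sketch: calling LM Studio's OpenAI-compatible server from Python.
# Assumes the local server is running on its default port (1234) and that a
# model is already loaded in LM Studio.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio serves the loaded model
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Why does local inference help with data control?"},
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
```

Any tool that accepts a custom OpenAI base URL can be pointed at this endpoint, which is what makes the downstream integrations in the following sections possible.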

2. Kokoro-FastAPI – Local Text-to-Speech

Primary Role: High-quality local text-to-speech generation.

Stack

Features & Use Cases

This setup allowed fully local TTS without reliance on cloud APIs, with acceptable latency and consistent audio quality.
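For reference, here is a minimal sketch of a synthesis request against a local Kokoro-FastAPI instance. It assumes the server is running on its default port (8880) and exposes the OpenAI-compatible speech route; the voice name is an example and should be replaced with one your installation actually lists.

```python
# Minimal sketch: requesting speech from a local Kokoro-FastAPI instance.
# Assumes the default port (8880) and the OpenAI-compatible
# /v1/audio/speech route; the voice name is an example.
import requests

resp = requests.post(
    "http://localhost:8880/v1/audio/speech",
    json={
        "model": "kokoro",
        "voice": "af_bella",  # example; use a voice your server reports
        "input": "Local text-to-speech without any cloud round trip.",
        "response_format": "mp3",
    },
    timeout=60,
)
resp.raise_for_status()
with open("speech.mp3", "wb") as f:
    f.write(resp.content)
```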

3. n8n on WSL (Dockerized Automation)

Primary Role: Workflow automation and orchestration.

Integrations

Use Cases

The publicly available Docker Compose YAML for n8n deploys smoothly and reliably in a WSL environment with little to no modification. Using the same node configuration, I also verified that workflows can be tested by substituting external APIs with a local LLM endpoint, which proved especially useful for prototyping; a sketch of triggering such a workflow from outside n8n follows.
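The sketch below kicks off an n8n workflow that starts with a Webhook trigger node. It assumes n8n is reachable on its default port (5678) and that a workflow registered under the hypothetical webhook path "summarize" exists; inside that workflow, the LLM node can point at the local LM Studio endpoint instead of a cloud API.

```python
# Minimal sketch: triggering an n8n workflow via its Webhook trigger node.
# Assumes n8n runs on its default port (5678) inside WSL; the webhook path
# "summarize" is a hypothetical example.
import requests

payload = {"text": "Raw notes to be summarized by the local model..."}
resp = requests.post(
    "http://localhost:5678/webhook/summarize",
    json=payload,
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```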

4. Visual Studio Code – AI-Assisted Development

Primary Role: Local-AI-powered code editor.

Features & Integration

Use Cases

This setup demonstrated that a fully local AI coding environment is achievable for many everyday development tasks. At the same time, it became clear that 16 GB of VRAM is not enough for the more capable coding models, and that even simple coding tasks are VRAM-hungry because long context windows dominate memory usage. Given GPU performance and pricing as of December 2025, relying on external services for AI coding remains the most practical approach in most cases.
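To make the context-length point concrete, here is a back-of-the-envelope KV-cache estimate. The dimensions are illustrative, a classic 7B-class transformer with full multi-head attention, not those of any specific model I ran; models using grouped-query attention need proportionally less.

```python
# Back-of-the-envelope KV-cache estimate: why context length, not just model
# weights, eats VRAM. Illustrative dimensions for a 7B-class transformer with
# full multi-head attention (no GQA).
def kv_cache_gib(layers, kv_heads, head_dim, context_tokens, bytes_per_elem=2):
    # 2 tensors (K and V) per layer; fp16 = 2 bytes per element
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_elem / 2**30

print(f"{kv_cache_gib(32, 32, 128, 32_768):.1f} GiB")  # -> 16.0 GiB at 32k context
```

On such a model, a 32k-token context alone would fill the entire 16 GB card before a single weight is loaded; grouped-query attention shrinks this considerably, but the scaling with context length is the point.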

5. ComfyUI – Image and Video Generation

There are three ways to install ComfyUI in your local environment: 'Desktop Application', 'Windows Portable Package', or 'Manual Installation'. Prioritizing flexibility this time, I opted for 'Manual Installation' and tested several workflows. For a more casual start, I recommend the 'Windows Portable Package', which can be launched immediately after downloading.

Environment

Models and Workflows Tested

Use Cases

ComfyUI's flexibility makes it ideal for experimentation, but it demands careful environment and dependency management (Python versions, CUDA builds, and so on). GPU memory limits also become particularly noticeable during video generation.
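That flexibility extends to scripting: a running ComfyUI instance can be driven over HTTP, which is handy for batch experiments. The sketch below assumes ComfyUI is listening on its default 127.0.0.1:8188 and that "workflow_api.json" is a graph exported from the UI with "Save (API Format)" (available once dev mode options are enabled in the settings).

```python
# Minimal sketch: queueing a workflow against a running ComfyUI instance via
# its HTTP API. Assumes the default address (127.0.0.1:8188) and a workflow
# graph exported in API format from the UI.
import json
import requests

with open("workflow_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

resp = requests.post(
    "http://127.0.0.1:8188/prompt",
    json={"prompt": workflow},
    timeout=30,
)
resp.raise_for_status()
print("queued:", resp.json().get("prompt_id"))
```

Queueing runs this way makes it easy to sweep seeds or prompts overnight without touching the graph editor.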

Overall Observations