Need help getting started with image generation

I'm totally new to all this AI stuff, so I'll get straight to the point.
I want to generate images locally, offline, on my personal desktop; I've got an AMD GPU.
I tried out Stable Diffusion on a website and was stunned by how good the results were.
I soon realized that to generate more detailed images I might want to use an LLM to enhance prompts.
So these are the two things I want to get running locally: a model for image generation (something like Stable Diffusion, correct me if I'm wrong) and a large language model to enhance image prompts. I haven't tried out any of the many AI models so far, since I avoided all the hype.
In short: if you can recommend which models best suit my purpose, that would be helpful.
Also, I prefer running only open-source AI.
I can't code in any programming language, so a simple setup, or at least a step-by-step guide, would be very welcome. I tried getting Stable Diffusion running but failed; PowerShell on Windows 10 kept throwing errors that I tried to solve but couldn't.


With open-source generative AI models there are still some limitations on AMD GPUs. While things have improved significantly in Linux and Windows 11 + WSL2 environments, options remain quite limited on Windows 10.


What you’re setting up (two separate local apps)

  • Image generation: Stable Diffusion 1.5 “weights” + a GUI that runs locally (you open it in your browser at 127.0.0.1).
  • Prompt enhancement: a small local text model that turns “an idea” into POSITIVE / NEGATIVE / SETTINGS you copy/paste into the image GUI.

Keeping them separate is the simplest “offline + no-coding” workflow.


The most realistic Windows 10 + AMD path (no WSL2)

Best first-success route

SD.Next + ONNX Runtime + DirectML (DmlExecutionProvider)
SD.Next explicitly supports ONNX Runtime and notes you can select DmlExecutionProvider by installing onnxruntime-directml, and that DirectX 12 is required. (GitHub)

Alternatives (only if you want them later)

  • AUTOMATIC1111 + Microsoft DirectML extension: uses ONNX Runtime + DirectML, but requires models optimized via Olive (more moving parts). (GitHub)
    AMD’s own guide for that extension calls it “preview” and (in that guide) states only SD 1.5 is supported. (AMD)
  • A1111 main repo on Windows+AMD: not officially supported; their wiki points to DirectML-focused forks/approaches instead. (GitHub)
  • SD.Next + ZLUDA: can be a speed/compatibility upgrade on some AMD cards, but it’s an “after you already work” option. SD.Next documents launching it with --use-zluda and notes HIP SDK version constraints. (GitHub)

Step-by-step: SD 1.5 image generation with SD.Next (Windows 10 + AMD)

0) Put it in an easy folder

Use something like:

  • C:\AI\sdnext\

Avoid OneDrive/Desktop/Program Files. (This prevents many permissions/path problems.)
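If you want to sanity-check a folder before installing, a tiny script like this (a hypothetical helper, not part of SD.Next) flags the locations that commonly cause trouble:

```python
# Folder names that commonly cause permission/sync/path problems for local AI tools.
RISKY_PARTS = ("onedrive", "program files", "desktop")

def install_path_warnings(path: str) -> list[str]:
    """Return a list of warnings for an install path; empty means it looks safe."""
    lowered = path.lower()
    warnings = [f"path contains '{p}'" for p in RISKY_PARTS if p in lowered]
    if " " in path:
        warnings.append("path contains spaces")
    return warnings

print(install_path_warnings(r"C:\AI\sdnext"))                 # []
print(install_path_warnings(r"C:\Users\me\OneDrive\sdnext"))  # ["path contains 'onedrive'"]
```

An empty list is what you want; anything else suggests picking a simpler folder like C:\AI\.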

1) Install the basics (one-time)

  • Latest AMD GPU driver + reboot
  • Git for Windows
  • Python (many SD Windows setups are happiest on Python 3.10.x)

2) Install + start SD.Next (use cmd.exe, not PowerShell)

Open Command Prompt and run:

cd C:\AI
git clone https://github.com/vladmandic/sdnext.git
cd sdnext
webui.bat --debug

SD.Next documents launching on Windows with webui.bat --debug. (GitHub)

When it finishes starting, it prints a local URL (often http://127.0.0.1:7860). Open that in your browser.

3) Add an SD 1.5 model file (the “weights”)

A common starter SD 1.5 checkpoint is:

  • v1-5-pruned-emaonly.safetensors (license shown as creativeml-openrail-m) (Hugging Face)

Place the .safetensors file into SD.Next’s model folder (SD.Next “Getting Started” covers the basic “generate with a few clicks” workflow and model handling). (GitHub)

4) Turn on AMD GPU acceleration (ONNX Runtime + DirectML)

In SD.Next, switch to the ONNX Runtime pipeline and choose DmlExecutionProvider (DirectML). SD.Next notes:

  • DML EP becomes available by installing onnxruntime-directml
  • DirectX 12 is required (GitHub)

Why this matters: ONNX Runtime’s DirectML EP has specific constraints (for example, it does not support memory-pattern optimizations or parallel execution in ORT sessions). (ONNX Runtime)

5) First “known-stable” test settings (prove it works)

Start conservative:

  • 512×512
  • Steps: 20
  • CFG: ~7
  • Batch size: 1

Test prompts:

  • Positive: portrait photo, soft studio lighting, sharp focus
  • Negative: lowres, blurry, watermark, text, bad anatomy, extra fingers

Once you can generate one image reliably, then raise resolution/complexity.
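For reference, these settings map onto the JSON payload used by the AUTOMATIC1111-style txt2img API that SD.Next can also expose (you don't need this for the GUI workflow; the sketch below just shows how the knobs fit together, and no request is actually sent):

```python
import json

def txt2img_payload(positive: str, negative: str,
                    width: int = 512, height: int = 512,
                    steps: int = 20, cfg: float = 7.0) -> str:
    """Build a JSON body for an A1111-compatible /sdapi/v1/txt2img endpoint."""
    return json.dumps({
        "prompt": positive,
        "negative_prompt": negative,
        "width": width,
        "height": height,
        "steps": steps,
        "cfg_scale": cfg,
        "batch_size": 1,   # keep at 1 until generation is stable
    })

body = txt2img_payload(
    "portrait photo, soft studio lighting, sharp focus",
    "lowres, blurry, watermark, text, bad anatomy, extra fingers",
)
print(body)
```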


Quick troubleshooting (the fastest fixes)

A) Start in “safe mode” to remove extension problems

webui.bat --debug --safe

--safe disables user extensions and is recommended for troubleshooting. (GitHub)

B) UI acts broken / buttons don’t work

SD.Next recommends deleting ui-config.json if it’s bloated (old settings can override new defaults and break the UI). (GitHub)

C) DirectML crashes / weird ORT errors

DirectML EP requires certain ORT options (mem-pattern + parallel execution) to be disabled; enabling them can cause errors. (ONNX Runtime)
If you see errors like 80070057, they’re commonly associated with those constraints; ONNX Runtime has issue reports in this area. (GitHub)
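For the curious, here is roughly what those constraints look like in raw ONNX Runtime Python code. SD.Next does the equivalent for you when you select DmlExecutionProvider, so this is illustrative only (the function and model path are hypothetical):

```python
def make_dml_session(model_path: str):
    """Create an ONNX Runtime session for the DirectML execution provider.

    The DirectML EP does not support ORT's memory-pattern optimization or
    parallel execution, so both must be disabled; leaving them on is a
    common source of session errors on DML.
    """
    import onnxruntime as ort  # provided by the onnxruntime-directml package

    opts = ort.SessionOptions()
    opts.enable_mem_pattern = False                         # required off for DML
    opts.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL  # no parallel execution
    return ort.InferenceSession(model_path, sess_options=opts,
                                providers=["DmlExecutionProvider"])
```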


Prompt enhancement (offline, GUI-first)

Pick one “local chat” app

Option 1: Jan (desktop GUI, open source, offline)

Jan is presented as an open-source ChatGPT-like app for running models locally. (GitHub)

Option 2: KoboldCpp (single EXE + browser UI; good AMD hint)

KoboldCpp releases explicitly recommend the Vulkan option in the nocuda build for AMD. (GitHub)

Option 3: Ollama (simple installer)

Ollama's Windows docs state it does not require Administrator and installs in your home directory by default. (Ollama docs)

Good beginner prompt-enhancer models (small + practical)

Specialized prompt optimizers (often best for SD prompting):

  • TIPO-200M (prompt optimization for text-to-image workflows). (Hugging Face)
  • DART v2 (generates Danbooru-style tags; useful if you like tag prompts). (Hugging Face)

General small instruct model (good at structured output):

  • SmolLM2-1.7B-Instruct (compact “run on-device” class model). (Hugging Face)

Copy/paste template for your prompt enhancer

Use this once as your “system prompt” (or first message).

You write prompts for Stable Diffusion 1.5.

Return exactly these sections:

POSITIVE:
NEGATIVE:
SETTINGS:
VARIATIONS:

Rules:
- POSITIVE: 1–2 lines. Include subject, environment, lighting, camera/framing, style/medium.
- NEGATIVE: comma-separated. Include common artifacts: lowres, blurry, watermark, text, deformed hands, extra fingers.
- SETTINGS: suggest resolution (start 512x512), steps (20–30), CFG (6–8).
- VARIATIONS: 5 short alternate POSITIVE prompts that keep the same idea but change lighting/camera/mood.

User idea: <paste your idea here>
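If you ever script this instead of pasting by hand, a small helper (hypothetical; any local chat app that accepts a system prompt can take the resulting string) can splice your idea into the template:

```python
TEMPLATE = """You write prompts for Stable Diffusion 1.5.

Return exactly these sections:

POSITIVE:
NEGATIVE:
SETTINGS:
VARIATIONS:

Rules:
- POSITIVE: 1-2 lines. Include subject, environment, lighting, camera/framing, style/medium.
- NEGATIVE: comma-separated. Include common artifacts: lowres, blurry, watermark, text, deformed hands, extra fingers.
- SETTINGS: suggest resolution (start 512x512), steps (20-30), CFG (6-8).
- VARIATIONS: 5 short alternate POSITIVE prompts that keep the same idea but change lighting/camera/mood.

User idea: {idea}"""

def enhancer_prompt(idea: str) -> str:
    """Fill the system-prompt template with a one-line user idea."""
    return TEMPLATE.format(idea=idea.strip())

print(enhancer_prompt("a lighthouse in a storm"))
```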

Workflow:

  1) Write your idea → 2) copy POSITIVE/NEGATIVE/SETTINGS → 3) paste into SD.Next → 4) generate.
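And if you automate the copy step too, a parser like this sketch (it assumes the model reproduced the section headers exactly as instructed) splits the reply into its parts:

```python
import re

SECTIONS = ("POSITIVE", "NEGATIVE", "SETTINGS", "VARIATIONS")

def parse_enhancer_reply(text: str) -> dict:
    """Split a reply into its POSITIVE/NEGATIVE/SETTINGS/VARIATIONS sections."""
    pattern = r"^(%s):\s*" % "|".join(SECTIONS)
    parts = re.split(pattern, text, flags=re.MULTILINE)
    # re.split yields [preamble, name, body, name, body, ...]; pair them up.
    it = iter(parts[1:])
    return {name: body.strip() for name, body in zip(it, it)}

reply = """POSITIVE: portrait photo, soft studio lighting
NEGATIVE: lowres, blurry, watermark
SETTINGS: 512x512, steps 20, CFG 7
VARIATIONS: 1) golden hour 2) candlelight"""
print(parse_enhancer_reply(reply)["NEGATIVE"])  # lowres, blurry, watermark
```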

Thanks for the fast reply!
Is there no way to install WSL 2 on Windows 10?
I also have CachyOS Linux, if you recommend that over Windows, but I am very new to it, so to be honest, without detailed instructions I probably won't be able to handle things.
Maybe I should also have stated that my hardware is relatively capable: 16 GB VRAM, 32 GB RAM.
I know this isn't enough for some AI models, but it could be worse.
I would like to use "high-end" AI models rather than beginner models that produce poor or limited output.