Z-Image LoRA Fix For ComfyUI Nunchaku Models

by Editorial Team

Hey guys, this is a deep dive into an interesting issue I ran into while working with Z-Image LoRA models in ComfyUI. Specifically, I'll be discussing why a converted LoRA wouldn't apply correctly to a nunchaku model, even though it worked perfectly fine with a standard model. It's a bit of a technical journey, but hopefully, by the end, you'll have a better understanding of how LoRAs work and how to troubleshoot these kinds of problems.

The Core Problem: LoRA Compatibility and Conversion

At the heart of the issue is the compatibility of LoRA models and how they're converted for use in different environments. I was using a LoRA that was originally trained with musubi-tuner. This tool is great for fine-tuning models, but when I converted the trained LoRA using convert_z_image_lora_to_comfy.py, the converted LoRA failed to apply to a nunchaku model within ComfyUI. The LoRA worked fine with the standard model but not the nunchaku version, which was quite perplexing.

Now, let's talk about the tools involved. kohya-ss and musubi-tuner are vital for training and converting LoRA models. They give you the flexibility to adapt your models to specific styles and concepts. ComfyUI, on the other hand, is a powerful and flexible visual interface for Stable Diffusion, and it's where the rubber meets the road in terms of generating images.

Comparing Key Structures: Trained vs. Converted LoRAs

One of the keys to solving the problem was looking at the inner workings of the LoRAs. The keys are basically the names of the parts of the model that the LoRA is changing. The keys in the trained LoRA, as produced by musubi-tuner, had a structure like this:

lora_unet_layers_0_attention_to_k.alpha
lora_unet_layers_0_attention_to_k.lora_down.weight
lora_unet_layers_0_attention_to_k.lora_up.weight
lora_unet_layers_0_attention_to_out_0.alpha
lora_unet_layers_0_attention_to_out_0.lora_down.weight
lora_unet_layers_0_attention_to_out_0.lora_up.weight
lora_unet_layers_0_attention_to_q.alpha
lora_unet_layers_0_attention_to_q.lora_down.weight
lora_unet_layers_0_attention_to_q.lora_up.weight
lora_unet_layers_0_attention_to_v.alpha
lora_unet_layers_0_attention_to_v.lora_down.weight
lora_unet_layers_0_attention_to_v.lora_up.weight
lora_unet_layers_0_feed_forward_w1.alpha
lora_unet_layers_0_feed_forward_w1.lora_down.weight
lora_unet_layers_0_feed_forward_w1.lora_up.weight
lora_unet_layers_0_feed_forward_w2.alpha
lora_unet_layers_0_feed_forward_w2.lora_down.weight
lora_unet_layers_0_feed_forward_w2.lora_up.weight
lora_unet_layers_0_feed_forward_w3.alpha
lora_unet_layers_0_feed_forward_w3.lora_down.weight
lora_unet_layers_0_feed_forward_w3.lora_up.weight

After the conversion with convert_z_image_lora_to_comfy.py, the keys looked a bit different:

lora_unet_layers_0_attention_out.alpha
lora_unet_layers_0_attention_out.lora_down.weight
lora_unet_layers_0_attention_out.lora_up.weight
lora_unet_layers_0_attention_qkv.alpha
lora_unet_layers_0_attention_qkv.lora_down.weight
lora_unet_layers_0_attention_qkv.lora_up.weight
lora_unet_layers_0_feed_forward_w1.alpha
lora_unet_layers_0_feed_forward_w1.lora_down.weight
lora_unet_layers_0_feed_forward_w1.lora_up.weight
lora_unet_layers_0_feed_forward_w2.alpha
lora_unet_layers_0_feed_forward_w2.lora_down.weight
lora_unet_layers_0_feed_forward_w2.lora_up.weight
lora_unet_layers_0_feed_forward_w3.alpha
lora_unet_layers_0_feed_forward_w3.lora_down.weight
lora_unet_layers_0_feed_forward_w3.lora_up.weight

You'll notice that some of the key names have changed: the separate to_q, to_k, and to_v projections have been fused into a single attention_qkv entry, and to_out_0 has become attention_out. It's subtle, but differences like these can be enough to prevent the LoRA from being applied, especially on specialized architectures like the nunchaku model, which expects a specific key layout. The conversion script changes the key structure, and the result isn't always what the target model can consume.
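The quickest way to spot mismatches like this is to dump both key sets and diff them programmatically. Here's a minimal sketch; the diff_lora_keys helper and the sample key lists are my own illustration, not part of any conversion script. To read the keys from an actual .safetensors file, you could use safetensors.safe_open and call .keys() on the handle.

```python
def diff_lora_keys(keys_a, keys_b):
    """Return (only_in_a, only_in_b) as sorted lists of key names."""
    a, b = set(keys_a), set(keys_b)
    return sorted(a - b), sorted(b - a)

# Sample keys taken from the listings above (layer 0, attention only)
trained = [
    "lora_unet_layers_0_attention_to_q.lora_down.weight",
    "lora_unet_layers_0_attention_to_k.lora_down.weight",
    "lora_unet_layers_0_attention_to_v.lora_down.weight",
]
converted = [
    "lora_unet_layers_0_attention_qkv.lora_down.weight",
]

only_trained, only_converted = diff_lora_keys(trained, converted)
print(only_trained)    # the three separate q/k/v keys
print(only_converted)  # the fused qkv key
```

Running this against the full key lists of a broken LoRA and a known-good one tells you immediately which modules fail to line up.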

The Civitai LoRA: A Working Example

To better understand what was going on, I compared this to a LoRA for Z-Image that I got from Civitai. This one worked flawlessly with both the standard and nunchaku models. The key structure for the Civitai LoRA looked like this:

diffusion_model.layers.0.adaLN_modulation.0.lora_A.weight
diffusion_model.layers.0.adaLN_modulation.0.lora_B.weight
diffusion_model.layers.0.attention.to_k.lora_A.weight
diffusion_model.layers.0.attention.to_k.lora_B.weight
diffusion_model.layers.0.attention.to_out.0.lora_A.weight
diffusion_model.layers.0.attention.to_out.0.lora_B.weight
diffusion_model.layers.0.attention.to_q.lora_A.weight
diffusion_model.layers.0.attention.to_q.lora_B.weight
diffusion_model.layers.0.attention.to_v.lora_A.weight
diffusion_model.layers.0.attention.to_v.lora_B.weight
diffusion_model.layers.0.feed_forward.w1.lora_A.weight
diffusion_model.layers.0.feed_forward.w1.lora_B.weight
diffusion_model.layers.0.feed_forward.w2.lora_A.weight
diffusion_model.layers.0.feed_forward.w2.lora_B.weight
diffusion_model.layers.0.feed_forward.w3.lora_A.weight
diffusion_model.layers.0.feed_forward.w3.lora_B.weight

Notice the diffusion_model prefix and the use of .lora_A.weight and .lora_B.weight in the Civitai LoRA. It's a different way of structuring the LoRA, and importantly, it's compatible with the nunchaku model. This pointed squarely at an issue with how the original LoRA was being converted.
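To make the naming gap concrete, here's a rough sketch of how one of the musubi-tuner keys maps onto the Civitai-style name. The musubi_to_diffusers_key function is purely illustrative; it handles only the patterns shown in the listings above, ignores the .alpha keys (the lora_A/lora_B format folds alpha into the weights), and is not how any official converter is implemented.

```python
import re

def musubi_to_diffusers_key(key: str) -> str:
    """Illustrative mapping from a musubi-tuner key to the
    diffusion_model.* naming seen in the Civitai LoRA."""
    module, _, suffix = key.partition(".")
    # lora_down/lora_up correspond to lora_A/lora_B
    suffix = {
        "lora_down.weight": "lora_A.weight",
        "lora_up.weight": "lora_B.weight",
    }.get(suffix, suffix)
    module = module.removeprefix("lora_unet_")
    module = re.sub(r"layers_(\d+)_", r"layers.\1.", module)   # layers_0_ -> layers.0.
    module = module.replace("attention_", "attention.")        # attention_to_k -> attention.to_k
    module = module.replace("feed_forward_", "feed_forward.")  # feed_forward_w1 -> feed_forward.w1
    module = re.sub(r"to_out_(\d+)", r"to_out.\1", module)     # to_out_0 -> to_out.0
    return f"diffusion_model.{module}.{suffix}"

print(musubi_to_diffusers_key("lora_unet_layers_0_attention_to_k.lora_down.weight"))
# -> diffusion_model.layers.0.attention.to_k.lora_A.weight
```

The tricky part, as the rest of this post shows, is knowing which underscores are word separators inside a module name (to_k, feed_forward) and which ones stand in for the dots of the module path.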

Fixing the Conversion: The Patch and Its Implications

After trying to convert the LoRA with convert_lora.py, I still couldn't get it working with the nunchaku model. The keys were closer to the Civitai LoRA, but still not quite right. Here's what the keys looked like after this second conversion:

diffusion_model.layers.0.attention.to.k.lora_A.weight
diffusion_model.layers.0.attention.to.k.lora_B.weight
diffusion_model.layers.0.attention.to.out.0.lora_A.weight
diffusion_model.layers.0.attention.to.out.0.lora_B.weight
diffusion_model.layers.0.attention.to.q.lora_A.weight
diffusion_model.layers.0.attention.to.q.lora_B.weight
diffusion_model.layers.0.attention.to.v.lora_A.weight
diffusion_model.layers.0.attention.to.v.lora_B.weight
diffusion_model.layers.0.feed.forward.w1.lora_A.weight
diffusion_model.layers.0.feed.forward.w1.lora_B.weight
diffusion_model.layers.0.feed.forward.w2.lora_A.weight
diffusion_model.layers.0.feed.forward.w2.lora_B.weight
diffusion_model.layers.0.feed.forward.w3.lora_A.weight
diffusion_model.layers.0.feed.forward.w3.lora_B.weight

The key difference? A dot where an underscore should be: to.q, to.k, to.v, and to.out instead of to_q, to_k, to_v, and to_out, and feed.forward instead of feed_forward. To solve this, I needed to modify the conversion process. I applied a patch to convert_lora.py to fix these specific key names:

diff --git a/src/musubi_tuner/convert_lora.py b/src/musubi_tuner/convert_lora.py
index 4c32cf0..20def30 100644
--- a/src/musubi_tuner/convert_lora.py
+++ b/src/musubi_tuner/convert_lora.py
@@ -121,6 +121,13 @@ def convert_to_diffusers(prefix, diffusers_prefix, weights_sd):
                    module_name = module_name.replace("self.attn", "self_attn")  # fix self attn
                    module_name = module_name.replace("k.img", "k_img")  # fix k img
                    module_name = module_name.replace("v.img", "v_img")  # fix v img
+                if ".attention.to." in module_name or ".feed.forward." in module_name:
+                    # Z-Image lora name to module name: ugly but works
+                    module_name = module_name.replace("to.q", "to_q")  # fix to q
+                    module_name = module_name.replace("to.k", "to_k")  # fix to k
+                    module_name = module_name.replace("to.v", "to_v")  # fix to v
+                    module_name = module_name.replace("to.out", "to_out")  # fix to out
+                    module_name = module_name.replace("feed.forward", "feed_forward")  # fix feed forward
                else:
                    # HunyuanVideo lora name to module name: ugly but works
                    module_name = module_name.replace("double.blocks.", "double_blocks.")  # fix double blocks

This patch specifically targets the key names of the Z-Image LoRA. It replaces instances of to.q, to.k, to.v, to.out and feed.forward with to_q, to_k, to_v, to_out, and feed_forward, respectively. This is a very specific fix to make sure the key names match what the nunchaku model is expecting.
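In isolation, the replacement logic the patch adds boils down to something like this. The fix_z_image_module_name function below is a standalone restatement for illustration, not the actual convert_lora.py code:

```python
def fix_z_image_module_name(module_name: str) -> str:
    """Mirror of the replacements the patch applies to Z-Image keys."""
    if ".attention.to." in module_name or ".feed.forward." in module_name:
        for broken, fixed in [
            ("to.q", "to_q"), ("to.k", "to_k"), ("to.v", "to_v"),
            ("to.out", "to_out"), ("feed.forward", "feed_forward"),
        ]:
            module_name = module_name.replace(broken, fixed)
    return module_name

print(fix_z_image_module_name("diffusion_model.layers.0.attention.to.out.0"))
# -> diffusion_model.layers.0.attention.to_out.0
print(fix_z_image_module_name("diffusion_model.layers.0.feed.forward.w1"))
# -> diffusion_model.layers.0.feed_forward.w1
```

The guard condition matters: keys that don't contain the Z-Image patterns pass through untouched, so the fix shouldn't disturb other architectures handled by the same script.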

The Result: Success!

After applying this patch and converting the LoRA again using convert_lora.py, the LoRA finally worked correctly with both the standard and the nunchaku models. This shows how crucial it is for the key names to align between your LoRA and the model you're using.

Important Considerations and Further Steps

While this solution worked in my case, it's important to remember that LoRA conversion can be complex, and the specific fix may not apply to all situations. The key is understanding the differences in the key names and how they relate to the underlying structure of the model. Here's a quick rundown of what you should do if you run into this problem yourself.

Troubleshooting Tips

  • Inspect Key Names: The most important step is to compare the key names of your converted LoRA to those of a known-working LoRA. If the keys are different, that's your first clue.
  • Model Architecture: Different model architectures can have different key name conventions. This is particularly true for specialized models like the nunchaku model. The way the model is structured will affect how LoRA weights should be applied.
  • Conversion Tools: Use the correct conversion tools for the LoRA and model type. Incorrect use will definitely cause issues!
  • Test Thoroughly: Always test your converted LoRAs with both a standard model and the specialized model to ensure compatibility.

Where to Go From Here

I haven't tested this patch with other types of LoRAs, so I'm hesitant to open a pull request. For now, I'm sharing this as an issue, and I'd love your feedback, guys! If you've encountered similar problems or have insights into how to improve this fix, please share them. Let's make this community even better. Happy generating, and I hope this helps!