D2MoRA: Diversity-Regulated Asymmetric MoE-LoRA Decomposition for Efficient Multi-Task Adaptation
Jianhui Zuo1 Xuemeng Song2β Haokun Wen3,4 Meng Liu5 Yupeng Hu1 Jiuru Wang6 Liqiang Nie3β
1School of Software, Shandong University
2Department of Computer Science and Engineering, Southern University of Science and Technology
3School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)
4School of Data Science, City University of Hong Kong
5School of Computer and Artificial Intelligence, Shandong Jianzhu University
6School of Computer Science and Engineering, Linyi University
These are the official pre-trained model weights and configuration files for D2MoRA, a novel diversity-regulated asymmetric MoE-LoRA decomposition framework for parameter-efficient fine-tuning (PEFT) of large language models in multi-task adaptation scenarios.
Paper: Accepted by AAAI 2026
GitHub Repository: AAAI26-D2MoRA
Model Information
1. Model Name
D2MoRA (Diversity-Regulated Asymmetric MoE-LoRA Decomposition) Checkpoints.
2. Task Type & Applicable Tasks
- Task Type: Parameter-Efficient Fine-Tuning (PEFT) / Low-Rank Adaptation (LoRA) / Mixture-of-Experts (MoE) / Multi-Task Learning
- Applicable Tasks: Efficient adaptation of large language models for heterogeneous downstream tasks, especially multi-task commonsense reasoning and related language understanding tasks.
3. Project Introduction
Low-Rank Adaptation (LoRA) has become a powerful parameter-efficient fine-tuning paradigm for adapting large language models. Recent studies further integrate LoRA with the Mixture-of-Experts (MoE) mechanism to improve multi-task adaptation. However, existing knowledge-sharing paradigms among LoRA experts still suffer from two major limitations:
1. Constrained Functional Specialization: existing one-to-many sharing paradigms force all experts to operate in a single shared low-rank subspace, limiting the flexibility of expert-specific transformations.
2. Induced Expert Homogenization: sharing a single down-projection matrix across experts may cause different experts to become overly similar, weakening expert diversity and reducing the benefit of MoE specialization.
To address these issues, D2MoRA introduces a diversity-regulated asymmetric MoE-LoRA decomposition framework. Instead of treating each LoRA expert as a fixed (A, B) pair, D2MoRA decomposes LoRA experts into two independent sets of base experts:
- Down-projection experts: A1, A2, ..., AM
- Up-projection experts: B1, B2, ..., BN
This design enables a novel asymmetric many-to-many pairing mechanism between down-projection and up-projection experts, allowing more flexible cross-expert knowledge sharing while preserving expert specialization. In addition, D2MoRA introduces:
- Sample-Aware Down-Projection Expert Mixture
- Low-Rank Embedding-Aware Up-Projection Expert Mixture
- Dual Orthogonality Regularization
to explicitly improve the diversity of both (A)-experts and (B)-experts and mitigate expert homogenization.
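The pieces above can be sketched as a single layer. This is an illustrative sketch, not the official implementation: class and function names are hypothetical, the two gates stand in for the sample-aware and low-rank embedding-aware mixtures, and the regularizer simply penalizes off-diagonal Gram entries within each expert set as one plausible form of dual orthogonality.

```python
# Illustrative sketch (not the official D2MoRA code): an asymmetric MoE-LoRA
# layer with M down-projection experts A_i and N up-projection experts B_j,
# mixed by two independent gates, plus a dual orthogonality regularizer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AsymmetricMoELoRA(nn.Module):
    def __init__(self, d_in, d_out, r=8, M=3, N=8):
        super().__init__()
        # M down-projection experts (d_in -> r), N up-projection experts (r -> d_out)
        self.A = nn.Parameter(torch.randn(M, d_in, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(N, r, d_out))
        # Sample-aware gate over A-experts; embedding-aware gate over B-experts.
        self.gate_A = nn.Linear(d_in, M)
        self.gate_B = nn.Linear(r, N)

    def forward(self, x):                       # x: (batch, d_in)
        wA = F.softmax(self.gate_A(x), dim=-1)  # gate weights from the input sample
        z = torch.einsum('bm,mdr,bd->br', wA, self.A, x)   # mixed low-rank embedding
        wB = F.softmax(self.gate_B(z), dim=-1)  # gate weights from the low-rank embedding
        return torch.einsum('bn,nrd,br->bd', wB, self.B, z)

def dual_orthogonality_loss(layer):
    """Penalize pairwise similarity within each expert set (A and B)."""
    loss = 0.0
    for experts in (layer.A, layer.B):
        flat = F.normalize(experts.flatten(1), dim=1)      # (num_experts, d*r)
        gram = flat @ flat.t()
        loss = loss + (gram - torch.eye(len(flat))).pow(2).sum()
    return loss
```

In training, the regularizer would be scaled by the orthogonality coefficient (λ in the paper) and added to the task loss; any expert pairing induced by the two independent gates realizes the many-to-many sharing between A- and B-experts.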
Note: D2MoRA is evaluated in both multi-task and single-task settings, and consistently demonstrates strong effectiveness and generalization ability.
4. Training Data Source
The model was primarily trained and evaluated on the Commonsense 170K benchmark, which contains eight public commonsense reasoning datasets:
- BoolQ
- PIQA
- SIQA
- HellaSwag
- WinoGrande
- ARC-c
- ARC-e
- OBQA
Usage & Basic Inference
These weights are designed to be used directly with the official D2MoRA GitHub repository.
Step 1: Prepare the Environment
Clone the GitHub repository and install dependencies following the official repository instructions:
git clone https://github.com/iLearn-Lab/AAAI26-D2MoRA.git
cd AAAI26-D2MoRA
Please refer to the official repository for the exact environment setup and dependency installation details.
Step 2: Download Model Weights & Data
Download the checkpoint files (e.g., best_model.pth) from this Hugging Face repository and place them into your local checkpoint directory.
You should also prepare the Commonsense 170K benchmark and related processed data according to the official repository instructions.
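Once downloaded, the checkpoint can be inspected before merging it into a backbone. A minimal sketch, assuming `best_model.pth` is a standard PyTorch checkpoint; the exact key layout depends on the official repository, so inspect the keys first:

```python
# Hedged sketch: load a downloaded D2MoRA checkpoint to CPU and return its
# flat state dict. The "state_dict" nesting is an assumption; some training
# scripts save the weights at the top level instead.
import torch

def load_d2mora_checkpoint(path):
    ckpt = torch.load(path, map_location="cpu")
    # Unwrap a nested state dict if present; otherwise return the dict as-is.
    if isinstance(ckpt, dict) and "state_dict" in ckpt:
        return ckpt["state_dict"]
    return ckpt

# Example: inspect parameter names before loading into your backbone, e.g.
#   sd = load_d2mora_checkpoint("best_model.pth")
#   print(sorted(sd.keys())[:10])
#   model.load_state_dict(sd, strict=False)
```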
Step 3: Training / Evaluation
D2MoRA is built for PEFT-based adaptation of large language models such as LLaMA-7B and LLaMA2-7B.
In the paper, the method fine-tunes the Query / Key / Value projections of self-attention layers. Typical experimental settings include:
- Backbones: LLaMA-7B, LLaMA2-7B
- Adapted modules: Query / Key / Value projections
- Orthogonality coefficient: λ = 1e-4
- Dropout: 0.05
- Batch size: 4 per A100 GPU (40GB)
Representative D2MoRA settings reported in the paper include:
- LLaMA-7B: {M = 3, N = 8, r = 8} and {M = 3, N = 4, r = 16}
- LLaMA2-7B: {M = 3, N = 8, r = 8} and {M = 4, N = 3, r = 16}
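For a rough sense of scale, the trainable parameters added per adapted projection can be estimated from these settings. A back-of-the-envelope sketch assuming square d × d projection weights (as in LLaMA attention) and ignoring the small gating networks:

```python
# Hedged estimate: added parameters per adapted projection under the
# asymmetric decomposition, with M down-projection experts (d x r) and
# N up-projection experts (r x d). Gating parameters are omitted.
def d2mora_params(d, M, N, r):
    return (M + N) * d * r

d = 4096  # LLaMA-7B hidden size
print(d2mora_params(d, M=3, N=8, r=8))   # -> 360448
print(d2mora_params(d, M=3, N=4, r=16))  # -> 458752
```

By contrast, a conventional MoE-LoRA layer with E full (A, B) expert pairs would add 2 * E * d * r parameters, which is what the shared base-expert sets avoid duplicating.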
Please use the official repository scripts for training and evaluation.
Citation
If you find our work or these model weights useful in your research, please consider leaving a Star on our GitHub repo and citing our paper:
@inproceedings{zuo2026d2mora,
title={D2MoRA: Diversity-Regulated Asymmetric MoE-LoRA Decomposition for Efficient Multi-Task Adaptation},
author={Zuo, Jianhui and Song, Xuemeng and Wen, Haokun and Liu, Meng and Hu, Yupeng and Wang, Jiuru and Nie, Liqiang},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={40},
number={34},
pages={29286--29294},
year={2026}
}