This study introduces SST, a low-rank optimization method that achieves near full-rank performance in AI model training while drastically reducing trainable parameters. Tested on the OPT language model and Hyperbolic Graph Neural Networks (HGNNs), SST outperformed LoRA and ReLoRA across multiple benchmarks, from zero-shot NLP evaluations to node classification and link prediction. The results show that SST offers a more efficient and scalable alternative for training large models without sacrificing accuracy or generalization.

SST vs LoRA: A Leaner, Smarter Way to Train AI Models


Abstract and 1. Introduction

  2. Related Work

  3. Low Rank Adaptation

    3.1 LoRA and 3.2 Limitation of LoRA

    3.3 ReLoRA*

  4. Sparse Spectral Training

    4.1 Preliminaries and 4.2 Gradient Update of U, Vᵀ with Σ

    4.3 Why SVD Initialization is Important

    4.4 SST Balances Exploitation and Exploration

    4.5 Memory-Efficient Implementation for SST and 4.6 Sparsity of SST

  5. Experiments

    5.1 Machine Translation

    5.2 Natural Language Generation

    5.3 Hyperbolic Graph Neural Networks

  6. Conclusion and Discussion

  7. Broader Impacts and References

Supplementary Information

A. Algorithm of Sparse Spectral Training

B. Proof of Gradient of Sparse Spectral Layer

C. Proof of Decomposition of Gradient of Weight

D. Proof of Advantage of Enhanced Gradient over Default Gradient

E. Proof of Zero Distortion with SVD Initialization

F. Experiment Details

G. Singular Value Pruning

H. Evaluating SST and GaLore: Complementary Approaches to Memory Efficiency

I. Ablation Study

5.2 Natural Language Generation

We utilize the OPT [9] architecture as the baseline for our language generation experiments. All models are pre-trained on OpenWebText [39], an open-source reproduction of OpenAI’s WebText. To facilitate fair comparisons across different OPT model sizes, we standardize the total training tokens for all models at 19.7 billion. A consistent rank (r = 64) is applied for all low-rank methods.
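To make the parameter savings from a fixed rank concrete, the sketch below counts trainable parameters for a single square linear layer under full-rank training versus a rank-64 factorization of the form UΣVᵀ used by SST. This is an illustrative back-of-the-envelope calculation only: the hidden size is a hypothetical value, and the per-layer accounting ignores embeddings, biases, and other modules, so it does not reproduce the totals reported in Table 3.

```python
# Illustrative only (not the authors' code): trainable parameters for one
# d_in x d_out linear layer, full-rank vs. a rank-r factorization U @ diag(s) @ V^T.

def full_rank_params(d_in: int, d_out: int) -> int:
    """Every weight of the dense layer is trainable."""
    return d_in * d_out

def low_rank_params(d_in: int, d_out: int, r: int) -> int:
    """U is d_out x r, V is d_in x r, and the diagonal Sigma has r entries."""
    return d_out * r + d_in * r + r

d = 2048   # hypothetical hidden size, for illustration
r = 64     # rank used for all low-rank methods in the paper
print(full_rank_params(d, d))      # 4,194,304
print(low_rank_params(d, d, r))    # 262,208 -- roughly 16x fewer for this layer
```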

Table 3 displays the validation perplexity results on the OpenWebText dataset across different sizes of OPT models. The results indicate that SST not only achieves lower perplexity than LoRA and ReLoRA* but also approaches the performance of full-rank training, with significantly fewer trainable parameters.

Figure 2: Comparison of performance on effective steps between SST and full-rank training. Effective steps are quantified by multiplying the number of trainable parameters by the number of steps taken. All methods and model sizes utilize the same number of tokens in each step.

Figure 2 compares effective steps across training methods. This metric, which accounts for both the number of trainable parameters and the number of training steps, shows that SST trains more efficiently than the full-rank method.
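As a minimal illustration of the metric, the snippet below computes effective steps exactly as defined above (trainable parameters multiplied by training steps). The parameter and step counts plugged in are placeholders, not values from the paper.

```python
def effective_steps(trainable_params: int, steps: int) -> int:
    """Effective steps = trainable parameters x optimizer steps,
    the training-cost proxy plotted in Figure 2."""
    return trainable_params * steps

# Placeholder numbers, for illustration only.
print(effective_steps(trainable_params=20_000_000, steps=40_000))   # low-rank run
print(effective_steps(trainable_params=125_000_000, steps=40_000))  # full-rank run
```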

Each pretrained model undergoes zero-shot evaluation on all 16 NLP tasks used in the OPT article [9], including ARC Easy and Challenge [40], HellaSwag [41], OpenBookQA [42], PIQA [43], StoryCloze [44], SuperGLUE [45], Winograd [46], and WinoGrande [47]. Evaluations are conducted using the LM Evaluation Harness framework [48]. Except for the ReCoRD task, which uses the F1 score, all other tasks are evaluated using accuracy.
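For readers who want to run a comparable zero-shot evaluation, a minimal sketch using the LM Evaluation Harness Python API is shown below. The checkpoint path and the task subset are placeholders, and the call assumes the v0.4-style `simple_evaluate` interface; argument names can differ across harness versions, and this is not the authors' evaluation script.

```python
# Hedged sketch: zero-shot evaluation with lm-evaluation-harness (v0.4-style API).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                            # Hugging Face causal-LM backend
    model_args="pretrained=/path/to/opt-sst-checkpoint",   # hypothetical checkpoint path
    tasks=["arc_easy", "arc_challenge", "hellaswag",
           "openbookqa", "piqa", "winogrande"],            # subset of the 16 tasks
    num_fewshot=0,                                         # zero-shot, as in the paper
)
print(results["results"])
```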

Table 4 details the zero-shot evaluation results across the 16 NLP tasks. SST consistently performs comparably to or better than the other low-rank methods and shows competitive performance against the full-rank models.

We further analyze inference by applying post-training singular value pruning to the SST model (see Appendix G).
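As a generic illustration of the idea behind such pruning (keep only the largest singular values of a trained weight and reconstruct a lower-rank matrix), a minimal PyTorch sketch follows. It is not the procedure of Appendix G; the matrix size and the number of retained singular values are arbitrary.

```python
# Generic post-training singular value pruning sketch (PyTorch); illustrative only.
import torch

def prune_singular_values(weight: torch.Tensor, keep: int) -> torch.Tensor:
    """Return a low-rank approximation of `weight` that keeps only the
    `keep` largest singular values and zeroes out the rest."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    S_pruned = torch.zeros_like(S)
    S_pruned[:keep] = S[:keep]              # singular values come sorted in descending order
    return U @ torch.diag(S_pruned) @ Vh

W = torch.randn(512, 512)                   # stand-in for a trained layer weight
W_pruned = prune_singular_values(W, keep=64)
print(torch.linalg.matrix_rank(W_pruned))   # at most 64
```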


5.3 Hyperbolic Graph Neural Networks

Hyperbolic Graph Neural Networks (HGNNs) [11, 12] capitalize on the expansive, hierarchical nature of hyperbolic space to efficiently manage and analyze graph-structured data. Hyperbolic geometry is particularly well suited to graphs because it can embed hierarchical structures with minimal distortion, offering a substantial improvement over traditional Euclidean methods.

Table 3: Validation perplexity on OpenWebText across various OPT model sizes, along with the number of trainable parameters of each method. Rank r = 64. Values in bold indicate the best performance among the low-rank methods.

We evaluated the effectiveness of SST on the HyboNet [12] version of HGNN for node classification and link prediction across four distinct datasets: Airport [11], Cora [49], Disease [50], and PubMed [51]. Each experiment was conducted with three random seeds.

Table 4: Zero-shot evaluations on the same 16 NLP tasks featured in the OPT article [9]. Except for the ReCoRD task, which uses the F1 score, all other tasks are evaluated using accuracy, with values presented as percentages. Mean scores in bold indicate the best performance among the low-rank methods. We also report the win percentage (counting ties) of each low-rank method against full-rank training.

Table 5: Node classification and link prediction results. Model dimension d = 16. Results are reported as test F1 scores for node classification and test precision for link prediction, expressed as percentages. Values in bold indicate the best performance among the low-rank methods, while those marked with an asterisk (*) exceed the corresponding full-rank variants.

The results, detailed in Table 5, demonstrate strong performance on both node classification and link prediction tasks. SST not only matches full-rank training (exceeding it on the Disease link prediction task) but also significantly outperforms LoRA at equivalent ranks. Notably, SST's advantage over LoRA is larger at r = 1 than at r = 2, likely because SST's sampling strategy is particularly effective in sparser settings.

:::info Authors:

(1) Jialin Zhao, Center for Complex Network Intelligence (CCNI), Tsinghua Laboratory of Brain and Intelligence (THBI) and Department of Computer Science;

(2) Yingtao Zhang, Center for Complex Network Intelligence (CCNI), Tsinghua Laboratory of Brain and Intelligence (THBI) and Department of Computer Science;

(3) Xinghang Li, Department of Computer Science;

(4) Huaping Liu, Department of Computer Science;

(5) Carlo Vittorio Cannistraci, Center for Complex Network Intelligence (CCNI), Tsinghua Laboratory of Brain and Intelligence (THBI), Department of Computer Science, and Department of Biomedical Engineering, Tsinghua University, Beijing, China.

:::


:::info This paper is available on arXiv under the CC BY 4.0 DEED (Attribution 4.0 International) license.

:::
