
Saving 77% of the Parameters in Large Language Models Technical Report - NotebookLM Deep Dive

LLM Scaling Law is Wrong.

Paper: https://www.researchgate.net/publicat...

This technical report demonstrates that large language models (LLMs) can maintain their learning capacity while reducing their non-embedding parameters by up to 77%. We achieve this by adapting a parameter reduction technique originally developed for computer vision, replacing dense layers with an optimized subnetwork of grouped pointwise convolutions. Using a 2-layer version of Microsoft's phi-3-mini-4k-instruct codebase as our baseline, we show that our optimized model (kphi-3) achieves comparable validation loss while using only 15-23% of the original non-embedding parameters. Each experiment was conducted on a single NVIDIA L4 GPU within a 3-day timeframe, supporting the democratization of AI research. Our findings suggest that current LLM architectures may be substantially overparameterized, opening possibilities for more efficient model training and deployment.
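To make the core idea concrete, here is a minimal sketch of how a dense feed-forward block can be replaced by grouped pointwise (1x1) convolutions. The module name, group count, and the channel-interleave step are illustrative assumptions for this sketch, not the exact kphi-3 design described in the report; the point is only to show where the parameter savings come from.

```python
import torch
import torch.nn as nn

class GroupedPointwiseFFN(nn.Module):
    """Sketch: a feed-forward block built from grouped pointwise (1x1)
    convolutions instead of dense linear layers. With `groups` groups,
    each projection stores roughly 1/groups of the dense weight count.
    This is an assumed illustration, not the report's exact subnetwork."""

    def __init__(self, dim: int, hidden_dim: int, groups: int = 4):
        super().__init__()
        # Grouped 1x1 conv: weight count is dim * hidden_dim / groups,
        # versus dim * hidden_dim for a dense nn.Linear of the same width.
        self.up = nn.Conv1d(dim, hidden_dim, kernel_size=1, groups=groups)
        self.act = nn.SiLU()
        self.down = nn.Conv1d(hidden_dim, dim, kernel_size=1, groups=groups)
        self.groups = groups

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); Conv1d expects (batch, channels, seq_len)
        h = x.transpose(1, 2)
        h = self.act(self.up(h))
        # Interleave channels so the second grouped conv sees features from
        # every group (a channel-shuffle-style mixing step, assumed here).
        b, c, t = h.shape
        h = h.view(b, self.groups, c // self.groups, t).transpose(1, 2).reshape(b, c, t)
        h = self.down(h)
        return h.transpose(1, 2)

# Rough parameter comparison against a dense FFN of the same width
# (hypothetical dimensions chosen for illustration only).
dim, hidden = 3072, 8192
dense = nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
grouped = GroupedPointwiseFFN(dim, hidden, groups=4)
print(sum(p.numel() for p in dense.parameters()))    # ~50.3M parameters
print(sum(p.numel() for p in grouped.parameters()))  # ~12.6M, about 4x fewer
```

Under these assumptions, the grouped variant keeps the same input/output widths while cutting the projection weights by roughly the group factor, which is the kind of reduction the report exploits to reach 15-23% of the original non-embedding parameters.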

#largelanguagemodels
#largelanguagemodel
#artificialintelligence
#llm
#neuralnetworks
#deeplearning
