The Impact of Parameter Scaling: Analysis of Specific Large Language Model Capabilities

Authors

  • Ariya Uttama Putera, Bina Nusantara University
  • Felix Marcellino, Bina Nusantara University
  • Sonya Rapinta Manalu, Bina Nusantara University
  • Keenan Ario Muhamad, Bina Nusantara University

DOI:

https://doi.org/10.21512/ijcshai.v3i1.15119

Keywords:

LLM, parameter scaling, model efficiency, capability evaluation, inference speed

Abstract

Large Language Models (LLMs) are now highly diverse; among the most prominent are ChatGPT, Gemini, Microsoft Copilot, Claude Sonnet, Grok, and DeepSeek. This research investigates how efficient such models can be, given the strengths established during LLM training. Specifically, we examine the impact of parameter scaling on the output of each local model we test. The study limits the number of parameters considered and classifies the questions posed, so that we can identify which local LLM models perform better on identical prompts and evaluate each objectively against the results. The aim is to establish a clear correlation between parameter scale and output quality. We also hope the findings will help users select an AI model suited to their needs and deepen their understanding of AI so they can work more efficiently and accurately. From the work carried out, we conclude that large-scale local LLMs are not uniformly better or more efficient: Gemma3 with 12B parameters, for example, did not produce better results than the Gemma3 model with 4B parameters. Alternatively, on hardware similar to ours, GPT-oss (openai/gpt-oss-20B) and Qwen3 (Qwen/Qwen3-4B and Qwen/Qwen3-8B) offer good results in both reasoning and inference speed.


Author Biographies

Ariya Uttama Putera, Bina Nusantara University

Mobile Application & Technology Program, Computer Science Department, School of Computer Science

Felix Marcellino, Bina Nusantara University

Mobile Application & Technology Program, Computer Science Department, School of Computer Science

Sonya Rapinta Manalu, Bina Nusantara University

Mobile Application & Technology Program, Computer Science Department, School of Computer Science

Keenan Ario Muhamad, Bina Nusantara University

Mobile Application & Technology Program, Computer Science Department, School of Computer Science

References

[1] J. Kaplan et al., "Scaling Laws for Neural Language Models," arXiv preprint, arXiv:2001.08361, Jan. 2020. [Online]. Available: https://arxiv.org/abs/2001.08361

[2] W. X. Zhao et al., "A Survey of Large Language Models," arXiv preprint, arXiv:2303.18223, Mar. 2023. [Online]. Available: https://arxiv.org/abs/2303.18223

[3] Y. Tay et al., "Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers," arXiv preprint, arXiv:2109.10686, Sep. 2021. [Online]. Available: https://arxiv.org/abs/2109.10686

[4] T. B. Brown et al., "Language Models are Few-Shot Learners," arXiv preprint, arXiv:2005.14165, May 2020. [Online]. Available: https://arxiv.org/abs/2005.14165

[5] J. Hoffmann et al., "Training Compute-Optimal Large Language Models," arXiv preprint, arXiv:2203.15556, Mar. 2022. [Online]. Available: https://arxiv.org/abs/2203.15556

[6] P. Brauner, A. Hick, R. Philipsen, and M. Ziefle, "What does the public think about artificial intelligence?—A criticality map to understand bias in the public perception of AI," Front. Comp. Sci., vol. 5, Art. no. 1113903, Mar. 2023, doi: 10.3389/fcomp.2023.1113903. [Online]. Available: https://www.frontiersin.org/journals/computer-science/articles/10.3389/fcomp.2023.1113903/full

[7] S. Mirchandani et al., "Large language models as general pattern machines," arXiv preprint, arXiv:2307.04721, Jul. 2023. [Online]. Available: https://arxiv.org/abs/2307.04721

[8] X. Bi et al., "DeepSeek LLM: Scaling Open-Source Language Models with Longtermism," arXiv preprint, arXiv:2401.02954, Jan. 2024. [Online]. Available: https://arxiv.org/abs/2401.02954

[9] T. Henighan et al., "Scaling Laws for Autoregressive Generative Modeling," arXiv preprint, arXiv:2010.14701, Oct. 2020. [Online]. Available: https://arxiv.org/abs/2010.14701

[10] B. Zhang et al., "When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method," arXiv preprint, arXiv:2402.17193, Feb. 2024. [Online]. Available: https://arxiv.org/abs/2402.17193


Published

2026-03-30

How to Cite

Putera, A. U., Marcellino, F., Manalu, S. R., & Muhamad, K. A. (2026). The Impact of Parameter Scaling: Analysis of Specific Large Language Model Capabilities. International Journal of Computer Science and Humanitarian AI, 3(1), 41–47. https://doi.org/10.21512/ijcshai.v3i1.15119