Small language models are the new rage, researchers say

by admin

The original version of this story appeared in Quanta Magazine.

Large language models work well because they are so large. The latest models from OpenAI, Meta, and DeepSeek use hundreds of billions of "parameters" – the adjustable knobs that determine the connections among data and get tweaked during the training process. With more parameters, the models are better able to identify patterns and connections, which in turn makes them more powerful and accurate.
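To make the idea concrete, here is a minimal sketch, assuming PyTorch (a choice of this illustration, not something named in the story), that counts the parameters of a toy network. A frontier LLM simply stacks up analogous layers until that count reaches hundreds of billions.

```python
# A toy network: every weight and bias is one adjustable number ("parameter")
# that training modifies.
import torch.nn as nn

tiny_model = nn.Sequential(
    nn.Linear(256, 512),   # 256*512 weights + 512 biases
    nn.ReLU(),
    nn.Linear(512, 10),    # 512*10 weights + 10 biases
)

n_params = sum(p.numel() for p in tiny_model.parameters())
print(f"{n_params:,} trainable parameters")  # 136,714
```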

But this power comes at a cost. Training a model with hundreds of billions of parameters takes enormous computational resources. To train its Gemini 1.0 Ultra model, for example, Google reportedly spent $191 million. Large language models (LLMs) also require considerable computing power each time they answer a request, which makes them notorious energy hogs: a single ChatGPT query consumes about 10 times as much energy as a single Google search, according to the Electric Power Research Institute.

In response, some researchers are now thinking small. IBM, Google, Microsoft, and OpenAI have all recently released small language models (SLMs) that use a few billion parameters – a fraction of their LLM counterparts.

Small models are not used as general-purpose tools like their bigger cousins. But they can excel at specific, more narrowly defined tasks, such as summarizing conversations, answering patient questions as a health care chatbot, and gathering data in smart devices. "For a lot of tasks, an 8 billion parameter model is actually pretty good," said Zico Kolter, a computer scientist at Carnegie Mellon University. They can also run on a laptop or cell phone, instead of a huge data center. (There's no consensus on the exact definition of "small," but the new models all max out at around 10 billion parameters.)

To optimize the training process for these small models, researchers use a few tricks. Large models often scrape raw training data from the internet, and this data can be disorganized, messy, and hard to process. But these large models can then generate a high-quality data set that can be used to train a small model. The approach, called knowledge distillation, gets the larger model to effectively pass on its training, like a teacher giving lessons to a student. "The reason [SLMs] get so good with such small models and such little data is that they use high-quality data instead of the messy stuff," Kolter said.
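For illustration, here is a minimal sketch of the classic soft-target form of knowledge distillation, assuming PyTorch and two hypothetical `teacher` and `student` models. The story describes a related variant in which the large model generates high-quality training text; in both cases, the student learns from the teacher's outputs rather than from raw web data.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student output distributions."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature**2

def train_step(student, teacher, batch, optimizer):
    with torch.no_grad():                    # the teacher is frozen
        teacher_logits = teacher(batch)
    student_logits = student(batch)          # the small model learns to match it
    loss = distillation_loss(student_logits, teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The key design choice is that the student trains against the teacher's full probability distribution rather than hard labels, which carries far more signal per example than raw scraped text.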

Researchers have also explored ways to create small models by starting with large ones and trimming them down. One method, known as pruning, entails removing unnecessary or inefficient parts of a neural network – the sprawling web of connected data points that underlies a large model.

Pruning was inspired by a real-life neural network, the human brain, which gains efficiency by snipping connections between synapses as a person ages. Today's pruning approaches trace back to a 1989 paper in which the computer scientist Yann LeCun, now at Meta, argued that up to 90 percent of the parameters in a trained neural network could be removed without sacrificing efficiency. He called the method "optimal brain damage." Pruning can help researchers fine-tune a small language model for a particular task or environment.
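As an illustration, here is a minimal sketch of simple magnitude pruning using PyTorch's built-in pruning utilities, applied to a hypothetical toy model standing in for a real network. Note that this zeroes out the smallest weights by absolute value, a cruder criterion than LeCun's second-order "optimal brain damage" method, which ranks parameters by their estimated effect on the loss.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A toy stand-in for the weight matrices inside a language model.
model = nn.Sequential(
    nn.Linear(512, 1024),
    nn.ReLU(),
    nn.Linear(1024, 512),
)

# Zero out the 90% of weights with the smallest absolute value in each layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)
        prune.remove(module, "weight")   # make the pruning permanent

# Roughly 10% of the weights should remain nonzero.
weights = [p for p in model.parameters() if p.dim() > 1]
total = sum(p.numel() for p in weights)
nonzero = sum((p != 0).sum().item() for p in weights)
print(f"{nonzero / total:.1%} of weights remain nonzero")
```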

For researchers interested in how language models do the things they do, smaller models offer an inexpensive way to test new ideas. And because they have fewer parameters than large models, their reasoning may be more transparent. "If you want to make a new model, you need to try things," said Leshem Choshen, a research scientist at the MIT-IBM Watson AI Lab. "Small models allow researchers to experiment with lower stakes."

The big, expensive models, with their ever-increasing parameter counts, will remain useful for applications like generalized chatbots, image generators, and drug discovery. But for many users, a small, targeted model will work just as well, while being easier for researchers to train and build. "These efficient models can save money, time, and compute," Choshen said.


Original story reprinted with permission from Quanta Magazine, an editorially independent publication of the Simons Foundation whose mission is to enhance public understanding of science by covering research developments and trends in mathematics and the physical and life sciences.
