Last month, U.S. financial markets tumbled after a Chinese start-up called DeepSeek said it had built one of the world's most powerful artificial intelligence systems using far fewer computer chips than many experts thought possible.
AI companies typically train their chatbots using supercomputers packed with 16,000 specialized chips or more. But DeepSeek said it needed only about 2,000.
As DeepSeek's engineers detailed in a research paper published just after Christmas, the start-up used several technological tricks to significantly reduce the cost of building its system. Its engineers needed only about $6 million in raw computing power, roughly one-tenth of what Meta spent building its latest AI technology.
What exactly did DeepSeek do? Here's a guide.
How are AI technologies built?
The leading AI technologies are based on what scientists call neural networks, mathematical systems that learn their skills by analyzing huge amounts of data.
The most powerful systems spend months analyzing just about all the English text on the internet, as well as many images, sounds and other media. That analysis requires enormous amounts of computing power.
About 15 years ago, AI researchers realized that specialized computer chips called graphics processing units, or GPUs, were an effective way of doing this kind of data analysis. Companies like the Silicon Valley chipmaker Nvidia originally designed these chips to render graphics for computer video games. But GPUs also turned out to be good at running the math that powers neural networks.
As companies packed more GPUs into their data centers, their AI systems could analyze more data.
But the best GPUs cost around $40,000 each, and they require huge amounts of electricity. Sending data between chips can use more electrical power than running the chips themselves.
How was DeepSeek able to reduce costs?
It did many things. Most notably, it embraced a method called "mixture of experts."
Companies usually created a single neural network that learned all the patterns in all the data on the internet. This was expensive because it required enormous amounts of data to travel between GPU chips.
If one chip was learning how to write a poem and another was learning how to write a computer program, they still needed to talk to each other, just in case there was some overlap between poetry and programming.
With the mixture-of-experts method, researchers tried to solve this problem by splitting the system into many neural networks: one for poetry, one for computer programming, one for biology, one for the natural sciences and so on. There might be 100 of these smaller "expert" systems. Each expert could concentrate on its particular field.
Many companies have struggled with this method, but DeepSeek was able to do it well. Its trick was to pair those smaller "expert" systems with a "generalist" system.
The experts still needed to trade some information with one another, and the generalist, which had a decent but not detailed understanding of each subject, could help coordinate interactions between the experts.
It is a bit like an editor overseeing a newsroom full of specialist reporters.
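For readers who want to see the idea in code, here is a minimal sketch of a mixture-of-experts layer in Python with NumPy. It is not DeepSeek's implementation; the expert count, vector sizes and routing rule are invented for illustration. The point is that a "generalist" router scores each input and activates only a couple of experts, so most of the network does no work on any given piece of text.

```python
import numpy as np

# Toy mixture-of-experts layer. All sizes and names here are made up for illustration.
rng = np.random.default_rng(0)

NUM_EXPERTS = 8    # a real system might have dozens or hundreds of experts
HIDDEN = 16        # size of the input vector (toy value)
TOP_K = 2          # how many experts actually run for each input

# In this sketch, each "expert" is just a small weight matrix.
experts = [rng.standard_normal((HIDDEN, HIDDEN)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((HIDDEN, NUM_EXPERTS))  # the generalist's scoring weights

def moe_layer(x):
    """Send the input x to its top-k experts and blend their outputs."""
    scores = x @ router                    # how relevant each expert looks for this input
    top = np.argsort(scores)[-TOP_K:]      # keep only the best-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()               # turn the scores into mixing weights
    # Only the chosen experts do any multiplication; the rest are skipped entirely.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

x = rng.standard_normal(HIDDEN)
print(moe_layer(x).shape)  # (16,) -- same shape as the input, computed by 2 of 8 experts
```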
Is that more efficient?
Much more. But that is not the only thing DeepSeek did. It also mastered a simple trick involving decimals that anyone who remembers elementary school math can understand.
There is math involved in this?
Remember your math teacher explaining the concept of pi. Pi, also denoted as π, is a number that never ends: 3.14159265358979...
You can use π to do useful calculations, like determining the circumference of a circle. When you do those calculations, you shorten π to just a few decimals: 3.14. If you use this simpler number, you still get a pretty good estimate of a circle's circumference.
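To make the analogy concrete, here is that circle calculation as a few lines of Python. The radius is arbitrary; the point is simply that chopping pi down to two decimals barely changes the answer.

```python
import math

radius = 10.0

exact = 2 * math.pi * radius       # circumference using Python's full-precision pi
rough = 2 * 3.14 * radius          # the same calculation with pi cut to two decimals

print(exact)                       # 62.83185307179586
print(rough)                       # 62.800000000000004
print(abs(exact - rough) / exact)  # error of roughly 0.05 percent
```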
DeepSeek did something similar, but on a much larger scale, in training its AI technology.
The math that allows a neural network to identify patterns in text is really just multiplication: lots and lots and lots of multiplication. We are talking months of multiplication across thousands of computer chips.
Typically, chips multiply numbers that fit into 16 bits of memory. But DeepSeek squeezed each number into only 8 bits of memory, half the space. In essence, it lopped several decimals off each number.
That meant each calculation was less precise. But it didn't matter. The calculations were still precise enough to produce a really powerful neural network.
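Here is a rough sketch of that squeeze in Python with NumPy. It is a generic 8-bit quantization example, not DeepSeek's exact recipe: the numbers are scaled so they fit into 8-bit integers, which take half the memory of 16-bit values at the cost of some decimals.

```python
import numpy as np

rng = np.random.default_rng(1)
weights = rng.standard_normal(1000).astype(np.float16)    # 16-bit numbers: 2,000 bytes

# Squeeze into 8 bits: scale everything onto the range an 8-bit integer can hold.
scale = float(np.abs(weights).max()) / 127.0
weights_8bit = np.round(weights / scale).astype(np.int8)   # 1,000 bytes: half the space

# To use the numbers, scale them back up. Some decimals are gone, but the values are close.
recovered = weights_8bit.astype(np.float32) * scale
print(np.abs(weights.astype(np.float32) - recovered).mean())  # small average error
```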
That's it?
Well, they added another trick.
After squeezing each number into 8 bits of memory, DeepSeek took a different route when it multiplied those numbers together. When determining the answer to each multiplication problem, a key calculation that helps decide how the neural network operates, it stretched the answer across 32 bits of memory. In other words, it kept many more decimals. That made the answer more precise.
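A rough sketch of that idea in Python: multiply low-precision numbers, but keep the running total in a wider format so rounding errors do not pile up. NumPy has no 8-bit float type, so this example uses 16-bit inputs and a 32-bit accumulator; the principle is the same, though the actual GPU-level details are more involved.

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.standard_normal(10_000).astype(np.float16)   # low-precision inputs
b = rng.standard_normal(10_000).astype(np.float16)

# Keep the running sum in 16 bits: every addition gets rounded, and errors pile up.
narrow_sum = np.float16(0.0)
for x, y in zip(a, b):
    narrow_sum = np.float16(narrow_sum + x * y)

# Keep the running sum in 32 bits: the inputs are still low precision,
# but the accumulated answer holds many more decimals.
wide_sum = np.float32(0.0)
for x, y in zip(a, b):
    wide_sum += np.float32(x) * np.float32(y)

reference = np.dot(a.astype(np.float64), b.astype(np.float64))
print(abs(narrow_sum - reference))  # noticeably off
print(abs(wide_sum - reference))    # much closer to the true answer
```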
So any high school student could have done this?
Well, no. The DeepSeek engineers showed in their paper that they were also very good at writing the very complicated computer code that tells GPUs what to do. They knew how to squeeze even more efficiency out of those chips.
Few people have that kind of skill. But serious AI labs have the talented engineers needed to match what DeepSeek has done.
Then why didn't they do this already?
Some AI labs may already be using at least some of the same tricks. Companies like OpenAI do not always reveal what they are doing behind closed doors.
But others were clearly surprised by DeepSeek's work. Doing what the start-up did is not easy. The experimentation needed to find a breakthrough like this involves millions of dollars, if not billions, in electricity.
In other words, it requires huge amounts of risk.
"You have to put a lot of money on the line to try new things, and often, they fail," said Tim Dettmers, a researcher at the Allen Institute for Artificial Intelligence in Seattle who specializes in building efficient AI systems and previously worked as an AI researcher at Meta.
"That is why we don't see much innovation: People are afraid to lose many millions just to try something that doesn't work," he added.
Many experts pointed out that DeepSeek's $6 million covered only what the start-up spent when training the final version of the system. In their paper, DeepSeek's engineers said they had spent additional funds on research and experimentation before the final training run. But the same is true of any cutting-edge AI project.
DeepSeek experimented, and it paid off. Now, because the Chinese start-up has shared its methods with other AI researchers, its technological tricks are poised to significantly reduce the cost of building AI.