💡 This is Part 2 of the Decentralized AI series. Check out Part 1 here.
Building models is one of the most important stages of the AI pipeline. While research has been progressing for many years, the recent boom in consumer-facing AI applications and accelerating LLM adoption has sharpened the focus on developing new and better AI models.
Taking AI models to their next level comes with its own set of challenges. While throwing more resources at the problem may sound appealing, doing so doesn't typically address the efficiency, scalability, and accuracy issues that developers encounter.
In this article, we will explore use cases and examples of how decentralization can benefit AI model building, along with some of the existing and emerging projects and technologies working in this direction. We will approach the existing problems by applying decentralization concepts only where they are needed, rather than starting from the solutions we have and searching for problems.
The computing resources needed for state-of-the-art models are substantial. For this reason, much of the focus so far has been directed at decentralized compute protocols, which enable a more efficient and democratic way of training AI models by allowing multiple nodes or parties to collaboratively train a model on their own data and compute resources, without sharing or transferring the data or the model. This aims to reduce the cost and time of training and to increase the diversity and quality of both the data and the model.
A major challenge in building AI models is the need for high-performance and scalable computing resources, which are often centralized, expensive, and in limited supply for many users and developers. For example, running a complex or large-scale AI model can require specialized hardware or cloud services that are costly or unavailable for many applications or scenarios.
Decentralization can enable a more accessible and affordable way of accessing compute resources by allowing multiple nodes or parties to share or rent out their idle or unused compute without relying on a central provider or intermediary. This can lower the barrier to entry and the cost of compute, and increase its availability and efficiency. Currently, GPUs, and as a result AI training, are concentrated in the hands of a few companies, which creates a monopolization risk in the near future.
Some decentralized solutions aim to challenge this market structure and its dominant players. Two examples are Bittensor and Gensyn AI.
Bittensor is an open-source protocol that powers a decentralized, blockchain-based machine-learning network. The project aims to let developers train machine learning models collaboratively and get rewarded in TAO according to the informational value they offer the collective. TAO also grants external access, allowing users to extract information from the network while tuning its activities to their needs. Ultimately, Bittensor’s vision is to create a pure market for artificial intelligence, an incentivized arena in which consumers and producers of this valuable commodity can interact in a trustless, open, and transparent context.
Bittensor aims to solve some of the major problems of AI model training, such as data quality, data diversity, data privacy, model bias, model transparency, and model scalability, by leveraging distributed ledger technology.
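To make the incentive idea concrete, here is a minimal, illustrative sketch in Python. This is not Bittensor's actual consensus algorithm, and the stake values, scores, and emission pool below are made up for illustration; it only shows the general shape of stake-weighted scoring: validators score the usefulness of miners' contributions, the scores are combined weighted by stake, and a reward pool is split proportionally.

```python
# Illustrative sketch (not Bittensor's actual implementation): validators
# hold stake and score the usefulness of each miner's responses; rewards
# are split in proportion to the stake-weighted consensus scores.
import numpy as np

stake = np.array([100.0, 50.0, 25.0])   # stake held by 3 validators (made up)
scores = np.array([                     # validator-by-miner scores in [0, 1]
    [0.9, 0.2, 0.6],
    [0.8, 0.1, 0.7],
    [0.7, 0.3, 0.5],
])

# Stake-weighted consensus score per miner.
weights = stake / stake.sum()
consensus = weights @ scores            # shape: (num_miners,)

# Distribute a fixed emission pool proportionally to consensus scores.
emission_pool = 1000.0                  # hypothetical tokens per epoch
rewards = emission_pool * consensus / consensus.sum()
print(dict(consensus=consensus.round(3), rewards=rewards.round(1)))
```

The key property this illustrates is that a miner's reward depends on how valuable the network's validators collectively judge its contribution to be, with larger stakeholders carrying more weight.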
Bittensor is still in the early stages of development, but it has already attracted considerable attention from the crypto and AI communities, reaching a market cap in the billions of dollars.
Gensyn AI is a company that aims to provide a decentralized machine learning computing protocol. This protocol connects hardware that performs machine learning tasks, such as GPUs and CPUs, and makes them available to engineers, researchers, and academics. Gensyn AI claims that its protocol can lower the cost, increase the scale, and ensure permissionless access to machine learning computing.
The high cost of machine learning training is one of the main problems that Gensyn AI is trying to solve. According to Gensyn AI, traditional cloud providers like AWS charge high margins for renting GPUs and other hardware, making machine learning training expensive and inaccessible for many users. Gensyn AI’s protocol eliminates these margins by allowing users to access hardware directly from other users without intermediaries. Gensyn AI says that its protocol can offer up to 80% cheaper computing than AWS.
Another challenge that Gensyn AI is addressing is the limited scale of machine learning training. Gensyn AI argues that cloud providers cannot meet the growing demand for machine learning computing, especially for large-scale models that require massive amounts of data and computation. Gensyn AI’s protocol leverages underutilized devices around the world, such as consumer GPUs, custom ASICs, and SoC devices, and connects them into a global supercluster that can provide more GPUs than the cloud. Gensyn AI uses techniques such as model and data parallelism, distributed computing, and fault tolerance to enable efficient and reliable training over the network.
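To illustrate the data-parallel idea in the simplest possible terms, here is a toy sketch; it is not Gensyn's protocol, just the textbook mechanic it builds on: each worker computes a gradient on its own data shard, the gradients are averaged, and every worker applies the same update.

```python
# Minimal data-parallel SGD sketch (illustrative; not Gensyn's protocol):
# each worker computes a gradient on its own shard, the gradients are
# averaged (an "all-reduce"), and the shared weights are updated.
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1024, 8)), rng.normal(size=1024)
w = np.zeros(8)                         # shared linear-model weights

def local_gradient(w, X_shard, y_shard):
    # Gradient of mean squared error for a linear model on one shard.
    residual = X_shard @ w - y_shard
    return 2 * X_shard.T @ residual / len(y_shard)

num_workers, lr = 4, 0.05
shards = list(zip(np.array_split(X, num_workers),
                  np.array_split(y, num_workers)))

for step in range(100):
    # In a real system each gradient is computed on a different machine.
    grads = [local_gradient(w, Xs, ys) for Xs, ys in shards]
    w -= lr * np.mean(grads, axis=0)    # average and apply the same update

print("final loss:", np.mean((X @ w - y) ** 2))
```

The hard part of doing this over an open network of untrusted devices, and what protocols like Gensyn's aim to solve, is verifying that each worker actually performed its share of the computation correctly.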
Gensyn AI believes that machine learning training should be decentralized and uncensored, and that anyone should be able to own and control their own AI models. Its protocol allows users to train models without interruption on any available GPU on the network, without relying on centralized authorities or intermediaries. Gensyn AI also uses cryptography and privacy-preserving techniques to ensure the security and integrity of the data and models on the network.
Reinforcement Learning from Human Feedback (RLHF) is a technique that trains an AI system to optimize its behavior based on human preferences. RLHF is needed because many AI tasks, especially those involving natural language processing, are difficult to define or measure using algorithmic criteria or metrics. For example, how can we evaluate the quality of a generated story, a conversational agent, or a code snippet? These tasks depend on subjective and context-specific human values and expectations, which are hard to capture in a predefined reward function.
RLHF works by collecting human feedback on the AI system’s outputs and using it to train a reward model, which predicts how good or bad an output is according to human standards. The reward model is then used as a surrogate reward function to guide the AI system’s learning process via reinforcement learning. This way, the AI system can improve its performance and align its behavior with human preferences without requiring explicit rules or references.
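Here is a minimal sketch of the reward-modeling step, using the standard pairwise (Bradley-Terry) preference loss. The random vectors stand in for encoded (prompt, response) pairs, which in practice would come from a language-model encoder over real text.

```python
# Minimal reward-model sketch for RLHF: train a scalar reward head on
# pairwise human preferences with the Bradley-Terry loss,
# -log(sigmoid(r_chosen - r_rejected)).
import torch
import torch.nn as nn

torch.manual_seed(0)
dim = 32
reward_model = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Stand-ins for encoded (prompt, response) pairs.
chosen = torch.randn(256, dim)      # responses humans preferred
rejected = torch.randn(256, dim)    # responses humans rejected

for epoch in range(200):
    r_chosen = reward_model(chosen).squeeze(-1)
    r_rejected = reward_model(rejected).squeeze(-1)
    # Push preferred outputs to out-score rejected ones.
    loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final pairwise loss:", loss.item())
```

Once trained, this reward model scores arbitrary outputs, and a reinforcement learning algorithm such as PPO can optimize the AI system against those scores.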
One of the challenges of RLHF is to gather high-quality and diverse human feedback that can cover a large and complex output space. One possible solution is to leverage the power of the community by crowdsourcing human feedback from multiple sources and aggregating them in a principled way. For example, Hugging Face has created a platform that allows users to provide feedback on language models and their outputs, and to share their feedback with other users. This can create a virtuous cycle of data collection and model improvement, as well as foster collaboration and trust among the community.
RLHF has proven effective in various domains of natural language processing, such as text summarization, natural language understanding, and conversational agents. It has also been applied to other areas, such as video game bots and image generation. RLHF has enabled AI systems to generate more diverse, creative, and human-aligned outputs, and to overcome some of the limitations of traditional reinforcement learning methods.
Decentralization can offer some solutions to these challenges by leveraging the power of distributed networks, communities, and markets.
One example of a decentralized concept that can be used for RLHF is decentralized autonomous organizations (DAOs), which are self-governing entities that operate on a blockchain. DAOs can coordinate the actions and incentives of multiple stakeholders, such as AI developers, human feedback providers, and end-users, without the need for a central authority or intermediary. DAOs can also enforce smart contracts that specify the rules and rewards for RLHF, such as how much feedback is required, how it is verified, and how it is compensated. This can make crowdsourcing feedback more efficient and economically sustainable.
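As a rough illustration of the kind of rules such a contract might encode, here is a plain-Python model of a feedback bounty. This is hypothetical: a real DAO would implement these rules as an on-chain smart contract, and real verification would involve more than the simple quota check used here.

```python
# Hypothetical model of a DAO feedback bounty: a developer escrows a
# budget, annotators submit feedback, and payout splits the budget
# proportionally once enough feedback has been collected.
from dataclasses import dataclass, field

@dataclass
class FeedbackBounty:
    budget: float                    # tokens escrowed by the AI developer
    required_labels: int             # feedback items needed before payout
    submissions: dict = field(default_factory=dict)

    def submit(self, annotator: str, label: int) -> None:
        self.submissions.setdefault(annotator, []).append(label)

    def total_labels(self) -> int:
        return sum(len(v) for v in self.submissions.values())

    def payout(self) -> dict:
        # Pay out only once the feedback quota is met (a stand-in for
        # the richer verification a real contract would enforce).
        if self.total_labels() < self.required_labels:
            raise ValueError("not enough feedback collected yet")
        per_label = self.budget / self.total_labels()
        return {a: per_label * len(v) for a, v in self.submissions.items()}

bounty = FeedbackBounty(budget=100.0, required_labels=3)
bounty.submit("alice", 1); bounty.submit("alice", 0); bounty.submit("bob", 1)
print(bounty.payout())   # {'alice': 66.66..., 'bob': 33.33...}
```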
Another example of a decentralized concept that can be used for RLHF is information markets, which are platforms that allow participants to trade on the outcomes of events or questions. Information markets can elicit and aggregate the collective wisdom and opinions of a large and diverse crowd, which can provide valuable feedback for AI agents. Information markets can also incentivize honest and accurate feedback by rewarding those who predict correctly and penalizing those who predict wrongly.
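One standard way to reward accuracy in such markets is a proper scoring rule, under which truthfully reporting one's beliefs is the reward-maximizing strategy. A small sketch using the Brier score follows; the `scale` factor is an arbitrary illustration.

```python
# Rewarding honest, accurate feedback with a proper scoring rule (the
# Brier score): confident-and-correct beats hedging, which beats
# confident-and-wrong, so truthful reporting maximizes expected reward.
def brier_reward(predicted_prob: float, outcome: int, scale: float = 10.0) -> float:
    """Reward = scale * (1 - squared error); higher is better."""
    return scale * (1.0 - (predicted_prob - outcome) ** 2)

# Participants predict whether a model output will be rated "good" (1).
print(brier_reward(0.9, 1))   # confident and correct -> 9.9
print(brier_reward(0.9, 0))   # confident and wrong   -> 1.9
print(brier_reward(0.5, 1))   # hedged                -> 7.5
```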
Decentralized communities, DAOs, information markets, and other decentralized concepts can be used for RLHF and crowdsourcing purposes by providing a scalable, efficient, and reliable way of collecting and aggregating human feedback for AI agents. Decentralization can also foster a more collaborative and participatory approach to AI development, where humans and machines can learn from each other and co-create value.
How can we ensure that the data used for ML is not leaked or tampered with? How can we verify that the ML models are trained and executed correctly and honestly? How can we protect the intellectual property and ownership rights of ML developers and users?
One possible solution is to combine ML with zero-knowledge proofs (ZKPs) and blockchain. ZKPs are a cryptographic technique that allows one party to prove to another that a statement is true without revealing any information beyond the validity of the statement. Blockchain is a distributed ledger that records transactions in a secure and transparent way without relying on a central authority.
By combining ZKPs with blockchain, we can create verifiable machine learning: a paradigm that enables verification of the correctness and privacy of ML processes and outcomes. This directly addresses the questions above: sensitive data and model details can stay private, training and inference can be proven to have run correctly, and ownership of models can be established and protected.
An example project working on model verifiability is Giza Tech. Giza is a company that leverages blockchain technology and zero-knowledge cryptography to enable verifiable AI model inference on-chain. They allow anyone to deploy their AI models in a serverless and secure manner and to use them in smart contracts without revealing the model details or the input data. This way, users can benefit from the power and utility of AI models while also having the assurance that the models are running correctly and producing valid outputs.
Giza Tech’s solution is based on two key components: a transpiler and a verifier. The transpiler converts any AI model in the common ONNX format into a verifiable model that can be executed on-chain. The verifier uses zero-knowledge proofs to check that the model inference has been done correctly off-chain and to provide a proof of validity that can be verified by anyone on-chain. The verifier also ensures that the model and the input data are kept private and encrypted and that only the output and the proof are revealed. The network is designed to be scalable, interoperable, and compatible with various blockchain platforms and protocols.
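The overall flow might look like the following sketch. To be clear, the function names and bodies here are hypothetical placeholders, not Giza's actual API; they only show the shape of the transpile, prove, verify pipeline.

```python
# Hypothetical workflow sketch of verifiable inference (placeholder
# names and bodies; not Giza's actual API). The heavy computation and
# the private model/input stay off-chain; only the output and a
# succinct proof are published and checked on-chain.

def transpile(onnx_model_path: str):
    # Placeholder: a real transpiler emits a provable (ZK-circuit)
    # representation of the ONNX model.
    return {"source": onnx_model_path}

def prove_inference(provable_model, inputs):
    # Placeholder: a real prover runs inference and returns the output
    # together with a succinct validity proof.
    output = sum(inputs)          # stand-in for real model inference
    proof = b"zk-proof-bytes"     # stand-in for a real validity proof
    return output, proof

def verify_on_chain(output, proof) -> bool:
    # Placeholder: a real verifier checks the proof cryptographically,
    # cheaply enough to run inside a smart contract.
    return proof == b"zk-proof-bytes"

model = transpile("classifier.onnx")
output, proof = prove_inference(model, inputs=[1.0, 2.0, 3.0])
assert verify_on_chain(output, proof)
```

The design point is the asymmetry: proving is expensive and happens off-chain, while verification is cheap enough to run on-chain, so smart contracts can consume AI outputs without trusting the party that computed them.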
Modulus Labs is another company that aims to make machine learning (ML) models verifiable. They address this issue by using zero-knowledge proofs. Modulus Labs constructs and trains ML models utilizing their own data and code, converting them into zero-knowledge (ZK) circuits. These circuits are mathematical depictions of the model's logic and parameters.
They publish these ZK circuits on a public blockchain, accompanied by the model's metadata, such as its name, description, and input/output format. Modulus Labs also offers an API that allows dApp developers to access their ML models. They do this by sending queries and receiving responses via smart contracts.
They employ a specialized ZK prover, named Remainder, to create zero-knowledge proofs for each query. These proofs are also published on the blockchain. They confirm the model's output was correctly calculated from the input, based on the model's logic and parameters, without disclosing any information about the model itself.
Modulus Labs allows anyone to verify the ZK-proofs on the blockchain using a ZK verifier, which can be implemented in any smart contract language. The ZK verifier checks the validity of the ZK proof and returns true or false. This enables a new paradigm of decentralized and verifiable AI, where ML models can be verified and shared across a distributed network of nodes without compromising their security or performance.
Decentralization can address challenges in AI model building, including the need for large amounts of data and compute, privacy and security concerns, and model verification. Many of the problems we covered have yet to find sufficient solutions, which shows both the great potential of these approaches and how tricky the situation is in the first place. As a core part of the AI process, model building is a step that deserves attention in the decentralized AI landscape and in AI discussions in general.
In the next article, we will explore how retrieval-augmented generation (RAG) can complement model building, and the decentralized solutions that can further increase RAG’s potential impact.