Throughout history, people have constructed knowledge collaboratively as a public good, and the internet has made it accessible to everyone.
Thanks to the Internet, we are approaching an era where knowledge doubles every 12 hours. For comparison, around 1900 human knowledge doubled roughly every century; by the end of 1945, the pace had accelerated to every 25 years.
Wikipedia stands as a prime example of this collaborative development of knowledge. Now, in the era of AI, we are witnessing a paradigm shift.
Developers' adoption of AI models grows daily, with 250,000+ models now hosted on Hugging Face, fueling the demand for ever-evolving knowledge to be integrated with LLMs.
Despite the exponential growth of AI, there remains a significant gap in accessible, collaborative platforms for knowledge sharing and retrieval.
Training new LLMs or fine-tuning them with fresh knowledge is insufficient to keep pace with expanding human knowledge.
The landscape of information has continually evolved. In the digital age, the internet revolutionized the dissemination of knowledge, transforming paper documents into web pages. Google emerged as a dominant force, indexing websites and curating content, establishing a well-understood digital creation and discovery cycle.
Now, we stand on the brink of another transformative era with LLMs. LLMs are redefining the paradigms of information creation, access, and sharing. As this shift unfolds, traditional websites are giving way to Public RAG models, where AI interfaces directly with information.
In a pivotal moment in the annals of internet history, we find ourselves at a crossroads regarding the status of knowledge as a democratized public asset. We stand on the brink of an era where knowledge, once freely accessible and shared, risks becoming a centralized commodity.
This new reality threatens to transform knowledge into a private, costly resource accessible only to a privileged few.
It's a time when open science, collaboration, and unrestricted knowledge sharing are more vital than ever. These practices have long fueled human innovation, breaking down barriers and fostering groundbreaking discoveries.
The story of our future hinges on maintaining and enhancing access to global knowledge. It's not only a matter of principle but a foundational necessity for future generations. The ability for humans to share and retrieve the vast expanse of world knowledge freely must remain a fundamental right.
In this way, we can ensure that the democratization of knowledge, a principle so integral to our success, continues to guide us into a future marked by inclusivity, advancement, and open connectivity.
Cost of Knowledge:
Retrieval-Augmented Generation (RAG) incorporates new knowledge into LLMs more effectively than unsupervised fine-tuning (USFT). It has also been observed that RAG on its own outperforms the combination of RAG and fine-tuning.
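The RAG pattern described above can be sketched in a few lines: retrieve the most similar chunks for a query and prepend them to the prompt, rather than baking the knowledge into model weights. This is a minimal illustration using toy bag-of-words embeddings as a stand-in for a real embedding model; the corpus and function names are illustrative, not Dria's API.

```python
# Minimal RAG retrieval sketch: rank chunks by cosine similarity to the
# query, take the top_n, and assemble them into the prompt context.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: lowercase word counts (stand-in for a real model).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], top_n: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:top_n]

corpus = [
    "Arweave stores data permanently on a decentralized network.",
    "HNSW is a graph-based index supporting dynamic insertion.",
    "Wikipedia is a collaboratively edited encyclopedia.",
]
context = retrieve("how does permanent decentralized storage work", corpus)
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: ..."
```

The key property this illustrates is that fresh knowledge reaches the model through the context window at query time, so updating the corpus updates the answers without any retraining.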
Dria serves as a collective memory for AGI, where information is stored permanently in a format understandable to both humans and AI, ensuring universal accessibility.
Dria is a collective Knowledge Hub.
The Knowledge Hub consists of small-to-medium-sized public vector databases, each called a knowledge.
A knowledge can be created from a PDF file, a podcast, or a CSV file.
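Regardless of the source format, creating a knowledge reduces to the same pipeline: extract text, split it into chunks, and pair each chunk with an embedding vector. Below is a hypothetical sketch of that pipeline for the CSV case; the `Chunk` type, `toy_embed` placeholder, and function names are illustrative assumptions, not Dria's actual client API.

```python
# Hypothetical sketch: turn a CSV source into knowledge chunks.
# Each row becomes one text chunk paired with a vector.
import csv
import io
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    vector: list[float]  # would come from a real embedding model

def toy_embed(text: str) -> list[float]:
    # Placeholder for an embedding model call: [char count, space count].
    return [float(len(text)), float(text.count(" "))]

def csv_to_chunks(raw: str) -> list[Chunk]:
    rows = csv.reader(io.StringIO(raw))
    return [Chunk(", ".join(row), toy_embed(", ".join(row))) for row in rows]

knowledge = csv_to_chunks("title,year\nDria,2023\n")
```

A PDF or podcast source would differ only in the extraction step (text extraction or transcription) before the same chunk-and-embed stage.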
Vector databases are multi-region and serverless.
Dria is fully decentralized, and every index is available as a smart contract, allowing permissionless access to knowledge without relying on Dria's services.
Dria provides:
Knowledge uploaded to Dria is public and permanent.
Dria supports two index types: HNSW, which is dynamic (new vectors can be inserted after the index is built), and ANNOY, which is static (the index is built once from a fixed dataset).
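The static-versus-dynamic distinction can be made concrete with two toy index classes. Brute-force nearest-neighbor search stands in for the real graph (HNSW) and tree (ANNOY) structures here; the class names and API are illustrative only.

```python
# Illustrative contrast: an ANNOY-style index is frozen at build time,
# while an HNSW-style index accepts insertions at any point.

def _nearest(vectors, q):
    # Brute-force squared-distance search (stand-in for graph/tree search).
    return min(range(len(vectors)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vectors[i], q)))

class StaticIndex:
    """ANNOY-style: vectors are fixed once the index is built."""
    def __init__(self, vectors):
        self._vectors = tuple(vectors)  # frozen; no insertion method

    def search(self, q):
        return _nearest(self._vectors, q)

class DynamicIndex:
    """HNSW-style: new vectors can be inserted after construction."""
    def __init__(self):
        self._vectors = []

    def add(self, v):
        self._vectors.append(v)

    def search(self, q):
        return _nearest(self._vectors, q)

idx = DynamicIndex()
idx.add((0.0, 0.0))
idx.add((5.0, 5.0))
nearest = idx.search((4.0, 4.0))  # index of the closer vector, (5.0, 5.0)

static = StaticIndex([(0.0, 0.0), (5.0, 5.0)])
```

In practice this is why a static index suits archival knowledge uploaded once, while a dynamic index suits knowledge that parties keep appending to.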
Vector databases are multi-region; requests are geo-steered. Regions currently include:
us-east-1
us-west-1
eu-central-1
ap-southeast-1
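Geo-steering in this context means routing each request to the replica closest to the client. A minimal sketch of that routing decision, assuming latency measurements are available per region (the figures below are made up for illustration):

```python
# Sketch of geo-steering: send the request to the region with the
# lowest observed latency. Latency values are illustrative only.
def pick_region(latency_ms: dict[str, float]) -> str:
    return min(latency_ms, key=latency_ms.get)

observed = {
    "us-east-1": 80.0,
    "us-west-1": 140.0,
    "eu-central-1": 25.0,   # this example assumes a client in Europe
    "ap-southeast-1": 210.0,
}
region = pick_region(observed)
```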
Dria has no cold start for retrieval.
The benchmark uses deep-image-angular-96 with top_n=10, with requests sent from within the region. QPS won't throttle, but performance may degrade beyond a certain threshold.
1-) Cost of Knowledge at Its Lowest:
Dria modernizes AI interfacing by indexing and delivering the world's knowledge via LLMs.
Dria’s Public RAG Models democratize knowledge access with cost-effective, shared RAG models.
Today, Dria efficiently serves Wikipedia's entire 23 GB database and its 56 billion annual requests at just $258.391, a scale unattainable by other vector databases.
Dria operates as a Decentralized Knowledge Hub serving multiple regions, offering natural language access and API integration.
Dria supports multiple advanced indexing algorithms and embedding models, offering the flexibility to switch seamlessly between algorithms or embedding models on the same data while ensuring consistently state-of-the-art retrieval quality.
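One way such flexibility can work is to keep the raw text of each chunk alongside its vector, so the same knowledge can be re-embedded and re-indexed under a different model without re-uploading the data. The embedders below are toy stand-ins and the `reindex` helper is a hypothetical illustration, not Dria's API.

```python
# Sketch: re-embed the same stored chunks with a different model.
def reindex(chunks: list[str], embedder) -> list[tuple[str, list[float]]]:
    return [(text, embedder(text)) for text in chunks]

embed_a = lambda t: [float(len(t))]          # stand-in for embedding model A
embed_b = lambda t: [float(len(t.split()))]  # stand-in for embedding model B

chunks = ["Dria is a knowledge hub", "Indexes live on-chain"]
index_a = reindex(chunks, embed_a)
index_b = reindex(chunks, embed_b)  # same data, different embedding space
```

The design point is that the source data, not any one vector representation, is the durable artifact; vectors are derived and can always be regenerated.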
2-) Contributing is easy and incentivized for everyone:
Dria’s Drag & Drop Public RAG Model effortlessly transforms knowledge into a retrievable format with an intuitive drag-and-drop upload feature.
As a permissionless and decentralized protocol, Dria creates an environment where knowledge uploaders can earn rewards for the value of their verifiable work:
Users worldwide can contribute valuable knowledge with permissionless access to shared RAG knowledge for LLMs, applications, and open-source developers.
If other participants query the knowledge and it produces valuable insights in AI applications, the uploaders earn rewards for their verifiable contributions.
Users can then use these rewards to upload or query more knowledge into the collective memory.
3-) Safe, Trust Minimized, and Open Collaboration:
Anyone can run RAG models locally through smart contracts, enabling permissionless access to world knowledge.
Dria stores all of the world's knowledge on a public ledger called Arweave, a decentralized storage network designed for the permanent storage of data.
Arweave's main value proposition is that it allows users to store data permanently. Once something is uploaded to the Arweave network, it's intended to be stored forever without the risk of data loss or degradation.
Full persistence ensures accountability against manipulative AI technologies that proliferate during elections, wars, and existential crises. Communities can reevaluate AI knowledge, establishing transparent mechanisms for safeguarding against manipulation.
Another use case centers on deepfakes: by uploading detected deepfakes to Dria, users and applications worldwide can query suspicious videos to ascertain their authenticity.
Search through all public knowledge with a single query. The Librarian is a model of models: a macro index that propagates a query to related knowledge and performs retrieval across them.
Searching through multiple knowledge with a single query enables more flexible retrieval.
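The macro-index idea above can be sketched as a two-stage process: first route the query to the relevant knowledge, then fan it out and merge the scored results. Every name here (the catalog, the tag-based router, the `search_fn` stub) is an illustrative assumption about how such a Librarian could work, not Dria's implementation.

```python
# Sketch of a "model of models": route, fan out, merge.

def route(query: str, catalog: dict[str, set[str]]) -> list[str]:
    # Pick knowledge whose topic tags overlap with the query terms.
    terms = set(query.lower().split())
    return [name for name, tags in catalog.items() if tags & terms]

def fan_out(query, names, search_fn, top_n=3):
    # Query each selected knowledge and merge the hits by score.
    hits = []
    for name in names:
        hits.extend((score, name, text) for score, text in search_fn(name, query))
    hits.sort(reverse=True)
    return hits[:top_n]

catalog = {
    "storage-kb": {"arweave", "storage"},
    "ml-kb": {"hnsw", "index"},
}

def search_fn(name, query):
    # Stub standing in for a per-knowledge retrieval call.
    return [(0.9, f"result from {name}")]

names = route("permanent storage on arweave", catalog)
results = fan_out("permanent storage on arweave", names, search_fn)
```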
It’s essential for Dria to verify that:
Python and JS clients for:
Allowing multiple parties to contribute to a single knowledge without compromising control over the content. This feature will enable a dynamic and collaborative way to enrich the public vector databases. Insertion won't depend on Dria services but will be executed fully at the contract level.
Contributing this article to the Collective AI Memory:
We are contributing this article to Dria, where it will be accessible to everyone, can be run locally, and will live forever.