Wikipedia may be the go-to resource on almost everything these days, but according to Meta, it's filled with dodgy, inaccurate citations.
But don't worry: the company says its AI is here to help. Meta has developed Sphere, a model that can automatically scan hundreds of thousands of citations at once to check whether they actually support the corresponding claims.
Meta claims it created a new dataset of 134 million public web pages as a knowledge source for the model, which it says is "an order of magnitude larger and significantly more intricate than ever used for this sort of research".
Rather than relying on traditional, proprietary search engines such as Google, Sphere draws on this open web data.
Meta says Sphere, which was built using CCNet, a filtered version of the Common Crawl web dataset, will also help other AI researchers working on knowledge retrieval projects.
Meta says the eventual goal of the project is to build a platform to help Wikipedia editors systematically spot citation issues and quickly fix the citation or correct the content of the corresponding article at scale.
The company is not partnering with the Wikimedia Foundation on the project, which is still in the research phase and is not being used to automatically update any content on Wikipedia.
The tool reportedly calls attention to questionable citations, allowing human editors to evaluate the cases most likely to be flawed without having to sift through thousands of properly cited statements.
If a citation seems irrelevant, Meta says its model will suggest a more applicable source, even pointing to the specific passage that supports the claim.
The project's source code is available on GitHub, and Meta has also published a full write-up of its findings alongside a public demo.