Wikipedia:Wikipedia Signpost/2016-02-10/In focus

{{Wikipedia:Wikipedia Signpost/Templates/RSS description|1=An in-depth look at the newly revealed documents: Three internal communications from the Wikimedia Foundation that shed light on the history of the Knowledge Engine project.}}{{Wikipedia:Signpost/Template:Signpost-header|||}}

{{Wikipedia:Signpost/Template:Signpost-article-start|{{{1|An in-depth look at the newly revealed documents}}}|By Andreas Kolbe| 10 February 2016}}

{{signpost inline image|image=File:Discovery Example Breakdown.png|caption=A slide from an already-public November 2015 presentation on Discovery by the Wikimedia Foundation}}

This week's "special report" discusses three internal documents from the Wikimedia Foundation that shed light on the history of the Knowledge Engine project. Here, we examine each one in depth.

="April 2 – FINAL – Knight Search Presentation – 04.02.15"=

{{Signpost series|type=sidebar|tag=knowledgeengine|seriestitle=Knowledge Engine|break_date=}}This is a short, 12-slide presentation arguing that commercial search engines "decide and determine" "how people find information" and "what they find", adding that they "highlight paid results, track users (sic) Internet habits, sell information to marketing firms" and are "biased towards profit over communities".

Wikipedia, on the other hand, is characterised as follows:

{{Wikipedia:Wikipedia Signpost/Templates/Quote|Wikipedia's Roots

Protects consumer/user profit
Community-built
De-biased by design
Local in 287 languages

No other search engines carry these ideals

Wikipedia Search Originates

Private and secure
Transparent results rankings
Locally relevant information
Global representation in all languages and cultures

No other search engines carry these ideals

Wikipedia Search is ...

Trusted. Private. Open.

Wikipedia Search

Globally democratizes knowledge.

}}

The presentation concludes with screen mock-ups of what a Wikipedia search engine could look like, highlighting content from Wikivoyage, Openmaps, Fox News, Wikipedia and Wikidata.

="June 24 Attachment 1 of 2 – Knowledge Engine by Wikipedia"=

Marked "CONFIDENTIAL – DRAFT", this 11-page document addressed to the Knight Foundation has the headline "Knowledge Engine by Wikipedia: A Proposal from the Wikimedia Foundation".

After briefly describing the history and achievements of the Wikipedia project, the document states:

{{Wikipedia:Wikipedia Signpost/Templates/Quote|The Wikimedia Foundation is embarking on a new global project that will once again change the way people access knowledge on the Internet. Knowledge Engine By Wikipedia is a federated knowledge engine that will give users the most reliable and most trustworthy public information channel on the web, applying fundamentals of transparent Wikibased systems to surfacing the most relevant and important information. Knowledge Engine By Wikipedia will democratize the discovery of media, news and information – it will make the Internet’s most relevant information more accessible and openly curated, and it will create an open data engine that’s completely free of commercial interests. Our new site will be the Internet’s first transparent search engine, and the first one that carries the reputation of Wikipedia and the Wikimedia Foundation.

The Problem

The emergence of the Internet had promised massive democratization of content delivery. On the creation side, that promise has been largely fulfilled. Any person can easily add content to the enormous internet system.

Simultaneously, as the availability of this information exploded, a few proprietary technologies began to consolidate channels of access to this data. This is accomplished through consolidation of access points into giant enterprises that today control user interfaces through device access, search, and media networks. The mechanisms by which the information on the internet is collected and displayed is largely obscured by proprietary algorithms.

An exception to this pattern is Wikipedia. As a nonprofit, ad-free and collaboratively built site it has no incentives leveled upon the commercial systems. It is fully transparent in what information takes precedence, and how it is produced. It does not use personal data to market or sell to users or to optimize for ad revenue, and it prioritizes personal information security to avoid undue bias or censorship. In other words, it is aligned with user needs for transparency, clarity and trust.

The Solution

Knowledge Engine By Wikipedia will differ from commercial search engines in key areas:

Public curation mechanisms for quality
Transparency
Open data access to metadata
Protected user privacy
No advertisement
Internalization

Knowledge Engine By Wikipedia will surface important noncommercial results that are:

High quality, highlighting web pages that have depth and factual currency.
Credible, with knowledge sources that have earned readers’ confidence.
Trustworthy, with pages that are elevated for accuracy and curated publicly.
Transparent, giving users an open and truthful assessment of what they’re reading.
Publicly curated, with users helping designate the most reliable pages.
Open source, so anyone can use the results (and our software) without restriction.
Secure, so users know we won’t mine their searches and sell that information for profit.
Unbiased by commercial concerns. That’s the Wikipedia way.

How Is It Different?

The goal of today’s commercial engine is to give the user what they (or the interested party) think they want to know – the fact and data about a query: a medicine sold by a drug company, a movie ticket, or a most popular result.

The knowledge engine of tomorrow will guide the user to discover what they need to know that is only available with a crowd-based knowledge engine: a new or alternative medicine producing better results at a lower price point, a book summary and source language and versions of the movies based on it, the most relevant result to the user’s area of exploration.

Current engines rely on indexing and interlinking as the primary method for identifying and highlighting relevant results. In a world where data proliferation is rapid and unabiding, Wikipedia has a few advantages:

Federating all open data via a structured index (Wikidata) into distributed data sources (both on and outside current Wiki projects) allowing for ease of translation, formatting a quality ranking.
Open curation via vast, international community of editors.
A global network of partners contributing information to the engine (galleries, archives, institutions, governments, etc.)
User-centric privacy mechanisms and interests that allow users to easily contribute knowledge and donate their own information.
Open data and metadata access for any party to develop interfaces and research based on the knowledge data.

Our Knowledge Engine Will Be:

Performance Based

We are building a knowledge engine that has speed, open data, and relevance at its core. A new entry point to the sum of all knowledge, Knowledge Engine By Wikipedia has the responsiveness of commercial search engines and the ethos of Wikipedia and the Wikimedia Foundation.

An Efficient Experience

Quality is more important than quantity. The user doesn’t always need 10 or 20 or 200 results – they need the right set or even one result that provides a sufficient amount of knowledge with the contextual discovery to dig deeper. Still, in most searches, our knowledge engine will uncover a multitude of quality results, which should encourage a “down the rabbit-hole” discovery experience. The engine’s speed will bring consistency across the user interface, configuration options that adapt to users’ preferences, and an ease of experience that lets the user concentrate on the discovery task rather than the interface. Speed is crucial for global enablement but also for getting things done. Quickness and quality will be hallmarks of Knowledge Engine By Wikipedia.

Openly Curated

We are building a unique engine that sets us apart from commercial engines. Our knowledge engine leverages open data sources and champions an open understanding of where and how the results are calculated and curated. We have the unique opportunity to merge open knowledge graphs and data sources in a federated landscape. By combining human and machine curation, we are forming a holistic, usercentered model to drive our knowledge engine.

A Multifaceted Tool

Knowledge Engine By Wikipedia is much more than a search input – it’s like a collection of powerful apps and portals rolled into a singular interface and input. We’re creating a tool where questions like “show me the progress of an event” display contextual maps and timelines, and where a query reveals multiple types of media and data displayed with charts and visualizations – all in a way that illustrates quicker and more completely than text alone. With Knowledge Engine By Wikipedia, the user instantly gets the context of a query in a larger perspective.

From an Open Community

We’re focused on creating resources and tools for an open knowledge-engine community, and building on the input of an advisory team. We will strengthen the Application Programming Interface and the resources around the knowledge engine to enable us and others to build, contribute to, and extend the engine. “Openness” – through curation, sourcing, and community – means everyone can contribute to Knowledge Engine By Wikipedia, and everyone can use the results and software without restrictions. It's what the Internet was meant to be and it’s what Wikipedia is, and what our knowledge engine will be, too.}}

This is followed by a set of screen mock-ups labeled "Trending", "Multimedia Content", "Smarter Answers" and "Nearby" and an outline of the four stages of the plan:

{{Wikipedia:Wikipedia Signpost/Templates/Quote|The Plan in Four Stages

We anticipate each stage will take 16–18 months to develop and transition into the overlapping stages. The Discovery stage has already begun, and each stage has the potential to overlap with other stages.

Discovery: Instrument user flows, performance and API usage of existing engine. User labs and testing of concepts. Prototype engine concepts and stabilization of api using multiple internal assets.
Advisory: Advocacy and review of engine. Open and anonymous knowledge resources added to engine. Promote embed of engine in additional platforms.
Community: Establish an open source project group for discussion and advisory, and dedicated development portal. Expand usage of api and engine to wider community adoption. Establish curation process.
Extension: Strengthen API support and standards. Integration of external sources into core search. Expand curation efforts. Expansion of features and widgets to promote engine.}}

There follows a timeline graphic and a more detailed description of these four stages, each comprising an introductory paragraph followed by an average of half a dozen bullet points. The document concludes with the table of costs reproduced on [https://wikimediafoundation.org/w/index.php?title=File%3AKnowledge_engine_grant_agreement.pdf&page=9 page 9] of the Knowledge Engine grant agreement, appended to which is the following:

{{Wikipedia:Wikipedia Signpost/Templates/Quote|If we see significant progress on the project during the first six months of the fiscal year (July–December 2015), we may petition the Wikimedia Foundation Board of Trustees for permission to seek and spend additional resources in support of the project.

Future Fiscal Years

We anticipate future years’ budgets to increase by 20% per year as we accelerate the growth of the program.

Projected future budgets

FY 16–17: $2,900,000

FY 17–18: $3,500,000

Request of the Knight Foundation

To support the project, we respectfully request $2 million per year for three fiscal years, which would make the Knight Foundation Knowledge Engine By Wikipedia's primary initial sponsor. The remaining initial support will come from the Wikimedia Foundation's general fund or from additional restricted grants. To identify other foundations that would support Knowledge Engine By Wikipedia, we welcome your suggestions and assistance. Thank you.}}

="August 2015 – WMF Submission to Knight"=

The formal grant application, requesting a much-reduced $250,000 from the Knight Foundation, summarizes the proposal as follows:

{{Wikipedia:Wikipedia Signpost/Templates/Quote|Knowledge Engine By Wikipedia is a federated knowledge engine that will give users the most reliable and most trustworthy public information channel on the web, applying fundamentals of transparent Wiki-based systems to surfacing the most relevant and important information.

The funds requested are in support of Stage One of this project.}}

The remainder of this document is largely reproduced on the latter pages of the grant agreement itself.

{{Wikipedia:Signpost/Template:Signpost-article-comments-end||2016-02-03|2016-12-22}}