How Microsoft Start Ranks Content
Microsoft Start publishes news stories, photo galleries, and videos from thousands of publishers globally and promotes this content across Microsoft products, including Microsoft Edge, Microsoft Windows, MSN.com, and the Microsoft Start mobile app.
Each time a consumer views the Microsoft Start feed, it refreshes with the latest personalized content. Algorithms select and order content in the feed based on various signals, with editorial oversight. This content ranking system is designed to engage and inform, choosing the stories most relevant to each person while also ensuring that the content is timely, newsworthy, high-quality, and safe for work and home.
The relative importance of these parameters may vary each time a news feed is viewed by a user. The algorithms are always evolving as we continually identify and improve signals and experiment with new features.
Microsoft delivers every consumer a personalized news feed to meet each person’s unique set of interests and preferences for content. At the core of this personalization are algorithms that match user preferences with document understanding. These algorithms are designed to select the most relevant content for each user.
A user’s preferences are learned over time by the system through two approaches:
- Explicit personalization. The algorithm respects how users manually configure their settings, including actions like following certain topics, liking or disliking specific content, or indicating a publisher preference.
- Implicit personalization. In compliance with a user’s privacy settings, as a person reads content and engages with Microsoft’s products, the stories are analyzed for patterns to better understand the user’s preferences. The algorithms look for both long-term and short-term patterns for each user, acknowledging that content interests may vary in the short term while exhibiting different long-term tendencies. (Read more about Microsoft Privacy here.)
Machine-learning algorithms drive deep document understanding beyond simply recognizing ‘topics’: The system performs analysis on each document to get insights based on text and metadata and converts the content into a mathematical model.
The two mathematical models – user preferences and document understanding – can be compared to select content that is the closest match for each person.
Besides directly matching content with each user, the algorithms also search for content that engages users with similar preferences.
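The comparison of the two mathematical models can be illustrated with a small sketch. It assumes, purely for illustration, that both the user preferences and each document are represented as numeric vectors over the same dimensions and that "closest match" means highest cosine similarity. The source does not state which representation or distance measure is actually used, and the function names, document IDs, and vector values below are hypothetical. The matching against users with similar preferences is not covered here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector carries no preference information
    return dot / (norm_a * norm_b)


def rank_documents(user_vector, documents):
    """Return document IDs ordered from closest to farthest match.

    `documents` maps a (hypothetical) document ID to its vector.
    """
    return sorted(
        documents,
        key=lambda doc_id: cosine_similarity(user_vector, documents[doc_id]),
        reverse=True,
    )


# Hypothetical dimensions: (sports, finance, travel)
user_vector = [0.9, 0.1, 0.0]
documents = {
    "market-update": [0.0, 1.0, 0.0],
    "match-report": [1.0, 0.0, 0.0],
    "beach-guide": [0.0, 0.0, 1.0],
}

ranking = rank_documents(user_vector, documents)
print(ranking)  # ['match-report', 'market-update', 'beach-guide']
```

Cosine similarity is used in this sketch because it compares the direction of two vectors rather than their magnitude, so a heavy reader and a light reader with the same mix of interests would be matched to the same documents.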
We want to promote quality content that is free of visual defects, which result in a poor user experience. To avoid limited exposure for your content, make sure you follow the publishing guidelines, including the following:
Long Blocks of Unformatted Text
Content that is high quality and free of defects such as unformatted text has a higher probability of getting visibility in the feed. Make sure your content has proper sentence and paragraph structure and markup when you upload it. Articles without any line breaks or paragraph breaks will not be promoted in the feed.
Example of formatted text: <p>This is a paragraph.</p> (Note the opening and closing tags.)
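For publishers whose source text is plain text, a conversion step along these lines could add the paragraph tags before upload. This is only a sketch: it assumes paragraphs are separated by blank lines in the source text, and the function name is hypothetical. It is not part of any Microsoft tooling.

```python
import html
import re


def to_paragraph_markup(text: str) -> str:
    """Wrap blank-line-separated blocks of plain text in <p> tags."""
    # Assumption: one or more blank lines mark a paragraph break.
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Collapse single line breaks and repeated spaces inside a paragraph.
        collapsed = " ".join(block.split())
        if collapsed:
            # Escape &, < and > so the body text cannot break the markup.
            paragraphs.append(f"<p>{html.escape(collapsed, quote=False)}</p>")
    return "\n".join(paragraphs)


print(to_paragraph_markup("First paragraph.\n\nSecond\nparagraph."))
# <p>First paragraph.</p>
# <p>Second paragraph.</p>
```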
Original Article Links
Links back to the original article can only appear at the bottom of the article. Posts that link back to the original within the main body of the text will have limited reach.
If your content contains links, make sure they are formatted appropriately. For instance, if links in your content look like this: https://www.conotoso.com/r/linden-new-york then your content is not formatted correctly and may have limited exposure or be removed.
Valid Date Format
To ensure our content is relevant to our consumers, we must make sure the articles we show them are accurately dated. Dates must be expressed using RFC 3339 or RFC 822 date formats.
Valid Date Format Examples
Wed, 04 Oct 2017 15:00:00 +0200
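As a rough illustration, the Python standard library can produce both formats and run a basic check on a date string before publishing. The helper name `is_valid_feed_date` is hypothetical, and the check is an approximation: the standard-library parsers are somewhat more lenient than the RFCs themselves, and the feed's own validation may differ.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# The example date above, as a timezone-aware datetime (UTC+02:00).
published = datetime(2017, 10, 4, 15, 0, 0, tzinfo=timezone(timedelta(hours=2)))

rfc822_text = format_datetime(published)  # 'Wed, 04 Oct 2017 15:00:00 +0200'
rfc3339_text = published.isoformat()      # '2017-10-04T15:00:00+02:00'


def is_valid_feed_date(text: str) -> bool:
    """Approximate check: does `text` parse as an RFC 3339 or RFC 822 date?"""
    # RFC 3339 attempt: ISO 8601 date-time with a mandatory UTC offset.
    candidate = text[:-1] + "+00:00" if text.endswith("Z") else text
    try:
        if datetime.fromisoformat(candidate).tzinfo is not None:
            return True
    except ValueError:
        pass
    # RFC 822 attempt: older Python versions raise TypeError, newer ValueError.
    try:
        parsedate_to_datetime(text)
        return True
    except (TypeError, ValueError):
        return False
```

For example, `is_valid_feed_date("2017-10-04T13:00:00Z")` returns `True`, while `is_valid_feed_date("04/10/2017")` returns `False`.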
Headlines and Images
Click-through rate (CTR) — the number of clicks divided by the number of impressions — is one measure of engagement used in determining content ranking. CTR is primarily influenced by the elements of content that are shown when promoting the link, including the title/headline, image, and abstract. Machine learning judges the CTR potential for each piece of content.
Content with high CTR is generally good, although some content has high CTR but also generates dissatisfaction from readers: clickbait. See the section on Negative Signals and Clickbait below.
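The CTR definition above is a simple ratio, shown here as a short sketch. The function name and the handling of zero impressions are assumptions made for illustration. This computes an observed CTR only and does not model how machine learning predicts CTR potential.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Observed CTR: number of clicks divided by number of impressions."""
    if clicks < 0 or impressions < 0:
        raise ValueError("clicks and impressions must be non-negative")
    if impressions == 0:
        return 0.0  # assumption: content never shown is treated as CTR 0
    return clicks / impressions


# 50 clicks on 1,000 impressions is a CTR of 5%.
print(click_through_rate(50, 1000))  # 0.05
```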
Freshness and Timeliness
Content in a news feed is expected to be “fresh” and timely. As a result, newer content ranks higher than older content, on average. The latest stories matter most in verticals such as news, finance, and sports, where stories age quickly. The algorithms recognize that other topics tend to be more evergreen and allow that content to be older. Content with inaccurate published dates may be ranked lower.
Trends and Newsworthiness
On average, stories about trending topics, breaking news, and headline news are ranked more highly. The top positions in the feed are often reserved for the leading news stories of the day.
Trends are observed by monitoring multiple external data sources, both public and proprietary. The system monitors what is trending on the internet at large, as well as what is trending on Microsoft canvases and Bing search. These signals are combined and averaged to rate each content item on its potential to be trending; items with more potential are ranked higher in the feed.
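The "combined and averaged" step can be pictured with a minimal sketch. It assumes each source yields a trend score between 0 and 1 for an item and that the combination is a plain unweighted mean. The source does not specify the scale, the weighting, or the signal names, so all of these (and the item IDs) are hypothetical.

```python
from statistics import mean


def trend_score(signals):
    """Average the per-source trend signals for one content item."""
    return mean(signals.values()) if signals else 0.0


def rank_by_trend(items):
    """Order item IDs from highest to lowest averaged trend score."""
    return sorted(items, key=lambda item_id: trend_score(items[item_id]), reverse=True)


# Hypothetical signal names and values, for illustration only.
items = {
    "garden-tips": {"web": 0.2, "microsoft_surfaces": 0.1, "bing_search": 0.3},
    "election-results": {"web": 0.9, "microsoft_surfaces": 0.8, "bing_search": 0.7},
}

trend_ranking = rank_by_trend(items)
print(trend_ranking)  # ['election-results', 'garden-tips']
```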
Stories from well-known national or global news publishers carry more weight, because consumers and publishers alike view them as more authoritative and trusted. However, stories from local or less well-known brands are also important components of personalized feeds and are often ranked highly due to other signals.
The algorithms do not yet consider authority by topic: for example, some publishers are more authoritative in the area of sports, while others specialize in politics. This is an area Microsoft expects to improve in future ranking updates.
Negative Signals and Clickbait
Some content generates clicks but also generates dissatisfaction from users who perceive a headline to be misleading (not delivering the content the headline promised) or the story to be of lower quality. Examples include headlines that are misleading, exaggerate the story, or are overly shocking or emotional. Popularly known as clickbait, this content may be ranked lower based on user behavior that suggests dissatisfaction, such as a high bounce rate.
Specific patterns include:
· A headline that goes beyond a teaser, especially with the overuse of the demonstrative “this” (e.g., Never Drink This on a Plane).
· A headline that misrepresents the actual story content and/or import, thereby failing to meet reader expectations. A broken promise may range from omitting an asset (such as a video referenced in the headline) to not addressing the key information at all.
As part of our continuing effort to improve product quality, we recently raised the bar for showing content that could cause discomfort to our audience. Your content may be affected if the title, body, or image is disgusting or titillating. This includes topics like:
· Bodily functions (e.g., flatulence, excrement, urination, pimple popping)
· Sexually transmitted diseases
· Lewd encounters (e.g., meetings with sex workers, public sex, public nudity) where there is no broader societal relevance such as a political scandal.
· Stories about crimes that include excessive detail (for instance, the specifics of a sexual act or a gruesome murder) beyond the bare facts.
· Titillating stories (e.g., sexual behaviors, adult sexual advice) which can be inappropriate to serve to all audiences.
· Sexual deviance and bestiality
· Animals having sex
· Decomposition (e.g., meat filled with maggots)
Depending on severity, such content may have limited exposure or be removed, leading to an article-level impression decrease.
Celebrity Gossip: We are now limiting exposure of celebrity content focusing on (but not limited to) celebrity fights, sexualizing what celebrities are wearing, wardrobe malfunctions, day-to-day celebrity activities, relationship issues, etc. This content will still appear on Microsoft Start pages, but it will only be shown to users who seek this type of content.
While engaging in nature, this type of content could have sensational titles that invite the user to click and read about a relatively minor event.