Documentation: Article Recommendation Files
Published on March 1, 2020 by Wojciech Gryc
If Chimera Information Systems (CIS) is generating article recommendations for you, you're likely getting a file with bullet points, article titles, and a few other things. We've designed this format to be easy to use and to maximize our ability to iterate on the underlying models. This document explains the format and how you can use it.
When receiving an HTML file from us, each article will look like the one below.
Here is how to interpret each part of the document:
- Article Title: this is self-explanatory. Clicking the link will take you to a stochasticfutures.com URL. We use this server to redirect articles in case URLs change. We don't track any personal information if you are clicking from an HTML file we've sent you.
- Article Source: this is shown as well, in brackets. Note that this is the source of the news we have found, but in cases where we are crawling other content aggregators or republishers of content, we attribute the article to where it was found, not its original source. Thus, if we are recommending an article found on your local news site, we'll attribute the article to that site, even if it was republishing an AP or Reuters article.
- Published Date: this is when the article was published, based on the information given to us by the content source. In some cases, the date reported by a content source differs from when the article was actually published, or there is a delay between when something is published and when we crawl it (either due to our infrastructure, or due to the way the content producer decides to publish things). We crawl every content source multiple times per day, but if this causes issues for you, let us know and we can use other date rules (e.g., when we found the article, when we crawled it, or other approaches). We strive to use UTC for our timestamps.
- Article ID: the article ID is a universal identifier for each article. We use this to make it easy to reference articles. When giving us feedback about a specific article, please use this ID.
- Bullet Points: then come the actual bullet points! These are selected and extracted by the algorithm based on your specific needs. Depending on your goals, we can customize the model and reorder points. By default, we show sentences in order of appearance in the article, but limit the content to three bullet points to keep each piece short.
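If you want to work with these fields programmatically, the parts listed above can be modelled roughly as follows. This is only a sketch: the class name `ArticleRecommendation`, its field names, and the helpers `to_utc` and `default_bullets` are hypothetical and are not part of the file format or of any CIS API. It also assumes that the three-bullet limit keeps the first three selected sentences by position in the article; the actual selection is done by the model and may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

MAX_BULLETS = 3  # default cap on bullet points per article


@dataclass
class ArticleRecommendation:
    """Hypothetical in-memory model of one article entry."""
    article_id: str      # identifier to quote when giving feedback
    title: str
    source: str          # site where the article was found, not necessarily the original publisher
    published: datetime  # timestamp reported by the content source, normalized to UTC
    bullets: List[str] = field(default_factory=list)


def to_utc(dt: datetime) -> datetime:
    """Normalize a timestamp to UTC (assumption: naive values are already UTC)."""
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def default_bullets(selected: List[Tuple[int, str]],
                    limit: int = MAX_BULLETS) -> List[str]:
    """Order (position, sentence) pairs by appearance and keep at most `limit`."""
    ordered = sorted(selected, key=lambda pair: pair[0])
    return [sentence for _, sentence in ordered[:limit]]


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    picked = [(7, "Third point."), (2, "First point."), (5, "Second point.")]
    local_time = datetime(2020, 3, 1, 12, 0, tzinfo=timezone(timedelta(hours=-5)))
    article = ArticleRecommendation(
        article_id="example-id",
        title="Example title",
        source="example-news-site",
        published=to_utc(local_time),
        bullets=default_bullets(picked),
    )
    print(article)
```

Keeping the article ID as a separate field makes it easy to quote when sending feedback, and storing the date in UTC avoids ambiguity if a different date rule is applied later.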
Our broader modelling technology is also documented. If you have additional questions, please let your account manager know.