When you go to curate content from your training posts, you may find that you aren’t seeing the articles you think you should. Maybe you’ve seen some posts on a blog whose RSS feed is in your sources, yet they don’t show up on the Training page. Or you’ve set up some Google Alerts that don’t seem to bring anything back, even though a search of the keywords shows some results.
There is an easy way to see what is happening – using the Logs menu item for MyCurator. Every article that MyCurator sees from your RSS feeds shows up somewhere in these logs, either in the Activity or the Error section. It may be that your Topic search terms are too restrictive and you are excluding too many articles. The sections below will help you understand how to read and interpret the logs.
If the logs have very few or no articles in them, and the Error log shows no errors for the feed, what does that mean? The content of a feed is controlled by the site that hosts it. The host may be limiting the articles it puts in the feed, or it may have a technical problem. In either case, you would have to work with the host site to figure out the problem.
When you first get to the Logs page, you will see the Activity log. This log shows what happened to each article that was read from your feed sources. For each article it shows the date/time, the Topic, a Message and the URL of the article that was processed. To see the activity for just one Topic, choose that Topic from the drop down in the upper left of the page and then click Select Filter. Some of the messages you might find, and their meanings, are:
- New live/good/bad/not sure post – The article was classified as listed and posted, either to the Live site (New live post) or to the Training page (all other types).
- Post too short – this will include the length of the actual article in words for your review, and means the article was not posted because it was too short.
- No Search 1 word – this will include the word that was not found. The article was not posted.
- No Search 2 words – none of the words listed in the Search 2 keywords for this topic were found so the article was not posted.
- Found excluded word – this will include the word found. No article was posted.
- No Image found – the article was excluded because MyCurator couldn’t find an image or couldn’t extract an image.
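To make the messages above easier to interpret, here is a minimal Python sketch of a filter that would produce them. It is not MyCurator's code. The function name, the topic dictionary keys, the order of the checks, the whole-word matching and the exact message wording are all assumptions made for illustration; only the meaning of each message comes from the list above.

```python
import re

# Hypothetical topic settings; the key names are illustrative, not MyCurator's.
example_topic = {
    "min_length": 5,                    # assumed minimum article length in words
    "search_1": ["solar"],              # every word here must appear
    "search_2": ["panel", "battery"],   # at least one word here must appear
    "excluded": ["casino"],             # none of these may appear
    "require_image": True,              # assumed setting: skip articles with no image
}

def classify_article(text, has_image, topic):
    """Return the Activity log message this sketch would record for an article."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    word_set = set(words)

    # Post too short: the message includes the actual length in words.
    if len(words) < topic["min_length"]:
        return f"Post too short: {len(words)}"

    # No Search 1 word: the message names the word that was not found.
    for word in topic["search_1"]:
        if word.lower() not in word_set:
            return f"No Search 1 word: {word}"

    # No Search 2 words: none of the listed words were found.
    if topic["search_2"] and not any(w.lower() in word_set for w in topic["search_2"]):
        return "No Search 2 words"

    # Found excluded word: the message names the word that was found.
    for word in topic["excluded"]:
        if word.lower() in word_set:
            return f"Found excluded word: {word}"

    # No Image found: no image could be found or extracted.
    if topic["require_image"] and not has_image:
        return "No Image found"

    return "Passed all filters"

print(classify_article("Wind power is growing fast this year", True, example_topic))
```

Reading the log the same way helps with tuning a Topic: if most entries say a Search 1 word was missing, the Search 1 list is probably too restrictive, and if most say no Search 2 words were found, adding more alternatives to Search 2 may let more articles through.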
To see the Error log, choose Error from the drop down at the top left of the page (it usually says Activity) and then click Select Filter. The Error log shows the date/time, a Topic, a message and the URL of the feed or article that caused the error. You can filter errors for a specific Topic using the Topic drop down. Some of the error types and their meanings are below:
- Error Rendering Page – This is the most common error and means that MyCurator could not extract the text from the web page. It happens most often when the page is a PDF. The technology we use, DiffBot, is highly accurate but does have problems with some web pages and certain types of pages. Sometimes a web page cannot be rendered the first time but is captured correctly in subsequent processing. Because of that, MyCurator will keep trying to process the page until it drops out of the feed, so you may see the same page show up in the Error log many times. You can click on the URL to see the original web page (and try to guess why the software couldn’t ‘see’ the text!).
- No rss link for feed – The entry in the Links page doesn’t have a Feed URL entered. Using the URL entry in the log, you should update the Links entry for this feed.
- A feed could not be found, Invalid feed, or some other feed error message – The Feed URL could not be processed. Check the Feed URL in the Links table by trying to Browse it, and see whether it looks like a feed (a screen full of raw text rather than a formatted page). Maybe it’s the web page URL and not the Feed URL? Maybe it is actually a broken feed.
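If browsing the Feed URL leaves you unsure whether you are looking at a feed or an ordinary web page, a quick check of the page source can help. The Python sketch below is only an illustration and is not part of MyCurator. The function name `feed_kind` is made up for this example, and the check is deliberately shallow: it looks only at the outermost XML element, so a feed that is well formed but otherwise broken would still pass.

```python
import xml.etree.ElementTree as ET

def feed_kind(body):
    """Return 'rss', 'atom' or 'rdf' if the text looks like a feed, else None."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not well-formed XML, which is typical of an ordinary HTML web page.
        return None
    # Drop any XML namespace prefix, e.g. '{http://www.w3.org/2005/Atom}feed'.
    tag = root.tag.rsplit("}", 1)[-1].lower()
    # RSS 2.0 feeds start with <rss>, Atom feeds with <feed>, RSS 1.0 with <rdf:RDF>.
    return {"rss": "rss", "feed": "atom", "rdf": "rdf"}.get(tag)

# Paste the source of the page you get when you Browse the Feed URL.
sample = "<rss version='2.0'><channel><title>Example</title></channel></rss>"
print(feed_kind(sample))
```

A result of None suggests the Links entry holds the web page URL rather than the Feed URL, or that the feed itself is broken.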