Overview
Mainstay's generative AI features can provide institution-specific responses based on content we scrape from your websites and documents. This tool lets you indicate which pages we should pull information from, and lets you test the generative AI by asking it questions.
How You Can Use Scraped Content
You can use scraped content during AI-Assisted Live Chat or when creating or updating Understandings by using the Firefly / AI button (Knowledge Base Responses). However, scraping alone does not automatically generate responses, update existing Understandings, create new Understandings, or respond directly to Contacts. The bot will only use scraped information if it has been manually reviewed and added to an Understanding.
If you want the bot to generate responses based on scraped content automatically, you'll need to enable Flash Responses. This feature allows the bot to pull from your scraped knowledge sources without requiring manual approval. Flash Responses is an experimental feature in beta testing; to activate it for your institution, reach out to your Partner Success Manager.
Ask the AI
After selecting and scraping knowledge sources (see below), you can test your coverage by asking the AI a question. This tool uses the same AI settings/prompt as KB Response Generation and AI-Assisted Live Chat.
When a suggested response was crafted using your scraped Knowledge Sources, those sources will be indicated for reference.
Knowledge Sources
Adding Sources
To add a new knowledge source, click + New Source.
There are four types available:
- Specific Webpages: one or more URLs. Mainstay will scrape the content from these individual pages.
- Domain/Site Section: a full URL. Mainstay will scrape the content from this page and any other pages on the site whose URLs begin with this URL (see the sketch after this list).
  - For example, if you input "https://example.com/a", we will scrape that page, as well as "https://example.com/a/b" and "https://example.com/a/b/c".
  - However, we would not scrape "https://example.com/x", "https://something.example.com/a", or even "https://example.com" by itself.
- PDF: a URL to a hosted PDF document. At this time, Mainstay does not allow you to upload a document directly, but you can reference a PDF that is available online, such as one on your domain or one you upload and make public on Google Drive or a similar service.
- Google Document: a URL to a Google Doc that is publicly accessible, or shared with an @mainstay.com email address.
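To make the Domain/Site Section rule concrete, here is a minimal sketch of the prefix check described above. It is illustrative only, not Mainstay's actual implementation; details such as trailing slashes or query strings may be handled differently.

```python
from urllib.parse import urlparse

def in_site_section(section_url: str, candidate_url: str) -> bool:
    """Illustrative sketch of the "Domain/Site Section" rule.

    A candidate page is in scope when it is on the exact same host and its
    path starts with the section's path. Not Mainstay's actual code.
    """
    section = urlparse(section_url)
    candidate = urlparse(candidate_url)
    if candidate.netloc != section.netloc:
        return False  # excludes subdomains like something.example.com
    return candidate.path.startswith(section.path)

# The examples from the list above:
assert in_site_section("https://example.com/a", "https://example.com/a/b")    # scraped
assert in_site_section("https://example.com/a", "https://example.com/a/b/c")  # scraped
assert not in_site_section("https://example.com/a", "https://example.com/x")
assert not in_site_section("https://example.com/a", "https://something.example.com/a")
assert not in_site_section("https://example.com/a", "https://example.com")
```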
A new knowledge source will be scraped immediately. You can also trigger a re-scrape of all existing sources by clicking Scrape All.
Scraping Queue
If you add or re-scrape multiple sources at once, they will go into a Queue. The system processes these one at a time, oldest to newest.
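As a rough mental model, the queue behaves like a simple first-in, first-out list. The sketch below is illustrative only, not Mainstay's actual implementation:

```python
from collections import deque

# Rough mental model of the scraping queue: first in, first out.
queue = deque(["https://example.com/a", "https://example.com/b", "https://example.com/c"])

while queue:
    url = queue.popleft()  # the oldest entry is processed first
    print(f"Scraping {url} ...")
```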
When a source finishes scraping, its Status will change to either "Success" or "Fail", its Last Scraped value will update to a timestamp indicating when the scrape completed, and the header text may change to the webpage's title.
The page will update automatically as these statuses change, so you will see completed statuses gradually move up the list.
Filtering & Bulk Editing Sources
You can search for sources by URL and/or title using the Search input.
You can also filter sources by status or failure message. (See "Scraping Errors" below for more details.)
When a search or filter is applied, you can also Rescrape or Delete just the matching sources.
Individual Sources
Below these options is the list of all knowledge sources you have selected. Each includes:
- Title: the <title> element from the webpage or the name of the PDF file.
- URL: the full link to the webpage, Google Doc, or PDF file.
  - Note: if you selected "Domain/Site Section" above, that selection becomes multiple knowledge sources, each representing a specific page we've scraped.
- Last Scraped: the date and time that Mainstay last scraped the webpage or document.
- Status: Queued | Started | Success | Fail.
From the ... menu on each knowledge source, you can take the following actions:
- Edit: Update the Title and add an optional Description.
- Scrape: Trigger a re-scrape of this knowledge source. This is helpful if that page or document was recently updated.
- Delete: Remove this knowledge source and all content we've scraped from it.
Note: Only admin users can add, edit, and delete knowledge sources.
Scraping Errors
Here are the possible Fail statuses you may encounter while scraping sources:
403. Forbidden.
The webpage is not accessible to our scraper. This may be because it's password-protected or intentionally blocking bots. If you control this website, investigate whether you have any bot restrictions in place.
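If you want to test for bot-blocking yourself, one rough diagnostic is to request the page with and without a browser-like User-Agent and compare the status codes. The sketch below is a hypothetical check using the Python requests library, not a tool Mainstay provides:

```python
import requests

# Hypothetical diagnostic: a 403 for a default client but a 200 for a
# browser-like User-Agent suggests the site is blocking non-browser bots.
url = "https://example.com/admissions"  # replace with your page

default_resp = requests.get(url, timeout=10)
browser_resp = requests.get(
    url,
    timeout=10,
    headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"},
)
print("default client:", default_resp.status_code)
print("browser-like UA:", browser_resp.status_code)
```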
403. This Google doc is not shared.
The Google Doc must be publicly accessible in order for the scraper to view its contents. Update the sharing settings to "Anyone with the link can view".
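A quick way to verify the sharing settings is to open the link in a private/incognito browser window. If you prefer a scripted check, the sketch below makes an unauthenticated request and looks for a redirect to Google's sign-in page; treat it as a heuristic, not an official API:

```python
import requests

# Heuristic: a public Google Doc returns the document page directly, while
# a private one redirects unauthenticated requests to a sign-in page.
doc_url = "https://docs.google.com/document/d/YOUR_DOC_ID/edit"  # hypothetical ID

resp = requests.get(doc_url, timeout=10)
if "accounts.google.com" in resp.url or resp.status_code != 200:
    print("Doc does not appear to be publicly viewable.")
else:
    print("Doc appears to be publicly viewable.")
```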
404. Content not found.
The URL provided does not resolve to an accessible webpage. Check that the URL is valid by entering it into your browser directly. If you selected "Domain/Site Section", we retrieve a list of top pages from Bing and attempt to scrape them, so this error means Bing has indexed pages that no longer exist.
404. This file is currently not accessible.
The PDF URL provided does not resolve to an accessible file. Check that the URL is valid by entering it into your browser directly. An alternative solution is to copy the text content into a Google doc and use that instead.
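To confirm a PDF URL is publicly reachable and actually serves a PDF, you can check the response status and the Content-Type header. A minimal sketch, assuming the Python requests library is available:

```python
import requests

# Minimal sketch: verify the URL resolves and serves a PDF.
pdf_url = "https://example.com/files/handbook.pdf"  # replace with your URL

resp = requests.get(pdf_url, timeout=10, allow_redirects=True)
content_type = resp.headers.get("Content-Type", "")
if resp.status_code == 200 and "application/pdf" in content_type:
    print("PDF is accessible.")
else:
    print(f"Problem: status {resp.status_code}, Content-Type {content_type!r}")
```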
408. Attempting to load page timed out.
The webpage took too long to load, so the scraper was not able to parse its contents. Check that the URL loads when entering it into your browser directly.
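You can also get a rough sense of whether a page loads slowly by timing a plain HTTP fetch. The 30-second threshold below is an illustrative assumption; the scraper's actual timeout is not documented here:

```python
import requests

# Rough latency check. The 30-second threshold is an assumption for
# illustration, not the scraper's documented timeout.
url = "https://example.com/slow-page"

try:
    resp = requests.get(url, timeout=30)
    print(f"Loaded in {resp.elapsed.total_seconds():.1f}s, status {resp.status_code}")
except requests.exceptions.Timeout:
    print("Page did not respond within 30 seconds.")
```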
422. Foreign language webpage detected.
The webpage has an explicit lang attribute set to something other than en. If the webpage contents are actually in English, ask your site admins to update the lang setting.
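To see what lang value a page declares, inspect the <html> tag in your browser's developer tools, or fetch the page and extract the attribute. A small sketch, again assuming the requests library:

```python
import re
import requests

# Fetch a page and report the lang attribute declared on its <html> tag.
url = "https://example.com/admissions"

html = requests.get(url, timeout=10).text
match = re.search(r"<html[^>]*\blang=[\"']([^\"']+)[\"']", html, re.IGNORECASE)
print("Declared lang:", match.group(1) if match else "(none found)")
```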
500. Scraper API ran into error.
500. Scraper API terminated connection.
500. Could not set up a secure connection.
These indicate an issue with the system we use for scraping webpages, often a "too many requests"-type problem. This is usually temporary, so if you try again later, the page should scrape successfully.