    Data Sources (Applications UI)

    A data source is a configuration that manages the import and indexing of data into an application. In the Springboard UI, data sources contain the information you want Springboard to search. For conceptual information, see Data sources.

    Data Sources screen

    To open the screen, select the Data Sources icon in the Applications UI sidebar, or select Manage in the Data sources section of the Hub screen.

    The Data Sources screen displays a list of existing data sources with the following information:

    Field Description

    Name

    The unique name that identifies the data source configuration in that Springboard application.

    Labels

    Optional short identifier. The first label is used when sorting the data sources by the Labels column.

    Status

    The current status of the data source. To view the Last run date and time, point to the status text. The statuses are:

    • Creating. The data source is being created.

    • Queued. The data source is in the queue for data ingestion.

    • Running. The data source is ingesting the data.

    • Complete. The data source has completed ingesting data.

    • Failed. The data source encountered an error during processing.

    • Authentication Failed. The authentication credentials for the data source are incorrect, and the data source could not be crawled.

    • Cleanup Skipped. If a crawl detects changes that reduce the overall size of the data source by 30% or more, the crawl errors out and the deletion process does not proceed. For more information, see Existing URL crawl logic.

    • Active. This status only applies to push data sources when the data source has been created successfully and is ready to receive data.
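
    The Cleanup Skipped rule above can be sketched as a simple threshold check. This is a minimal illustration, not Springboard's implementation: the function name and the use of a document count as the measure of "size" are assumptions, and the product may measure size differently.

```python
CLEANUP_SKIP_PERCENT = 30  # threshold stated in the Cleanup Skipped description


def should_skip_cleanup(previous_count: int, current_count: int) -> bool:
    """Return True when a crawl would shrink the data source by 30% or more.

    Assumption: size is measured as a document count (hypothetical).
    """
    if previous_count <= 0:
        # Nothing was indexed before, so there is nothing to delete.
        return False
    removed = previous_count - current_count
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return removed * 100 >= CLEANUP_SKIP_PERCENT * previous_count
```

    With these assumptions, a crawl that drops a data source from 1000 to 700 documents is flagged, while a drop from 1000 to 701 is not.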

    Next Run

    The date and time of the next scheduled crawl of the data source. The format is MONTH DAY, YEAR at hh:mm AM/PM in the time zone of the user’s browser session. For example, November 21, 2022 at 8:30 AM.

    The Next Run field is empty for push data sources because scheduled crawls do not apply to them.

    If you haven’t configured any data sources yet, a prompt to configure your first data source displays.

    To view details, edit, or delete a data source, point to its entry in the list and click View/Edit. The Edit Data Source screen displays.
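
    The MONTH DAY, YEAR at hh:mm AM/PM display format used for Next Run can be reproduced with a short helper. This is a sketch only: the function name is hypothetical, conversion to the browser's time zone is left out, and the unpadded hour follows the example on this page (8:30 AM) rather than a literal two-digit hh.

```python
from datetime import datetime

MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def format_display_time(dt: datetime) -> str:
    """Format dt as 'MONTH DAY, YEAR at h:mm AM/PM' (hypothetical helper)."""
    hour12 = dt.hour % 12 or 12          # 0 and 12 both display as 12
    meridiem = "AM" if dt.hour < 12 else "PM"
    return (
        f"{MONTHS[dt.month - 1]} {dt.day}, {dt.year} "
        f"at {hour12}:{dt.minute:02d} {meridiem}"
    )


print(format_display_time(datetime(2022, 11, 21, 8, 30)))
```

    The month names are listed explicitly so the output does not depend on the system locale.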

    Details tab

    The Details tab displays the current configuration. For more information about how to edit or delete the data source, see Manage Springboard data sources.

    History tab

    History tab for non-push data sources

    For web and other non-push data sources, the History tab displays the 20 most recent data source crawl entries.

    Click the arrow beside a column title to change the sort order.

    Field Description

    Job ID

    The unique identifier for the data source crawl job. You can click the Job ID to display the Job ID detailed run report.

    Start Time

    The date and time the crawl started, shown in the time zone of the user’s browser session. The format is MONTH DAY, YEAR at hh:mm AM/PM. For example, January 21, 2022 at 6:12 PM.

    Duration

    If the status of the crawl is Finished, Failed, Authentication Failed, or Cleanup Skipped, the column displays the length of time from Start Time until the crawl finished or generated the error. The format is hh:mm:ss. The column displays an ellipsis (…) if the status is Running.

    Status

    The current status of the data source crawl. The statuses are:

    • Running. The current data source crawl is in progress.

    • Finished. The specific occurrence of the data source crawl finished.

    • Failed. The data source encountered an error during processing.

    • Authentication Failed. The authentication credentials for the data source are incorrect, and the data source could not be crawled.

    • Cleanup Skipped. If a crawl detects changes that reduce the overall size of the data source by 30% or more, the crawl errors out and the deletion process does not proceed. For more information, see Existing URL crawl logic.
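
    The Duration column behavior described in this table can be sketched as follows. The function name and the way start and end times are passed in are assumptions for illustration. Only the hh:mm:ss format, the ellipsis for Running, and the list of statuses come from this page.

```python
from datetime import datetime
from typing import Optional

# Statuses for which the Duration column shows an elapsed time.
ENDED_STATUSES = {"Finished", "Failed", "Authentication Failed", "Cleanup Skipped"}


def duration_cell(status: str, start: datetime, end: Optional[datetime]) -> str:
    """Return the text for the Duration column (hypothetical helper)."""
    if status == "Running":
        return "…"  # the crawl has not ended, so no duration is available
    if status not in ENDED_STATUSES or end is None:
        raise ValueError(f"cannot compute duration for status {status!r}")
    total_seconds = int((end - start).total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
```

    For a crawl that starts at 6:12:00 PM and ends at 7:15:30 PM on the same day, this sketch returns 01:03:30.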

    Job ID detailed run report

    The Job ID detailed run report only applies to non-push data source crawls, and provides job information about pages or files that were:

    • Processed and indexed

    • Excluded because the crawl encountered an access, request, or response error

    • Skipped due to an issue that prevented the page or file from being indexed

    Use this report to investigate why specific documents were not indexed and to make any necessary corrections.

    Field Description

    Job ID

    The unique identifier for the data source crawl job.

    Back arrow

    Displays the History tab for the data source.

    Search

    Enter a term or phrase from the URL, Referrer URL, or Message columns and press Enter to display the page and file entries containing that text.

    To clear the Search field, click X. When the field is cleared, all entries for that report display.

    Crawl Status filter

    Select a status to display only those entries. Options are: All, Error, Indexed, and Skipped. All displays every Error, Indexed, and Skipped page and file entry for that Job ID.

    Crawl Status

    Indicates the status of the page or file entry for that Job ID. Statuses are:

    • Error. Failed due to an error. For example, if the page or file cannot be discovered, a 404 Not Found error displays.

    • Indexed. The page or file was processed and indexed without errors.

    • Skipped. The page or file was not processed, and is excluded from search engine results. For example, the page or file is listed as an excluded link or is not in an allowed domain.

    URL

    The URL the job crawls.

    Referrer URL

    The URL where the page or file entry was discovered.

    Message

    Information about the page or file and any processing details pertinent to the entry.

    • Error status messages include response codes such as 404 - Not found, 429 - Too many requests, and 5xx messages.

    • Indexed status messages include information such as page or done-processing.

    • Skipped status messages include:

      • No index. The crawl determined the entry should not be indexed.

      • No follow. The crawl determined that search engines should not follow the links on this entry.

      • Exclude links. This entry is associated with a value entered in the Exclude links field.

      • Missing title. The entry’s document.title does not exist.

      • Out of domain. The entry is in a domain excluded from the crawl based on the data source configuration.

      • Size exceeded. The total content size of the entry is larger than 1 million bytes.

      • Crawl depth exceeded. The entry depth level is higher than the value in the Limit crawl levels field.

        The message may also specify a URL associated with the reason the crawl skipped the entry.
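
    Several of the Skipped messages above correspond to simple checks, which can be sketched as below. This is an illustration under stated assumptions, not the crawler's actual logic: the Page fields, the function name, the exact-host domain match, and the order of the checks are all hypothetical. Only the 1 million byte limit and the meaning of each message come from this page. The No index, No follow, and Exclude links cases are omitted.

```python
from dataclasses import dataclass
from typing import Optional, Set
from urllib.parse import urlparse

MAX_CONTENT_BYTES = 1_000_000  # "Size exceeded" applies above this size


@dataclass
class Page:
    """Hypothetical view of a crawled page or file."""
    url: str
    title: Optional[str]
    content_bytes: int
    depth: int


def skip_reason(page: Page, allowed_domains: Set[str], max_depth: int) -> Optional[str]:
    """Return a Skipped message for the page, or None if no check applies."""
    host = urlparse(page.url).hostname or ""
    if host not in allowed_domains:
        return "Out of domain"
    if page.depth > max_depth:
        return "Crawl depth exceeded"
    if page.content_bytes > MAX_CONTENT_BYTES:
        return "Size exceeded"
    if not page.title:
        return "Missing title"
    return None
```

    In this sketch, max_depth stands in for the value in the Limit crawl levels field, and allowed_domains stands in for the domains permitted by the data source configuration.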

    Page selection

    Select Previous, Next, or a specific page of results.

    Jump to

    Select a specific result page number to display.
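
    The combined effect of the Search field and the Crawl Status filter can be sketched as a filter over report entries. The ReportEntry type and the function name are hypothetical, and case-insensitive substring matching is an assumption. The searched columns (URL, Referrer URL, Message) and the filter options come from the table above.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ReportEntry:
    """Hypothetical row of the Job ID detailed run report."""
    crawl_status: str  # "Error", "Indexed", or "Skipped"
    url: str
    referrer_url: str
    message: str


def filter_entries(
    entries: Iterable[ReportEntry],
    status_filter: str = "All",
    search: str = "",
) -> List[ReportEntry]:
    """Apply the Crawl Status filter, then the Search text."""
    needle = search.strip().lower()
    matches = []
    for entry in entries:
        if status_filter != "All" and entry.crawl_status != status_filter:
            continue
        searchable = (entry.url, entry.referrer_url, entry.message)
        # An empty search field matches every entry.
        if needle and not any(needle in field.lower() for field in searchable):
            continue
        matches.append(entry)
    return matches
```

    Calling filter_entries(entries) with the defaults returns every entry, which mirrors the All option with a cleared Search field.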

    History tab for push data sources

    For push data sources, the History tab displays the 20 most recent entries for batches that are submitted for processing.

    Click the arrow beside a column title to change the sort order.

    Field Description

    Job ID

    The unique job identifier for the specific batch associated with the push data source.

    End Time

    The date and time the push data source batch either failed or was successfully processed.

    Status

    The status of the specific batch associated with the push data source. The statuses are:

    • Finished. Processing for a specific batch associated with the push data source finished.

    • Failed. One or more errors were detected in the specific batch associated with the push data source and the batch could not be processed.