Reference Information: About the Number of Sources and Content Retrieval

This topic provides information on how many sources you can have in Corpus Manager and on how long Corpus Manager keeps the content it has retrieved.

About the Number of Sources

The maximum number of sources you can add depends on the role of the platform you are on. The limit is 5,000, 25,000, or 100,000 sources per platform, and it is shared by all the users of the platform.

When you or another user reaches the platform's maximum number of sources, an icon appears on all the Corpus Manager widgets of the platform.

Only the sources added before reaching the limit are crawled. Sources added after reaching the limit are not crawled.

The number of sources in your Corpus Manager is displayed above its list of sources. However, you cannot see the total number of sources that have been added across your platform.

The number of sources is determined as follows:

  • Adding a source to the root of your Corpus Manager counts as one source.
  • Adding a source to a Library always counts as one source, regardless of the number of members for this Library.
  • Adding the same source to several Libraries counts as one source per Library. For example, adding the same source to three different Libraries counts as three sources.
  • Adding the same source to the root of your Corpus Manager and to one or several Libraries counts as one source for the root and one source per Library. For example, adding the same source to the root of your Corpus Manager and to three Libraries counts as four sources.
  • Adding a topic search counts as one source per service for each searched keyword. For example, adding a topic search for the keywords 3DS and Dassault Systèmes on two services counts as four sources (two keywords multiplied by two services).
  • Sources from Industry Libraries (that you can select when configuring Social Analytics and Tracked Topic widgets) do not count toward your platform limit.
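
The counting rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of Corpus Manager: the function name count_sources, its parameters, the example URL, and the service names are all hypothetical. Sources from Industry Libraries are simply left out of the inputs because they do not count toward the limit.

```python
def count_sources(root_sources, library_sources, topic_searches):
    """Count sources toward the platform limit (illustrative sketch only).

    root_sources: source identifiers added to the root of Corpus Manager.
    library_sources: dict mapping a Library name to the sources added to it.
    topic_searches: (keywords, services) pairs, one pair per topic search.
    """
    # One source for each distinct source added to the root.
    total = len(set(root_sources))
    # One source per Library, regardless of how many members the Library has.
    for sources in library_sources.values():
        total += len(set(sources))
    # One source per service for each searched keyword.
    for keywords, services in topic_searches:
        total += len(set(keywords)) * len(set(services))
    return total


feed = "https://example.com/feed"  # hypothetical source

# Same source at the root and in three Libraries.
print(count_sources([feed], {"A": [feed], "B": [feed], "C": [feed]}, []))  # 4

# Two keywords searched on two (hypothetical) services.
print(count_sources([], {}, [(["3DS", "Dassault Systèmes"], ["service_1", "service_2"])]))  # 4
```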

Content Retention

Once you have added a source to your Corpus Manager and selected it as a Social Analytics or Tracked Topic source, Corpus Manager regularly checks this source for new items. Depending on the source, an item can be an article, a Tweet, an image, and so on.

Corpus Manager saves all the items it has retrieved for a default retention period of 90 days, meaning that all items remain available in Corpus Manager for 90 days even when they have been deleted from their sources.

Once an item is 90 days old, Corpus Manager deletes it the next time it retrieves a new item (first in, first out).

Note: The retention period of your own 3DEXPERIENCE platform may differ from the default retention period. For more information, contact your administrator.
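
The retention behavior described above can be pictured with the following minimal Python sketch. It is not the actual Corpus Manager implementation: the names is_past_retention and store_new_item are hypothetical, and the sketch assumes that an item's age is measured from the date it was retrieved, which the text does not specify.

```python
from collections import deque
from datetime import datetime, timedelta

DEFAULT_RETENTION_DAYS = 90  # default from the text; your platform may differ


def is_past_retention(retrieved_at, now, retention_days=DEFAULT_RETENTION_DAYS):
    """Return True once an item has reached the retention period.

    Assumption: age is counted from the retrieval date.
    """
    return now - retrieved_at >= timedelta(days=retention_days)


def store_new_item(store, item, now, retention_days=DEFAULT_RETENTION_DAYS):
    """Add a new item and drop expired ones, oldest first (FIFO).

    store is a deque of (retrieved_at, item) pairs, oldest on the left.
    """
    while store and is_past_retention(store[0][0], now, retention_days):
        store.popleft()
    store.append((now, item))


now = datetime(2024, 6, 1)
store = deque([
    (now - timedelta(days=100), "old article"),
    (now - timedelta(days=10), "recent article"),
])
store_new_item(store, "new article", now)
print([item for _, item in store])  # ['recent article', 'new article']
```

Note that in this sketch an expired item is only removed when a new item arrives, which mirrors the FIFO behavior described above.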

Crawling Frequency

Corpus Manager automatically checks for new items by regularly reading the data of the sources you added to your Corpus Manager. This is called crawling. The crawling frequency depends on the source popularity, activity, or rules.

Note: You cannot manually make Corpus Manager check for new items.

Source Characteristics | Crawling Frequency
Active sources; that is, the sources that publish the highest number of items. | Usually varies from every 10 minutes to every 4 hours. The more active a source, the more often it is crawled.
Inactive sources; that is, sources that have not published any new items in a very long time. | Varies from a few times a day to a few times a week.
Sources in error. | Varies from a few times a day to a few times a week.
Sources that have not been used by any Social Analytics or Tracked Topic widget for a period that exceeds the platform's retention period multiplied by 4. | Never crawled.
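
As a worked example of the last rule, with the default retention period of 90 days, a source stops being crawled once it has gone unused for more than 90 × 4 = 360 days. The short Python sketch below expresses that threshold; the function name is_still_crawled and its parameters are hypothetical and only illustrate the arithmetic.

```python
def is_still_crawled(days_since_last_widget_use, retention_days=90):
    """Return False once a source has been unused for more than 4x the retention period.

    days_since_last_widget_use: days since any Social Analytics or Tracked Topic
    widget last used the source (hypothetical input for this sketch).
    """
    return days_since_last_widget_use <= retention_days * 4


print(is_still_crawled(360))                     # True: the limit is not yet exceeded
print(is_still_crawled(361))                     # False: more than 360 days unused
print(is_still_crawled(121, retention_days=30))  # False: more than 120 days unused
```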

Limitations

Some sources, especially RSS feeds, limit the number of times they can be crawled over a certain period of time. In this case, Corpus Manager automatically adjusts its crawling frequency to abide by the RSS feed rules.
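
One simple way to picture this adjustment is shown below. This is a sketch under stated assumptions, not how Corpus Manager actually computes its schedule: it assumes the source allows a fixed number of crawls per time window, and the function name adjusted_interval and its parameters are hypothetical.

```python
def adjusted_interval(desired_minutes, max_crawls, window_minutes):
    """Return a crawl interval (in minutes) that respects a source's rate limit.

    Assumption: the source allows at most max_crawls crawls per window_minutes.
    """
    minimum_allowed = window_minutes / max_crawls  # shortest interval the source permits
    return max(desired_minutes, minimum_allowed)


# A source that would be crawled every 10 minutes but allows 4 crawls per hour
# ends up being crawled every 15 minutes.
print(adjusted_interval(10, max_crawls=4, window_minutes=60))  # 15.0

# A source already crawled every 30 minutes is unaffected by the same limit.
print(adjusted_interval(30, max_crawls=4, window_minutes=60))  # 30
```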

A source can forbid Corpus Manager from crawling it at any time. In this case, Corpus Manager stops trying to crawl this source and displays an icon to inform you that this source has become unreachable.