Thursday, June 18, 2015

BCS Models - Part 5: The Bigger Database

This post is part of an eight-part series describing the process I followed to create a BCS model capable of indexing well over 10 million items.

Series Index:
BCS Models - Part 1: Target Database
BCS Models - Part 2: Initial BCS External Content Types
BCS Models - Part 3: Crawl Results
BCS Models - Part 4: Bigger Database
BCS Models - Part 5: The Bigger Database  <-- You are here
BCS Models - Part 6: How to eat this elephant?
BCS Models - Part 7: Changes to the BCS Model to support segmented crawl
BCS Models - Part 8: Crawl Results

From here on I'm going to use the full size StackOverflow extract.  This ramps up the record count significantly!

View                       Row Count
uv_AllAcceptedAnswers      5,108,765
uv_AllAnswerComments       9,912,629
uv_AllComments            36,298,959
uv_AllQuestionComments    15,459,565
uv_AllQuestions            9,045,951
uv_AllResponseComments    10,926,396
uv_AllResponses            9,965,807
uv_AllUsers                4,018,611

Clearly no amount of cheating and upping the BCS Throttle Threshold will suffice for this much data.  We need to take smaller bites.  This is where the concept that Brian Pendergrass documented in his blog post comes into play, and that post is a must read.  Go read it.  Now.

...

Now that you've read that blog post, the key is that much of the 'magic' happens within the BCS Model construction.  The model must be able to return an enumerator of Containers, and each Container in turn is associated with the Line of Business objects, in my case the various views above.  Interestingly, the best documentation I found for this piece of the puzzle was in the comments of some sample code posted by the MS Office Development team.

The magic is to have a single Finder, on a single Entity in the entire model, with the RootFinder property set on its MethodInstance.  As you may have noticed with BCS crawls, the Search Gatherer executes every Finder carrying the RootFinder property at the start of the crawl, which is why there should be exactly one.  This is just half the answer though.
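To make that concrete, here's a minimal sketch of what that one MethodInstance might look like in the model XML.  The entity, method, and parameter names (Segment, ReadAllSegments, segments) are hypothetical placeholders I'm using for illustration; the RootFinder property itself is the real BCS hint:

    <!-- A sketch of the one root Finder in the entire model.
         Segment / ReadAllSegments / segments are placeholder names. -->
    <MethodInstance Type="Finder"
                    ReturnParameterName="segments"
                    Default="true"
                    Name="ReadAllSegments"
                    DefaultDisplayName="Read All Segments">
      <Properties>
        <!-- The presence of RootFinder marks this as the crawl entry point;
             the value is left empty -->
        <Property Name="RootFinder" Type="System.String"></Property>
      </Properties>
    </MethodInstance>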

The other half of the solution is to create an Association of type AssociationNavigator (the default type that SharePoint Designer creates) and include a DirectoryLink property on its MethodInstance.  This bit of magic tells BCS to treat the Source of the Association as a Container.
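Again as a sketch, the Association carrying that property might look like this.  The StackOverflowModel namespace and the Segment and Question entity names are placeholders; the DirectoryLink property is the part that matters, and as far as I can tell its presence, not its value, is what the crawler keys on:

    <!-- Sketch of an AssociationNavigator the crawler treats as a
         directory link. Namespace and entity names are placeholders. -->
    <Association Name="SegmentToQuestionsAssociation"
                 Type="AssociationNavigator"
                 ReturnParameterName="questions"
                 DefaultDisplayName="Segment to Questions">
      <Properties>
        <!-- DirectoryLink tells BCS to treat the Source entity (Segment)
             as a Container of the Destination entity (Question) -->
        <Property Name="DirectoryLink" Type="System.String">x</Property>
      </Properties>
      <SourceEntity Namespace="StackOverflowModel" Name="Segment" />
      <DestinationEntity Namespace="StackOverflowModel" Name="Question" />
    </Association>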

A significant item here is that although you can place Containers within Containers, the Container Entities cannot be crawled incrementally.  They are always crawled with a Full Crawl.

The cool thing, though, is that you can create an artificial Entity to generate the Container groupings and associate it with the various other Entities, while at the same time defining Associations between those Entities that are not used by the crawl but rather by the Business Data Web Parts.
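For contrast, a web-part-only Association is the same AssociationNavigator shape minus the DirectoryLink property, so the crawler doesn't treat it as a container boundary.  The names below are hypothetical again:

    <!-- Sketch of an Association used only by the Business Data Web Parts,
         e.g. to show the Answers related to a Question. With no
         DirectoryLink property, the crawler ignores it as a container.
         Names are placeholders. -->
    <Association Name="QuestionToAnswersAssociation"
                 Type="AssociationNavigator"
                 ReturnParameterName="answers"
                 DefaultDisplayName="Question to Answers">
      <SourceEntity Namespace="StackOverflowModel" Name="Question" />
      <DestinationEntity Namespace="StackOverflowModel" Name="Answer" />
    </Association>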

This means we don't have to create a Custom Connector for SQL content just because we have a lot of data.  We don't have to worry about implementing ILobUri or INamingContainer or any other .NET interface.  It probably wouldn't matter a whole lot anyway, because SharePoint is going to call our model the same way it calls the existing SQL Connector; BCS abstracts the nasty implementation details away from Search.

In the next posts I'll modify the Model and create the SQL objects needed to support it.  At this point I doubt that I'll actually be able to run the crawl to completion in my VM, especially as this data set blows right through the 10 million item limit for a single index partition.
