Tuesday, June 16, 2015

BCS Models - Part 4: Bigger Database

This post is part of an eight-part series describing the process I followed to create a BCS model capable of indexing well over 10 million items.

Series Index:
BCS Models - Part 1: Target Database
BCS Models - Part 2: Initial BCS External Content Types
BCS Models - Part 3: Crawl Results
BCS Models - Part 4: Bigger Database  <-- You are here
BCS Models - Part 5: The Bigger Database
BCS Models - Part 6: How to eat this elephant?
BCS Models - Part 7: Changes to the BCS Model to support segmented crawl
BCS Models - Part 8: Crawl Results

So now I'm going to take my SQL Objects and BCS Model and adapt them for a new database, this one based upon Serverfault.StackExchange.com.  I followed the same process described in BCS Models - Part 1: Target Database, except that I could reuse my work.  Yay!

I used SQL Management Studio to generate a script of the Views, Stored Procedures and DB Role I'd created to assign rights.  I then modified the script to reference the new DB instead of the old, added the various grant commands to grant the DB Role access to the objects and ran it.  Instant Objects!
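The grant commands added to the script can be generated rather than typed by hand. A minimal Python sketch of that idea follows; the view, procedure, and role names are hypothetical stand-ins, not the actual object names from the database:

```python
# Sketch: build GRANT statements for a DB role over the scripted objects.
# Object and role names are illustrative; substitute your own.
views = ["uv_AllQuestions", "uv_AllAnswerComments"]   # scripted views
procs = ["usp_GetChangedItems"]                       # scripted stored procedures
role = "bcs_crawl_role"                               # hypothetical DB role

def grant_statements(views, procs, role):
    """Return GRANT SELECT for each view and GRANT EXECUTE for each proc."""
    stmts = [f"GRANT SELECT ON dbo.{v} TO [{role}];" for v in views]
    stmts += [f"GRANT EXECUTE ON dbo.{p} TO [{role}];" for p in procs]
    return stmts

for stmt in grant_statements(views, procs, role):
    print(stmt)
```

Pasting the generated lines at the end of the scripted objects keeps the rights assignment in the same script that creates the objects.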

The BCS Model will be a bit more complex, but not much.  There are a few things to change within the Model to get it to connect to the new SQL Database.

  1. First I'm copying the existing model to a new one and changing the file name; in my case I'm naming the model ServerFault.StackExchange.bdcm.
  2. Now I'm going to open the model in a text editor.  I'm going to use Notepad++ and tell it the file is XML, since it won't detect that from the extension.
    1. Change the Name property of the <Model> element to match the file name, without its extension.
      1. In my case I changed SharePoint.StackExchange to ServerFault.StackExchange
    2. Perform a Replace in the entire model to change the DB Name.  
      1. I changed three occurrences this way.
    3. Perform a replace in the entire model to change the Name Space.  
      1. I changed 47 occurrences of stackexchange.sp to stackexchange.sf
Those were all of the changes I needed to make in the model before I imported it.  Note that if any of the global replaces touched the Entity Actions, those actions would now be incorrect, so I'll regenerate them from within Central Admin.
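The three replacements above are simple enough to script. Here's a minimal Python sketch; the model name and namespace values come from this post, but the database name pair is a hypothetical placeholder:

```python
# Sketch of the three global replacements applied to the copied .bdcm file.
# The DB name pair is hypothetical; the model name and namespace are from the post.
replacements = [
    ("SharePoint.StackExchange", "ServerFault.StackExchange"),  # Model Name attribute
    ("StackExchange_SP", "StackExchange_SF"),                   # DB name (hypothetical)
    ("stackexchange.sp", "stackexchange.sf"),                   # namespace
]

def retarget_model(xml_text, replacements):
    """Apply each (old, new) pair as a global text replace, like Notepad++'s
    Replace All, and return the retargeted model XML."""
    for old, new in replacements:
        xml_text = xml_text.replace(old, new)
    return xml_text

# Usage (hypothetical path):
# text = open("ServerFault.StackExchange.bdcm", encoding="utf-8").read()
# open("ServerFault.StackExchange.bdcm", "w", encoding="utf-8").write(
#     retarget_model(text, replacements))
```

Plain text replacement is fine here because the old names are distinctive strings that only appear where they should be changed.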

Since I was already in the BCS Administration area to import the model, I changed my view to External Content Types, sorted by the External System column, clicked the checkbox for the various content types for my model and clicked the Create/Upgrade button in the ribbon.  After a confirmation dialog was accepted SharePoint created the new profile pages in the proper path and updated the ECTs.  The only issue is that now my Model on my File System is out of sync with SharePoint.

I fix this by using SPD to export the model again.  Unfortunately I have to choose a new name, since SPD knows the current model's name is already in use; I simply append a 2 to the end, click through the dialog, and save the model.  I then rename the old model file out of the way, rename the new one to its proper name, edit the file to change the model name in the XML (per step 1 above), and save.  Finally I go back into Central Admin, delete the old model, and re-import the model to make sure the file I have on disk is in fact the same as what SharePoint has.  It's complex enough without chasing files that are out of sync with SharePoint.

Now just set the rights on the new external system this model created and we're ready to crawl.

Or are we?

If we check the row counts in the various views we see lots of rows:

uv_AllAcceptedAnswers  100,937
uv_AllAnswerComments  157,227
uv_AllComments  615,595
uv_AllQuestionComments  233,554
uv_AllQuestions  193,889
uv_AllResponseComments  224,814
uv_AllResponses  242,283
uv_AllUsers  200,223

The stock BCS throttle is at 1,000,000 items per single query result and these are all well within that boundary, so we should be good to go.

I create a new content source in Search for this database and initiate a full crawl.  I like to watch the crawl progress using ULSViewer, creating a filtered view on event id c73i.  If the method instances are well named, you can get a pretty good idea of what's happening with the crawl over BCS.

To be updated once the full crawl completes...

... next day...
And when I search for 'stuff' again I get a blended result:

So this worked very well.  Cloning the SQL Objects and BCS Model worked great.  But I still want more.  In the next posts I'll take on the granddaddy of the Stack Exchange extracts: Stack Overflow.

5/7/2017 - Updated link to SQL Script
