Sunday, June 7, 2015

SharePoint 2010/2013 Crawling BCS SQL Server-based External Content

Here recently I've been working on crawling a SQL Server-based data warehouse with several tables having more than a million rows and one with more than eight million.

My initial solution was to modify the configurable limits in SharePoint 2010 to allow queries of up to 10 million rows and set the SQL timeouts to 5 minutes.  This worked but doesn't scale as the either the search process or gatherer process will run out of memory and fail (I've been lucky so far).  I now have a solution that crawls the data and bring back no more than 100,000 records at time (the recommended maximum), supports incremental crawls and crawl-time security trimming.

I've decided to post a series of blog posts describing the various solutions I came up along the way to my current solution.  I've gleaned information and clues from numerous sources on the internet, a book and finally from comments in a random sample custom connector the office developer team posted.

I'm going to work through a series of BCS Models based on accessing a data dump from SharePoint.StackExchange.com.  You can look elsewhere on the webs for instructions on how to create your own repository.

I'm working with a SP2013 Trial Installation patched to May 2015 CU on Win2008R2 (since the published trial ISO doesn't have SP1 slipstreamed I couldn't use the Win2012R2 trial).  My SQL Server is a SQL 2014 SP1 trial.

BCS Models - Part 1: Target Database
BCS Models - Part 2: Initial BCS External Content Types
BCS Models - Part 3: Crawl Results
BCS Models - Part 4: Bigger Database
BCS Models - Part 5: The Bigger Database
BCS Models - Part 6: How to eat this elephant?
BCS Models - Part 7: Changes to the BCS Model to support segmented crawl
BCS Models - Part 8: Crawl Results
BCS Models - Part 9: Crawl-Time Security

Resources I used:

2 comments:

Luca Noè said...

Hi, your solution is more interesting. Can you send me complete model file? Dropbox link don't work. TKS

Chad said...

I've updated the link from dropbox to github