Friday, September 4, 2015

Removing orphaned resources left by Sandboxed Solutions

I recently experimented with a Sandboxed Solution to insert some JavaScript files into all pages of a site collection.  This worked quite well, even to the point of the files continuing to be requested by the pages after the solution was removed and deleted.

I finally grew tired of the multiple unnecessary 404 errors being generated on every page view in the site collection.  I'd spent a fair bit of time looking for a way to remove the orphaned items before I opened a case with Microsoft.

Eventually we discovered the solution and again it was quite easy once we knew what to do.

For some background, here's the Elements.xml of the Sandboxed Solution:

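<Elements xmlns="http://schemas.microsoft.com/sharepoint/">
  <CustomAction
    ScriptSrc="~sitecollection/SiteAssets/js-class.js"
    Location="ScriptLink"
    Sequence="105">
  </CustomAction>
  <CustomAction
    ScriptSrc="~sitecollection/SiteAssets/bluff-min.js"
    Location="ScriptLink"
    Sequence="110">
  </CustomAction>
  <CustomAction
    ScriptSrc="~sitecollection/SiteAssets/jquery.min.js"
    Location="ScriptLink"
    Sequence="100">
  </CustomAction>
</Elements>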
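As far as I know, ScriptLink custom actions are rendered in ascending Sequence order, so jquery.min.js (100) is emitted before js-class.js (105) and bluff-min.js (110). The Python sketch below is an illustrative helper, not part of SharePoint or of the original solution: it lists the ScriptLink entries in a manifest like the one above, lowest Sequence first, which gives you the ScriptSrc values to look for among the site collection's leftover custom actions. It assumes the standard SharePoint namespace on the root element.

```python
import xml.etree.ElementTree as ET

# Namespace used by SharePoint feature element manifests (Elements.xml).
SP_NS = {"sp": "http://schemas.microsoft.com/sharepoint/"}

# The three ScriptLink custom actions from the manifest above, written self-closing.
ELEMENTS_XML = """<Elements xmlns="http://schemas.microsoft.com/sharepoint/">
  <CustomAction ScriptSrc="~sitecollection/SiteAssets/js-class.js" Location="ScriptLink" Sequence="105" />
  <CustomAction ScriptSrc="~sitecollection/SiteAssets/bluff-min.js" Location="ScriptLink" Sequence="110" />
  <CustomAction ScriptSrc="~sitecollection/SiteAssets/jquery.min.js" Location="ScriptLink" Sequence="100" />
</Elements>"""


def script_links(elements_xml: str) -> list[tuple[int, str]]:
    """Return (Sequence, ScriptSrc) for each ScriptLink custom action, lowest Sequence first."""
    root = ET.fromstring(elements_xml)
    links = []
    for action in root.findall("sp:CustomAction", SP_NS):
        if action.get("Location") != "ScriptLink" or not action.get("ScriptSrc"):
            continue
        # Treating a missing Sequence as 0 is an assumption of this sketch.
        links.append((int(action.get("Sequence", "0")), action.get("ScriptSrc")))
    return sorted(links)


if __name__ == "__main__":
    for sequence, src in script_links(ELEMENTS_XML):
        print(sequence, src)
```

Run as-is, it prints the jquery.min.js entry (100) first, then js-class.js (105), then bluff-min.js (110).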
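To remove these orphaned script links, which were still registered in the site collection's UserCustomActions, I used PowerShell:
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
$s = Get-SPSite https://foo.com
$uca = $s.UserCustomActions | Select-Object -First 1
$uca  # Visually validate that this is the one to delete before running the next command
$uca.Delete()
# Repeat the previous three lines for each remaining orphaned entry, then release the site object
$s.Dispose()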

Tuesday, July 7, 2015

PerformancePoint connections to PowerPivot quit working after patching!

I recently patched a Data Warehouse farm (March 2015 CU + several security hotfixes), and users reported that their connections from PerformancePoint to PowerPivot workbooks quit working.  Error messages indicated that the PerformancePoint Unattended Service Account was getting access denied.

There were no Failure Audits in the Security Event log.  I validated Kerberos settings such as SPNs for everything, web app authentication, AD delegation, etc., and couldn't find anything.

I involved Microsoft, and we found the issue within 45 minutes; it reminded me of a Kerberos configuration I'd forgotten about.

PowerPivot Redirector
There's a PowerPivot redirector that lives on each server in the farm, or at least the configuration for one does.  It has its own private web.config file that also has to be switched from NTLM to Negotiate for the PerformancePoint -> PowerPivot connection to work.  According to the timestamps, the last time I'd updated that web.config was a couple of years ago.

The web.config for the redirector lives in <14hive>\ISAPI\PowerPivot

Two custom bindings need to be updated to look like this:

<customBinding>
  <binding name="RedirectorBinding">
    <webMessageEncoding webContentTypeMapperType="Microsoft.AnalysisServices.SharePoint.Integration.Redirector.RawContentTypeMapper, Microsoft.AnalysisServices.SharePoint.Integration" />
    <httpTransport manualAddressing="true" authenticationScheme="Negotiate" transferMode="Streamed" maxReceivedMessageSize="9223372036854775807"/>
  </binding>
  <binding name="RedirectorSecureBinding">
    <webMessageEncoding webContentTypeMapperType="Microsoft.AnalysisServices.SharePoint.Integration.Redirector.RawContentTypeMapper, Microsoft.AnalysisServices.SharePoint.Integration" />
    <httpsTransport manualAddressing="true" authenticationScheme="Negotiate" transferMode="Streamed" maxReceivedMessageSize="9223372036854775807"/>
  </binding>
</customBinding>

By default, authenticationScheme is set to NTLM.

After making the change on a server, perform an IISReset.  Make this change on each server in the farm.
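With several servers to touch, scripting the edit can help. The Python sketch below is a hypothetical helper (the function name and approach are mine, not part of PowerPivot or SharePoint): it sets authenticationScheme on the httpTransport/httpsTransport elements of the RedirectorBinding and RedirectorSecureBinding bindings and reports how many it changed. It assumes those elements carry no XML namespace, as in the snippet above, and that you run it against a backed-up copy of the redirector's web.config.

```python
import xml.etree.ElementTree as ET

# Binding and transport element names taken from the redirector web.config snippet.
REDIRECTOR_BINDINGS = {"RedirectorBinding", "RedirectorSecureBinding"}
TRANSPORTS = ("httpTransport", "httpsTransport")


def set_redirector_auth(config_xml: str, scheme: str = "Negotiate") -> tuple[str, int]:
    """Set authenticationScheme on the redirector bindings' transports.

    Returns the rewritten XML and the number of transport elements changed.
    """
    # Keep XML comments when round-tripping (insert_comments needs Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(config_xml, parser=parser)
    changed = 0
    for binding in root.iter("binding"):
        if binding.get("name") not in REDIRECTOR_BINDINGS:
            continue  # leave any other bindings alone
        for tag in TRANSPORTS:
            for transport in binding.findall(tag):
                if transport.get("authenticationScheme") != scheme:
                    transport.set("authenticationScheme", scheme)
                    changed += 1
    return ET.tostring(root, encoding="unicode"), changed
```

ElementTree may not preserve the XML declaration or the original whitespace, so diff the output against the original before swapping it in, and still do the IISReset on each server.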

EDIT:
Thought it would be good to include the file location.

EDIT2:
Given that this is in the PowerPivot Redirector, I believe it was caused by the SQL Server patches that were applied at the same time: SQL Server 2012 SP2 and CU6.

Wednesday, July 1, 2015

BCS Models - Part 9: Crawl-Time Security

This post is part of a nine-part series describing the process I have followed to be able to create a BCS model capable of indexing well over 10 million items.

Series Index:
BCS Models - Part 1: Target Database
BCS Models - Part 2: Initial BCS External Content Types
BCS Models - Part 3: Crawl Results
BCS Models - Part 4: Bigger Database
BCS Models - Part 5: The Bigger Database
BCS Models - Part 6: How to eat this elephant?
BCS Models - Part 7: Changes to the BCS Model to support segmented crawl
BCS Models - Part 8: Crawl Results 
BCS Models - Part 9: Crawl-Time Security  <-- You are here

In this final post of the series I add the crawl-time security that I mentioned in the very first post.  This feels like an easy step after all the prior work.

SharePoint can perform security trimming either at crawl time or at query time.  With crawl-time security trimming, a Windows security descriptor (listing the SIDs allowed to see the item) is captured once for each item as it is crawled.  With query-time security trimming, the back-end system is queried for security information while each and every query is being processed, while the user waits.  I've always chosen crawl-time security because the security of the items I've dealt with hasn't been dynamic enough to warrant query-time security.

With crawl-time security there are again a couple of options.  One is to provide a WindowsSecurityDescriptorField on the Finder or AssociationNavigator; the other is to provide an additional Method/MethodInstance that returns the WindowsSecurityDescriptorField.

Initially it seems like including the WindowsSecurityDescriptorField on the Finder/AssociationNavigator would be more efficient; however, if you do, the SpecificFinder is called for every item being crawled.  If the Finder/AssociationNavigator provides all the data needed to populate the entity, the call to the SpecificFinder strictly speaking isn't needed, except for one detail: the Windows security construct can easily grow to consume all of the item cache, so BCS just doesn't use it, even if specified.  I've run multiple test iterations to see if I could come up with a combination of Properties, etc. that would make this work better, but have failed.

Usually the most efficient solution is to create a new Method with a BinarySecurityDescriptorAccessor MethodInstance.  The Search Gatherer will use this method if present, and it's the approach I use.  Even better, my use cases haven't required unique SecurityDescriptors to be returned per row, but rather per Entity.  To simulate this I'm going to inject a new requirement into the model that was completed in BCS Models - Part 7: Changes to the BCS Model to support segmented crawl.  I'm going to say that the business group has decided that the Response Entity is to be secured with the AD group named 'SuperSecretStuff'.

I need just a few things to continue:


  1. A hexadecimal representation of the SecurityDescriptor
  2. A stored procedure that takes an item ID and returns the security descriptor for that ID.
  3. The new BCS Method and MethodInstance.

Getting the Windows SID

The SecurityDescriptor isn't too hard to get.  I don't recall where I got this PowerShell; I may have created it based upon some C# example.  It creates a security descriptor, removes all rights for Everyone, and adds a specific grant for the requested user or group.  The resulting security descriptor is then dumped out as a hex value.


 param($domain, $username)

function Convert-ByteArrayToHexString
{
[CmdletBinding()] Param (
 [Parameter(Mandatory = $True, ValueFromPipeline = $True)] [System.Byte[]] $ByteArray,
 [Parameter()] [Int] $Width = 10,
 [Parameter()] [String] $Delimiter = ",0x",
 [Parameter()] [String] $Prepend = "",
 [Parameter()] [Switch] $AddQuotes )

if ($Width -lt 1) { $Width = 1 }
 if ($ByteArray.Length -eq 0) { Return }
 $FirstDelimiter = $Delimiter -Replace "^[\,\\:\t]",""
 $From = 0
 $To = $Width - 1
 Do
 {
 $String = [System.BitConverter]::ToString($ByteArray[$From..$To])
 $String = $FirstDelimiter + ($String -replace "\-",$Delimiter)
 if ($AddQuotes) { $String = '"' + $String + '"' }
 if ($Prepend -ne "") { $String = $Prepend + $String }
 $String
 $From += $Width
 $To += $Width
 } While ($From -lt $ByteArray.Length)
}

$acct = new-object System.Security.Principal.NTAccount($domain, $username)
[System.Security.Principal.SecurityIdentifier]$sid = $acct.Translate([System.Security.Principal.SecurityIdentifier])
$controlFlagNone = [System.Security.AccessControl.ControlFlags]::None
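#Owner = the requested SID; a null DACL is replaced with one granting Everyone full access, which is removed below
$sd = new-object System.Security.AccessControl.CommonSecurityDescriptor($false, $false, $controlFlagNone,$sid,$null,$null,$null)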
#define some enums...
$worldSID = [System.Security.Principal.WellKnownSidType]::WorldSid
$accessAllow = [System.Security.AccessControl.AccessControlType]::Allow
$inheritanceNone = [System.Security.AccessControl.InheritanceFlags]::None
$propagationNone = [System.Security.AccessControl.PropagationFlags]::None
#get the wellknown sid for everyone
$everyone = new-object System.Security.Principal.SecurityIdentifier($worldSID,$null)
#Remove the Everyone full-access ACE so only the explicit grant below applies (discard the bool RemoveAccess returns)
[void]$sd.DiscretionaryAcl.RemoveAccess($accessAllow,$everyone,-1, $inheritanceNone,$propagationNone)
#Grant full access to the specified user/group.
$sd.DiscretionaryAcl.AddAccess($accessAllow,$sid,-1, $inheritanceNone,$propagationNone)
#Now get the binary representation of it and return that
$secDes = new-object byte[] $sd.BinaryLength
$sd.GetBinaryForm($secDes,0)
Convert-ByteArrayToHexString -bytearray $secDes -width $sd.BinaryLength -delimiter "" -prepend "0x"

This emits the security descriptor to the PowerShell window.
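To see what those bytes are, here is a Python sketch that assembles the same self-relative layout by hand for this single-ACE case: owner set to the group SID, no group SID or SACL, and a DACL with one access-allowed ACE whose mask is 0xFFFFFFFF (what the script's -1 becomes). It's my reconstruction from the Windows SECURITY_DESCRIPTOR/ACL/ACE layouts, not the .NET implementation, and it assumes a well-formed SID string. Fed the SID decoded from the literal in usp_getPostSecurity (S-1-5-21-2409875162-1251876049-939602284-1115), it reproduces that literal byte for byte.

```python
import struct

# Constants from the Windows SECURITY_DESCRIPTOR / ACL / ACE layouts.
SE_DACL_PRESENT = 0x0004
SE_SELF_RELATIVE = 0x8000
ACCESS_ALLOWED_ACE_TYPE = 0x00
ACL_REVISION = 2
SD_HEADER_SIZE = 20  # Revision, Sbz1, Control, then four 32-bit offsets


def sid_to_bytes(sid: str) -> bytes:
    """Convert an 'S-1-5-21-...' string into its binary SID form."""
    parts = sid.split("-")
    if len(parts) < 3 or parts[0].upper() != "S":
        raise ValueError(f"not a SID string: {sid!r}")
    revision, authority = int(parts[1]), int(parts[2])
    sub_authorities = [int(p) for p in parts[3:]]
    return (
        struct.pack("<BB", revision, len(sub_authorities))
        + authority.to_bytes(6, "big")  # identifier authority is big-endian
        + b"".join(struct.pack("<I", s) for s in sub_authorities)  # little-endian
    )


def build_security_descriptor(sid: str, access_mask: int = 0xFFFFFFFF) -> bytes:
    """Owner = sid, no group or SACL, and a DACL with one allow ACE for sid."""
    sid_bytes = sid_to_bytes(sid)
    # ACE header (type, flags, size) + access mask, followed by the SID.
    ace = struct.pack("<BBHI", ACCESS_ALLOWED_ACE_TYPE, 0, 8 + len(sid_bytes), access_mask) + sid_bytes
    # ACL header (revision, sbz1, size, ace count, sbz2), followed by the ACE.
    dacl = struct.pack("<BBHHH", ACL_REVISION, 0, 8 + len(ace), 1, 0) + ace
    header = struct.pack(
        "<BBHIIII",
        1, 0, SE_SELF_RELATIVE | SE_DACL_PRESENT,
        SD_HEADER_SIZE,                   # owner SID right after the header
        0, 0,                             # no group SID, no SACL
        SD_HEADER_SIZE + len(sid_bytes),  # DACL right after the owner SID
    )
    return header + sid_bytes + dacl


def to_sql_literal(sd: bytes) -> str:
    """Format bytes as a T-SQL binary literal (0x followed by uppercase hex)."""
    return "0x" + sd.hex().upper()


if __name__ == "__main__":
    print(to_sql_literal(build_security_descriptor("S-1-5-21-2409875162-1251876049-939602284-1115")))
```

For other groups, the PowerShell above remains the safer source of truth; this is mainly useful for understanding or spot-checking the value you paste into SQL.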

Security Stored Procedure

The stored procedure for my example is really simple.  Regardless of what ID is passed, it always returns the same value, in a column named SecurityDescriptor.
CREATE PROCEDURE usp_getPostSecurity @postID integer as 
-- the binary self-relative security descriptor granting the SuperSecretStuff group full access
select 0x0100048014000000000000000000000030000000010500000000000515000000DAC6A38FD11C9E4A6C3101385B04000002002C000100000000002400FFFFFFFF010500000000000515000000DAC6A38FD11C9E4A6C3101385B040000 as SecurityDescriptor

GO
grant execute on usp_getPostSecurity to spSearchCrawl
go
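Before wiring the procedure into BCS, it can be worth sanity-checking the literal. This Python sketch is an illustrative decoder of my own (it handles simple allow/deny ACEs only, not object ACEs) that turns a self-relative security descriptor written as a 0x... literal back into its owner SID and ACE list. Run against the literal in usp_getPostSecurity, it should report a control value of 0x8004 (self-relative, DACL present), an owner SID ending in -1115, and one access-allowed ACE with mask 0xFFFFFFFF for that same SID.

```python
import struct


def sid_from_bytes(data: bytes, offset: int) -> tuple[str, int]:
    """Decode the binary SID at offset; return (string form, size in bytes)."""
    revision, count = data[offset], data[offset + 1]
    authority = int.from_bytes(data[offset + 2:offset + 8], "big")  # 48-bit, big-endian
    subs = struct.unpack_from(f"<{count}I", data, offset + 8)      # 32-bit, little-endian
    text = "-".join(["S", str(revision), str(authority), *map(str, subs)])
    return text, 8 + 4 * count


def describe_security_descriptor(literal: str) -> dict:
    """Decode a self-relative security descriptor written as a 0x... hex literal.

    Handles simple ACEs only (header, mask, SID), not object ACEs.
    """
    hex_text = literal[2:] if literal.lower().startswith("0x") else literal
    data = bytes.fromhex(hex_text)
    _rev, _sbz1, control, owner_off, _group_off, _sacl_off, dacl_off = struct.unpack_from("<BBHIIII", data, 0)
    info = {"control": control, "owner": None, "aces": []}
    if owner_off:
        info["owner"] = sid_from_bytes(data, owner_off)[0]
    if dacl_off:
        _acl_rev, _sbz, _acl_size, ace_count, _sbz2 = struct.unpack_from("<BBHHH", data, dacl_off)
        pos = dacl_off + 8  # first ACE follows the 8-byte ACL header
        for _ in range(ace_count):
            ace_type, ace_flags, ace_size, mask = struct.unpack_from("<BBHI", data, pos)
            info["aces"].append({
                "type": ace_type,  # 0 = access allowed, 1 = access denied
                "flags": ace_flags,
                "mask": mask,
                "sid": sid_from_bytes(data, pos + 8)[0],
            })
            pos += ace_size
    return info
```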

With a BCS model based upon SQL Server, we cannot reference a .NET object to calculate the security descriptor on the fly.  If you truly need item-level security, you may be best served by pre-calculating the security descriptor and adding it to the underlying data.

New BCS Method

This is pretty straightforward by now, too.  I generally clone a SpecificFinder and then adapt it to my needs.

Here's my end result:
<Method IsStatic="false" Name="usp_getPostSecurity Response">
  <Properties>
    <Property Name="BackEndObject" Type="System.String">usp_getPostSecurity</Property>
    <Property Name="BackEndObjectType" Type="System.String">SqlServerRoutine</Property>
    <Property Name="RdbCommandText" Type="System.String">usp_getPostSecurity</Property>
    <Property Name="RdbCommandType" Type="System.Data.CommandType, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089">StoredProcedure</Property>
    <Property Name="Schema" Type="System.String">dbo</Property>
  </Properties>
  <Parameters>
    <Parameter Direction="In" Name="@postID">
      <TypeDescriptor TypeName="System.Nullable`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" IdentifierName="ID" Name="@postID" />
    </Parameter>
    <Parameter Direction="Return" Name="usp_getPostSecurity Return">
      <TypeDescriptor TypeName="System.Data.IDataReader, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" IsCollection="true" Name="usp_getPostSecurity DR">
        <TypeDescriptors>
          <TypeDescriptor TypeName="System.Data.IDataRecord, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" Name="usp_getPostSecurityElement">
            <TypeDescriptors>
              <TypeDescriptor Name="SecurityDescriptor" TypeName="System.Byte[]" IsCollection="true" ReadOnly="true"/>
            </TypeDescriptors>
          </TypeDescriptor>
        </TypeDescriptors>
      </TypeDescriptor>
    </Parameter>
  </Parameters>
  <MethodInstances>
    <MethodInstance Type="BinarySecurityDescriptorAccessor" ReturnParameterName="usp_getPostSecurity Return" ReturnTypeDescriptorPath="usp_getPostSecurity DR[0].SecurityDescriptor"  Name="usp_getPostSecurity Response Instance">
      <Properties>
        <Property Name="WindowsSecurityDescriptorField" Type="System.String">SecurityDescriptor</Property>
      </Properties>
    </MethodInstance>
  </MethodInstances>
</Method>
Although I don't plan to implement this on other Entities, I named the stored procedure fairly generically, and I like to incorporate the stored procedure name in the method name.  Here I also added the Entity name to the method name so it would be more obvious in the ULS.

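A significant difference with this MethodInstance is that it returns a single value to the caller instead of a DataReader.  The ReturnTypeDescriptorPath attribute, usp_getPostSecurity DR[0].SecurityDescriptor, is what makes that happen: it points at the SecurityDescriptor field of the first record.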
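To make the path concrete: it names the root TypeDescriptor (usp_getPostSecurity DR), picks record [0], then takes the SecurityDescriptor field. The toy Python resolver below models that walk over nested dicts and lists. It is only an illustration of the idea, not how BCS implements it, and it assumes TypeDescriptor names contain no dots.

```python
import re
from typing import Any

# One path step: a TypeDescriptor name, optionally followed by [index].
_STEP = re.compile(r"^(?P<name>.+?)(?:\[(?P<index>\d+)\])?$")


def resolve_path(value: Any, path: str) -> Any:
    """Walk a 'Name[0].Child' style path over nested dicts/lists (illustrative only)."""
    for step in path.split("."):
        match = _STEP.match(step)
        if match is None:
            raise ValueError(f"bad path step: {step!r}")
        value = value[match.group("name")]      # descend into the named descriptor
        if match.group("index") is not None:
            value = value[int(match.group("index"))]  # pick one record from the collection
    return value


if __name__ == "__main__":
    # A stand-in for the procedure's result: one record with one binary column.
    returned = {"usp_getPostSecurity DR": [{"SecurityDescriptor": b"\x01\x00\x04\x80"}]}
    print(resolve_path(returned, "usp_getPostSecurity DR[0].SecurityDescriptor"))
```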

Crawl Time

The proof is in the crawl.  So what does this look like in ULSViewer?  As is my custom when dealing with Search crawls over BCS, I filter by event ID c73i and see this:

We see the usual methods executing as always followed by masses of calls to the new MethodInstance.  Each item won't be considered 'crawled' in the SharePoint Search Log until the item-level security has been crawled, same as any other link.

Search Time

I created two new users, Joe and Ralph, and made Joe a member of the SuperSecretStuff group.  I did a search query for jQuery, and here are the results:

The screenshot also shows the AD members of the SuperSecretStuff AD group.
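Conceptually, crawl-time trimming compares the SIDs stored with each item against the SIDs in the querying user's token (the user plus their groups). The Python sketch below is a deliberately simplified model, not SharePoint's actual implementation: it ignores deny ACEs and access masks and uses placeholder SIDs, but it shows why Joe, through SuperSecretStuff, gets the Response items and Ralph doesn't.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedItem:
    """A search result plus the SIDs its crawled security descriptor allows."""
    url: str
    allowed_sids: set[str] = field(default_factory=set)


def trim_results(results: list[IndexedItem], token_sids: set[str]) -> list[str]:
    """Keep results whose allowed SIDs overlap the user's token SIDs (user + groups)."""
    return [item.url for item in results if item.allowed_sids & token_sids]


if __name__ == "__main__":
    # Placeholder SIDs and URLs, not real ones.
    results = [IndexedItem("Response/1", {"SID-SuperSecretStuff"})]
    joe = {"SID-Joe", "SID-SuperSecretStuff"}
    ralph = {"SID-Ralph"}
    print(trim_results(results, joe))    # ['Response/1']
    print(trim_results(results, ralph))  # []
```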

Crawl-time security is working.  To ensure that users have a good experience, however, make sure the proper security groups are granted execute rights on the proper BCS entities so they don't see error screens, but if you're still here you probably know this already.  The rights granted on the Entities within BCS probably need to align with whatever crawl-time security you've configured.  It doesn't help much to trim the search results if the user can manipulate the URL on the profile page, or get a link from a friend, and see the data that was crawled.  It's also nice to use security or audiences to hide the web parts that will continue to throw that error when users reach an item they shouldn't see.

Monday, June 29, 2015

BCS Models - Part 8: Crawl Results

This post is part of an eight-part series describing the process I have followed to be able to create a BCS model capable of indexing well over 10 million items.

Series Index:
BCS Models - Part 1: Target Database
BCS Models - Part 2: Initial BCS External Content Types
BCS Models - Part 3: Crawl Results
BCS Models - Part 4: Bigger Database
BCS Models - Part 5: The Bigger Database
BCS Models - Part 6: How to eat this elephant?
BCS Models - Part 7: Changes to the BCS Model to support segmented crawl
BCS Models - Part 8: Crawl Results <-- You are here
This series of blog posts reaches its culmination here: the proof that the BCS SQL Connector can be used to crawl extremely large LOB datasets in a way that doesn't kill the crawler.

I'm using the BCS Model created in BCS Models - Part 7: Changes to the BCS Model to support segmented crawl to crawl a database built from the March 2015 StackOverflow.StackExchange.com export.

To have the crawls run quickly, I've limited my segment size to 1000 rows and will crawl the first few segments to show the mechanics of the process.  Once I've done that, I'll let my little single-server farm attempt to index the entire corpus.

I'm going to use both SQL Server Profiler to capture the stored procedure invocations and ULSViewer to capture the c73i events.

Here are the results of my Full Crawl (limited to the first 5 segments on each of the segment entities):
In this trace we see that within 0.06 seconds all seven of the segment entities are being enumerated.  About a minute later we see the seven child Entities being crawled via their AssociationNavigators, and finally SharePoint needs to query three specific posts.  This is where giving each MethodInstance a meaningful name is helpful.

In the crawl log we see that we indexed 35,043 items.


The Crawl Queue Health Report shows how fast the links from the segment accumulated.


Now to investigate the impact of the segments on the crawler.  In Crushing the 1-million-item-limit myth with .NET Search Connector [BDC], Brian Pendergast shows that we can use event ID dw3a if we enable VerboseEx tracing on the search crawler categories.

Looking in the MSSCrawlURL table of the crawl database with a series of queries:
select * from msscrawlurl where ParentDocID = -1
select cu.* 
  from MSSCrawlURL cu
join MSSCrawlURL cu2 on cu2.docid = cu.ParentDocID
where cu2.ParentDocID = -1

select cu.*
  from MSSCrawlURL cu
join (
select top 1 cu.* 
  from MSSCrawlURL cu
join MSSCrawlURL cu2 on cu2.docid = cu.ParentDocID
where cu2.ParentDocID = -1
) a on a.DocID = cu.ParentDocID

select cu.parentdocid, count(cu.docid)
from MSSCrawlURL cu
join (
select cu.*
  from MSSCrawlURL cu
join (
select top 1 cu.* 
  from MSSCrawlURL cu
join MSSCrawlURL cu2 on cu2.docid = cu.ParentDocID
where cu2.ParentDocID = -1
) a on a.DocID = cu.ParentDocID
) b on cu.ParentDocID = b.DocID
group by cu.ParentDocID

select cu.DocId, cu.ParentDocID, cur.DisplayURL, cur.ErrorLevel,cur.ErrorDesc
from MSSCrawlURL cu
join (
select top 1 cu.*
  from MSSCrawlURL cu
join (
select top 1 cu.* 
  from MSSCrawlURL cu
join MSSCrawlURL cu2 on cu2.docid = cu.ParentDocID
where cu2.ParentDocID = -1
) a on a.DocID = cu.ParentDocID
) b on cu.ParentDocID = b.DocID
join MSSCrawlURLReport cur on cur.URLID = cu.DocID
I get results like:
  1. The first result set is the root of the crawl.
  2. The second result set is the records generated by enumerating the seven segment entities (Finder).  
  3. The third result set is showing the first 5 segments to be enumerated from the first segment entity.
  4. The fourth result set is showing the number of items enumerated in each of the segments for the first Entity
  5. The fifth result is the first few of the 1000 items crawled.
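The nested joins above walk the link tree one level at a time (root, segment entities, segments, items). An illustrative Python sketch of the same walk, over (DocID, ParentDocID) pairs you might export from MSSCrawlURL, counts the links at every level in one pass. The function is hypothetical, and unlike the queries it doesn't restrict itself to the first entity with TOP 1; for the crawl above the first three levels should come out as 1, 7 and 35.

```python
from collections import defaultdict


def links_per_level(rows: list[tuple[int, int]], root_parent: int = -1) -> list[int]:
    """Count crawl-queue links at each depth of the DocID/ParentDocID tree.

    rows holds (DocID, ParentDocID) pairs; the crawl root is the row whose
    ParentDocID equals root_parent (-1 in MSSCrawlURL). Assumes no cycles.
    """
    children: dict[int, list[int]] = defaultdict(list)
    for doc_id, parent_id in rows:
        children[parent_id].append(doc_id)

    counts = []
    level = children.get(root_parent, [])
    while level:
        counts.append(len(level))
        # The next level is every child of every DocID on this level.
        level = [child for doc_id in level for child in children.get(doc_id, [])]
    return counts
```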

I ran a full crawl with VerboseEx tracing enabled on all trace categories with *crawl* in the name.  The ULSViewer screenshots below are filtered to event IDs c73i and dw3a.
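ULSViewer does this filtering interactively. If you'd rather script it against a raw .log file, here is a hypothetical Python sketch that keeps only the rows with the event IDs of interest. It assumes the usual tab-separated ULS layout whose header row includes an EventID column (header names are space-padded, hence the strip); read the file with the right text encoding and pass its contents in.

```python
import csv
import io


def filter_uls(log_text: str, event_ids: set[str]) -> list[dict[str, str]]:
    """Return ULS rows whose EventID column is one of event_ids.

    Assumes a tab-separated log whose first row is the column header.
    """
    # QUOTE_NONE: ULS messages can contain quote characters that are not CSV quoting.
    reader = csv.reader(io.StringIO(log_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:
        return []
    names = [h.strip() for h in header]
    matches = []
    for fields in reader:
        record = dict(zip(names, (f.strip() for f in fields)))
        if record.get("EventID") in event_ids:
            matches.append(record)
    return matches
```

Calling it with {"c73i", "dw3a"} gives the same subset of rows as the filtered ULSViewer views.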

Here's the first link being written (dw3a) that the whole crawl seems to hang off of.  This record has the -1 SourceDocID:

This is followed by seven more links being written, one for each of the Segment Entities.  The highlighted line shows a SourceDocID of 1, as do all seven of these records.

This is followed by the seven Segment Entities' MethodInstances being invoked.

These are followed by more link inserts, for the Entities in each segment.

And the ULS continues in similar fashion until I stopped the crawl and it cleaned itself up.  The Crawl Queue Chart looks like this:

So here we see that the crawl identified nearly 8 million links in less than three hours on my small VM while also crawling the items.  With an appropriately sized farm, I'm certain this would successfully crawl all the items without issuing any query that brought back more than 100,000 results, and while inserting the items into the temp table for further processing without first accumulating all of the rows for each Entity.

I believe this demonstrates that we can configure BCS to crawl large SQL Server data sets without having to write code to replace the delivered SQL Server Connector. 

Sunday, June 28, 2015

BCS Models - Part 7: Changes to the BCS Model to support segmented crawl

This post is part of an eight-part series describing the process I have followed to be able to create a BCS model capable of indexing well over 10 million items.

Series Index:
BCS Models - Part 1: Target Database
BCS Models - Part 2: Initial BCS External Content Types
BCS Models - Part 3: Crawl Results
BCS Models - Part 4: Bigger Database
BCS Models - Part 5: The Bigger Database
BCS Models - Part 6: How to eat this elephant?
BCS Models - Part 7: Changes to the BCS Model to support segmented crawl <-- You are here
BCS Models - Part 8: Crawl Results

In BCS Models - Part 6: How to eat this elephant? I created a number of stored procedures to segment the data so we can take smaller bites of our huge elephant of data, StackOverflow.

Here's a representative entity describing the QuestionSegment Entity.  There doesn't seem to be a good way to deal with the wide XML, so it's a bit ugly.  

<Entity Namespace="stackexchange.so" Version="1.0.0.0" EstimatedInstanceCount="10000" Name="QuestionSegment" DefaultDisplayName="QuestionSegment">
  <Properties>
    <Property Name="DefaultAction" Type="System.String">View Profile</Property>
  </Properties>
  <Identifiers>
    <Identifier TypeName="System.Int32" Name="segmentNumber" />
  </Identifiers>
  <Methods>
    <Method IsStatic="false" Name="usp_getQuestionSegmentsRead List">
      <Properties>
        <Property Name="BackEndObject" Type="System.String">usp_getQuestionSegments</Property>
        <Property Name="BackEndObjectType" Type="System.String">SqlServerRoutine</Property>
        <Property Name="RdbCommandText" Type="System.String">[dbo].[usp_getQuestionSegments]</Property>
        <Property Name="RdbCommandType" Type="System.Data.CommandType, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089">StoredProcedure</Property>
        <Property Name="Schema" Type="System.String">dbo</Property>
      </Properties>
      <Parameters>
        <Parameter Direction="Return" Name="usp_getQuestionSegmentsRead List Return">
          <TypeDescriptor TypeName="System.Data.IDataReader, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" IsCollection="true" Name="usp_getQuestionSegmentsRead List Collection">
            <TypeDescriptors>
              <TypeDescriptor TypeName="System.Data.IDataRecord, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" Name="usp_getQuestionSegmentsRead ListElement">
                <TypeDescriptors>
                  <TypeDescriptor TypeName="System.Int32" ReadOnly="true" IdentifierName="segmentNumber" Name="segmentNumber" />
                  <TypeDescriptor TypeName="System.Int32" ReadOnly="true" Name="lowerID" />
                  <TypeDescriptor TypeName="System.Int32" ReadOnly="true" Name="upperID" />
                </TypeDescriptors>
              </TypeDescriptor>
            </TypeDescriptors>
          </TypeDescriptor>
        </Parameter>
      </Parameters>
      <MethodInstances>
        <MethodInstance Type="Finder" ReturnParameterName="usp_getQuestionSegmentsRead List Return" Default="true" Name="usp_getQuestionSegmentsRead List Instance">
          <Properties>
            <Property Name="RootFinder" Type="System.String"></Property>
            <Property Name="UseClientCachingForSearch" Type="System.String"></Property>
          </Properties>
        </MethodInstance>
      </MethodInstances>
    </Method>
    <Method IsStatic="false" Name="usp_getQuestionSegmentRead Item">
      <Properties>
        <Property Name="BackEndObject" Type="System.String">usp_getQuestionSegment</Property>
        <Property Name="BackEndObjectType" Type="System.String">SqlServerRoutine</Property>
        <Property Name="RdbCommandText" Type="System.String">[dbo].[usp_getQuestionSegment]</Property>
        <Property Name="RdbCommandType" Type="System.Data.CommandType, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089">StoredProcedure</Property>
        <Property Name="Schema" Type="System.String">dbo</Property>
      </Properties>
      <Parameters>
        <Parameter Direction="In" Name="@segmentNumber">
          <TypeDescriptor TypeName="System.Int32" IdentifierName="segmentNumber" Name="@segmentNumber" />
        </Parameter>
        <Parameter Direction="Return" Name="usp_getQuestionSegmentRead Item Return">
          <TypeDescriptor TypeName="System.Data.IDataReader, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" IsCollection="true" Name="usp_getQuestionSegmentRead Item Collection">
            <TypeDescriptors>
              <TypeDescriptor TypeName="System.Data.IDataRecord, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" Name="usp_getQuestionSegmentRead ItemElement">
                <TypeDescriptors>
                  <TypeDescriptor TypeName="System.Int32" ReadOnly="true" IdentifierName="segmentNumber"
                     Name="segmentNumber" />
                  <TypeDescriptor TypeName="System.Int32" ReadOnly="true" Name="lowerID" />
                  <TypeDescriptor TypeName="System.Int32" ReadOnly="true" Name="upperID" />
                  <TypeDescriptor TypeName="System.Int64" ReadOnly="true" Name="DeletedCount" />
                </TypeDescriptors>
              </TypeDescriptor>
            </TypeDescriptors>
          </TypeDescriptor>
        </Parameter>
      </Parameters>
      <MethodInstances>
        <MethodInstance Type="SpecificFinder" ReturnParameterName="usp_getQuestionSegmentRead Item Return"
          ReturnTypeDescriptorPath="usp_getQuestionSegmentRead Item Collection[0]" Default="true"
          Name="usp_getQuestionSegmentRead Item Instance">
          <Properties>
            <Property Name="DeletedCountField" Type="System.String">DeletedCount</Property>
            <Property Name="UseClientCachingForSearch" Type="System.String"></Property>
          </Properties>
        </MethodInstance>
      </MethodInstances>
    </Method>
  </Methods>
  <Actions>
    <Action Position="1" IsOpenedInNewWindow="false" Url="http://sp2013lab:80/sites/ECT/_bdc/stackexchange_so/QuestionSegment_2.aspx?segmentNumber={0}" ImageUrl="/_layouts/15/1033/images/viewprof.gif" Name="View Profile">
      <LocalizedDisplayNames>
        <LocalizedDisplayName LCID="1033">View Profile</LocalizedDisplayName>
      </LocalizedDisplayNames>
      <Properties>
        <Property Name="IsTaskpaneAction" Type="System.Boolean">true</Property>
        <Property Name="Office Version" Type="System.String">15</Property>
      </Properties>
      <ActionParameters>
        <ActionParameter Index="0" Name="segmentNumber[0]">
          <Properties>
            <Property Name="IdOrdinal" Type="System.Byte">0</Property>
          </Properties>
        </ActionParameter>
      </ActionParameters>
    </Action>
  </Actions>
</Entity>



Three lines are worth calling out.  The first is the RootFinder property on the Finder MethodInstance, which tells the crawler to start crawling here.  Each of the Segment Entities will have this, causing the crawl to go after all of them at the same time.

The second is the TypeDescriptor for DeletedCount, which is a System.Int64.  This field supports incremental crawls by telling the crawler how many deleted rows there are.  The data sources I've used never really deleted any data, so I've always made my supporting SQL return 0 cast as a BIGINT.  The BIGINT is required by BCS.

The third is the DeletedCountField property on the SpecificFinder MethodInstance, which also supports incremental crawls and tells BCS which of the fields is the DeletedCountField.
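The segmenting procedures themselves live in Part 6. To make the shape of the rows concrete, here is a Python sketch that produces rows with the same columns the Finder (segmentNumber, lowerID, upperID) and SpecificFinder (plus DeletedCount) return. The fixed-width ID ranges and zero-based numbering are assumptions for illustration, not necessarily the rule the Part 6 procedures use.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """One row as returned by the segment Finder/SpecificFinder (column names from the model)."""
    segmentNumber: int
    lowerID: int
    upperID: int
    DeletedCount: int = 0  # always 0 here; BCS expects an Int64 (BIGINT) for this field


def segments_for(max_id: int, segment_size: int = 1000) -> list[Segment]:
    """Split IDs 1..max_id into fixed-width, zero-numbered ranges (hypothetical rule)."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    count = (max_id + segment_size - 1) // segment_size  # ceiling division
    return [
        Segment(n, n * segment_size + 1, min((n + 1) * segment_size, max_id))
        for n in range(count)
    ]


def specific_finder(segments: list[Segment], segment_number: int) -> Segment:
    """Look up one segment by its identifier, as the SpecificFinder does."""
    for segment in segments:
        if segment.segmentNumber == segment_number:
            return segment
    raise KeyError(segment_number)
```

The Finder role corresponds to returning the whole list; the SpecificFinder role returns one row by segmentNumber, with the DeletedCount column that the DeletedCountField property points at.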

Here's the Question Entity:
<Entity Namespace="stackexchange.so" Version="1.1.0.0" EstimatedInstanceCount="10000" Name="Question" DefaultDisplayName="Question">
  <Properties>
    <Property Name="DefaultAction" Type="System.String">View Profile</Property>
  </Properties>
  <Identifiers>
    <Identifier TypeName="System.Int32" Name="ID" />
  </Identifiers>
  <Methods>
    <Method IsStatic="false" Name="usp_GetQuestionsBySegment AssociationNavigator">
      <Properties>
        <Property Name="BackEndObject" Type="System.String">usp_GetQuestionsBySegment</Property>
        <Property Name="BackEndObjectType" Type="System.String">SqlServerRoutine</Property>
        <Property Name="RdbCommandText" Type="System.String">usp_GetQuestionsBySegment</Property>
        <Property Name="RdbCommandType" Type="System.Data.CommandType, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089">StoredProcedure</Property>
        <Property Name="Schema" Type="System.String">dbo</Property>
      </Properties>
      <FilterDescriptors>
        <FilterDescriptor Type="Input" Name="LastCrawlTime">
          <Properties>
            <Property Name="CrawlStartTime" Type="System.String"></Property>
          </Properties>
        </FilterDescriptor>
      </FilterDescriptors>
      <Parameters>
        <Parameter Direction="In" Name="@segmentNumber">
          <TypeDescriptor TypeName="System.Int32" IdentifierName="segmentNumber"
           IdentifierEntityName="QuestionSegment" IdentifierEntityNamespace="stackexchange.so"
           ForeignIdentifierAssociationName="usp_GetQuestionsBySegment AssociationNavigator Instance"
           Name="@segmentNumber" />
        </Parameter>
        <Parameter Direction="In" Name="@lastRunDate">
          <TypeDescriptor TypeName="System.DateTime" AssociatedFilter="LastCrawlTime" Name="lastModifiedTime">
            <Interpretation>
              <NormalizeDateTime LobDateTimeMode="Local" />
            </Interpretation>
          </TypeDescriptor>
        </Parameter>
        <Parameter Direction="Return" Name="usp_GetQuestionsBySegment Return">
          <TypeDescriptor TypeName="System.Data.IDataReader, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" IsCollection="true" Name="uv_AllQuestionsRead List">
            <TypeDescriptors>
              <TypeDescriptor TypeName="System.Data.IDataRecord, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" Name="uv_AllQuestionsRead ListElement">
                <TypeDescriptors>
                  <TypeDescriptor TypeName="System.Int32" ReadOnly="true" IdentifierName="ID" Name="ID" />
                  <TypeDescriptor TypeName="System.Nullable`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="ParentId" />
                  <TypeDescriptor TypeName="System.Nullable`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="AnswerCount" />
                  <TypeDescriptor TypeName="System.String" Name="Body">
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToEmptyString" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Nullable`1[[System.DateTime, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="ClosedDate">
                    <Interpretation>
                      <NormalizeDateTime LobDateTimeMode="UTC" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Nullable`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="CommentCount" />
                  <TypeDescriptor TypeName="System.Nullable`1[[System.DateTime, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="CommunityOwnedDate">
                    <Interpretation>
                      <NormalizeDateTime LobDateTimeMode="UTC" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.DateTime" Name="CreationDate">
                    <Properties>
                      <Property Name="RequiredInForms" Type="System.Boolean">true</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeDateTime LobDateTimeMode="UTC" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Nullable`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="FavoriteCount" />
                  <TypeDescriptor TypeName="System.DateTime" Name="LastActivityDate">
                    <Properties>
                      <Property Name="RequiredInForms" Type="System.Boolean">true</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeDateTime LobDateTimeMode="UTC" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Nullable`1[[System.DateTime, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="LastEditDate">
                    <Interpretation>
                      <NormalizeDateTime LobDateTimeMode="UTC" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.String" Name="LastEditorDisplayName">
                    <Properties>
                      <Property Name="Size" Type="System.Int32">40</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToNull" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Int32" Name="Score">
                    <Properties>
                      <Property Name="RequiredInForms" Type="System.Boolean">true</Property>
                    </Properties>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.String" Name="Tags">
                    <Properties>
                      <Property Name="Size" Type="System.Int32">150</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToNull" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.String" Name="Title">
                    <Properties>
                      <Property Name="Size" Type="System.Int32">250</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToNull" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Int32" Name="ViewCount">
                    <Properties>
                      <Property Name="RequiredInForms" Type="System.Boolean">true</Property>
                    </Properties>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.String" Name="DisplayName">
                    <Properties>
                      <Property Name="Size" Type="System.Int32">40</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToNull" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.String" Name="PostType">
                    <Properties>
                      <Property Name="Size" Type="System.Int32">50</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToNull" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Int32" Name="segmentNumber" />
                </TypeDescriptors>
              </TypeDescriptor>
            </TypeDescriptors>
          </TypeDescriptor>
        </Parameter>
      </Parameters>
      <MethodInstances>
        <Association Name="usp_GetQuestionsBySegment AssociationNavigator Instance" Type="AssociationNavigator" ReturnParameterName="usp_GetQuestionsBySegment Return">
          <Properties>
            <Property Name="DirectoryLink" Type="System.String"></Property>
            <Property Name="ForeignFieldMappings" Type="System.String">
              &lt;?xml version="1.0" encoding="utf-16"?&gt;
              &lt;ForeignFieldMappings xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"&gt;
              &lt;ForeignFieldMappingsList&gt;
              &lt;ForeignFieldMapping ForeignIdentifierName="segmentNumber" ForeignIdentifierEntityName="QuestionSegment" ForeignIdentifierEntityNamespace="stackexchange.so" FieldName="segmentNumber" /&gt;
              &lt;/ForeignFieldMappingsList&gt;
              &lt;/ForeignFieldMappings&gt;
            </Property>
            <Property Name="LastModifiedTimeStampField" Type="System.String">LastEditDate</Property>
            <Property Name="UseClientCachingForSearch" Type="System.String"></Property>
          </Properties>
          <SourceEntity Namespace="stackexchange.so" Name="QuestionSegment" />
          <DestinationEntity Namespace="stackexchange.so" Name="Question" />
        </Association>
      </MethodInstances>
    </Method>
    <Method IsStatic="false" Name="usp_getPostByID Question">
      <Properties>
        <Property Name="BackEndObject" Type="System.String">usp_getPostByID</Property>
        <Property Name="BackEndObjectType" Type="System.String">SqlServerRoutine</Property>
        <Property Name="RdbCommandText" Type="System.String">[dbo].[usp_getPostByID]</Property>
        <Property Name="RdbCommandType" Type="System.Data.CommandType, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089">StoredProcedure</Property>
        <Property Name="Schema" Type="System.String">dbo</Property>
      </Properties>
      <Parameters>
        <Parameter Direction="In" Name="@postID">
          <TypeDescriptor TypeName="System.Nullable`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" IdentifierName="ID" Name="@postID" />
        </Parameter>
        <Parameter Direction="Return" Name="usp_getPostByID">
          <TypeDescriptor TypeName="System.Data.IDataReader, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" IsCollection="true" Name="usp_getPostByID">
            <TypeDescriptors>
              <TypeDescriptor TypeName="System.Data.IDataRecord, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" Name="usp_getPostByIDElement">
                <TypeDescriptors>
                  <TypeDescriptor TypeName="System.Int32" ReadOnly="true" IdentifierName="ID" Name="ID" />
                  <TypeDescriptor TypeName="System.Nullable`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="ParentId" />
                  <TypeDescriptor TypeName="System.Nullable`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="AnswerCount" />
                  <TypeDescriptor TypeName="System.String" Name="Body">
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToEmptyString" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Nullable`1[[System.DateTime, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="ClosedDate">
                    <Interpretation>
                      <NormalizeDateTime LobDateTimeMode="UTC" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Nullable`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="CommentCount" />
                  <TypeDescriptor TypeName="System.Nullable`1[[System.DateTime, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="CommunityOwnedDate">
                    <Interpretation>
                      <NormalizeDateTime LobDateTimeMode="UTC" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.DateTime" Name="CreationDate">
                    <Interpretation>
                      <NormalizeDateTime LobDateTimeMode="UTC" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Nullable`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="FavoriteCount" />
                  <TypeDescriptor TypeName="System.DateTime" Name="LastActivityDate">
                    <Interpretation>
                      <NormalizeDateTime LobDateTimeMode="UTC" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Nullable`1[[System.DateTime, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="LastEditDate">
                    <Interpretation>
                      <NormalizeDateTime LobDateTimeMode="UTC" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.String" Name="LastEditorDisplayName">
                    <Properties>
                      <Property Name="Size" Type="System.Int32">40</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToNull" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Int32" Name="Score" />
                  <TypeDescriptor TypeName="System.String" Name="Tags">
                    <Properties>
                      <Property Name="Size" Type="System.Int32">150</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToNull" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.String" Name="Title">
                    <Properties>
                      <Property Name="Size" Type="System.Int32">250</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToNull" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Int32" Name="ViewCount" />
                  <TypeDescriptor TypeName="System.String" Name="DisplayName">
                    <Properties>
                      <Property Name="Size" Type="System.Int32">40</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToNull" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.String" Name="PostType">
                    <Properties>
                      <Property Name="Size" Type="System.Int32">50</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToNull" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Nullable`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="AcceptedAnswerId" />
                </TypeDescriptors>
              </TypeDescriptor>
            </TypeDescriptors>
          </TypeDescriptor>
        </Parameter>
      </Parameters>
      <MethodInstances>
        <MethodInstance Type="SpecificFinder" ReturnParameterName="usp_getPostByID" ReturnTypeDescriptorPath="usp_getPostByID[0]" Default="true" Name="usp_getPostByID Question Instance">
          <Properties>
            <Property Name="LastDesignedOfficeItemType" Type="System.String">None</Property>
          </Properties>
        </MethodInstance>
      </MethodInstances>
    </Method>
  </Methods>
  <AssociationGroups>
    <AssociationGroup Name="QuestionSegment-Question">
      <AssociationReference AssociationName="usp_GetQuestionsBySegment AssociationNavigator Instance" Reverse="false" EntityNamespace="stackexchange.so" EntityName="Question" />
    </AssociationGroup>
  </AssociationGroups>
  <Actions>
    <Action Position="1" IsOpenedInNewWindow="false" Url="http://sp2013lab:80/sites/ECT/_bdc/stackexchange_so/Question_2.aspx?ID={0}" ImageUrl="/_layouts/15/1033/images/viewprof.gif" Name="View Profile">
      <LocalizedDisplayNames>
        <LocalizedDisplayName LCID="1033">View Profile</LocalizedDisplayName>
      </LocalizedDisplayNames>
      <Properties>
        <Property Name="IsTaskpaneAction" Type="System.Boolean">true</Property>
        <Property Name="Office Version" Type="System.String">15</Property>
      </Properties>
      <ActionParameters>
        <ActionParameter Index="0" Name="ID[0]">
          <Properties>
            <Property Name="IdOrdinal" Type="System.Byte">0</Property>
          </Properties>
        </ActionParameter>
      </ActionParameters>
    </Action>
  </Actions>

</Entity>

Here are the differences between this Entity definition and the previous generation of the Entity definition:

  • The Finder method has been removed.
    • This ensures that the crawler won't crawl this entity directly, which would undercut all our work to segment the data crawl.  Remember that the crawler starts from an Entity that either has a Finder with the RootFinder property, or defines both a SpecificFinder and a Finder method.
  • The ChangedIdEnumerator and DeletedIdEnumerator methods have been removed.
    • Even if they were provided, the crawler wouldn't call them.
  • A new Association method is defined to represent the Association from the QuestionSegment entity to the Question entity.  The Association method has a property named DirectoryLink.
    • This is the whole purpose of the new model.
    • The presence of the DirectoryLink property causes the crawler to treat the source of the Association as a directory, or container.
    • Each container enumeration is processed independently of the others.  This is what gives us multiple, smaller result sets that enable the crawler to use less memory and survive the encounter.
  • We have a Filter and a Parameter on the new Association method:
    • <FilterDescriptor Type="Input" Name="LastCrawlTime">
        <Properties>
          <Property Name="CrawlStartTime" Type="System.String"></Property>
        </Properties>
      </FilterDescriptor>
    • This supports the incremental crawl.
    • The CrawlStartTime property causes SharePoint to pass in the time the previous crawl of the current crawl type was performed, except for full crawls.  For full crawls, I've seen either '1900-01-01 00:00:00' or '1899-12-31 18:00:00' passed into the filter.
      • The significance is that the first incremental crawl functions like a full crawl, because the same CrawlStartTime value is passed in.
  • We have also specified a new In parameter:
    • <Parameter Direction="In" Name="@lastRunDate">
        <TypeDescriptor TypeName="System.DateTime" AssociatedFilter="LastCrawlTime"
        Name="lastModifiedTime">
          <Interpretation>
            <NormalizeDateTime LobDateTimeMode="Local" />
          </Interpretation>
        </TypeDescriptor>
      </Parameter>
    • This supports the incremental crawl.
    • This associates the filter value with the parameter and passes it to the back end, where our stored procedure can use it to limit the results.
  • We have also defined the 'LastModifiedTimeStampField' property.
    • This enables the crawler to perform a true incremental crawl even on the first incremental run.  The crawler compares this field's value with the records already present in the index, so it doesn't have to replace all of the data it read during the full crawl, which speeds up the process.
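To make the crawl flow in the list above more concrete, here is a small Python simulation. It is not SharePoint's crawler or my actual stored procedure; it is a hypothetical model that assumes usp_GetQuestionsBySegment returns only one segment's rows modified after @lastRunDate, that a post's last-modified time is LastEditDate falling back to CreationDate, and that the index keeps one timestamp per ID. The names `Question`, `get_questions_by_segment`, `crawl`, and `SENTINEL` are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional

# Hypothetical sentinel: one of the values observed being passed into the
# LastCrawlTime filter on a full crawl (and so on the first incremental crawl).
SENTINEL = datetime(1900, 1, 1)


@dataclass
class Question:
    id: int
    segment_number: int
    creation_date: datetime
    last_edit_date: Optional[datetime] = None  # nullable, like LastEditDate


def last_modified(q: Question) -> datetime:
    # Assumption: an unedited post counts as modified when it was created.
    return q.last_edit_date if q.last_edit_date is not None else q.creation_date


def get_questions_by_segment(rows: List[Question], segment_number: int,
                             last_run_date: datetime) -> List[Question]:
    """Hypothetical stand-in for usp_GetQuestionsBySegment: returns only the
    rows in one segment that changed after @lastRunDate."""
    return [q for q in rows
            if q.segment_number == segment_number
            and last_modified(q) > last_run_date]


def crawl(rows: List[Question], segments: List[int],
          index: Dict[int, datetime], last_run_date: datetime) -> List[int]:
    """Enumerate each segment (container) separately and update the index.

    Returns the IDs that were (re)indexed. Only one segment's result set is
    materialized at a time, mirroring the independent container enumerations.
    """
    updated: List[int] = []
    for seg in segments:
        for q in get_questions_by_segment(rows, seg, last_run_date):
            stamp = last_modified(q)
            # LastModifiedTimeStampField-style check: skip items whose indexed
            # timestamp already matches the source.
            if index.get(q.id) == stamp:
                continue
            index[q.id] = stamp
            updated.append(q.id)
    return updated


if __name__ == "__main__":
    rows = [
        Question(1, 0, datetime(2015, 1, 1)),
        Question(2, 0, datetime(2015, 1, 2), datetime(2015, 3, 1)),
        Question(3, 1, datetime(2015, 2, 1)),
    ]
    index: Dict[int, datetime] = {}
    print(crawl(rows, [0, 1], index, SENTINEL))  # full crawl: [1, 2, 3]
    print(crawl(rows, [0, 1], index, SENTINEL))  # first incremental: []
    rows[2].last_edit_date = datetime(2015, 4, 1)
    print(crawl(rows, [0, 1], index, datetime(2015, 3, 15)))  # [3]
```

The second call shows why the timestamp comparison matters in this model: the first incremental crawl receives the same sentinel as the full crawl and gets every row back, but nothing is re-indexed because the stored timestamps already match.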
The other Entities all follow this pattern.  The entire model is available for download here.

5/7/2017: Link changed to GitHub.