Sunday, June 28, 2015

BCS Models - Part 7: Changes to the BCS Model to support segmented crawl

This post is part of an eight-part series describing the process I followed to create a BCS model capable of indexing well over 10 million items.

Series Index:
BCS Models - Part 1: Target Database
BCS Models - Part 2: Initial BCS External Content Types
BCS Models - Part 3: Crawl Results
BCS Models - Part 4: Bigger Database
BCS Models - Part 5: The Bigger Database
BCS Models - Part 6: How to eat this elephant?
BCS Models - Part 7: Changes to the BCS Model to support segmented crawl <-- You are here
BCS Models - Part 8: Crawl Results

In BCS Models - Part 6: How to eat this elephant? I created a number of stored procedures that segment the data so we can take smaller bites of our huge elephant of data, Stack Overflow.

Here's a representative entity definition, the QuestionSegment Entity.  There doesn't seem to be a good way to display the wide XML, so it's a bit ugly.

<Entity Namespace="stackexchange.so" Version="1.0.0.0" EstimatedInstanceCount="10000" Name="QuestionSegment" DefaultDisplayName="QuestionSegment">
  <Properties>
    <Property Name="DefaultAction" Type="System.String">View Profile</Property>
  </Properties>
  <Identifiers>
    <Identifier TypeName="System.Int32" Name="segmentNumber" />
  </Identifiers>
  <Methods>
    <Method IsStatic="false" Name="usp_getQuestionSegmentsRead List">
      <Properties>
        <Property Name="BackEndObject" Type="System.String">usp_getQuestionSegments</Property>
        <Property Name="BackEndObjectType" Type="System.String">SqlServerRoutine</Property>
        <Property Name="RdbCommandText" Type="System.String">[dbo].[usp_getQuestionSegments]</Property>
        <Property Name="RdbCommandType" Type="System.Data.CommandType, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089">StoredProcedure</Property>
        <Property Name="Schema" Type="System.String">dbo</Property>
      </Properties>
      <Parameters>
        <Parameter Direction="Return" Name="usp_getQuestionSegmentsRead List Return">
          <TypeDescriptor TypeName="System.Data.IDataReader, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" IsCollection="true" Name="usp_getQuestionSegmentsRead List Collection">
            <TypeDescriptors>
              <TypeDescriptor TypeName="System.Data.IDataRecord, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" Name="usp_getQuestionSegmentsRead ListElement">
                <TypeDescriptors>
                  <TypeDescriptor TypeName="System.Int32" ReadOnly="true" IdentifierName="segmentNumber" Name="segmentNumber" />
                  <TypeDescriptor TypeName="System.Int32" ReadOnly="true" Name="lowerID" />
                  <TypeDescriptor TypeName="System.Int32" ReadOnly="true" Name="upperID" />
                </TypeDescriptors>
              </TypeDescriptor>
            </TypeDescriptors>
          </TypeDescriptor>
        </Parameter>
      </Parameters>
      <MethodInstances>
        <MethodInstance Type="Finder" ReturnParameterName="usp_getQuestionSegmentsRead List Return" Default="true" Name="usp_getQuestionSegmentsRead List Instance">
          <Properties>
            <Property Name="RootFinder" Type="System.String"></Property>
            <Property Name="UseClientCachingForSearch" Type="System.String"></Property>
          </Properties>
        </MethodInstance>
      </MethodInstances>
    </Method>
    <Method IsStatic="false" Name="usp_getQuestionSegmentRead Item">
      <Properties>
        <Property Name="BackEndObject" Type="System.String">usp_getQuestionSegment</Property>
        <Property Name="BackEndObjectType" Type="System.String">SqlServerRoutine</Property>
        <Property Name="RdbCommandText" Type="System.String">[dbo].[usp_getQuestionSegment]</Property>
        <Property Name="RdbCommandType" Type="System.Data.CommandType, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089">StoredProcedure</Property>
        <Property Name="Schema" Type="System.String">dbo</Property>
      </Properties>
      <Parameters>
        <Parameter Direction="In" Name="@segmentNumber">
          <TypeDescriptor TypeName="System.Int32" IdentifierName="segmentNumber" Name="@segmentNumber" />
        </Parameter>
        <Parameter Direction="Return" Name="usp_getQuestionSegmentRead Item Return">
          <TypeDescriptor TypeName="System.Data.IDataReader, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" IsCollection="true" Name="usp_getQuestionSegmentRead Item Collection">
            <TypeDescriptors>
              <TypeDescriptor TypeName="System.Data.IDataRecord, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" Name="usp_getQuestionSegmentRead ItemElement">
                <TypeDescriptors>
                  <TypeDescriptor TypeName="System.Int32" ReadOnly="true" IdentifierName="segmentNumber"
                     Name="segmentNumber" />
                  <TypeDescriptor TypeName="System.Int32" ReadOnly="true" Name="lowerID" />
                  <TypeDescriptor TypeName="System.Int32" ReadOnly="true" Name="upperID" />
                  <TypeDescriptor TypeName="System.Int64" ReadOnly="true" Name="DeletedCount" />
                </TypeDescriptors>
              </TypeDescriptor>
            </TypeDescriptors>
          </TypeDescriptor>
        </Parameter>
      </Parameters>
      <MethodInstances>
        <MethodInstance Type="SpecificFinder" ReturnParameterName="usp_getQuestionSegmentRead Item Return"
          ReturnTypeDescriptorPath="usp_getQuestionSegmentRead Item Collection[0]" Default="true"
          Name="usp_getQuestionSegmentRead Item Instance">
          <Properties>
            <Property Name="DeletedCountField" Type="System.String">DeletedCount</Property>
            <Property Name="UseClientCachingForSearch" Type="System.String"></Property>
          </Properties>
        </MethodInstance>
      </MethodInstances>
    </Method>
  </Methods>
  <Actions>
    <Action Position="1" IsOpenedInNewWindow="false" Url="http://sp2013lab:80/sites/ECT/_bdc/stackexchange_so/QuestionSegment_2.aspx?segmentNumber={0}" ImageUrl="/_layouts/15/1033/images/viewprof.gif" Name="View Profile">
      <LocalizedDisplayNames>
        <LocalizedDisplayName LCID="1033">View Profile</LocalizedDisplayName>
      </LocalizedDisplayNames>
      <Properties>
        <Property Name="IsTaskpaneAction" Type="System.Boolean">true</Property>
        <Property Name="Office Version" Type="System.String">15</Property>
      </Properties>
      <ActionParameters>
        <ActionParameter Index="0" Name="segmentNumber[0]">
          <Properties>
            <Property Name="IdOrdinal" Type="System.Byte">0</Property>
          </Properties>
        </ActionParameter>
      </ActionParameters>
    </Action>
  </Actions>
</Entity>



Three lines in this definition are worth pointing out.  The first is the RootFinder property on the Finder MethodInstance.  This tells the crawler to start crawling here.  Each of the Segment Entities will have this, causing the crawl to go after all of them at the same time.

The second is the TypeDescriptor for DeletedCount, which is a System.Int64.  This field supports incremental crawls by telling the crawler how many deleted rows there are.  None of the data sources I've used really deleted any data, so I've always made my supporting SQL return a 0 cast as a BigInt.  BCS requires the BigInt.

The third is the DeletedCountField property on the SpecificFinder MethodInstance.  It also supports incremental crawls and tells BCS which of the fields holds the deleted count.
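Since these three settings are easy to miss in a model this wide, a small script can check for them. The following is only a sketch in Python, not part of the original model: the fragment is a trimmed, hypothetical stand-in for the QuestionSegment entity above, and `check_segment_entity` is a helper name I made up. A full exported model file normally declares a default XML namespace, so the tag lookups would need to be namespace-qualified there.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for the QuestionSegment entity above;
# only the elements the three checks look at are kept.
MODEL_FRAGMENT = """
<Entity Namespace="stackexchange.so" Name="QuestionSegment">
  <Methods>
    <Method Name="usp_getQuestionSegmentsRead List">
      <MethodInstances>
        <MethodInstance Type="Finder" Name="usp_getQuestionSegmentsRead List Instance">
          <Properties>
            <Property Name="RootFinder" Type="System.String"></Property>
          </Properties>
        </MethodInstance>
      </MethodInstances>
    </Method>
    <Method Name="usp_getQuestionSegmentRead Item">
      <Parameters>
        <Parameter Direction="Return" Name="usp_getQuestionSegmentRead Item Return">
          <TypeDescriptor TypeName="System.Int64" ReadOnly="true" Name="DeletedCount" />
        </Parameter>
      </Parameters>
      <MethodInstances>
        <MethodInstance Type="SpecificFinder" Name="usp_getQuestionSegmentRead Item Instance">
          <Properties>
            <Property Name="DeletedCountField" Type="System.String">DeletedCount</Property>
          </Properties>
        </MethodInstance>
      </MethodInstances>
    </Method>
  </Methods>
</Entity>
"""


def check_segment_entity(xml_text):
    """Report the three crawl-related settings found in an entity definition."""
    entity = ET.fromstring(xml_text)
    report = {
        "root_finder": None,
        "deleted_count_field": None,
        "deleted_count_type": None,
    }

    for instance in entity.iter("MethodInstance"):
        for prop in instance.iter("Property"):
            name = prop.get("Name")
            if name == "RootFinder" and instance.get("Type") == "Finder":
                # The Finder instance carrying RootFinder is the crawl start point.
                report["root_finder"] = instance.get("Name")
            elif name == "DeletedCountField":
                report["deleted_count_field"] = (prop.text or "").strip()

    # Look up the TypeDescriptor that the DeletedCountField property names.
    if report["deleted_count_field"]:
        for descriptor in entity.iter("TypeDescriptor"):
            if descriptor.get("Name") == report["deleted_count_field"]:
                report["deleted_count_type"] = descriptor.get("TypeName")
    return report


report = check_segment_entity(MODEL_FRAGMENT)
print(report)
```

If `deleted_count_type` comes back as anything other than System.Int64, that is the first thing I would fix, given the BigInt requirement described above.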

Here's the Question Entity:
<Entity Namespace="stackexchange.so" Version="1.1.0.0" EstimatedInstanceCount="10000" Name="Question" DefaultDisplayName="Question">
  <Properties>
    <Property Name="DefaultAction" Type="System.String">View Profile</Property>
  </Properties>
  <Identifiers>
    <Identifier TypeName="System.Int32" Name="ID" />
  </Identifiers>
  <Methods>
    <Method IsStatic="false" Name="usp_GetQuestionsBySegment AssociationNavigator">
      <Properties>
        <Property Name="BackEndObject" Type="System.String">usp_GetQuestionsBySegment</Property>
        <Property Name="BackEndObjectType" Type="System.String">SqlServerRoutine</Property>
        <Property Name="RdbCommandText" Type="System.String">usp_GetQuestionsBySegment</Property>
        <Property Name="RdbCommandType" Type="System.Data.CommandType, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089">StoredProcedure</Property>
        <Property Name="Schema" Type="System.String">dbo</Property>
      </Properties>
      <FilterDescriptors>
        <FilterDescriptor Type="Input" Name="LastCrawlTime">
          <Properties>
            <Property Name="CrawlStartTime" Type="System.String"></Property>
          </Properties>
        </FilterDescriptor>
      </FilterDescriptors>
      <Parameters>
        <Parameter Direction="In" Name="@segmentNumber">
          <TypeDescriptor TypeName="System.Int32" IdentifierName="segmentNumber"
           IdentifierEntityName="QuestionSegment" IdentifierEntityNamespace="stackexchange.so"
           ForeignIdentifierAssociationName="usp_GetQuestionsBySegment AssociationNavigator Instance"
           Name="@segmentNumber" />
        </Parameter>
        <Parameter Direction="In" Name="@lastRunDate">
          <TypeDescriptor TypeName="System.DateTime" AssociatedFilter="LastCrawlTime" Name="lastModifiedTime">
            <Interpretation>
              <NormalizeDateTime LobDateTimeMode="Local" />
            </Interpretation>
          </TypeDescriptor>
        </Parameter>
        <Parameter Direction="Return" Name="usp_GetQuestionsBySegment Return">
          <TypeDescriptor TypeName="System.Data.IDataReader, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" IsCollection="true" Name="uv_AllQuestionsRead List">
            <TypeDescriptors>
              <TypeDescriptor TypeName="System.Data.IDataRecord, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" Name="uv_AllQuestionsRead ListElement">
                <TypeDescriptors>
                  <TypeDescriptor TypeName="System.Int32" ReadOnly="true" IdentifierName="ID" Name="ID" />
                  <TypeDescriptor TypeName="System.Nullable`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="ParentId" />
                  <TypeDescriptor TypeName="System.Nullable`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="AnswerCount" />
                  <TypeDescriptor TypeName="System.String" Name="Body">
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToEmptyString" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Nullable`1[[System.DateTime, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="ClosedDate">
                    <Interpretation>
                      <NormalizeDateTime LobDateTimeMode="UTC" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Nullable`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="CommentCount" />
                  <TypeDescriptor TypeName="System.Nullable`1[[System.DateTime, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="CommunityOwnedDate">
                    <Interpretation>
                      <NormalizeDateTime LobDateTimeMode="UTC" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.DateTime" Name="CreationDate">
                    <Properties>
                      <Property Name="RequiredInForms" Type="System.Boolean">true</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeDateTime LobDateTimeMode="UTC" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Nullable`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="FavoriteCount" />
                  <TypeDescriptor TypeName="System.DateTime" Name="LastActivityDate">
                    <Properties>
                      <Property Name="RequiredInForms" Type="System.Boolean">true</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeDateTime LobDateTimeMode="UTC" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Nullable`1[[System.DateTime, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="LastEditDate">
                    <Interpretation>
                      <NormalizeDateTime LobDateTimeMode="UTC" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.String" Name="LastEditorDisplayName">
                    <Properties>
                      <Property Name="Size" Type="System.Int32">40</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToNull" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Int32" Name="Score">
                    <Properties>
                      <Property Name="RequiredInForms" Type="System.Boolean">true</Property>
                    </Properties>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.String" Name="Tags">
                    <Properties>
                      <Property Name="Size" Type="System.Int32">150</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToNull" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.String" Name="Title">
                    <Properties>
                      <Property Name="Size" Type="System.Int32">250</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToNull" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Int32" Name="ViewCount">
                    <Properties>
                      <Property Name="RequiredInForms" Type="System.Boolean">true</Property>
                    </Properties>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.String" Name="DisplayName">
                    <Properties>
                      <Property Name="Size" Type="System.Int32">40</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToNull" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.String" Name="PostType">
                    <Properties>
                      <Property Name="Size" Type="System.Int32">50</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToNull" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Int32" Name="segmentNumber" />
                </TypeDescriptors>
              </TypeDescriptor>
            </TypeDescriptors>
          </TypeDescriptor>
        </Parameter>
      </Parameters>
      <MethodInstances>
        <Association Name="usp_GetQuestionsBySegment AssociationNavigator Instance" Type="AssociationNavigator" ReturnParameterName="usp_GetQuestionsBySegment Return">
          <Properties>
            <Property Name="DirectoryLink" Type="System.String"></Property>
            <Property Name="ForeignFieldMappings" Type="System.String">
              &lt;?xml version="1.0" encoding="utf-16"?&gt;
              &lt;ForeignFieldMappings xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"&gt;
              &lt;ForeignFieldMappingsList&gt;
              &lt;ForeignFieldMapping ForeignIdentifierName="segmentNumber" ForeignIdentifierEntityName="QuestionSegment" ForeignIdentifierEntityNamespace="stackexchange.so" FieldName="segmentNumber" /&gt;
              &lt;/ForeignFieldMappingsList&gt;
              &lt;/ForeignFieldMappings&gt;
            </Property>
            <Property Name="LastModifiedTimeStampField" Type="System.String">LastEditDate</Property>
            <Property Name="UseClientCachingForSearch" Type="System.String"></Property>
          </Properties>
          <SourceEntity Namespace="stackexchange.so" Name="QuestionSegment" />
          <DestinationEntity Namespace="stackexchange.so" Name="Question" />
        </Association>
      </MethodInstances>
    </Method>
    <Method IsStatic="false" Name="usp_getPostByID Question">
      <Properties>
        <Property Name="BackEndObject" Type="System.String">usp_getPostByID</Property>
        <Property Name="BackEndObjectType" Type="System.String">SqlServerRoutine</Property>
        <Property Name="RdbCommandText" Type="System.String">[dbo].[usp_getPostByID]</Property>
        <Property Name="RdbCommandType" Type="System.Data.CommandType, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089">StoredProcedure</Property>
        <Property Name="Schema" Type="System.String">dbo</Property>
      </Properties>
      <Parameters>
        <Parameter Direction="In" Name="@postID">
          <TypeDescriptor TypeName="System.Nullable`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" IdentifierName="ID" Name="@postID" />
        </Parameter>
        <Parameter Direction="Return" Name="usp_getPostByID">
          <TypeDescriptor TypeName="System.Data.IDataReader, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" IsCollection="true" Name="usp_getPostByID">
            <TypeDescriptors>
              <TypeDescriptor TypeName="System.Data.IDataRecord, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" Name="usp_getPostByIDElement">
                <TypeDescriptors>
                  <TypeDescriptor TypeName="System.Int32" ReadOnly="true" IdentifierName="ID" Name="ID" />
                  <TypeDescriptor TypeName="System.Nullable`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="ParentId" />
                  <TypeDescriptor TypeName="System.Nullable`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="AnswerCount" />
                  <TypeDescriptor TypeName="System.String" Name="Body">
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToEmptyString" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Nullable`1[[System.DateTime, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="ClosedDate">
                    <Interpretation>
                      <NormalizeDateTime LobDateTimeMode="UTC" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Nullable`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="CommentCount" />
                  <TypeDescriptor TypeName="System.Nullable`1[[System.DateTime, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="CommunityOwnedDate">
                    <Interpretation>
                      <NormalizeDateTime LobDateTimeMode="UTC" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.DateTime" Name="CreationDate">
                    <Interpretation>
                      <NormalizeDateTime LobDateTimeMode="UTC" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Nullable`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="FavoriteCount" />
                  <TypeDescriptor TypeName="System.DateTime" Name="LastActivityDate">
                    <Interpretation>
                      <NormalizeDateTime LobDateTimeMode="UTC" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Nullable`1[[System.DateTime, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="LastEditDate">
                    <Interpretation>
                      <NormalizeDateTime LobDateTimeMode="UTC" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.String" Name="LastEditorDisplayName">
                    <Properties>
                      <Property Name="Size" Type="System.Int32">40</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToNull" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Int32" Name="Score" />
                  <TypeDescriptor TypeName="System.String" Name="Tags">
                    <Properties>
                      <Property Name="Size" Type="System.Int32">150</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToNull" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.String" Name="Title">
                    <Properties>
                      <Property Name="Size" Type="System.Int32">250</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToNull" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Int32" Name="ViewCount" />
                  <TypeDescriptor TypeName="System.String" Name="DisplayName">
                    <Properties>
                      <Property Name="Size" Type="System.Int32">40</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToNull" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.String" Name="PostType">
                    <Properties>
                      <Property Name="Size" Type="System.Int32">50</Property>
                    </Properties>
                    <Interpretation>
                      <NormalizeString FromLOB="NormalizeToNull" ToLOB="NormalizeToNull" />
                    </Interpretation>
                  </TypeDescriptor>
                  <TypeDescriptor TypeName="System.Nullable`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]" Name="AcceptedAnswerId" />
                </TypeDescriptors>
              </TypeDescriptor>
            </TypeDescriptors>
          </TypeDescriptor>
        </Parameter>
      </Parameters>
      <MethodInstances>
        <MethodInstance Type="SpecificFinder" ReturnParameterName="usp_getPostByID" ReturnTypeDescriptorPath="usp_getPostByID[0]" Default="true" Name="usp_getPostByID Question Instance">
          <Properties>
            <Property Name="LastDesignedOfficeItemType" Type="System.String">None</Property>
          </Properties>
        </MethodInstance>
      </MethodInstances>
    </Method>
  </Methods>
  <AssociationGroups>
    <AssociationGroup Name="QuestionSegment-Question">
      <AssociationReference AssociationName="usp_GetQuestionsBySegment AssociationNavigator Instance" Reverse="false" EntityNamespace="stackexchange.so" EntityName="Question" />
    </AssociationGroup>
  </AssociationGroups>
  <Actions>
    <Action Position="1" IsOpenedInNewWindow="false" Url="http://sp2013lab:80/sites/ECT/_bdc/stackexchange_so/Question_2.aspx?ID={0}" ImageUrl="/_layouts/15/1033/images/viewprof.gif" Name="View Profile">
      <LocalizedDisplayNames>
        <LocalizedDisplayName LCID="1033">View Profile</LocalizedDisplayName>
      </LocalizedDisplayNames>
      <Properties>
        <Property Name="IsTaskpaneAction" Type="System.Boolean">true</Property>
        <Property Name="Office Version" Type="System.String">15</Property>
      </Properties>
      <ActionParameters>
        <ActionParameter Index="0" Name="ID[0]">
          <Properties>
            <Property Name="IdOrdinal" Type="System.Byte">0</Property>
          </Properties>
        </ActionParameter>
      </ActionParameters>
    </Action>
  </Actions>

</Entity>

I'll enumerate the differences between this Entity Definition and the prior generation Entity Definition:

  • The Finder Method has been removed.
    • This ensures that the crawler won't crawl this entity directly, which would undercut all our good work to segment the data crawl.  Remember, the crawler looks for Entities that have the RootFinder property on a Finder, or Entities that have both a SpecificFinder and a Finder method defined.
  • The ChangedIdEnumerator and DeletedIdEnumerator methods have been removed.
    • Even if they are provided, the crawler won't call them.  
  • A new Association Method is defined to represent the Association from the QuestionSegment Entity to the Question entity.  The AssociationMethod has a property named DirectoryLink.
    • This is the whole purpose of the new model.  
    • The presence of the DirectoryLink causes the Crawler to treat the Source of the Association as a Directory or Container.
    • Each Container Enumeration is processed independently of other Container Enumerations.  This is what gives us the multiple, smaller result sets that enable the Crawler to use less memory and survive the encounter.
  • We have a Filter and Parameter on the new Association method
    • <FilterDescriptor Type="Input" Name="LastCrawlTime">
        <Properties>
          <Property Name="CrawlStartTime" Type="System.String"></Property>
        </Properties>
      </FilterDescriptor>
    • This is in support of the incremental crawl.  
    • The CrawlStartTime property causes SharePoint to supply the time of the previous crawl of the same crawl type.  Full Crawls are the exception: there I've seen either '1900-01-01 00:00:00' or '1899-12-31 18:00:00' passed into the filter.
      • The significance here is that the first Incremental Crawl will function like a Full Crawl in that the same CrawlStartTime value is passed in.
  • We have a new In Parameter specified
    • <Parameter Direction="In" Name="@lastRunDate">
        <TypeDescriptor TypeName="System.DateTime" AssociatedFilter="LastCrawlTime"
        Name="lastModifiedTime">
          <Interpretation>
            <NormalizeDateTime LobDateTimeMode="Local" />
          </Interpretation>
        </TypeDescriptor>
      </Parameter>
    • This is in support of the incremental crawl.  
    • This takes the filter value and associates it with the parameter, passing it to the backend where we can use it in our Stored Procedure to limit our results.
  • We have also defined the 'LastModifiedTimeStampField' property.
    • This enables the crawler to perform the incremental crawl even on the first incremental run.  It compares this field value against the records already present in the index, so the Crawler doesn't have to replace all of the data it read in the Full Crawl, which speeds up the process.
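
To make the flow in the list above concrete, here is a rough Python simulation of how the segmented crawl behaves. It is a sketch under assumptions, not SharePoint's actual crawler logic: the in-memory data, the function names, and the "edited after the last run" comparison are all hypothetical stand-ins for the stored procedures, whose real filtering lives in the database.

```python
from datetime import datetime

# Hypothetical in-memory stand-ins for the segment and question data.
SEGMENTS = [
    {"segmentNumber": 1, "lowerID": 1, "upperID": 3},
    {"segmentNumber": 2, "lowerID": 4, "upperID": 6},
]
QUESTIONS = [
    {"ID": 1, "LastEditDate": datetime(2015, 6, 1)},
    {"ID": 2, "LastEditDate": datetime(2015, 6, 20)},
    {"ID": 3, "LastEditDate": datetime(2015, 5, 10)},
    {"ID": 4, "LastEditDate": datetime(2015, 6, 25)},
    {"ID": 5, "LastEditDate": datetime(2015, 4, 2)},
    {"ID": 6, "LastEditDate": datetime(2015, 6, 10)},
]

# One of the values I've seen passed in when there is no previous crawl.
NO_PREVIOUS_CRAWL = datetime(1900, 1, 1)


def get_question_segments():
    """Stand-in for the Finder: list the containers (segments)."""
    return SEGMENTS


def get_questions_by_segment(segment_number, last_run_date):
    """Stand-in for the AssociationNavigator: items in one segment,
    limited to those edited after last_run_date (an assumed filter)."""
    segment = next(s for s in SEGMENTS if s["segmentNumber"] == segment_number)
    return [
        q for q in QUESTIONS
        if segment["lowerID"] <= q["ID"] <= segment["upperID"]
        and q["LastEditDate"] > last_run_date
    ]


def crawl(last_run_date):
    """Enumerate each segment on its own, yielding one small batch at a time."""
    for segment in get_question_segments():
        number = segment["segmentNumber"]
        yield number, get_questions_by_segment(number, last_run_date)


# Full crawl (and first incremental crawl): every item, one segment at a time.
full = {n: [q["ID"] for q in batch] for n, batch in crawl(NO_PREVIOUS_CRAWL)}

# Later incremental crawl: only items edited since the previous run.
incremental = {
    n: [q["ID"] for q in batch] for n, batch in crawl(datetime(2015, 6, 15))
}

print(full)
print(incremental)
```

The point of the sketch is that no single call ever returns the whole table: each segment produces its own bounded result set, and the date parameter shrinks those sets further on incremental runs.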
The other Entities all follow this pattern.  The entire model is available for download here.

5/7/2017 - Link changed to GitHub
