Friday, April 19, 2013

Enterprise Search - Boolean and other advanced search query techniques

The Enterprise editions of SharePoint 2010 and 2013 include FAST Search. FAST is an acronym for FAst Search and Transfer. I won't go into the history of how or why Microsoft acquired this technology. Suffice it to say its search is more robust and feature-rich than what you get in the Standard and other editions of SharePoint. In SharePoint 2010, FAST is deployed separately with its own installation package and configuration wizard (as well as its own service packs and cumulative updates). In SharePoint 2013, FAST is built into SharePoint and no separate install is required; the functionality is there out of the box. In my previous blog posts about Enterprise Search, I've covered some of the features of FAST Search for SharePoint. In this post, I'll be going over some of the advanced search techniques you can use in the search box to help you find what you are looking for, fast. Whether you're implementing or running SharePoint 2010 or 2013 Enterprise edition, this post about search query techniques may prove useful.

Boolean: Everyone with experience using Internet search engines, especially in the early or pre-Google days, has likely used Boolean search operators such as AND, OR and NOT. When searching for terms such as defense budget, you may want to try fine-tuning your results using Boolean operators. Searches like defense AND budget, defense NOT navy, defense NOT navy AND budget, and defense OR government AND budget will all return different results. Boolean operators are CASE-SENSITIVE. Only uppercase operators will work correctly. If you just search for defense and budget, the search engine will not recognize and as an operator.
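Since lowercase operators are the easiest mistake to make here, this is a rough Python sketch of a helper that uppercases and, or and not before a query is submitted. The function name uppercase_operators is my own invention, not part of SharePoint, and it assumes every unquoted and/or/not is meant as an operator, which won't always be true.

```python
import re

# Operators that must be uppercase to be recognized, per the paragraph above.
OPERATORS = {"and", "or", "not"}

def uppercase_operators(query: str) -> str:
    """Uppercase and/or/not everywhere except inside double-quoted phrases."""
    # The capturing group keeps the quoted phrases in the split result.
    parts = re.split(r'("[^"]*")', query)
    fixed = []
    for part in parts:
        if part.startswith('"'):
            fixed.append(part)  # exact phrases are left untouched
        else:
            words = [w.upper() if w.lower() in OPERATORS else w
                     for w in part.split(" ")]
            fixed.append(" ".join(words))
    return "".join(fixed)

print(uppercase_operators('defense and budget not "army and navy"'))
```

The quoted phrase is passed through unchanged, so a phrase search that happens to contain the word and is not altered.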

NEAR: The NEAR operator can be used to return results with keywords that are located near each other in a file. This can be useful if you know you want to find, for example, a document that talks about Navy budget, but the two words are not mentioned as a phrase. Perhaps the document says Navy and Marine Corps defense budget. A search like navy NEAR budget would work well to find that document. By default, the keywords can be up to 8 terms apart, but you can use NEAR(#) to override this default. For example, navy NEAR(5) budget will return results with the keyword navy up to 5 terms away from budget. As with other keyword operators, you can also use this with phrases (e.g., navy NEAR(5) "defense budget" ).
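If you build queries in a script, a small helper can keep the NEAR syntax consistent. This is only a sketch: the function names are hypothetical, and it emits the NEAR(#) form exactly as written in this post, so check it against your own search box before relying on it.

```python
from typing import Optional

def quote_if_phrase(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term

def near(left: str, right: str, distance: Optional[int] = None) -> str:
    """Build a NEAR query; with no distance, the engine's default applies."""
    operator = "NEAR" if distance is None else f"NEAR({distance})"
    return f"{quote_if_phrase(left)} {operator} {quote_if_phrase(right)}"

print(near("navy", "budget"))
print(near("navy", "defense budget", 5))
```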

WORDS: The WORDS operator affects how results are ranked. If you search for several different terms, content with more occurrences of one word or another is ranked higher. WORDS forces the search engine to treat them as synonyms and rank results as though you were searching for a single term. For example, in WORDS(military, navy, "marine corps", army, "air force") , the keywords and phrases are treated as the same term, and the results would be ranked differently than when searching for military navy "marine corps" army "air force".
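Typing out a long synonym list by hand invites quoting mistakes, so here is a minimal Python sketch that assembles the WORDS expression from a list of terms. The helper name words is my own, and it simply reproduces the comma-separated form shown above.

```python
def words(*terms: str) -> str:
    """Build a WORDS(...) expression, quoting any multi-word phrase."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return f"WORDS({', '.join(quoted)})"

print(words("military", "navy", "marine corps", "army", "air force"))
```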

You may also have some experience using other operators with Google, Bing, or other search engines. SharePoint search also has many of these same operators (and different ones). Below, I’ve described a few of the more useful ones. Some of these, and others, are available in the Refinements Panel on the left side of the search results page. However, sometimes you may want to avoid those extra clicks required to refine your search by doing the refinements directly in the search box.

FileName: One of the more useful operators is FileName, which, as the name suggests, can be used to refine search results to a specific file name. To use this type of operator, you must follow it with a colon and the name of the file you want to find in quotes. For example, filename:"defensebudget.xlsx". If you don’t know the exact name of the file you want to find, you can use a wildcard just as with any other search. For example: filename:"defense*.xlsx". You can also combine this and other operators. For example, you can combine the FileName operator with a Boolean operator: filename:"defense*.xlsx" OR filename:"budget*.xlsx". You can also use this operator to limit search results to a specific file type by preceding the extension with an asterisk wildcard (e.g., filename:"*.xlsx" ).

Author: This is another useful operator, especially if you’re trying to find a document you know you uploaded to SharePoint, but cannot recall where. As with FileName, it has to be followed by a colon and a quoted string (e.g., author:"Collogan, Vincent*"  or author:"Vincent Collogan*" or simply author:"Collogan*" ). The * wildcard at the end is particularly important here for ignoring any suffixes added to the end of someone’s display name.
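The operator:"value" pattern used by FileName and Author is easy to generate from a script. The sketch below is an illustration only: prop and any_of are hypothetical helper names, and rejecting values that contain a double quote is my own simplification rather than a SharePoint rule.

```python
def prop(name: str, value: str) -> str:
    """Build an operator:"value" filter such as filename:"*.xlsx"."""
    if '"' in value:
        # Simplification for this sketch: no attempt is made to escape quotes.
        raise ValueError("value must not contain a double quote")
    return f'{name}:"{value}"'

def any_of(*clauses: str) -> str:
    """Join filters with the uppercase OR operator."""
    return " OR ".join(clauses)

print(any_of(prop("filename", "defense*.xlsx"), prop("filename", "budget*.xlsx")))
print(prop("author", "Collogan*"))
```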

Write: This operator can be used to find files, including site pages, modified before, after, or during a specified time range. For example, this query will return files modified during the month of March 2013: write>"2013-03-01" AND write<"2013-03-31". Here, I’ve combined the Write and FileName operators to find Excel files with names ending in budget that were modified during that same month: write>"2013-03-01" AND write<"2013-03-31" AND filename:"*budget.xls*".
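Working out the last day of a month by hand is error-prone, so here is a short Python sketch that builds the same kind of Write range for any month. The function name modified_in_month is made up for illustration, and it copies the strict > and < comparisons from the example above; whether the first and last days themselves are included depends on how the search engine evaluates those comparisons.

```python
import calendar
from datetime import date

def modified_in_month(year: int, month: int) -> str:
    """Build a Write range covering the given month, in YYYY-MM-DD form."""
    # monthrange returns (weekday of the 1st, number of days in the month).
    last_day = calendar.monthrange(year, month)[1]
    start = date(year, month, 1).isoformat()
    end = date(year, month, last_day).isoformat()
    return f'write>"{start}" AND write<"{end}"'

print(modified_in_month(2013, 3))
print(modified_in_month(2013, 3) + ' AND filename:"*budget.xls*"')
```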

Site: This operator is used to limit search results to a specific site, perhaps your own team site. You need to know the URL you want to search, and it can be as deep as you’d like, from something like site:"" to something more specific like site:"". As with other operators, you can combine them (e.g., defense AND budget AND site:"" ).

Size: If you know the file you are looking for is a specific size, you can use this operator. For example, size:100KB, or size:1000KB (same as searching for size:1MB). Again, this can be combined with other operators to refine results (e.g., size:30KB AND author:"Collogan, Vincent*" ). Unfortunately, unlike Write, you cannot use the Size operator to specify a range - you'll need to know the exact size.

Other tips to make your searches more effective

  • Use double quotes to find exact phrases. 
  • Case/capitalization doesn't matter in search except for Boolean operators. DEFENSE is the same as defense.
  • Use a wildcard (*) if you want to get variations or are unsure about spelling. SharePoint has the capability of doing search stemming, which means it would search for various versions of words (ran = ran, run, runs, running, etc.). However, we have not enabled this because it tends to make search results less relevant. Without stemming, you can get similar results using a wildcard (*). For example, you can search for run* to get results that include runs and running, although not ran.
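To see why run* finds runs and running but not ran, here is a small local illustration in Python. It uses the standard library's fnmatch module as a stand-in for prefix matching; it says nothing about how SharePoint implements wildcards internally, and the word list is made up.

```python
from fnmatch import fnmatchcase

def prefix_matches(candidates, pattern):
    """Return the candidates that match a shell-style wildcard pattern."""
    return [word for word in candidates if fnmatchcase(word, pattern)]

sample = ["run", "runs", "running", "ran", "rerun"]
print(prefix_matches(sample, "run*"))  # ['run', 'runs', 'running']
```

A trailing wildcard only extends a word from its beginning, which is why an irregular form like ran is something only stemming would catch.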

For more advanced and in-depth reading, take a look at the MSDN documentation Querying Enterprise Search
