Crawl a WordPress Blog with SharePoint 2013

At work we have a WordPress blog that we wanted to include in our public website’s search results. Yep, our public website is SharePoint 2013. We recently moved it to Azure, but that is a blog post for another day…

Anyway, I started down this path and ran into a few issues before I sorted it out.  Once I figured it out I thought it’d be a good idea to share.

The first thing I did was go to my search configuration and create a new content source. I added the blog’s URL as the start address and then tried several different Crawl Settings.

It turns out I just needed to set it to Only crawl within the server of each start address. I couldn’t tell that this worked, though, because I kept running into this warning in my crawl logs every time I did a full crawl…

Item not crawled due to one of the following reasons: Preventive crawl rule; Specified content source hops/depth exceeded; URL has query string parameter; Required protocol handler not found; Preventive robots directive. ( This item was deleted because it was excluded by a crawl rule. )

I tried to google the site and could only ever get a result if I googled the URL itself, which told me…

A description for this result is not available because of this site’s robots.txt

I went and checked the Reading settings on the blog. It turned out the Search Engine Visibility check box, Discourage search engines from indexing this site, was checked. I unchecked it and kicked off a crawl. At this point I didn’t have the proper Crawl Setting set and was just trying to crawl the sitemap.xml file, which SharePoint can’t do. I experimented with crawl rules for a while and then switched back to the URL of the blog in the content source.
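
If you want to confirm a robots.txt problem before kicking off another full crawl, a quick check like the one below can help. This is only a minimal sketch using Python’s standard urllib.robotparser; the blog.example.com URL and the two sample robots.txt bodies are made up for illustration, and it says nothing about how the SharePoint crawler itself reads the file.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that turns away every crawler.
BLOCKING = "User-agent: *\nDisallow: /\n"
# Hypothetical robots.txt that only hides the admin area.
PERMISSIVE = "User-agent: *\nDisallow: /wp-admin/\n"

# Placeholder URL, not the real blog.
POST_URL = "https://blog.example.com/2014/01/some-post/"

print(is_crawl_allowed(BLOCKING, POST_URL))
print(is_crawl_allowed(PERMISSIVE, POST_URL))
```

To check a live site instead of a pasted string, RobotFileParser also offers set_url() and read(), which download the file for you.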

This resulted in far more content coming into the index than I would ever want.

Eventually, what ended up working for me is the following:

  • Content source with the blog URL as the start address
  • Crawl Setting set to Only crawl within the server of each start address
  • Crawl rule set to
    • blogpath/*
    • Use regular expression syntax for matching this rule check box checked
    • Include all items in this path selected and all check boxes below left unchecked
    • Anonymous access
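
To see why this combination keeps the index small, here is a rough sketch of the filtering idea: a URL is kept only if it is on the same server as the start address and it matches the include pattern. This is my own illustration in Python, not SharePoint’s actual matching logic, and the blog.example.com addresses and the in_scope helper are hypothetical stand-ins for blogpath.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def in_scope(url: str, start_address: str, include_pattern: str) -> bool:
    """Keep a URL only if it is on the start address's server
    and matches the wildcard include pattern (case-insensitive)."""
    same_server = (
        urlsplit(url).netloc.lower() == urlsplit(start_address).netloc.lower()
    )
    matches_rule = fnmatchcase(url.lower(), include_pattern.lower())
    return same_server and matches_rule


# Hypothetical values standing in for the real blog path.
START = "https://blog.example.com/"
PATTERN = "https://blog.example.com/*"

candidates = [
    "https://blog.example.com/2014/01/some-post/",
    "https://other.example.org/linked-from-the-blog",
]

for candidate in candidates:
    print(candidate, in_scope(candidate, START, PATTERN))
```

The second URL is dropped even though the blog links to it, which is the behaviour I was after when the crawl was pulling in too much.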

After getting my configuration sorted out as indicated above, the crawl worked as expected and I now have blog entries showing up in search.

I specifically wrote this blog because this forum post didn’t provide a solution to the poster’s issues…


About tbithell
I am the Chief Technical Architect of Portals at B2B Technologies, LLC in Atlanta GA. I first started working on SharePoint in 2005, and have built SharePoint portals and developed custom solutions for a wide variety of users. I have a MS in Computer Science and am a SharePoint 2010 MCITP and MCPD. I am very excited about the new version, and will be blogging about it on a regular basis as I explore the newly released Preview and future releases.
