Category Archives: Frustration

Chatbot with Microsoft Azure

In light of SleepScore Labs’ newest sleep solution, the SleepScore App, our Customer Service team wanted to create a chat alternative to support the product launch.

Over the course of three furious weeks (one to vet the requirements, two to build), I worked one-on-one with our Customer Service Manager to outline the bare minimum we’d really want in this type of feature.

It’s safe to say you can fall down a rabbit hole of ideas fairly quickly, but after easing in a bit, our solution was to design an FAQ (Frequently Asked Questions) chatbot.

We took as much existing data as we could from previous customer service engagements, organized it a bit, and indexed it within a web service called QnA Maker. I connected that to a Microsoft Azure Bot Service running as a Node.js web app and published it to a channel called Live Assist.

A Node.js chatbot powered by QnA Maker and Microsoft Azure, published to the Live Assist channel.

With everything housed inside Microsoft products, I was for the most part impressed by how straightforward it was to put all three together into something real customers could actually engage with. There was no additional coding on my part, and getting to the finish line on time was in itself a win.

If you ever wanted to try building this yourself, here’s the bare minimum you’d need:

  • QnA Maker, which indexes your questions and answers and exposes an endpoint for your web app (chatbot) to consume.
  • Azure Bot Service, which is how you actually create the chatbot.

The flexible piece here is the channel. Building out a chatbot requires you to point it at a channel; in our case that was Live Assist, because it was the business requirement, but the same bot could just as easily ship to other channels like Facebook Messenger, Slack, etc.
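Conceptually, the bot just relays each user message to the QnA Maker endpoint and returns the top-scoring answer. Here’s a minimal sketch of that round trip in Node.js; to be clear, this is illustrative rather than what we shipped (our build needed no custom code), and the host name, knowledge base ID, and endpoint key are placeholders:

// Minimal sketch only: querying a QnA Maker knowledge base from Node.js.
// The host, knowledge-base ID, and endpoint key below are placeholders.
const https = require('https');

const QNA_HOST = 'your-qna-service.azurewebsites.net'; // placeholder
const KB_ID = 'your-knowledge-base-id';                // placeholder
const ENDPOINT_KEY = 'your-endpoint-key';              // placeholder

function askQnAMaker(question, callback) {
  const body = JSON.stringify({ question: question, top: 1 });
  const req = https.request({
    hostname: QNA_HOST,
    path: '/qnamaker/knowledgebases/' + KB_ID + '/generateAnswer',
    method: 'POST',
    headers: {
      'Authorization': 'EndpointKey ' + ENDPOINT_KEY,
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(body)
    }
  }, function (res) {
    let data = '';
    res.on('data', function (chunk) { data += chunk; });
    res.on('end', function () {
      // QnA Maker returns { answers: [{ answer, score, ... }] }
      const answers = JSON.parse(data).answers || [];
      callback(answers.length ? answers[0].answer : 'Sorry, no match found.');
    });
  });
  req.on('error', function (err) { callback('Error: ' + err.message); });
  req.write(body);
  req.end();
}

// Example: forward a user's chat message to the knowledge base.
askQnAMaker('How do I sync my sleep data?', function (answer) {
  console.log(answer);
});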

I’ll be making a follow-up post, mostly as a retrospective, to outline the journey and the gaps in this implementation, in hopes that one day I’ll have the opportunity to improve on this first pass at building a chatbot.

Thanks to Microsoft Developer US for getting me started.

Simple PHP Proxy returns incorrect JSON from Apache Solr instance

I’ve implemented Ben Alman’s simple-proxy.php to communicate with an Apache Solr instance (in this case, my local one) outside of my domain.

I’ve followed the instructions in full, the core of which is to place simple-proxy.php on my domain’s file server.

Are there any modifications that must be made to the proxy in order for the response to come back in the correct format?
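If I had to guess at the cause: as I understand it, ba-simple-proxy by default wraps the remote body in its own JSON envelope ({ status, contents }), with the Solr response sitting inside contents as a string that needs a second parse on the client; the script’s native mode (?mode=native, gated behind $enable_native) passes the response through untouched. A hedged client-side sketch of both, assuming the proxy lives at /simple-proxy.php and a default local Solr core:

// Illustrative sketch, not a confirmed fix. The proxy path and the
// Solr core name ('collection1') are placeholders.
var solrUrl = 'http://localhost:8983/solr/collection1/select?q=*:*&wt=json';

// Default mode: the proxy returns { status, contents }, where contents
// is the remote body as a string, so the Solr JSON needs a second parse.
fetch('/simple-proxy.php?url=' + encodeURIComponent(solrUrl))
  .then(function (res) { return res.json(); })
  .then(function (data) {
    var solrJson = JSON.parse(data.contents);
    console.log(solrJson.response.numFound);
  });

// Native mode (requires $enable_native = true in the script) passes the
// Solr response through untouched, so it parses directly.
fetch('/simple-proxy.php?mode=native&url=' + encodeURIComponent(solrUrl))
  .then(function (res) { return res.json(); })
  .then(function (json) { console.log(json.response.numFound); });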

View on Stackoverflow.

Frustrations excluding URLs without ‘www’ from Nutch 1.7 crawl

I’m currently using Nutch 1.7 to crawl my domain. My issue is specific to URLs being indexed as www vs. non-www.

Specifically, after firing the crawl and indexing to Solr 4.5, then validating the results on the front end with AJAX Solr, the search results page lists pages under both ‘www’ and non-‘www’ URLs, such as:

www.mywebsite.com
mywebsite.com
www.mywebsite.com/page1.html
mywebsite.com/page1.html

My understanding is that the URL filtering, aka regex-urlfilter.txt, needs modification. Are there any regex/Nutch experts who could suggest a solution?

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
 
 
# The default url filter.
# Better for whole-internet crawling.
 
# Each non-comment, non-blank line contains a regular expression
# prefixed by '+' or '-'.  The first matching pattern in the file
# determines whether a URL is included or ignored.  If no pattern
# matches, the URL is ignored.
 
# skip file: ftp: and mailto: urls
-^(file|ftp|mailto):
 
# skip image and other suffixes we can't yet parse
# for a more extensive coverage use the urlfilter-suffix plugin
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF|zip|ZIP|ppt|PPT|mpg|MPG|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ|mov|MOV|exe|EXE|jpeg|JPEG|bmp|BMP|js|JS)$
 
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
 
# skip URLs with slash-delimited segment that repeats 3+ times, to break loops
-.*(/[^/]+)/[^/]+\1/[^/]+\1/
 
# accept anything else
+^http://([a-z0-9]*\.)*mywebsite.com/
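For what it’s worth, one possibility I’ve been weighing (untested): since the filter ignores any URL that matches no pattern, replacing that final catch-all accept line with one pinned to the ‘www’ host should keep the bare-domain variants out of future crawls:

# accept only the canonical 'www' host; anything unmatched is ignored
+^http://www\.mywebsite\.com/

Pages already indexed under the bare domain would presumably still need to be removed from Solr separately, since the filter only applies to new crawls.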

Also on Stackoverflow and pastebin.

Frustrations integrating AJAX Solr with Solr 4.5

I’m setting up Solr Search for my company’s domain and have run into roadblock after roadblock. Now that I’m at another obstacle, my Google skills are again depleted. This project is almost three weeks in. Resorting to the Stackoverflow Gods again!

Here is the link in detail: AJAX Solr returning the default wildcard *:* and not what I query.

Frustrations with indexing Nutch 1.7 to Solr 4.5

I’m setting up Solr Search for my company’s domain and have run into roadblock after roadblock. Now that I’m at another obstacle, my Google skills are pretty much depleted. This project is almost two weeks in, with an aggressive timeline. I’ve had to resort to the Stackoverflow Gods for the first time…hopefully it works.

Here’s the link in detail: Exception in thread “main” java.io.IOException: Job failed! on Nutch 1.7.