Hi, my name is Enrico. I was an intern at BCNANalytics, and at the beginning of 2015 I started working with Aleix Ruiz de Villa and Josep Marc Mingot on an idea meant to help mostly small businesses promote their products and services online.
The idea was to search for deals and promotions online, then classify and display them automatically, providing something similar to the existing daily deal sites. So where can you find deals if not on Groupon or Groupalia? An obvious place to look was social media, and more precisely Twitter, thanks to its great API. We thought that if businesses promote a deal on Groupon, they probably promote it to their fan base on Twitter as well. And those who are not on Groupon will most likely still promote a discount or something similar on Twitter, since it is free and it would be almost weird not to let your followers know about it.
So we took a look at Twitter, and the first thing we noticed was that there were in fact promotions going on.
But they were not easy to find, for two main reasons:
- You can’t search specifically for businesses on Twitter. Everything is mixed together in the results: business accounts, private profiles, tweets, anything. So there is no way of searching for tweets that come only from businesses, and especially not from businesses in your town! So we manually built a Twitter list of relevant business accounts from Barcelona, about 780 in total.
- The second problem is that even if you manage to create a list of businesses you might like (which works pretty well via Twitter lists), you still have no way of searching through them to find deals automatically. Concretely, you have to pick a certain word like „oferta“, but finding that word does not guarantee a discount. The example from the presentation was the following two sentences: „Kevin Costner me hizo una oferta pero yo de Barcelona no me muevo“ (“Kevin Costner made me an offer, but I’m not moving from Barcelona”) and „Vuelos a Cuba en oferta i/v 350 €“ (“Flights to Cuba on sale, round trip, 350 €”). So you quickly see the problem.
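To make the second problem concrete, here is a minimal sketch of a plain keyword search over the two example tweets. The helper name `contains_keyword` is made up for this illustration and is not part of our actual code.

```python
import re

# The two example tweets quoted above.
tweets = [
    "Kevin Costner me hizo una oferta pero yo de Barcelona no me muevo",
    "Vuelos a Cuba en oferta i/v 350 €",
]


def contains_keyword(text, keyword):
    """Return True if `keyword` occurs in `text` as a whole word, ignoring case."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Both tweets contain the keyword, although only the second one is a deal.
matches = [t for t in tweets if contains_keyword(t, "oferta")]
for tweet in matches:
    print(tweet)
```

A keyword filter like this can only narrow down the candidates. Deciding which candidates are really deals needs something smarter.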
To solve this problem, we thought about training a machine learning classifier. At first this meant a lot of manual work, since we had to pick out the discount tweets among the tweets from businesses. Once we had a labeled list of those, we tried different classifiers (using the scikit-learn library) and eventually achieved an overall accuracy of 80%.
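As a rough illustration of what such a setup can look like in scikit-learn, here is a small sketch. It is not our actual model: the choice of TF-IDF features with logistic regression is an assumption made for this example, and apart from the two tweets quoted earlier, the training tweets are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training set: 1 = deal, 0 = not a deal.
# Only the first two tweets come from this post; the rest are invented
# purely for illustration.
train_texts = [
    "Vuelos a Cuba en oferta i/v 350 €",
    "Kevin Costner me hizo una oferta pero yo de Barcelona no me muevo",
    "Oferta: 2x1 en pizzas este fin de semana",
    "Me hicieron una oferta de trabajo y la rechacé",
    "Descuento del 20% en todos los menús hoy",
    "No hay descuento que valga, hoy me quedo en casa",
]
train_labels = [1, 0, 1, 0, 1, 0]

# Turn each tweet into TF-IDF weighted word and word-pair counts,
# then fit a linear classifier on top (an assumed model choice).
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    LogisticRegression(),
)
model.fit(train_texts, train_labels)

# Score an unseen (also invented) tweet.
new_tweet = ["Oferta 2x1 en menús este viernes"]
prediction = model.predict(new_tweet)
probability = model.predict_proba(new_tweet)[0, 1]
print(prediction[0], round(probability, 2))
```

With only six tweets the output means very little. A realistic run needs a much larger hand-labeled set and a held-out test split to measure accuracy.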
There is a catch, though: our classifier is only trained on tweets containing certain keywords. You have to start somewhere when training a classifier, which means that from all the tweets coming from our list, we first have to pick some out via certain keywords. Take the word „oferta“ from the example above. We can train the classifier to decide between the two phrases, one of which is a deal and the other is not. But our classifier does not know about other possible keywords (we did train it on more than one word, though). There might be a keyword combination like „siempre los“ that potentially signals a lot of deals but is not in our current list, which means that we cannot detect deals described in this way. This is why we are going to implement ongoing learning. But we’ll leave this approach for another post.
By Enrico Kunz