New event: Causal Inference Part I

We are pleased to announce our next event “Causal Inference – Part I” on November 25th, 19h at Movistar Centre. Doors will open at 18:45.

Machine learning models and A/B tests are useful tools for making business decisions, but sometimes they are not feasible or have limitations. Moreover, in many cases we need to answer questions such as: what would have happened if, instead of doing X, we had done Y? Can we estimate the effect of one variable on another? In these cases, causal inference becomes the best option. Unsurprisingly, companies such as Uber are using causal inference as part of their data science efforts.

We are organizing two events to discuss the potential of causal inference and when it makes sense to apply it. In this first session, Bartek Skorulski and Aleix Ruiz de Villa will cover the whole spectrum of causality, from A/B tests to causal inference. You can register here.

Aleix Ruiz de Villa holds a Ph.D. in mathematics. He has been head of data science at LaVanguardia, SCRM – Lidl, and Onna. He is a co-organizer of the Barcelona Data Science and Machine Learning Meetup and a board member of the Societat Catalana de Matemàtiques. He currently teaches at BData and UOC and works as a data science consultant specialized in causal inference. Check for an introduction to causal inference

Bartek Skorulski is a data scientist with a Ph.D. in Dynamical Systems. He works as Senior Data Scientist and Recommender System Lead at Telefonica Innovation Alpha. Previously, he worked as Staff Insight Analyst at Schibsted, Data Science Team Lead at Lidl-SCRM and Data Scientist at King. He also has many years of experience as an academic researcher and teacher. He currently collaborates with the Polytechnic University of Catalonia, the University of Barcelona and Kschool, teaching Machine Learning, Deep Learning, and Data Management courses. He is also a co-organizer of the Barcelona Data Science and Machine Learning Meetup.

This talk is co-organized with Barcelona Data Science and Machine Learning Meetup.

This event would not be possible without the collaboration of Movistar Centre.

Recommended event: La Funció de les Dades, the SCM-SoCE conference on data and business.

The Societat Catalana de Matemàtiques, together with the Societat Catalana d’Estadística, is organising a one-day conference devoted to promoting the role of mathematics and statistics in the business world. The conference is titled La Funció de les Dades. Places are limited and prior registration is required at this link.

The first part of the day (morning) will consist of one-hour workshops that aim to provide tools, some better known than others, to help companies work with their data.

In the second part (afternoon), there will be talks by people from the business world, who will share their views and experience on the use of mathematics and statistics in business.

The conference will take place on November 14th in the Prat de la Riba room, IEC, Carrer del Carme, 47.

Videos of our last events

In recent months we held two great events, and you can now watch both of them online.

The first event was held on January 21st, and we discussed whether Barcelona can become a European hub for Advanced Analytics and Big Data. As speakers we had Josep Maria Martorell (Associate Director at the Barcelona Supercomputing Center) and Òscar Sala (mVentures Director at the Mobile World Capital Barcelona organization). You can watch it at the link below.

The second event was held on February 14th (Valentine's Day!) and we reviewed how Analytics can play a role in Sports (are we close to a Moneyball world?). As speakers we had Sergi Oliva (Senior Director, Analytics & Strategy at Philadelphia 76ers) and Javier Fernandez (Head of Sports Analytics at FC Barcelona). You can watch it at the link below.

Enjoy them (and share them if you like them).

New event: Sports and Analytics

We are pleased to announce our next event “Sports Analytics” on February 14th, 19h at Movistar Centre. Doors will open at 18:45. You can register here.

In this session we will focus on how Analytics can play a role in Sports. Are we close to a Moneyball world? As speakers we will have Sergi Oliva (Senior Director, Analytics & Strategy at Philadelphia 76ers) and Javier Fernandez (Head of Sports Analytics at FC Barcelona).

This event would not be possible without the collaboration of Movistar Centre.

New event: “Barcelona: Hub for Advanced Analytics and Big Data”


We are pleased to announce our next event “Barcelona: Hub for Advanced Analytics and Big Data” on January 21st, 19h at Movistar Centre. Doors will open at 18:45. You can register here.

We will have two great speakers in our panel: Josep Maria Martorell (Associate Director at the Barcelona Supercomputing Center) and Òscar Sala (mVentures Director at the Mobile World Capital Barcelona organization). Both will share their views on Barcelona and its potential to become a European Hub for Advanced Analytics and Big Data. You can see their bios below.

Josep Maria Martorell is Associate Director at the Barcelona Supercomputing Center, Spain’s leading supercomputing centre, specialized in High Performance Computing. Josep Maria has extensive experience in technology and research across government, education and the private sector. Among other positions, he has been Director of Research for the Catalan Government and Head of Research at Universitat Ramon Llull, and he is a shareholder and advisor in multiple technology startups in Barcelona.

Òscar Sala is the mVentures Director at the Mobile World Capital Barcelona organization, a venture builder program that addresses the challenge of transforming scientific knowledge into technological solutions. In the past, Òscar held multiple positions related to technology and innovation at Caixabank, was VP of Product Strategy at Strands (a successful local fintech), and served as a board member at Mobey Forum, a global industry association empowering banks and other financial institutions to lead the future of digital services.

This event would not be possible without the collaboration of Movistar Centre.

Data & Ethics, summary of our last event

In recent months we have seen Ethics emerge as an extremely sensitive topic for the Data and Analytics community. Most likely, one of the main drivers of this wave of concern was the Facebook scandal: Mark Zuckerberg (founder and CEO of Facebook) had to testify before the US Congress about how his company handles its users’ data and how this could have influenced the results of recent elections in several countries. But Facebook is not the only company whose practices are under scrutiny. Many questions have also been raised about how much personal data Google collects and how it is being used: according to Guillaume Chaslot (an ex-Google engineer), the YouTube algorithm “does not appear to be optimising for what is truthful, or balanced, or healthy for democracy”.

In other words, we are talking not only about privacy but also about how data could even threaten our political system. As Cathy O’Neil writes in her must-read book Weapons of Math Destruction, “the math-powered applications powering the data economy were based on choices made by fallible human beings. Some of these choices were no doubt made with the best intentions. Nevertheless, many of the models encoded human prejudice, misunderstanding and bias into the software systems that increasingly managed our lives. Like gods, these mathematical models are opaque (…) Their verdicts, even wrong or harmful, were beyond dispute or appeal. And they tended to punish the poor and the oppressed in our society, while making the rich richer”.

As Data-Driven professionals we cannot ignore this inconvenient truth and must address it. This is one of the reasons we at BcnAnalytics organised a session to discuss Data & Ethics. As speakers we had Carlos Castillo (Distinguished Research Professor at Universitat Pompeu Fabra) and Gemma Galdon (Founder at Eticas Research & Consulting and Researcher at Universitat de Barcelona).

Carlos focused his talk on algorithmic discrimination. He first reviewed the concept of discrimination from a philosophical perspective and then explained the concept of group discrimination, which means “disadvantageous treatment to an individual because he or she belongs to a specific socially salient group”. According to Carlos, a further step is statistical discrimination, which can be observed “when group discrimination happens because of some statistical belief, which means that someone has certain data, has looked at this data and based on statistics extracted from this data has decided to treat someone worse than another person”. After reviewing these concepts, Carlos raised the key issue: machine learning algorithms can discriminate.

Why is that? Machine learning systems take data and extract statistical beliefs from it, and therefore they can discriminate against some individuals, regardless of intention or animosity. What matters is the consequence of the algorithm: whether a person is treated worse because he or she belongs to a group. Carlos emphasized that, to avoid this discrimination, models need to optimize not only for accuracy but also for “the risk of two different populations of not getting the same outcome”. Carlos also highlighted how important it is that systems are transparent: “if you get a negative outcome, you have to have a way to challenge this decision in a way that is effective… If I am denied a loan or parole, I need to have a way of effectively challenge the decision to say the systems was wrong in my case”.
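
To make the idea of two populations “not getting the same outcome” concrete, here is a minimal sketch of one common way to measure it: the gap in favourable-decision rates between groups (often called demographic parity difference). This is an illustration we have chosen, not necessarily the measure Carlos presented, and the decisions and group labels are made-up data.

```python
from collections import defaultdict


def positive_rate_by_group(decisions, groups):
    """Return the share of favourable decisions (1 = favourable) per group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += decision
    return {g: positives[g] / totals[g] for g in totals}


def outcome_gap(decisions, groups):
    """Largest difference in favourable-decision rate between any two groups."""
    rates = positive_rate_by_group(decisions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical loan decisions (1 = approved) for two made-up groups "A" and "B".
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = positive_rate_by_group(decisions, groups)
gap = outcome_gap(decisions, groups)
print(rates)  # approval rate per group
print(gap)    # 0 would mean both groups are approved at the same rate
```

A model tuned only for accuracy has no reason to keep this gap small, which is why it needs to be monitored alongside accuracy rather than instead of it.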

Gemma started her talk by quoting “The Fall of Public Man” by Richard Sennett. “In a city full of sensors and cameras and surveillance everywhere, where would Romeo and Juliet fall in love?”. From Gemma’s perspective, technology is changing our lives and we really need to ask ourselves: Why are we investing in technology? What kind of societies are these technologies creating or promoting? Are we building the cities that we want to build? Do we want to live in a world where everything is remembered? Do we want to live in a world where we can never forget? As she mentioned: “for the first time in history, forgetting is more expensive than remembering. Everything we do is recorded by a camera or a sensor”. Gemma then reviewed real cases of unexpected outcomes of certain technologies, for instance smart borders based on biometrics. They were not part of the legislative debate because they were seen “as technical amendments”, but biometrics have now become our IDs, and certain individuals self-mutilate when they want to hide their identities. In other words, their bodies have become their enemies.

Gemma asked herself: “How can we hide behind a technical amendment? And what about false positives? There is no redress mechanism”. According to her, the most burning issue is that we, as a society, did not think technology could fail. But it fails. And this triggers the key issue: the way we do technology is very irresponsible and no one is facing the consequences of their actions, the consequences of their false positives… which might involve human rights. Gemma ended her speech by highlighting that we need to start thinking about how technology is impacting our civilization: “we have the responsibility to decide how we build a social-technical infrastructure that is responsible and desirable for our generation and the next generations”.

See the link below for the full session.

New team members

The BcnAnalytics family keeps growing. In recent weeks, two new people have joined our core team.

The first to join BcnAnalytics was Didac Fortuny. He is a data scientist at Holaluz, a company that connects people to green power. In his own words: “I have a PhD in Physics in which I used data analytics to study the impact of climate change to Mediterranean precipitation. I also teach in a MSc in renewable energy and energy sustainability”.

The latest addition is Alejandra Manrique. She has more than 20 years of experience in data analytics, helping companies get the most value out of their data. In her own words: “I have worked in multiple sectors, water and environment, telecommunications, retail, automotive and media. I have international experience in different countries in Europe, America and Australia”.

Let us welcome Didac and Alejandra.

New Data-Driven events in Barcelona

The Barcelona GSE Data Science Center coordinates and promotes interdisciplinary and methodological research, training, and knowledge transfer in Data Science. They are now organising some academic seminars and conferences. See below their upcoming events for the month of March.

In the field of causality, we want to understand how a system reacts under interventions. These questions go beyond statistical dependences and can therefore not be answered by standard regression or classification techniques. In this tutorial, you will be introduced to the interesting problem of causal inference as well as recent developments in the field. We will introduce structural causal models, formalize interventional distributions, and define causal effects as well as show how to compute them. We will present three ideas that can be used to infer causal structure from data: (1) finding (conditional) independences in the data, (2) restricting structural equation models and (3) exploiting the fact that causal models remain invariant in different environments. If time allows, we will also show how causal concepts could be used in more classical machine learning problems. No prior knowledge about causality is required. The material is also covered in a recently published book (open access).
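
As a rough illustration of why interventions go beyond regression, the following sketch simulates a tiny structural causal model of our own invention (it is not taken from the tutorial material). A confounder Z drives both X and Y, and the true causal effect of X on Y is set to 2. The variable names, coefficients and sample size are all assumptions chosen for the example.

```python
import random


def sample(n, do_x=None, seed=0):
    """Draw n (x, y) pairs from a toy structural causal model.

    Z -> X, Z -> Y and X -> Y, with a true causal effect of X on Y equal to 2.
    Passing do_x replaces the mechanism for X with a constant (an intervention).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)  # unobserved confounder
        x = (z + rng.gauss(0, 1)) if do_x is None else do_x
        y = 2.0 * x + 3.0 * z + rng.gauss(0, 1)
        rows.append((x, y))
    return rows


def slope(rows):
    """Ordinary least-squares slope of y on x."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    var = sum((x - mean_x) ** 2 for x, _ in rows)
    return cov / var


def mean_y(rows):
    return sum(y for _, y in rows) / len(rows)


# Observational data: the regression slope mixes the causal effect with confounding.
obs_slope = slope(sample(20000))

# Interventional data: compare E[Y | do(X=1)] with E[Y | do(X=0)].
causal_effect = mean_y(sample(20000, do_x=1.0)) - mean_y(sample(20000, do_x=0.0))

print(obs_slope)      # roughly 3.5 in expectation for this model, not 2
print(causal_effect)  # recovers the true effect of 2
```

In this toy model the observational slope overstates the effect because Z pushes X and Y in the same direction, while fixing X by intervention cuts the link from Z to X and isolates the effect of X itself.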

The course will offer an introduction to deep learning along with an extensive practical hands-on session in Python. We will cover deep feedforward models, convolutional networks used mainly in image processing, recurrent neural networks used commonly in text processing, autoencoders, word2vec, as well as introduce optimization for deep learning. During the hands-on workshop, we will use deep learning techniques on images and natural-language text.

Bayes Comp is a biennial conference sponsored by the ISBA section of the same name. The conference and the section both aim to promote original research into Bayesian computational methods for inference and decision making and to encourage the use of frontier computational tools among practitioners, the development of adapted software, languages, platforms, and dedicated machines, and to translate and disseminate methods developed in other disciplines among statisticians.

Do not miss our next event: Data & Ethics

At BcnAnalytics we are really passionate about Data. At the same time, we also have some concerns about the ethical aspects of a data-driven world. So, we are pleased to announce that our next event will focus on “Data and Ethics”.

The event will be on April 11th at 19h at MWC, and as usual doors will open at 18:45.

We will have two great speakers in our panel: Carlos Castillo (Distinguished Research Professor at Universitat Pompeu Fabra) and Gemma Galdon (Founder at Eticas Research & Consulting and Researcher at Universitat de Barcelona). Both will share their views on the ethical aspects of using data and building algorithms. They will raise concerns around bias, discrimination and opacity in a data-driven world and how these might negatively affect certain people's lives.

As usual, after the talks we will have time for networking and free cold beers.

If you want to attend, you can register here.

This event would not be possible without the collaboration of Movistar Centre.


On Sunday, January 21st, at about 14:00, the jury announced the winners of the BCN Air Quality Datathon. This concluded an intense weekend in which 12 teams of data scientists, with all kinds of backgrounds and from different countries, worked hard towards a clear goal: use data to improve the air quality predictions that the Barcelona Supercomputing Center (BSC) produces with the CALIOPE system.

It all began on Saturday 20th at 9:00, when the first participants arrived and collected the wonderful green t-shirt with the motto “Keep modelling and mind the air quality”. Then, after kind words from our host Vicenç Villatoro (the director of CCCB), Janet Sanz (Deputy Mayor for Ecology, Urbanism and Mobility of Barcelona), and people from the companies that made the event possible (the sponsors Gauss&Neumann, Social Point and Holaluz), the datathon was presented and the challenge was made public to the participants.

Given the NO2 concentration observed hourly at 7 measurement stations, and the hourly NO2 predictions produced every day by the CALIOPE system, the challenge was to find the model that best predicted the probability that, on each of a set of days in 2015, the concentration would exceed a threshold of 100 µg/m³ in at least one hour of the day.
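
For readers who want to try the problem, here is a minimal sketch of how the target could be built from hourly observations and how probability predictions might be scored. The Brier score is our own assumption for illustration (the scoring rule used by the jury is not described here), and the hourly values and predicted probabilities are made-up numbers, shortened to a few readings per day.

```python
THRESHOLD = 100.0  # µg/m³, the threshold stated in the challenge


def day_exceeds(hourly_no2, threshold=THRESHOLD):
    """Return True if at least one hourly NO2 value of the day is above the threshold."""
    return any(value > threshold for value in hourly_no2)


def brier_score(probabilities, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Assumed scoring rule for this sketch; lower is better.
    """
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


# Hypothetical observations for three days at one station.
observed_days = [
    [40.0, 85.0, 120.5, 60.0],  # one reading above 100
    [30.0, 55.0, 70.0, 45.0],   # never above 100
    [95.0, 101.0, 99.0, 80.0],  # one reading just above 100
]

# Binary target: 1 if the day exceeded the threshold in at least one hour.
outcomes = [int(day_exceeds(day)) for day in observed_days]

# Hypothetical probabilities produced by some model for the same three days.
predicted = [0.9, 0.2, 0.6]

score = brier_score(predicted, outcomes)
print(outcomes)
print(score)
```

A useful sanity check is to compare any model against a constant prediction equal to the historical share of exceedance days; a model that cannot beat that baseline is not adding information.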

After that, the teams had about 24 hours to design and implement their models and submit their predictions. At that point, the strategies of the different teams started to emerge. Some discussed how to build the model before implementing it, while others started coding straight away to make the most of the available time. While experienced teams used a rigorous methodology to work in parallel at a fast pace, some newcomers struggled to combine different languages or to pass data from one computer to another. All of this took place in an atmosphere that was focused but relaxed.

After a night in which some participants (and some organizers) did not sleep much, the predictions were finally submitted on Sunday morning. It was then the teams' turn to describe their work in 4-minute presentations in front of a jury made up of Carlos Pérez García-Pando, Kim Serradell and Maria Teresa Pay from BSC, Marc Torrent from the Big Data Center of Excellence, Salvador Lladó from Leitat, and Manuel Bruscas and Didac Fortuny from BcnAnalytics.

Two awards were given. The accuracy award, given to the team with the most accurate predictions, consisted of 2000 € and a pass for the Mobile World Congress 2018 for each member of the team. The winning team was “Worthless Without Coffee”, who built a time series prediction using concentration values from the previous days, predictions of the CALIOPE system, concentration increases, some calendar variables and the characteristics of the measurement stations. They have kindly agreed to share their code, which can be found by following this link.

The creativity award took into account originality in tackling the challenge and the insights found within the data. The winners of this award were the team “Dreamers”, who proposed some appealing policies to improve air quality, and the team “Alpha”, who made useful suggestions to the members of the BSC for improving their predictions based on what they observed in the data. Each team won 600 € and passes for the 4 Years From Now 2018 event.

The datathon is over, but there is still room to improve air quality predictions. For this reason, the data set will be kept public, and any restless data scientist will be able to access it and keep working on the problem. By following this link, anyone can download the data and the documentation provided in the datathon. So, data scientists, keep modelling and mind the air quality!