Book launch: INTRODUCCIÓN A APACHE SPARK para programar el Big Data

We are pleased to announce the launch of the book ‘Introducción a Apache Spark’ by Mario Macías, Mauro Gómez, Rubèn Tous and Jordi Torres. It is the first book on Apache Spark written in Spanish, and it has the support of Matei Zaharia (creator of Spark, CTO at Databricks and vice president of Apache Spark), who wrote the foreword.

Apache Spark is currently the most popular distributed computing software, with a particular emphasis on data analysis and machine learning applications. If you are interested in it or would like to get to know it, do not hesitate to visit the Barcelona Spark Meetup!

The event will take place on 3 November at the UPC. To attend, you need to register at

Non-regular event: Data Science at Booking

We are pleased to announce the following event, sponsored by and in collaboration with Gemleb. In this session, people from Booking.com will explain the kinds of data science and machine learning problems they deal with (the talks may be more technically oriented than usual).

1) Data Science: For Fun and For Profit

Data Science is relatively new, but the ideas and techniques that form the underpinnings for this evidence-oriented discipline have a solid foundation in hundreds of years of scientific development. In order to understand the new science of data, one must first understand the science of science.
The Scientific Method, the unintended effects of repeated significance testing and Simpson’s paradox: this talk will focus on the practical applications of the theoretical constructs that lie at the heart of Data Science, and expand on some potential pitfalls of statistical analysis that you are likely to encounter when venturing into the field.
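
For readers who have not met Simpson’s paradox before, here is a minimal sketch of what it looks like in numbers. It is not taken from the talk: the counts are the figures from the well-known kidney-stone illustration, and the names `counts`, `rate` and `pooled_rate` are purely illustrative. Option A has the higher success rate within each subgroup, yet option B looks better once the subgroups are pooled, because the two options were applied to very different mixes of cases.

```python
from fractions import Fraction

# (successes, trials) for two options, split by subgroup.
# Assumption: illustrative figures from the classic kidney-stone example.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, trials):
    """Exact success rate as a fraction, avoiding float rounding."""
    return Fraction(successes, trials)


def pooled_rate(groups):
    """Success rate after ignoring the subgroup split."""
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(n for _, n in groups.values())
    return Fraction(total_successes, total_trials)


# Compare the two options within each subgroup.
for group in ("small", "large"):
    a = rate(*counts["A"][group])
    b = rate(*counts["B"][group])
    print(f"{group}: A={float(a):.3f} B={float(b):.3f}")

# Compare them again after pooling the subgroups.
a_all = pooled_rate(counts["A"])
b_all = pooled_rate(counts["B"])
print(f"pooled: A={float(a_all):.3f} B={float(b_all):.3f}")
```

The reversal comes from the unequal group sizes: A was mostly applied to the harder "large" cases, which drags its pooled rate down.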

Bio: Lukas Vermeer (Data Scientist, Booking.com)

Lukas is an experienced data science professional with a background in computing science and online machine learning for real-time decision support. A strong advocate of “Evidence-Based Everything”, he is forever learning and helping machines do the same. As a Data Scientist at Booking.com, the world’s leading accommodation website, Lukas is exploring novel ways to make booking hotels online a more personal experience.

2) Topic Modelling on Travel Data

Booking.com collects millions of diverse endorsements from its users, for example, London endorsed for Shopping, Brussels for Chocolate, Athens for Museums and Barcelona for its nightlife. These endorsements are organised into a set of topics using Latent Dirichlet Allocation and used to personalise the email marketing campaign of Booking.com. The results from experiments on more than 40 million unique users demonstrate the conceptual value of the discovered topics.

Bio: Athanasios Noulas

Athanasios Noulas completed his PhD in Machine Learning at the University of Amsterdam, where he focused on Dynamic Bayesian Networks and Deep Learning. He then worked as a strategist at Source Capital, where he developed algorithms for high-frequency automated trading. He is currently working for Booking.com as a Data Scientist in the Visitor Profiling team, where he analyses user behaviour and implements algorithms that adjust the website to the user’s needs.

At the end we will have some drinks and time for networking.

If you want to attend, please register via Meetup.


Idescat’s 250m grid population data

We are glad to announce that Idescat’s population data on a 250 m grid is now available. Here you can find the following fields from the 2011 census ‘Registre de Població’ as of January 1st, 2014:

  • Total population
  • Male and female population
  • Population by age group: 0-14, 15-64, over 64 and over 85


You can also find the R code to create a map of any of the variables above in the BcnAnalytics GitHub account.

We would like to thank Idescat for their support.

New data set with demographic information by inAtlas

Members of the BcnAnalytics community can enjoy the data set provided by inAtlas. It includes information at census tract level for all of Catalonia on the following variables:

  • Age
  • Gender
  • Nationality

You can download the data set here or access it from R through the BcnDataAccess package, writing BcnDataSources$inAtlas.


We want to thank inAtlas for their help.

New dataset: demographic data at census tract level, provided by Idescat

A new data set is at our disposal, containing demographic information on Barcelona at census tract level by gender, provided by Idescat. The data comes from the 2001 census. The data set contains:

  • Age
  • Knowledge of Catalan
  • Level of education
  • Main activity
  • Profession
  • Place of birth

You can download the data set here. You can also access it directly from R through the BcnDataAccess package, writing BcnDataSources$Idescat$Cens2001.
We want to thank Idescat for providing the data.