The increasing demand for Big Data specialists has not gone unnoticed in the past few years, especially if you happen to mingle with the IT crowd…
According to La Vanguardia (25/10/2017), 350,000 jobs remain unfilled in Spain due to a lack of Big Data profiles.
But there are two distinct facets to this beast, which is perhaps more apparent when your entry into the BI world comes from a hardware engineering background such as electronics or mechatronics. The more secretive facet, ‘IoT’, broadens the Big Data experience to applications such as ‘Mobility’ or ‘Security’ by interfacing ‘Data’ with ‘Hardware’ (sensors) to output a desirable, low-probability-of-failure ‘Reaction’, using processes such as Deep Learning and Artificial Intelligence (AI).
The other, more open facet, ‘e-commerce’, is the one we will cover extensively today by attending the second day of this year’s Big Data Congress in Barcelona.
In their own words: What is Big Data Congress? Big Data Congress is the leading congress on Big Data in Catalonia, organised by Big Data CoE Barcelona. For the third consecutive year, the conference is the meeting point for professionals, suppliers and companies that want to develop, or are already carrying out, projects in the field of Big Data & Data Analytics, extracting value from data. The Big Data Congress is the place to explain what is being done in Barcelona and Catalonia, and what is being done in the world, in the area of Big Data. The conference aims to combine experiences from the Big Data CoE and its collaborators with presentations from national and international experts.
26th OCTOBER BIG DATA MATURITY & TECHNOLOGY
The introduction to the day was delivered by Joan Batlle, Director of Digital Transformation and International Relations at the Chief Technology and Digital Innovation Office, Barcelona City Council.
Joan gave us a warm welcome and introduced our first keynote speaker of the day, Hugo Zaragoza, noting that, quite suitably, the company this speaker works for provided services to most of the day’s subsequent speakers: that company being, unsurprisingly, ‘Amazon’…
Mr Batlle then closed his introduction by suggesting that Hugo was so passionate about the technology involved that his interactions with it became ever more human in nature. Hugo told us a little about Amazon’s stance on Big Data and how they leverage its power to help the company improve its processes. For internal reasons, Amazon did not want their slides published, so here is a sketch to illustrate how Amazon describes its ‘growth’ philosophy:
The above image shows the feedback loop introduced into the model to optimize growth. Through knowledge gained via Big Data, cost structures can be lowered, which in turn reduces shelf prices and thus improves the customer experience.
Another, relatively new, service provided by Amazon is ‘AmazonFresh’, which offers fresh locally produced items for sale, as well as a subset of items from the main Amazon.com storefront. Items ordered through AmazonFresh are available for home delivery on the same day or the next day, depending on the time of the order and the availability of transport.
This service disrupts the whole dynamic of Amazon’s usual box-type storage fetched by robots from high shelves; now the items have a limited shelf-life and can become health hazards if the conditions to preserve them are not met.
Hugo went on to explain that initially the quality testing was 100% human, but some processes proved 20% unreliable due to differing opinions among the human fruit-and-vegetable selectors. This process was improved considerably by applying Machine Learning to train an AIP (applied image processing) system to automatically select the food items.
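To make the idea concrete, here is a minimal, purely hypothetical sketch of that kind of automated selection: a toy nearest-neighbour classifier that labels produce as ‘accept’ or ‘reject’ from simple colour features. The features, labels and training values are invented for illustration; Amazon’s actual system is not public and would use full images and deep networks.

```python
import math

# Toy training set: (avg red, avg green, avg blue) colour features per item,
# labelled by (hypothetical) human selectors. Values are invented for
# illustration only.
TRAINING = [
    ((0.8, 0.6, 0.2), "accept"),  # bright, ripe tones
    ((0.7, 0.5, 0.2), "accept"),
    ((0.3, 0.3, 0.2), "reject"),  # dark, bruised tones
    ((0.2, 0.2, 0.1), "reject"),
]

def classify(features, k=3):
    """Label an item by majority vote of its k nearest training examples."""
    nearest = sorted(TRAINING, key=lambda ex: math.dist(ex[0], features))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

print(classify((0.75, 0.55, 0.2)))   # bright item -> "accept"
print(classify((0.25, 0.25, 0.15)))  # dark item -> "reject"
```

The point of the automation is exactly this: the same features always yield the same label, removing the 20% disagreement between human selectors.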
(What happened to the five people initially employed to select the goods? I still wonder now!!)
This kind of efficiency is impressive and, I personally think, worrying at the same time. According to the law of resonance, there is always a perfect middle point in any balanced system, and pushing the efficiency at one end will surely destabilize something else. In this case I am referring, of course, to the loss of employment, and I wonder who will buy all these wonderful products when no one is capable of earning money. Unless, of course, the grand plan is to provide benefits to everyone, so that earning money will no longer be necessary as artificial intelligence takes over? Or should ‘existential’ variables be integrated by law within deep learning algorithms? Time will tell, as this discussion is for another day!
I was very tempted to raise my hand and ask the above question in the closing Q&A section but restrained myself to avoid converting the Cloud discussion into a rather dark and wet one…
Another member of the public did, however, have a constructive question for Hugo, asking why Amazon had chosen Barcelona as its HQ. Hugo simply replied that the likely reason was the ease of finding highly qualified and talented people in Barcelona compared to other parts of the country.
Marc and José Luis were hilarious presenters who really engaged with the audience. They began by introducing the idea behind Data Lakes: a place to store all your enterprise data, whether structured or unstructured, in its native format.
Their motto being: how to “Build an Enterprise Data Lake without Drowning”.
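As a hedged sketch of that “store everything in its native format” idea, here is a minimal landing zone in Python: each incoming file is copied untouched into a source/date-partitioned path, whatever its format. The path conventions and names are my own illustration, not the presenters’ design.

```python
import shutil
from datetime import date
from pathlib import Path

# Minimal "raw zone" of a data lake: files land unmodified, in their native
# format, under a source/date partition. All names are illustrative only.
LAKE_ROOT = Path("datalake/raw")

def ingest(source: str, file_path: Path) -> Path:
    """Copy a file into the raw zone without transforming it."""
    partition = LAKE_ROOT / source / date.today().isoformat()
    partition.mkdir(parents=True, exist_ok=True)
    target = partition / file_path.name
    shutil.copy2(file_path, target)
    return target

# Structured or unstructured alike: a CSV and a log file land side by side.
Path("orders.csv").write_text("id,amount\n1,9.99\n")
Path("app.log").write_text("2017-10-26 INFO started\n")
print(ingest("shop", Path("orders.csv")))
print(ingest("shop", Path("app.log")))
```

Because nothing is transformed on the way in, schema decisions are deferred until read time, which is precisely what distinguishes a lake from a warehouse.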
It was comforting to see that, in SERVIZURICH’s mind, Big Data was not a threat but rather a complement to Data Analytics, as can be observed in the above slide.
The next question: how do we approach this service? Open source? If so, what will the support be like? Other options could be ‘On-Premise’ or ‘the Cloud’. To help answer some of these questions, Marc and José Luis offered this observation: data scientists love to take technology to the limits of its capabilities with complex algorithms, but when do we actually go to production with the findings? This is a very difficult question for a data scientist to answer, as algorithms can always be improved; on the other hand, ‘Production’ cannot be modified once promoted.
To satisfy this ‘play & go’ type of scenario, Cloud environments seem much more appropriate, not to mention the technology life-cycle, which again strengthens the case against on-premise investment.
Moving on, Victor Dertiano from BI Geek introduced us to one of their recent digital banking projects, ‘2GETHERBANK’, and described how they integrated various Big Data solutions to resolve distinct issues. This was beautifully illustrated through their animated presentation slides, available here, and it was clear Machine Learning played a big part in the conceptual architecture.
He also reinforced the point that the Cloud easily outshines the On-Premise option for a Big Data infrastructure, for the following reasons:
On-Premise | Cloud
IT team dedicated to maintenance | No need for a dedicated IT team
Limited access to devices | Access from any location (internet)
Limited software upgrades and improvements | Continuous updates and improvements
High initial and renewal cost of equipment | Costs limited to usage
Risk of data loss to be managed | Low risk of data loss
Versus the classic on-premise solution as follows:
The cloud solution, Google in this case, offered a much more flexible landscape:
Our final speaker of this particular session was Mr Oscar Romero, Professor in the Department of Service and Information System Engineering at the UPC and member of the Database Technologies and Information Management research group at the Universitat Politècnica de Catalunya.
Oscar began with some mythological cleansing, proclaiming that Big Data does not mean ‘large volumes of data’… Confused? Read on… Big Data actually best fits the glove of ‘variety of data’ and, more accurately, ‘the integration of a large variety of data’.
That said, the key to ensuring the data lake does not turn into a data swamp lies in its semantic layer: the Metadata!
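To illustrate why that semantic layer matters, here is a small hypothetical sketch: a catalogue that records an owner, a description and a schema for each dataset in the lake, and flags anything registered without them before the lake drifts toward a swamp. The field names and rules are invented for illustration, not a real governance standard.

```python
# A tiny metadata catalogue: every dataset in the lake should be registered
# with an owner, a description and a schema, or it is flagged as a swamp risk.
catalog = {}

def register(name, owner="", description="", schema=None):
    """Record a dataset's metadata in the catalogue."""
    catalog[name] = {
        "owner": owner,
        "description": description,
        "schema": schema or {},
    }

def swamp_risks():
    """Datasets missing the metadata that would make them findable and usable."""
    return sorted(
        name for name, meta in catalog.items()
        if not (meta["owner"] and meta["description"] and meta["schema"])
    )

register("sales_2017", owner="bi-team", description="Daily sales extracts",
         schema={"id": "int", "amount": "float"})
register("misc_dump")  # landed with no metadata at all

print(swamp_risks())  # -> ['misc_dump']
```

Without this layer, raw files pile up with no record of what they mean or who owns them, which is exactly the ‘data swamp’ Oscar warned about.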
The full slide show is available here, but to cut to the chase, the conclusion on the best data governance was illustrated as follows:
Written by Stephane Rodicq