Category: Machine Learning
Google Cloud Data Engineer Exam Preparation
This is a short write-up of everything that helped me prepare for the Google Cloud Data Engineer Exam. There are a lot of courses and resources that can help you prepare for it. The following links helped me in my own preparation. On Coursera there are several courses…
Plumber: Getting R ready for production environments?
R Project and Production: Running R in production is a controversial topic, as is everything concerning R vs. Python. Lately there have been some additions to the R Project that made me look into this again. While researching R and its usage in production environments, I came across several packages and projects that can…
Apache Spark 2.0
Apache Spark has released version 2.0, which is a major step forward in usability for Spark users and especially for people who refrained from using it because of the cost of learning a new programming language or tool. This is in the past now, as Spark 2.0 offers improved SQL functionality with SQL:2003 support. It…
Python vs. R for Data Science
In Data Science there are two languages that compete for users: R on one side and Python on the other. Both have a huge user base, but there is ongoing discussion about which is better to use in a Data Science context. Let's explore both a bit. R: R is a language and programming environment especially…
Apache Spark: The Next Big (Data) Thing?
Since Apache Spark became a Top-Level Project at Apache almost a year ago, it has seen wide coverage and adoption in the industry. With its promise of being faster than Hadoop MapReduce, up to 100x in memory and 10x on disk, it seems like a real alternative to doing pure MapReduce. Written in…
SQL on Hadoop: Facebook’s Presto
Earlier this month Facebook open-sourced its own product for using SQL on Hadoop. It is called Presto and is something like Facebook’s answer to Cloudera’s Impala or Hortonworks’ Stinger, both presented in an earlier post on this site called SQL and Hadoop. Presto is unlike Hive and more like Impala, since it doesn’t rely…
Big Data in Learning
There are many fields in which big data can improve results. One of these is (e-)learning. Until recently, the analysis of learning focused on exam results, but with big data and analytics there are new possibilities to enhance the experience of learning as a whole. For example, there is the possibility to…
Data Science and Machine Learning
Machine Learning is acknowledged as a part of Data Science, but will it be able to replace a Data Scientist? There have been several articles on that topic in recent months and years. It’s true that there has been major progress in the field of machine learning, and there are already articles about…