schochastics
http://blog.schochastics.net/
Recent content on schochastics
Generator: Hugo (gohugo.io), language: en-us, author: David Schoch, last updated: Sat, 10 Mar 2018 00:00:00 +0000

Analyzing NBA Player Data III: Similarity Networks
http://blog.schochastics.net/post/analyzing-nba-player-data-iii-similarity-networks/
Sat, 10 Mar 2018 00:00:00 +0000
This is the last part of the mini series Analyzing NBA Player Data. The first part was concerned with scraping and cleaning player statistics from any NBA season. The second part showed how to use principal component analysis and k-means clustering to “revolutionize” player positions, which kind of failed. Anyway, this third part deals with something a little more advanced, namely similarity networks of players and what we can learn from them.

Analyzing NBA Player Data II: Clustering Players
http://blog.schochastics.net/post/analyzing-nba-player-data-ii-clustering/
Sun, 04 Mar 2018 00:00:00 +0000
This is the second post of my little series Analyzing NBA Player Data. The first part was concerned with scraping and cleaning player statistics from any NBA season. This post deals with gaining some insight into the player stats, in particular clustering players according to their stats to produce a new set of player positions.
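For readers who want to try the general idea of this series (principal component analysis, then k-means clustering on the component scores), here is a minimal base-R sketch. It is not the code from the post: it assumes the built-in mtcars data as a stand-in for the scraped NBA player stats, and the choice of three components and four clusters is arbitrary.

```r
# Stand-in data: the built-in mtcars data set, NOT the NBA player stats from the post
stats <- mtcars

# PCA on standardized variables (scale. = TRUE puts all stats on a common scale)
pca <- prcomp(stats, center = TRUE, scale. = TRUE)

# keep the first three principal components as the reduced feature space
scores <- pca$x[, 1:3]

# k-means on the PCA scores; nstart = 25 reruns from 25 random starts
# and keeps the solution with the lowest within-cluster sum of squares
set.seed(42)
km <- kmeans(scores, centers = 4, nstart = 25)

# attach the cluster label (the "new position") to each row
result <- data.frame(name = rownames(stats), cluster = km$cluster)
head(result)
```

Calling summary(pca) shows the proportion of variance explained by each component, which is one way to decide how many components to keep before clustering.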
#used libraries
library(tidyverse) # for data wrangling
library(rvest) # for web scraping
library(janitor) # for data cleaning
library(factoextra) # for pca and cluster visuals
theme_set(theme_minimal()+ theme(legend.

Analyzing NBA Player Data I: Getting Data
http://blog.schochastics.net/post/analyzing-nba-player-data-i-getting-data/
Sat, 03 Mar 2018 00:00:00 +0000
As a football (soccer) data enthusiast, I have always been jealous of the amount of data available for American sports. While much of the interesting football data is proprietary, you can get virtually anything of interest for the NBA, MLB, NFL or NHL.
I have decided to move away from football for a moment and write a little series on analyzing NBA player data. The series will go through all the major steps of a data analysis pipeline (obtaining, cleaning, exploring and analyzing data) using a rich set of statistics for NBA players.

Using UMAP in R with rPython
http://blog.schochastics.net/post/using-umap-in-r-with-rpython/
Wed, 14 Feb 2018 00:00:00 +0000
I have written about dimensionality reduction methods before, and now there seems to be a new rising star in that field: Uniform Manifold Approximation and Projection, or UMAP for short. The paper can be found here, but be warned: it is really math-heavy. From the abstract:
UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology. The result is a practical scalable algorithm that applies to real world data.

Sample Entropy with Rcpp
http://blog.schochastics.net/post/sample-entropy-with-rcpp/
Wed, 07 Feb 2018 00:00:00 +0000
Entropy. I still shiver when I hear that word, since I never fully understood the concept. Today marks the first time I was kind of forced to look into it in more detail. And by “in detail”, I mean I found a StackOverflow question that had something to do with a problem I am having (sound familiar?). The problem is about the complexity of time series, and one of the suggested methods was Sample Entropy.

SOMs and ggplot
http://blog.schochastics.net/post/soms-and-ggplot/
Wed, 24 Jan 2018 00:00:00 +0000
#used packages
library(tidyverse) # for data wrangling
library(stringr) # for string manipulations
library(kohonen) # implements self organizing maps
library(ggforce) # for additional ggplot features
I introduced self-organizing maps (SOMs) in a previous post, and since then I have been using the kohonen package on a daily basis. However, I prefer ggplot-style plotting, so I reimplemented the package's SOM plots with ggplot2. But don’t get me wrong, the kohonen package does an amazing job of visualizing SOMs.

Traveling Beerdrinker Problem
http://blog.schochastics.net/post/traveling-beerdrinker-problem/
Fri, 19 Jan 2018 00:00:00 +0000
Whenever I participate in a Science Slam, I try to work in an analysis of something typical for the respective city. My next gig will be in Munich, so there are two natural options: beer or football. In the end I chose both, but here I will focus on the former.
#used packages
library(tidyverse) # for data wrangling
library(TSP) # solving Traveling Salesman problems
library(ggmap) # maps in ggplot2
library(leaflet) # interactive maps
Data
Munich is, among other things of course, famous for its beer gardens.

A wild R package appears! Pokemon/Gameboy inspired plots in R
http://blog.schochastics.net/post/a-wild-r-package-appears/
Sun, 17 Dec 2017 00:00:00 +0000
I have a long commute every day, and I always try to keep myself occupied with little projects. One of my first projects was to learn more about creating R packages. The result is Rokemon, a Pokemon/Game Boy inspired package. In this post, I will briefly introduce some of the package's functionality and illustrate how incredibly useful it can be. A similar introduction can also be found on GitHub.

Predicting Player Positions of FIFA 18 Players
http://blog.schochastics.net/post/predicting-player-positions/
Fri, 24 Nov 2017 00:00:00 +0000
In this post, I will use the results of the exploratory analysis from the previous post and try to predict the position of players in FIFA 18 using different machine learning algorithms.
As a quick reminder, these were the figures we obtained using PCA, t-SNE and a self-organizing map.
#used packages
library(tidyverse) # for data wrangling
library(hrbrthemes) # nice themes for ggplot
library(caret) # ML algorithms
Data
The data we use comes from Kaggle and contains around 18,000 players of the game FIFA 18 with 75 features per player.

Dimensionality Reduction Methods Using FIFA 18 Player Data
http://blog.schochastics.net/post/dimensionality-reduction-methods/
Sun, 19 Nov 2017 00:00:00 +0000
In this post, I will introduce three different methods for dimensionality reduction of large datasets.
#used packages
library(tidyverse) # for data wrangling
library(stringr) # for string manipulations
library(ggbiplot) # pca biplot with ggplot
library(Rtsne) # implements the t-SNE algorithm
library(kohonen) # implements self organizing maps
library(hrbrthemes) # nice themes for ggplot
library(GGally) # to produce scatterplot matrices
Data
The data we use comes from Kaggle and contains around 18,000 players of the game FIFA 18 with 75 features per player.

About
http://blog.schochastics.net/page/about/
Sun, 19 Nov 2017 00:00:00 +0100
I had been writing for a while for my mildlyscientific blog. That blog was mainly a playground for non-scientific posts about me playing with weird or “mildly” interesting data. But I somehow lost the motivation and was also lacking the time. You may ask: if you have no time for blogging, then why start a new one? Well, I decided that it was time to dive into this whole “Big Data” and “Machine Learning” thing, since it has become somewhat relevant for my research.