This post was semi-automatically converted from blogdown to Quarto and may contain errors. The original can be found in the archive.
Over the last five years, I have accumulated various scripts with (personal) convenience functions for network analysis, and from time to time I also implemented new methods that I could not find in any other R package. The package netUtils gathers all these functions and makes them available for anyone who may also need to apply “non-standard” network analytic tools. In this post, I will briefly highlight some of the most prominent functions of the package. All available functions are listed in the README on GitHub.
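# developer version
remotes::install_github("schochastics/netUtils")
# CRAN version
install.packages("netUtils")
# load netUtils together with igraph, which the examples below rely on
library(igraph)
library(netUtils)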
Random graph generators
The package includes four new graph generators:
graph_kpartite() creates a complete k-partite network.
sample_coreseq() creates a random graph with a given coreness sequence.
sample_pa_homophilic() creates a preferential attachment graph with two groups of nodes.
split_graph() samples a graph with a perfect core-periphery structure.
graph_kpartite() can be used to construct a complete k-partite graph. A complete k-partite graph is a graph with k groups of nodes where no two nodes within the same group are connected, but every node is connected to all nodes in the other groups.
The example below shows a 3-partite graph where each group consists of 5 nodes.
g <- graph_kpartite(n = 15, grp = c(5, 5, 5))
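To make the definition concrete, here is a minimal sketch that builds the same kind of graph by hand using only igraph and base R. The names grp, A and g_manual are made up for illustration, and this is not meant to mirror how graph_kpartite() works internally; it only reproduces the defining property.

```r
library(igraph)

# group membership: 3 groups of 5 nodes each (illustrative)
grp <- rep(1:3, each = 5)

# two nodes are adjacent exactly when they belong to different groups
A <- outer(grp, grp, "!=") * 1
g_manual <- graph_from_adjacency_matrix(A, mode = "undirected")

# check that no edge connects two nodes of the same group
el <- ends(g_manual, E(g_manual), names = FALSE)
any(grp[el[, 1]] == grp[el[, 2]])

# every pair of groups contributes 5 * 5 edges, so 3 * 25 in total
ecount(g_manual)
```

Building the adjacency matrix with outer() keeps the sketch short, at the cost of a dense n-by-n matrix, which is fine for small examples like this one.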
The function sample_coreseq() is conceptually very similar to the function sample_degseq() in igraph. Instead of sampling networks with the same degree sequence, sample_coreseq() samples networks that have the same k-core decomposition.
g1 <- sample_gnp(40, 0.1)
kcore1 <- sort(coreness(g1))
g2 <- sample_coreseq(kcore1)
kcore2 <- sort(coreness(g2))
all(kcore1 == kcore2)
## [1] TRUE
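For readers less familiar with the concept: the k-core of a graph is the maximal subgraph in which every node has at least k neighbours, and the coreness of a node is the largest k for which it belongs to the k-core. The toy graph below is made up for illustration and uses only coreness() from igraph.

```r
library(igraph)

# a 4-clique (nodes 1-4) with one pendant node (5) attached to node 4
toy <- make_graph(c(1,2, 1,3, 1,4, 2,3, 2,4, 3,4, 4,5), directed = FALSE)

# clique members sit in the 3-core, the pendant node only in the 1-core
core_seq <- coreness(toy)
core_seq

# the sorted vector is the kind of input passed to sample_coreseq() above
sort(core_seq)
```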
sample_pa_homophilic() creates a preferential attachment graph with two groups of nodes. The parameter h_ab is used to adjust the probability that an edge between groups occurs. A network is maximally heterophilic if h_ab = 1, that is, edges exist only between groups, and maximally homophilic if h_ab = 0, that is, edges exist only within groups.
# maximally heterophilic network
sample_pa_homophilic(n = 50, m = 2, minority_fraction = 0.2, h_ab = 1)
# maximally homophilic network
sample_pa_homophilic(n = 50, m = 2, minority_fraction = 0.2, h_ab = 0)
The figure below shows some examples for varying degrees of homophily.
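One simple way to quantify where a network sits between the two extremes is the share of edges that run between groups. The helper between_share() below is a hypothetical illustration, not part of netUtils, and it takes the group membership as an explicit vector rather than assuming any particular vertex attribute on the sampled graph.

```r
library(igraph)

# share of edges whose endpoints lie in different groups
# (helper name and arguments are hypothetical, not part of netUtils)
between_share <- function(g, grp) {
  el <- ends(g, E(g), names = FALSE)
  mean(grp[el[, 1]] != grp[el[, 2]])
}

grp <- c(1, 1, 2, 2)

# only between-group edges: the pattern expected for h_ab = 1
hetero <- make_graph(c(1,3, 1,4, 2,3, 2,4), directed = FALSE)
between_share(hetero, grp)

# only within-group edges: the pattern expected for h_ab = 0
homo <- make_graph(c(1,2, 3,4), directed = FALSE)
between_share(homo, grp)
```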
The function split_graph() can be used to create graphs with a perfect core-periphery structure. This means that there are two groups of nodes: one forms a clique (the core: all nodes are pairwise connected) and the other group is only connected to nodes in the core (the periphery: all nodes are pairwise disconnected).
In the example below, we create a split graph with 100 nodes and a core of size 20 (100 * 0.2).
sg <- split_graph(n = 100, p = 0.3, core = 0.2)
The figure below shows the typical pattern of the adjacency matrix of a split graph.
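The block pattern can also be seen directly in a small hand-built example. The sketch below uses only igraph and base R and assumes that p acts as the probability of a tie between a core node and a periphery node; all variable names are illustrative, and this is not necessarily how split_graph() is implemented.

```r
library(igraph)
set.seed(1)

n_core <- 4   # illustrative sizes, much smaller than in the example above
n_peri <- 6
p <- 0.3      # assumed probability of a core-periphery tie
n <- n_core + n_peri
core_idx <- seq_len(n_core)
peri_idx <- (n_core + 1):n

A <- matrix(0, n, n)
A[core_idx, core_idx] <- 1                 # core: complete block
cp <- matrix(rbinom(n_core * n_peri, 1, p), n_core, n_peri)
A[core_idx, peri_idx] <- cp                # random core-periphery ties
A[peri_idx, core_idx] <- t(cp)             # keep the matrix symmetric
diag(A) <- 0                               # no self-loops

sg_manual <- graph_from_adjacency_matrix(A, mode = "undirected")

# dense block in the top-left corner, empty block in the bottom-right
A
```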
Analytic functions
The most important analytic functions are
triad_census_attr(), which calculates the triad census with vertex attributes.
core_periphery(), which fits a discrete core-periphery model.
set.seed(112)
g <- sample_gnp(20, p = 0.3, directed = TRUE)
# add a vertex attribute
V(g)$type <- rep(1:2, each = 10)
triad_census_attr(g, "type")
## T003-111 T003-112 T003-122 T003-222 T012-111 T012-121 T012-112 T012-122
## 8 33 28 7 32 40 31 19
## T012-211 T012-221 T012-212 T012-222 T021D-111 T021D-211 T021D-112 T021D-212
## 27 41 25 26 9 19 19 21
## T021D-122 T021D-222 T102-111 T102-112 T102-122 T102-211 T102-212 T102-222
## 7 10 11 18 16 5 19 10
## T021C-111 T021C-211 T021C-121 T021C-221 T021C-112 T021C-212 T021C-122 T021C-222
## 17 23 29 17 19 7 24 10
## T111U-111 T111U-121 T111U-112 T111U-122 T111U-211 T111U-221 T111U-212 T111U-222
## 9 16 7 21 5 13 10 6
## T021U-111 T021U-112 T021U-122 T021U-211 T021U-212 T021U-222 T030T-111 T030T-121
## 11 19 13 3 14 7 11 11
## T030T-112 T030T-122 T030T-211 T030T-221 T030T-212 T030T-222 T120U-111 T120U-112
## 11 13 10 14 8 5 1 8
## T120U-122 T120U-211 T120U-212 T120U-222 T111D-111 T111D-121 T111D-112 T111D-122
## 6 0 4 4 4 12 8 13
## T111D-211 T111D-221 T111D-212 T111D-222 T201-111 T201-112 T201-121 T201-122
## 14 20 10 15 0 5 3 5
## T201-221 T201-222 T030C-111 T030C-112 T030C-122 T030C-222 T120C-111 T120C-121
## 3 3 2 12 14 3 3 8
## T120C-211 T120C-221 T120C-112 T120C-122 T120C-212 T120C-222 T120D-111 T120D-112
## 7 5 5 7 7 6 0 9
## T120D-211 T120D-212 T120D-122 T120D-222 T210-111 T210-121 T210-211 T210-221
## 1 9 4 1 2 8 3 5
## T210-112 T210-122 T210-212 T210-222 T300-111 T300-112 T300-122 T300-222
## 1 3 5 5 0 1 0 2
The output is a named vector where the names are of the form Txxx-abc, where xxx corresponds to the standard triad census notation and “abc” are the attributes of the involved nodes.
The function core_periphery() fits a standard discrete core-periphery model to the data.
# graph with perfect core-periphery structure
core_graph <- split_graph(n = 100, p = 0.3, core = 0.2)
core_periphery(core_graph)
## $vec
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##
## $corr
## [1] 1
# random graphs have a very weak core-periphery structure
rgraph <- sample_gnp(n = 100, p = 0.2)
core_periphery(rgraph)
## $vec
## [1] 1 0 1 0 1 0 0 0 1 0 1 1 1 0 1 0 1 1 0 1 1 1 0 1 1 1 0 0 1 0 1 0 1 0 0 0 0
## [38] 1 0 1 0 1 0 1 1 1 1 1 1 0 0 1 0 0 1 0 1 0 0 0 1 1 1 1 0 1 0 1 0 1 0 0 1 0
## [75] 1 0 1 0 0 1 1 1 0 0 1 1 0 0 1 1 0 0 1 0 0 0 0 1 0 0
##
## $corr
## [1] 0.162141
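To give some intuition for a correlation-based fit such as corr, here is a minimal sketch of one common variant of the discrete model. It assumes an ideal pattern in which core-core dyads are tied, periphery-periphery dyads are not, and core-periphery dyads are ignored. The function cp_fit() and its is_core argument are hypothetical, and the package internals may well differ.

```r
library(igraph)

# correlation between observed ties and an idealized pattern
# (cp_fit and is_core are hypothetical names, not the netUtils API)
cp_fit <- function(g, is_core) {
  A <- as_adjacency_matrix(g, sparse = FALSE)
  ideal <- outer(is_core, is_core, "&") * 1     # 1 only for core-core dyads
  same_block <- outer(is_core, is_core, "==")   # drop core-periphery dyads
  keep <- upper.tri(A) & same_block
  cor(A[keep], ideal[keep])
}

# toy graph: nodes 1-3 form a clique, nodes 4-6 attach only to the core
toy <- make_graph(c(1,2, 1,3, 2,3, 1,4, 2,5), n = 6, directed = FALSE)
good <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
cp_fit(toy, is_core = good)

# a deliberately bad partition fits worse
bad <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
cp_fit(toy, is_core = bad)
```

Under this assumption, any split graph fits perfectly for its true partition regardless of how many core-periphery ties it has. Fitting a model additionally requires searching over partitions, which this sketch leaves out.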
More advanced core-periphery models are planned for a future release.
A new print method
I also extended str() to work with igraph objects, which offers an alternative way of printing them with additional information.
library(networkdata)
data("greys")
str(greys)
## -----------------------------------------------------------
## UNNAMED NETWORK (undirected, unweighted, one-mode network)
## -----------------------------------------------------------
## Nodes: 54, Edges: 57, Density: 0.0398, Components: 4, Isolates: 0
## -Vertex Attributes:
## name(c): Addison Montgomery, Adele Webber, Teddy Altman, Amelia ...
## sex(c): F, F, F, F, F, F, M, F, M, M, F, M, M, M, F, M, F, F, M, F, M, ...
## race(c): White, Black, White, White, White, White, Black, Black, Black, ...
## birthyear(n): 1967, 1949, 1969, 1981, 1976, 1975, 1981, 1969, 1972, ...
## position(c): Attending, Non-Staff, Attending, Attending, Attending, ...
## season(n): 1, 2, 6, 7, 5, 3, 6, 1, 6, 7, 8, 3, 2, 1, 1, 2, 1, 2, 1, 1, ...
## sign(c): Libra, Leo, Pisces, Libra, Leo, Gemini, Leo, Virgo, Aquarius, ...
## ---
## -Edges (first 10):
## Arizona Robbins->Leah Murphy Alex Karev->Leah Murphy Arizona
## Robbins->Lauren Boswell Arizona Robbins->Callie Torres Erica
## Hahn->Callie Torres Alex Karev->Callie Torres Mark Sloan->Callie Torres
## George O'Malley->Callie Torres Izzie Stevens->George O'Malley Meredith
## Grey->George O'Malley
Citation
@online{schoch2022,
author = {Schoch, David},
title = {Extending Network Analysis in {R} with {netUtils}},
date = {2022-08-27},
url = {http://blog.schochastics.net/posts/2022-08-27_extending-network-analysis-in-r-with-netutils/},
langid = {en}
}