statnet / network

Classes for Relational Data

network_from_data_frame()

knapply opened this issue

Data frames are almost always the natural result of a data cleaning pipeline, but the network package lacks a built-in equivalent to igraph::graph_from_data_frame(). The goal of network_from_data_frame() is to fill that gap.

Is there interest in including this within the network package itself?

If so, I'll open a pull request.
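For context on the gap, here is a minimal sketch of doing the conversion by hand with functions the network package already exports (network.initialize(), add.edges(), and the %v% and %e% operators). The helper name manual_network_from_df() is hypothetical, and the sketch assumes the first vertex column holds names and the first two edge columns hold the tail and head. It is not the proposed implementation, only an illustration of the bookkeeping a built-in function would remove.

```r
library(network)

vertex_df <- data.frame(name = letters[1:5],
                        int_attr = seq_len(5),
                        stringsAsFactors = FALSE)
edge_df <- data.frame(from = c("b", "c", "c", "d", "d", "e"),
                      to = c("a", "b", "a", "a", "b", "a"),
                      int_attr = seq_len(6),
                      stringsAsFactors = FALSE)

# Hypothetical helper (not part of network): assumes column 1 of `vertices`
# is the vertex name and columns 1-2 of `edges` are the tail and head names.
manual_network_from_df <- function(edges, vertices, directed = TRUE) {
  nw <- network.initialize(nrow(vertices), directed = directed)
  network.vertex.names(nw) <- vertices[[1]]

  # Remaining vertex columns become vertex attributes.
  for (attr in names(vertices)[-1]) {
    nw %v% attr <- vertices[[attr]]
  }

  # Translate vertex names into the integer ids that add.edges() expects.
  tails <- match(edges[[1]], vertices[[1]])
  heads <- match(edges[[2]], vertices[[1]])
  nw <- add.edges(nw, tail = as.list(tails), head = as.list(heads))

  # Remaining edge columns become edge attributes, in edge order.
  for (attr in names(edges)[-(1:2)]) {
    nw %e% attr <- edges[[attr]]
  }
  nw
}

nw <- manual_network_from_df(edge_df, vertex_df)
```

Every caller has to repeat the name-to-id matching and the attribute loops, which is the part a data frame constructor would handle once.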

edited:

Example Usage:

# install.packages("remotes")
# remotes::install_github("knapply/network")

vertex_df <- data.frame(name = letters[1:5],
                        int_attr = seq_len(5),
                        chr_attr = LETTERS[1:5],
                        lgl_attr = c(TRUE, FALSE, TRUE, FALSE, TRUE),
                        stringsAsFactors = FALSE)
vertex_df[["df_list_attr"]] <- replicate(5, mtcars, simplify = FALSE)

edge_df <- data.frame(from = c("b", "c", "c", "d", "d", "e"),
                      to = c("a", "b", "a", "a", "b", "a"),
                      int_attr = seq_len(6),
                      chr_attr = LETTERS[1:6],
                      lgl_attr = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
                      stringsAsFactors = FALSE)
edge_df[["df_list_attr"]] <- replicate(6, mtcars, simplify = FALSE)

tibble::as_tibble(vertex_df) # tibble for pretty printing only
#> # A tibble: 5 x 5
#>   name  int_attr chr_attr lgl_attr df_list_attr       
#>   <chr>    <int> <chr>    <lgl>    <list>             
#> 1 a            1 A        TRUE     <df[,11] [32 x 11]>
#> 2 b            2 B        FALSE    <df[,11] [32 x 11]>
#> 3 c            3 C        TRUE     <df[,11] [32 x 11]>
#> 4 d            4 D        FALSE    <df[,11] [32 x 11]>
#> 5 e            5 E        TRUE     <df[,11] [32 x 11]>
tibble::as_tibble(edge_df)
#> # A tibble: 6 x 6
#>   from  to    int_attr chr_attr lgl_attr df_list_attr       
#>   <chr> <chr>    <int> <chr>    <lgl>    <list>             
#> 1 b     a            1 A        TRUE     <df[,11] [32 x 11]>
#> 2 c     b            2 B        FALSE    <df[,11] [32 x 11]>
#> 3 c     a            3 C        TRUE     <df[,11] [32 x 11]>
#> 4 d     a            4 D        FALSE    <df[,11] [32 x 11]>
#> 5 d     b            5 E        TRUE     <df[,11] [32 x 11]>
#> 6 e     a            6 F        FALSE    <df[,11] [32 x 11]>

network::network_from_data_frame(edge_df, directed = TRUE, vertices = vertex_df)
#>  Network attributes:
#>   vertices = 5 
#>   directed = TRUE 
#>   hyper = FALSE 
#>   loops = FALSE 
#>   multiple = FALSE 
#>   bipartite = FALSE 
#>   total edges= 6 
#>     missing edges= 0 
#>     non-missing edges= 6 
#> 
#>  Vertex attribute names: 
#>     chr_attr df_list_attr int_attr lgl_attr vertex.names 
#> 
#>  Edge attribute names: 
#>     chr_attr df_list_attr int_attr lgl_attr

devtools::test("~/network", filter = "network_from_data_frame")
#> Loading network
#> network: Classes for Relational Data
#> Version 1.16-378 created on 2019-10-11.
#> copyright (c) 2005, Carter T. Butts, University of California-Irvine
#>                     Mark S. Handcock, University of California -- Los Angeles
#>                     David R. Hunter, Penn State University
#>                     Martina Morris, University of Washington
#>                     Skye Bender-deMoll, University of Washington
#>  For citation information, type citation("network").
#>  Type help("network-package") to get started.
#> Testing network
#> v |  OK F W S | Context
#> v |  24 0      | network_from_data_frame
#> 
#> == Results ==================================================================================================================
#> OK:       24
#> Failed:   0
#> Warnings: 0
#> Skipped:  0

Hey Brendan, many thx for proposing this! And sorry for the delay getting back to you. This looks like some nice functionality, and we're discussing it, so will respond soon.

Hi Brendan,

So, again many thx for this. It's great to start adding the new functionality available in data frames -- esp. the attrs.

We would like to include it in network, so we invite you to submit a PR.

The PR should include:

  1. unit tests for all functions you will be contributing
  2. examples in Roxygen format
  3. code that passes rcmdcheck
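As a rough illustration of item 2, a Roxygen block with an examples section could look like the sketch below. The call in the examples follows the usage shown earlier in this thread. The first argument name `x` and all of the descriptions are assumptions, and the trailing NULL only stands in for the real function definition so the snippet can be sourced on its own.

```r
#' Construct a network object from data frames (documentation sketch)
#'
#' @name network_from_data_frame
#'
#' @param x Data frame of edges; the first two columns identify the tail and
#'   head of each edge. (The argument name `x` is assumed for illustration.)
#' @param directed Logical; should the edges be treated as directed?
#' @param vertices Optional data frame of vertices; the first column holds
#'   the vertex names and the remaining columns become vertex attributes.
#'
#' @examples
#' vertex_df <- data.frame(name = c("a", "b", "c"),
#'                         stringsAsFactors = FALSE)
#' edge_df <- data.frame(from = c("b", "c"),
#'                       to = c("a", "a"),
#'                       stringsAsFactors = FALSE)
#' network_from_data_frame(edge_df, directed = TRUE, vertices = vertex_df)
NULL
```

In the package itself the block would sit directly above the function definition rather than above NULL.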

Once the code has been integrated and released, we will add you to the contributors list :)

If you have any questions, please don't hesitate to ask. And again -- many thx for the contribution.

Hi Martina,

That's very gracious of you.

I had a difficult time getting the package to pass R CMD check, both locally and on Travis, but I brought in the most recent changes from the statnet master branch and was able to get it working today.

This required that I make a few changes that I didn't anticipate:

  • R/zzz.R: I had to remove the .onLoad() call to be able to run devtools::check() or devtools::document(). I believe this is redundant with #' @useDynLib network, .registration = TRUE that is in R/network-package.R.
  • DESCRIPTION: roxygen warns if Encoding: UTF-8 is not present, so I added that.

Otherwise, the function and its documentation are added, and the checks pass. Tests are in tests/test_network_from_data_frame.R. Since many of the existing tests likely predate the testthat package, I left the test_that() calls commented out to make it easier to move to that paradigm if desired in the future. The function has 100% coverage.
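To illustrate the commented-out test_that() layout, here is a minimal sketch, not the contents of the actual test file. The helper check_against_data_frames() is hypothetical, and the network under test is built with released network functions as a stand-in, so the snippet runs without the fork installed. In the real tests that stand-in would be a call to network_from_data_frame().

```r
library(network)

# Hypothetical helper: plain stopifnot() assertions that also work outside
# testthat. Assumes column 1 of `vertices` holds the vertex names.
check_against_data_frames <- function(nw, edges, vertices) {
  stopifnot(
    is.network(nw),
    network.size(nw) == nrow(vertices),
    network.edgecount(nw) == nrow(edges),
    all(network.vertex.names(nw) == vertices[[1]])
  )
  invisible(TRUE)
}

vertex_df <- data.frame(name = c("a", "b", "c"), stringsAsFactors = FALSE)
edge_df <- data.frame(from = c("b", "c", "c"),
                      to = c("a", "b", "a"),
                      stringsAsFactors = FALSE)

# Stand-in for the function under test, built from released network functions.
nw <- network.initialize(nrow(vertex_df), directed = TRUE)
network.vertex.names(nw) <- vertex_df$name
nw <- add.edges(nw,
                tail = as.list(match(edge_df$from, vertex_df$name)),
                head = as.list(match(edge_df$to, vertex_df$name)))

# test_that("vertex and edge counts match the input data frames", {
check_result <- check_against_data_frames(nw, edge_df, vertex_df)
# })
```

Uncommenting the two test_that() lines, and swapping stopifnot() for expect_*() calls if desired, would move a test like this into the testthat paradigm.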

Let me know if there are questions/comments/concerns, or requests for additional functionality.