statnet / tergm

Fit, Simulate and Diagnose Models for Network Evolution Based on Exponential-Family Random Graph Models

the tergm workshop EGMME example is taking a REALLY long time again

martinamorris opened this issue

i raised this a couple of wks back -- i think in an email? in any case, we decided there was a lot of stochastic variability in the fitting times and dropped it.

i'm not sure if anything material has changed in tergm since then, but that same model is now taking what seems like forever to fit. it seems like forever b/c i haven't gotten it to converge yet, so i've stopped it a couple of times after 20 min or so and started again to see if it was a fluke. but no luck.

@krivit can you test on your machine again before you release tergm?

note: i've commented out the interactive progress plot, in case that was slowing things down. it may be, but even without it i can't get the model to converge.

library(tergm)
data(florentine)

startTime <- Sys.time()
tergm.fit.1 <- tergm(
  flobusiness ~ 
    Form(~ edges + gwesp(0, fixed = TRUE)) +
    Diss(~ offset(edges)),
  targets = "formation",
  offset.coef = log(9),
  estimate = "EGMME"#,
  #control = control.tergm(SA.plot.progress=TRUE)
  )
stopTime <- Sys.time()
print(paste("Estimation time:", format(stopTime - startTime)))  # format() keeps the time units

an hour later...

========  Phase 3: Simulate from the fit and estimate standard errors. ========
Subphase 2.5 \\\\/\/\/\////////////\/\\\/\\\\\
========  Phase 3: Simulate from the fit and estimate standard errors. ========
Subphase 2.6 \\\\\
========  Phase 3: Simulate from the fit and estimate standard errors. ========
Subphase 2.7 \//////////////////////////////////////////////////////////////////////////////////\\\/\\\\\
========  Phase 3: Simulate from the fit and estimate standard errors. ========
Subphase 2.8 \\\\\
========  Phase 3: Simulate from the fit and estimate standard errors. ========
Subphase 2.9 \\\\\
========  Phase 3: Simulate from the fit and estimate standard errors. ========
Subphase 2.10 /////////////////////////////////////////////////////////////\\\\\
========  Phase 3: Simulate from the fit and estimate standard errors. ========
Subphase 2.11 \\\//\\\\\
========  Phase 3: Simulate from the fit and estimate standard errors. ========
Subphase 2.12 /////\\\\\
========  Phase 3: Simulate from the fit and estimate standard errors. ========
Subphase 2.13 \\\\\
========  Phase 3: Simulate from the fit and estimate standard errors. ========
Subphase 2.14 \\\\\
========  Phase 3: Simulate from the fit and estimate standard errors. ========
Subphase 2.15 //\\/\\\\\
========  Phase 3: Simulate from the fit and estimate standard errors. ========
Subphase 2.16 ///////\\\\\
========  Phase 3: Simulate from the fit and estimate standard errors. ========
Subphase 2.17 \\\\\
========  Phase 3: Simulate from the fit and estimate standard errors. ========
Subphase 2.18 \\\\\
========  Phase 3: Simulate from the fit and estimate standard errors. ========
Subphase 2.19 \\\\\
========  Phase 3: Simulate from the fit and estimate standard errors. ========
Subphase 2.20 ////////////////////////////////////////////////////////////////////////////

welp. it's 12:25am now, and it still hasn't finished. i'll leave it running, but i'm off to bed. something is clearly wrong.

What happened is that exactly what we were worried about before with respect to Diss() vs. Persist() has come back to bite us. Unlike the old dissolve= and the new Persist(), a higher Diss() parameter means more dissolution, so a dissolution coefficient of log(9) implies that each edge that exists at time t has only a 10% chance of surviving to t+1.

Just change log(9) to -log(9), and it finishes in 3-5 minutes.
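
For anyone who wants to check the arithmetic behind the sign flip, here is a minimal base-R sketch. It assumes, as described above, that an edges-only Diss() offset is the log-odds that an existing tie dissolves in one time step, and that ties dissolve independently, so tie duration is geometric. The variable names are just for illustration.

```r
# Assumption: with Diss(~ offset(edges)), the offset coefficient is the
# log-odds that an existing tie dissolves during a single time step.
diss_coef <- c(original = log(9), corrected = -log(9))

# Inverse logit (plogis is in base R's stats package).
p_dissolve <- plogis(diss_coef)

# Chance that a tie present at time t is still there at t+1.
p_survive <- 1 - p_dissolve

# Assumption: independent ties with a constant per-step hazard, so the
# duration is geometric and its mean is 1 / p_dissolve (in time steps).
mean_duration <- 1 / p_dissolve

print(round(cbind(p_dissolve, p_survive, mean_duration), 3))
```

Under these assumptions log(9) gives a 90% per-step dissolution probability, while -log(9) gives 10%, i.e. a mean tie duration of 10 time steps.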

jeez. thx.

that said, it's still surprising to me that this simple model, on a tiny network, would take 3-5 minutes.

is this the full stergm estimation curse? or is there some possibility of improvement?

It's a model that dissolves almost 9/10 of the ties during a given time step. It may well be the case that it can't actually reach the target statistics.