TY - GEN

T1 - To sample or to smash? Estimating reachability in large time-varying graphs

AU - Basu, P

AU - Yu, F

AU - Bar-Noy, A

AU - Rawitz, D

N1 - Place of conference: USA

PY - 2014

Y1 - 2014

N2 - Time-varying graphs (T-graphs) consist of a time-evolving set of graph snapshots (or graphlets). A T-graph property with potential applications in both computer and social network forensics is T-reachability, which identifies the nodes reachable from a source node using the T-graph edges over time period T. In this paper, we consider the problem of estimating the T-reachable set of a source node in two settings: when the time evolution of a T-graph is specified by a probabilistic model, and when the actual T-graph snapshots are known and given to us offline (the “data-aware” setting). Since the value of T may be large in many applications, we propose two simple techniques, namely T-graph sampling and T-graph smashing, for significantly reducing the complexity of this computation while minimizing the estimation error. We show that for the data-aware case, both the T-graph sampling and smashing problems are NP-hard, but they are amenable to reasonably good approximations. We also show that for the probabilistic setting where each graphlet in a T-graph is an Erdős–Rényi random graph, sampling yields a loose lower bound for the T-reachable set, while different styles of smashing yield more useful upper and lower bounds. Finally, we show that our algorithms (both data-aware and data-oblivious) can estimate the T-reachable set in real-world time-varying networks with reasonable accuracy using fewer than 0.5% of the graphlets.

UR - https://scholar.google.co.il/scholar?q=To+sample+or+to+smash%3F+Estimating+reachability+in+large+time-varying+graphs&btnG=&hl=en&as_sdt=0%2C5

M3 - Conference contribution

BT - 14th SIAM International Conference on Data Mining (SDM)

PB - SDM

ER -