**Part II: Complex Networks Lecture 9: Empirical Properties and Metrics **
**What is a Network?** ## A set of “nodes” - Humans, routers, web pages, telephone switches, airports, proteins, scientific articles …
## Relations between these nodes - humans: friendship/relation or online friendship
- routers, switches: connected by a communication link
- web pages: hyperlinks from one to other
- airports: direct flights between them
- articles: one citing the other
- proteins: link if chemically interacting
## Network often represented as
## a graph: - vertex = node
- link relation (weight strength)
**Social Networks (of the past)**
**Social Networks (of the past)**
**Network Research of the Past** ## Mostly done by Social Scientists ## Interested in Human (Social) Networks ## Spread of Diseases, Influence, etc. ## Methodology: Questionnaires cumbersome, (lots of) bias ## Network Size: 10s or at most 100s
**Citation Networks**
**Email Network **
**(a subset!) of the Internet Graph**
**(Online) Social Networks**
**New York Subway Network**
**The Science of Complex Networks ** ## The study of large networks coming from all sorts of diverse areas - We will focus on technological (e.g. Internet) and information networks (e.g. Web, Facebook)
- Cannot visually observe such networks (as in the case of old social networks of few 10s of nodes) need ways to measure them, and quantify their properties
- the field is often called Social Networks or Network Science or Network Theory
## Question 1: What are the statistical properties of real networks? - Connectivity, paths lengths, degree distributions
- How do we measure such huge networks sampling
## Question 2: Why do these properties arise? - Models of large networks: random graphs
- Deterministic ways too complex/restrictive
## Question 3: How can we take advantage of these properties? - Connectivity (epidemiology, resilience)
- Spread (information, disease)
- Search (Web page, person)
**Part I: Network Properties of Interest** ## There are a lot of different properties we might be interested in also depends on application ## But there are some commonly studied properties for 2 reasons: - These properties are important for key applications
- The majority of networks exhibit surprising similarities with respect to these properties.
## Degree distribution (“scale free structure”) ## Path length (“small world phenomena”) ## Clustering (“community structure”)
**Undirected Graphs** ## Graph G=(V,E) - V = set of vertices
- E = set of edges
**Directed Graphs** ## Graph G=(V,E) - V = set of vertices
- E = set of edges
**Weighted and Unweighted Graphs** ## Edges have / do not have a weight associated with them
**Undirected graph: Degree Distribution**
**Undirected Graph: Degree Distribution**
**Directed Graph: In- and Out-Degree**
**Paths** ## Path from node i to node j: a sequence of edges (directed or undirected from node i to node j) - path length: number of edges on the path
- nodes i and j are connected
- cycle: a path that starts and ends at the same node
**Shortest Paths** ## Shortest Path from node i to node j - also known as BFS path, or geodesic path
**Diameter** ## The longest shortest path in the graph
**Undirected graph: Components**
**Fully Connected Graph** ## Clique Kn ## A graph that has all possible n(n-1)/2 edges
**Directed Graph**
**Adjacency Matrix: Directed Graph** ## Adjacency Matrix __symmetric__ matrix for undirected graphs
**Adjacency Matrix: Directed Graph** ## Adjacency Matrix __non-symmetric__ matrix for undirected graphs
**Examples of Adjacency Matrices **
**Measuring Real Networks: Degree distributions** ## Problem: find the probability distribution that best fits the observed data
**Exponential distribution** ## Probability of having k neighbors ## Identified by a line in the log-linear plot
**Power-law distributions** ## Right-skewed/Heavy-tail distribution - there is a non-negligible fraction of nodes that has very high degree (hubs)
- scale-free: f(ax) = bf(x), no characteristic scale, average is not informative
**Power Law vs. Exponential Distribution**
**Internet Topology Primer**
**Internet Degree Distribution**
**Degree Distribution for Other Networks**
**Power Law Exponent in Real Networks (M. Newman 2003)**
**Measuring path length** ## dij = shortest path between i and j ## Diameter: ## Characteristic path length: ## Also of interest: distribution of all shortest paths
**Path Length: Lattice Network** ## A total of n nodes arranged in a grid ## Only neighbors (up,down,left,right) connected
## Q: What is the diameter of the network? ## A: 2 -1 ## Q: What is the avg. distance? - i.e. picking two nodes randomly
## A: It is in the order of (i.e. c )
**Path Length: Random Geometric Network** ## n wireless nodes in an area of 1x1 ## Each transmits at distance R ## R must be at least for connectivity
## Q: Choose two random nodes: What is the expected hop count (distance) between them? ## A:
**Millgram’s small world experiment** ## Letters were handed out to people in Nebraska to be sent to a target in Boston ## People were instructed to pass on the letters to someone they knew on first-name basis - ~60 letters, only about 35% delivered
## The letters that reached the destination followed paths of length around 6 ## Six degrees of separation: (play of John Guare) ## Later email version: - In 2001, Duncan Watts, a professor at Columbia University, recreated Milgram's experiment using an e-mail message as the “package" that needed to be delivered.
- Surprisingly, after reviewing the data collected by 48,000 senders and 19 targets in 157 different countries, Watts found that again the average number of intermediaries was 6.
**Kevin Bacon number: link 2 actors in same movie**
**Kevin Bacon Number** ## (statistics from IMDB) ## ~740000 linkable actors ## Average (path length) = 3 ## 99% of actors less than 6 hops ## Try your own actor here: http://www.cs.virginia.edu/oracle/
**Erdos number: collaboration networks** ## Legendary mathematician Paul Erdos, - around 1500 papers and 509 collaborators
## Collaboration Graph: link between two authors who wrote a paper together ## Erdos number of X: hop count between Erdos and author X in collaboration graph ## ~260,000 in connected component
**Internet Path Lengths **
**Internet Path Length: Different Continents**
**Measurement Findings: Path Length** ## Milgram’s experiment => Small World Phenomenon ## Short paths exist between most nodes: Path length l << total nodes N (e.g line network: path length l = O(N))
**Clustering (Transitivity) coefficient** ## Measures the density of triangles (local clusters) in the graph ## Two different ways to measure it: ## The ratio of the means
**Example**
**Clustering (Transitivity) coefficient** ## Clustering coefficient for node i ## The mean of the ratios
**Example** ## The two clustering coefficients give different measures ## C(2) increases with nodes with low degree
**The C(k) distribution** ## The C(k) distribution is supposed to capture the hierarchical nature of the network - when constant: no hierarchy
- when power-law: hierarchy
**Clustering Coeff. In Real Nets (M. Newman 2003)**
**Summary of Findings** ## Most real networks have… ## Short paths between nodes (“small world”) ## Transitivity/Clustering coefficient that is finite > 0 ## Degree distribution that follows a power law |