One-mode projecting is extensively used to compress bipartite networks. Since one-mode projection is always less informative than the bipartite representation, a proper weighting method is required to better retain the original information. In this article, inspired by the network-based resource-allocation dynamics, we raise a weighting method which can be directly applied in extracting the hidden information of networks, with remarkably better performance than the widely used global ranking method as well as collaborative filtering. This work not only provides a creditable method for compressing bipartite networks, but also highlights a possible way for the better solution of a long-standing challenge in modern information science: How to do a personal recommendation.

Illustration of a bipartite network a, as well as its X projection b and Y projection c. The edge weight in b and c is set as the number of common neighbors in Y and X, respectively.
Illustration of the resource-allocation process in bipartite network. The upper three are X nodes and the lower four are Y nodes. The whole process consists of two steps: First, the resource flows from X to Y a → b, and then returns to X b → c. Different from the prior network-based resource-allocation dynamics 53, the resource here can only flow from one node set to another without consideration of asymptotical stable flow among one node set.

Figures - uploaded by Matus Medo

Author content

All figure content in this area was uploaded by Matus Medo

Content may be subject to copyright.

ResearchGate Logo

Discover the world's research

  • 20+ million members
  • 135+ million publications
  • 700k+ research projects

Join for free

Bipartite network projection and personal recommendation

Tao Zhou,

1,2,

*

Jie Ren,

1

Matúš Medo,

1

and Yi-Cheng Zhang

1,3,

1

Department of Physics, University of Fribourg, Chemin du Muse 3, CH-1700 Fribourg, Switzerland

2

Department of Modern Physics and Nonlinear Science Center, University of Science and Technology of China, Hefei Anhui, 230026,

People's Republic of China

3

Lab for Information Economy and Internet Research, Management School, University of Electronic Science and Technology of China,

Chengdu Sichuan, 610054, People's Republic of China

Received 17 July 2007; published 25 October 2007

One-mode projecting is extensively used to compress bipartite networks. Since one-mode projection is

always less informative than the bipartite representation, a proper weighting method is required to better retain

the original information. In this article, inspired by the network-based resource-allocation dynamics, we raise

a weighting method which can be directly applied in extracting the hidden information of networks, with

remarkably better performance than the widely used global ranking method as well as collaborative filtering.

This work not only provides a creditable method for compressing bipartite networks, but also highlights a

possible way for the better solution of a long-standing challenge in modern information science: How to do a

personal recommendation.

DOI: 10.1103/PhysRevE.76.046115 PACS numbers: 89.75.Hc, 87.23.Ge, 05.70.Ln

I. INTRODUCTION

The last few years have witnessed tremendous activity

devoted to the understanding of complex networks 17 .A

particular class of networks is the bipartite networks, whose

nodes are divided into two sets X and Y , and only the con-

nection between two nodes in different sets is allowed as

illustrated in Fig. 1a . Many systems are naturally modeled

as bipartite networks 8 : The human sexual network 9 con-

sists of men and women, the metabolic network 10 consists

of chemical substances and chemical reactions, etc. Two

kinds of bipartite networks are important because of their

particular significance in social, economic, and information

systems. One is the so-called collaboration network, which is

generally defined as a network of actors connected by a com-

mon collaboration act 11 ,12 . Examples are numerous, in-

cluding scientists connected by coauthoring a scientific paper

13,14, movie actors connected by co-starring in the same

movie 1,15 , and so on. Moreover, the concept of collabo-

ration network is not necessarily restricted to social systems

see, for example, recent reports on technological collabora-

tion of software 16 and urban traffic systems 17 兴兲. Al-

though the collaboration network is usually displayed by the

one-mode projection on actors see later the definition, its

fully representation is a bipartite network. The other one is

called the "opinion network" 18,19 , where each node in the

user-set is connected with its collected objects in the object-

set. For example, listeners are connected with the music

groups they collected from a music-sharing library e.g., au-

dioscrobbler.com兲关 20,21, web-users are connected with the

webs they collected in a bookmark site e.g., "delicious"

22, customers are connected with the books they bought

e.g., Amazon.com

兲关 23,24.

Recently, much attention has been paid to analyzing

8,20,2527 and modeling 28 30 bipartite network. How-

ever, for the convenience of directly showing the relations

among a particular set of nodes, the bipartite network is usu-

ally compressed by one-mode projecting. The one-mode pro-

jection onto X X projection for short means a network con-

taining only X nodes, where two X nodes are connected

when they have at least one common neighboring Y node.

Figures 1b and 1c show the resulting networks of X and Y

projection, respectively. The simplest way is to project the

bipartite network onto an unweighted network

13,14,3133, without taking into account the frequency

that a collaboration has been repeated. Although some topo-

logical properties can be qualitatively obtained from this un-

weighted version, the loss of information is obvious. For

example, if two listeners collected more than 100 music

groups each the average number of collected music groups

per listener at audioscrobbler.com is 140 20 兴兲, and only one

music group is selected by both listeners, one may conclude

that those two listeners probably have different music tastes.

On the contrary, if nearly 100 music groups belong to the

overlap, those two listeners are likely to have very similar

habits. However, in the unweighted listener projection, these

two cases have exactly the same graph representation.

*

zhutou@ustc.edu

yi-cheng.zhang@unifr.ch

FIG. 1. Illustration of a bipartite network a, as well as its X

projection b and Y projection c. The edge weight in b and c is

set as the number of common neighbors in Y and X , respectively.

PHYSICAL REVIEW E 76, 046115 2007

1539-3755/2007/764/0461157 ©2007 The American Physical Society046115-1

Since the one-mode projection is always less informative

than the original bipartite network, to better reflect structure

of the network one has to use the bipartite graph to quantify

the weights in the projection graph. A straightforward way is

to weight an edge directly by the number of times the corre-

sponding partnership repeated 34,35 . This simple rule is

used to obtain the weights in Figs. 1b and 1c for X and Y

projection, respectively. This weighted network is much

more informative than the unweighted one, and can be ana-

lyzed by standard techniques for unweighted graphs since its

weights are all integers 36 . However, this method is also

quantitatively biased. Li et al. 37 empirically studied the

scientific collaboration networks, and pointed out that the

impact of one additional collaboration paper should depend

on the original weight between the two authors. For example,

one more co-authorized paper for the two authors having

only co-authorized one paper before should have higher im-

pact than for the two authors having already co-authorized

100 papers. This saturation effect can be taken into account

by introducing a hyperbolic tangent function onto the simple

count of collaborated times 37 . Newman pointed out that

two authors whose names appear on a paper together with

many other co-authors know one another less well on aver-

age than two who were the sole authors of a paper 14 ,to

consider this effect, he introduced the factor 1 / n −1 to

weaken the contribution of collaborations involving many

participants 38,39 , where n is the number of participants

e.g., the number of authors of a paper.

How to weigh the edges is the key question of the one-

mode projections and their use. However, we lack a system-

atic exploration of this problem, and no solid base of any

weighting methods have been reported thus far. For example,

one may ask the physical reason why using the hyperbolic

tangent function to address the saturation effect 37 rather

than other infinite possible candidates. In addition, for sim-

plicity, the weighted adjacent matrix w

ij

is always set to be

symmetrical, that is, w

ij

= w

ji

. However, as in scientific col-

laboration networks, different authors may assign different

weights to the same co-authorized paper, and it is probably

the case that the author having less publications may give a

higher weight, vice versa. Therefore, a more natural weight-

ing method may be not symmetrical. Another blemish in the

prior methods is that the information contained by the edge

whose adjacent X node Y node is of degree 1 will be lost in

Y projection X projection. This information loss may be

serious in some real opinion networks. For example, in the

user-web network of "delicious" http://del.icio.us, a re-

markable fraction of webs have been collected only once and

a remarkable fraction of users have collected only one web.

Therefore, both the user projection and web projection will

squander a lot of information. Since more than half of the

publications in Mathematical Reviews have only one author

31, the situation is even worse in the mathematical collabo-

ration network.

A central problem closely related to the opinion network

is how to extract the hidden information and do a personal

recommendation. The exponential growth of the Internet 40

and World Wide Web 41 confronts people with an informa-

tion overload: They face too much data and sources able to

find out those most relevant for him. One landmark for in-

formation filtering is the use of search engines 42 , however,

it cannot solve this overload problem since it does not take

into account personalization and thus returns the same results

for people with far different habits. So, if the user's habits

are different from the mainstream, it is hard for him to find

out what he likes in the countless searching results. Thus far,

the most potential way to efficiently filter out the information

overload is to recommend personally. That is to say, using

the personal information of a user i.e., the historical track of

this user's activities to uncover his habits and to consider

them in the recommendation. For instance, Amazon.com

uses one's purchase history to provide individual sugges-

tions. If you have bought a textbook on statistical physics,

Amazon may recommend you some other statistical physics

books. Based on the well-developed

WEB 2.0 technology 43 ,

the recommendation systems are frequently used in web-

based movie sharing music sharing, book sharing, etc. sys-

tems, web-based selling systems, bookmark web sites, and so

on. Motivated by the significance in economy and society,

recently, the design of an efficient recommendation algo-

rithm becomes a joint focus from marketing practice 44,45

to mathematical analysis 46 , from engineering science

47 49 to physics community 5052.

In this article, we propose a weighting method, with

asymmetrical weights i.e., w

ij

w

ji

and allowed self-

connection i.e., w

ii

0. Moreover, we give rise to a bridge

connecting the two sides: bipartite network projection and

personal recommendation. The numerical simulation indi-

cates that a directly application of the proposed projecting

method, as a personal recommendation algorithm, can per-

form remarkably better than the widely used global ranking

method GRM and collaborative filtering CF.

II. METHOD

A. Bipartite network projection

Without loss of generality, we discuss how to determine

the edge weight in X projection, where the weight w

ij

can be

considered as the importance of node i in j 's sense, and it is

generally not equal to w

ji

. For example, in the book projec-

tion of a customer-book opinion network, the weight w

ij

be-

tween two books i and j contributes to the strength of book i

recommendation to a customer provided he has bought book

j . In the scientific collaboration network, w

ij

reflects how

likely is j to choose i as a contributor for a new research

project. More generally, we assume a certain amount of a

resource e.g., recommendation power, research fund, etc. is

associated with each X node, and the weight w

ij

represents

the proportion of the resource j would like to distribute to i .

To derive the analytical expression of w

ij

, we go back to

the bipartite representation. Since the bipartite network itself

is unweighted, the resource in an arbitrary X node should be

equally distributed to its neighbors in Y . Analogously, the

resource in any Y node should be equally distributed to its X

neighbors. As shown in Fig. 2a , the three X nodes are ini-

tially assigned weights x , y , and z. The resource-allocation

process consists of two steps; first from X to Y , then back to

X. The amount of resource after each step is marked in Figs.

2b and 2c , respectively. Merging these two steps into one,

ZHOU et al. PHYSICAL REVIEW E 76, 046115 2007

046115-2

the final resource located in those three X nodes, denoted by

x

, y

, and z

, can be obtained as

x

y

z

=

11/18 1/6 5/18

1/9 5/12 5/18

5/18 5/12 4/9

x

y

z

. 1

Note that this 3 3 matrix is column normalized, and the

element in the ith row and j th column represents the fraction

of resource the j th X node transferred to the i th X node.

According to the above description, this matrix is the very

weighted adjacent matrix we want.

Now, consider a general bipartite network G X , Y , E ,

where E is the set of edges. The nodes in X and Y are de-

noted by x

1

, x

2

,...,x

n

and y

1

, y

2

,...,y

m

, respectively. The

initial resource located on the ith X node is f x

i

0. After

the first step, all the resource in X flows to Y , and the re-

source located on the lth Y node reads

f y

l

=

i=1

n

a

il

f x

i

k x

i

, 2

where k x

i

is the degree of x

i

and a

il

is an n m adjacent

matrix:

a

il

=

1,

x

i

y

l

E ,

0, otherwise.

3

In the next step, all the resource flows back to X, and the

final resource located on x

i

reads

f

x

i

=

l=1

m

a

il

f y

l

/ k y

l

=

l=1

m

a

il

k y

l

j=1

n

a

jl

f x

j

k x

j

. 4

This can be rewritten as

f

x

i

=

j=1

n

w

ij

f x

j

, 5

where

w

ij

=

1

k x

j

l=1

m

a

il

a

jl

k y

l

, 6

which sums the contribution from all two-step paths between

x

i

and x

j

. The matrix W = w

ij

n n

represents the weighted X

projection we were looking for. The resource-allocation pro-

cess can be written in the matrix form as f

I

= Wf

J

.

It is worthwhile to emphasize the particular characters of

this weighting method. For convenience, we take the scien-

tific collaboration network as an example, but our statements

are not restricted to the collaboration networks. First, the

weighted matrix is not symmetrical as

w

ij

k x

j

=

w

ji

k x

i

. 7

This is in accordance with our daily experience—the weight

of a single collaboration paper is relatively small if the sci-

entist has already published many papers i.e., he has large

degree, vice versa. Secondly, the diagonal elements in W are

nonzero, thus the information contained by the connections

incident to one-degree Y node will not be lost. Actually, the

diagonal element is the maximal element in each column.

Only if all x

i

's Y neighbors belongs to x

j

's neighbors set,

w

ii

= w

ji

. It is usually found in scientific collaboration net-

works, since some students co-author every paper with their

supervisors. Therefore, the ratio w

ji

/ w

ii

1 can be consid-

ered as x

i

's researching independence to x

j

, the smaller the

ratio, the more independent the researcher is, vice versa. The

independence of x

i

can be approximately measured as

I

i

=

j

w

ji

w

ii

2

. 8

Generally, the author who often publishes papers solely, or

often publishes many papers with different co-authors is

more independent. Note that, introducing the measure I

i

here

is just to show an example how to use the information con-

tained by self-weight w

ii

, without any comments whether to

be more independent is better, or contrary.

B. Personal recommendation

Basically, a recommendation system consists of users and

objects, and each user has collected some objects. Denote the

FIG. 2. Illustration of the resource-allocation process in bipartite

network. The upper three are X nodes and the lower four are Y

nodes. The whole process consists of two steps: First, the resource

flows from X to Y a b , and then returns to X b c . Different

from the prior network-based resource-allocation dynamics 53 ,

the resource here can only flow from one node set to another with-

out consideration of asymptotical stable flow among one node set.

BIPARTITE NETWORK PROJECTION AND PERSONAL PHYSICAL REVIEW E 76, 046115 2007

046115-3

object-set as O = o

1

, o

2

,...,o

n

and user-set as U

= u

1

, u

2

,...,u

m

. If users are only allowed to collect objects

they do not rate them, the recommendation system can be

fully described by an n m adjacent matrix a

ij

, where a

ij

=1 if u

j

has already collected o

i

and a

ij

=0 otherwise. A

reasonable assumption is that the objects you have collected

are what you like and a recommendation algorithm aims at

predicting your personal opinions to what extent you like or

hate them on those objects you have not yet collected. A

more complicated case is the voting system 54,55 , where

each user can give ratings to objects e.g., in Yahoo Music,

the users can vote each song with five discrete ratings repre-

senting "Never play again," "It is ok," "Like it," "Love it,"

and "Can't get enough", and the recommendation algorithm

concentrates on estimating unknown ratings for objects.

These two problems are closely related, however, in this ar-

ticle, we focus on the former case.

Denote k o

i

=

j=1

m

a

ij

as the degree of object o

i

. The glo-

bal ranking method GRM sorts all the objects in the de-

scending order of degree and recommends those with highest

degrees. Although the lack of personalization leads to an

unsatisfying performance of GRM see numerical compari-

son in the next section, it is widely used since it is simple

and spares computational resources. For example, the well-

known "Yahoo Top 100 MTVs," "Amazon List of Top Sell-

ers," as well as the board of most downloaded articles in

many scientific journals, can be all considered as results of

GRM.

Thus far, the widest applied personal recommendation al-

gorithm is collaborative filtering CF兲关 49,54, based on a

similarity measure between users. Consequently, the predic-

tion for a particular user is made mainly using the similar

users. The similarity between users u

i

and u

j

can be mea-

sured in the Pearson-like form

s

ij

=

l=1

n

a

li

a

lj

mink u

i

, k u

j

兲其

, 9

where k u

i

=

l=1

n

a

li

is the degree of user u

i

. For any user-

object pair u

i

o

j

,ifu

i

has not yet collected o

j

i.e., a

ji

=0 ,

by CF, the predicted score,

v

ij

to what extent u

i

likes o

j

,is

given as

v

ij

=

l=1,l i

m

s

li

a

jl

l=1,l i

m

s

li

. 10

Two factors give rise to a high value of

v

ij

. First, if the

degree of o

j

is larger, it will, generally, have more nonzero

items in the numerator of Eq. 10 . Secondly, if o

j

is fre-

quently collected by users very similar to u

i

, the correspond-

ing items will be significant. The former pays respect to the

global information, and the latter reflects the personalization.

For any user u

i

, all the nonzero

v

ij

with a

ji

=0 are sorted in

descending order, and those objects in the top are recom-

mended.

We propose a recommendation algorithm, which is a di-

rect application of the weighting method for bipartite net-

works presented above. The layout is simple: first compress

the bipartite user-object network by object-projection, the re-

sulting weighted network we label G. Then, for a given user

u

i

, put some resource on those objects already been collected

by u

i

. For simplicity, we set the initial resource located on

each node of G as

f o

j

= a

ji

. 11

That is to say, if the object o

j

has been collected by u

i

, then

its initial resource is unit, otherwise it is zero. Note that, the

initial configuration, which captures personal preferences, is

different for different users. The initial resource can be un-

derstood as giving a unit recommending capacity to each

collected object. According to the weighted resource-

allocation process discussed in the prior section, the final

resource, denoted by the vector f

I

,is f

I

= Wf

J

. Thus compo-

nents of f

are

f

o

j

=

l=1

n

w

jl

f o

l

=

l=1

n

w

jl

a

li

. 12

For any user u

i

, all his uncollected objects o

j

1 j n , a

ji

=0 are sorted in the descending order of f

o

j

, and those

objects with highest value of final resource are recom-

mended. We call this method network-based inference NBI,

since it is based on the weighted network G . Note that, the

calculation of Eq. 12 should be repeated m times, since the

initial configurations are different for different users.

III. NUMERICAL RESULTS

We use a benchmark data-set, namely, MovieLens, to

judge the performance of described algorithms. The Movie-

Lens data is downloaded from the web-site of GroupLens

Research http://www.grouplens.org. The data consists 1682

movies objects and 943 users. Actually, MovieLens is a

rating system, where each user votes movies in five discrete

ratings 1–5. Hence we applied the coarse-graining method

similar to what is used in Ref. 19 : A movie has been col-

lected by a user iff the giving rating is at least 3. The original

data contains 10

5

ratings, 85.25% of which are 3, thus the

user-movie bipartite network after the coarse gaining con-

tains 85 250 edges. To test the recommendation algorithms,

the data set i.e., 85 250 edges is randomly divided into two

parts: The training set contains 90% of the data, and the

remaining 10% of data constitutes the probe. The training set

is treated as known information, while no information in

probe set is allowed to be used for prediction.

All three algorithms GRM, CF, and NBI can provide each

user an ordered queue of all its uncollected movies. For an

arbitrary user u

i

, if the edge u

i

o

j

is in the probe set accord-

ing to the training set, o

j

is an uncollected movie for u

i

,we

measure the position of o

j

in the ordered queue. For ex-

ample, if there are 1500 uncollected movies for u

i

, and o

j

is

the 30th from the top, we say the position of o

j

is the top

30/1500, denoted by r

ij

=0.02. Since the probe entries are

actually collected by users, a good algorithm is expected to

ZHOU et al. PHYSICAL REVIEW E 76, 046115 2007

046115-4

give high recommendations to them, thus leading to small r .

The mean value of the position value, averaged over entries

in the probe, are 0.139, 0.120, and 0.106 by GRM, CF, and

NBI, respectively. Figure 3 reports the distribution of all the

position values, which are ranked from the top position

r 0 to the bottom position r 1 . Clearly, NBI is the

best method and GRM performs worst.

To make this work more relevant to the real-life recom-

mendation systems, we introduce a measure of algorithmic

accuracy that depends on the length of recommendation list.

The recommendation list for a user u

i

, if of length L, contains

L highest recommended movies resulting from the algorithm.

For each incident entry u

i

o

j

in the probe, if o

j

is in u

i

's

recommendation list, we say the entry u

i

o

j

is "hit" by the

algorithm. The ratio of hit entries to the population is called

the "hitting rate." For a given L, the algorithm with a higher

hitting rate is better, and vice versa. If L is larger than the

total number of uncollected movies for a user, the recom-

mendation list is defined as the set of all his uncollected

movies. Clearly, the hitting rate is monotonously increasing

with L, with the upper bound 1 for sufficiently large L .In

Fig. 4, we report the hitting rate as a function of L for dif-

ferent algorithms. In accordance with Fig. 3, the accuracy of

the algorithms is NBI CF GRM. The hitting rates for

some typical lengths of recommendation list are shown in

Table I.

In a word, via the numerical calculation on a benchmark

data set, we have demonstrated that the NBI has remarkably

better performance than GRM and CF, which strongly guar-

antee the validity of the present weighting method.

IV. CONCLUSION AND DISCUSSION

Weighting of edges is the key problem in the construction

of a bipartite network projection. In this article we proposed

a weighting method based on a resource-allocation process.

The present method has two prominent features. First, the

weighted matrix is not symmetrical and the node having

larger degree in the bipartite network generally assigns

smaller weights to its incident edges. Second, the diagonal

element in the weighted matrix is positive, which makes the

weighted one-mode projection more informative.

Furthermore, we proposed a personal recommendation al-

gorithm based on this weighting method, which performs

much better than the most commonly used global ranking

method as well as the collaborative filtering. Especially, this

algorithm is tune-free i.e., does not depend on any control

parameters, which is a big advantage for potential users.

The main goal of this article is to introduce a weighting

method, as well as to provide a bridge from this method to

the recommendation systems. The presented recommenda-

tion algorithm is just a rough framework whose details have

not been exhaustively explored yet. For example, the setting

of the initial configuration may be oversimplified, a more

complicated form, such as f o

j

= a

ji

k

o

j

, may lead to a

better performance than the presented one with

=0. One is

also encouraged to consider the asymptotical dynamics of

the resource-allocation process 53 , which can eventually

lead to some certain iterative recommendation algorithms.

Although such an algorithm require much longer CPU time,

it may give a more accurate prediction than the present

algorithm.

If we denote k

u

and k

o

the average degree of users and

objects in the bipartite network, the computational complex-

ity of CF is O m

2

k

u

+ mn k

o

典兲, where the first term accounts

for the calculation of similarity between users see Eq. 9 兲兴,

and the second term accounts for the calculation of the pre-

dicted score see Eq. 10 兲兴. Substituting the equation n k

o

FIG. 3. Color online The predicted position of each entry in

the probe ranked in the ascending order. The black, red, and blue

curves, from top to bottom, represent the cases of GRM, CF, and

NBI, respectively. The mean values are top 13.9% GRM , top

12.0% CF, and top 10.6% NBI .

FIG. 4. Color online The hitting rate as a function of the length

of recommendation list. The black, red, and blue curves, from bot-

tom to top, represent the cases of GRM, CF, and NBI, respectively.

TABLE I. The hitting rates for some typical lengths of recom-

mendation list.

Length GRM CF NBI

10 10.3% 14.1% 16.2%

20 16.9% 21.6% 24.8%

50 31.1% 37.0% 41.2%

100 45.2% 51.0% 55.9%

BIPARTITE NETWORK PROJECTION AND PERSONAL PHYSICAL REVIEW E 76, 046115 2007

046115-5

= m k

u

, we are left with O m

2

k

u

典兲. The computational com-

plexity for NBI is O m k

u

2

+ mn k

u

典兲 with two terms account-

ing for the calculation of the weighted matrix and the final

resource distribution, respectively. Here k

u

2

is the second

moment of the users' degree distribution in the bipartite net-

work. Clearly, k

u

2

n k

u

, thus the resulting form is

O mn k

u

典兲. Note that the number of users is usually much

larger than the number of objects in many recommendation

systems. For instance, the "EachMovie" dataset provided by

the Compaqcompany contains m = 72 916 users and n

=1628 movies, and the Netflix company provides nearly 20

thousands online movies for a million users. It is also the

case of music-sharing systems and online bookstores, the

number of registered users is more than one magnitude larger

than that of the available objects e.g., music groups, books,

etc.. Therefore, NBI runs much fast than CF. In addition,

NBI requires n

2

memory to store the weighted matrix w

ij

,

while CF requires m

2

memory to store the similarity matrix

s

ij

. Hence, NBI is able to beat CF in all the three criterions

of recommendation algorithm: accuracy, time, and space.

However, in some recommendation systems, as in bookmark

sharing websites, the number of objects e.g., webpages is

much larger than the number of users, thus CF may be more

practicable.

ACKNOWLEDGMENTS

This work is partially supported by SBF Switzerland for

financial support through project No. C05.0148 Physics of

Risk, and the Swiss National Science Foundation Project

No. 205120-113842. T.Z. thanks Sang Hoon Lee for his

valuable suggestions, and is grateful for the support from the

National Natural Science Foundation of China NNSFC

under Grant No. 10635040.

1 L. A. N. Amaral, A. Scala, M. Barthélémy, and H. E. Stanley,

Proc. Natl. Acad. Sci. U.S.A. 97, 11149 2000.

2 S. H. Strogatz, Nature London 410, 268 2001.

3 R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74 ,472002.

4 S. N. Dorogovtsev and J. F. F. Mendes, Adv. Phys. 51, 1079

2002.

5 M. E. J. Newman, SIAM Rev. 45, 167 2003.

6 S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, D.-U. Huang,

Phys. Rep. 424, 175 2006.

7 L. da F. Costa, F. A. Rodrigues, G. Travieso, P. R. V. Boas,

Adv. Phys. 56, 167 2007.

8 P. Holme, F. Liljeros, C. R. Edling, and B. J. Kim, Phys. Rev.

E 68, 056107 2003.

9 F. Liljeros, C. R. Edling, L. A. N. Amaral, H. E. Stanley, Y.

Aberg, Nature London 411, 907 2001.

10 H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, A.-L. Barabasi,

Nature London 407 , 651 2000.

11 S. Wasserman and K. Faust, Social Network Analysis Cam-

bridge University Press, Cambridge, 1994.

12 J. Scott, Social Network Analysis Sage Publication, London,

2000.

13 M. E. J. Newman, Proc. Natl. Acad. Sci. U.S.A. 98, 404

2001.

14 M. E. J. Newman, Phys. Rev. E 64, 016131 2001.

15 D. J. Watts and S. H. Strogatz, Nature London 393 , 440

1998.

16 C. R. Myers, Phys. Rev. E 68, 046116 2003.

17 P.-P. Zhang, K. Chen, Y. He, T. Zhou, B.-B. Su, Y.-D. Jin, H.

Chang, Y.-P. Zhou, L.-C. Sun, B.-H. Wang, R.-R. He, Physica

A 360, 599 2006.

18 S. Maslov and Y.-C. Zhang, Phys. Rev. Lett. 87, 248701

2001.

19

M. Blattner, Y.-C. Zhang, and S. Maslov, Physica A 373, 753

2007.

20 R. Lambiotte and M. Ausloos, Phys. Rev. E 72, 066107

2005.

21 P. Cano, O. Celma, M. Koppenberger, and J. M. Buldu, Chaos

16, 013107 2006.

22 C. Cattuto, V. Loreto, and L. Pietronero, Proc. Natl. Acad. Sci.

U.S.A. 104, 1461 2007.

23 G. Linden, B. Smith, and J. York, IEEE Internet Comput. 7 ,76

2003.

24 K. Yammine, M. Razek, E. Aimeur, C. Frasson, Lect. Notes

Comput. Sci. 3220, 720 2004.

25 R. Lambiotte and M. Ausloos, Phys. Rev. E 72, 066117

2005.

26 P. G. Lind, M. C. González, and H. J. Herrmann, Phys. Rev. E

72, 056127 2005.

27 E. Estrada and J. A. Rodríguez-Velázquez, Phys. Rev. E 72,

046105 2005.

28 J. J. Ramasco, S. N. Dorogovtsev, and R. Pastor-Satorras,

Phys. Rev. E 70, 036106 2004.

29 J. Ohkubo, K. Tanaka, and T. Horiguchi, Phys. Rev. E 72,

036120 2005.

30 M. Peltomäki and M. Alava, J. Stat. Mech.: Theory Exp.

2006, P01010.

31 J. W. Grossman and P. D. F. Ion, Congr. Numer. 108, 129

1995.

32 A.-L. Barabási, H. Jeong, Z. Neda, E. Ravasz, A. Schubert, T.

Vicsek, Physica A 311, 590 2002.

33 T. Zhou, B.-H. Wang, Y.-D. Jin, D.-R. He, P.-P. Zhang, Y. He,

B.-B. Su, K. Chen, Z.-Z. Zhang, and J.-G. Liu,Int. J. Mod.

Phys. C 18, 297 2007.

34 J. J. Ramasco and S. A. Morris, Phys. Rev. E 73, 016122

2006.

35 M. Li, J. Wu, D. Wang, T. Zhou, Z. Di, Y. Fan, Physica A 375,

355 2007.

36 M. E. J. Newman, Phys. Rev. E 70, 056131 2004.

37 M. Li, Y. Fan, J. Chen, L. Gao, Z. Di, J. Wu, Physica A 350,

643 2005.

38 M. E. J. Newman, Phys. Rev. E 64, 016132 2001

.

39 M. E. J. Newman, Proc. Natl. Acad. Sci. U.S.A. 101, 5200

2004.

40 M. Faloutsos, P. Faloutsos, and C. Faloutsos, Comput. Com-

mun. Rev. 29, 251 1999.

41 A. Broder, R. Kumar, F. Moghoul, P. Raghavan, S. Rajago-

ZHOU et al. PHYSICAL REVIEW E 76, 046115 2007

046115-6

palan, R. Stata, A. Tomkins, J. Wiener, Comput. Netw. 33, 309

2000.

42 J. M. Kleinberg, J. ACM 46, 604 1999.

43 B. Alexander, Educ. Res. 41 ,332006.

44 A. Ansari, S. Essegaier, and R. Kohli, J. Mark. Res. 37, 363

2000.

45 Y. P. Ying, F. Feinberg, and M. Wedel, J. Mark. Res. 43, 355

2006.

46 R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, J.

Comput. Syst. Sci. 63 ,422001.

47 N. J. Belkin, Commun. ACM 43 ,582000.

48 M. Montaner, B. López, and J. L. De La Rosa, Artif. Intell.

Rev. 19, 285 2003.

49 J. L. Herlocker, J. A. Konstan, K. Terveen, and J. T. Riedl,

ACM Trans. Inf. Syst. secur. 22 ,5 2004.

50 P. Laureti, L. Moret, Y.-C. Zhang, and Y.-K. Yu, Europhys.

Lett. 75, 1006 2006.

51 Y.-K. Yu, Y.-C. Zhang, P. Laureti, and L. Moret, e-print

arXiv:cond-mat/0603620.

52 F. E. Walter, S. Battiston, and F. Schweitzer, e-print arXiv:nlin/

0611054.

53 Q. Ou, Y. D. Jin, T. Zhou, B. H. Wang, and B. Q. Yin, Phys.

Rev. E 75, 021102 2007.

54 J. A. Konstan, B. N. Miller, D. Maltz, J. L. Herlocker, L. R.

Gordon, J. Riedl, Commun. ACM 40 ,771997.

55 K. Glodberg, T. Roeder, D. Gupta, and C. Perkins, Information

Retrieval 4, 133 2001.

BIPARTITE NETWORK PROJECTION AND PERSONAL PHYSICAL REVIEW E 76, 046115 2007

046115-7

... social networks, have an inherent bipartite structure driving the topological structure of the unipartite version [46,85,108,109]. These bipartite structures are captured by bipartite graphs, which are rich data models that provide full representation without information loss for interactions that naturally occur in one mode (compressed datasets as unipartite graphs [128]), or multiple modes (high order interconnections as hyper graphs [5,57,114]). A prevalent use case is where bipartite graphs capture the interactions of users with entities spanning different domains such as social networks (users-hashtags [122]), web-based services (users-websites, multimedia services, and products [58,102,105,112,117]), financial systems (users-donation campaigns [4]), transportation systems (users-registered vehicles [59]), and communication systems (users-phone calls [120]). ...

Bipartite graphs are rich data structures with prevalent applications and identifier structural features. However, less is known about their growth patterns, particularly in streaming settings. Current works study the patterns of static or aggregated temporal graphs optimized for certain down-stream analytics or ignoring multipartite/non-stationary data distributions, emergence patterns of subgraphs, and streaming paradigms. To address these, we perform statistical network analysis over web log streams and identify the governing patterns underlying the bursty emergence of mesoscopic building blocks, 2,2-bicliques known as butterflies, leading to a phenomenon that we call "scale-invariant strength assortativity of streaming butterflies". We provide the graph-theoretic explanation of this phenomenon. We further introduce a set of micro-mechanics in the body of a streaming growth algorithm, sGrow, to pinpoint the generative origins. sGrow supports streaming paradigms, emergence of 4-vertex graphlets, and provides user-specified configurations for the scale, burstiness, level of strength assortativity, probability of out-of-order records, generation time, and time-sensitive connections. Comprehensive Evaluations on pattern reproducing and stress testing validate the effectiveness, efficiency, and robustness of sGrow in realization of the observed patterns independent of initial conditions, scale, temporal characteristics, and model configurations. Theoretical and experimental analysis verify the robust ability of sGrow in generating streaming graphs based on user-specified configurations that affect the scale and burstiness of the stream, level of strength assortativity, probability of-of-order streaming records, generation time, and time-sensitive connections.

... In this study, we focused seven basic network configurations of bipartite networks without network projections, because studies showed that network projections may lose important information of bipartite networks (Robins and Alexander 2004;Zhou et al. 2007). Figure 2 illustrates seven network configurations of bipartite networks in which the blue square and the red circle represent two-node sets. ...

Understanding actor collaboration networks and their evolution is essential to promoting collective action in resilience planning and management of interdependent infrastructure systems. Local interactions and choice homophily are two important network evolution mechanisms. Network motifs encode the information of network formation, configuration, and the local structure. Homophily effects, on the other hand, capture whether the network configurations have significant correlations with node properties. The objective of this paper is to explore the extent to which local interactions and homophily effects influence actor collaboration in resilience planning and management of interdependent infrastructure systems. We mapped bipartite actor collaboration network based on a post-Hurricane Harvey stakeholder survey that revealed actor collaborations for hazard mitigation. We examined seven bipartite network motifs for the mapped collaboration network and compared the mapped network to simulated random models with same degree distributions. Then we examined whether the network configurations had significant statistics for node properties using exponential random graph models. The results provide insights about the two mechanisms—local interactions and homophily effect—influencing the formation of actor collaboration in resilience planning and management of interdependent urban systems. The findings have implications for improving network cohesion and actor collaborations from diverse urban sectors.

... To evaluate the robustness and the effectiveness of the reputation-evaluation methods, we adopt four widely used metrics: Kendall's tau [35], AUC (the area under the ROC curve) [36], recall [37], and ranking score [38]. ...

  • Meng Li
  • Chengyuan Han
  • Yuanxiang Jiang
  • Zengru Di

Characterizing the reputation of an evaluator is particularly significant for consumer to obtain useful information from online rating systems. Furthermore, to overcome the difficulties with spam attacks on the rating system and to get the reliable on reputation of evaluators is an important topic in the research. We have noticed that most of the existing evaluator reputation evaluation methods only rely on the evaluator's rating information and abnormal behavior to establish a reputation system, which miss the systematic aspects of the rating systems including the structure of the evaluator-object bipartite network and the effects of nonlinear effects. This study we propose an improved reputation evaluation method by combining the structure of the evaluator-object bipartite network with rating information and introducing penalty and reward factors. This novel method has been empirically analyzed on a large-scale artificial data set and two real data sets. The results show that the proposed method is more accurate and robust in the presence of spam attacks. This fresh idea contributes a new way for building reputation evaluation models in sparse bipartite rating network.

  • Wei-ping Wu
  • Ke-Xing Wu
  • Wei-kang Zeng
  • Peng-Cheng Yang

The reverse distribution of renewable energy resources and load centers makes exploring the optimal transmission mode of long-distance and large-scale renewable hydrogen the key to solving the bottleneck of renewable hydrogen development. This study incorporates hydrogen pipeline (HGP), natural gas pipeline (NGP), and Ultra High Voltage (UHV) into an optimal planning model framework and analyzes the optimal transmission mode, quantity, network, and cost of large-scale renewable hydrogen in China. Constructing a sensitivity analysis framework, this study also investigates the optimal transmission mode changes under different scenarios. The results show that the optimal mode of large-scale renewable hydrogen transmission in the province is NGP, and 5.4% of supply level is the critical point to export renewable hydrogen inter-provincially. It switches to the combination of NGP and HGP when the unit transmission costs of these elements decrease to a certain proportion simultaneously or switches to HGP when the unit transmission cost of HGP decreases more than that of NGP. The complementary transmission mode of NGP and UHV is the optimal mode for inter-provincial transmission, and the HGP can be put into use for inter-provincial transmission only when the unit transmission cost of which is reduced to less than 25%. Jilin is the key node in the NGP network, and Tibet and Gansu are the key nodes in UHV network, and the participation or absence of which will have significant impact on the renewable hydrogen transmission system. Only minor adjustments to the transmission technical parameters of NGP or HGP can promote the qualitative overflight of the optimal transmission volume of them so as to achieve the target optimization at the minimum cost.

  • Wei Wang
  • Yongqing Wang
  • Yu Zhang
  • Xianfang Wang

Identification of drug-target interactions (DTIs) has great practical importance in the drug discovery process for known diseases. However, only a small proportion of DTIs in these databases has been verified experimentally, and the computational methods for predicting the interactions remain challenging. As a result, some effective computational models have become increasingly popular for predicting DTIs. In this work, the authors predict potential DTIs from the local structure of drug-target associations' network, which is different from the traditional global network similarity methods based on structure and ligand. A novel method called PPDTS is proposed to predict DTIs. First, according to the DTIs' network local structure, the known DTIs are converted into a binary network. Second, the Resource Allocation algorithm is used to obtain a drug-drug similarity network and a target-target similarity network. Third, a Collaborative Filtering algorithm is used with the known drug-target topology information to obtain similarity scores. Fourth, the linear combination of drug-target similarity model and the target-drug similarity model are innovatively proposed to obtain the final prediction results. Finally, the experimental performance of PPDTS has proved to be higher than that of the previously mentioned four popular network-based similarity methods, which is validated in different experimental datasets. Some of the predicted results can be supported in UniProt and DrugBank databases.

This article studies the contents of the Architectural Biennial of Quito's (BAQ) archive from its first edition in 1978 to its eighth in 1992, using technological tools and producing bipartite networks that make visible the inter-and intrarelationships of the archive's data through quantitative and qualitative analysis. By using these methods and means, the results firstly show how the BAQ articulated discussions of modern architecture in Ecuador, and secondly, the extent of local and international exchanges implicated in the professional scope throughout the BAQ. The study of this archive allows one to understand the most important relationships in the architectural productions and communications at the hands of BAQ as a platform for circulating ideas.

  • Divya Lakshmi Venkatraman
  • Deepshika Pulimamidi
  • Harsh G. Shukla
  • Shubhada R. Hegde

An increased surge of -omics data for the diseases such as cancer allows for deriving insights into the affiliated protein interactions. We used bipartite network principles to build protein functional associations of the differentially regulated genes in 18 cancer types. This approach allowed us to combine expression data to functional associations in many cancers simultaneously. Further, graph centrality measures suggested the importance of upregulated genes such as BIRC5, UBE2C, BUB1B, KIF20A and PTH1R in cancer. Pathway analysis of the high centrality network nodes suggested the importance of the upregulation of cell cycle and replication associated proteins in cancer. Some of the downregulated high centrality proteins include actins, myosins and ATPase subunits. Among the transcription factors, mini-chromosome maintenance proteins (MCMs) and E2F family proteins appeared prominently in regulating many differentially regulated genes. The projected unipartite networks of the up and downregulated genes were comprised of 37,411 and 41,756 interactions, respectively. The conclusions obtained by collating these interactions revealed pan-cancer as well as subtype specific protein complexes and clusters. Therefore, we demonstrate that incorporating expression data from multiple cancers into bipartite graphs validates existing cancer associated mechanisms as well as directs to novel interactions and pathways.

  • Liang-Chao Jiang
  • Run-Ran Liu Run-Ran Liu
  • Chun-Xiao Jia

Personalized recommender system is a powerful method to solve the problem of information overload, which has been widely applied in a variety of scenarios, such as e-commerce, video platforms and social networks, to help users find relevant items or friends of interest. Collaborative filtering is the most successful and widely used algorithm in the recommender systems as its powerful capability of generating recommendations by sharing collective experiences of users. In recent years, the use of mobile devices and the rapid development of internet infrastructures provide the possibility to analyze regional features of items based on user locations. Here we improve the performance of collaborative filtering by using user-location distribution to uncover the potential similarities between items. We find that the similarity of user-location distribution is one efficient measure for the item–item similarities in the framework of collaborative filtering to generate personalized recommendation for users. Furthermore, we have also mixed similarity measures of user-location distribution and the traditional method based on the number of common users linearly to optimize the performance of collaborative filtering. Based on the Movielens data set, we show that the performance of our methods could be improved in terms of the metrics of accuracy and diversity simultaneously.

Mutualistic networks are used to study the structure and processes inherent to mutualistic relationships. In this paper, we introduce a random matrix ensemble (RME) representing the adjacency matrices of mutualistic networks composed by two vertex sets of sizes n and m−n. Our RME depends on three parameters: the network size n, the size of the smaller set m, and the connectivity between the two sets α, where α is the ratio of current adjacent pairs over the total number of possible adjacent pairs between the sets. We focus on the spectral, eigenvector and topological properties of the RME by computing, respectively, the ratio of consecutive eigenvalue spacings r, the Shannon entropy of the eigenvectors S, and the Randić index R. First, within a random matrix theory approach (i.e. a statistical approach), we identify a parameter ξ≡ξ(n,m,α) that scales the average normalized measures <X¯> (with X representing r, S and R). Specifically, we show that (i) ξ∝αn with a weak dependence on m, and (ii) for ξ<1/10 most vertices in the mutualistic network are isolated, while for ξ>10 the network acquires the properties of a complete network, i.e., the transition from isolated vertices to a complete-like behavior occurs in the interval 1/10<ξ<10. Then, we demonstrate that our statistical approach predicts reasonably well the properties of real-world mutualistic networks; that is, the universal curves <X¯> vs. ξ show good correspondence with the properties of real-world networks.

"Product recommendation systems" are backbones of the Internet economy, leveraging customers' prior product ratings to generate subsequent suggestions. A key feature of recommendation data is that few customers rate more than a handful of items. Extant models take missing recommendation rating data to be missing completely at random, implicitly presuming that they lack meaningful patterns or utility in improving ratings accuracy. For the widely studied EachMovie data, the authors find that missing data are strongly nonignorable. Recommendation quality is improved substantially by jointly modeling "selection" and "ratings," both whether and how an item is rated. Accounting for missing ratings and various sources of heterogeneity offers a rich portrait of which items are rated well, which are rated at all, and how these processes are intertwined, while reducing holdout error by 10% or more. The authors discuss ways to implement the proposed framework within existing recommendation systems and its implications for marketers.

  • Asim Ansari
  • Skander Essegaier
  • Rajeev Kohli Rajeev Kohli

Several online firms, including Yahoo!, Amazon.com, and Movie Critic, recommend documents and products to consumers. Typically, the recommendations are based on content and/or collaborative filtering methods. The authors examine the merits of these methods, suggest that preference models used in marketing offer good alternatives, and describe a Bayesian preference model that allows statistical integration of five types of information useful for making recommendations: a person's expressed preferences, preferences of other consumers, expert evaluations, item characteristics, and individual characteristics. The proposed method accounts for not only preference heterogeneity across users but also unobserved product heterogeneity by introducing the interaction of unobserved product attributes with customer characteristics. The authors describe estimation by means of Markov chain Monte Carlo methods and use the model with a large data set to recommend movies either when collaborative filtering methods are viable alternatives or when no recommendations can be made by these methods.

We review the recent rapid progress in the statistical physics of evolving networks. Interest has focused mainly on the structural properties of complex networks in communications, biology, social sciences and economics. A number of giant artificial networks of this kind have recently been created, which opens a wide field for the study of their topology, evolution, and the complex processes which occur in them. Such networks possess a rich set of scaling properties. A number of them are scale-free and show striking resilience against random breakdowns. In spite of the large sizes of these networks, the distances between most of their vertices are short - a feature known as the "small-world" effect. We discuss how growing networks self-organize into scale-free structures, and investigate the role of the mechanism of preferential linking. We consider the topological and structural properties of evolving networks, and percolation and disease spread on these networks. We present a number of models demonstrating the main features of evolving networks and discuss current approaches for their simulation and analytical study. Applications of the general results to particular networks in nature are discussed. We demonstrate the generic connections of the network growth processes with the general problems of non-equilibrium physics, econophysics, evolutionary biology, and so on.

Each complex network (or class of networks) presents specific topological features which characterize its connectivity and highly influence the dynamics of processes executed on the network. The analysis, discrimination, and synthesis of complex networks therefore rely on the use of measurements capable of expressing the most relevant topological features. This article presents a survey of such measurements. It includes general considerations about complex network characterization, a brief review of the principal models, and the presentation of the main existing measurements. Important related issues covered in this work comprise the representation of the evolution of complex networks in terms of trajectories in several measurement spaces, the analysis of the correlations between some of the most traditional measurements, perturbation analysis, as well as the use of multivariate statistics for feature selection and network classification. Depending on the network and the analysis task one has in mind, a specific set of features may be chosen. It is hoped that the present survey will help the proper application and interpretation of measurements.

  • John Scott John Scott

This paper reports on the development of social network analysis, tracing its origins in classical sociology and its more recent formulation in social scientific and mathematical work. It is argued that the concept of social network provides a powerful model for social structure, and that a number of important formal methods of social network analysis can be discerned. Social network analysis has been used in studies of kinship structure, social mobility, science citations, contacts among members of deviant groups, corporate power, international trade exploitation, class structure, and many other areas. A review of the formal models proposed in graph theory, multidimensional scaling, and algebraic topology is followed by extended illustrations of social network analysis in the study of community structure and interlocking directorships.

  • Paolo Laureti
  • Lionel Moret
  • Y.-C. Zhang
  • YK Yu

The European Physical Society (EPS) is a not for profit association whose members include 41 National Physical Societies in Europe, individuals from all fields of physics, and European research institutions. As a learned society, the EPS engages in activities that strengthen ties among the physicists in Europe. As a federation of National Physical Societies, the EPS studies issues of concern to all European countries relating to physics research, science policy and education. Visit www.eps.org

In this paper, we propose an alternative model for collaboration networks based on nonlinear preferential attachment. Depending on a single free parameter "preferential exponent", this model interpolates between networks with a scale-free and an exponential degree distribution. The degree distribution in the present networks can be roughly classified into four patterns, all of which are observed in empirical data. And this model exhibits small-world effect, which means the corresponding networks are of very short average distance and highly large clustering coefficient. More interesting, we find a peak distribution of act-size from empirical data which has not been emphasized before. Our model can produce the peak act-size distribution naturally that agrees with the empirical data well.

The study of the Web as a graph is not only fascinating in its own right, but also yields valuable insight into Web algorithms for crawling, searching and community discovery, and the sociological phenomena which characterize its evolution. We report on experiments on local and global properties of the Web graph using two AltaVista crawls each with over 200 million pages and 1.5 billion links. Our study indicates that the macroscopic structure of the Web is considerably more intricate than suggested by earlier experiments on a smaller scale.