# Difference between revisions of "R console (tutorial)"

## Revision as of 12:07, 20 February 2011

This trail illustrates how to send network data from visone to R and back. The R project for statistical computing offers a rich set of methods for data analysis and modeling, which become accessible from visone through the R console. We assume that you have installed the R connection as explained in the installation trail, and that you have a basic understanding of how to work with visone as explained, for instance, in the trail on visualization and analysis. You do *not* need any previous knowledge of R to follow this trail; nevertheless, to exploit the full potential offered by R you could consult the documentation and tutorials linked from the R-project page.

To follow the steps illustrated in this trail you should download the network file *Egonet.graphml*, which is linked from and explained on the page Egoredes_(data). You should then remove all ties that are not rated as *very likely*, in the same manner as explained in the last section of the visualization and analysis trail.

## Sending networks from visone to R

To send the network from visone to R open the R console tab, choose a name in the textfield **r name** (you might just accept visone's suggestion for this name, which should be *egonet*), and click on the **send** button. After clicking on **send** visone starts the Rserve connection and opens the R-console. If this does not work you should check the settings of the R connection options accessible via the **file, options** menu. If it works you should get a message like

ready to serve visone: sendActiveNet egonet done visone: lsIGraph "egonet"

in the message field of the R console.

You can list all variables that are in the R workspace by typing

ls()

in the input field at the bottom of the R console and pressing the Enter key (currently there is one object called *egonet*).
As we'll see soon, *egonet* is an object of class *igraph*. This class is defined by the R package of the same name, which is obtainable from CRAN (this site also gives you access to R tutorials and documentation). The igraph package is documented in more detail on http://igraph.sourceforge.net/.

## Getting basic statistics about an igraph object

The class of the object *egonet* in the R workspace is printed when executing the command

class(egonet)

As we can see the class of *egonet* is *igraph*. The igraph documentation linked above gives a complete list of all methods available for this class. In the following we describe how to inspect what is encoded in the given object and how to get simple summary statistics.
Executing the command

summary(egonet)

outputs basic information such as the number of vertices and edges, the names of vertex and edge attributes, as well as a list of all edges.

To see the values of the vertex or edge attributes one needs to understand the concept of *vertex iterators* and *edge iterators* in igraph. The vertex iterator of graph *egonet* is returned by typing the command

V(egonet)

When executing this you see just the list of vertex *name*s which are here the numbers from 1 to 45. To obtain the values of an attribute (e.g., *Afrm*) type

V(egonet)$Afrm

which returns the vector of countries of origin of the various actors.

Useful summary statistics include information about how many actors originate from the various countries. However, typing the command

summary(V(egonet)$Afrm)

just outputs information about the class and size of the list of countries of origin:

       Length     Class      Mode
           45 character character

which is not very informative. To obtain the list of unique values for the *Afrm* attribute, you can type

unique(V(egonet)$Afrm)

which returns the names of five different countries. To count the number of actors in each of the countries, it is most convenient to convert the vector of character strings into a *factor* and save this factor in a new variable (e.g., called *from*) by typing the command

from <- as.factor(V(egonet)$Afrm)

Finally the command

summary(from)

returns the list of unique values along with the number of actors in each of the classes. In our example this is

              Colombia Dominican Republic        Puerto Rico              Spain      United States
                     2                 14                  5                  1                 23
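The difference between summarizing a character vector and summarizing a factor can be tried out without the network. The following is a minimal sketch in base R; the vector `origin` is a made-up stand-in for `V(egonet)$Afrm`, and the names `origin`, `from_toy`, and `counts` are hypothetical.

```r
# Made-up stand-in for V(egonet)$Afrm (hypothetical values, not the real data).
origin <- c("Spain", "United States", "Colombia",
            "United States", "Colombia", "United States")

# On a character vector, summary() reports only length, class, and mode.
print(summary(origin))

# Converting to a factor makes summary() count the members of each class.
from_toy <- as.factor(origin)
counts <- summary(from_toy)
print(counts)

# table() yields the same counts directly from the character vector.
print(table(origin))
```

Whether to use `summary` on a factor or `table` on the raw vector is largely a matter of taste; the factor is worth keeping if it is needed again later for grouping.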

Vertex iterators can be restricted to subsets by specifying an index vector or a logical vector (or an expression that produces one) in square brackets after the iterator. For instance, the command

V(egonet)[2:5]

returns the vertices named 3 to 6. This seeming contradiction arises because the igraph version used in this trail numbers vertex positions starting at zero (igraph releases from 0.6 onward start at one); thus, the name of the vertex at position 0 is 1, the name of the vertex at position 2 is 3, and so on. The result of such a restriction operation on a vertex iterator is itself a vertex iterator and, thus, provides access to vertex attributes. For instance, typing

V(egonet)[2:5]$Afrm

returns the countries of origin of actors indexed by 2 to 5 (i.e., named 3 to 6). The command

V(egonet)[Acit == "new york"]$Afrm

gives you the countries of origin of all actors whose attribute *Acit* (encoding the city of residence) equals *new york*, and so on. Vertex iterators can be restricted by specifying more sophisticated conditions; see the igraph documentation for details.
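The restriction of vertex iterators can also be tried on a small self-made graph. The sketch below assumes a recent igraph release (1.0 or later), in which positions are counted from one rather than from zero as described above; the graph `toy` and its attribute values are made up for illustration.

```r
library(igraph)

# Hypothetical toy graph: a ring of five vertices with a made-up Afrm attribute.
toy <- make_ring(5)
V(toy)$Afrm <- c("United States", "United States", "Spain",
                 "Colombia", "Colombia")

# Restrict by position; in recent igraph releases positions start at one.
print(V(toy)[2:3]$Afrm)

# Restrict by a condition on a vertex attribute.
usa <- V(toy)[Afrm == "United States"]
print(as_ids(usa))
```

The result of each restriction is again a vertex iterator, so `$Afrm` or any other attribute can be appended to it.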

An edge iterator is returned via the command

E(egonet)

and offers access to edge attributes in the same way as for vertices.

### More sophisticated indexing of vertex and edge iterators

Vertex iterators and edge iterators can be indexed by more complex conditions. Actually, any logical vector whose length equals the number of vertices (respectively edges) can be used as an argument in the square brackets following vertex iterators (respectively edge iterators). The following lines illustrate such indexing tasks. To select all actors whose origin is in the US and save this iterator in a variable *actors.from.usa* type

actors.from.usa <- V(egonet)[Afrm == "United States"]

All edges connecting two actors from the US are obtained by

edges.within.usa <- E(egonet)[actors.from.usa %--% actors.from.usa]

The operator *%--%* is a special operator used inside edge iterators between two vertex iterators; it selects all edges connecting vertices from the two specified subsets (which might be identical, as in the example above).
Edges with at least one actor from the US are selected by

edges.incident.usa <- E(egonet)[adj(actors.from.usa)]
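Both edge selections can be checked on a graph small enough to verify by hand. This is a sketch under assumptions: it targets a recent igraph release (1.0 or later), and it uses `.inc()`, which is assumed here to be the counterpart of `adj()` in those releases. The graph `toy` and its attribute values are made up.

```r
library(igraph)

# Hypothetical toy graph: ring 1-2-3-4-5-1 with a made-up Afrm attribute.
toy <- make_ring(5)
V(toy)$Afrm <- c("United States", "United States", "Spain",
                 "Colombia", "Colombia")

usa <- V(toy)[Afrm == "United States"]

# Edges with both endpoints in the US subset (here only the edge 1--2).
edges_within_usa <- E(toy)[usa %--% usa]
print(edges_within_usa)

# Edges with at least one endpoint in the US subset (1--2, 2--3, and 1--5).
edges_incident_usa <- E(toy)[.inc(usa)]
print(edges_incident_usa)
```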

To select all edges within any of the classes defined by *Afrm*, we first construct a logical vector over the edges that is true if and only if the *Afrm* attribute of the two connected vertices is identical, and then use it as an argument in *E(egonet)[...]*. To do so, type

    el <- get.edgelist(egonet) +1
    within.class.edges <- V(egonet)[el[,1]]$Afrm == V(egonet)[el[,2]]$Afrm
    E(egonet)[within.class.edges]

The variable *el* is just a matrix with two columns containing the vertex ids of adjacent vertices. The *+1* in the first line is necessary because ids start with zero.
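The same idea can be sketched for a recent igraph release (1.0 or later), where vertex ids start at one and no offset is needed. The example uses `ends()` to obtain the endpoint matrix; the graph `toy`, its attribute values, and the variable names are made up for illustration.

```r
library(igraph)

# Hypothetical toy graph: ring 1-2-3-4-5-1 with a made-up Afrm attribute.
toy <- make_ring(5)
V(toy)$Afrm <- c("United States", "United States", "Spain",
                 "Colombia", "Colombia")

# Two-column matrix of endpoint ids, one row per edge (ids start at one here).
el <- ends(toy, E(toy), names = FALSE)

# TRUE for every edge whose two endpoints share the same Afrm value.
same_class <- V(toy)$Afrm[el[, 1]] == V(toy)$Afrm[el[, 2]]

# Use the logical vector to restrict the edge iterator.
within_class_edges <- E(toy)[same_class]
print(within_class_edges)
```

In this toy graph only the edges 1--2 (both United States) and 4--5 (both Colombia) are selected.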

## Analyzing distributions of centralities in igraph

The igraph package offers methods to compute various established centrality measures (refer to the igraph documentation for a complete list of available methods). While these could also be directly computed in visone without the detour via the R console, R directly offers statistical descriptions and analysis of the computed values. This is demonstrated in the following. The vertex degrees are returned by the command

degree(egonet)

Let's save this vector in a variable *d* by typing

d <- degree(egonet)

Mean, standard deviation, and summary statistics (including min, max, quartiles, and median) are computed by

    mean(d)
    sd(d)
    summary(d)

To display these (or other) statistics separately for each class of actors defined by the country of origin (or any other attribute), execute the command

tapply(d, from, summary)

The three arguments of *tapply* have the following meaning: *d* is the vector of values to which the function should be applied, *from* is the factor whose unique values determine the different classes, and *summary* is the function to be computed (instead of *summary*, you could also type *mean*, *sd*, and so on).
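How *tapply* groups values can be seen on a few numbers that are easy to check by hand. The following is a minimal sketch in base R; `d_toy` and `from_toy` are made-up stand-ins for the degree vector *d* and the factor *from*.

```r
# Made-up degrees and countries of origin for six hypothetical actors.
d_toy <- c(2, 5, 3, 4, 1, 6)
from_toy <- factor(c("Spain", "United States", "Colombia",
                     "United States", "Colombia", "United States"))

# One mean per class: Colombia (3 + 1) / 2, Spain 2, United States (5 + 4 + 6) / 3.
class_means <- tapply(d_toy, from_toy, mean)
print(class_means)

# Passing summary instead returns a list with one summary per class.
class_summaries <- tapply(d_toy, from_toy, summary)
print(class_summaries)
```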

## Loading networks from R into visone

Just as networks can be sent from visone to R, objects of class *igraph* can be loaded into visone. Loading the current R object *egonet* would be of no use since this network is already in visone and has not been modified. Loading networks from R into visone is useful when values that have been computed in R and attached to the igraph object should be accessible as vertex or edge attributes in visone. This offers numerous possibilities to transform attributes, as demonstrated in the following.

First let's copy *egonet* into a new igraph object named *g* by executing

g <- egonet

(This step mainly serves to demonstrate that igraph objects stored in new variables can be loaded into visone; we could also have attached the new data directly to *egonet*.) To attach a new attribute named *Degree* that encodes the previously computed node degrees, type

V(g)$Degree <- d

Degrees could have been computed in visone as well. However, R offers methods to transform such values that are not implemented directly in visone. For instance, the variable *DegreeCentered* computed and attached via

V(g)$DegreeCentered <- d-mean(d)

encodes the differences between the individual degrees and the mean (so that nodes with relatively small degrees get negative values and nodes with relatively high degrees get positive values). Likewise

V(g)$LogDegree <- log(d)

attaches the logarithms of the degrees to the graph (this transformation is quite useful in networks with skewed degree distributions, e.g., preferential attachment graphs). Note that while visone directly offers some possibilities to transform attributes, logarithmic transformations are not implemented; in contrast, R, as a programming language, imposes no such restrictions.
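The three attribute assignments can be tried together on a small graph before applying them to real data. This sketch assumes a recent igraph release (1.0 or later); the star graph `toy` is made up, and the attribute names simply mirror those used above.

```r
library(igraph)

# Hypothetical toy graph: an undirected star, center degree 3, leaves degree 1.
toy <- make_star(4, mode = "undirected")
d_toy <- degree(toy)

# Attach the raw, the mean-centered, and the log-transformed degrees.
V(toy)$Degree <- d_toy
V(toy)$DegreeCentered <- d_toy - mean(d_toy)
V(toy)$LogDegree <- log(d_toy)

# The new attributes are now part of the igraph object.
print(vertex_attr_names(toy))
print(V(toy)$DegreeCentered)
```

Note that `log(0)` evaluates to `-Inf` in R, so in a network with isolated vertices one might prefer a shifted transformation such as `log(d + 1)`; whether visone displays infinite attribute values sensibly is not covered here.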

Finally, loading the network along with all old and new attributes into visone can be done via the R console tab as explained in the following. Click once on the **refresh** button to show all R variables of class igraph (note that this shows the variable *g* but not the vector of degrees *d*). Selecting *g* and pushing the **load** button opens a new network tab with a network called *g*. You can inspect the newly attached attributes via the attribute manager.