bnlearn package

The bnlearn package is one of the most complete and popular packages for Bayesian Networks available to date in R.
First we should install and load the package. Please make sure that you download or compile at least bnlearn 4.2.x, as previous versions might have some bugs. Check the version of the bnlearn package by typing packageVersion("bnlearn") in the R console.
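For example (the version string printed will depend on your installation):

# check the installed version of bnlearn (should be at least 4.2.x)
packageVersion("bnlearn")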
You should check out the documentation that is available online [Link]. It contains nice examples and additional use cases beyond this tutorial. If you are interested in this topic, there are a couple of related books written by the author of the package, you can find the references in the mentioned website.
Install the bnlearn package

Copy and paste the following code (click the code button to show the hidden code) at the console prompt, after >, then press “enter”. If R asks Update all/some/none? [a/s/n]:, reply n, then press enter.

If you see [1] "Congratulations, the package bnlearn is correctly installed.", the package bnlearn is correctly installed.

if(require(bnlearn, quietly = TRUE) && require(Rgraphviz, quietly = TRUE)){
print("Congratulations, the package bnlearn is correctly installed.")
}else{
source('http://bioconductor.org/biocLite.R')
biocLite('Rgraphviz')
biocLite("RBGL")
if(!require(bnlearn)) install.packages("bnlearn")
if(require(bnlearn) && require(Rgraphviz)){
print("Congratulations, the package bnlearn is correctly installed.")
}else print("Contact us !!!")
}
#> [1] "Congratulations, the package bnlearn is correctly installed."
The hands-on workshop is based on the data set asia
, a small synthetic data set from Lauritzen and Spiegelhalter (1988) about lung diseases (tuberculosis, lung cancer or bronchitis) and visits to Asia.
Source
Lauritzen S, Spiegelhalter D (1988). “Local Computation with Probabilities on Graphical Structures and their Application to Expert Systems (with discussion)”. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 50(2), 157-224.
Load the bnlearn package

Type the following code at the console prompt, after >, then press “enter” (we strongly advise you to type the code rather than copy-pasting it).

# load the package
library(bnlearn)
# list objects in the working environment
ls()
#> character(0)
# Import the data set called asia
data(asia)
# list the variables (column names) in asia
# NOT RUN : colnames(asia)
# Delete the column called "E"
asia$E = NULL
# Verify that the data set "asia" now appears in the working environment
ls()
#> [1] "asia"
Format
The asia data set contains the following variables:
D (dyspnoea), a two-level factor with levels yes and no.
T (tuberculosis), a two-level factor with levels yes and no.
L (lung cancer), a two-level factor with levels yes and no.
B (bronchitis), a two-level factor with levels yes and no.
A (visit to Asia), a two-level factor with levels yes and no.
S (smoking), a two-level factor with levels yes and no.
X (chest X-ray), a two-level factor with levels yes and no.
Usage

Explore the data set asia using some basic R operations such as head(), nrow() and summary().

# print data set
# NOT RUN: asia
# You may want to run asia. What does it give?
# The first five lines should be enough to get a very first idea of the data set asia
# Verify the format of data : variable names and types.
head(asia, n = 5)
#> A S T L B X D
#> 1 no yes no no yes no yes
#> 2 no yes no no no no no
#> 3 no no yes no no yes yes
#> 4 no no no no yes no yes
#> 5 no no no no no no yes
# What is the sample size, i.e. how many individuals are there?
nrow(asia)
#> [1] 5000
# summarise the data set with the built-in function summary()
summary(asia)
#> A S T L B X
#> no :4958 no :2485 no :4956 no :4670 no :2451 no :4431
#> yes: 42 yes:2515 yes: 44 yes: 330 yes:2549 yes: 569
#> D
#> no :2650
#> yes:2350
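As a quick extra check (useful later, when we apply the law of total probability), the empirical marginal distribution of Smoking can be obtained with prop.table():

# empirical marginal distribution of Smoking: 2485/5000 and 2515/5000
prop.table(table(asia$S))
#>    no   yes 
#> 0.497 0.503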
There are different ways to manually initialise and modify the graph of a Bayesian Network. We illustrate in this hands-on session one of the options.
Each node is written between square brackets, as [<var_name>]. If the node has a parent set, we denote it by | and specify the list of parents separated by colons :. We can convert the formula into a bn object with model2network() (see the example below).

model2network() function

# manually define a BN structure using the model2network() function
dag_example = model2network("[Earthquake][Alert|Earthquake][Anxiety|Alert:Earthquake]")
# Plot the BN structure with graphviz.plot()
graphviz.plot(dag_example, sub = "Example DAG")
Remark
# the order doesn't matter : the three definitions below are equivalent
model2network("[Earthquake][Alert|Earthquake][Anxiety|Alert:Earthquake]")
model2network("[Earthquake][Alert|Earthquake][Anxiety|Earthquake:Alert]")
model2network("[Earthquake][Anxiety|Alert:Earthquake][Alert|Earthquake]")
Now, it’s your turn:

Suppose we use the following variable names:

A: Visit to Asia
B: Bronchitis
D: Dyspnea
L: Lung Cancer
T: Tuberculosis
S: Smoking History
X: Chest X-ray

Define a BN structure (a DAG) representing the causal relationships among these variables based on your knowledge:

Which causal relationships would you expect among the variables of the asia data set?
Define in R your BN structure with model2network() and assign the output of model2network(), your BN structure, to dag_mydag (see the example above).
Plot it with graphviz.plot().

The following code shows one possible solution. Other BN structures are also acceptable.
# Create and plot the network structure.
dag_mydag = model2network("[A][S][T|S][L][B|S][D|B][X|L:T]")
# Plot
graphviz.plot(dag_mydag)
Source: This exercise is taken from [LINK]
Consider the following piece of medical knowledge taken from [Lauritzen and Spiegelhalter, 1988]:
Create again a DAG representing the causal relationships among these variables, based on the above medical knowledge:

Define the DAG with model2network().
Plot it with graphviz.plot().

# create Lauritzen's BN structure
dag_Laur = model2network("[A][S][T|A][L|S][B|S][D|B:T:L][X|T:L]")
dag_Laur
#>
#> Random/Generated Bayesian network
#>
#> model:
#> [A][S][B|S][L|S][T|A][D|B:L:T][X|L:T]
#> nodes: 7
#> arcs: 8
#> undirected arcs: 0
#> directed arcs: 8
#> average markov blanket size: 3.14
#> average neighbourhood size: 2.29
#> average branching factor: 1.14
#>
#> generation algorithm: Empty
Remark: the generation algorithm is Empty, since this network was specified manually rather than learned from data!
par(mfrow = c(1,2))
graphviz.plot(dag_Laur, sub = "Lauritzen's DAG")
graphviz.plot(dag_mydag, sub = "My DAG")
Learn a BN structure from the asia data set

hc(asia) allows you to do this (it applies the Hill-Climbing algorithm, which is a score-based method). Assign the output of hc(asia) to dag_learn_hc.

dag_learn_hc = hc(asia)
dag_learn_hc
#>
#> Bayesian network learned via Score-based methods
#>
#> model:
#> [A][S][T][L|S][B|S][X|T:L][D|T:L:B]
#> nodes: 7
#> arcs: 7
#> undirected arcs: 0
#> directed arcs: 7
#> average markov blanket size: 2.86
#> average neighbourhood size: 2.00
#> average branching factor: 1.00
#>
#> learning algorithm: Hill-Climbing
#> score: BIC (disc.)
#> penalization coefficient: 4.258597
#> tests used in the learning procedure: 63
#> optimized: TRUE
Remark: this time the algorithm field is NOT empty; the output reports the learning algorithm (Hill-Climbing)!
graphviz.plot(dag_learn_hc)
compare() function

The compare() function compares two Bayesian network structures by returning three elements, the:

true positive (tp) arcs, which appear both in target and in current;
false positive (fp) arcs, which appear in current but not in target;
false negative (fn) arcs, which appear in target but not in current.

By default these are reported as counts; with arcs = TRUE the arcs themselves are listed.

compare(target = dag_Laur, current = dag_mydag, arcs = TRUE)
#> $tp
#> from to
#> [1,] "S" "B"
#> [2,] "B" "D"
#> [3,] "T" "X"
#> [4,] "L" "X"
#>
#> $fp
#> from to
#> [1,] "S" "T"
#>
#> $fn
#> from to
#> [1,] "A" "T"
#> [2,] "S" "L"
#> [3,] "T" "D"
#> [4,] "L" "D"
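For reference, calling compare() without arcs = TRUE returns only the counts; given the arc lists above, they are:

compare(target = dag_Laur, current = dag_mydag)
#> $tp
#> [1] 4
#> 
#> $fp
#> [1] 1
#> 
#> $fn
#> [1] 4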
graphviz.compare() function

Another approach to comparing network structures is to plot them side by side and highlight differences with different colours. graphviz.compare() does that using the Rgraphviz package, while taking care that the nodes have the same position in all networks, to make it easier to spot which arcs are different. As we can see below, and unlike any of the functions we covered above, graphviz.compare() can take more than two network structures as arguments.
par(mfrow = c(1, 3))
graphviz.plot(dag_Laur, sub = "dag_Laur")
graphviz.plot(dag_mydag, sub = "dag_mydag")
graphviz.plot(dag_learn_hc, sub = "dag_learn_hc")
par(mfrow = c(1, 3))
graphviz.compare(dag_Laur, dag_mydag, dag_learn_hc)
score() function

We can compute the network score of a particular graph for a particular data set with the score() function (manual).

Remark: A function called score() may also be defined in other R packages, so you may need to specify from which package you want to call it. That is, if there is an error when you call the score() function, try to replace score() by bnlearn::score() (this may solve the problem), so that R knows you want the score() function from the bnlearn package.
Compute the network score of each of the three structures for the asia data set:
score(x = dag_mydag, data = asia)
score(x = dag_Laur, data = asia)
score(x = dag_learn_hc, data = asia)
# if it does not work, try bnlearn::score()
# bnlearn::score(x = dag_mydag, data = asia)
# bnlearn::score(x = dag_Laur, data = asia)
# bnlearn::score(x = dag_learn_hc, data = asia)
Remark: The scores of dag_Laur
and dag_learn_hc
are almost the same.
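Note that score() supports several network scores through its type argument; for instance, for discrete data:

# other scores for discrete networks, selected via the type argument
score(x = dag_learn_hc, data = asia, type = "aic")   # AIC
score(x = dag_learn_hc, data = asia, type = "bde")   # Bayesian Dirichlet equivalent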
In general, there are three ways of creating a bn.fit object representing a Bayesian network:

from a data set and a network structure (in a bn object) with bn.fit(), as illustrated here;
by specifying all the local distributions by hand (with custom.fit());
by creating the object from data and then modifying the local distributions of interest.

We will talk about the first and the third.
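For completeness, here is a minimal sketch of the second option with custom.fit(), on a hypothetical two-node network (the names net, cptS and cptL are ours, not part of the asia exercise):

# hand-specify all CPTs for a tiny hypothetical network
net = model2network("[S][L|S]")
cptS = matrix(c(0.5, 0.5), ncol = 2, dimnames = list(NULL, c("no", "yes")))
cptL = matrix(c(0.99, 0.01, 0.90, 0.10), ncol = 2,
              dimnames = list(L = c("no", "yes"), S = c("no", "yes")))
custom_bn = custom.fit(net, dist = list(S = cptS, L = cptL))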
Once the structure of a DAG has been determined, the parameters can be estimated as well. The two most common approaches are maximum likelihood estimation (method = "mle") and Bayesian estimation (method = "bayes"). In both cases, parameter estimation is only based on the subset of data spanning the considered variable and its parents.

When using bnlearn, parameter learning is performed by the bn.fit() function, which takes the network structure and the data as arguments.
Use the asia data set to learn three sets of parameters, one for each of the BN structures above (bn.fit(x = one_of_BN_Structures_of_your_choice, data = asia, method = "mle")).

Then use the output of the bn.fit() function to find the conditional probability table (CPT) of Lung Cancer given Smoking History, \(\Pr(L\,|\,S) = ?\), and compute

\[\frac{\Pr(L = \text{yes} \,|\, S = \text{yes})}{\Pr(L = \text{yes} \,|\, S = \text{no})} = ?\]
Remark: As you are not given any example showing how to use bn.fit(), simply type ?bn.fit and try to figure it out by yourself. Otherwise, feel free to check the solution below.
# By default bn.fit() uses the mle method:
# bn.fit(x, data) == bn.fit(x, data, method = "mle")
bn_mydag = bn.fit(x = dag_mydag, data = asia, method = "mle")
bn_dag_Laur = bn.fit(x = dag_Laur, data = asia, method = "mle")
bn_dag_learn_hc = bn.fit(x = dag_learn_hc, data = asia, method = "mle")
# Optional
# bn_mydag_bayes = bn.fit(x = dag_mydag, data = asia, method = "bayes")
# bn_dag_Laur_bayes = bn.fit(x = dag_Laur, data = asia, method = "bayes")
# bn_dag_learn_hc_bayes = bn.fit(x = dag_learn_hc, data = asia, method = "bayes")
Use the output of the bn.fit() function to find the conditional probability table (CPT) of Lung Cancer given Smoking History: \(\Pr(L\,|\,S) = ?\)

# You can print the CPTs attached to all nodes in a given BN structure (for example bn_dag_Laur) using: `bn_dag_Laur`
# But the list of CPTs might be too long
# print just the CPT of Lung Cancer with respect to Smoking History
coef(bn_dag_Laur$L) # or simply run : bn_dag_Laur$L
#> S
#> L no yes
#> no 0.98631791 0.88230616
#> yes 0.01368209 0.11769384
# you can also represent a CPT with barchart.
bn.fit.barchart(bn_dag_Laur$L)
\[\frac{\Pr(L = \text{yes} \,|\, S = \text{yes})}{\Pr(L = \text{yes} \,|\, S = \text{no})} = \frac{0.11769384}{0.01368209} = 8.6020348\]
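If you prefer dot plots, the companion function bn.fit.dotplot() displays the same CPT:

# the same CPT, shown as a dot plot
bn.fit.dotplot(bn_dag_Laur$L)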
Specifying all the local distributions can be problematic in a large network. In most cases it can be more convenient to create a bn.fit
object from (possibly dummy) data using bn.fit()
, and then to modify just the local distributions of the nodes of interest.
To do that, we extract the local distribution of a node from the bn.fit object with coef(), modify it, and re-save it.

R Code

dag = dag_Laur
fitted = bn.fit(dag, asia)
cpt = coef(fitted$S)
# Imagine the distribution of Smoking history in a population is "no": 70% , and "yes" : 30%
cpt[1:2] = c(0.7, 0.3) # The probability distribution of node S must sum to one.
fitted$S = cpt
fitted
#>
#> Bayesian network parameters
#>
#> Parameters of node A (multinomial distribution)
#>
#> Conditional probability table:
#> no yes
#> 0.9916 0.0084
#>
#> Parameters of node B (multinomial distribution)
#>
#> Conditional probability table:
#>
#> S
#> B no yes
#> no 0.7006036 0.2823062
#> yes 0.2993964 0.7176938
#>
#> Parameters of node D (multinomial distribution)
#>
#> Conditional probability table:
#>
#> , , L = no, T = no
#>
#> B
#> D no yes
#> no 0.90017286 0.21373057
#> yes 0.09982714 0.78626943
#>
#> , , L = yes, T = no
#>
#> B
#> D no yes
#> no 0.29752066 0.13170732
#> yes 0.70247934 0.86829268
#>
#> , , L = no, T = yes
#>
#> B
#> D no yes
#> no 0.12500000 0.29166667
#> yes 0.87500000 0.70833333
#>
#> , , L = yes, T = yes
#>
#> B
#> D no yes
#> no 0.00000000
#> yes 1.00000000
#>
#>
#> Parameters of node L (multinomial distribution)
#>
#> Conditional probability table:
#>
#> S
#> L no yes
#> no 0.98631791 0.88230616
#> yes 0.01368209 0.11769384
#>
#> Parameters of node S (multinomial distribution)
#>
#> Conditional probability table:
#> no yes
#> 0.7 0.3
#>
#> Parameters of node T (multinomial distribution)
#>
#> Conditional probability table:
#>
#> A
#> T no yes
#> no 0.991528842 0.952380952
#> yes 0.008471158 0.047619048
#>
#> Parameters of node X (multinomial distribution)
#>
#> Conditional probability table:
#>
#> , , T = no
#>
#> L
#> X no yes
#> no 0.956587473 0.006134969
#> yes 0.043412527 0.993865031
#>
#> , , T = yes
#>
#> L
#> X no yes
#> no 0.000000000 0.000000000
#> yes 1.000000000 1.000000000
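To confirm that the modified distribution was stored, you can read it back with coef() (the output follows directly from the assignment above):

# read the modified CPT back: it should now be 0.7 / 0.3
coef(fitted$S)
#>  no yes 
#> 0.7 0.3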
Inference and probability queries
The inference engine of bnlearn
is limited, but it can be used to test the networks and to perform basic operations with the models.
By using cpquery() we can ask for the probability of an event given a set of evidence. We may ask for a particular combination of configurations in the BN and a set of observed statuses for the variables in the evidence. For example, in the asia network, we could ask: what is the probability of a positive cancer diagnosis for a person that smokes?
Reference : http://jarias.es/bayesnetRtutorial/#inference_and_probability_queries
We may also be interested in finding the configuration of the variables that has the highest posterior probability (in the discrete case). Two functions are useful for such queries:

cpquery() estimates the conditional probability of an event given evidence.
cpdist() generates random observations conditional on the evidence.
Remark
Note that both cpquery()
and cpdist()
are based on Monte Carlo particle filters, and therefore they may return slightly different values on different runs.
cpquery()

Prior probability of Dyspnea being yes
# cpquery(fitted, event, evidence, cluster = NULL, method = "ls", ..., debug = FALSE)
set.seed(20180308)
# Prior distribution of Dyspnoea
cpquery(fitted = bn_dag_Laur,
event = (D == "yes"),
evidence = TRUE, # evidence being true means that we have no prior evidence.
n = 100000)
#> [1] 0.46936
Posterior probability of Dyspnea = yes given Lung cancer being no
# Posterior distribution of Dyspnoea given Lung cancer being "no"
cpquery(fitted = bn_dag_Laur,
event = (D == "yes"),
evidence = (L == "no"),
n = 100000)
#> [1] 0.4436559
Posterior probability of Dyspnea = yes given Lung cancer being yes
# Posterior distribution of Dyspnoea given Lung cancer being "yes"
cpquery(fitted = bn_dag_Laur,
event = (D == "yes"),
evidence = (L == "yes"),
n = 100000)
#> [1] 0.8201835
cpdist()
# cpdist( fitted, nodes, evidence, cluster = NULL, method = "ls", ..., debug = FALSE)
set.seed(20180308)
prop.table(
table(
cpdist(fitted = bn_dag_Laur,
nodes = c("D"),
evidence = TRUE,
n = 100000))
)
#>
#> no yes
#> 0.5304898 0.4695102
prop.table(
table(
cpdist(fitted = bn_dag_Laur,
nodes = c("D"),
evidence = (L == "no"),
n = 100000))
)
#>
#> no yes
#> 0.5563441 0.4436559
prop.table(
table(
cpdist(fitted = bn_dag_Laur,
nodes = c("D"),
evidence = (L == "yes"),
n = 100000))
)
#>
#> no yes
#> 0.1765428 0.8234572
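Besides the default logic sampling (method = "ls"), cpdist() also supports likelihood weighting, where the evidence is passed as a named list; a minimal sketch:

# likelihood weighting: the evidence is a named list of fixed values
head(cpdist(fitted = bn_dag_Laur, nodes = "D",
            evidence = list(L = "yes"), method = "lw", n = 10^4))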
Estimating probabilities with cpquery()

Estimate the prior probability \(\Pr(L = \text{yes})\) and the ratio \(\Pr(L = \text{yes}\,|\,S = \text{yes}) / \Pr(L = \text{yes}\,|\,S = \text{no})\) with cpquery(), and compare the results with the exact values computed from the CPTs (coef(bn_dag_Laur$L) and coef(bn_dag_Laur$S)).

Recall: Law of total probability

\[\Pr(L = \text{yes}) = \Pr(L = \text{yes}\,|\,S = \text{yes}) \times \Pr(S = \text{yes}) + \Pr(L = \text{yes}\,|\,S = \text{no}) \times \Pr(S = \text{no})\]
coef(bn_dag_Laur$L)
coef(bn_dag_Laur$S)
0.11769384 * 0.503 + 0.01368209 * 0.497 # sum(coef(bn_dag_Laur$L) [2,] * coef(bn_dag_Laur$S))
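Note that 0.11769384 × 0.503 + 0.01368209 × 0.497 ≈ 0.066, which matches the cpquery() estimate below.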
Now estimate the same quantities with cpquery():
set.seed(20180308)
L_prior = cpquery(fitted = bn_dag_Laur,
event = (L == "yes"),
evidence = TRUE, # evidence being true means that we have no prior evidence.
n = 100000)
L_prior
#> [1] 0.06608
L_S_yes = cpquery(fitted = bn_dag_Laur,
event = (L == "yes"),
evidence = (S == "yes"),
n = 100000)
L_S_yes
#> [1] 0.1177547
L_S_no = cpquery(fitted = bn_dag_Laur,
event = (L == "yes"),
evidence = (S == "no"),
n = 100000)
L_S_no
#> [1] 0.01359477
L_ratio = L_S_yes / L_S_no
L_ratio
#> [1] 8.661764
A final remark on approximate probabilities
Source: https://www.norsys.com/tutorials/netica/secA/tut_A1.htm
Some people have shied away from using Bayes nets because they imagine they will only work well if the probabilities upon which they are based are exact. This is not true. It turns out very often that approximate probabilities, even subjective ones that are guessed at, give very good results. Bayes nets are generally quite robust to imperfect knowledge. Often the combination of several strands of imperfect knowledge can allow us to make surprisingly strong conclusions.
Studies have shown people are better at estimating probabilities “in the forward direction”. For example, doctors are quite good at giving the probability estimates for “if the patient has lung cancer, what are the chances their X-ray will be abnormal?”, rather than the reverse, “if the X-ray is abnormal, what are the chances of lung cancer being the cause?” (Jensen, Finn V. (1996) An Introduction to Bayesian Networks, Springer Verlag, New York.)