Learn CPTs of a Bayesian Network#

When designing a Bayesian Network for a given problem domain, one can follow one of the following approaches:

  1. topology (nodes and arcs) and conditional probabilities are configured by applying expert knowledge, i.e. experts determine the relevant variables and their dependencies and estimate the conditional probabilities

  2. topology is determined by experts, but the conditional probabilities of the CPTs are learned from data

  3. topology and conditional probabilities are learned from data

This section shows how pyAgrum can be applied for the second option, i.e. to learn the Conditional Probability Tables (CPTs) of a Bayesian Network whose topology is known.

As demonstrated, for example, in Learn Bayesian Network from covid-19 data, pyAgrum can also be applied for option 3, where both the topology and the CPTs are learned from data.

In contrast to many other Machine Learning approaches, Bayesian Networks provide the important capability of integrating knowledge learned from data with expert knowledge.

In order to demonstrate the learning capability, we use two Bayesian Networks with identical topology but different, randomly initialized CPTs. Since a Bayesian Network represents a joint probability distribution (JPT), data can be generated from a network by sampling according to its JPT. We sample 10000 instances from the first network and use this data to learn the CPTs of the second one. By comparing the distance between the two JPTs before and after training, one can verify that the second network becomes similar to the first one by learning from the first network's data (a compact sketch of this workflow is given after the imports below).

%matplotlib inline
from pylab import *
import matplotlib.pyplot as plt
import numpy as np
import os
import pyAgrum as gum
import pyAgrum.lib.notebook as gnb
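
Before going through the individual steps, the following cell sketches the complete experiment in compact form; it only reuses the pyAgrum calls that are introduced step by step in the rest of this section (the name bn_learned is just a placeholder here; below, the learned nets are called bn3 and bn4):

# Sketch of the complete experiment (each step is discussed in detail below):
bn  = gum.loadBN(os.path.join("out", "VisitAsia.bif"))   # first network
bn2 = gum.loadBN(os.path.join("out", "VisitAsia.bif"))   # second network, same topology
bn.generateCPTs()                                        # randomize the parameters of both nets
bn2.generateCPTs()

gum.generateCSV(bn, os.path.join("out", "test.csv"), 10000, False)   # sample 10000 instances from bn

learner = gum.BNLearner(os.path.join("out", "test.csv"), bn)         # learn CPTs for the known topology
bn_learned = learner.learnParameters(bn.dag())

print(gum.ExactBNdistance(bn, bn2).compute()['klPQ'])         # distance before learning (large)
print(gum.ExactBNdistance(bn, bn_learned).compute()['klPQ'])  # distance after learning (small)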

Loading two BNs#

Two identical Bayes Nets for the Visit to Asia? problem (see section Bayesian Networks with pyAgrum) are loaded from disk.

bn=gum.loadBN(os.path.join("out","VisitAsia.bif"))
bn2=gum.loadBN(os.path.join("out","VisitAsia.bif"))

gnb.sideBySide(bn,bn2,
               captions=['First bn','Second bn'])
[Graphs of the two networks side by side: First bn, Second bn. Both share the structure A->T, T->E, L->E, S->L, S->B, E->X, E->D, B->D.]

As shown below, both BNs have the same CPTs:

gnb.sideBySide(bn.cpt("D"),bn2.cpt("D"),
               captions=['CPT Dyspnoea bn','CPT Dyspnoea bn2'])
CPT Dyspnoea bn / CPT Dyspnoea bn2 (identical):

  E  B  |  D=0     D=1
  0  0  |  0.9000  0.1000
  0  1  |  0.2000  0.8000
  1  0  |  0.3000  0.7000
  1  1  |  0.1000  0.9000

Randomizing the parameters#

Next, the CPT values of both BNs are randomized:

bn.generateCPTs()
bn2.generateCPTs()

As can be seen below, the CPTs of both BNs now have new and different values:

gnb.sideBySide(bn.cpt("D"),bn2.cpt("D"),
               captions=['CPT Dyspnoea bn','CPT Dyspnoea bn2'])
CPT Dyspnoea bn:

  E  B  |  D=0     D=1
  0  0  |  0.5455  0.4545
  0  1  |  0.1998  0.8002
  1  0  |  0.6409  0.3591
  1  1  |  0.4952  0.5048

CPT Dyspnoea bn2:

  E  B  |  D=0     D=1
  0  0  |  0.4807  0.5193
  0  1  |  0.4492  0.5508
  1  0  |  0.5333  0.4667
  1  1  |  0.6036  0.3964

gnb.sideBySide(bn.cpt("L"),bn2.cpt("L"),
               captions=['CPT Lung Cancer bn','CPT Lung Cancer bn2'])
CPT Lung Cancer bn:

  S  |  L=0     L=1
  0  |  0.6277  0.3723
  1  |  0.7315  0.2685

CPT Lung Cancer bn2:

  S  |  L=0     L=1
  0  |  0.6042  0.3958
  1  |  0.6021  0.3979

Exact KL-divergence#

A Bayesian Network represents a joint probability distribution (JPT). Different measures exist for quantifying the similarity of two probability distributions. Here, we apply the Kullback-Leibler (KL) divergence: the lower the KL-divergence, the more similar the two distributions.
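
For two discrete distributions $P$ (here: the JPT of bn) and $Q$ (the JPT of bn2) over the same joint states $x$, the KL-divergence is defined as

$$D_{KL}(P \,\|\, Q) = \sum_{x} P(x)\,\log\frac{P(x)}{Q(x)},$$

which is non-negative, zero only if $P = Q$, and not symmetric in $P$ and $Q$ (which is why the output below contains both klPQ and klQP).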

g1=gum.ExactBNdistance(bn,bn2)
before_learning=g1.compute()
print(before_learning['klPQ'])
5.328485003314561

Just to be sure, we check that the distance between a BN and itself is 0:

g0=gum.ExactBNdistance(bn,bn)
print(g0.compute()['klPQ'])
0.0

As shown below, the compute()-method of class ExactBNdistance does not only provide the Kullback-Leibler divergence but also other distance measures. However, here we just apply KL.

before_learning
{'klPQ': 5.328485003314561,
 'errorPQ': 0,
 'klQP': 5.923071875160725,
 'errorQP': 0,
 'hellinger': 1.1593911985090197,
 'bhattacharya': 1.115028222595494,
 'jensen-shannon': 0.7717255411247985}

Generate data from the original BN#

By applying the method generateCSV() one can sample data from a Bayesian Network. In the code cell below, 10000 samples, each describing the values of the 8 random variables for one fictional patient, are generated and saved in out/test.csv.

gum.generateCSV(bn,os.path.join("out","test.csv"),10000,False)
-62772.68916612484

Learn CPTs of Bayesian Network from Data#

Next, we apply the data sampled from the Bayesian Network bn above to learn the CPTs of other Bayesian Networks with the same topology as bn. We expect that after a successful learning process the KL-divergence between bn and the learned network is low, i.e. both nets are similar.

There are different options for learning the CPTs of a Bayesian Network. Below, we implement the following 3 options:

  1. the BNLearner()-class from pyAgrum

  2. the BNLearner()-class from pyAgrum with Laplace Smoothing

  3. the pandas crosstab() method for calculating CPTs

Apply pyAgrum Learners#

BNLearner() without smoothing:

learner=gum.BNLearner(os.path.join("out","test.csv"),bn) 
bn3=learner.learnParameters(bn.dag())

BNLearner() with Laplace smoothing:

learner=gum.BNLearner(os.path.join("out","test.csv"),bn) 
learner.useAprioriSmoothing(100) # a count C is replaced by C+100
bn4=learner.learnParameters(bn.dag())
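
To illustrate the effect of the smoothing weight, consider a hypothetical parent configuration with raw counts of 90 and 10 for the two values of a binary child variable (the numbers and names below are illustrative, not taken from the generated data):

counts = [90, 10]     # hypothetical raw counts for the two child values
weight = 100          # smoothing weight, as used above

mle      = [c / sum(counts) for c in counts]                             # [0.9, 0.1]
smoothed = [(c + weight) / (sum(counts) + 2 * weight) for c in counts]   # [0.633..., 0.366...]
print(mle, smoothed)

Smoothing pulls the estimates towards the uniform distribution, which is why the KL-divergence for the smoothed learner reported below is slightly larger than for the un-smoothed one.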

As shown below, both approaches learn Bayesian Networks that have a small KL-divergence to the Bayesian Network from which the training data has been sampled:

after_pyAgrum_learning=gum.ExactBNdistance(bn,bn3).compute()
after_pyAgrum_learning_with_laplace=gum.ExactBNdistance(bn,bn4).compute()
print("KL-Divergence for option without smoothing :{}".format(after_pyAgrum_learning['klPQ']))
print("KL-Divergence for option with smooting(100):{}".format(after_pyAgrum_learning_with_laplace['klPQ']))
KL-Divergence for option without smoothing :0.0016687819527767985
KL-Divergence for option with smooting(100):0.0073678034228635576
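
The learned CPTs can also be inspected directly; for example, the CPT of D in the original network and in the two learned networks can be displayed side by side (this simply reuses gnb.sideBySide as above):

gnb.sideBySide(bn.cpt("D"), bn3.cpt("D"), bn4.cpt("D"),
               captions=['Original bn', 'bn3 (no smoothing)', 'bn4 (smoothing 100)'])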

Apply pandas to learn CPTs#

import pandas
df=pandas.read_csv(os.path.join("out","test.csv"))
df.head()
L X A D S T E B
0 0 1 0 1 1 0 0 1
1 0 0 0 0 1 0 0 0
2 0 1 0 0 0 0 0 0
3 0 1 0 0 0 0 0 0
4 0 0 0 1 1 0 1 1

We use the crosstab function in pandas to determine conditional counts:

d_counts=pandas.crosstab(df['D'],[df['E'],df['B']])
d_counts
E 0 1
B 0 1 0 1
D
0 1517 763 936 906
1 1289 3072 519 998

The same function can be applied to determine conditional probabilities:

d_condprob=pandas.crosstab(df['D'],[df['E'],df['B']],normalize="columns")
d_condprob
E 0 1
B 0 1 0 1
D
0 0.540627 0.198957 0.643299 0.47584
1 0.459373 0.801043 0.356701 0.52416
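
Both results are consistent: dividing each column of the count table by its column sum reproduces the conditional probabilities obtained with normalize="columns":

# normalizing the counts per column gives the same conditional probabilities as above
d_counts / d_counts.sum()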

A global method for estimating Bayesian Network parameters from a CSV file using pandas#

def computeCPTfromDF(bn,df,name):
    """
    Compute the CPT of variable "name" in the BN bn from the database df
    """
    id=bn.idFromName(name)
    # domain sizes of all variables appearing in the CPT of "name"
    domains=[bn.variableFromName(name).domainSize()
             for name in bn.cpt(id).var_names]

    # drop the last entry of var_names (the variable itself), keeping only its parents
    parents=list(bn.cpt(id).var_names)
    parents.pop()

    if (len(parents)>0):
        # conditional distribution P(name | parents), one column per parent configuration
        s=pandas.crosstab(df[name],[df[parent] for parent in parents],normalize="columns")
    else:
        # no parents: marginal distribution P(name)
        s=df[name].value_counts(normalize=True)

    # write the estimated probabilities into the CPT of the BN
    bn.cpt(id)[:]=np.array((s).transpose()).reshape(*domains)

def ParametersLearning(bn,df):
    """
    Compute the CPTs of every variable in the BN bn from the database df
    """
    for name in bn.names():
        computeCPTfromDF(bn,df,name)

ParametersLearning(bn2,df)

The KL-divergence has decreased a lot (if everything is OK):

g1=gum.ExactBNdistance(bn,bn2)
print("BEFORE LEARNING")
print(before_learning['klPQ'])
print("AFTER LEARNING")
print(g1.compute()['klPQ'])
BEFORE LEARNING
5.328485003314561
AFTER LEARNING
0.0016687819527767985

And the CPTs should be close:

gnb.sideBySide(bn.cpt(3),
               bn2.cpt(3),
               captions=["Original BN","learned BN"])
Original BN (CPT of L):

  S  |  L=0     L=1
  0  |  0.6277  0.3723
  1  |  0.7315  0.2685

learned BN (CPT of L):

  S  |  L=0     L=1
  0  |  0.6457  0.3543
  1  |  0.7306  0.2694

Influence of the size of the database on the quality of learned parameters#

What is the effect of increasing the size of the database on the KL-divergence? We expect that it decreases towards 0.

res=[]
for i in range(200,10001,50):
    ParametersLearning(bn2,df[:i])
    g1=gum.ExactBNdistance(bn,bn2)
    res.append(g1.compute()['klPQ'])
fig=figure(figsize=(8,5))
plt.plot(range(200,10001,50),res)
plt.xlabel("size of the database")
plt.ylabel("KL")
plt.title("klPQ(bn,learnedBN(x))")
Text(0.5, 1.0, 'klPQ(bn,learnedBN(x))')
[Plot: klPQ(bn,learnedBN(x)) as a function of the size of the database.]