My research team was using a combination of R and GraphPad for data analysis and visualization. I volunteered to move our workflow to Python, since it's clearly the trend for the future and, best of all, it's entirely free!

While generating confusion matrices with scikit-learn, I hit a major stumbling block. By default, scikit-learn's confusion_matrix puts the true (actual) values in the rows and the predicted values in the columns, and ConfusionMatrixDisplay plots them the same way, with true labels on the y-axis and predicted labels on the x-axis.
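A quick way to see the default orientation is a tiny made-up example. The label lists below are hypothetical, not our data. The first argument to confusion_matrix indexes the rows and the second indexes the columns, so the row sums count how many times each true label occurred.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels: y_true plays the role of 'Ans', y_pred of 'Rsp'
y_true = ["angry", "angry", "happy", "sad"]
y_pred = ["angry", "happy", "happy", "happy"]
labels = ["angry", "happy", "sad"]

cm = confusion_matrix(y_true, y_pred, labels=labels)
# Row i = true label labels[i]; column j = predicted label labels[j]
print(cm)
# [[1 1 0]
#  [0 1 0]
#  [0 1 0]]

# Summing across each row gives the number of true samples per label
print(cm.sum(axis=1))  # [2 1 1]
```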

The problem was that my team needed the opposite layout, with predicted values as rows and true values as columns, so that our new results would line up with the data in our previous papers. A solution for this rearrangement was surprisingly hard to find online. Another team member suggested swapping the true and predicted values passed to confusion_matrix:

# Original code
# 'Ans' = true value, 'Rsp' = predicted value
cm = confusion_matrix(data['Ans'], data['Rsp'])

# The suggested swap
cm = confusion_matrix(data['Rsp'], data['Ans'])

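However, that only led to problematic results. The likely culprit is normalization: for raw counts, swapping the arguments gives the same numbers as transposing, but normalize='true' always divides each row by the totals of whichever array is passed first. After the swap, each row therefore shows how one response was spread across the actual values, not how each actual value was spread across the responses. After some further digging, I discovered that the solution was to compute the matrix the standard way and then transpose it: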

cm_transposed = cm.T
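A small sketch on made-up labels shows the difference between the two approaches. The lists ans, rsp and labels are hypothetical stand-ins for data['Ans'] and data['Rsp']. With raw counts, swapping and transposing agree; with normalize='true' they do not, while the transposed matrix keeps the per-actual-value percentages (now one column per actual value).

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical toy data standing in for data['Ans'] (actual) and data['Rsp'] (response)
ans = ["angry", "angry", "happy", "sad"]
rsp = ["angry", "happy", "happy", "happy"]
labels = ["angry", "happy", "sad"]

# Raw counts: swapping the arguments gives exactly the transpose
counts = confusion_matrix(ans, rsp, labels=labels)
counts_swapped = confusion_matrix(rsp, ans, labels=labels)
print(np.array_equal(counts_swapped, counts.T))  # True

# Normalized: normalize='true' divides each row by the totals of the FIRST argument
norm = confusion_matrix(ans, rsp, labels=labels, normalize="true")
norm_swapped = confusion_matrix(rsp, ans, labels=labels, normalize="true")
print(np.allclose(norm_swapped, norm.T))  # False

# The transposed matrix keeps the intended normalization:
# each column (one per actual value) sums to 1
print(norm.T.sum(axis=0))
```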

Since this solution was so hard to find, I hope this post helps others who run into the same problem.

The order in which we manipulate the data matters a lot. I found that I should build the matrix the standard way (true values first) and confirm that it is correct before transposing it or changing the appearance of the visualization.
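Here is a minimal sketch of that order (build, check, then transpose), assuming normalize='true' as in our script and using made-up labels in place of our CSV columns. The asserts are one simple way to "confirm accuracy": rows must sum to 1 before the transpose, and columns must sum to 100 after it.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical stand-ins for data['Ans'] (actual) and data['Rsp'] (response)
ans = ["angry", "angry", "angry", "happy", "happy", "sad"]
rsp = ["angry", "happy", "sad", "happy", "happy", "sad"]
labels = sorted(set(ans))

# Step 1: build the matrix the standard way (rows = actual values)
cm = confusion_matrix(ans, rsp, labels=labels, normalize="true")

# Step 2: check it before touching the layout; with normalize='true'
# every row (one per actual value) must sum to 1
assert np.allclose(cm.sum(axis=1), 1.0), "rows should sum to 1"

# Step 3: only now transpose and convert to percentages
cm_percent = cm.T * 100
# After transposing, each column (one per actual value) sums to 100
assert np.allclose(cm_percent.sum(axis=0), 100.0)

# Round last, purely for presentation
cm_display = cm_percent.round(1)
print(cm_display)
```

Rounding comes last on purpose: the rounded columns can add up to 99.9 or 100.1 (three values of 33.3, for instance), so the sum check belongs on the unrounded matrix.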

Here’s the code for generating one of our confusion matrices:

import pandas as pd 
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay 
import matplotlib.pyplot as plt 

# Read the CSV file (fill in the path to your own file)
data = pd.read_csv('')
sorted_labels = sorted(data['Ans'].unique())

# Calculate confusion matrix
cm = confusion_matrix(data['Ans'], data['Rsp'], labels=sorted_labels, normalize='true')

# Transpose the matrix, multiply by 100 and round the values
cm_transposed = cm.T
cm_percentage = cm_transposed * 100
cm_percentage_rounded = cm_percentage.round(1)

# Capitalize the labels
capitalized_sorted_labels = [label.capitalize() for label in sorted_labels]

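# Display the transposed confusion matrix with percentages.
# ConfusionMatrixDisplay titles the axes "True label" (y) and "Predicted label" (x)
# by default, which no longer fits after transposing, so they are overridden below.
fig, ax = plt.subplots(figsize=(10, 10), dpi=100)
disp = ConfusionMatrixDisplay(cm_percentage_rounded, display_labels=capitalized_sorted_labels)
disp.plot(ax=ax, cmap='Blues', values_format='.1f')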

ax.set_xlabel("Actual Emotion")
ax.set_ylabel("Participants' Response (%)")
ax.set_title('All Conditions')

plt.show()
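To double-check the final matrix, one option is to rebuild the same table independently with pandas' crosstab, putting responses in the rows and actual emotions in the columns and normalizing each column. This is a sketch on a hypothetical DataFrame, and it assumes every response is one of the actual-emotion labels: confusion_matrix drops responses that are not in labels, while crosstab keeps them, so the two would not match in that case.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical stand-in for the CSV: 'Ans' = actual emotion, 'Rsp' = response
data = pd.DataFrame({
    "Ans": ["angry", "angry", "angry", "happy", "happy", "sad"],
    "Rsp": ["angry", "happy", "sad", "happy", "happy", "sad"],
})
sorted_labels = sorted(data["Ans"].unique())

# Matrix as computed in the script above (before rounding)
cm_percentage = confusion_matrix(
    data["Ans"], data["Rsp"], labels=sorted_labels, normalize="true"
).T * 100

# Independent check: rows = responses, columns = actual emotions,
# each column normalized so it sums to 100
check = (
    pd.crosstab(data["Rsp"], data["Ans"], normalize="columns")
    .reindex(index=sorted_labels, columns=sorted_labels, fill_value=0)
    * 100
)

print(np.allclose(cm_percentage, check.to_numpy()))  # True
```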
