Data Manipulation
What is Python?
popular programming language that is often considered easy to learn, flexible, free
extensively used in science and in industry
strong set of add-on libraries that let you use it for all kinds of tasks including data analysis, statistical modeling, developing applications for the web, running experiments, programming video games, making desktop apps, programming robots, etc.
You can learn more about the language on
Python vs. R
Python is a general-purpose programming language, while R is a statistical programming language.
This means that Python is more versatile and can be used for a wider range of tasks, such as web development, data manipulation, and machine learning.
Why Python?
Lots of crowdsourced support
Easy to share code: Jupyter Notebooks — upload to GitHub (demo mvlasceanu)
Getting started in Python
We will use Colab to run Python scripts in this class
Click on File -> New Notebook
Sign in to your Google account
Name your notebook: MyFirstNotebook
Save your notebook in Drive
Comments in Python start with the hash character, #, and extend to the end of the physical line. A comment may appear at the start of a line or following whitespace or code
# this is the first comment
spam = 1 # and this is the second comment
# ... and now a third!
text = "# This is not a comment because it's inside quotes."
print(text)
# This is not a comment because it's inside quotes.
A function in Python is a piece of code, often made up of several instructions, which runs when it is referenced or “called”.
Functions are sometimes also called procedures; functions that belong to an object are called methods. Python provides many built-in functions (like print()) but also gives you the freedom to create your own custom functions.
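As a minimal sketch of a custom function (the name `rectangle_area` and its parameters are made up for illustration and are not part of the course materials):

```python
# Hypothetical example: define a custom function, then call it.
def rectangle_area(width, height):
    """Return the area of a rectangle with the given width and height."""
    return width * height

result = rectangle_area(20, 30)  # calling the function runs its body
print(result)  # 600
```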
Types of numbers:
Integers (int) = numbers without decimals
Floating-Point numbers (float) = numbers with decimals
You can check the type of number using the type() function
Simple calculations
17 / 3 # Classic division returns a float
17 // 3 #floor division discards the fractional part
17 % 3 # the % operator returns the remainder of the division
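The number types and the three division-related operators above can be checked directly. This is a small sketch; the variable names are only illustrative.

```python
# type() reports whether a number is an int or a float
print(type(7))    # <class 'int'>
print(type(7.5))  # <class 'float'>

quotient = 17 / 3         # true division always returns a float
floor_quotient = 17 // 3  # floor division discards the fractional part
remainder = 17 % 3        # remainder of the division

print(quotient, floor_quotient, remainder)
```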
Variables & Functions
Variables are names that refer to values (data) in a program. We can assign a value to a variable in order to save a result or use it later.
The equal sign (=) assigns a value to a variable:
width = 20
height = 30
area = width * height
print("area =", area)
area = 600
Rules for naming variables:
A variable name must start with a letter or the underscore character (e.g., _width)
A variable name cannot start with a number
A variable name can only contain alpha-numeric characters and underscores (A-z, 0-9, and _ )
Variable names are case-sensitive (age, Age, and AGE are three different variables)
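One way to try out these rules is the built-in string method `isidentifier()`, which reports whether a string is a syntactically valid name. The candidate names below are made up for illustration; note that `isidentifier()` does not check for reserved keywords such as `class`.

```python
# Check a few hypothetical names against the naming rules
candidates = ["width", "_width", "width2", "2width", "my-width"]
validity = {name: name.isidentifier() for name in candidates}
print(validity)

# Names are case-sensitive: these are three separate variables
age, Age, AGE = 1, 2, 3
print(age, Age, AGE)
```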
Strings are small pieces of text that can be manipulated in Python.
Strings can be enclosed in single quotes ('...') or double quotes ("...") with the same result.
'spam eggs' # single quotes
'spam eggs'
'doesn\'t' #use \ to escape single quotes
"doesn't" #or use double quotes
To concatenate strings, use +
prefix = 'Py'
prefix + 'thon'
A list is a sequence of comma-separated values (items) between square brackets
#list of numbers
squares = [1,4,9,16,25]
[1, 4, 9, 16, 25]
#list of strings
squares_strings = ["one", "four", "nine", "sixteen", "twenty-five"]
['one', 'four', 'nine', 'sixteen', 'twenty-five']
Lists can be indexed and sliced:
#indexing returns the item in the referenced index position
#slicing returns a new list
[9, 16, 25]
The function len() tells you how many items are in the list:
letters = ['a', 'b', 'c', 'd']
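Indexing, slicing, and len() can be sketched together as follows. The exact index expressions are an assumption, chosen so that the slice reproduces the [9, 16, 25] output shown above.

```python
squares = [1, 4, 9, 16, 25]

first = squares[0]         # indexing returns the item at position 0
last = squares[-1]         # negative indices count from the end
last_three = squares[-3:]  # slicing returns a new list

letters = ['a', 'b', 'c', 'd']
n_letters = len(letters)   # number of items in the list

print(first, last, last_three, n_letters)
```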
Lists can also be concatenated with +
squares + [36, 49, 64, 81, 100]
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
You can also nest lists (create lists containing other lists)
a = ['a', 'b', 'c']
n = [1,2,3]
x = [a,n]
[['a', 'b', 'c'], [1, 2, 3]]
You can create lists from scratch with a number of different methods. For example, to create a list of numbers from 0 to 9, you can use the range() function, which generates a sequence of values that you can step through or convert to a list with list():
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
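A short sketch of range() in use. The stop value is excluded, so range(10) ends at 9; the variable names are illustrative.

```python
numbers = list(range(10))      # 0 up to, but not including, 10
zero_to_ten = list(range(11))  # include 10 by stopping at 11
evens = list(range(0, 11, 2))  # start, stop, step

print(numbers)
print(zero_to_ten)
print(evens)
```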
Error Messages
Most errors in Python raise what’s known as an exception. Unless your code handles it, an exception stops the program and prints an error message (a traceback) describing what went wrong. For example, if there are too many parentheses in a print() command, a SyntaxError will occur. If you try to divide by 0, a ZeroDivisionError will occur, etc.
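A minimal sketch of catching an exception with try/except so that the program can continue running. The helper `safe_divide` is hypothetical and only serves to illustrate the ZeroDivisionError case.

```python
def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None if the denominator is zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError as error:
        # The exception is caught here, so the program keeps running
        print("Caught an exception:", error)
        return None

print(safe_divide(6, 3))
print(safe_divide(1, 0))
```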
Importing Libraries
import pandas as pd
import seaborn as sns
import numpy as np
import os, sys
import matplotlib.pyplot as plt
Python libraries really extend your computing and data options. For example, one library called the wikipedia library provides an interface to Wikipedia through your code.
Loading data
Data is an organization of measurements into a collection (e.g., lists). In a spreadsheet (e.g., Excel), data are organized in rows and columns (metadata). In Python, we use pandas (a data manipulation and analysis library).
Download this dataset: mvlasceanu/RegressionData
Now import it into the current Colab Session:
Importing data
In Colab, you have to import data every time since the web browser will not store your datasets
# import data downloaded from
# df = pd.read_excel('data.xlsx')
# Or you can read the file directly from the URL (read_csv expects the URL to point to a CSV file)
url = ''
df = pd.read_csv(url)
| ResponseId | condName | BELIEFcc | POLICYcc | SHAREcc | WEPTcc | Intervention_order | Belief1 | Belief2 | Belief3 | ... | Age | Politics2_1 | Politics2_9 | Edu | Income | Indirect_SES | MacArthur_SES | PerceivedSciConsensu_1 | Intro_Timer | condition_time_total |
0 | R_1d6rdZRmlD02sFi | FutureSelfCont | 100.00 | 100.000000 | 0.0 | 8 | PolicySocialM | 100 | 100 | 100 | ... | 40 | 100.0 | NaN | 2.0 | 1.0 | 2,3,4,6,7 | 7 | 81 | 25.566 | 1043.866 |
1 | R_1CjFxfgjU1coLqp | Control | 100.00 | 100.000000 | 0.0 | 1 | PolicySocialM | 100 | 100 | 100 | ... | 50 | 3.0 | 5.0 | 4.0 | NaN | 1,3,4,5,6,7 | 9 | 96 | 16.697 | 367.657 |
2 | R_qxty9a2HTTEq7Xb | Control | 30.25 | 66.444444 | 0.0 | 8 | PolicySocialM | 3 | 78 | 3 | ... | 36 | 48.0 | 49.0 | 3.0 | 5.0 | 2,3,4,5,6,7 | 6 | 76 | 24.055 | 79.902 |
3 | R_1ONRMXgQ310zjNm | BindingMoral | 4.50 | 16.000000 | 0.0 | 8 | PolicySocialM | 6 | 5 | 3 | ... | 50 | 100.0 | 100.0 | 2.0 | 6.0 | 2,3,4,5,6,7 | 6 | 22 | 11.647 | 2.701 |
4 | R_2VQr7rPu2yI8TnK | CollectAction | 71.75 | 67.000000 | 1.0 | 2 | PolicySocialM | 86 | 65 | 66 | ... | 34 | 81.0 | 73.0 | 4.0 | 6.0 | 1,2,3,4,5,6,7 | 10 | 76 | 26.658 | 398.695 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
216 | R_SCbUzWDoIpIodH3 | Control | 22.00 | 28.333333 | NaN | 8 | PolicySocialM | 17 | 31 | 16 | ... | 66 | 72.0 | 72.0 | 4.0 | 5.0 | 1,2,3,4,5,6,7 | 7 | 70 | 11.223 | 195.065 |
217 | R_27TYhr5VpeS4ejh | Control | 92.75 | 68.000000 | 0.0 | 8 | PolicySocialM | 94 | 87 | 100 | ... | 56 | 65.0 | 65.0 | 3.0 | 4.0 | 1,2,3,4,5,6,7 | 6 | 85 | 21.956 | 398.400 |
218 | R_ZC41XczQH7OQwUh | SystemJust | 98.50 | 81.333333 | 1.0 | 0 | PolicySocialM | 100 | 99 | 97 | ... | 43 | 50.0 | 52.0 | 3.0 | 8.0 | 1,2,3,4,5,6,7 | 9 | 80 | 15.358 | 124.334 |
219 | R_3fPjJLW85l37Mqb | PluralIgnorance | 100.00 | 80.000000 | 0.0 | 8 | PolicySocialM | 100 | 100 | 100 | ... | 71 | 40.0 | 53.0 | 4.0 | 4.0 | 1,2,3,4,5,6,7 | 6 | 100 | 15.303 | 47.831 |
220 | R_23UgeVaaC1npjt2 | BindingMoral | 94.25 | 66.714286 | NaN | 8 | PolicySocialM | 77 | 100 | 100 | ... | 71 | 51.0 | 53.0 | 3.0 | 4.0 | 1,2,3,5,6,7 | 5 | 20 | 7.066 | 11.945 |
221 rows × 51 columns
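Because pd.read_csv() accepts a file path, a URL, or a file-like object, the loading step can be tried without downloading anything. The sketch below uses a tiny inline CSV as a stand-in for the course dataset; the IDs and values are made up.

```python
import io
import pandas as pd

# Made-up stand-in for the course data (not the real file)
csv_text = """ResponseId,condName,Age,BELIEFcc
P1,Control,40,100.0
P2,Control,50,30.25
P3,SystemJust,36,
"""

toy_df = pd.read_csv(io.StringIO(csv_text))
print(toy_df.head())  # first rows of the dataframe
print(toy_df.shape)   # (number of rows, number of columns)
```

The empty BELIEFcc cell in the last row is read as NaN, which is how pandas marks missing values.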
Working with data in Python
Accessing individual columns/rows in a dataframe
To access a single column you can index it like a dictionary in Python using the column name as a key:
0 40
1 50
2 36
3 50
4 34
216 66
217 56
218 43
219 71
220 71
Name: Age, Length: 221, dtype: int64
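A minimal sketch of column access, using a small made-up dataframe (`toy_df`) in place of the course dataframe df:

```python
import pandas as pd

# Made-up stand-in for the course dataframe
toy_df = pd.DataFrame({
    "ResponseId": ["P1", "P2", "P3"],
    "Age": [40, 50, 36],
})

ages = toy_df["Age"]  # a single column comes back as a pandas Series
print(ages)
print(type(ages))
```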
Since brackets are used to find columns, special notation is needed to access rows. The best way to look up a single row is to use .iloc[], where you pass the integer position of the row you want to access (remember that Python is 0-indexed!). So if you wanted to see the first row, you would type:
ResponseId R_1d6rdZRmlD02sFi
condName FutureSelfCont
BELIEFcc 100.0
POLICYcc 100.0
SHAREcc 0.0
WEPTcc 8
Intervention_order PolicySocialM
Belief1 100
Belief2 100
Belief3 100
Belief4 100
Policy1 NaN
Policy2 NaN
Policy3 NaN
Policy4 NaN
Policy5 NaN
Policy6 NaN
Policy7 NaN
Policy8 100.0
Policy9 NaN
Trust_sci1_1 NaN
Trust_sci2_1 NaN
Trust_gov_1 NaN
ID_hum_1 NaN
Enviro_ID_1 NaN
Enviro_ID_2 NaN
Enviro_ID_3 NaN
Enviro_ID_4 NaN
Enviro_motiv_1 NaN
Enviro_motiv_11 NaN
Enviro_motiv_12 NaN
Enviro_motiv_13 NaN
Enviro_motiv_14 NaN
Enviro_motiv_15 NaN
Enviro_motiv_16 NaN
Enviro_motiv_17 NaN
Enviro_motiv_18 NaN
Enviro_motiv_20 NaN
PlurIgnoranceItem_1 NaN
Gender 2
Age 40
Politics2_1 100.0
Politics2_9 NaN
Edu 2.0
Income 1.0
Indirect_SES 2,3,4,6,7
MacArthur_SES 7
PerceivedSciConsensu_1 81
Intro_Timer 25.566
condition_time_total 1043.866
Name: 0, dtype: object
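A small sketch of row access with .iloc[], again on a made-up dataframe (`toy_df`) rather than the course data:

```python
import pandas as pd

# Made-up stand-in for the course dataframe
toy_df = pd.DataFrame({
    "ResponseId": ["P1", "P2", "P3"],
    "Age": [40, 50, 36],
})

first_row = toy_df.iloc[0]     # one row, returned as a Series indexed by column name
first_two = toy_df.iloc[0:2]   # a slice of rows, returned as a dataframe
print(first_row)
print(first_two)
```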
Indexes and Columns
RangeIndex(start=0, stop=221, step=1)
Index(['ResponseId', 'condName', 'BELIEFcc', 'POLICYcc', 'SHAREcc', 'WEPTcc',
'Intervention_order', 'Belief1', 'Belief2', 'Belief3', 'Belief4',
'Policy1', 'Policy2', 'Policy3', 'Policy4', 'Policy5', 'Policy6',
'Policy7', 'Policy8', 'Policy9', 'Trust_sci1_1', 'Trust_sci2_1',
'Trust_gov_1', 'ID_hum_1', 'ID_GC_1', 'Enviro_ID_1', 'Enviro_ID_2',
'Enviro_ID_3', 'Enviro_ID_4', 'Enviro_motiv_1', 'Enviro_motiv_11',
'Enviro_motiv_12', 'Enviro_motiv_13', 'Enviro_motiv_14',
'Enviro_motiv_15', 'Enviro_motiv_16', 'Enviro_motiv_17',
'Enviro_motiv_18', 'Enviro_motiv_20', 'PlurIgnoranceItem_1', 'Gender',
'Age', 'Politics2_1', 'Politics2_9', 'Edu', 'Income', 'Indirect_SES',
'MacArthur_SES', 'PerceivedSciConsensu_1', 'Intro_Timer',
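The row labels and column labels of a dataframe are available through its .index and .columns attributes. A minimal sketch on a made-up dataframe (`toy_df`):

```python
import pandas as pd

# Made-up stand-in for the course dataframe
toy_df = pd.DataFrame({
    "ResponseId": ["P1", "P2", "P3"],
    "Age": [40, 50, 36],
})

print(toy_df.index)    # row labels
print(toy_df.columns)  # column labels

row_labels = list(toy_df.index)
column_labels = list(toy_df.columns)
```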
Deleting rows & columns from a dataframe
To delete a row, you can use the .drop() method to drop a particular item using its index value. The .drop() method returns a new dataframe with the particular rows removed.
df2 = df.drop([0])
| ResponseId | condName | BELIEFcc | POLICYcc | SHAREcc | WEPTcc | Intervention_order | Belief1 | Belief2 | Belief3 | ... | Age | Politics2_1 | Politics2_9 | Edu | Income | Indirect_SES | MacArthur_SES | PerceivedSciConsensu_1 | Intro_Timer | condition_time_total |
1 | R_1CjFxfgjU1coLqp | Control | 100.00 | 100.000000 | 0.0 | 1 | PolicySocialM | 100 | 100 | 100 | ... | 50 | 3.0 | 5.0 | 4.0 | NaN | 1,3,4,5,6,7 | 9 | 96 | 16.697 | 367.657 |
2 | R_qxty9a2HTTEq7Xb | Control | 30.25 | 66.444444 | 0.0 | 8 | PolicySocialM | 3 | 78 | 3 | ... | 36 | 48.0 | 49.0 | 3.0 | 5.0 | 2,3,4,5,6,7 | 6 | 76 | 24.055 | 79.902 |
3 | R_1ONRMXgQ310zjNm | BindingMoral | 4.50 | 16.000000 | 0.0 | 8 | PolicySocialM | 6 | 5 | 3 | ... | 50 | 100.0 | 100.0 | 2.0 | 6.0 | 2,3,4,5,6,7 | 6 | 22 | 11.647 | 2.701 |
4 | R_2VQr7rPu2yI8TnK | CollectAction | 71.75 | 67.000000 | 1.0 | 2 | PolicySocialM | 86 | 65 | 66 | ... | 34 | 81.0 | 73.0 | 4.0 | 6.0 | 1,2,3,4,5,6,7 | 10 | 76 | 26.658 | 398.695 |
5 | R_2RUhLjdsxOPqiAK | CollectAction | 74.50 | 51.111111 | NaN | 8 | PolicySocialM | 4 | 99 | 99 | ... | 31 | 55.0 | 50.0 | 2.0 | 5.0 | 1,2,3,4,5,6,7 | 6 | 99 | 6.126 | 126.278 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
216 | R_SCbUzWDoIpIodH3 | Control | 22.00 | 28.333333 | NaN | 8 | PolicySocialM | 17 | 31 | 16 | ... | 66 | 72.0 | 72.0 | 4.0 | 5.0 | 1,2,3,4,5,6,7 | 7 | 70 | 11.223 | 195.065 |
217 | R_27TYhr5VpeS4ejh | Control | 92.75 | 68.000000 | 0.0 | 8 | PolicySocialM | 94 | 87 | 100 | ... | 56 | 65.0 | 65.0 | 3.0 | 4.0 | 1,2,3,4,5,6,7 | 6 | 85 | 21.956 | 398.400 |
218 | R_ZC41XczQH7OQwUh | SystemJust | 98.50 | 81.333333 | 1.0 | 0 | PolicySocialM | 100 | 99 | 97 | ... | 43 | 50.0 | 52.0 | 3.0 | 8.0 | 1,2,3,4,5,6,7 | 9 | 80 | 15.358 | 124.334 |
219 | R_3fPjJLW85l37Mqb | PluralIgnorance | 100.00 | 80.000000 | 0.0 | 8 | PolicySocialM | 100 | 100 | 100 | ... | 71 | 40.0 | 53.0 | 4.0 | 4.0 | 1,2,3,4,5,6,7 | 6 | 100 | 15.303 | 47.831 |
220 | R_23UgeVaaC1npjt2 | BindingMoral | 94.25 | 66.714286 | NaN | 8 | PolicySocialM | 77 | 100 | 100 | ... | 71 | 51.0 | 53.0 | 3.0 | 4.0 | 1,2,3,5,6,7 | 5 | 20 | 7.066 | 11.945 |
220 rows × 51 columns
To drop one or more columns by name, note that the case of the column name must match; you also have to specify axis=1 to drop columns instead of rows.
df.drop('Age', axis=1)
| ResponseId | condName | BELIEFcc | POLICYcc | SHAREcc | WEPTcc | Intervention_order | Belief1 | Belief2 | Belief3 | ... | Gender | Politics2_1 | Politics2_9 | Edu | Income | Indirect_SES | MacArthur_SES | PerceivedSciConsensu_1 | Intro_Timer | condition_time_total |
0 | R_1d6rdZRmlD02sFi | FutureSelfCont | 100.00 | 100.000000 | 0.0 | 8 | PolicySocialM | 100 | 100 | 100 | ... | 2 | 100.0 | NaN | 2.0 | 1.0 | 2,3,4,6,7 | 7 | 81 | 25.566 | 1043.866 |
1 | R_1CjFxfgjU1coLqp | Control | 100.00 | 100.000000 | 0.0 | 1 | PolicySocialM | 100 | 100 | 100 | ... | 2 | 3.0 | 5.0 | 4.0 | NaN | 1,3,4,5,6,7 | 9 | 96 | 16.697 | 367.657 |
2 | R_qxty9a2HTTEq7Xb | Control | 30.25 | 66.444444 | 0.0 | 8 | PolicySocialM | 3 | 78 | 3 | ... | 1 | 48.0 | 49.0 | 3.0 | 5.0 | 2,3,4,5,6,7 | 6 | 76 | 24.055 | 79.902 |
3 | R_1ONRMXgQ310zjNm | BindingMoral | 4.50 | 16.000000 | 0.0 | 8 | PolicySocialM | 6 | 5 | 3 | ... | 2 | 100.0 | 100.0 | 2.0 | 6.0 | 2,3,4,5,6,7 | 6 | 22 | 11.647 | 2.701 |
4 | R_2VQr7rPu2yI8TnK | CollectAction | 71.75 | 67.000000 | 1.0 | 2 | PolicySocialM | 86 | 65 | 66 | ... | 1 | 81.0 | 73.0 | 4.0 | 6.0 | 1,2,3,4,5,6,7 | 10 | 76 | 26.658 | 398.695 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
216 | R_SCbUzWDoIpIodH3 | Control | 22.00 | 28.333333 | NaN | 8 | PolicySocialM | 17 | 31 | 16 | ... | 1 | 72.0 | 72.0 | 4.0 | 5.0 | 1,2,3,4,5,6,7 | 7 | 70 | 11.223 | 195.065 |
217 | R_27TYhr5VpeS4ejh | Control | 92.75 | 68.000000 | 0.0 | 8 | PolicySocialM | 94 | 87 | 100 | ... | 2 | 65.0 | 65.0 | 3.0 | 4.0 | 1,2,3,4,5,6,7 | 6 | 85 | 21.956 | 398.400 |
218 | R_ZC41XczQH7OQwUh | SystemJust | 98.50 | 81.333333 | 1.0 | 0 | PolicySocialM | 100 | 99 | 97 | ... | 1 | 50.0 | 52.0 | 3.0 | 8.0 | 1,2,3,4,5,6,7 | 9 | 80 | 15.358 | 124.334 |
219 | R_3fPjJLW85l37Mqb | PluralIgnorance | 100.00 | 80.000000 | 0.0 | 8 | PolicySocialM | 100 | 100 | 100 | ... | 2 | 40.0 | 53.0 | 4.0 | 4.0 | 1,2,3,4,5,6,7 | 6 | 100 | 15.303 | 47.831 |
220 | R_23UgeVaaC1npjt2 | BindingMoral | 94.25 | 66.714286 | NaN | 8 | PolicySocialM | 77 | 100 | 100 | ... | 2 | 51.0 | 53.0 | 3.0 | 4.0 | 1,2,3,5,6,7 | 5 | 20 | 7.066 | 11.945 |
221 rows × 50 columns
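Since .drop() returns a new dataframe, the original is left unchanged unless you assign the result back. A small sketch on a made-up dataframe (`toy_df`) illustrates this:

```python
import pandas as pd

# Made-up stand-in for the course dataframe
toy_df = pd.DataFrame({
    "ResponseId": ["P1", "P2", "P3"],
    "Age": [40, 50, 36],
    "Edu": [2, 4, 3],
})

without_first_row = toy_df.drop([0])      # drop the row with index label 0
without_age = toy_df.drop("Age", axis=1)  # axis=1 means "drop a column"

# The original dataframe still has all rows and columns
print(toy_df.shape, without_first_row.shape, without_age.shape)
```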
To drop the rows that have any missing data you can use the dropna() function
| ResponseId | condName | BELIEFcc | POLICYcc | SHAREcc | WEPTcc | Intervention_order | Belief1 | Belief2 | Belief3 | ... | Age | Politics2_1 | Politics2_9 | Edu | Income | Indirect_SES | MacArthur_SES | PerceivedSciConsensu_1 | Intro_Timer | condition_time_total |
2 | R_qxty9a2HTTEq7Xb | Control | 30.25 | 66.444444 | 0.0 | 8 | PolicySocialM | 3 | 78 | 3 | ... | 36 | 48.0 | 49.0 | 3.0 | 5.0 | 2,3,4,5,6,7 | 6 | 76 | 24.055 | 79.902 |
24 | R_ZqaUaJxrm1YPuVP | Control | 100.00 | 100.000000 | 1.0 | 8 | PolicySocialM | 100 | 100 | 100 | ... | 42 | 100.0 | 100.0 | 4.0 | 8.0 | 1,2,3,4,6,7 | 10 | 100 | 4.922 | 55.954 |
37 | R_2S2zxgInyyRBKba | Control | 88.25 | 72.222222 | 1.0 | 8 | PolicySocialM | 100 | 53 | 100 | ... | 38 | 100.0 | 0.0 | 4.0 | 6.0 | 1,2,3,4,5,6,7 | 6 | 100 | 14.775 | 285.329 |
38 | R_1N4GsDkFgtUPE6l | Control | 69.75 | 50.555556 | 0.0 | 0 | PolicySocialM | 88 | 63 | 72 | ... | 31 | 49.0 | 53.0 | 4.0 | 4.0 | 1,2,3,4,5,6,7 | 6 | 63 | 7.995 | 114.688 |
91 | R_3F3QEN3yvHxVljh | Control | 67.75 | 77.111111 | 1.0 | 1 | PolicySocialM | 41 | 73 | 84 | ... | 21 | 85.0 | 85.0 | 3.0 | 5.0 | 1,2,3,4,5,6,7 | 8 | 87 | 8.009 | 59.347 |
126 | R_24Gkisox0KIB6WP | Control | 71.75 | 80.000000 | 1.0 | 8 | PolicySocialM | 74 | 77 | 64 | ... | 35 | 95.0 | 75.0 | 4.0 | 8.0 | 1,2,3,4,5,6,7 | 6 | 77 | 8.852 | 52.992 |
138 | R_1ISs2aRDwlDONh5 | Control | 99.50 | 91.666667 | 1.0 | 3 | PolicySocialM | 100 | 99 | 100 | ... | 63 | 50.0 | 51.0 | 3.0 | 5.0 | 1,2,3,4,5,6,7 | 6 | 81 | 16.680 | 225.936 |
155 | R_1ddf9EWAgKznSWb | Control | 100.00 | 93.444444 | 0.0 | 7 | PolicySocialM | 100 | 100 | 100 | ... | 64 | 57.0 | 53.0 | 3.0 | 5.0 | 1,2,3,4,5,6,7 | 4 | 96 | 20.701 | 296.402 |
162 | R_sdQjwf0qaXnxTS9 | Control | 90.50 | 71.444444 | 0.0 | 0 | PolicySocialM | 79 | 95 | 98 | ... | 64 | 8.0 | 10.0 | 3.0 | 4.0 | 1,2,3,4,6,7 | 5 | 82 | 36.586 | 500.922 |
211 | R_22xjAMgarDlfWjP | Control | 76.75 | 70.000000 | 1.0 | 8 | PolicySocialM | 70 | 84 | 72 | ... | 41 | 72.0 | 87.0 | 4.0 | 7.0 | 1,2,3,4,5,6 | 9 | 95 | 5.381 | 53.634 |
10 rows × 51 columns
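A minimal sketch of dropna() on a made-up dataframe (`toy_df`) with a few missing values:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the course dataframe, with some missing values
toy_df = pd.DataFrame({
    "Age": [40, 50, 36],
    "Income": [1.0, np.nan, 5.0],
    "SHAREcc": [0.0, 0.0, np.nan],
})

complete_cases = toy_df.dropna()  # keep only rows with no missing value in any column
print(complete_cases)
```

Only the first row survives here, because the other two rows each have a missing value in at least one column.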
To only drop rows that have missing values in a single variable, use the argument subset in the function dropna()
| ResponseId | condName | BELIEFcc | POLICYcc | SHAREcc | WEPTcc | Intervention_order | Belief1 | Belief2 | Belief3 | ... | Age | Politics2_1 | Politics2_9 | Edu | Income | Indirect_SES | MacArthur_SES | PerceivedSciConsensu_1 | Intro_Timer | condition_time_total |
0 | R_1d6rdZRmlD02sFi | FutureSelfCont | 100.00 | 100.000000 | 0.0 | 8 | PolicySocialM | 100 | 100 | 100 | ... | 40 | 100.0 | NaN | 2.0 | 1.0 | 2,3,4,6,7 | 7 | 81 | 25.566 | 1043.866 |
1 | R_1CjFxfgjU1coLqp | Control | 100.00 | 100.000000 | 0.0 | 1 | PolicySocialM | 100 | 100 | 100 | ... | 50 | 3.0 | 5.0 | 4.0 | NaN | 1,3,4,5,6,7 | 9 | 96 | 16.697 | 367.657 |
2 | R_qxty9a2HTTEq7Xb | Control | 30.25 | 66.444444 | 0.0 | 8 | PolicySocialM | 3 | 78 | 3 | ... | 36 | 48.0 | 49.0 | 3.0 | 5.0 | 2,3,4,5,6,7 | 6 | 76 | 24.055 | 79.902 |
3 | R_1ONRMXgQ310zjNm | BindingMoral | 4.50 | 16.000000 | 0.0 | 8 | PolicySocialM | 6 | 5 | 3 | ... | 50 | 100.0 | 100.0 | 2.0 | 6.0 | 2,3,4,5,6,7 | 6 | 22 | 11.647 | 2.701 |
4 | R_2VQr7rPu2yI8TnK | CollectAction | 71.75 | 67.000000 | 1.0 | 2 | PolicySocialM | 86 | 65 | 66 | ... | 34 | 81.0 | 73.0 | 4.0 | 6.0 | 1,2,3,4,5,6,7 | 10 | 76 | 26.658 | 398.695 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
216 | R_SCbUzWDoIpIodH3 | Control | 22.00 | 28.333333 | NaN | 8 | PolicySocialM | 17 | 31 | 16 | ... | 66 | 72.0 | 72.0 | 4.0 | 5.0 | 1,2,3,4,5,6,7 | 7 | 70 | 11.223 | 195.065 |
217 | R_27TYhr5VpeS4ejh | Control | 92.75 | 68.000000 | 0.0 | 8 | PolicySocialM | 94 | 87 | 100 | ... | 56 | 65.0 | 65.0 | 3.0 | 4.0 | 1,2,3,4,5,6,7 | 6 | 85 | 21.956 | 398.400 |
218 | R_ZC41XczQH7OQwUh | SystemJust | 98.50 | 81.333333 | 1.0 | 0 | PolicySocialM | 100 | 99 | 97 | ... | 43 | 50.0 | 52.0 | 3.0 | 8.0 | 1,2,3,4,5,6,7 | 9 | 80 | 15.358 | 124.334 |
219 | R_3fPjJLW85l37Mqb | PluralIgnorance | 100.00 | 80.000000 | 0.0 | 8 | PolicySocialM | 100 | 100 | 100 | ... | 71 | 40.0 | 53.0 | 4.0 | 4.0 | 1,2,3,4,5,6,7 | 6 | 100 | 15.303 | 47.831 |
220 | R_23UgeVaaC1npjt2 | BindingMoral | 94.25 | 66.714286 | NaN | 8 | PolicySocialM | 77 | 100 | 100 | ... | 71 | 51.0 | 53.0 | 3.0 | 4.0 | 1,2,3,5,6,7 | 5 | 20 | 7.066 | 11.945 |
221 rows × 51 columns
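A sketch of the subset argument on a made-up dataframe (`toy_df`); which column to check is an arbitrary choice for this example:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the course dataframe, with some missing values
toy_df = pd.DataFrame({
    "Age": [40, 50, 36],
    "Income": [1.0, np.nan, 5.0],
    "SHAREcc": [0.0, 0.0, np.nan],
})

# Only rows with a missing Income are dropped; missing SHAREcc values are kept
has_income = toy_df.dropna(subset=["Income"])
print(has_income)
```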
To drop rows based on a condition, for example drop all participants whose age is less than 20:
| ResponseId | condName | BELIEFcc | POLICYcc | SHAREcc | WEPTcc | Intervention_order | Belief1 | Belief2 | Belief3 | ... | Age | Politics2_1 | Politics2_9 | Edu | Income | Indirect_SES | MacArthur_SES | PerceivedSciConsensu_1 | Intro_Timer | condition_time_total |
0 | R_1d6rdZRmlD02sFi | FutureSelfCont | 100.00 | 100.000000 | 0.0 | 8 | PolicySocialM | 100 | 100 | 100 | ... | 40 | 100.0 | NaN | 2.0 | 1.0 | 2,3,4,6,7 | 7 | 81 | 25.566 | 1043.866 |
1 | R_1CjFxfgjU1coLqp | Control | 100.00 | 100.000000 | 0.0 | 1 | PolicySocialM | 100 | 100 | 100 | ... | 50 | 3.0 | 5.0 | 4.0 | NaN | 1,3,4,5,6,7 | 9 | 96 | 16.697 | 367.657 |
2 | R_qxty9a2HTTEq7Xb | Control | 30.25 | 66.444444 | 0.0 | 8 | PolicySocialM | 3 | 78 | 3 | ... | 36 | 48.0 | 49.0 | 3.0 | 5.0 | 2,3,4,5,6,7 | 6 | 76 | 24.055 | 79.902 |
3 | R_1ONRMXgQ310zjNm | BindingMoral | 4.50 | 16.000000 | 0.0 | 8 | PolicySocialM | 6 | 5 | 3 | ... | 50 | 100.0 | 100.0 | 2.0 | 6.0 | 2,3,4,5,6,7 | 6 | 22 | 11.647 | 2.701 |
4 | R_2VQr7rPu2yI8TnK | CollectAction | 71.75 | 67.000000 | 1.0 | 2 | PolicySocialM | 86 | 65 | 66 | ... | 34 | 81.0 | 73.0 | 4.0 | 6.0 | 1,2,3,4,5,6,7 | 10 | 76 | 26.658 | 398.695 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
216 | R_SCbUzWDoIpIodH3 | Control | 22.00 | 28.333333 | NaN | 8 | PolicySocialM | 17 | 31 | 16 | ... | 66 | 72.0 | 72.0 | 4.0 | 5.0 | 1,2,3,4,5,6,7 | 7 | 70 | 11.223 | 195.065 |
217 | R_27TYhr5VpeS4ejh | Control | 92.75 | 68.000000 | 0.0 | 8 | PolicySocialM | 94 | 87 | 100 | ... | 56 | 65.0 | 65.0 | 3.0 | 4.0 | 1,2,3,4,5,6,7 | 6 | 85 | 21.956 | 398.400 |
218 | R_ZC41XczQH7OQwUh | SystemJust | 98.50 | 81.333333 | 1.0 | 0 | PolicySocialM | 100 | 99 | 97 | ... | 43 | 50.0 | 52.0 | 3.0 | 8.0 | 1,2,3,4,5,6,7 | 9 | 80 | 15.358 | 124.334 |
219 | R_3fPjJLW85l37Mqb | PluralIgnorance | 100.00 | 80.000000 | 0.0 | 8 | PolicySocialM | 100 | 100 | 100 | ... | 71 | 40.0 | 53.0 | 4.0 | 4.0 | 1,2,3,4,5,6,7 | 6 | 100 | 15.303 | 47.831 |
220 | R_23UgeVaaC1npjt2 | BindingMoral | 94.25 | 66.714286 | NaN | 8 | PolicySocialM | 77 | 100 | 100 | ... | 71 | 51.0 | 53.0 | 3.0 | 4.0 | 1,2,3,5,6,7 | 5 | 20 | 7.066 | 11.945 |
217 rows × 51 columns
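Two common ways to drop rows by condition are sketched below on a made-up dataframe (`toy_df`). Either may differ from the code used to produce the output above.

```python
import pandas as pd

# Made-up stand-in for the course dataframe
toy_df = pd.DataFrame({
    "ResponseId": ["P1", "P2", "P3", "P4"],
    "Age": [40, 19, 36, 18],
})

# Option 1: keep the rows where the condition "Age is at least 20" is True
adults = toy_df[toy_df["Age"] >= 20]

# Option 2: look up the index labels of the unwanted rows and drop them
adults_alt = toy_df.drop(toy_df[toy_df["Age"] < 20].index)

print(adults)
```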
To drop everything except the variables you name, use double brackets:
df[['BELIEFcc', 'POLICYcc']]
| BELIEFcc | POLICYcc |
0 | 100.00 | 100.000000 |
1 | 100.00 | 100.000000 |
2 | 30.25 | 66.444444 |
3 | 4.50 | 16.000000 |
4 | 71.75 | 67.000000 |
... | ... | ... |
216 | 22.00 | 28.333333 |
217 | 92.75 | 68.000000 |
218 | 98.50 | 81.333333 |
219 | 100.00 | 80.000000 |
220 | 94.25 | 66.714286 |
221 rows × 2 columns
Adding columns to a dataframe
We can assign a new column that is the sum of two other columns like this:
df['sum'] = df['Belief1'] + df['Belief2']
| ResponseId | condName | BELIEFcc | POLICYcc | SHAREcc | WEPTcc | Intervention_order | Belief1 | Belief2 | Belief3 | ... | Politics2_1 | Politics2_9 | Edu | Income | Indirect_SES | MacArthur_SES | PerceivedSciConsensu_1 | Intro_Timer | condition_time_total | sum |
0 | R_1d6rdZRmlD02sFi | FutureSelfCont | 100.00 | 100.000000 | 0.0 | 8 | PolicySocialM | 100 | 100 | 100 | ... | 100.0 | NaN | 2.0 | 1.0 | 2,3,4,6,7 | 7 | 81 | 25.566 | 1043.866 | 200 |
1 | R_1CjFxfgjU1coLqp | Control | 100.00 | 100.000000 | 0.0 | 1 | PolicySocialM | 100 | 100 | 100 | ... | 3.0 | 5.0 | 4.0 | NaN | 1,3,4,5,6,7 | 9 | 96 | 16.697 | 367.657 | 200 |
2 | R_qxty9a2HTTEq7Xb | Control | 30.25 | 66.444444 | 0.0 | 8 | PolicySocialM | 3 | 78 | 3 | ... | 48.0 | 49.0 | 3.0 | 5.0 | 2,3,4,5,6,7 | 6 | 76 | 24.055 | 79.902 | 81 |
3 | R_1ONRMXgQ310zjNm | BindingMoral | 4.50 | 16.000000 | 0.0 | 8 | PolicySocialM | 6 | 5 | 3 | ... | 100.0 | 100.0 | 2.0 | 6.0 | 2,3,4,5,6,7 | 6 | 22 | 11.647 | 2.701 | 11 |
4 | R_2VQr7rPu2yI8TnK | CollectAction | 71.75 | 67.000000 | 1.0 | 2 | PolicySocialM | 86 | 65 | 66 | ... | 81.0 | 73.0 | 4.0 | 6.0 | 1,2,3,4,5,6,7 | 10 | 76 | 26.658 | 398.695 | 151 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
216 | R_SCbUzWDoIpIodH3 | Control | 22.00 | 28.333333 | NaN | 8 | PolicySocialM | 17 | 31 | 16 | ... | 72.0 | 72.0 | 4.0 | 5.0 | 1,2,3,4,5,6,7 | 7 | 70 | 11.223 | 195.065 | 48 |
217 | R_27TYhr5VpeS4ejh | Control | 92.75 | 68.000000 | 0.0 | 8 | PolicySocialM | 94 | 87 | 100 | ... | 65.0 | 65.0 | 3.0 | 4.0 | 1,2,3,4,5,6,7 | 6 | 85 | 21.956 | 398.400 | 181 |
218 | R_ZC41XczQH7OQwUh | SystemJust | 98.50 | 81.333333 | 1.0 | 0 | PolicySocialM | 100 | 99 | 97 | ... | 50.0 | 52.0 | 3.0 | 8.0 | 1,2,3,4,5,6,7 | 9 | 80 | 15.358 | 124.334 | 199 |
219 | R_3fPjJLW85l37Mqb | PluralIgnorance | 100.00 | 80.000000 | 0.0 | 8 | PolicySocialM | 100 | 100 | 100 | ... | 40.0 | 53.0 | 4.0 | 4.0 | 1,2,3,4,5,6,7 | 6 | 100 | 15.303 | 47.831 | 200 |
220 | R_23UgeVaaC1npjt2 | BindingMoral | 94.25 | 66.714286 | NaN | 8 | PolicySocialM | 77 | 100 | 100 | ... | 51.0 | 53.0 | 3.0 | 4.0 | 1,2,3,5,6,7 | 5 | 20 | 7.066 | 11.945 | 177 |
221 rows × 52 columns
We can also define new columns to be a constant value:
df['constant'] = 1
| ResponseId | condName | BELIEFcc | POLICYcc | SHAREcc | WEPTcc | Intervention_order | Belief1 | Belief2 | Belief3 | ... | Politics2_9 | Edu | Income | Indirect_SES | MacArthur_SES | PerceivedSciConsensu_1 | Intro_Timer | condition_time_total | sum | constant |
0 | R_1d6rdZRmlD02sFi | FutureSelfCont | 100.00 | 100.000000 | 0.0 | 8 | PolicySocialM | 100 | 100 | 100 | ... | NaN | 2.0 | 1.0 | 2,3,4,6,7 | 7 | 81 | 25.566 | 1043.866 | 200 | 1 |
1 | R_1CjFxfgjU1coLqp | Control | 100.00 | 100.000000 | 0.0 | 1 | PolicySocialM | 100 | 100 | 100 | ... | 5.0 | 4.0 | NaN | 1,3,4,5,6,7 | 9 | 96 | 16.697 | 367.657 | 200 | 1 |
2 | R_qxty9a2HTTEq7Xb | Control | 30.25 | 66.444444 | 0.0 | 8 | PolicySocialM | 3 | 78 | 3 | ... | 49.0 | 3.0 | 5.0 | 2,3,4,5,6,7 | 6 | 76 | 24.055 | 79.902 | 81 | 1 |
3 | R_1ONRMXgQ310zjNm | BindingMoral | 4.50 | 16.000000 | 0.0 | 8 | PolicySocialM | 6 | 5 | 3 | ... | 100.0 | 2.0 | 6.0 | 2,3,4,5,6,7 | 6 | 22 | 11.647 | 2.701 | 11 | 1 |
4 | R_2VQr7rPu2yI8TnK | CollectAction | 71.75 | 67.000000 | 1.0 | 2 | PolicySocialM | 86 | 65 | 66 | ... | 73.0 | 4.0 | 6.0 | 1,2,3,4,5,6,7 | 10 | 76 | 26.658 | 398.695 | 151 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
216 | R_SCbUzWDoIpIodH3 | Control | 22.00 | 28.333333 | NaN | 8 | PolicySocialM | 17 | 31 | 16 | ... | 72.0 | 4.0 | 5.0 | 1,2,3,4,5,6,7 | 7 | 70 | 11.223 | 195.065 | 48 | 1 |
217 | R_27TYhr5VpeS4ejh | Control | 92.75 | 68.000000 | 0.0 | 8 | PolicySocialM | 94 | 87 | 100 | ... | 65.0 | 3.0 | 4.0 | 1,2,3,4,5,6,7 | 6 | 85 | 21.956 | 398.400 | 181 | 1 |
218 | R_ZC41XczQH7OQwUh | SystemJust | 98.50 | 81.333333 | 1.0 | 0 | PolicySocialM | 100 | 99 | 97 | ... | 52.0 | 3.0 | 8.0 | 1,2,3,4,5,6,7 | 9 | 80 | 15.358 | 124.334 | 199 | 1 |
219 | R_3fPjJLW85l37Mqb | PluralIgnorance | 100.00 | 80.000000 | 0.0 | 8 | PolicySocialM | 100 | 100 | 100 | ... | 53.0 | 4.0 | 4.0 | 1,2,3,4,5,6,7 | 6 | 100 | 15.303 | 47.831 | 200 | 1 |
220 | R_23UgeVaaC1npjt2 | BindingMoral | 94.25 | 66.714286 | NaN | 8 | PolicySocialM | 77 | 100 | 100 | ... | 53.0 | 3.0 | 4.0 | 1,2,3,5,6,7 | 5 | 20 | 7.066 | 11.945 | 177 | 1 |
221 rows × 53 columns
Other useful functions
Checking the size/dimensions of your data:
(221, 53)
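The dimensions are stored in the .shape attribute, which is a (rows, columns) tuple. A minimal sketch on a made-up dataframe (`toy_df`):

```python
import pandas as pd

# Made-up stand-in for the course dataframe
toy_df = pd.DataFrame({
    "ResponseId": ["P1", "P2", "P3"],
    "Age": [40, 50, 36],
})

n_rows, n_cols = toy_df.shape  # shape is an attribute, so no parentheses
print(n_rows, n_cols)
```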
Sort a dataset by specific columns
| ResponseId | condName | BELIEFcc | POLICYcc | SHAREcc | WEPTcc | Intervention_order | Belief1 | Belief2 | Belief3 | ... | Politics2_9 | Edu | Income | Indirect_SES | MacArthur_SES | PerceivedSciConsensu_1 | Intro_Timer | condition_time_total | sum | constant |
145 | R_3EKCcuDCNrAz78n | CollectAction | 68.25 | 42.111111 | 0.0 | 0 | PolicySocialM | 55 | 80 | 48 | ... | 18.0 | 2.0 | 5.0 | 2,3,4,6,7 | 5 | 51 | 22.828 | 188.043 | 135 | 1 |
70 | R_31aRJ2zm2MzORFM | SystemJust | 80.00 | 53.888889 | 1.0 | 1 | PolicySocialM | 96 | 91 | 47 | ... | 48.0 | 2.0 | 1.0 | 2,6,7 | 5 | 40 | 5.105 | 52.715 | 187 | 1 |
113 | R_1lrROkaaJczHOmc | PsychDistance | 100.00 | 76.333333 | 0.0 | 1 | PolicySocialM | 100 | 100 | 100 | ... | 30.0 | 2.0 | 6.0 | 1,2,3,4,5,6,7 | 7 | 85 | 9.327 | 223.541 | 200 | 1 |
17 | R_AjQsPrpTA07lwbf | Control | 82.25 | 84.142857 | NaN | 8 | PolicySocialM | 95 | 74 | 72 | ... | 0.0 | 2.0 | 5.0 | 1,2,3,4,5,6,7 | 6 | 70 | 23.331 | 514.694 | 169 | 1 |
91 | R_3F3QEN3yvHxVljh | Control | 67.75 | 77.111111 | 1.0 | 1 | PolicySocialM | 41 | 73 | 84 | ... | 85.0 | 3.0 | 5.0 | 1,2,3,4,5,6,7 | 8 | 87 | 8.009 | 59.347 | 114 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
13 | R_eW20eHgDtQxwsEh | BindingMoral | 84.50 | 42.777778 | NaN | 0 | PolicySocialM | 100 | 100 | 78 | ... | 72.0 | 3.0 | 4.0 | 1,2,3,4,5,6,7 | 8 | 86 | 9.802 | 0.000 | 200 | 1 |
198 | R_pQsw8ppz2eYJtTz | SystemJust | 100.00 | 89.777778 | NaN | 8 | PolicySocialM | 100 | 100 | 100 | ... | 51.0 | 2.0 | 5.0 | 1,2,3,4,5,6,7 | 6 | 78 | 29.647 | 275.358 | 200 | 1 |
99 | R_3DxMLviGts0eYCd | BindingMoral | 92.50 | 86.333333 | 1.0 | 4 | PolicySocialM | 82 | 94 | 98 | ... | 4.0 | 2.0 | 3.0 | 1,2,3,5,6,7 | 5 | 75 | 8.900 | 0.000 | 176 | 1 |
20 | R_3HLVjLXLaVSQfLT | DynamicNorm | 100.00 | 100.000000 | 1.0 | 8 | PolicySocialM | 100 | 100 | 100 | ... | 50.0 | 3.0 | 5.0 | 1,2,3,4,5,6,7 | 6 | 70 | 10.365 | 58.411 | 200 | 1 |
178 | R_86RsRnqYjy1pGcF | CollectAction | 62.50 | 71.875000 | 1.0 | 0 | PolicySocialM | 10 | 75 | 65 | ... | 49.0 | 3.0 | NaN | 1,2,3,4,6,7 | 5 | 30 | 12.587 | 126.884 | 85 | 1 |
221 rows × 53 columns
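Sorting is typically done with sort_values(). The sketch below uses a made-up dataframe (`toy_df`); the choice of sort columns is an assumption and may differ from the one used for the output above.

```python
import pandas as pd

# Made-up stand-in for the course dataframe
toy_df = pd.DataFrame({
    "ResponseId": ["P1", "P2", "P3"],
    "condName": ["Control", "SystemJust", "Control"],
    "Age": [40, 50, 36],
})

by_age = toy_df.sort_values("Age")  # ascending by default

# Sort by condition (A to Z), then by age (oldest first) within each condition
by_cond_then_age = toy_df.sort_values(["condName", "Age"], ascending=[True, False])

print(by_age)
print(by_cond_then_age)
```

Like .drop(), sort_values() returns a new dataframe and keeps the original row labels, which is why the index appears shuffled after sorting.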
The keys() function shows you the names of the variables in the dataframe
Index(['ResponseId', 'condName', 'BELIEFcc', 'POLICYcc', 'SHAREcc', 'WEPTcc',
'Intervention_order', 'Belief1', 'Belief2', 'Belief3', 'Belief4',
'Policy1', 'Policy2', 'Policy3', 'Policy4', 'Policy5', 'Policy6',
'Policy7', 'Policy8', 'Policy9', 'Trust_sci1_1', 'Trust_sci2_1',
'Trust_gov_1', 'ID_hum_1', 'ID_GC_1', 'Enviro_ID_1', 'Enviro_ID_2',
'Enviro_ID_3', 'Enviro_ID_4', 'Enviro_motiv_1', 'Enviro_motiv_11',
'Enviro_motiv_12', 'Enviro_motiv_13', 'Enviro_motiv_14',
'Enviro_motiv_15', 'Enviro_motiv_16', 'Enviro_motiv_17',
'Enviro_motiv_18', 'Enviro_motiv_20', 'PlurIgnoranceItem_1', 'Gender',
'Age', 'Politics2_1', 'Politics2_9', 'Edu', 'Income', 'Indirect_SES',
'MacArthur_SES', 'PerceivedSciConsensu_1', 'Intro_Timer',
'condition_time_total', 'sum', 'constant'],
The unique() function shows you the unique values in a variable
array([40, 50, 36, 34, 31, 58, 27, 35, 60, 65, 74, 61, 68, 19, 38, 55, 56,
52, 42, 25, 26, 45, 62, 29, 41, 67, 30, 47, 66, 72, 70, 73, 48, 23,
44, 28, 71, 57, 63, 18, 33, 22, 21, 69, 32, 51, 59, 43, 37, 64, 53,
46, 54, 24, 49], dtype=int64)
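Both functions can be sketched on a made-up dataframe (`toy_df`); the values are illustrative only.

```python
import pandas as pd

# Made-up stand-in for the course dataframe
toy_df = pd.DataFrame({
    "ResponseId": ["P1", "P2", "P3", "P4"],
    "Age": [40, 50, 40, 36],
})

column_names = list(toy_df.keys())    # same labels as toy_df.columns
unique_ages = toy_df["Age"].unique()  # distinct values, in order of first appearance

print(column_names)
print(unique_ages)
```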
Selecting rows from the dataframe:
grabbing subsets of a dataframe’s rows based on the values in some of its columns
different from slicing, which takes chunks out of a larger dataframe using indexes or column names.
Here we are interested in selecting rows that meet a particular criterion
For example, select only the women participants
Or select only participants under the age of 35:
df.query('Age<35 & BELIEFcc>50 & Edu >3')
| ResponseId | condName | BELIEFcc | POLICYcc | SHAREcc | WEPTcc | Intervention_order | Belief1 | Belief2 | Belief3 | ... | Politics2_9 | Edu | Income | Indirect_SES | MacArthur_SES | PerceivedSciConsensu_1 | Intro_Timer | condition_time_total | sum | constant |
4 | R_2VQr7rPu2yI8TnK | CollectAction | 71.75 | 67.000000 | 1.0 | 2 | PolicySocialM | 86 | 65 | 66 | ... | 73.0 | 4.0 | 6.0 | 1,2,3,4,5,6,7 | 10 | 76 | 26.658 | 398.695 | 151 | 1 |
6 | R_OIo2Xe8idzzVIiJ | PluralIgnorance | 73.00 | 67.555556 | 1.0 | 8 | PolicySocialM | 80 | 64 | 90 | ... | 62.0 | 4.0 | 6.0 | 1,2,3,4,5,6,7 | 10 | 86 | 57.275 | 109.287 | 144 | 1 |
38 | R_1N4GsDkFgtUPE6l | Control | 69.75 | 50.555556 | 0.0 | 0 | PolicySocialM | 88 | 63 | 72 | ... | 53.0 | 4.0 | 4.0 | 1,2,3,4,5,6,7 | 6 | 63 | 7.995 | 114.688 | 151 | 1 |
45 | R_3Gs05E56hJbOtm6 | CollectAction | 100.00 | 97.666667 | 1.0 | 8 | PolicySocialM | 100 | 100 | 100 | ... | 83.0 | 4.0 | 6.0 | 1,2,3,4,5,6,7 | 8 | 100 | 15.206 | 338.225 | 200 | 1 |
159 | R_2E9Rcfb5J2oloDG | SciConsens | 82.50 | 83.000000 | 1.0 | 1 | PolicySocialM | 85 | 81 | 78 | ... | 76.0 | 4.0 | 7.0 | 2,3,5,7 | 10 | 92 | 7.291 | 8.098 | 166 | 1 |
215 | R_2SHtImaaJp36Aie | SciConsens | 51.25 | 53.444444 | 1.0 | 1 | PolicySocialM | 62 | 51 | 47 | ... | 59.0 | 4.0 | 8.0 | 1,2,3,4,5,6,7 | 9 | 50 | 5.001 | 5.959 | 113 | 1 |
6 rows × 53 columns
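Row selection can be written with a boolean mask or with query(). The sketch below uses a made-up dataframe (`toy_df`). The Gender coding is an assumption: the value 2 is only a placeholder, so check the codebook of the real dataset before filtering on it.

```python
import pandas as pd

# Made-up stand-in for the course dataframe
toy_df = pd.DataFrame({
    "ResponseId": ["P1", "P2", "P3", "P4"],
    "Gender": [1, 2, 2, 1],  # ASSUMPTION: placeholder codes, not the real coding
    "Age": [40, 28, 33, 19],
    "BELIEFcc": [90.0, 75.0, 40.0, 80.0],
    "Edu": [4, 4, 4, 2],
})

gender_2 = toy_df[toy_df["Gender"] == 2]  # boolean mask on one column
under_35 = toy_df[toy_df["Age"] < 35]     # same idea with a numeric comparison
under_35_query = toy_df.query("Age < 35") # equivalent selection written as a query string

# Several conditions combined with & (all must hold)
combined = toy_df.query("Age < 35 & BELIEFcc > 50 & Edu > 3")

print(combined)
```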
Select some values of a var that meet certain requirements based on other vars
Select the rows for which age is less than 35, belief is more than 50, and education is more than 3
Then only grab the policy support values
Then compute the mean
All in one line!
df.query('Age<35 & BELIEFcc>50 & Edu >3')['POLICYcc'].mean()
What is the mean of each variable for each condition?
Use the groupby() function
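A minimal groupby() sketch on a made-up dataframe (`toy_df`). Passing numeric_only=True is a choice made here so that text columns such as ResponseId are skipped when computing means.

```python
import pandas as pd

# Made-up stand-in for the course dataframe
toy_df = pd.DataFrame({
    "ResponseId": ["P1", "P2", "P3", "P4"],
    "condName": ["Control", "Control", "SystemJust", "SystemJust"],
    "BELIEFcc": [100.0, 30.0, 98.0, 50.0],
    "POLICYcc": [100.0, 60.0, 80.0, 40.0],
})

# One row per condition, one column per numeric variable
condition_means = toy_df.groupby("condName").mean(numeric_only=True)
print(condition_means)
```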
Do a median split according to one of the variables in a new var
Create a new column called ‘young’ in your data frame df
Populate the young column with 1s for responses higher than the median of age (median split by age) and with 0s for responses at or below the median of age. Note that, as coded, a 1 in ‘young’ marks the older half of the sample.
df['young'] = (df['Age'] > df['Age'].median()).astype(float)
Replace values within variables
df['SHAREcc'] = df['SHAREcc'].replace([0], 'No')
Data manipulation
Data organization and structure:
Wide format == “not tidy”; how data is exported from Qualtrics
In wide format, each participant occupies one row in the dataframe, and their entire data (for each variable) is contained in that row:
Long format == “tidy format”; how analysis software expects data
In long format, each observation occupies one row in the dataframe, so a participant’s data now spans many rows:
Rules for the transformation:
Each observation must have its own row (observations could be each person, each timepoint, etc.)
Each variable must have its own column (variables are some kind of measurement: gender, age, score, etc.)
Each value must have its own cell (values are the actual measurements: female, 23 years, 12 points, etc.)
Reshape data from wide to long format using melt():
First argument is the data frame: here we give it all the rows and only the columns listed
Second argument is the columns you want to “bring along” or duplicate (Whatever you don’t bring along will become expanded).
That’s it. The rest of the arguments are optional, but nice:
var_name is the column name of the expanded variable
value_name is the column name of the expanded variable’s values
Let’s try to turn the 4 beliefs each participant rated, from wide format (in the data file) to long format (in a new dataframe).
df_long = pd.melt(
df.loc[:, ['ResponseId', 'condName', 'Income', 'Belief1', 'Belief2', 'Belief3', 'Belief4']],
id_vars=['ResponseId', 'condName', 'Income'],
)
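A self-contained sketch of melt() on a made-up wide dataframe, including the optional var_name and value_name arguments. The names 'Item' and 'Rating' are hypothetical choices for this example, not taken from the course code.

```python
import pandas as pd

# Made-up wide-format data: one row per participant
wide = pd.DataFrame({
    "ResponseId": ["P1", "P2"],
    "condName": ["Control", "SystemJust"],
    "Belief1": [100, 3],
    "Belief2": [90, 78],
})

long = pd.melt(
    wide,
    id_vars=["ResponseId", "condName"],  # columns to bring along (duplicated)
    var_name="Item",                     # hypothetical name for the expanded variable
    value_name="Rating",                 # hypothetical name for its values
)
print(long)
```

Each participant now spans two rows, one per belief item.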
Reshape data from wide to long format using wide_to_long():
Compared to melt(), wide_to_long() can expand more corresponding variables
First argument is also the data frame
Second argument “stubnames” is now the list of column-name prefixes (stubs) you want expanded, e.g., 'Belief' for Belief1, Belief2, and so on
Third argument “i” is the list of columns you want to bring along (duplicate)
Fourth argument, “j” is the column name of the expanded variable
df_long_again = pd.wide_to_long(df.reset_index(), stubnames=['Belief', 'Policy'], i=['ResponseId', 'Income', 'Age'], j='Item').reset_index()
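The same idea on a made-up wide dataframe, small enough to inspect by eye. The data and the single identifier column are assumptions made for this sketch.

```python
import pandas as pd

# Made-up wide-format data with two groups of numbered columns
wide = pd.DataFrame({
    "ResponseId": ["P1", "P2"],
    "Belief1": [100, 3],
    "Belief2": [90, 78],
    "Policy1": [80, 10],
    "Policy2": [70, 20],
})

long = pd.wide_to_long(
    wide,
    stubnames=["Belief", "Policy"],  # prefixes of the columns to expand
    i="ResponseId",                  # identifier column(s) to bring along
    j="Item",                        # new column holding the numeric suffix
).reset_index()
print(long)
```

Unlike melt(), this keeps Belief and Policy as two separate value columns, with one row per participant and item number.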
Save the reshaped dataset. Dataframes can be saved with DataFrame.to_csv() or DataFrame.to_excel(). If you’re doing your analysis in Python, CSVs are much easier to manage.
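A minimal sketch of saving and reloading a dataframe as CSV. The filename and the temporary folder are hypothetical; in Colab you would normally write to a path in your Drive or download the file.

```python
import os
import tempfile

import pandas as pd

# Made-up long-format data standing in for the reshaped dataset
long = pd.DataFrame({
    "ResponseId": ["P1", "P1", "P2", "P2"],
    "Item": [1, 2, 1, 2],
    "Belief": [100, 90, 3, 78],
})

out_path = os.path.join(tempfile.mkdtemp(), "beliefs_long.csv")  # hypothetical filename
long.to_csv(out_path, index=False)  # index=False leaves out the row-number column

reloaded = pd.read_csv(out_path)
print(reloaded)
```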