The purpose of this guide is to act as a tutorial and lookup guide for students using jamovi software. The order of the steps in this guide is intentional - if using this guide as a tutorial, complete the steps in order.
If you want to follow this guide exactly as it is here, you will need the following accompanying files:
example_dataset.csv
my_codebook.xlsx
The example survey that was used to get this data is shown in the Appendix.
In this section, we will create a Project Folder, install jamovi, install some extra features we’ll need (“modules”), save the project, and import our data.
Create a Project Folder
1.1. Make a new folder somewhere we can find easily. Name it something like “Capstone Data Analysis”. This is where our project will live.
Save our data and codebook into our Project Folder.
Install jamovi software
3.1. Go to: <https://www.jamovi.org> and click Download
3.2. Install as we would any other software. If we are unable to install the software, we can contact the SLU IT Helpdesk by phone at 646-313-8440 or by email at ithelpDesk@slu.cuny.edu. They are responsive, friendly, and very helpful with these kinds of tasks.
Open jamovi
4.1. A fresh session should look like this:
Save our jamovi Project In our Project Folder
5.1. Click on “File” (for mac users, File will be three horizontal lines).
5.2. Click “Save As”.
5.3. Use “Browse” if we need to, to find our Project Folder.
5.4. Save our project into the Project Folder.
Import our Data To jamovi
6.1. Click on “File” (for mac users, File will be three horizontal lines, as in the photos above).
6.2. Click on “Import”, find our Project Folder, then select our dataset (see image below for reference). In this case, the dataset is called example_dataset.csv.
Figure 1: Saving and importing data. Don’t forget to save often while we are working!
Install the scatr module
7.1. Make sure we are on the “Analyses” ribbon (as in the photo above).
7.2. Click on “Modules” (see photo above), then “jamovi library”
7.3. A pop-up window should come up. Scroll until we find “scatr”. Click “INSTALL”.
7.4. When we are finished, the scatr module should be visible when we click the Exploration button.
Figure 2: This shows where modules will be in jamovi after we install them.
Before we start our analyses, we want to inspect our data and create a list of tasks that need to be finished before we can analyze our data. There is a potentially endless list of tasks that could arise, but this guide will cover about 70% of the tasks required by the average social science project:
Cleaning messy data
Recoding variables
Filtering rows
Computing new variables from old ones
We will want something to write or type in. Preferably, it will be something digital - like a text file, Word doc, Excel spreadsheet, etc. - that we can save in our Project Folder.
If we do not already have a codebook, we will want to create one. At the start of our project, we will fill out the codebook to reflect our expectations - for example, what “should” the minimum value be based on what we know about this variable? If we expect that the minimum should be 0, but we find negative numbers, that would indicate a problem.
Also, as we clean our data, we may need to record certain things in our codebook. For example, any values that we clean or recode need to be recorded in the Recode column, and any new variables we create will need to be added to the codebook.
Below is an example codebook - have this open as you work through the data cleaning process:
Figure 3: An example codebook.
One common to-do is to clean messy data. But how do we know if data are messy? And what does “messy” even mean?
“Messy” means that the values do not make sense for that variable in some way. Examples include:
Impossible or nonsensical values, like someone entering “cat” as their income.
Values that have weird characters in them, like someone entering “21k” as their income instead of 21000.
Here are three methods we can use in jamovi to find messy values - for example, the values “21k” and “$45,998”:
Method 1: Use The Spreadsheet View To Inspect Values.
1.1. This is the most straightforward way to find messy data, but also the most tedious way when our data has a lot of rows.
1.2. Just scroll through the rows of the variable and look for values that don’t belong.
Method 2: View the Levels of Each Variable.
2.1. Click on the variable name.
2.2. Click the “Data” ribbon, followed by “Setup”.
2.3. Look at where it says “Levels”. That list has every unique value in that variable, including “$45,998” and “21k”. They are “messy” because they should be in a numeric format with no non-numeric characters (like “$”, “,”, or “k”).
Method 3: Create Frequency Tables and Inspect the Values.
3.1. Click the “Analyses” ribbon, followed by the “Exploration” button, followed by “Descriptives”.
3.2. Move the variable to the “Variables” box by double-clicking, clicking-and-dragging, or pressing the right-facing arrow.
3.3. Check the “Frequency tables” box. The Frequency table will show up in the Results Pane.
3.4. One strength of this method is that we can see how many times messy values show up in the variable. The more times a messy value shows up, the more times we will have to repeat ourselves when cleaning it. This might cause us to prefer using, e.g., a transformation to clean the messy values over editing individual cells; both are covered later.
Regardless of method: record each instance of messy data in the to-do list.
4.1. Make a note about what needs to be done about it. For example: “change ‘$45,998’ to ‘45998’ and ‘21k’ to ‘21000’”.
Figure 4: Three ways of finding messy data: scrolling through the spreadsheet pane, checking the variable levels, or looking at a frequency table.
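For readers who like to see the logic spelled out, the frequency-table idea (Method 3) can be sketched outside jamovi in a few lines of Python. This is only an illustration, not something jamovi requires: the list of income values is made up, and the helper `is_clean_number` is a hypothetical name chosen for this sketch.

```python
from collections import Counter

# Hypothetical raw income responses (assumption for illustration).
# "21k", "$45,998", and "cat" are the kinds of messy values described in the text.
income = ["52000", "21k", "38000", "$45,998", "52000", "cat"]


def is_clean_number(value: str) -> bool:
    """Return True when the value contains only digits (no "$", ",", or letters)."""
    return value.isdigit()


# Frequency table: how many times each distinct value occurs.
frequencies = Counter(income)

# Keep only the values that are not plain numbers, together with their counts.
messy = {
    value: count
    for value, count in frequencies.items()
    if not is_clean_number(value)
}

# Each entry printed here is something to record in the to-do list.
for value, count in messy.items():
    print(f"{value!r} appears {count} time(s)")
```

The counts matter for the same reason given in step 3.4: a messy value that appears many times is a better candidate for a rule-based fix than for cell-by-cell editing.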
Rename variables to shorter names.
Determine whether we have to filter certain cases.
Recode variables.
For instance, a variable may have the values “strongly agree”, “somewhat agree”, and so on, but we want it expressed numerically as “strongly agree” = 5, “somewhat agree” = 4, etc.
Another example could be that we need to reverse-score a variable.
Set our levels of measurement.
Computing new variables from old ones.
Any other task we need to do that we feel should be recorded in the to-do list.
Shorter variable names are much less of a headache than long ones (you’ll see why later). For a lot of surveys, the default names will be the entire wording of the question, or at least this is the case for our sample survey. Here is how we would rename the variables:
Open the Codebook.
Give each variable a short name:
2.1. Click the top of the column.
2.2. Then click the “Data” ribbon.
2.3. Then click “Setup”.
2.4. Under where it says, “DATA VARIABLE” (highlighted in the image below), we can type in the new, shorter name.
2.5. The new name updates when we click anywhere or press the enter (or return) key.
Record the new names in the codebook.
Figure 5: How to rename our variables. Shorter names are easier to work with.
Say that we inspected the income variable and have in our to-do list that we need to change “$45,998” to “45998” and “21k” to “21000.” Because these are non-repeating kinds of messiness, we will have to fix them directly using the Spreadsheet Pane. Follow these steps:
Create a copy of our variable
1.1. Name it something like “[variable name]_recoded” to distinguish the new version of the variable from the old one.
1.2. The purpose of the copy is that we are keeping a built-in record of what fixes were made within jamovi.
Replace the values of individual cells in the Spreadsheet Pane.
2.1. We can change values just like in an Excel spreadsheet.
2.2. Note that sometimes there is no “correct” value, like if someone entered “cat” as their income. In those cases, we can just delete the value. We will have a record in the old copy of the variable.
Figure 6: Fixing messy values using the Spreadsheet Pane.
Editing using the Spreadsheet Pane works fine in cases like the income example, because each value that we need to change is unique. Like it or not, all we can do in this situation is change each individual value.
But what if we have a bunch of values that need to be changed in a systematic way, or “recoded”? We could recode manually, but there is an easier way - the transformation button.
In this example, we need to reverse-score the SE_insecure variable such that any time there is a 5, it needs to change to a 1, and any time there is a 4, it needs to change to a 2, and so on. Here is how to do a transformation:
Click on the Data ribbon, followed by the Transform button.
1.1. This will automatically create a copy of the variable.
Give the new variable a name (e.g., oldname_recoded).
Click on the “using transform” drop-down menu, then click “Create New Transform”.
Click “Add recode condition”. A new line starting with \(f_x\) will appear - write the rules of the transformation here, adding new recode conditions as you go.
4.1. Basically, this new variable will update its values based on the rules, or “recode conditions” we give.
4.2. The rules are defined by logical operators. In this case, we want the recode condition to read as “if the value of the old variable is equal to 5, use 1 as the value of the new variable”. $source means “value of the old variable”, and == means “equal to”. Therefore, we type if $source == 5 use 1 as the recode condition. The next one will be if $source == 4 use 2, and so on until we have covered all possible values of the old variable. See List of Logical Operators and Explanation of Logical Operators for a deeper explanation of what is happening here.
Figure 7: Example transformation: reverse-scoring a variable.
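The recode conditions above amount to a lookup table: each old value is paired with the new value that replaces it. As a rough sketch of that logic (in Python rather than jamovi, with made-up responses and a 1-to-5 scale assumed), it could look like this:

```python
# Hypothetical responses to SE_insecure on a 1-5 scale (assumption for illustration).
se_insecure = [5, 4, 3, 2, 1, 4]

# One entry per recode condition, mirroring "if $source == 5 use 1",
# "if $source == 4 use 2", and so on for every possible value.
recode_rules = {5: 1, 4: 2, 3: 3, 2: 4, 1: 5}

# The original list is left untouched; the recoded values go into a new list,
# just as the transformation in jamovi creates a new variable.
se_insecure_recoded = [recode_rules[value] for value in se_insecure]

print(se_insecure_recoded)
```

On a 1-to-5 scale, every rule in this table is the same as taking 6 minus the old value, which is a quick way to double-check that no recode condition was mistyped.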
Sometimes we don’t need to actually change any values, but we do need to exclude, e.g., certain respondents’ values from our analyses. For example, if we are studying adults, we will want to exclude respondents who entered the “17 or younger” option on the survey. Here is how we would do this filter for the age variable:
Click on “Filter” under the Data ribbon.
Use logical operators in the formula line (look for \(f_x\)) to tell jamovi what the rule will be for keeping rows in our data.
2.1. In this case, we want to tell jamovi to keep all rows except for those that have “17 or younger” as their age value. In other words, we want the rule to read as “age not equal to ‘17 or younger’”, which can be achieved with the != logical operator. For more in-depth discussion of this idea, see List of Logical Operators and Explanation of Logical Operators.
Figure 8: Filtering data does not delete or remove any data. It only excludes those data from our analyses.
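The filter rule can be pictured as a test applied to every row: rows that pass the test are kept, and rows that fail are set aside. A minimal Python sketch of the same idea follows; the respondent IDs and the age categories other than “17 or younger” are invented for illustration.

```python
# Hypothetical survey rows (assumption for illustration).
rows = [
    {"id": 1, "age": "18 to 24"},
    {"id": 2, "age": "17 or younger"},
    {"id": 3, "age": "25 to 34"},
]

# Keep a row only when it satisfies the rule: age != "17 or younger".
kept = [row for row in rows if row["age"] != "17 or younger"]

# The original rows are still all there; the filter only controls
# which rows take part in the analysis.
print(len(rows), "rows in the data,", len(kept), "rows used in analyses")
```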
If you have to clean, recode, or filter data, do those before you set the levels of measurement. This is very important, because sometimes jamovi will delete messy values when the level of measurement for that variable changes.
Once we are sure we can safely change the levels of measurement, the actual act of doing it in jamovi is very simple:
Click the variable we want to change the level of measurement for.
Click the Data ribbon, followed by “Setup”.
Select the level of measurement we want from the “Measure type” drop-down list.
Figure 9: Changing levels of measurement.
Imagine we need to create a variable that is the composite of several other variables. In this example, we measured Self Efficacy using three items called SE_job, SE_failure_recoded, and SE_insecure_reversed. To get each respondent’s Self Efficacy score, we need to get the mean of those three variables for each respondent. For example: respondent 1 has a 4 for SE_job, a 3 for SE_failure_recoded, and a 4 for SE_insecure_reversed. This respondent’s Self Efficacy score should be \(\frac{(4 + 3 + 4)}{3} = 3.67\).
We don’t want to repeat that manually for each person, so jamovi does this for us using the Compute button:
Click on the Data ribbon, followed by the “Compute” button.
Give this new computed variable a name.
Type the formula into the formula box. There are two ways to do this:
3.1. Type the formula manually.
3.2. Use the formula dropbox (look for \(f_x\)) to find the formula we need, double-click the formula, then double-click each variable that we want to include in the computation. Note that we have to add commas in between each variable (we don’t need spaces, though they make the formula easier to read).
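To make the computation concrete, here is a small Python sketch of what a row-wise mean does. It is not jamovi syntax; the second respondent and the new column name `self_efficacy` are assumptions made up for this example, while the first respondent uses the values from the worked example above.

```python
from statistics import mean

# Respondent 1 matches the worked example (4, 3, 4); respondent 2 is hypothetical.
respondents = [
    {"SE_job": 4, "SE_failure_recoded": 3, "SE_insecure_reversed": 4},
    {"SE_job": 2, "SE_failure_recoded": 5, "SE_insecure_reversed": 3},
]

# The three items that make up the composite score.
items = ["SE_job", "SE_failure_recoded", "SE_insecure_reversed"]

# For each respondent (row), average the three item values and store the result
# in a new column, which is what a computed variable does.
for respondent in respondents:
    respondent["self_efficacy"] = mean(respondent[item] for item in items)

for respondent in respondents:
    print(round(respondent["self_efficacy"], 2))
```

The key point is that the mean is taken across the items within each row, not down a column, so every respondent gets their own score.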
Click on the Analyses ribbon, then the Exploration button, then click Descriptives.
Move the variable into the Variables box, either by dragging and dropping, clicking it once then clicking the right-arrow, or double-clicking it.
2.1. We will automatically get a descriptives table. For continuous variables, we will also get the mean, minimum, and maximum. Note that now is a good time to check the mean, minimum, and maximum against our codebook.
Check the box next to “Frequency tables”.
3.1. For non-continuous variables, this will generate a one-way frequency table.
If we want to group by another variable:
4.1. Say we want tables showing the mean age for women separate from the mean age for men. To do this, add a second Descriptives analysis by again clicking the Exploration button followed by Descriptives.
4.2. Move age into the “Variables” box again, but this time move gender into the “Split by” box.
4.3. We get a new Descriptives table, this time with separate statistics for the men and women groups.
4.4. Note that we got a new table, rather than replacing the table. This is because we clicked through the Exploration button again.
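Splitting by a second variable means computing the same statistic once per group. The following Python sketch shows that idea with a handful of invented rows; it assumes age is stored as a number and uses two made-up gender labels.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows (assumption for illustration): numeric age, two gender groups.
rows = [
    {"gender": "woman", "age": 34},
    {"gender": "man", "age": 41},
    {"gender": "woman", "age": 28},
    {"gender": "man", "age": 25},
]

# Step 1: sort the ages into one list per group (the "Split by" variable).
ages_by_gender = defaultdict(list)
for row in rows:
    ages_by_gender[row["gender"]].append(row["age"])

# Step 2: compute the statistic separately within each group.
mean_age = {gender: mean(ages) for gender, ages in ages_by_gender.items()}

for gender, value in mean_age.items():
    print(gender, value)
```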
For this example, we created a nominal group called income_groups, which splits the income variable into two groups: high income and low income. Typically, we want to avoid this if we can, since splitting up continuous variables causes us to lose information about the variable (all those respondents of differing income are given the uniform label “low income”, which is misleading). This example is just for the sake of showing how to get the table.
Click on the Analyses ribbon, then on the Frequencies button.
Under “Contingency Tables”, click “Independent Samples” or “Paired Samples”, depending on the kind of data we have.
Move one of the variables into the Rows box, and the other into the Columns box.
Under the Cells tab, we can check the box next to the kinds of percentages we want (row, column, or total).
If we want a three-way table, we can move a third variable into the Layers box.
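For a sense of what the contingency table and its row percentages contain, here is a small Python sketch that counts combinations of two nominal variables. The six respondents are invented for illustration, and the variable names `cell_counts`, `row_totals`, and `row_percent` are chosen for this sketch only.

```python
from collections import Counter

# Hypothetical (gender, income_groups) pairs, one per respondent.
rows = [
    ("woman", "high income"),
    ("woman", "low income"),
    ("man", "high income"),
    ("woman", "high income"),
    ("man", "low income"),
    ("man", "low income"),
]

# Cell counts: how many respondents fall into each row/column combination.
cell_counts = Counter(rows)

# Row totals: how many respondents are in each row (here, each gender).
row_totals = Counter(gender for gender, _ in rows)

# Row percentages: each cell count as a share of its row total.
row_percent = {
    cell: 100 * count / row_totals[cell[0]]
    for cell, count in cell_counts.items()
}

for (gender, group), count in sorted(cell_counts.items()):
    print(gender, group, count, f"{row_percent[(gender, group)]:.1f}%")
```

Column and total percentages follow the same pattern, dividing by the column total or the overall number of respondents instead of the row total.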
For most plots, we can use jamovi’s built-in Plots function.
Click on the Analyses ribbon, then the Exploration button, then on Descriptives.
Move the variable into the Variables box.
Under “Plots”, we can choose from a variety of plots.
If we want to separate our plots by the groups of some other variable, we can move that variable into the “Split by” box.
For scatterplots, we can use the scatr module.
Click the Analyses ribbon, then Exploration, then “Scatterplot”.
Move our Independent Variable (“Predictor”) to the X-Axis, and Dependent Variable (“Outcome”) to the Y-Axis.
2.1. We can choose a Regression (“trend”) line as well as whether we want Density plots in the margins of the plot.
2.2. If we want separate lines for groups from a nominal variable, move the nominal variable to the “Group” box.
Right-click the part of our results we want to copy.
1.1. Notice that we can select the Image/Table (depending on whether we are right-clicking a plot or table), Group, Analysis, or All to copy. Image/Table selects that specific plot or table, Group selects the set of plots or tables that go together, Analysis selects any plot or table that was created as part of that analysis (i.e., everything we made since we clicked on Descriptives), and All selects everything in the Output Pane.
Paste into our Word document.
2.1. Our tables will paste as pre-formatted tables, and our plots will be pasted as images.
Operator | Description |
---|---|
< | less than |
> | greater than |
<= | less than or equal to |
>= | greater than or equal to |
== | equal to |
!= | not equal to |
!x | not x |
x \| y | x OR y |
x & y | x AND y |

Note: "x" and "y" are just stand-ins for variables or logical expressions of their own. For example, one use of "x | y" could be something like this: "age == 18 | age > 75", which would read as "age equal to 18 OR age greater than 75". Using this in a filter would leave our data with only those respondents who are exactly 18 years old or greater than 75 years old.
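The OR example in the note can be tried out in Python to see which values pass the rule. This is a sketch with a made-up list of ages. One difference to be aware of: Python writes these combinations as the words `or`, `and`, and `not`, rather than the symbols `|`, `&`, and `!` used in the table.

```python
# Hypothetical ages (assumption for illustration).
ages = [17, 18, 40, 75, 76, 90]

# Python equivalent of the rule "age == 18 | age > 75":
# keep an age when it is exactly 18 OR greater than 75.
kept = [age for age in ages if age == 18 or age > 75]

# AND requires both parts to be true: older than 18 AND at most 75.
between = [age for age in ages if age > 18 and age <= 75]

# NOT flips a rule: everything that is NOT equal to 18.
not_eighteen = [age for age in ages if not age == 18]

print(kept)
print(between)
print(not_eighteen)
```

Note that 75 itself does not pass the first rule, because "greater than" excludes the boundary value; `>=` would include it.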
Sometimes, we need to tell jamovi to follow instructions based on a logical rule. Instructions are always “if this, do that”, and the logical rule is the “this” in “if this, do that”, meaning if a given row in our data follows that rule, jamovi will “do that”, meaning it will do the thing we want it to do.
In the case of filtering, we want jamovi to follow the instructions, “if a given row follows this rule, keep that row”. That’s our “if this, do that” rule for filtering. In our example, we want jamovi to keep only those rows where the value for age is not equal to “17 or younger”. If we’re successful, jamovi will keep only those respondents whose age is anything other than “17 or younger”. The computer version of “not equal to” is !=. If we typed x != y, that would read as “x not equal to y”. So in the formula box for Filter, we type age != "17 or younger". With that rule in place, jamovi will go through each row in the age variable and check to see if the value for that row is equal to “17 or younger”. If the row is not equal to “17 or younger”, jamovi just moves on to the next row. If it is equal to “17 or younger”, jamovi crosses that row out. Why? Because it does not follow the rule age != "17 or younger".
In the case of transforming, we also want an “if this, do that” set of instructions, only in this case we not only tell jamovi what the rule is, but also what to do if a row follows that rule. For example, if we want to reverse-score a variable, we would instruct jamovi to put the opposite value of the old variable in for the new variable. To reverse-score a 5 for the old variable, the new variable would need a 1. To do this, we type if old_variable == 5 use 1. The logical operator == means “equal to”, and use simply tells jamovi what to do if the old_variable is equal to 5. In other words, if old_variable == 5 use 1 simply means “if the value in the old variable is equal to 5, put a 1 as the value in the new variable”. Note that $source means the value of the old variable, so we can often use that to refer to the old variable rather than typing the name of the old variable.
jamovi can do most statistical tests. If you’ve made it this far into the guide, you can probably navigate your way through the jamovi interface to find the tests you need.
Do not forget that jamovi has many more modules that you can download which each come with new features not included by default. There are many more tests, plots, and tables that you can generate in jamovi using those other modules. You can find the complete list of those modules here.
jamovi also has its own user guide, which you can find here, as well as a “getting started” series of how-to’s of common statistics tasks (here).