Eligibility Check (Preliminary)
Do Won Kim
2024-02-02
Preliminary Eligibility Check
Brief summary of data
As of Feb 2 (9:30 AM), I downloaded the entries in the mercury_user (n = 832) and following_result (n = 721) tables from the DB.
I first filtered out participants who met either of two criteria: (1) those who failed to follow our study account, as recorded in the following_result table; and (2) those without a vsid field in the mercury_user table, which indicates that they did not complete the Wave 1 survey. This initial filtering left a total of 705 participants, the sample to which the eligibility checks are applied.
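The initial filter can be sketched in R as follows. This is only an illustration on toy data, not the script that was actually run: the user_id column name in both tables is an assumption (only vsid and the table names come from the description above), and so is the reading that a row in following_result means the participant did follow the study account.

```r
# Toy stand-ins for the two tables; all values are made up.
mercury_user <- data.frame(
  user_id = c("101", "102", "103", "104"),
  vsid    = c("a1", NA, "c3", "d4"),   # NA = did not reach the end of Wave 1
  stringsAsFactors = FALSE
)
following_result <- data.frame(
  user_id = c("101", "102", "104"),    # assumed: one row per participant who followed us
  stringsAsFactors = FALSE
)

# Keep participants who (1) appear in following_result and (2) have a vsid.
followed <- mercury_user$user_id %in% following_result$user_id
has_vsid <- !is.na(mercury_user$vsid) & nzchar(mercury_user$vsid)
starting_sample <- mercury_user[followed & has_vsid, ]

nrow(starting_sample)
```

Keeping the two conditions as separate logical vectors makes it easy to tabulate how many participants each one removes.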
Employed eligibility criteria
To be eligible for this analysis, participants must meet the following three criteria:
1. The account should not be too new. Participants whose Twitter/X accounts were created before Oct 1, 2023 are eligible; all others are ineligible and should be filtered out. save_user_info() does this job. It retrieves and saves the public Twitter/X information of an authorized user. I store user_id and the user.fields parameters, including followers_count, following_count, tweet_count, listed_count, and like_count, along with the time stamp.
2. Participants should follow at least one low quality account in our list. For this analysis, I used the current version of the list, which contains 316 low quality accounts.
3. Participants should also follow our study account. In a few cases, participants sent a follow request during the survey and withdrew it before I accepted it. In others, I accepted the request, but the participant later dropped out of our study account's list of followers (because they unfollowed us, or their account was suspended or deleted, etc.). Hence, before inviting these participants to Wave 2, we have to check that they are still following us.
relationship_check() does #2 and #3 above. It checks the dyad relationships (= connection_status) between participants and the low quality accounts as well as our study account.
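The account-age rule in criterion #1 comes down to a single date comparison. Below is a minimal sketch in R, assuming the creation time is available as an ISO 8601 UTC string in a column I call created_at. That column name is hypothetical, and this is not the implementation of save_user_info().

```r
# Toy data; created_at is a hypothetical column name and the values are made up.
accounts <- data.frame(
  user_id    = c("101", "102", "103"),
  created_at = c("2015-06-01T12:00:00Z", "2023-09-30T23:59:59Z", "2023-10-01T00:00:00Z"),
  stringsAsFactors = FALSE
)

# Cutoff taken from the criterion: accounts created before Oct 1, 2023 are eligible.
cutoff <- as.POSIXct("2023-10-01 00:00:00", tz = "UTC")

# Parse the timestamps in UTC so the comparison does not depend on the local time zone.
created <- as.POSIXct(accounts$created_at, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
accounts$account_created <- created < cutoff

accounts[, c("user_id", "account_created")]
```

The strict `<` means an account created exactly at midnight UTC on Oct 1 counts as too new; whether the boundary should be inclusive is a choice the criterion above does not spell out.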
The results are stored in the eligibility_results.csv file.
This has four columns:
user_id: the participant's Twitter ID
criteria: account_created (whether the account was created before Oct 1st, 2023), following_us (whether the participant is following our study account), and following_NG (whether the participant is following any of the low quality accounts from our NewsGuard list)
eligible: True or False
count: for the following_NG criterion, I also counted the number of low quality accounts that the participant is following, to get a sense of the distribution
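To make the following_NG rows and the count column concrete, here is a small sketch of how they could be derived from a participant's following list. It is not how relationship_check() works internally (that function checks connection_status for each pair); the ng_accounts and following objects are made-up placeholders.

```r
# Hypothetical IDs of low quality accounts (the real list has 316 entries).
ng_accounts <- c("9001", "9002", "9003")

# Hypothetical following lists, one character vector per participant.
following <- list(
  "101" = c("9001", "5555", "9003"),
  "102" = c("5555", "6666"),
  "103" = character(0)
)

# count: how many low quality accounts each participant follows.
count <- vapply(following, function(ids) sum(ids %in% ng_accounts), integer(1))

# following_NG: eligible if the participant follows at least one of them.
result <- data.frame(
  user_id  = names(following),
  criteria = "following_NG",
  eligible = unname(count >= 1),
  count    = unname(count),
  stringsAsFactors = FALSE
)
result
```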
Analysis
Load eligibility_results.csv.
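library(readr)
# Read without treating the first row as a header, then name the columns manually.
df <- read_csv("eligibility_results.csv", col_names = FALSE)
names(df) <- c("user_id", "criteria", "eligible", "count")
# Drop exact duplicate rows.
df <- unique(df)
# Store IDs as character strings rather than numbers.
df$user_id <- as.character(df$user_id)
head(df)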
## # A tibble: 6 × 4
## user_id criteria eligible count
## <chr> <chr> <lgl> <chr>
## 1 1718379006532132864 account_created FALSE null
## 2 1692009847821148160 account_created TRUE null
## 3 29118011 account_created TRUE null
## 4 25723848 account_created TRUE null
## 5 339348834 account_created TRUE null
## 6 1703281715417108480 account_created TRUE null
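One caveat about IDs: recent Twitter/X IDs have 19 digits, which is more than a double can store exactly, so converting to character after a numeric parse can no longer recover the original digits. The sketch below illustrates the difference with base R's read.csv on a made-up ID; with readr, declaring the column type through the col_types argument of read_csv serves the same purpose. The temporary file and the ID value are invented for the demonstration.

```r
# Write a tiny headerless CSV with one made-up 19-digit ID.
path <- tempfile(fileext = ".csv")
writeLines("1718379006532132865,account_created,FALSE,null", path)

cols <- c("user_id", "criteria", "eligible", "count")

# Default parsing treats the ID as a number, which cannot hold all 19 digits exactly.
as_number <- read.csv(path, header = FALSE, col.names = cols)

# Declaring the column as character keeps the digits exactly as written.
as_text <- read.csv(path, header = FALSE, col.names = cols,
                    colClasses = c("character", "character", "logical", "character"))

format(as_number$user_id, scientific = FALSE)  # trailing digits differ from the file
as_text$user_id                                # same string as in the file
```

This matters mainly when the IDs are joined back to other tables or sent to the API, where a changed trailing digit points to a different account.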
Since the data are in long format, let's reshape them into wide format to ease the analysis.
library(tidyverse)
df |>
pivot_wider(id_cols = "user_id", id_expand = TRUE, names_from = "criteria", values_from = "eligible") -> df_wide
head(df_wide, 15)
## # A tibble: 15 × 4
## user_id account_created following_us following_NG
## <chr> <lgl> <lgl> <lgl>
## 1 1.704648478906e+18 TRUE TRUE NA
## 2 1.724529718744e+18 FALSE TRUE NA
## 3 1.743257044143e+18 FALSE TRUE NA
## 4 1.753273996274e+18 FALSE TRUE NA
## 5 1002525322305310720 TRUE TRUE NA
## 6 1015739034046550016 TRUE TRUE NA
## 7 1028796257706430464 TRUE TRUE TRUE
## 8 1036959386130935808 TRUE TRUE NA
## 9 10371592 TRUE TRUE TRUE
## 10 1045529565999726592 TRUE TRUE NA
## 11 1050832125660782592 TRUE TRUE NA
## 12 1055172856638713856 TRUE TRUE NA
## 13 1055424530 TRUE TRUE NA
## 14 1056130845579202560 TRUE TRUE NA
## 15 1059239413 TRUE TRUE NA
Starting N = 705
Accounts created before Oct 1, 2023: n = 540 (76.6% of starting N)
Among these 540, those that follow our study account: n = 530 (98% of 540; 75.2% of starting N)
Among these 530, those that follow at least one low quality account: n = 97 (18.3% of 530; 13.76% of starting N)
df_wide |>
filter(account_created) |>
count()
## # A tibble: 1 × 1
## n
## <int>
## 1 540
df_wide |>
filter(account_created & following_us) |>
count()
## # A tibble: 1 × 1
## n
## <int>
## 1 530
df_wide |>
filter(account_created & following_us & following_NG) |>
count()
## # A tibble: 1 × 1
## n
## <int>
## 1 97
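As a cross-check, the percentages in the summary can be recomputed directly from the three counts. The sketch below hard-codes the counts reported in this section; the variable names are my own.

```r
# Counts reported above.
n_start   <- 705  # starting sample
n_created <- 540  # pass account_created
n_follow  <- 530  # also pass following_us
n_ng      <- 97   # also pass following_NG

funnel <- data.frame(
  step = c("account_created", "following_us", "following_NG"),
  n    = c(n_created, n_follow, n_ng)
)

# Share of the previous step and share of the starting sample, in percent.
funnel$pct_of_previous <- round(100 * funnel$n / c(n_start, n_created, n_follow), 1)
funnel$pct_of_start    <- round(100 * funnel$n / n_start, 2)
funnel
```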
So, if we trust this estimate (13.76% of those who finished Wave 1), then to invite 375 participants to Wave 2 we need:
13.76% × N = 375
where N is the number of participants who finish the Wave 1 survey without dropping out. The 13.76% rate is measured against this base, so it already absorbs the attrition from the account_created and following_us criteria.
Solving for N gives 375 / 0.1376 ≈ 2725, so we need about 2725 participants to finish Wave 1 to have 375 invited to Wave 2.
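The same back-of-the-envelope calculation in R, as a sketch that assumes the pass rate observed so far (97 out of 705) holds for future participants. Using the unrounded rate and rounding up to a whole participant gives a figure one higher than dividing by the rounded 13.76%.

```r
target_invites <- 375
pass_rate <- 97 / 705   # eligible participants / starting sample

# Smallest whole number of Wave 1 completers expected to yield the target.
needed <- ceiling(target_invites / pass_rate)
needed
```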