Eligibility Check (Preliminary)

Brief summary of data

As of Feb 2 (9:30 AM), I downloaded the entries in the mercury_user (n=832) and following_result (n=721) tables from the DB.

I first filtered out participants using two criteria: (1) those who failed to follow our study account, as recorded in the following_result table; and (2) those missing the vsid field in the mercury_user table, which indicates that they did not finish the Wave 1 survey. This initial filtering left 705 participants, the sample to which the eligibility checks are applied.
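The initial filtering can be sketched in R with toy stand-ins for the two tables (the column names here are assumptions based on the description above, not the actual DB schema):

```r
library(dplyr)

# Toy stand-ins for the two DB tables; column names are assumptions.
mercury_user <- data.frame(
  user_id = c("a", "b", "c", "d"),
  vsid    = c("v1", NA, "v3", "v4")   # NA: did not finish the Wave 1 survey
)
following_result <- data.frame(
  user_id = c("a", "b", "c")          # "d" never followed our study account
)

eligible_sample <- mercury_user |>
  semi_join(following_result, by = "user_id") |>  # (1) followed our account
  filter(!is.na(vsid))                            # (2) finished Wave 1

eligible_sample$user_id  # "a" "c"
```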

Employed eligibility criteria

To be eligible for this analysis, participants must meet the following criteria:

  1. The account should not be too new. Participants whose Twitter/X accounts were created before Oct 1st, 2023 are eligible; otherwise, they are not eligible and should be filtered out.

    save_user_info() does this job. It retrieves and saves the public Twitter/X information of an authorized user. I store user_id and the user.fields parameters, including followers_count, following_count, tweet_count, listed_count and like_count, along with a timestamp.
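A minimal sketch of the account-age check applied to the data that save_user_info() stores; the created_at field name and the date format are assumptions:

```r
# Accounts created before the cutoff pass the account_created criterion.
cutoff <- as.Date("2023-10-01")

account_created_eligible <- function(created_at) {
  as.Date(created_at) < cutoff
}

account_created_eligible("2022-05-14")  # TRUE: account predates the cutoff
account_created_eligible("2023-11-02")  # FALSE: too new, filtered out
```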

  2. Participants should follow at least one low-quality account on our list. For this analysis, I used the current version of the list, which contains 316 low-quality accounts.

  3. Participants should also follow our study account. There were a few cases where participants requested to follow us in the survey and then deleted the request before I accepted it. Other participants requested to follow and I accepted them, but they were later removed from our study account’s list of followers (because they unfollowed us, or their accounts were suspended or deleted, etc.). Hence, before inviting these participants to Wave 2, we have to check that they are still following us.

    relationship_check() handles #2 and #3 above. It checks the dyadic relationships (connection_status) between each participant and the low-quality accounts, as well as our study account.
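A minimal sketch of the checks relationship_check() performs, assuming we have each participant's list of followed accounts; all object names here are hypothetical stand-ins:

```r
# Hypothetical stand-ins for the 316 NewsGuard low-quality accounts
# and the study account's handle.
ng_accounts   <- c("lowq_1", "lowq_2", "lowq_3")
study_account <- "our_study"

check_relationships <- function(following) {
  ng_followed <- intersect(following, ng_accounts)
  list(
    following_us = study_account %in% following,  # criterion 3
    following_NG = length(ng_followed) > 0,       # criterion 2
    count        = length(ng_followed)            # saved for the distribution
  )
}

check_relationships(c("our_study", "lowq_2", "someone_else"))
# following_us = TRUE, following_NG = TRUE, count = 1
```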

The results are stored in the eligibility_results.csv file, which has four columns:

  • user_id : the participant’s Twitter/X id

  • criteria :

    • account_created (whether the account was created before Oct 1st, 2023),

    • following_us (whether the participant is following our study account), and

    • following_NG (whether the participant is following the low quality accounts from our NewsGuard list).

  • eligible : True or False

  • count : For the following_NG criterion, I also counted the number of low-quality accounts the participant follows, to get a sense of the distribution.

Analysis

Load the eligibility_results.csv.

library(readr)
df <- read_csv("eligibility_results.csv", col_names = FALSE)
names(df) = c("user_id", "criteria", "eligible", "count")
df |> unique() -> df
df$user_id = as.character(df$user_id)
head(df)
## # A tibble: 6 × 4
##   user_id             criteria        eligible count
##   <chr>               <chr>           <lgl>    <chr>
## 1 1718379006532132864 account_created FALSE    null 
## 2 1692009847821148160 account_created TRUE     null 
## 3 29118011            account_created TRUE     null 
## 4 25723848            account_created TRUE     null 
## 5 339348834           account_created TRUE     null 
## 6 1703281715417108480 account_created TRUE     null

Since the data is in long format, let’s reshape it into wide format to ease the analysis.

library(tidyverse)
df |> 
  pivot_wider(id_cols = "user_id", id_expand = TRUE, names_from = "criteria", values_from = "eligible") -> df_wide

head(df_wide, 15)
## # A tibble: 15 × 4
##    user_id             account_created following_us following_NG
##    <chr>               <lgl>           <lgl>        <lgl>       
##  1 1.704648478906e+18  TRUE            TRUE         NA          
##  2 1.724529718744e+18  FALSE           TRUE         NA          
##  3 1.743257044143e+18  FALSE           TRUE         NA          
##  4 1.753273996274e+18  FALSE           TRUE         NA          
##  5 1002525322305310720 TRUE            TRUE         NA          
##  6 1015739034046550016 TRUE            TRUE         NA          
##  7 1028796257706430464 TRUE            TRUE         TRUE        
##  8 1036959386130935808 TRUE            TRUE         NA          
##  9 10371592            TRUE            TRUE         TRUE        
## 10 1045529565999726592 TRUE            TRUE         NA          
## 11 1050832125660782592 TRUE            TRUE         NA          
## 12 1055172856638713856 TRUE            TRUE         NA          
## 13 1055424530          TRUE            TRUE         NA          
## 14 1056130845579202560 TRUE            TRUE         NA          
## 15 1059239413          TRUE            TRUE         NA

Starting N = 705

  • Accounts created before Oct 1, 2023: n=540 (=76.6%)

  • Among these n=540, those who follow our study account: n=530 (=98.1% out of 540; 75.2% out of starting N)

  • Among these n=530, those who follow at least one low-quality account: n=97 (=18.3% out of 530; 13.76% out of starting N)

df_wide |> 
  filter(account_created) |>
  count() 
## # A tibble: 1 × 1
##       n
##   <int>
## 1   540
df_wide |> 
  filter(account_created & following_us) |>
  count() 
## # A tibble: 1 × 1
##       n
##   <int>
## 1   530
df_wide |> 
  filter(account_created & following_us & following_NG) |>
  count() 
## # A tibble: 1 × 1
##       n
##   <int>
## 1    97
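As a sanity check, the percentages quoted above can be recomputed directly from these counts:

```r
# Recompute the eligibility funnel percentages from the counts.
n_start   <- 705  # passed the initial filtering
n_created <- 540  # account created before Oct 1, 2023
n_follow  <- 530  # ... and still following our study account
n_ng      <- 97   # ... and following >= 1 low-quality account

round(100 * n_created / n_start, 1)   # 76.6
round(100 * n_follow / n_created, 1)  # 98.1
round(100 * n_ng / n_follow, 1)       # 18.3
round(100 * n_ng / n_start, 2)        # 13.76
```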

So, if we trust this estimate (=13.76% of those who finished Wave 1), then to invite 375 participants to Wave 2 we need:

0.1376 × (number of participants who finish the Wave 1 survey without dropping out) = 375

That is, we need about 2,725 participants to finish Wave 1 in order to have 375 invited to Wave 2. (Note that the 13.76% rate already folds in the account_created and following_us criteria.)
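The required Wave 1 sample size follows directly from the observed pass rate:

```r
# Back-of-the-envelope: Wave 1 finishers needed, given that 97 of the
# 705 finishers passed all three eligibility criteria.
target_wave2 <- 375
pass_rate    <- 97 / 705       # = 0.1376

needed_wave1 <- target_wave2 / pass_rate
needed_wave1                   # ~2725.5, i.e. roughly 2,725 finishers
```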