Eligibility Check
Do Won Kim
2024-02-11
1 Initial step
I compared Verasight's raw data with our DB data to filter out (1) those who did not finish the survey to the end, and (2) those who took the survey several times with multiple Verasight accounts but authorized the same Twitter account each time. Please refer to this notebook for this initial step: https://do-won.github.io/design/240208.html
This process identified a total of 674 users who successfully passed the attention checks, authorized their Twitter account, followed our study account at the time of the survey, and fully completed the survey.
Based on this list of 674 users, I ran the eligibility checks.
2 Employed eligibility criteria
To be eligible for Wave 2,
[1] The account should not be too new. If a participant's Twitter/X account was created before Nov 1st, 2023, they are eligible; otherwise, they are not eligible and should be filtered out.
`account_created` (whether the account was created before Nov 1st, 2023)
[2] Participants should follow our study account. There were a few cases where participants requested to follow during the survey but deleted the request before I accepted it. Other participants requested to follow and I accepted, but they were later removed from our study account's list of followers (because they unfollowed, or their accounts were suspended or deleted, etc.). Hence, before inviting participants to Wave 2, we have to check that they are still following us.
`following_us` (whether the participant is following our study account)
The two criteria below (Following vs. Home timeline) will give different results depending on which list (inventory of low-quality accounts) we use. I use four different lists, described in the next subsection.
2.1 Description of Lists
List 1. NewsGuard list of 438 accounts
- Retrieved most recent NewsGuard list (from Jan 30)
- Filter applied:
- Rating == N (Score < 60)
- Country == US
- Language == en
- Type == TWITTER (+a)
- Active accounts
- More than 0 followers
Note on `Type == TWITTER (+a)`: the absence of a Twitter account in the NG list (i.e., `Type != TWITTER`) doesn't necessarily imply the source isn't on Twitter. Therefore, with the help of Brendan's RAs, I manually reviewed the cases where a domain was listed but no corresponding Twitter account was, and added those that do in fact have Twitter accounts (= `+a`).
This process resulted in a total of 438 accounts.
Lists 2-4. NewsGuard list (438 accounts) merged with FIB superspreader list
- Retrieved Twitter Top Fibers data (shared by Matt and Fil) from April 2023.
- Based on FIB index (https://osome.iu.edu/tools/topfibers/about), applied three different thresholds:
- `FIB index >= 10` (which means that the user has shared at least 10 posts linking to low-credibility sources, each of which has been reshared at least 10 times.)
- `FIB index >= 20`
- `FIB index >= 30`
Then I merged each with the NewsGuard list to make a superset. Since the FIB list is from April 2023, it might contain inactive accounts, so I again kept only active accounts. A rough sketch of this merge is shown below.
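The sketch below illustrates the merge only; the FIB file name and its column names (`topfibers_2023-04.csv`, `fib_index`) are assumptions, and the real pipeline may differ.

library(dplyr)
library(readr)

# Sketch only: FIB file name and column names are assumptions
ng_list <- read_csv("inventory_lists/NG_list.csv")
fib     <- read_csv("topfibers_2023-04.csv")   # assumed filename

merged_30 <- fib |>
  filter(fib_index >= 30) |>                   # threshold: 10 / 20 / 30
  full_join(ng_list, by = c("target_user_id", "twitter_handle"))  # superset of both lists
# (the active-account filter applied afterwards is omitted here)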
Finally, these are the four lists that I use in the eligibility analysis:
- [List 1] NewsGuard list of 438 accounts
- [List 2] NewsGuard + FIB index >=30 (N=667)
- [List 3] NewsGuard + FIB index >=20 (N=870)
- [List 4] NewsGuard + FIB index >=10 (N=1515)
You can access each list from this (link) (List 1 = NG_list.csv, List 2 = merged_30.csv, List 3 = merged_20.csv, List 4 = merged_10.csv).
[3] (Following) Participants should follow at least one low-quality account in our list.
- `following_NG` (whether the participant is following at least one of the low-quality accounts from List 1, NewsGuard only)
- `following_NG_30` (whether the participant is following at least one of the low-quality accounts from List 2, NewsGuard + FIB 30)
- `following_NG_20` (whether the participant is following at least one of the low-quality accounts from List 3, NewsGuard + FIB 20)
- `following_NG_10` (whether the participant is following at least one of the low-quality accounts from List 4, NewsGuard + FIB 10)
[4] (Home timeline) Participants' home timelines should contain at least one tweet from a low-quality account in our list.
- `hometimeline` (whether the participant's home timeline has at least one tweet from the low-quality accounts in List 1, NewsGuard only)
- `hometimeline_30` (whether the participant's home timeline has at least one tweet from the low-quality accounts in List 2, NewsGuard + FIB 30)
- `hometimeline_20` (whether the participant's home timeline has at least one tweet from the low-quality accounts in List 3, NewsGuard + FIB 20)
- `hometimeline_10` (whether the participant's home timeline has at least one tweet from the low-quality accounts in List 4, NewsGuard + FIB 10)
3 Summary of the data
The eligibility check results are stored in the `new_eligibility_results.csv` file.
Four columns:
- `user_id`: participant's Twitter ID
- `criteria`: one of `account_created` (whether the account was created before Nov 1st, 2023), `following_us` (whether the participant is following our study account), `already_muted` (whether the participant has not already muted over 30% of the NG list ➔ this criterion seems to be useless, basically everyone passed it, so I won't include it in the analysis), `following_NG`/`following_NG_30`/`following_NG_20`/`following_NG_10` (whether the participant is following the low-quality accounts from the different lists), or `hometimeline`/`hometimeline_30`/`hometimeline_20`/`hometimeline_10` (whether the participant's home timeline has tweets from the low-quality accounts from the different lists)
- `eligible`: TRUE or FALSE
- `count`: for the `following` and `hometimeline` criteria, I also counted the number of low-quality accounts (or tweets from these accounts)
Let's load `new_eligibility_results.csv`.
library(readr)
library(tidyverse)
library(DT)
library(caret)

# Load the results; skip the row-id column and read Twitter IDs as character
# so long numeric IDs are not mangled
df <- read_csv("new_eligibility_results.csv",
col_types = cols(id = col_skip(), user_id = col_character()))
df |> datatable()
Since the data is in long format, let's reshape it into wide format to ease the analysis; a sketch of this step is shown below.
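The reshaping code is omitted here, but `df_wide` is used throughout the analysis below. A minimal sketch of what the step could look like with `tidyr::pivot_wider()`, assuming one row per `user_id` x `criteria` pair with `eligible` as the value (the column roles are an assumption):

# Sketch (assumed column roles): spread each criterion into its own column
df_wide <- df |>
  select(user_id, criteria, eligible) |>
  pivot_wider(names_from = criteria, values_from = eligible)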
4 Result
4.1 Eligibility rate
Starting N = 674
Accounts created before Nov 1 2023: n=570 (=84.6%)
(Among the 570 accounts created before Nov 1, 2023) Those that still follow our study account: 565 (= 99.1% of 570; 83.8% of starting N)
Below is a comparison between the `Following` and `Home timeline` criteria:
(Among the 565 accounts that are not too new and follow our study account) Those that follow any low-quality accounts from ...
- List 1: 131 (= 23.2% of 565; 19.4% of starting N)
- List 2: 222 (= 39.3% of 565; 32.9% of starting N)
- List 3: 253 (= 44.8% of 565; 37.5% of starting N)
- List 4: 269 (= 47.6% of 565; 39.9% of starting N)
Merging in the FIB list seems to increase the eligibility rate a lot!1
(Among the 565 accounts that are not too new and follow our study account) Those whose home timelines contain tweets from any low-quality accounts from ...
- List 1: 209 (= 37.0% of 565; 31.0% of starting N)
- List 2: 291 (= 51.5% of 565; 43.2% of starting N)
- List 3: 315 (= 55.8% of 565; 46.7% of starting N)
- List 4: 338 (= 59.8% of 565; 50.1% of starting N)
The home timeline criterion seems to be the better option; it increases the eligibility rate even with List 1 (NewsGuard only), since it also captures indirect exposure (= retweets/quote tweets by participants' friends that contain content from low-quality accounts).
# Accounts created before Nov 1, 2023
df_wide |>
filter(account_created == "TRUE") |>
count()
# + Those that follow our study account
df_wide |>
filter(account_created == "TRUE" & following_us == "TRUE") |>
count()
# Those that follow any low quality accounts from [List 1]
df_wide |>
filter(account_created == "TRUE" & following_us == "TRUE") |>
filter(following_NG == "TRUE") |>
count()
# Those that follow any low quality accounts from [List 2]
df_wide |>
filter(account_created == "TRUE" & following_us == "TRUE") |>
filter(following_NG_30 == "TRUE") |>
count()
# Those that follow any low quality accounts from [List 3]
df_wide |>
filter(account_created == "TRUE" & following_us == "TRUE") |>
filter(following_NG_20 == "TRUE") |>
count()
# Those that follow any low quality accounts from [List 4]
df_wide |>
filter(account_created == "TRUE" & following_us == "TRUE") |>
filter(following_NG_10 == "TRUE") |>
count()
# Those with home timeline with any low quality tweets from [List 1]
df_wide |>
filter(account_created == "TRUE" & following_us == "TRUE") |>
filter(hometimeline == "TRUE") |>
count()
# Those with home timeline with any low quality tweets from [List 2]
df_wide |>
filter(account_created == "TRUE" & following_us == "TRUE") |>
filter(hometimeline_30 == "TRUE") |>
count()
# Those with home timeline with any low quality tweets from [List 3]
df_wide |>
filter(account_created == "TRUE" & following_us == "TRUE") |>
filter(hometimeline_20 == "TRUE") |>
count()
# Those with home timeline with any low quality tweets from [List 4]
df_wide |>
filter(account_created == "TRUE" & following_us == "TRUE") |>
filter(hometimeline_10 == "TRUE") |>
count()
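For reference, the repeated filter-and-count blocks above could also be collapsed into a single pass; a sketch (same base filter, counting TRUEs per criterion column):

# One-pass version of the counts above
df_wide |>
  filter(account_created == "TRUE" & following_us == "TRUE") |>
  summarise(across(c(following_NG, following_NG_30, following_NG_20, following_NG_10,
                     hometimeline, hometimeline_30, hometimeline_20, hometimeline_10),
                   ~ sum(. == "TRUE", na.rm = TRUE)))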
4.2 Confusion Matrices
Based on the 565 users who passed criteria #1 and #2 (accounts that are not too new and follow our study account), let's make some confusion matrices. We assume that `Following` is the ground truth.
# Restrict to the 565 users passing criteria #1 and #2, and recode
# missing following flags as "FALSE"
df_wide |>
filter(account_created == TRUE & following_us == TRUE) |>
select(user_id, following_NG:following_NG_10,
hometimeline, hometimeline_30, hometimeline_20, hometimeline_10) |>
mutate(following_NG = ifelse(is.na(following_NG), "FALSE", "TRUE"),
following_NG_30 = ifelse(is.na(following_NG_30), "FALSE", "TRUE"),
following_NG_20 = ifelse(is.na(following_NG_20), "FALSE", "TRUE"),
following_NG_10 = ifelse(is.na(following_NG_10), "FALSE", "TRUE"),
) -> df_subset

# Recode each criterion as a 0/1 factor for caret::confusionMatrix()
df_subset |> mutate(
EG_following_list1 = as.factor(ifelse(following_NG == TRUE, 1, 0)),
EG_following_list2 = as.factor(ifelse(following_NG_30 == TRUE, 1, 0)),
EG_following_list3 = as.factor(ifelse(following_NG_20 == TRUE, 1, 0)),
EG_following_list4 = as.factor(ifelse(following_NG_10 == TRUE, 1, 0)),
EG_hometimeline_list1 = as.factor(ifelse(hometimeline == TRUE, 1, 0)),
EG_hometimeline_list2 = as.factor(ifelse(hometimeline_30 == TRUE, 1, 0)),
EG_hometimeline_list3 = as.factor(ifelse(hometimeline_20 == TRUE, 1, 0)),
EG_hometimeline_list4 = as.factor(ifelse(hometimeline_10 == TRUE, 1, 0))
) -> df_conf
|  |  | Following (ground truth): Positive (TRUE) | Following (ground truth): Negative (FALSE) |
|---|---|---|---|
| Home Timeline | Positive (TRUE) | TP (118) | FP (91) |
| Home Timeline | Negative (FALSE) | FN (13) | TN (343) |
# prediction: hometimeline, reference: following
confusionMatrix(df_conf$EG_hometimeline_list1, df_conf$EG_following_list1, positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 343 13
## 1 91 118
##
## Accuracy : 0.8159
## 95% CI : (0.7815, 0.847)
## No Information Rate : 0.7681
## P-Value [Acc > NIR] : 0.003451
##
## Kappa : 0.5722
##
## Mcnemar's Test P-Value : 4.337e-14
##
## Sensitivity : 0.9008
## Specificity : 0.7903
## Pos Pred Value : 0.5646
## Neg Pred Value : 0.9635
## Prevalence : 0.2319
## Detection Rate : 0.2088
## Detection Prevalence : 0.3699
## Balanced Accuracy : 0.8455
##
## 'Positive' Class : 1
##
|  |  | Following (ground truth): Positive (TRUE) | Following (ground truth): Negative (FALSE) |
|---|---|---|---|
| Home Timeline | Positive (TRUE) | TP (204) | FP (87) |
| Home Timeline | Negative (FALSE) | FN (18) | TN (256) |
# prediction: hometimeline_30, reference: following_NG_30
confusionMatrix(df_conf$EG_hometimeline_list2, df_conf$EG_following_list2, positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 256 18
## 1 87 204
##
## Accuracy : 0.8142
## 95% CI : (0.7796, 0.8454)
## No Information Rate : 0.6071
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.6307
##
## Mcnemar's Test P-Value : 3.22e-11
##
## Sensitivity : 0.9189
## Specificity : 0.7464
## Pos Pred Value : 0.7010
## Neg Pred Value : 0.9343
## Prevalence : 0.3929
## Detection Rate : 0.3611
## Detection Prevalence : 0.5150
## Balanced Accuracy : 0.8326
##
## 'Positive' Class : 1
##
|  |  | Following (ground truth): Positive (TRUE) | Following (ground truth): Negative (FALSE) |
|---|---|---|---|
| Home Timeline | Positive (TRUE) | TP (228) | FP (87) |
| Home Timeline | Negative (FALSE) | FN (25) | TN (225) |
# prediction: hometimeline_20, reference: following_NG_20
confusionMatrix(df_conf$EG_hometimeline_list3, df_conf$EG_following_list3, positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 225 25
## 1 87 228
##
## Accuracy : 0.8018
## 95% CI : (0.7665, 0.8339)
## No Information Rate : 0.5522
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.6082
##
## Mcnemar's Test P-Value : 8.216e-09
##
## Sensitivity : 0.9012
## Specificity : 0.7212
## Pos Pred Value : 0.7238
## Neg Pred Value : 0.9000
## Prevalence : 0.4478
## Detection Rate : 0.4035
## Detection Prevalence : 0.5575
## Balanced Accuracy : 0.8112
##
## 'Positive' Class : 1
##
|  |  | Following (ground truth): Positive (TRUE) | Following (ground truth): Negative (FALSE) |
|---|---|---|---|
| Home Timeline | Positive (TRUE) | TP (249) | FP (89) |
| Home Timeline | Negative (FALSE) | FN (20) | TN (207) |
# prediction: hometimeline_10, reference: following_NG_10
confusionMatrix(df_conf$EG_hometimeline_list4, df_conf$EG_following_list4, positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 207 20
## 1 89 249
##
## Accuracy : 0.8071
## 95% CI : (0.7721, 0.8388)
## No Information Rate : 0.5239
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.6177
##
## Mcnemar's Test P-Value : 7.356e-11
##
## Sensitivity : 0.9257
## Specificity : 0.6993
## Pos Pred Value : 0.7367
## Neg Pred Value : 0.9119
## Prevalence : 0.4761
## Detection Rate : 0.4407
## Detection Prevalence : 0.5982
## Balanced Accuracy : 0.8125
##
## 'Positive' Class : 1
##
4.3 ROC
- True Positive Rate (TPR): also known as sensitivity, recall, or hit rate; the proportion of actual positives that are correctly identified, calculated as TP / (TP + FN).
- False Positive Rate (FPR): the proportion of actual negatives that are incorrectly identified as positives, calculated as FP / (FP + TN).
data <- list(
"List 1" = list(FP = 91, TP = 118, FN = 13, TN = 343),
"List 2" = list(FP = 87, TP = 204, FN = 18, TN = 256),
"List 3" = list(FP = 87, TP = 228, FN = 25, TN = 225),
"List 4" = list(FP = 89, TP = 249, FN = 20, TN = 207)
)
# Calculate the rates
for (name in names(data)) {
data[[name]]$FP_rate <- data[[name]]$FP / (data[[name]]$FP + data[[name]]$TN)
data[[name]]$TP_rate <- data[[name]]$TP / (data[[name]]$TP + data[[name]]$FN)
}
# Prepare vectors for plotting
FP_rates <- sapply(data, function(x) x$FP_rate)
TP_rates <- sapply(data, function(x) x$TP_rate)
labels <- names(data)
ROC_data <- data.frame(
List = labels,
FP_rate = round(FP_rates,3),
TP_rate = round(TP_rates,3)
)
print(ROC_data)
## List FP_rate TP_rate
## List 1 List 1 0.210 0.901
## List 2 List 2 0.254 0.919
## List 3 List 3 0.279 0.901
## List 4 List 4 0.301 0.926
library(ggplot2)
library(ggrepel)
ggplot(ROC_data, aes(x = FP_rate, y = TP_rate, label = List, col=List)) +
geom_point() +
geom_text_repel(
aes(label = List),
box.padding = unit(0.35, "lines"), # Adjust the padding within the bounding box
point.padding = unit(0.5, "lines"), # Adjust the space between the point and text
segment.color = 'grey50', # Color of the line connecting text and point
direction = 'y', # Spread out labels vertically
hjust = 0.5, # Center text horizontally
vjust = 0.5 # Center text vertically
) +
xlab("False Positive Rate") +
ylab("True Positive Rate") +
ggtitle("ROC Plot") +
labs(caption="Following as the reference/ground truth; Hometimeline used for prediction") +
theme_bw() +
xlim(c(0,1.0)) + ylim(c(0,1.0)) +
geom_vline(xintercept=0.5, linetype="dotted") + geom_hline(yintercept=0.5, linetype="dotted") +
theme(legend.position = "none")
5 Next steps?
We should decide (1) which list (`List1` ~ `List4`) to use for the eligibility check and also for muting, and (2) which eligibility criterion (`Following` vs. `Hometimeline`, or both) to adopt.
- Note that the FIB thresholds are somewhat arbitrary (we don't know what threshold is good enough; FIB index >= 10? 20? 30?). And even if we choose the list with the most conservative threshold (= FIB index >= 30), it still includes Trump and Musk.
We should also decide (3) on the WTA option (scale vs. open-ended). The current Wave 2 survey file has the scale version. Please refer to this note for more information on the WTA distribution (it also has information on recruitment mode, which should help us decide (4) whether we should run further recruitment experiments): https://do-won.github.io/design/verasight_wave1.html
Lastly, if we are going to move on to Wave 2 without spending more time on recruitment, we should decide (5) on the muting design (100% vs. 70%) and (6) on the Wave 3 exposure questions. Once Wave 2 starts, we cannot stop or interrupt the deployment instance, so all code related to Waves 2 and 3 must be ready beforehand.
6 Additional | Is it all Musk?
library(readr)

# Follow relationships between participants and low-quality accounts
new_connection_status <- read_csv("new_connection_status.csv",
col_types = cols(...1 = col_skip(),
user_id = col_character(),
target_user_id = col_character()))

# Tweets from low-quality accounts matched in participants' home timelines
new_home_match_data <- read_csv("new_home_match_data.csv",
col_types = cols(user_id = col_character(),
target_user_id = col_character()))
List1 <- read_csv("inventory_lists/NG_list.csv",
col_types = cols_only(target_user_id = col_character(),
twitter_handle = col_guess(),
Score = col_guess(),
followers = col_guess()))
List2 <- read_csv("inventory_lists/merged_30.csv",
col_types = cols_only(target_user_id = col_character(),
twitter_handle = col_guess(),
fib_index = col_guess(),
total_reshares = col_guess(),
Score = col_guess(),
followers = col_guess()))
List3 <- read_csv("inventory_lists/merged_20.csv",
col_types = cols_only(target_user_id = col_character(),
twitter_handle = col_guess(),
fib_index = col_guess(),
total_reshares = col_guess(),
Score = col_guess(),
followers = col_guess()))
List4 <- read_csv("inventory_lists/merged_10.csv",
col_types = cols_only(target_user_id = col_character(),
twitter_handle = col_guess(),
fib_index = col_guess(),
total_reshares = col_guess(),
Score = col_guess(),
followers = col_guess()))
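As a quick sanity check, the loaded list sizes should match the counts given in Section 2 (438 / 667 / 870 / 1515):

# Expected: 438, 667, 870, 1515 (see Section 2)
sapply(list(List1, List2, List3, List4), nrow)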
# Flag whether each followed account appears in List 4 (the largest superset)
new_connection_status |>
merge(List4, by="target_user_id", all.x = TRUE) |>
mutate(List4 = ifelse(!is.na(twitter_handle), TRUE, FALSE)) -> following_tb
List3 |> select(target_user_id) |> mutate(list="List3") -> List3_for_merge
List2 |> select(target_user_id) |> mutate(list="List2") -> List2_for_merge
List1 |> select(target_user_id) |> mutate(list="List1") -> List1_for_merge
following_tb |>
merge(List3_for_merge, by="target_user_id", all.x = TRUE) |>
mutate(List3 = ifelse(!is.na(list), TRUE, FALSE)) |>
select(-list) -> following_tb
following_tb |>
merge(List2_for_merge, by="target_user_id", all.x = TRUE) |>
mutate(List2 = ifelse(!is.na(list), TRUE, FALSE)) |>
select(-list) -> following_tb
following_tb |>
merge(List1_for_merge, by="target_user_id", all.x = TRUE) |>
mutate(List1 = ifelse(!is.na(list), TRUE, FALSE)) |>
select(-list) -> following_tb
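The three repeated merge-and-flag blocks above could be written more compactly with membership tests instead of merges; a sketch that should produce the same flags, assuming `target_user_id` uniquely identifies accounts within each list:

# Flag List1-List3 membership in one pass (equivalent to the merges above)
for (nm in c("List3", "List2", "List1")) {
  ids <- get(nm)$target_user_id
  following_tb[[nm]] <- following_tb$target_user_id %in% ids
}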
6.1 Descriptions of the tables
- `target_user_id`, `twitter_handle`: Twitter ID and handle of the low-quality account
- `fib_index`, `total_reshares`: FIB index and total reshare count from the FIB list. If an account has this information, it comes from the FIB list.
- `Score`: NewsGuard score. If an account has this information, it comes from the NewsGuard list.
- `followers`: number of followers
- `n`: count/frequency. For following, it is the number of eligible participants who follow each account. For home timeline, it is the number of direct + indirect tweets from these accounts found in participants' home timelines.

You can search and filter on the tables.
6.2 Which accounts from each list were followed by participants?
# List 1
following_tb |>
filter(List1 == TRUE) |>
group_by(target_user_id, twitter_handle,
fib_index, total_reshares, Score, followers) |>
count() |>
datatable(filter = "top", selection = "multiple",
caption = "Accounts from List 1 (NewsGuard only) followed by `n` participants")
following_tb |>
filter(List2 == TRUE) |>
group_by(target_user_id, twitter_handle,
fib_index, total_reshares, Score, followers) |>
count() |>
datatable(filter = "top", selection = "multiple",
caption = "Accounts from List 2 (NewsGuard + Fib >=30) followed by `n` participants")
6.3 Which accounts from each list were found in participants’ home timelines?
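`hometimeline_tb` is not constructed in this excerpt; a plausible sketch, assuming it parallels `following_tb` but is built from `new_home_match_data` with list membership stacked into a single `list` column (the structure is an assumption):

# Sketch (assumed structure): tag each home-timeline match with its list(s)
all_lists <- bind_rows(
  List1 |> mutate(list = "List 1"),
  List2 |> mutate(list = "List 2"),
  List3 |> mutate(list = "List 3"),
  List4 |> mutate(list = "List 4")
)
hometimeline_tb <- new_home_match_data |>
  inner_join(all_lists, by = "target_user_id")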
hometimeline_tb |>
filter(list=="List 1") |>
select(-list) |>
group_by(target_user_id, twitter_handle, fib_index,
total_reshares, Score, followers) |>
count() |>
datatable(filter = "top", selection = "multiple",
caption = "List 1 (NewsGuard only) found in home timeline")
hometimeline_tb |>
filter(list=="List 2") |>
select(-list) |>
group_by(target_user_id, twitter_handle, fib_index,
total_reshares, Score, followers) |>
count() |>
datatable(filter = "top", selection = "multiple",
caption = "List 2 (NewsGuard + Fib >=30) found in home timeline")
BUT we have to be careful about whether to use this merged list for muting, because the FIB list includes accounts such as Donald Trump (FIB index = 83) and Elon Musk (FIB index = 32).↩︎