Eligibility Check
Do Won Kim
2024-02-11
1 Initial step
I compared Verasight's raw data with our DB data to filter out (1) those who did not finish the survey to the end, and (2) those who took the survey several times with multiple Verasight accounts but authorized the same Twitter account each time. Please refer to this notebook for this initial step: https://do-won.github.io/design/240208.html
This process identified a total of 674 users who successfully passed the attention checks, authorized their Twitter account, followed our study account at the time of the survey, and fully completed the survey.
Based on this list of 674 users, I ran the eligibility checks.
2 Employed eligibility criteria
To be eligible for Wave 2,
[1] The account should not be too new. If a participant's Twitter/X account was created before Nov 1st, 2023, they are eligible; otherwise, they are not eligible and should be filtered out.
`account_created` (whether the account was created before Nov 1st, 2023)
[2] Participants should follow our study account. There were a few cases where participants requested to follow during the survey but deleted the request before I accepted it. Other participants requested to follow and I accepted, but they were later removed from our study account's list of followers (because they unfollowed, or their accounts were suspended or deleted, etc.). Hence, before inviting participants to Wave 2, we have to check that they are still following us.
`following_us` (whether the participant is following our study account)
The two criteria below (Following vs. Home timeline) will give different results depending on which list (inventory of low-quality accounts) we use. I use four different lists, described in the next subsection.
2.1 Description of Lists
List 1. NewsGuard list of 438 accounts
- Retrieved most recent NewsGuard list (from Jan 30)
- Filter applied:
- Rating == N (Score < 60)
- Country == US
- Language == en
- Type == TWITTER (+a)
- Active accounts
- More than 0 followers
Note on `Type == TWITTER (+a)`: the absence of a Twitter account in the NG list (i.e., `Type != TWITTER`) doesn't necessarily imply the source isn't on Twitter. Therefore, with the help of Brendan's RAs, I manually reviewed the cases where a domain was listed but no corresponding Twitter account was, and added those that do in fact have Twitter accounts (= `+a`).
This process resulted in a total of 438 accounts.
Lists 2-4. NewsGuard list (438 accounts) merged with FIB superspreader list
- Retrieved Twitter Top Fibers data (shared by Matt and Fil) from April 2023.
- Based on FIB index (https://osome.iu.edu/tools/topfibers/about), applied three different thresholds:
- `FIB index >= 10` (which means that the user has shared at least 10 posts linking to low-credibility sources, each of which has been reshared at least 10 times.)
- `FIB index >= 20`
- `FIB index >= 30`
Then I merged each with the NewsGuard list to make a superset. Since the FIB list is from April 2023, it might contain inactive accounts, so I again kept only active accounts. A rough sketch of this merge is shown below.
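The sketch below illustrates the merge only; the FIB file name and its column names (`topfibers_2023-04.csv`, `fib_index`) are assumptions, and the real pipeline may differ.

library(dplyr)
library(readr)

# Sketch only: FIB file name and column names are assumptions
ng_list <- read_csv("inventory_lists/NG_list.csv")
fib     <- read_csv("topfibers_2023-04.csv")   # assumed filename

merged_30 <- fib |>
  filter(fib_index >= 30) |>                   # threshold: 10 / 20 / 30
  full_join(ng_list, by = c("target_user_id", "twitter_handle"))  # superset of both lists
# (the active-account filter applied afterwards is omitted here)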
Finally, these are the four lists that I use in the eligibility analysis:
- [List 1] NewsGuard list of 438 accounts
- [List 2] NewsGuard + FIB index >=30 (N=667)
- [List 3] NewsGuard + FIB index >=20 (N=870)
- [List 4] NewsGuard + FIB index >=10 (N=1515)
You can access each list from this (link) (List 1 = NG_list.csv, List 2 = merged_30.csv, List 3 = merged_20.csv, List 4 = merged_10.csv).
[3] (Following) Participants should follow at least one low-quality account in our list.
- `following_NG` (whether the participant is following at least one of the low-quality accounts from List 1, NewsGuard only)
- `following_NG_30` (whether the participant is following at least one of the low-quality accounts from List 2, NewsGuard + FIB 30)
- `following_NG_20` (whether the participant is following at least one of the low-quality accounts from List 3, NewsGuard + FIB 20)
- `following_NG_10` (whether the participant is following at least one of the low-quality accounts from List 4, NewsGuard + FIB 10)
[4] (Home timeline) Participants' home timelines should contain at least one tweet from a low-quality account in our list.
- `hometimeline` (whether the participant's home timeline has at least one tweet from the low-quality accounts in List 1, NewsGuard only)
- `hometimeline_30` (whether the participant's home timeline has at least one tweet from the low-quality accounts in List 2, NewsGuard + FIB 30)
- `hometimeline_20` (whether the participant's home timeline has at least one tweet from the low-quality accounts in List 3, NewsGuard + FIB 20)
- `hometimeline_10` (whether the participant's home timeline has at least one tweet from the low-quality accounts in List 4, NewsGuard + FIB 10)
3 Summary of the data
The eligibility check results are stored in the `new_eligibility_results.csv` file.
Four columns:
- `user_id`: participant's Twitter ID
- `criteria`: one of `account_created` (whether the account was created before Nov 1st, 2023), `following_us` (whether the participant is following our study account), `already_muted` (whether the participant has not already muted over 30% of the NG list ➔ this criterion seems to be useless, basically everyone passed it, so I won't include it in the analysis), `following_NG`/`following_NG_30`/`following_NG_20`/`following_NG_10` (whether the participant is following the low-quality accounts from the different lists), or `hometimeline`/`hometimeline_30`/`hometimeline_20`/`hometimeline_10` (whether the participant's home timeline has tweets from the low-quality accounts from the different lists)
- `eligible`: TRUE or FALSE
- `count`: for the `following` and `hometimeline` criteria, I also counted the number of low-quality accounts (or tweets from these accounts)
Let's load `new_eligibility_results.csv`.
library(readr)
library(tidyverse)
library(DT)
library(caret)

# Load the results; skip the row-id column and read Twitter IDs as character
# so long numeric IDs are not mangled
df <- read_csv("new_eligibility_results.csv",
col_types = cols(id = col_skip(), user_id = col_character()))
df |> datatable()
Since the data is in long format, let's reshape it into wide format to ease the analysis; a sketch of this step is shown below.
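The reshaping code is omitted here, but `df_wide` is used throughout the analysis below. A minimal sketch of what the step could look like with `tidyr::pivot_wider()`, assuming one row per `user_id` x `criteria` pair with `eligible` as the value (the column roles are an assumption):

# Sketch (assumed column roles): spread each criterion into its own column
df_wide <- df |>
  select(user_id, criteria, eligible) |>
  pivot_wider(names_from = criteria, values_from = eligible)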
4 Result
4.1 Eligibility rate
Starting N = 674
Accounts created before Nov 1 2023: n=570 (=84.6%)
(Among the 570 accounts created before Nov 1, 2023) Those that still follow our study account: 565 (= 99.1% of 570; 83.8% of starting N)
Below is a comparison between the `Following` and `Home timeline` criteria:
(Among the 565 accounts that are not too new and follow our study account) Those that follow any low-quality accounts from ...
- List 1: 131 (= 23.2% of 565; 19.4% of starting N)
- List 2: 222 (= 39.3% of 565; 32.9% of starting N)
- List 3: 253 (= 44.8% of 565; 37.5% of starting N)
- List 4: 269 (= 47.6% of 565; 39.9% of starting N)
Merging in the FIB list seems to increase the eligibility rate a lot!1
(Among the 565 accounts that are not too new and follow our study account) Those whose home timelines contain tweets from any low-quality accounts from ...
- List 1: 209 (= 37.0% of 565; 31.0% of starting N)
- List 2: 291 (= 51.5% of 565; 43.2% of starting N)
- List 3: 315 (= 55.8% of 565; 46.7% of starting N)
- List 4: 338 (= 59.8% of 565; 50.1% of starting N)
The home timeline criterion seems to be the better option; it increases the eligibility rate even with List 1 (NewsGuard only), since it also captures indirect exposure (= retweets/quote tweets by participants' friends that contain content from low-quality accounts).
# Accounts created before Nov 1, 2023
df_wide |>
filter(account_created == "TRUE") |>
count()
# + Those that follow our study account
df_wide |>
filter(account_created == "TRUE" & following_us == "TRUE") |>
count()
# Those that follow any low quality accounts from [List 1]
df_wide |>
filter(account_created == "TRUE" & following_us == "TRUE") |>
filter(following_NG == "TRUE") |>
count()
# Those that follow any low quality accounts from [List 2]
df_wide |>
filter(account_created == "TRUE" & following_us == "TRUE") |>
filter(following_NG_30 == "TRUE") |>
count()
# Those that follow any low quality accounts from [List 3]
df_wide |>
filter(account_created == "TRUE" & following_us == "TRUE") |>
filter(following_NG_20 == "TRUE") |>
count()
# Those that follow any low quality accounts from [List 4]
df_wide |>
filter(account_created == "TRUE" & following_us == "TRUE") |>
filter(following_NG_10 == "TRUE") |>
count()
# Those with home timeline with any low quality tweets from [List 1]
df_wide |>
filter(account_created == "TRUE" & following_us == "TRUE") |>
filter(hometimeline == "TRUE") |>
count()
# Those with home timeline with any low quality tweets from [List 2]
df_wide |>
filter(account_created == "TRUE" & following_us == "TRUE") |>
filter(hometimeline_30 == "TRUE") |>
count()
# Those with home timeline with any low quality tweets from [List 3]
df_wide |>
filter(account_created == "TRUE" & following_us == "TRUE") |>
filter(hometimeline_20 == "TRUE") |>
count()
# Those with home timeline with any low quality tweets from [List 4]
df_wide |>
filter(account_created == "TRUE" & following_us == "TRUE") |>
filter(hometimeline_10 == "TRUE") |>
count()
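For reference, the repeated filter-and-count blocks above could also be collapsed into a single pass; a sketch (same base filter, counting TRUEs per criterion column):

# One-pass version of the counts above
df_wide |>
  filter(account_created == "TRUE" & following_us == "TRUE") |>
  summarise(across(c(following_NG, following_NG_30, following_NG_20, following_NG_10,
                     hometimeline, hometimeline_30, hometimeline_20, hometimeline_10),
                   ~ sum(. == "TRUE", na.rm = TRUE)))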
4.2 Confusion Matrices
Based on the 565 users who passed criteria #1 and #2 (accounts that are not too new and follow our study account), let's make some confusion matrices. We assume that `Following` is the ground truth.
# Restrict to the 565 users passing criteria #1 and #2, and recode
# missing following flags as "FALSE"
df_wide |>
filter(account_created == TRUE & following_us == TRUE) |>
select(user_id, following_NG:following_NG_10,
hometimeline, hometimeline_30, hometimeline_20, hometimeline_10) |>
mutate(following_NG = ifelse(is.na(following_NG), "FALSE", "TRUE"),
following_NG_30 = ifelse(is.na(following_NG_30), "FALSE", "TRUE"),
following_NG_20 = ifelse(is.na(following_NG_20), "FALSE", "TRUE"),
following_NG_10 = ifelse(is.na(following_NG_10), "FALSE", "TRUE"),
) -> df_subset

# Recode each criterion as a 0/1 factor for caret::confusionMatrix()
df_subset |> mutate(
EG_following_list1 = as.factor(ifelse(following_NG == TRUE, 1, 0)),
EG_following_list2 = as.factor(ifelse(following_NG_30 == TRUE, 1, 0)),
EG_following_list3 = as.factor(ifelse(following_NG_20 == TRUE, 1, 0)),
EG_following_list4 = as.factor(ifelse(following_NG_10 == TRUE, 1, 0)),
EG_hometimeline_list1 = as.factor(ifelse(hometimeline == TRUE, 1, 0)),
EG_hometimeline_list2 = as.factor(ifelse(hometimeline_30 == TRUE, 1, 0)),
EG_hometimeline_list3 = as.factor(ifelse(hometimeline_20 == TRUE, 1, 0)),
EG_hometimeline_list4 = as.factor(ifelse(hometimeline_10 == TRUE, 1, 0))
) -> df_conf
|  |  | Following (ground truth): Positive (TRUE) | Following (ground truth): Negative (FALSE) |
|---|---|---|---|
| Home Timeline | Positive (TRUE) | TP (118) | FP (91) |
| Home Timeline | Negative (FALSE) | FN (13) | TN (343) |
# prediction: hometimeline, reference: following
confusionMatrix(df_conf$EG_hometimeline_list1, df_conf$EG_following_list1, positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 343 13
## 1 91 118
##
## Accuracy : 0.8159
## 95% CI : (0.7815, 0.847)
## No Information Rate : 0.7681
## P-Value [Acc > NIR] : 0.003451
##
## Kappa : 0.5722
##
## Mcnemar's Test P-Value : 4.337e-14
##
## Sensitivity : 0.9008
## Specificity : 0.7903
## Pos Pred Value : 0.5646
## Neg Pred Value : 0.9635
## Prevalence : 0.2319
## Detection Rate : 0.2088
## Detection Prevalence : 0.3699
## Balanced Accuracy : 0.8455
##
## 'Positive' Class : 1
##
|  |  | Following (ground truth): Positive (TRUE) | Following (ground truth): Negative (FALSE) |
|---|---|---|---|
| Home Timeline | Positive (TRUE) | TP (204) | FP (87) |
| Home Timeline | Negative (FALSE) | FN (18) | TN (256) |
# prediction: hometimeline_30, reference: following_NG_30
confusionMatrix(df_conf$EG_hometimeline_list2, df_conf$EG_following_list2, positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 256 18
## 1 87 204
##
## Accuracy : 0.8142
## 95% CI : (0.7796, 0.8454)
## No Information Rate : 0.6071
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.6307
##
## Mcnemar's Test P-Value : 3.22e-11
##
## Sensitivity : 0.9189
## Specificity : 0.7464
## Pos Pred Value : 0.7010
## Neg Pred Value : 0.9343
## Prevalence : 0.3929
## Detection Rate : 0.3611
## Detection Prevalence : 0.5150
## Balanced Accuracy : 0.8326
##
## 'Positive' Class : 1
##
|  |  | Following (ground truth): Positive (TRUE) | Following (ground truth): Negative (FALSE) |
|---|---|---|---|
| Home Timeline | Positive (TRUE) | TP (228) | FP (87) |
| Home Timeline | Negative (FALSE) | FN (25) | TN (225) |
# prediction: hometimeline_20, reference: following_NG_20
confusionMatrix(df_conf$EG_hometimeline_list3, df_conf$EG_following_list3, positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 225 25
## 1 87 228
##
## Accuracy : 0.8018
## 95% CI : (0.7665, 0.8339)
## No Information Rate : 0.5522
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.6082
##
## Mcnemar's Test P-Value : 8.216e-09
##
## Sensitivity : 0.9012
## Specificity : 0.7212
## Pos Pred Value : 0.7238
## Neg Pred Value : 0.9000
## Prevalence : 0.4478
## Detection Rate : 0.4035
## Detection Prevalence : 0.5575
## Balanced Accuracy : 0.8112
##
## 'Positive' Class : 1
##
|  |  | Following (ground truth): Positive (TRUE) | Following (ground truth): Negative (FALSE) |
|---|---|---|---|
| Home Timeline | Positive (TRUE) | TP (249) | FP (89) |
| Home Timeline | Negative (FALSE) | FN (20) | TN (207) |
# prediction: hometimeline_10, reference: following_NG_10
confusionMatrix(df_conf$EG_hometimeline_list4, df_conf$EG_following_list4, positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 207 20
## 1 89 249
##
## Accuracy : 0.8071
## 95% CI : (0.7721, 0.8388)
## No Information Rate : 0.5239
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.6177
##
## Mcnemar's Test P-Value : 7.356e-11
##
## Sensitivity : 0.9257
## Specificity : 0.6993
## Pos Pred Value : 0.7367
## Neg Pred Value : 0.9119
## Prevalence : 0.4761
## Detection Rate : 0.4407
## Detection Prevalence : 0.5982
## Balanced Accuracy : 0.8125
##
## 'Positive' Class : 1
##
4.3 ROC
- True Positive Rate (TPR): also known as sensitivity, recall, or hit rate; the proportion of actual positives that are correctly identified, calculated as TP / (TP + FN).
- False Positive Rate (FPR): the proportion of actual negatives that are incorrectly identified as positives, calculated as FP / (FP + TN).
data <- list(
"List 1" = list(FP = 91, TP = 118, FN = 13, TN = 343),
"List 2" = list(FP = 87, TP = 204, FN = 18, TN = 256),
"List 3" = list(FP = 87, TP = 228, FN = 25, TN = 225),
"List 4" = list(FP = 89, TP = 249, FN = 20, TN = 207)
)
# Calculate the rates
for (name in names(data)) {
data[[name]]$FP_rate <- data[[name]]$FP / (data[[name]]$FP + data[[name]]$TN)
data[[name]]$TP_rate <- data[[name]]$TP / (data[[name]]$TP + data[[name]]$FN)
}
# Prepare vectors for plotting
FP_rates <- sapply(data, function(x) x$FP_rate)
TP_rates <- sapply(data, function(x) x$TP_rate)
labels <- names(data)
ROC_data <- data.frame(
List = labels,
FP_rate = round(FP_rates,3),
TP_rate = round(TP_rates,3)
)
print(ROC_data)
## List FP_rate TP_rate
## List 1 List 1 0.210 0.901
## List 2 List 2 0.254 0.919
## List 3 List 3 0.279 0.901
## List 4 List 4 0.301 0.926
library(ggplot2)
library(ggrepel)
ggplot(ROC_data, aes(x = FP_rate, y = TP_rate, label = List, col=List)) +
geom_point() +
geom_text_repel(
aes(label = List),
box.padding = unit(0.35, "lines"), # Adjust the padding within the bounding box
point.padding = unit(0.5, "lines"), # Adjust the space between the point and text
segment.color = 'grey50', # Color of the line connecting text and point
direction = 'y', # Spread out labels vertically
hjust = 0.5, # Center text horizontally
vjust = 0.5 # Center text vertically
) +
xlab("False Positive Rate") +
ylab("True Positive Rate") +
ggtitle("ROC Plot") +
labs(caption="Following as the reference/ground truth; Hometimeline used for prediction") +
theme_bw() +
xlim(c(0,1.0)) + ylim(c(0,1.0)) +
geom_vline(xintercept=0.5, linetype="dotted") + geom_hline(yintercept=0.5, linetype="dotted") +
theme(legend.position = "none")
5 Next steps?
We should decide (1) which list (`List1` ~ `List4`) to use for the eligibility check and also for muting, and (2) which eligibility criterion (`Following` vs. `Hometimeline`, or both) to adopt.
- Note that the FIB thresholds are somewhat arbitrary (we don't know what threshold is good enough; FIB index >= 10? 20? 30?). And even if we choose the list with the most conservative threshold (= FIB index >= 30), it still includes Trump and Musk.
We should also decide (3) on the WTA option (scale vs. open-ended). The current Wave 2 survey file has the scale version. Please refer to this note for more information on the WTA distribution (it also has information on recruitment mode, which should help us decide (4) whether we should run further recruitment experiments): https://do-won.github.io/design/verasight_wave1.html
Lastly, if we are going to move on to Wave 2 without spending more time on recruitment, we should decide (5) on the muting design (100% vs. 70%) and (6) on the Wave 3 exposure questions. Once Wave 2 starts, we cannot stop or interrupt the deployment instance, so all code related to Waves 2 and 3 must be ready beforehand.
6 Additional | Is it all Musk?
library(readr)

# Follow relationships between participants and low-quality accounts
new_connection_status <- read_csv("new_connection_status.csv",
col_types = cols(...1 = col_skip(),
user_id = col_character(),
target_user_id = col_character()))

# Tweets from low-quality accounts matched in participants' home timelines
new_home_match_data <- read_csv("new_home_match_data.csv",
col_types = cols(user_id = col_character(),
target_user_id = col_character()))
List1 <- read_csv("inventory_lists/NG_list.csv",
col_types = cols_only(target_user_id = col_character(),
twitter_handle = col_guess(),
Score = col_guess(),
followers = col_guess()))
List2 <- read_csv("inventory_lists/merged_30.csv",
col_types = cols_only(target_user_id = col_character(),
twitter_handle = col_guess(),
fib_index = col_guess(),
total_reshares = col_guess(),
Score = col_guess(),
followers = col_guess()))
List3 <- read_csv("inventory_lists/merged_20.csv",
col_types = cols_only(target_user_id = col_character(),
twitter_handle = col_guess(),
fib_index = col_guess(),
total_reshares = col_guess(),
Score = col_guess(),
followers = col_guess()))
List4 <- read_csv("inventory_lists/merged_10.csv",
col_types = cols_only(target_user_id = col_character(),
twitter_handle = col_guess(),
fib_index = col_guess(),
total_reshares = col_guess(),
Score = col_guess(),
followers = col_guess()))
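As a quick sanity check, the loaded list sizes should match the counts given in Section 2 (438 / 667 / 870 / 1515):

# Expected: 438, 667, 870, 1515 (see Section 2)
sapply(list(List1, List2, List3, List4), nrow)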
# Flag whether each followed account appears in List 4 (the largest superset)
new_connection_status |>
merge(List4, by="target_user_id", all.x = TRUE) |>
mutate(List4 = ifelse(!is.na(twitter_handle), TRUE, FALSE)) -> following_tb
List3 |> select(target_user_id) |> mutate(list="List3") -> List3_for_merge
List2 |> select(target_user_id) |> mutate(list="List2") -> List2_for_merge
List1 |> select(target_user_id) |> mutate(list="List1") -> List1_for_merge
following_tb |>
merge(List3_for_merge, by="target_user_id", all.x = TRUE) |>
mutate(List3 = ifelse(!is.na(list), TRUE, FALSE)) |>
select(-list) -> following_tb
following_tb |>
merge(List2_for_merge, by="target_user_id", all.x = TRUE) |>
mutate(List2 = ifelse(!is.na(list), TRUE, FALSE)) |>
select(-list) -> following_tb
following_tb |>
merge(List1_for_merge, by="target_user_id", all.x = TRUE) |>
mutate(List1 = ifelse(!is.na(list), TRUE, FALSE)) |>
select(-list) -> following_tb
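The three repeated merge-and-flag blocks above could be written more compactly with membership tests instead of merges; a sketch that should produce the same flags, assuming `target_user_id` uniquely identifies accounts within each list:

# Flag List1-List3 membership in one pass (equivalent to the merges above)
for (nm in c("List3", "List2", "List1")) {
  ids <- get(nm)$target_user_id
  following_tb[[nm]] <- following_tb$target_user_id %in% ids
}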
6.1 Descriptions of the tables
- `target_user_id`, `twitter_handle`: Twitter ID and handle of the low-quality account
- `fib_index`, `total_reshares`: FIB index and total reshare count from the FIB list. If an account has this information, it comes from the FIB list.
- `Score`: NewsGuard score. If an account has this information, it comes from the NewsGuard list.
- `followers`: number of followers
- `n`: count/frequency. For following, it is the number of eligible participants who follow each account. For home timeline, it is the number of direct + indirect tweets from these accounts found in participants' home timelines.

You can search and filter on the tables.
6.2 Which accounts from each list were followed by participants?
# List 1
following_tb |>
filter(List1 == TRUE) |>
group_by(target_user_id, twitter_handle,
fib_index, total_reshares, Score, followers) |>
count() |>
datatable(filter = "top", selection = "multiple",
caption = "Accounts from List 1 (NewsGuard only) followed by `n` participants")
following_tb |>
filter(List2 == TRUE) |>
group_by(target_user_id, twitter_handle,
fib_index, total_reshares, Score, followers) |>
count() |>
datatable(filter = "top", selection = "multiple",
caption = "Accounts from List 2 (NewsGuard + Fib >=30) followed by `n` participants")
6.3 Which accounts from each list were found in participants’ home timelines?
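`hometimeline_tb` is not constructed in this excerpt; a plausible sketch, assuming it parallels `following_tb` but is built from `new_home_match_data` with list membership stacked into a single `list` column (the structure is an assumption):

# Sketch (assumed structure): tag each home-timeline match with its list(s)
all_lists <- bind_rows(
  List1 |> mutate(list = "List 1"),
  List2 |> mutate(list = "List 2"),
  List3 |> mutate(list = "List 3"),
  List4 |> mutate(list = "List 4")
)
hometimeline_tb <- new_home_match_data |>
  inner_join(all_lists, by = "target_user_id")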
hometimeline_tb |>
filter(list=="List 1") |>
select(-list) |>
group_by(target_user_id, twitter_handle, fib_index,
total_reshares, Score, followers) |>
count() |>
datatable(filter = "top", selection = "multiple",
caption = "List 1 (NewsGuard only) found in home timeline")
hometimeline_tb |>
filter(list=="List 2") |>
select(-list) |>
group_by(target_user_id, twitter_handle, fib_index,
total_reshares, Score, followers) |>
count() |>
datatable(filter = "top", selection = "multiple",
caption = "List 2 (NewsGuard + Fib >=30) found in home timeline")
BUT we have to be careful about whether to use this merged list for muting, because the FIB list includes accounts such as Donald Trump (FIB index = 83) and Elon Musk (FIB index = 32).↩︎