Introduction

Each year, over 100,000 prospective law students take the Law School Admission Test, or LSAT. It's a rigorous test that measures students' skill in reading comprehension, logical reasoning, and analytical reasoning. While there are many factors at play in law school admissions, scoring well on the LSAT can boost a student's chances of admission and can make the difference between going to the school of one's dreams or not pursuing law at all.

As such, any tip or strategy that can give test-takers an edge on the test is hugely valuable. Of course, the easiest way to improve one's score is to actually study for the exam, but we are interested in whether there is a strategy for selecting an answer that is more likely to be correct than a random guess. Looking at past LSAT exams, we find no statistically significant patterns or trends that would let a student “guess” the answer to a multiple-choice question with any advantage over guessing at random. This includes selecting the longest or shortest answer, picking a particular letter answer over the others, or selecting an answer based on the sequence of previously correct answers.

Data

The primary data source is a collection of past official LSAT prep tests from the years 2003 through 2006. Of these, we looked at the four most recent tests, from June 2005, October 2005, December 2005, and June 2006. Every LSAT consists of four sections, each with roughly 25 five-choice multiple-choice questions. The four sections cover passage-based reading comprehension, word-puzzle-based logical reasoning, and situation-based analytical reasoning, plus an additional section drawn from one of those three categories.

Limitations:

While the tests we have access to were very valuable to our analysis, there are some limitations. For starters, the tests are relatively old, having been released roughly 15 years ago. While the format of the LSAT hasn't changed significantly since then, it's possible that the way specific questions and answers are worded has shifted. Also, while these questions come from an officially recognized test prep partner for the LSAT (Cambridge LSAT), they are not tests that were actually administered, and there may be a difference between how questions on these prep tests are written and how actual LSAT questions are written. With this in mind, we will treat these past prep questions as stand-ins for actual LSAT questions.

Data Cleaning:

The past LSAT tests were released in PDF form, so a significant amount of cleaning needed to be done in order to compile a workable dataset suitable for analysis. First, we copy-pasted the content of these tests into specially formatted text files (see LSAT_Dec_2005.txt for an example) so that we could later transform them into tabular data. With over 400 questions across 4 tests, and with some PDF formatting issues, this process was quite time-intensive. We then used a custom Python script to convert these text files into CSV files with information about each section of the tests, each question of the tests, and each answer A-E to every question (available as cleaner.ipynb).
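For reference, the R analysis that follows assumes the cleaned question-level CSV can be read into a questions data frame roughly as sketched below; the file name lsat_questions.csv is a placeholder, and only the section_kind and correct_letter columns are actually relied on later.

library(tidyverse)

# Read the question-level CSV produced by cleaner.ipynb ("lsat_questions.csv"
# is a placeholder name; adjust to the actual output file).
questions = read_csv("lsat_questions.csv")

# Inspect the columns, including section_kind and correct_letter.
glimpse(questions)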

Methods

Question Length

The first question we looked to answer was: Are you more likely to get an answer correct by selecting either the longest or shortest answer to a question than by guessing completely at random?

In terms of a hypothesis test, we state this as:

Let \(c\) denote the event that a question is answered correctly. Then

\[ H_0: P(c \mid \text{Strategy}) = P(c \mid \text{Guess}) = 0.2 \qquad H_a: P(c \mid \text{Strategy}) > 0.2 \]

To answer this question and test this hypothesis, we created multiple 95% confidence intervals for the true probability of getting a question correct given that you employ a particular guessing strategy. We consider a guessing strategy to “beat” guessing at random if the entire CI for that strategy lies above 0.2, because for a 5-answer question the probability of getting it correct by guessing at random is 1/5 = 0.2. Equivalently, if 0.20 falls inside the 95% CI, then the p-value for the hypothesis test is greater than 0.05 and we fail to reject the null hypothesis. The 5 guessing strategies we used were:

  • Guess an answer completely at random (Baseline)
  • Guess the answer(s) with the most words
  • Guess the answer(s) with the most characters
  • Guess the answer(s) with the fewest words
  • Guess the answer(s) with the fewest characters

The reason that our CIs are posed in terms of the probability of a correct answer is that there are often multiple answers to a question with the same number of characters or words. For example, if two answers have 8 words, all other answers have fewer than 8 words, and one of those two longest answers is the correct one, then the probability of getting that question correct using the “Most Words” strategy is 1/2 = 0.5.
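As an illustration, the following is a minimal sketch of the kind of calculation behind these intervals. It is not the make_length_95_chart implementation itself, and the answer-level data frame answers with columns question_id, n_words, and is_correct is assumed for this example.

library(tidyverse)

# answers: one row per answer choice, with assumed columns
#   question_id, n_words (word count of the answer), is_correct (TRUE/FALSE)
p_most_words = answers %>%
  group_by(question_id) %>%
  filter(n_words == max(n_words)) %>%           # keep the longest answer(s), ties included
  summarise(p_correct = mean(is_correct)) %>%   # e.g. 0.5 if one of two tied answers is right
  pull(p_correct)

# Normal-approximation 95% CI for the mean probability of a correct "Most Words" guess
mean(p_most_words) + c(-1.96, 1.96) * sd(p_most_words) / sqrt(length(p_most_words))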

Answer Letter

For “Answer Letter” we answer the question of whether you can guess strategically by picking a particular answer letter (A, B, C, D, or E) over the others. To answer this, we run 10,000 Monte Carlo experiments, each simulating a random answer sheet of 401 questions and recording the proportion of times a particular letter is correct. We then compare the observed proportion for each letter against this simulated distribution to assess statistical significance.

We chose Monte Carlo simulation and a significance test because they let us assess whether the LSAT favors one particular answer letter over the others; a large number of randomly generated answer sheets should not favor any one letter.

Sequence of Previous Answers

Ever taken a standardized test, taken a step back, and realized you just put several of the same answer letter in a row? That's the phenomenon we will test for the LSAT. There are many ways we could have tested a strategy of picking an answer based on the previous answers. For our experiment we test the following: assuming you are confident that the previous answer is correct, is picking the same letter again more advantageous than guessing at random? Similar to the last topic, we run 1,000 Monte Carlo experiments to determine the statistical significance of this strategy. Our null and alternative hypotheses are given in the Results section.

Results

Question Length

reading_questions = questions %>% filter(section_kind == "reading")
analysis_questions = questions %>% filter(section_kind == "analysis")
logical_questions = questions %>% filter(section_kind == "logic")

First, we looked at the efficacy of the four guessing strategies across all sections of all tests.

make_length_95_chart(questions, 
                     "95% CIs for P(Correct Guess) among guess strategies.")

The figure depicting the 95% confidence intervals shows that we cannot conclude that the probability of a correct guess under any of the 4 experimental guess strategies differs significantly from guessing at random, and thus we fail to reject our null hypothesis.

While there may not be any guessing strategy that confers an advantage across the whole test, there may be some strategy that is advantageous when completing a particular section of the LSAT. We examine the guess strategies’ efficacy across different test sections below.

make_length_95_chart(reading_questions, 
                     "95% CIs for P(Correct Guess) among guess strategies. [Reading Comprehension]")

make_length_95_chart(logical_questions, 
                     "95% CIs for P(Correct Guess) among guess strategies. [Logical Reasoning]")

make_length_95_chart(analysis_questions, 
                     "95% CIs for P(Correct Guess) among guess strategies. [Analytical Reasoning]")

It can be seen that there are observable differences in the effectiveness of the guess strategies across the different LSAT sections. However, there is no evidence that any of the 4 guessing strategies is better than guessing at random in any LSAT section.

The one thing we do have evidence for concerns the analytical reasoning section: the probability of getting a question correct when choosing from among the shortest answers, whether by number of words or by number of characters, is less than 0.2, meaning these are worse strategies than guessing at random.

Answer Letter

We first extract the proportion of questions for which each answer letter is correct.

questions %>% 
  select(correct_letter) %>% 
  count(correct_letter) %>% 
  mutate(probability_correct = n / nrow(questions))
 # # A tibble: 5 x 3
 #   correct_letter     n probability_correct
 #   <chr>          <int>               <dbl>
 # 1 A                 80               0.200
 # 2 B                 75               0.187
 # 3 C                 81               0.202
 # 4 D                 88               0.219
 # 5 E                 77               0.192

This shows us that B is the least likely correct answer and D is the most likely correct answer. We will examine these two probabilities.

We will simulate the proportion of 401 questions for which a single letter is correct and determine whether B is correct significantly less often, and D significantly more often, than expected.

# Simulate random proportion a singular letter is correct
simulate_X = function() {
  # Assume 401 questions
  return(sum(rbernoulli(n = 401, p = .2)) / 401)
}

# Compare random proportion to given probability
check_if_X_in_A = function(X, is_greater, value) {
  # is_greater = TRUE means compare with >=, otherwise <=
  if (is_greater)
    return(X >= value)
  else 
    return(X <= value)
}

# Run monte carlo to determine p-value of a given probability
mc = function(r, value, is_greater) {
  monte_carlo = data.frame(replicate = 1:r, 
                           X = rep(NA,r), 
                           X_in_A = rep(NA, r)) 
  for(i in 1:r){
    monte_carlo$X[i] = simulate_X()
    monte_carlo$X_in_A[i] = check_if_X_in_A(monte_carlo$X[i], is_greater, value)
  }
  
  monte_carlo = as_tibble(monte_carlo)
  return(mean(monte_carlo$X_in_A))
}
# Answer D
mc(10000, is_greater = T, 0.2194514)
 # [1] 0.1446
# Answer B
mc(10000, is_greater = F, 0.1870324)
 # [1] 0.2343
  • The p-value for D's probability is >0.05. We interpret this to mean that D is not correct significantly more often than the other answers on the LSAT.
  • The p-value for B's probability is >0.05. We interpret this to mean that B is not correct significantly less often than the other answers on the LSAT.

Since the observed probabilities for the remaining letters fall between the values for answers B and D, we can assume the rest of the letters are likewise not significantly more or less likely to be correct than the others.
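As a quick cross-check (not part of the Monte Carlo analysis above), an exact binomial test on the most frequently correct letter points to the same conclusion:

# Exact binomial test: is D (88 of 401 questions) correct significantly
# more often than the 1-in-5 rate expected under random placement?
binom.test(x = 88, n = 401, p = 0.2, alternative = "greater")

Like the simulated p-value, the resulting p-value is well above 0.05.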

Sequence of Previous Answers

We determine how many times a streak occurs, specifically the number of times a correct answer's letter is the same as that of the question before it. Assumption: we will assume streaks carry over from section to section and test to test.

# df - formatted questions dataframe
calculateStreak = function(df){
  # Count how many times the correct letter matches the previous question's
  streak_count = 0
  for(i in 2:nrow(df)) {
    if(df[i, "correct_letter"] == df[i - 1, "correct_letter"]) {
      streak_count = streak_count + 1
    }
  }
  return(streak_count)
}
calculateStreak(questions)
 # [1] 66

The questions contain 66 streaks. We will see whether this is a statistically significant number of streaks by running Monte Carlo on randomly simulated answer sheets. Our null hypothesis is that the LSAT answer key behaves like a randomly generated answer sheet; the alternative is that correct letters repeat less often than they would at random. We estimate the p-value as the proportion of simulated answer sheets with a streak count of 66 or less.
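Stated formally, in the same style as the earlier hypothesis test (our formalization, where \(c_i\) denotes the correct letter of question \(i\)):

\[ H_0: P(c_i = c_{i-1}) = 0.2 \qquad H_a: P(c_i = c_{i-1}) < 0.2 \]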

simulate_X = function() {
  # Assume 401 questions
  # We will represent letters as numbers for simplicity.
  answers = floor(runif(401, min=1, max=6))
  df = data.frame("correct_letter" = answers)
  return(calculateStreak(df))
}

# Proportion of randomly generated answer sheets with a streak count of 66 or fewer
mc(1000, value = 66, is_greater = F)
 # [1] 0.048

The p-value of our null hypothesis is less than 0.05! This means a streak count of 66 is significantly low: randomly generated answer sheets typically have MORE streaks. Put another way, if the tests were truly random, you would expect the answer to question \(i\) to be the same as question \(i-1\) 20% of the time, or about \(400 \times 0.2 = 80\) of the 400 adjacent question pairs; it actually happens only \(\frac{66}{400} \approx 16.5\%\) of the time, and the hypothesis test shows that this difference is statistically significant. Therefore, on the LSAT, if you're confident you have the previous answer correct, you should NOT pick the same letter answer.

Conclusion

Summary

Again, we did not find many ways to beat random guessing on LSAT questions. This includes the “urban myths” of selecting a particular letter for a question or selecting the longest or shortest answer. For such strategies, we showed that the probability of getting an answer correct when applying the strategy is no different from guessing at random, by running hypothesis tests and constructing confidence intervals as shown above. However, by looking at sequences of questions, we determined that one can do better than random guessing by choosing an answer letter different from the one before, provided one is confident in that previous answer. This was determined by comparing our actual LSAT tests to randomly generated simulated tests.

Limitations:

While the tests we have access to were very valuable to our analysis, there are some limitations. For starters, the tests are relatively old, having been released roughly 15 years ago. While the format of the LSAT hasn't changed significantly since then, it's possible that the way specific questions and answers are worded has shifted. Also, while these questions come from an officially recognized test prep partner for the LSAT (Cambridge LSAT), they are not tests that were actually administered, and there may be a difference between how questions on these prep tests are written and how actual LSAT questions are written. With this in mind, we treated these past prep questions as stand-ins for actual LSAT questions.

Directions for further work.

Other than collecting more (and more recent) data, there are some other areas we could have explored.