Understanding the Fizz-Buzz-Whatever problem with R
September 15, 2018 - 8 minutes
algorithms baseR runtimeIntro
The “Fizz-Buzz” problem is a classic of the computer scientist interview. It is designed to see if an [applicant can actually code](https://blog.codinghorror.com/why-cant-programmers-program/. It is grounded in CS algorithm education and has become increasingly popular since its introduction in 2007. In a recent data science interview, I encountered this problem for the first time.
I struggled to give the “right” answer, flailing away in the Google document I was granted as an “editor.” So this post is my retrospective solution, written with R
in an actual IDE (RStudio), to understand and share the algorithm complexity discussion at the core of this question.
Fizz, Buzz, Whatever
The problem space starts off with an array of integers.
integers <- 1:15
Then a set of rules for each integer is given:
- if evenly divisible by 3, replace with “fizz”.
- if evenly divisible by 5, replace with “buzz”.
And a meta-rule:
- if rules overlap, replace with the concatenation.
So 15 would get replaced by “fizzbuzz”.
With only these three rules, it is tempting to code this with conditionals. This is the “wrong” way but is exactly what I during the interview.
for (i in 1:length(integers)) {
if ( (i %% 3 == 0) & (i %% 5 == 0) ) {
integers[i] <- "fizzbuzz"
}
else if ( (i %% 3 == 0) ) {
integers[i] <- "fizz"
}
else if ( (i %% 5 == 0) ) {
integers[i] <- "buzz"
}
}
integers
## [1] "1" "2" "fizz" "4" "buzz" "fizz"
## [7] "7" "8" "fizz" "buzz" "11" "fizz"
## [13] "13" "14" "fizzbuzz"
But then the interviewer asked me to add a new rule for integer replacement:
- if evenly divisible by 7, replace with “whatever”.
My stressed brain said, “Just keep adding conditionals”. Because that will work right?
integers <- 1:105
for (i in 1:length(integers)) {
if ( (i %% 3 == 0) & (i %% 5 == 0) && (i %% 7 == 0)) {
integers[i] <- "fizzbuzzwhatever"
}
else if ( (i %% 5 == 0) && (i %% 7 == 0)) {
integers[i] <- "buzzwhatever"
}
else if ( (i %% 3 == 0) && (i %% 7 == 0)) {
integers[i] <- "fizzwhatever"
}
else if ( (i %% 3 == 0) && (i %% 7 == 0)) {
integers[i] <- "fizzwhatever"
}
else if ( (i %% 3 == 0) && (i %% 5 == 0)) {
integers[i] <- "fizzbuzz"
}
else if ( (i %% 3 == 0) ) {
integers[i] <- "fizz"
}
else if ( (i %% 5 == 0) ) {
integers[i] <- "buzz"
}
}
integers
## [1] "1" "2" "fizz"
## [4] "4" "buzz" "fizz"
## [7] "7" "8" "fizz"
## [10] "buzz" "11" "fizz"
## [13] "13" "14" "fizzbuzz"
## [16] "16" "17" "fizz"
## [19] "19" "buzz" "fizzwhatever"
## [22] "22" "23" "fizz"
## [25] "buzz" "26" "fizz"
## [28] "28" "29" "fizzbuzz"
## [31] "31" "32" "fizz"
## [34] "34" "buzzwhatever" "fizz"
## [37] "37" "38" "fizz"
## [40] "buzz" "41" "fizzwhatever"
## [43] "43" "44" "fizzbuzz"
## [46] "46" "47" "fizz"
## [49] "49" "buzz" "fizz"
## [52] "52" "53" "fizz"
## [55] "buzz" "56" "fizz"
## [58] "58" "59" "fizzbuzz"
## [61] "61" "62" "fizzwhatever"
## [64] "64" "buzz" "fizz"
## [67] "67" "68" "fizz"
## [70] "buzzwhatever" "71" "fizz"
## [73] "73" "74" "fizzbuzz"
## [76] "76" "77" "fizz"
## [79] "79" "buzz" "fizz"
## [82] "82" "83" "fizzwhatever"
## [85] "buzz" "86" "fizz"
## [88] "88" "89" "fizzbuzz"
## [91] "91" "92" "fizz"
## [94] "94" "buzz" "fizz"
## [97] "97" "98" "fizz"
## [100] "buzz" "101" "fizz"
## [103] "103" "104" "fizzbuzzwhatever"
Wrong. Well, technically you could but its a PIA.
If you have two rules, you need three conditionals. If you have three rules, you need seven conditionals. If you have four rules, you need twenty-five!
The trap of conditionals is that the operations expand factorially, # conditionals = rules! + 1
. In big O notation that is O(n!)
, which is literally the worst O
possible. For 12 rules you would need 479,001,600 conditionals!
This ggplot
compares the common big O values to appreciate the poor performance of O(n!)
.
And that is the “goal” of this exercise, to see if you can find a solution with algorithm efficiency in mind.
In my data science experience the consideration of run time is at the end of product development. First we actually have to discover a solution, to an unknown problem. Only once we have something vialbe, then we can go back to re-factor and optimized code for memory/CPU usage.
Doing it right
To work around this conditional computation nightmare, you could use string concatenation. Starting with empty strings and doing subsequent string joins for each rule, will reduce the time-complexity to O(n)
.
integers <- 1:105
results <- rep("", length.out = length(integers))
fizz_idx <- integers %% 3 == 0
results <- ifelse(fizz_idx, paste0(results, "fizz"), results)
buzz_idx <- integers %% 5 == 0
results <- ifelse(buzz_idx, paste0(results, "buzz"), results)
whatever_idx <- integers %% 7 == 0
results <- ifelse(whatever_idx, paste0(results, "whatever"), results)
integer_idx <- results == ""
results <- ifelse(integer_idx, integers, results)
results
## [1] "1" "2" "fizz"
## [4] "4" "buzz" "fizz"
## [7] "whatever" "8" "fizz"
## [10] "buzz" "11" "fizz"
## [13] "13" "whatever" "fizzbuzz"
## [16] "16" "17" "fizz"
## [19] "19" "buzz" "fizzwhatever"
## [22] "22" "23" "fizz"
## [25] "buzz" "26" "fizz"
## [28] "whatever" "29" "fizzbuzz"
## [31] "31" "32" "fizz"
## [34] "34" "buzzwhatever" "fizz"
## [37] "37" "38" "fizz"
## [40] "buzz" "41" "fizzwhatever"
## [43] "43" "44" "fizzbuzz"
## [46] "46" "47" "fizz"
## [49] "whatever" "buzz" "fizz"
## [52] "52" "53" "fizz"
## [55] "buzz" "whatever" "fizz"
## [58] "58" "59" "fizzbuzz"
## [61] "61" "62" "fizzwhatever"
## [64] "64" "buzz" "fizz"
## [67] "67" "68" "fizz"
## [70] "buzzwhatever" "71" "fizz"
## [73] "73" "74" "fizzbuzz"
## [76] "76" "whatever" "fizz"
## [79] "79" "buzz" "fizz"
## [82] "82" "83" "fizzwhatever"
## [85] "buzz" "86" "fizz"
## [88] "88" "89" "fizzbuzz"
## [91] "whatever" "92" "fizz"
## [94] "94" "buzz" "fizz"
## [97] "97" "whatever" "fizz"
## [100] "buzz" "101" "fizz"
## [103] "103" "104" "fizzbuzzwhatever"
Only 4 ifelse
array updates are needed, instead of the 7 from conditional-land. Now we have to touch the vector # rules + 1
times or O(n)
. Because bigO
is estimated as n
approaches infinity, the constant gets dwarfed by # rules
and is dropped to simplify the formula.
While this code is good in terms of operation time, it does suffer from pattern repetition. And when writing code you don’t want to repeat yourself.
Polishing things up
So, to go all the way here and write some real “good”" code, I’m re-packaging the rules as key-value pairs. The function replace_with_rules()
will handle the string-mashing and iteration, so we can replace the previously repetitious code with a loop.
rules <- c("fizz" = 3, "buzz" = 5, "whatever" = 7)
replace_with_rules <- function(values, rules) {
container <- rep("", length.out = length(values))
for (name in names(rules)) {
idx <- values %% rules[name] == 0
container <- ifelse(idx, paste0(container, name), container)
}
idx <- container == ""
container <- ifelse(idx, values, container)
return(container)
}
replace_with_rules(1:105, rules)
## [1] "1" "2" "fizz"
## [4] "4" "buzz" "fizz"
## [7] "whatever" "8" "fizz"
## [10] "buzz" "11" "fizz"
## [13] "13" "whatever" "fizzbuzz"
## [16] "16" "17" "fizz"
## [19] "19" "buzz" "fizzwhatever"
## [22] "22" "23" "fizz"
## [25] "buzz" "26" "fizz"
## [28] "whatever" "29" "fizzbuzz"
## [31] "31" "32" "fizz"
## [34] "34" "buzzwhatever" "fizz"
## [37] "37" "38" "fizz"
## [40] "buzz" "41" "fizzwhatever"
## [43] "43" "44" "fizzbuzz"
## [46] "46" "47" "fizz"
## [49] "whatever" "buzz" "fizz"
## [52] "52" "53" "fizz"
## [55] "buzz" "whatever" "fizz"
## [58] "58" "59" "fizzbuzz"
## [61] "61" "62" "fizzwhatever"
## [64] "64" "buzz" "fizz"
## [67] "67" "68" "fizz"
## [70] "buzzwhatever" "71" "fizz"
## [73] "73" "74" "fizzbuzz"
## [76] "76" "whatever" "fizz"
## [79] "79" "buzz" "fizz"
## [82] "82" "83" "fizzwhatever"
## [85] "buzz" "86" "fizz"
## [88] "88" "89" "fizzbuzz"
## [91] "whatever" "92" "fizz"
## [94] "94" "buzz" "fizz"
## [97] "97" "whatever" "fizz"
## [100] "buzz" "101" "fizz"
## [103] "103" "104" "fizzbuzzwhatever"
Outro
I learned a lot about classic computer science interview questions from this writing this post. But I am not sure the value of this type of interview. To me writing good code is something much bigger (and harder) than “pseudo-coding”. I can empathize with employees who are trying to quickly evaluate candidates, but I think rehashing design patterns is a good metric of communication and creative thinking skills.
Instead asking about a candidate had to re-factor a piece of code, or share the design details of a personal project would be more useful. These questions shift the focus from the technical details to the conceptual.
At the very least technical coding during job interviews should be done in the candidates editor of choice. Good coders will have no problem showing people their work on the fly, but I have never met a coder who picks Google Docs over VIM or Emacs!