Writeup: Hop Whirlpool - Does steeping at lower temperature improve final hop character?

drew's picture



Gravity Reading (courtesy Jason Mundy)
Second experiment in the can and this time we're looking at everyone's favorite beer ingredient - hops. Ok, every "Murican" Craft Beer Drinker's favorite ingredient. This experiment was suggested by IGOR Klickitat Jim, who, like a great many of us, is trying to find better ways to shove more hop aroma into their glass. In this case, we wanted to test the notion - does performing your hop whirlpool at a lower temperature produce a different hop character than steeping at a higher temp?

Executive Summary of Results

What? You don't want to read how things went? You just want the results? Geez.. ok, be that way. The high level results for this round of testing - yes, tasters in the trials were able to pick out the "different" beer. That's not surprising - but what about preference and noticeable organoleptic differences? Well, that's a bit more confusing. Read on!

The Experiment

Read the initial writeup for the closer details, but at a high level, we asked the IGORs to brew a batch of Jim's Whirlpool Pale - which uses a neutral bittering hop followed by doses of Centennial and Cascade. Hey, they're classics! The IGORs bittered the batch and then chilled the wort to 170F. They split the batch in half and started whirlpooling one half at 170F while chilling the other half to 120F before whirlpooling it. After each half whirlpooled for 30 minutes, they finished chilling the wort and prepared for fermentation. So in other words - two portions with a 30 minute whirlpool - one at 170F, one at 120F. Fermented the same and packaged the same. Drop them in front of the tasters and see if they can tell the difference! Easy peasy!

Jason's Buckets ready for the fermenting. Pretty slick!

The Experimenters

We had seven IGORs report in their results before the podcast deadline. We'd like to thank Robert Allaway, James Bird, Ryan Casey, Nicki Forster, Jason Mundy, Casey Price and Tristan Smith for taking their brew time to make the beer and harass their friends! (You can always see how many experiments people have participated in here). Mr. Bird also wins the prize for our first International IGOR!

The Brews

All in all the brew days were pretty unspectacular - no limbs lost. A few substitutions were necessitated by brew day mis-supplies, but nothing out of the norm - all the batches ended up in the target gravity/IBU ranges and off they went to their respective homes. I'm really going to have to think about how we design some pyrotechnics into our experiments to necessitate something truly adult, like insurance.

Jason's Brewtech Brew Buckets with hops and wort

Reading through the notes, it was interesting to see that some of the challenges centered around keeping the 170F wort from dropping too far. Ryan Casey had the easiest time because he used his PicoBrew (hey sponsor!) to help maintain all of his temps. One piece of amusement for me - how many of our IGORs, besides being amazing brewers, still keep old fashioned brew logs? I know Denny does - quick, ask him for a batch number < 500 and he can tell you what it was and what he thought of it. (It's a sickness for the man.) Me? I'm a computer geek, so all my notes are on my computer or in articles like this!

Jason's Brew Log

Robert Allaway's Brew Notebook

The Tastings

Our seven reporting IGORs performed 14 tasting sessions with a total of 68 tasters. (Nice group of tasting panels, gang.) Overall, a really nice mix of experiences, knowledge levels and whatnot. Casey Price enlisted a few partners in crime remotely. In the spirit of being a family effort, Casey enlisted his wife to serve as beer courier. The crew video'd their tasting panel, which I've embedded below. (Tasting notes begin around the 11 minute mark.) Together they're hosting a podcast - http://www.haveadrinkshow.com/. More beer content!

James Bird began what I suspect will be a trend for us - electronic data gathering of taster reactions. He also shared his most surprising finding - it's really difficult for him to find people to drink free beer! Nicki took advantage of the great national holiday of the "Big Game" and shanghai'd her party attendees and made them test beer samples before they could get too engrossed in drinking, eating and betting on the game. I, for one, applaud the efficient use of captive audience members.

Outliers - Too Much of A Good Thing?

So in our last experimental tasting we had a classic outlier scenario - the trial gone wrong. That's the one that makes you stop and grumble and chuck the results. In this experiment, we got a different sort of head scratcher, courtesy of our big game tasters - this time we got results that almost seem too good to be true. Nicki's panel had an astonishing 86% of the tasters score a correct identification. At some point we'll have to do a taste-off, but if you take Marshall's assertion that you could serve a group of tasters a lager and a porter and still have misses, you gotta raise an eyebrow when things look too good. We reached back out to Nicki for more information. Was it possible the samples had some tell? Were we back in the realm of "one of these beers has an obvious flaw"? Nicki went back through her notes and observations - no cross talk amongst the tasters, no apparent tells in service, etc. In her words:

  • Each taster was poured three 2 oz. samples. I had two sample sets: sample A & sample B, so the tasters were not all sampling things in the same order. I had to use 2 oz. clear plastic sample cups because I don't have enough 4+ oz. sample glasses to go around.
  • The only directions I gave were to taste the samples, be sure to put them back where they came from, and then identify the one that was different.
  • After that was done, I asked them to record their tasting impressions on the sheet (very few actually did - thanks, Super Bowl) and even included a tasting sheet for possible aroma and flavor vocabulary on the reverse side.

Nicki even went back and waited for her work day to end before tackling the hard task of tasting her beers again. See the photo for yourself. They look pretty damn similar. All in all, her notes show beers that seem very similar except in the hop aroma/flavor department - which you'd expect from this. That points toward her results being accurate, just spooky weird. Science hates spooky weird, but sometimes things are spooky. We'll show you the stats in both cases though. Nicki's also enrolled in a Brewing Science program and wants to pull this experiment off again - only this time, instead of using a tasting group that mostly doesn't know the brewing process (only four of her tasters were brewers, but they were all drinkers), she'll use her classmates. Can't wait to see what happens!

FYI - Why aren't we having as much of a fit about Jason's numbers (see below), where his tasters only got an 18% hit rate? His numbers are much closer to what you'd expect from random chance (33%) and therefore aren't as eye-popping.

The Results

All righty - so we spilled the beans above, but let's look at the final numbers. With all the test runs in the mix, the aggregate results show a significant finding that yes, tasters can detect the difference between beers with a whirlpool done at 170F vs. 120F. If we subtract the super positive results from Nicki's run, the p-Value still shows significance. We had exactly the number of positive identifications needed to keep the p-Value below our magic threshold of 0.05. Skin of our teeth doesn't even begin to describe it! But really, is that truly surprising?

                                    Total Tasters  Successful ID's  %age Correct  p-Value
Magic Threshold                     68             30               44%           0.0296
All Data                            68             36               53%           0.0003
Magic Threshold (without Outliers)  54             24               44%           0.042
Without Positive Outlier            54             24               44%           0.042
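
For anyone who wants to poke at numbers like these, here's a minimal sketch of one common way to score a triangle test: an exact one-sided binomial tail, assuming each taster has a 1-in-3 chance of guessing the odd beer out. The writeup doesn't say which calculation produced the p-Values in the table, so treat this as an illustration only. The function name `triangle_p_value` is made up for this example, and its output need not match the table.

```python
from math import comb


def triangle_p_value(tasters: int, correct: int, chance: float = 1 / 3) -> float:
    """Probability of at least `correct` right answers from pure guessing.

    Assumption: an exact one-sided binomial tail with a 1-in-3 guess rate.
    The original writeup does not state its method, so results may differ.
    """
    return sum(
        comb(tasters, k) * chance**k * (1 - chance) ** (tasters - k)
        for k in range(correct, tasters + 1)
    )


# Aggregate rows from the table above: (total tasters, successful IDs).
rows = {
    "All Data": (68, 36),
    "Without Positive Outlier": (54, 24),
}

for label, (tasters, correct) in rows.items():
    p = triangle_p_value(tasters, correct)
    print(f"{label}: {correct}/{tasters} correct, p = {p:.4f}")
```

The smaller the value, the harder it is to explain the hit count as lucky guessing, which is the same idea behind the 0.05 threshold used here.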

Tasting Panel Numeric Data

IGOR            Tasters  Successful ID's  p-Value
Robert Allaway  10       3                0.588 (NOT significant)
James Bird      17       9                0.043 (significant)
Ryan Casey      6        4                0.042 (significant)
Nicki Forster   14       12               0.000 (significant - scary number)
Jason Mundy     11       2                0.857 (NOT significant)
Casey Price     4        2                0.240 (NOT significant)
Tristan Smith   6        4                0.042 (significant)
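
As a quick cross-check, the aggregate rows can be rebuilt from the per-panel counts above. This is just a sketch: the names `panels` and `tally` are mine, and the only data used is the tasters and successful ID's columns.

```python
# Per-panel counts from the table above: IGOR -> (tasters, successful IDs).
panels = {
    "Robert Allaway": (10, 3),
    "James Bird": (17, 9),
    "Ryan Casey": (6, 4),
    "Nicki Forster": (14, 12),
    "Jason Mundy": (11, 2),
    "Casey Price": (4, 2),
    "Tristan Smith": (6, 4),
}


def tally(data):
    """Return (total tasters, total correct, percent correct rounded)."""
    tasters = sum(n for n, _ in data.values())
    correct = sum(k for _, k in data.values())
    return tasters, correct, round(100 * correct / tasters)


all_data = tally(panels)
# Set aside the "too good to be true" panel to mirror the outlier row.
without_outlier = tally(
    {name: counts for name, counts in panels.items() if name != "Nicki Forster"}
)

print("All data:", all_data)
print("Without positive outlier:", without_outlier)
```

Both totals line up with the summary numbers: 36 of 68 (53%) overall, and 24 of 54 (44%) with Nicki's panel set aside.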


Tasting Panels Qualitative Data

IGOR Beer Thoughts Experiment Thoughts
Robert Allaway I did not perform a triangle test but was served the beers blind. I was able to correctly identify the 120 vs 170 degree beer. I found that the 120 degree beer had somewhat greater hop presence, and the 170 seemed more muted. I particularly found that the citrus notes were greater in the 120 beer than the 170 degree. "The most experienced beer drinkers in the cohort were the 3 that identified the odd beer out. I suspect that there may be a difference between the two hopping methods, but that the difference may require a trained palate. Furthermore, I think it would be worth retrying this experiment with a very hop forward beer - I suspect any differences might be more notable there. Finally, I suspect that a good dry-hopping might overcome any potential differences in whirlpool temperature...so even if one whirlpool temperature makes a better beer, in practice it may be undetectable in hop-forward dry-hopped beers."
James Bird Decent-ish pale ale, though not entirely to my tastes (a little too malty and no dry hops...). I detected some diacetyl in the beer (10 day turnaround?) but this was not commented on by any of the 17 tasters. "I have been brewing since 2006 and in one of my first few batches I remember splitting a batch to investigate how much effect dry hopping had. I noticed a very big difference in flavour and have generously dry hopped every pale ale and IPA I have made since. So the premise of this experiment really appealed to me. In several blind tastings I could successfully identify the different beer on smell alone. To me the amount of hop aroma / flavour in the 120F whirlpool beer was far greater. There was also a noticeable difference in bitterness. Despite this, opinions of my tasters were split, with a number of tasters reporting the opposite experience (?!). Even though I considered the lower whirlpool temp beer to have a detectable increase in hop aroma / flavour, the amount of hop aroma / flavour for these beers is still not close to that achieved with dry hopping. My typical pale ale / IPA recipes have focussed heavily on flameout additions (4-6 g/L) and large dry hop additions (7-10 g/L). In future recipes I will definitely add flameout / whirlpool additions at lower temperatures for those beers where I want maximum aroma / flavour."
Ryan Casey Once the beers warmed up the taste differences were more apparent, but man, the aroma differences were astounding. I will likely be lowering my whirlpool temps from now on. This was an eye-opening experiment for me. Had I been able to tell people to focus on the aroma I think everyone would have been able to pick out the different beer.
Nicki Forster   The 170 hop charge seemed to have a bolder flavor and aroma, better overall flavor profile. I didn't sense that the lower hop charge at 120 lent a bigger hop aroma than the 170 sample. My findings were essentially the exact opposite of the hypothesis.
Jason Mundy When I transferred the beer to kegs, I thought the 120F beer had a great aroma and flavor. I thought the 170F hop character was more muted than 120F. After two weeks in the keg, I had trouble telling the difference between the two when sampling with others in the triangle test. Very tasty beer. I've never put that many hops in at the finish. I've never added hops past the 0 minute addition (other than dry hopping). It was a very nice hop forward flavor. I think that the 120F did have a better more fragrant hop character in the beginning... I don't know if that character dropped... or 170F improved... they just seemed more even over time.
Casey Price All three were a little solventy (My evaluation). We used standard 5oz squat plastic cups.  
Tristan Smith I will likely combine both beers into a keg and maybe dry hop to give this beer a more complex hop characteristic. The chilled whirlpool provided a more bitter hop taste but didn't seem to provide any added aroma.

Subjective Responses

As we noted above, the fact that the tasters could tell the difference isn't super surprising, but what did they find in terms of actual impact? Looking through these taster comments (which are only the comments from successful tasters and the IGOR experimenter), do we see any consistent trends in the tasting notes for 120F vs. 170F? Looking at the previous collection of quotes - it looks a little confusing! Here's what I see in the comments:

170F

  • Less sharpness x 2
  • More fruity aroma x 2
  • More muddled flavor
  • More bitter, crisper, strong aroma

120F

  • Sharper x 2
  • More Grapefruit Aroma
  • Less fruity flavor
  • Slightly sweeter, more smooth hop flavor
  • Better hoppier aroma

What to make of this - it seems there's a consensus that the 170F samples were less sharp. Why? In theory, they spent more time in the range where isomerization still happens. It also definitely appears that we're expressing different oils at the different temps. As for tasters' preference - 120F or 170F - well, like a great many things subjective - it was all a matter of taste. Tasters seemed to fall fairly evenly on either side of the divide. In other words, welcome to the new trick in your arsenal, but it's not going to make everyone sing hosannas on high.

Now when Jim first proposed this experiment, his initial target was "can we use this as a 'replacement' for dry hopping to get a prettier beer with an acceptable hop aroma". Seems the answer is no, but there's some value to the lower temp whirlpools. I'm curious how it changes with all of these newer, more oil heavy hops, as opposed to the traditional Cascade/Centennial charges in this recipe. More experiments!