Writeup: Comparing First Wort Hopping to 60 minute Bittering Additions

drew's picture



Ok, so this writeup is a little delayed. Blame it on my schedule or something! I've been a busy man. But here we go - the results haven't changed in the last few weeks and things are primed to re-experiment!

This is a variant of an experiment Denny's done a few times in the past, always with mixed results. Will this time be any different?

Executive Summary

I think it's safe to say this was as befuddling a set of results as I've ever dug through. We had 4 IGORs running 8 tasting sessions between them. One of the tasting sessions had an absolutely perfect guess rate (i.e. all of the tasters detected the "different" beer) - that result stands as an outlier and we made the decision to throw those results into quarantine.

That leaves the results of three IGORs in the pile - Bob Givens with 39 tasters and a roughly 77% successful ID rate, and Ryan Casey and Robert Allaway with smaller tasting panels (8 apiece), each showing non-significance. With Bob's massive tasting panel size, his results drive the overall results to show significance, meaning the tasters could reliably detect the odd beer, which would mean there's something there - right?

Well, we'll discuss that further on down the line.

The Experiment

This experiment came to us courtesy of Mr. Denny. Basically the question becomes: is there an organoleptic impact to using First Wort Hopping, or are we really just adding bittering hops in a different fashion? To test this, the IGORs brewed an extra simple Pale Ale in two versions, each with only a single hop addition - 2 oz of Cascade, either at 60 minutes or as a first wort hop addition. Could their tasters reliably tell the difference in a blind triangle test?

For the full details of the experiment, read the experiment design!

The Brew Day

Robert Allaway's Sexy Old Fashioned Brew Notes. The man's a scientist for sure!

All of our IGORs managed to have really great brew days, mostly nailing their gravities and figures dead on. Not many details to share - more pics next time! 

The Tastings

The IGORs managed to put together an impressive set of panels. All told there were 78 tasters spread out amongst 8 panels. Bob Givens managed three panels, subjecting a ton of folks to this simple Pale Ale.

Overall, the crews assembled an admirable mix of experienced and inexperienced brewers and beer drinkers as we like to see. Apparently the results of the experiment were close enough in some of the panels to drive the tasters batty. We like that.

Robert Allaway noted that he had trouble recruiting because people he can reach were frustrated by getting things "wrong" in previous tastings. It's interesting to see that reaction since there's inherently no wrong answer here. Humans need to feel smart and right, naturally, but it makes me wonder how to combat that feeling of "I'm a horrible and stupid person because I can't tell the difference." What's a good positive feedback mechanism for these folks? I'll take suggestions!

Miguel's tasting occurred across the border at the Ensenada Beer Festival. By "strict" rules, you're supposed to conduct tastings in a quiet, distraction-free environment, but whatever. I think there's incredible value in bringing beer science/education into the public realm.

Outlying Data Discussion

As mentioned in the summary, this time through our outlier data belongs to Miguel, who presented his beers to tasters at a beer festival in Mexico. Of the 23 tasters he roped into tasting (a mix of experience levels as appropriate to a general question like this) all 23 correctly guessed the odd beer. Woo! Right? 

Well, except here's the thing - a perfect score makes my sciencey radar twitch. Human beings just aren't that reliable. I swear you could put a glass of stout down between two glasses of pilsner and have St. Albert the Great himself annotate the glass with a holy glow that radiates beneficence and warmth, and some portion of your tasters are still going to choose one of the damned glasses of pilsner. (More on that at a later date!)

So just like we did with Nikki during the last experiment, we'll put Miguel's data to the side. It's unfortunate because it would make all of this even more clear, but that's the way the statistical significance thingy crumbles. And just like with Nikki, this move isn't to cast aspersions on Miguel and his brewing/testing; it's common practice to hold data out if there are questions about the test.

The Results

So let's dig into the results from our testers and see what there is to be seen. As we noted in the executive summary, the numerical results aren't mixed at all, but the results of the individual panels are mixed in such a way that it throws the overall result into a shady area of uncertainty (at least for this guy). Also as discussed, we'll be using the results without the 100% taster success, since that falls outside the standard spread you'd expect for this type of testing.

Data Set                           | Total Tasters | Successful IDs | % Correct | p-Value
Magic Threshold                    | 78            | 34             | 44%       | 0.038
All Data                           | 78            | 60             | 77%       | <0.001
Magic Threshold (without Outliers) | 55            | 25             | 45%       | 0.041
Without Positive Outlier           | 55            | 37             | 67%       | <0.001

Tasting Panel Numeric Data

IGOR           | Tasters | Successful IDs | p-Value | Notes
Robert Allaway | 8       | 4              | 0.259   | NOT significant
Ryan Casey     | 8       | 3              | 0.532   | NOT significant
Bob Givens     | 39      | 30             | <0.001  | significant (~77% successful IDs)
Miguel Loza    | 23      | 23             | <0.001  | significant - scary number
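
For anyone who wants to check the math: the p-values above are consistent with a one-sided binomial test against the 1-in-3 chance of picking the odd beer by pure guessing. Here's a minimal sketch in Python under that assumption. The function names (triangle_p_value, magic_threshold) and the 0.05 cutoff are ours for illustration, not something taken from the experiment design.

```python
from math import comb

CHANCE = 1 / 3  # odds of picking the odd beer out of three by pure guessing


def triangle_p_value(correct: int, tasters: int, chance: float = CHANCE) -> float:
    """Probability of at least `correct` right answers if everyone is guessing."""
    return sum(
        comb(tasters, k) * chance**k * (1 - chance) ** (tasters - k)
        for k in range(correct, tasters + 1)
    )


def magic_threshold(tasters: int, alpha: float = 0.05) -> int:
    """Smallest number of correct IDs whose p-value drops to alpha or below."""
    for correct in range(tasters + 1):
        if triangle_p_value(correct, tasters) <= alpha:
            return correct
    raise ValueError("no threshold reaches significance for this panel size")


# (successful IDs, tasters) copied from the panel table
panels = {
    "Robert Allaway": (4, 8),
    "Ryan Casey": (3, 8),
    "Bob Givens": (30, 39),
    "Miguel Loza": (23, 23),
}

for igor, (correct, tasters) in panels.items():
    p = triangle_p_value(correct, tasters)
    print(f"{igor}: {correct}/{tasters} correct, p = {p:.3f}")

print("Correct IDs needed from an 8-person panel:", magic_threshold(8))
```

Run as written, this reproduces the 0.259 and 0.532 for the two small panels. It also shows why small panels have a hard time: with only 8 tasters, 6 of them have to get it right before the result counts as significant.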

Tasting Panels Qualitative Data

Robert Allaway

Beer thoughts: I could not reliably tell the difference. Nice easy drinking beer, though. I might try this again with a lighter crystal and bump down the OG a bit to make a nice sessionable beer.

Aroma: sweet, malty, slightly nutty, sweet, very slight stone fruit
Flavor: mild, slightly sticky malt flavor, fairly sweet, fruity, somewhat bitter (but indiscernibly so), surprising amount of hop presence given that it only had a 60 minute/FWH addition

Experiment thoughts: Had trouble recruiting! I have a less homebrew-y crowd and I think they are getting discouraged already at the difficulty of telling the difference with these minor process variables. I suspect it would be easier to recruit if the variable resulted in a bigger difference. It would still be informative, since we could test a lot of claims that are frequently bandied about (i.e. X malt confers y flavor to beer or y hop confers z flavor...etc).

Ryan Casey

I personally tasted them blind 3 times and only guessed correctly once. The differences were minute, at best. In general people were completely puzzled with what the differences might be. Tasted blind I can barely taste a difference, but when I know which is which I actually prefer the FWH beer.

Bob Givens

The FWH beer appears to have a rounder bite and an increase of aroma. FWH also has a more bitter finish.

FWH makes for a softer bitterness using less hops to achieve the same amount of IBUs. Different hops may have a different result for higher Alpha Acid crops.

I find that FWH beers have a refined bitterness that is easier to manipulate to additional flavor (depending on the hop variety). I will continue to use FWH for my hop-forward beers.

Miguel Loza

Most everyone would describe the flavor of the hops, citrus, fruity, herbal...whether one had more bitterness or hop flavor than the other. The best was people describing how smooth the FWH sample tasted.

In order to get our data, we printed some flash cards with 5 questions: 1. More Aromatic? 2. Cloudy (yes or no) 3. Flavor 4. Differences 5. Best of both samples?

Based on the results, I was able to learn how much difference both additions make. Dependent on the style of the beer, I will be using one or the other. People seem to like both and it's only a matter of preference. It was a great experiment...Thank you!

Subjective Responses

As we noted above, the strange mix of results makes us a little wary of endorsing the overall meta score that indicates significance. Is there anything we can draw out of the successful tasters' comments to help reach a conclusion?


First Wort Hop Addition

  • Slightly more bitter (almost universal reaction from the tasters)
  • 3 panels noted a more rounded, floral character in the FWH beer; 2 noted no difference
  • More fruity, citrus hop character

60 Minute Addition

  • Sharper (2 panels)
  • More Malt Aroma evident (1 panel)

There you go - if there's any conclusion we can draw from these taster observations, it's that almost all of the successful tasters noted the FWH beer as being more bitter. Remember, this was a beer with only one hop addition! You know what would have been great? Having IBU measurements! Sounds like our next step.

From a purely hedonistic point of view, I like a note from Miguel's tasting: 

Out of the 23 samples we gave out, 12 liked the smoothness of FWH and 11 liked the bitterness and aroma that the 60 minute addition provided.

In other words, regardless of whether or not tasters could blindly tell a difference, taste preference is a coin flip, so make the beer that pleases you!

In conclusion, here's what we know - this is Denny's n'th time trying a FWH experiment. It's also the n'th time the results have been strikingly inconclusive. Jerky results! 

But what about that 67% successful ID rate? Well, here's what makes us nervous about accepting it at face value. Almost all of those successes are driven by Bob's tasting panels. He clearly had a detectable difference in the beer. Do we say "Woot! Success" because of that one set of results? The other two remaining data sets both sit much closer to crazy random happenstance. Sorry to be wishy-washy about it, but I'd feel more solid with more results in Bob's vein before saying "Yes, we have evidence this makes a real difference." So we're stuck with our current mealy-mouthed conclusion until we revisit!
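
To put a number on how much Bob's panels carry the pooled result, here's a rough leave-one-out sketch in Python. It assumes the same one-sided binomial test against a 1-in-3 guessing rate, and it lumps the panels together as if they were one big tasting, which is itself a simplification since each IGOR brewed their own beer. The helper names (triangle_p_value, pooled) are made up for illustration.

```python
from math import comb


def triangle_p_value(correct, tasters, chance=1 / 3):
    """One-sided binomial tail: chance of at least `correct` right by guessing."""
    return sum(
        comb(tasters, k) * chance**k * (1 - chance) ** (tasters - k)
        for k in range(correct, tasters + 1)
    )


# (successful IDs, tasters) for the three panels kept in the analysis
panels = {"Robert Allaway": (4, 8), "Ryan Casey": (3, 8), "Bob Givens": (30, 39)}


def pooled(excluding=None):
    """Pool every panel except `excluding`; return (correct, tasters, p)."""
    kept = [counts for name, counts in panels.items() if name != excluding]
    correct = sum(c for c, _ in kept)
    tasters = sum(t for _, t in kept)
    return correct, tasters, triangle_p_value(correct, tasters)


for left_out in (None, *panels):
    correct, tasters, p = pooled(left_out)
    label = "all three panels" if left_out is None else f"without {left_out}"
    print(f"{label}: {correct}/{tasters} correct, p = {p:.3f}")
```

Take Bob out and you're left with 7 correct out of 16, which is nowhere near significant. Take out either of the small panels and the pooled result barely budges. That's the lopsidedness that keeps us from declaring victory.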

Dr. Pivo
disgruntled tasters

It was asked:

"Robert Allaway noted that he had trouble recruiting because people he can reach were frustrated by getting things "wrong" in previous tastings. It's interesting to see that reaction since there's inherently no wrong answer here. Humans need to feel smart and right, naturally, but it makes me wonder how to combat that feeling of "I'm a horrible and stupid person because I can't tell the difference." What's a good positive feedback mechanism for these folks? I'll take suggestions!"

This is a problem of wounded pride, and should be taken seriously. I usually let failed tasters know that it is a known fact that girls have better tasting acuity than boys, and that our taste buds atrophy with age. If they still look disappointed and their thoughts haven't wandered in the more positive direction I am trying to lead them, you can always add: "You must be a REAL man!" or maybe "Wow! You must have a lot of testosterone coursing through you."

You'll have to make up something yourselves to mollify the ladies that fail, but in my experience that gender generally places less pride in this kind of thing, and can take it for what it is... that you are genuinely interested in their opinion.