How we test: TVs



Go behind the curtain with David Katzmaier (left) and Ty Pendlebury, your friendly neighborhood TV geeks.


Did you know CNET reviews TVs by comparing them directly in a side-by-side lineup, after each has undergone a thorough calibration? Did you know that the main instrument we use to calibrate and measure those televisions costs about $28,000? Did you know that last year we reviewed and rated 54 individual TVs and revamped our ratings system to incorporate value?

Yes, I'm biased, but I consider CNET's TV reviews the best in the business. We've come up with a set of tools and procedures designed to arrive at unbiased results by utilizing industry-accepted video-quality evaluation tools, objective testing criteria, and trained experts. The goal is to tell you which TVs are better than others, and why. Here's the complete guide to how we do it, updated in anticipation of 2013's crop of new TVs.


Test environment and equipment


The most important piece of test equipment is a trained, expert eye. Test patterns and the latest gear are no substitute for a knowledgeable, keen-eyed evaluator with a background in reviewing similar types of TVs. CNET's TV reviewers, David Katzmaier and Ty Pendlebury, have extensive experience reviewing and calibrating displays, and perform all measurements and tests themselves.

Our main TV lab is a 1200-square-foot room where we set up our comparison lineups. A curtain can divide the room in half so we can set up two different, independently light-controlled lineups at the same time. Light control is a big deal for TV testing. We have blackout shades we keep down (resulting in complete darkness) for most tests, but we can also raise them to evaluate a TV's bright-room performance. The walls are painted black and the floor and ceiling are dark gray to minimize contamination from light and maximize background contrast.

Our primary mechanical test device is a Konica Minolta CS-2000 spectroradiometer (right; about $28,000), which replaced an older CS-200 in June 2008. The CS-2000 improves upon the CS-200 in its capability to measure low-luminance sources, and is regarded as one of the most-accurate devices of its kind. It measures luminance and color from any type of display, including plasma, LCD and LED-based flat-panels, as well as projectors and even rear-projection TVs (if they return from the dead).

Here's a list of our other test equipment and hardware:

  • Current reference displays: A reference display provides the best baseline possible to compare various aspects of TV performance. CNET uses the Pioneer Elite Kuro PRO-111FD, which as of early 2013 still produces the best overall 2D picture quality we've tested. In 2013 we will also use the Panasonic TC-P65VT50 and Sharp Elite PRO-60X5FD as references for comparison with other high-end TVs, as well as the Samsung UN55ES8000 for 3D reference. We also use other, lesser TVs as references for mid-range and budget lineups.


  • Quantum Data 780: A signal generator that outputs a variety of test patterns at various resolutions and formats, including all HDTV resolutions, 1080p and 3D, via HDMI. As of early 2013, this is the primary generator we use for calibration and evaluation.

  • AV Foundry VideoForge: Our secondary test pattern signal generator, with similar capabilities to the Quantum Data. Depending on the test we're performing, we may use it instead.

  • Key Digital 1x8 HDMI distribution amplifier, Key Digital 4x1 HDMI switch: This eight-output HDMI distribution amplifier/switch combo can send any of four HDMI sources (including 3D) to as many as eight displays simultaneously without any signal degradation. We use this setup for side-by-side comparison testing. There are two such combos in our lab, one for each comparison lineup.

  • Extron DA6 YUV A: A six-output component-video/RGBHV distribution amplifier that can send one SD or HD source to as many as six different displays simultaneously without any signal degradation. We use it primarily for side-by-side comparison testing of component-video.

  • Sony PlayStation 3 Slim: Blu-ray player (reference, 3D compatible). There are two PS3s in our lab, one for each comparison lineup.

  • Oppo DV-980H: DVD player

  • Motorola QIP7232: High-definition DVR for Verizon's FIOS service. In late 2012 we upgraded from DirecTV to FIOS, which provides better picture quality on most HD channels.

  • Monoprice, Amazon Basics and Key Digital HDMI cables (Reminder: All HDMI cables are the same.)

Here's a list of the reference and test software we use:

  • CalMAN 5 Ultimate by SpectraCal: This flexible software program controls both our spectroradiometer and signal generators via a laptop PC to aid in the calibration process. It provides a step-by-step procedure for adjusting TV picture controls, including advanced grayscale and color management, according to guidelines used by the Imaging Science Foundation (ISF). Every TV CNET reviews is calibrated prior to evaluation using this procedure, and the reports and many of the numeric evaluation results at the end of the review are generated by CalMAN.

  • Digital Video Essentials: HD Basics (Blu-ray): This test disc is a secondary source for the patterns used for calibration and evaluation.
  • HQV Benchmark (Blu-ray): Patterns from this disc are used to help evaluate video processing.
  • FPD Benchmark Software for Professional (Blu-ray): Patterns from this disc are used to evaluate motion resolution.

TV review samples and series reviews

Unless noted otherwise, CNET HDTV reviews are based on one reviewer's hands-on experience with a single particular sample of one model. While our experiences are usually representative of other samples with the same name by the same manufacturer, we can't always be sure of that since performance can vary somewhat from sample to sample--particularly if newer samples receive updated firmware, or if manufacturers make changes without updating the model name. We typically review models as quickly as possible, so we often receive early versions of firmware that are sometimes corrected later. However, we never review preproduction samples. All of the samples used in CNET HDTV reviews represent, as far as we can tell, shipping models. Sometimes a firmware update will have a direct effect on the performance of a television, and thus on its final review score. When this is the case and we're made aware of it--usually after a CNET reviewer or a reader finds a performance-related problem--we'll post related follow-up information in a note referenced in the review body.

It's worth noting that CNET obtains most of its review samples directly from manufacturers, typically by an editor asking a public relations representative for the desired model. This, unfortunately, can lead to manufacturers sending nonrepresentative samples, or even tampering with the units before they are sent, to help ensure more-positive reviews. If we spot a blatant case of tampering, we'll note it in the review, but we can't always prove it (and in case you're wondering, no, we've never spotted a case of tampering that we could prove enough to mention in a review). If a manufacturer cannot ship us a sample or doesn't want us to review a particular set, we sometimes buy the model in question ourselves.

TV makers generally group their models into series, which share identical features, styling, and specifications across multiple screen sizes. In 2009, CNET's TV reviews were expanded to cover other sizes in the series, not just the one size we typically review hands-on. While we don't test these other sizes directly, we feel that the performance-related remarks, as well as other portions of the review, apply closely enough to all sizes to warrant a "series review" approach. Even so, we are careful to check with the manufacturer to make sure there aren't any "odd" members of the series to which the review wouldn't apply. Check out our in-depth explanation for more.

Test procedure

We strive to consistently test all TVs we review using the procedure below. In cases where not all of the tests are followed, we'll note the missing items in the review.

Aside from the bright-room portion of the test (see below), all CNET HDTV reviews take place in a completely darkened environment. We realize that most people don't always watch TV in the dark, but we use a dark environment ourselves for a number of reasons. Most importantly, darkness eliminates the variable of light striking the TV's screen, which can skew the appearance of the image. It makes differences in image quality easier to spot, especially perceived black-level performance, which is severely affected by ambient light. Darkness also allows viewers at home to more easily match the experiences written about by the CNET reviewer. Finally, darkness is the environment we find most satisfying for watching high-quality material on a high-performance TV.



Calibration

Before we perform formal evaluations of HDTVs, we first calibrate their picture settings, with the help of the CalMAN software, to achieve peak performance in our dark room. Though it may seem more realistic to test TVs in the default picture settings, those settings often don't represent the TV's peak picture quality. Some are designed for maximum brightness, saturation, and impact on the showroom floor. That might sound desirable, but we believe a more natural, realistic picture looks better--in other words, one that most accurately reproduces the incoming signal. Calibration also provides a level playing field for comparisons.


Unlike some of the third-party TV calibrations offered today, the ones performed for CNET TV reviews do not utilize settings in the hidden "service menus" of televisions. Nearly all TVs have these menus, and previously we would access them to better calibrate our review samples. In the last few years, however, we have posted our ideal dark-room picture settings as part of our reviews, and since users cannot typically access those service menus (at least, not without voiding the warranty), we decided to no longer use them in our calibrations. We recommend that TV viewers avoid accessing the service menus themselves, because without proper training they can do more harm than good. Happily, many new HDTVs offer ample controls to achieve optimum picture quality without having to resort to service menus. Check out this Q&A for more.

CNET TV calibrations follow a few steps, utilizing CalMAN 5 and patterns from the Quantum Data signal generator at 1080p/60 connected via HDMI to the TV.

  • Choose the picture mode (typically Movie or Cinema) and color temperature preset (typically Warm or Low) that produces the most accurate initial dim-room picture, allows full access to detailed controls and comes closest to D65, or 6500K.
  • Disable or minimize any automatic picture adjustment controls, dynamic contrast, ambient light sensors, auto black, auto color/flesh tone, or other circuits that change the picture on the fly. Engage settings, such as local dimming on LED displays, that generally improve picture quality.
  • Adjust brightness and contrast for maximum dynamic range without clipping, using the Black and White Pluge patterns.
  • Adjust maximum light output to 40 fL (footlambert) from a 100 percent window pattern. This light level is bright enough to provide excellent contrast but not be overwhelming in dim and dark rooms; it is achievable by most TVs we test.
  • Choose the gamma preset (if available) that comes closest to an average of 2.2, the standard for professional monitors.
  • Calibrate color management system, if available. We attempt to achieve proper absolute luminance for primary colors and proper hue for secondary colors, as dictated by CalMAN and the Rec709 HD color standard. CMS adjustments are made using 75 percent luminance window patterns. If CMS can't improve on default settings or introduces artifacts, we disable it.
  • Calibrate grayscale using 2-point and/or multipoint system, if available. We attempt to adjust all levels of gray, in 5 percent increments using window patterns, to come as close as possible to D65 (x=0.3127, y=0.329) while maintaining 2.2 gamma.
  • Adjust brightness, contrast, light output (luminance), color, tint, and sharpness a final time.

The results of the calibration are captured in a CalMAN report posted at the end of the review.
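
For readers more used to metric units, two of the targets in the steps above can be sanity-checked with a little arithmetic: the 40 fL light output and the D65 white point. The Python sketch below is illustrative only and is not part of CNET's or CalMAN's workflow. The function names are invented for this example, and the color-temperature estimate uses McCamy's approximation, a general-purpose formula swapped in here for illustration.

```python
import math

# One footlambert is defined as 1/pi candela per square foot.
SQUARE_METERS_PER_SQUARE_FOOT = 0.3048 ** 2
CD_M2_PER_FL = 1 / (math.pi * SQUARE_METERS_PER_SQUARE_FOOT)  # about 3.426


def fl_to_cd_m2(footlamberts):
    """Convert luminance from footlamberts to candela per square meter (nits)."""
    return footlamberts * CD_M2_PER_FL


def mccamy_cct(x, y):
    """Estimate correlated color temperature (kelvin) from CIE 1931 x, y.

    Uses McCamy's cubic approximation, which is only meant for
    chromaticities near the range of ordinary white points.
    """
    n = (x - 0.3320) / (y - 0.1858)
    return -449 * n ** 3 + 3525 * n ** 2 - 6823.3 * n + 5520.33


# The 40 fL target works out to roughly 137 cd/m2.
print(round(fl_to_cd_m2(40), 1))

# The D65 coordinates quoted above land within a few kelvin of 6500K.
print(round(mccamy_cct(0.3127, 0.329)))
```

The conversion is exact by definition; the color-temperature figure is only an estimate, which is why calibration targets are stated as x/y coordinates rather than as a kelvin value alone.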



All of our picture settings used to achieve the calibrated image are published on a post specific to each TV in CNET's picture settings forum. Each review contains a link and image (right) to that page. The picture settings are usually accompanied by detailed calibration notes as well as a link to the calibration report (see below). Users are free to reply and even submit their own picture settings. Here's an example.


Side-by-side comparison
Every HDTV CNET reviews is compared with others in the room during the evaluation. This is a direct, side-by-side comparison; the TVs are literally lined up next to one another and compared in real-time, with the reviewer recording observations on a laptop computer. We use numerous sources fed through a switch and a distribution amplifier--a device that can feed multiple TVs the exact same signal with no degradation. TVs being compared often share similar price points, screen sizes, and other characteristics, but can just as often be more or less expensive or have different characteristics to better illustrate major differences (such as between LCD and plasma, or an extremely expensive set versus a less-expensive model).

These comparisons allow CNET's reviewers to make definitive, in-context statements about virtually every area of a TV's performance, and their accuracy depends on each of the TVs sharing a level playing field. For that reason, we compare only calibrated televisions. We know of no other professional publication that regularly performs side-by-side comparisons as a part of nearly every review.


Image-quality tests

We perform a broad range of tests on all televisions we review, organized into a few key categories. Most comments in a TV review's picture quality section are based on observations of a Blu-ray movie, since Blu-ray is the highest-quality source typically available to HDTV viewers today. We use a variety of films, as opposed to one or two "reference" films, to better illustrate that performance characteristics are universal and apply regardless of which movie's being watched (they also make the reviews more fun to read and write). An argument can be made for using the same movie every time, and we do have a few scenes in certain films that we return to over and over, but in general we prefer to spread it around.

Here are the main picture quality areas addressed in CNET reviews:

  • Black level: We comment on the depth of black a TV is capable of producing. Since deeper, "blacker" blacks lead to more-realistic pictures, higher contrast, and more "pop" and color saturation, we consider black level the most important single performance characteristic of a TV. We may also talk about shadow detail, gamma and dimming-related processing in this section. Subjective observations are supported by the "Black luminance (0%)" and "Avg. gamma" measurements in the Geek Box (see below).

  • Color accuracy: We evaluate the combination of color temperature and primary and secondary color accuracy according to the Rec709 HD color standard. Subjective observations are supported by the majority of measurements in the Geek Box, everything from "Avg. grayscale error" to "Yellow error."


  • Video processing: This broad range of tests includes objective measurements such as resolution capabilities and 1080i de-interlacing and subjective tests with both patterns and real-world material. One of the most important is the ability to properly handle 1080p/24 cadence (see HDTV resolution explained for more). As of September 2008, we also began testing for motion resolution, which has both subjective and objective elements and so is usually reported as a range, e.g. "between 300 and 400 lines." If a TV has motion processing, such as 120Hz or 240Hz smoothing (dejudder), we also address its real-world effects in this section. We'll also talk about excessive video noise here, if we can trace its fault to the TV, as well as other miscellaneous issues such as false contouring (aka solarization) not dealt with elsewhere. The remainder of the Geek Box below hue is devoted to video processing.


  • Uniformity: With LCDs and rear-projection sets, we use this section to address backlight uniformity across the screen, making subjective observations with full-raster test patterns, letterbox bars and flat-color scenes, such as shots of skies, from program material. We also talk about off-angle viewing in this section, using similar material and subjective comparisons. Plasma TVs usually have effectively perfect uniformity and off-angle viewing, so we don't typically include this section in plasma reviews--but we will if the plasma's uniformity is atypical to our eye.


  • Bright lighting: We turn on the lights in our testing area and open the windows during daytime to see how the TV handles ambient light. We note the screen's reflectivity compared with its peers, as well as its ability to maintain black levels. This test is entirely subjective.


  • 3D: Our final tests involve 3D picture quality, and at the moment they're entirely subjective as well. Moreover we don't perform calibrations in 3D, although if the default "Movie" or "Cinema" settings for 3D seem particularly incorrect, we'll do some tweaking of the basic controls. In this section we usually address crosstalk, the depth effect, overall luminance, and video processing in 3D (see the 3D TV FAQ for more on these issues). We don't normally evaluate a TV's 2D to 3D conversion, however. Note that a TV's 3D picture quality is the sole item from this list that doesn't factor into the TV's numeric Performance score.

In 2012 we also stopped testing TVs with PC sources since we saw little variation in how TVs handled digital (HDMI) video from computers, and analog (VGA) computer connections are less common. Check out How to use your TV as a computer monitor if you're interested in doing so.


In early 2013 we began implementing new tests for projectors, as well as a test for input lag. We'll update this article when those tests are finalized.



TV sound quality (by Ty Pendlebury)
Due to reader demand we began subjectively testing the quality of TVs' built-in audio in 2013. To test a TV we first set the sound mode to Standard or Flat at 50 percent volume and turn off modes like Surround or "Enhanced Voice". If a TV has a specific music mode we may test it at our discretion, but most importantly we want to test how clear dialog is. We use the following three components.

  • Speech: A prerecorded CBS News broadcast. The newscast starts with a helicopter-based report, which makes it a good test of both sonic detail and speech clarity.

  • Movie: Mission Impossible III Chapter 11 (Blu-ray). The scene involves some scratchy dialogue and highly dynamic sound due to car crashes and explosions. Action movies really stress your TV's speakers and are a good test of how it will perform from quiet moments through to the loudest.

  • Music: "Red Right Hand" by Nick Cave and the Bad Seeds in lossless feed via a PS3. This song features a deep bassline and Cave's baritone. It combines strong dynamics with subtle details, and as such is a good test of a TV speaker's ability to handle music playback.

Geek Box and CalMAN report



The Geek Box (example) is where we put many of the objective results we obtain from measurements. It's been overhauled continually over the years as our testing evolves, and changed again in 2013 when we switched to using CalMAN 5.

The box contains three columns: Test, Result and Score. Each test is detailed below. The result of each test is either numeric or pass/fail. Each score is either Good, Average or Poor. We determined the cutoffs for those scores based on guidelines in the CalMAN software (namely delta error levels), data gathered from past reviews and editorial discretion.

Note that while these numbers and scores are useful, they don't necessarily represent the full picture quality of a display, and we consider many other factors when arriving at the numeric performance score in a CNET review.

Unless otherwise noted, all test patterns measured are windows--a rectangle of white, gray, or color in the center of the screen surrounded by black--generated by the Quantum Data 780; all numbers reported are taken directly from CalMAN; "error" is Delta Error 2000 (dE2000) per CalMAN; all percentages refer to the test pattern's luminance, where 0 percent is black and 100 percent is white.

Geek Box key

Black luminance (0%) Example result: 0.0140
This is the measure of the luminance of "black" in fL (footlamberts), and a lower number is better. It's often referred to as MLL, for minimum luminance level, but since this measurement is taken post-calibration it may be higher than the TV's minimum. We consider the post-calibration black level most important because the calibration process aims to prevent crushing of shadow detail and "tricks" like dynamic contrast that can affect this measurement. The measurement is taken of a completely black screen (except for a 5% stripe near the bottom), created by using the Quantum Data's 0% window pattern.
Good: less than 0.009
Average: 0.009 to 0.019
Poor: 0.02 or higher

Avg. gamma (10-100%) Example result: 2.24
Gamma is a measure of how much light a display produces when fed a certain level of signal. The score is based on the result's +/- deviation from 2.2, the standard for professional video monitors.
Good: less than 0.1 deviation
Average: 0.2 or less deviation
Poor: more than 0.2 deviation
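
To make the gamma figure concrete: if a display follows a simple power law, its luminance at a given signal level equals peak luminance times the signal level raised to the power of gamma. The Python sketch below is a minimal illustration under that assumption. It ignores the display's black level, it skips the 100 percent point (where the logarithm is zero), and it is not necessarily how CalMAN arrives at its average. The readings are made-up numbers, not measurements from any TV.

```python
import math


def point_gamma(stimulus, luminance, peak_luminance):
    """Gamma implied by one reading, assuming luminance = peak * stimulus ** gamma."""
    return math.log(luminance / peak_luminance) / math.log(stimulus)


def average_gamma(readings, peak_luminance):
    """Average the point gammas for stimulus levels strictly between 0 and 1."""
    gammas = [
        point_gamma(stimulus, luminance, peak_luminance)
        for stimulus, luminance in readings
        if 0 < stimulus < 1
    ]
    return sum(gammas) / len(gammas)


def gamma_score(avg_gamma, target=2.2):
    """Apply the Good / Average / Poor cutoffs listed above."""
    deviation = abs(avg_gamma - target)
    if deviation < 0.1:
        return "Good"
    if deviation <= 0.2:
        return "Average"
    return "Poor"


# Hypothetical readings: (stimulus as a fraction of full white, luminance in fL)
# for 10 to 90 percent windows on an imaginary display with a gamma of 2.24.
PEAK_FL = 40.0
readings = [(step / 10, PEAK_FL * (step / 10) ** 2.24) for step in range(1, 10)]

avg = average_gamma(readings, PEAK_FL)
print(round(avg, 2), gamma_score(avg))
```

Because the imaginary display was built with a gamma of 2.24, the average comes back as 2.24, which is within 0.1 of the 2.2 target and therefore scores Good.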

Error tests and results
After gamma, the next 11 tests report results as an "error." Every result is reported as Delta Error 2000, where zero is perfect, and a lower number is better. The cutoffs for scores are based on targets within CalMAN 5, designed to represent human perception. Generally errors less than 3 are not perceptible.
Good: 3 or less
Average: 5 or less
Poor: more than 5

Avg. grayscale error (10-100%)
An average of all ten error results from 10 to 100% luminance grayscale windows. dE 2000 in this context (and for the next three tests) combines errors from gamma and the color of gray.

Near-black error (5%)
The color of gray at 5 percent luminance, slightly brighter than black. Near-black is often difficult to get correct.

Dark gray error (20%) and Light gray error (70%)
The color of gray at 20 percent and 70 percent luminance, the points at which we perform 2-point grayscale calibrations.

Avg. color error
The average of all six of the color error numbers below. Color errors in this context (and the next six tests) combine errors for luminance, saturation and hue.

Red, Green, Blue, Cyan, Magenta, Yellow error
The three primary and three secondary colors' errors, measured using a 75% luminance window.
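
As a rough illustration of how the error rows turn into scores, the Python sketch below averages six color errors and applies the cutoffs given under "Error tests and results" (3 or less is Good, 5 or less is Average, anything higher is Poor). The dE2000 values are invented for the example and do not come from any review, and the function name is hypothetical.

```python
from statistics import mean


def error_score(de2000):
    """Map a Delta Error 2000 result to the Geek Box score."""
    if de2000 <= 3:
        return "Good"
    if de2000 <= 5:
        return "Average"
    return "Poor"


# Hypothetical per-color dE2000 results from 75% luminance windows.
color_errors = {
    "Red": 1.2,
    "Green": 2.8,
    "Blue": 4.1,
    "Cyan": 2.0,
    "Magenta": 5.6,
    "Yellow": 1.7,
}

# Avg. color error is the plain average of the six results.
avg_color_error = mean(color_errors.values())

print(f"Avg. color error: {avg_color_error:.1f} ({error_score(avg_color_error)})")
for name, error in color_errors.items():
    print(f"{name} error: {error:.1f} ({error_score(error)})")
```

Note how a Good average can hide a Poor individual color (magenta, in this made-up set), which is one reason the Geek Box lists each color separately.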

1080p/24 Cadence (IAL) (Pass/Fail)
In this subjective test we use our favorite scene for checking proper film cadence, a helicopter flyover from the Blu-ray of "I Am Legend" (Chapter 7, 24:58 in) played back at 1080p/24 resolution. If the TV, in its most favorable setting, delivers the same look to the scene as our reference display, it passes. If it introduces smoothing or the hitching motion of 2:3 pull-down, it fails.
Good: Proper film cadence (denoted by "Pass").
Poor: Improper film cadence (denoted by "Fail").
No average score possible

1080i Deinterlacing (film) (Pass/Fail)
We use the HQV Benchmark on Blu-ray's Film Resolution Loss Test to determine whether the display can recognize film-based content recorded at 24fps and convert it to the display's native resolution without losing detail.
Good: Fine horizontal lines visible in corner boxes (denoted by "Pass")
Poor: Boxes exhibit strobing and/or vertical bands (denoted by "Fail")
No average score possible

Motion resolution (max) and (dejudder off)
We use the FPD Benchmark Software for Professional Blu-ray's moving Monoscope pattern to measure the maximum number of horizontal lines of resolution the display preserves during motion. Higher results are better. This test is often difficult to evaluate so it's subjective to a certain extent; we report the higher number in the range if in doubt. Check out our in-depth explanation for more. In the (max) row the TV is set to the most-favorable picture setting, while in the (dejudder off) row video processing that introduces smoothing is disabled to the largest extent possible. If such processing is impossible to turn off, we list a result of "N/A."
Good: 900 lines or more
Average: 500 to 899 lines
Poor: fewer than 500 lines
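
The reporting rules for motion resolution can be sketched in a few lines of Python. This is only an illustration of the rules described above (report the higher number when the observation is a range, list "N/A" when smoothing can't be turned off, then apply the line-count cutoffs). The function name and the way observations are represented are assumptions made for this example.

```python
def motion_resolution_row(observed_lines):
    """Return (result, score) for one motion resolution row.

    observed_lines is None when the processing can't be disabled,
    a single line count, or a (low, high) tuple when the reviewer
    could only narrow the result down to a range.
    """
    if observed_lines is None:
        return ("N/A", None)

    # When in doubt, the higher number in the range is reported.
    if isinstance(observed_lines, tuple):
        reported = max(observed_lines)
    else:
        reported = observed_lines

    if reported >= 900:
        score = "Good"
    elif reported >= 500:
        score = "Average"
    else:
        score = "Poor"
    return (reported, score)


print(motion_resolution_row((300, 400)))  # a range, reported by its upper end
print(motion_resolution_row(1080))        # a single clear reading
print(motion_resolution_row(None))        # smoothing can't be turned off
```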




A sample CalMAN report


CalMAN report
Beginning in April 2011, CNET reviews include the complete calibration report from CalMAN, available as a PDF document at the end of the review. It's generally entitled "CNET review calibration results." The report provides a visual representation of the TV's color and gamma characteristics both before and after calibration.

TV power consumption
As of 2012 CNET no longer tests the power consumption of LED and LCD-based TVs 60 inches or smaller. The differences in energy use between them amount to only a few dollars per year. We will test larger LED and LCD TVs, however, as well as all sizes of plasma and OLED TVs.
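
To see why small differences in power draw add up to only a few dollars per year, here is a back-of-the-envelope Python sketch. Every number in it is a hypothetical assumption chosen for illustration (the wattages, five hours of viewing per day, and a 12-cents-per-kWh electricity rate); none of them are CNET measurements.

```python
def annual_energy_cost(watts, hours_per_day, dollars_per_kwh):
    """Yearly electricity cost for a device drawing a constant number of watts."""
    kwh_per_year = watts * hours_per_day * 365 / 1000
    return kwh_per_year * dollars_per_kwh


# Two imaginary TVs that differ by 30 watts while on (assumed values).
HOURS_PER_DAY = 5
RATE_DOLLARS_PER_KWH = 0.12

tv_a = annual_energy_cost(60, HOURS_PER_DAY, RATE_DOLLARS_PER_KWH)
tv_b = annual_energy_cost(90, HOURS_PER_DAY, RATE_DOLLARS_PER_KWH)

print(f"TV A: ${tv_a:.2f} per year")
print(f"TV B: ${tv_b:.2f} per year")
print(f"Difference: ${tv_b - tv_a:.2f} per year")
```

Under these assumptions the 30-watt gap costs about $6.57 a year; different viewing habits or electricity rates would scale the figure proportionally.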
