Performance Appraisal: Rating Scales and Rater Training to Reduce Errors

Performance Appraisal

Reasons for appraising employees:
1) Administrative decisions: to determine promotions and demotions, assign raises, or fire someone.
2) Employee development and feedback: so employees can correct what they do wrong and continue to do the things they do well.
3) Research: to develop criteria and to validate predictors.

Ultimate vs. actual criterion
- Ultimate / theoretical criterion: your standard, your expectations of what is considered good performance.

It's the construct that you are trying to measure: the definition of WHAT good performance is rather than how it is measured.
- Actual criterion: the operationalization of your theoretical / ultimate criterion. Exactly what physical measure you use to assess the theoretical / ultimate criterion; the measure of the theoretical / ultimate criterion.

Criterion relevance: what you are interested in and can capture.

The extent to which the actual criterion assesses the theoretical / ultimate criterion it is designed to measure, i.e., its construct validity. The closer the correspondence between the actual criterion and the theoretical criterion, the greater the relevance of the actual criterion. Relevance concerns the inferences and interpretations made about the meaning of the measurements of performance.

Criterion deficiency: information you are interested in but can't capture with the actual criterion measure; the missing info you want to know. To the extent that your measure is unreliable or not valid, you are capturing irrelevant info and capturing things you don't need to capture.

The actual criterion does not adequately cover the entire theoretical criterion; it is an incomplete representation of what we are trying to assess. [Ex. Students' test scores in math could be used as an actual performance criterion for elementary school teachers, but it would be a deficient criterion because elementary school teachers teach more than just math.

A less deficient criterion would be student scores on a comprehensive achievement test battery, including math, reading, science, and writing.]

Criterion contamination: information you capture but aren't interested in (erroneous, irrelevant, etc.). The part of the actual criterion that reflects something other than what it was designed to measure. Can arise from biases in the criterion and from unreliability.
- Biases: people have cognitive limitations (they forget things).

"X Unreliability: to the extent that your measure is not capturing what you are looking fro (irrelevant / unreliable info), you will have a contamination problem. Problems with actual criteria Criterion Complexity - Most jobs are multidimensional (due to that, you! |ve to identify those dimensions and to assess them on all those dimensions) "^3 attendance, motivation, quality, quantity. - Need to assess all dimensions (two possible approaches). Let's say my job is 10 dimensional, do I keep those things separate or combine them into a single score. o Composite criterion approach: to make meaningful interpretations, you aggregate them somehow (combine! X avg. or add them up, etc) o Multidimensional approach: keep them separate which is best? "^3 If you wanna make a decision you might want to combine.

"^3 If you wanna provide feedback you want to keep separate. Dynamic criteria: criterion itself does not change, the conditions around it do. Variability of performance over time, although it's the performance and not the standard that changes. This Variability makes assessment of job performance difficult because performance would not be the same throughout the entire time period being studied.

Objective vs. subjective measures of job performance

Objective measures: counts of job-related behaviors and results. Counts of various behaviors (# of days absent) or the results of job behaviors (total monthly sales).
Advantages:
- usually easy to interpret (dollars gained / lost, # of days absent)
- individuals are easily comparable (Todd sold 20 cars, Mary sold 10)
- can be tied to organizational objectives
- often already in corporate records (especially positive (+) things like sales volume)
Disadvantages:
- not suitable for all jobs (you can't quantify everything, e.g., art)
- the meaning of the numbers is not always obvious (a given amount of scrap material wasted may mean the person is not doing the job right, or it may mean the machine is malfunctioning)

- data from records may be incorrect (humans make errors when typing into records)
- criterion deficiency and contamination (e.g., sales volume doesn't tell how well a person does customer follow-ups, and mornings may be busier than nights)
- what is measured may not be under the individual's control (e.g., sales volume: the product might not be selling because of market conditions, not because you are a bad salesperson)

"^3 the product might not be selling due to market conditions not cuz you are a bad person). Subjective measures! V most frequently used means of assessing the job performance of employees. - Graphic rating form: usually a scale from 1-5 (very poor to very good). Assess people on several dimensions of performance. Focuses on characteristics or traits of the person or person's performance.

Drawback: what's a 5? People have different standards of what "very good" is.
- Behavior-focused rating forms: attempt to solve the issue with graphic ratings. They concentrate on specific instances of behavior that the person has done or could be expected to do, with behaviors chosen to represent different levels of performance. (For attendance, a good behavior is "be at work every day on time" and a bad behavior is "comes to work late several times a week.")

o Behaviorally Anchored Rating Scale (BARS): like a graphic rating form, but with specific examples of what a 1 is (the numbers are defined). Response choices are defined in behavioral terms. BARS can be used to assess the same dimensions as a graphic rating form; the difference is that BARS uses response choices that represent behaviors, while the graphic rating form asks for a rating of how well the person performs along the dimension in question.
o Mixed Standard Scale (MSS): provides the rater with a list of behaviors that vary in effectiveness. The rater is asked to indicate whether the ratee is better than the statement, the statement fits the ratee, or the ratee is worse than the statement. Statements for the various dimensions are presented in random order.

o Behavior Observation Scale (BOS): like the MSS, but instead of asking "Is the ratee better than this?", you ask people how frequently they have observed the individual perform the activity (e.g., how often your professor uses PowerPoint presentations in lectures). It contains items based on critical incidents, making it somewhat like a mixed standard scale. Raters are asked to indicate for each item the amount of time the employee engaged in that behavior (recommended as a percentage, e.g., 40% of the time). Raters indicate frequency rather than making comparisons of employee behavior (as in the MSS).

Rater biases and errors (and their control)

Behavior-focused rating scales (BARS, BOS, ...) try to minimize these biases (particularly distributional biases). They are not necessarily much more effective than graphic rating scales in terms of accuracy.

They do provide some benefit, but it is limited (especially given the cost, time, and trouble it takes to develop a BARS or BOS).
- Halo: the idea that good or bad performance on one dimension will spill over to other dimensions; the tendency to rate a person high / low on other things as well simply because they are high / low on that one. It's an issue because you can't tell it apart from true halo (the employee really does perform at the same level on all dimensions). A rough illustration follows.
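A rough numerical illustration in Python (ratings invented) of how a halo pattern shows up in the data: when ratings are driven by an overall impression, the dimension scores move together, so the inter-dimension correlations across ratees are all high. True halo produces the same pattern, which is why the two are hard to separate.

    from statistics import correlation  # available in Python 3.10+

    # Hypothetical 1-5 ratings from one supervisor for five employees.
    quality    = [5, 2, 4, 1, 3]
    quantity   = [5, 1, 4, 2, 3]
    attendance = [4, 2, 5, 1, 3]

    # Very high correlations between dimensions are consistent with a halo pattern,
    # but they could also reflect true halo (genuinely uniform performance).
    print(round(correlation(quality, quantity), 2))    # ~0.9
    print(round(correlation(quality, attendance), 2))  # ~0.9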

- Primacy and recency: we tend to remember the first impressions or the most recent memories of individuals (e.g., if it has been a year since you last appraised a person, you are most likely to base your opinion on the last time you appraised them or on your very first impression).
- Similar to me: the tendency to rate people who have something in common with you (same university, same neighborhood) more favorably.
- Distributional errors:
o Central tendency: a supervisor's tendency to rate an employee right down the middle on all dimensions (not taking a stand on whether the employee is good or bad). Companies want justification for ratings (e.g., for raises or merit purposes, they want to know why they have to spend additional money on that employee).

For low ratings you also have to justify it (the reason just isn't monetary); for middle-of-the-road ratings you don't. Can be seen across ratings of different people.
o Leniency and severity: leniency is loading at the top (favorable, positive) end of the scale; severity is loading at the bottom (negative) end. Either can be seen across many people. The problem is that whether you load in the middle, at the bottom, or at the top, you are not creating any variance, so you are not giving the company any way to distinguish among individuals (to separate them and determine who will get a raise). A quick numerical illustration follows.
- Controlling rater bias and error: think of a normal curve / distribution; distributional errors mean the rater is loading on one part of the distribution (the high end, the low end, or the middle). They occur when a rater tends to rate everyone the same. It is possible that a distributional error pattern does not reflect errors at all: all ratees might have performed at the same level.
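A quick numerical illustration in Python (ratings fabricated) of the variance point: a lenient or central-tendency rater piles everyone onto one part of the scale, so the ratings carry little or no variance and give the organization nothing to distinguish employees by.

    from statistics import pvariance

    # Hypothetical 1-5 ratings of five employees from three different raters.
    accurate = [1, 2, 3, 4, 5]   # ratings spread across the scale
    lenient  = [4, 5, 5, 5, 4]   # everyone loaded at the top of the scale
    central  = [3, 3, 3, 3, 3]   # everyone rated right down the middle

    # Low variance means no basis for deciding who gets the raise.
    print(pvariance(accurate))  # largest spread
    print(pvariance(lenient))   # much smaller
    print(pvariance(central))   # zero: no information at all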

o Error-resistant forms to assess performance: behavior-focused rating scales (BARS, MSS) were developed to eliminate rating error. The idea is that raters will be able to make more accurate ratings if they focus on specific behaviors rather than on traits; behaviors are more concrete than traits, so rating them requires less idiosyncratic judgment.

Carefully developed behavior-focused scales resist rating errors better than graphic rating scales.
o Rater training to reduce errors: the most popular training is rater error training (RET), which familiarizes raters with rater errors and teaches them to avoid those rating patterns. It reduces rating errors, but often at the cost of rating accuracy: raters might reduce the number of halo and leniency patterns in their ratings, but those ratings are less accurate in reflecting the true level of performance. WHY / HOW?

"^3 rater errors are inferred from the pattern of ratings. Its possible that the performance of individuals perform their job equally well. Training raters to avoid same ratings across either dimensions or people will result in their concentrating on avoiding certain patterns rather than on accurately assessing job performance. RET might be substituting one series of rating errors for another. o Training: one person does judgement (or performing evaluations) of somebody else.

If there's something about their thinking pattern that is incorrect, then in theory we can train them out of that misbehavior.
- Rater error training: point out, or try to make the raters aware of, the various biases and errors; you go to training and learn about halo, leniency, severity, etc. [Reduces error but does not increase accuracy.]
- Observation training: train supervisors / raters to be more aware of their subordinates' behaviors, to watch more closely, and to keep documentation (a journal), so that when it comes time for the performance appraisal you can just look at it to analyze what they did (otherwise you won't remember it off the top of your head). [More accuracy, but does not necessarily reduce error.]
- Frame-of-reference training: you train your supervisors to really understand what a "1" means on the scales; you make sure everyone is on the same page about exactly what the ratings mean.

You are almost teaching to the test: you are teaching them to use the performance tool you have given them accurately and adequately. (Ex. Out of the 2,000 best astronaut candidates, you pick 20 (the cream of the crop); since all 2,000 have PhDs, etc., a "1" for this purpose is not a "1" in the general population (where a "1" might be someone without a high school degree); against the general population they would all be "5"s.)

SS 5!" in the general population. [pretty effective].