Performance Reviews That Work in Agency Teams

A practical performance review framework for agency teams in 2026. Cadence, structure, calibration, and the practices that make reviews useful.

Bilal Azhar

Performance reviews in agencies fail in a specific way. The annual review form asks generic questions ("met expectations / exceeded expectations") that flatten the difference between a senior strategist who landed a $400K renewal, a designer whose portfolio piece won a Webby, an account manager who saved a churning client, and a producer who shipped 47 projects on time. All four might score "exceeds expectations" on the same form. None of them got a useful conversation. The agencies that have actually fixed their review process in 2026 have done something specific: they built role-specific rubrics, separated utilization measurement from outcome measurement, ran calibration sessions across managers, and tied ratings to documented raise ranges. This guide is a practical framework for performance reviews that work in agency teams, with role-specific rubrics for creatives, strategists, account managers, producers, and engineers.

Key Takeaways:

  • Generic review forms do not work; agencies need role-specific rubrics for creatives, strategists, account managers, producers, and engineers.
  • Utilization-based and outcome-based reviews answer different questions; use both, not one.
  • Calibration sessions across managers prevent rating drift and are the single highest-leverage practice.
  • Compensation should be tied transparently to ratings, with documented raise ranges by role and tier.
  • The review document is the artifact; the conversation is the work. Train managers explicitly on the conversation.

This guide covers cadence, role-specific rubrics, calibration, utilization versus outcome reviews, raise tables tied to ratings, and the manager training that makes the whole thing function.

Why Agency Reviews Fail Specifically

Most agency reviews fail for one or more of these structural reasons:

  • The same form for every role. A designer, an engineer, a strategist, and an account manager have wildly different success criteria. A single 5-question form cannot evaluate all of them honestly.
  • Manager bias. Some managers rate 4s and 5s liberally. Others rate strictly. Across the team, the ratings are not comparable, and promotion decisions become unfair.
  • No connection to compensation. Employees do the review, get a rating, and then a raise number lands with no visible relationship to the rating. The review becomes theatrical.
  • Annual only. A single annual conversation cannot drive behavior change. People hear feedback once a year and have to wait twelve months to apply it.
  • Surprises in the room. The first time an employee hears that they have been underperforming is in the review itself. That is a manager failure, not an employee failure.

Harvard Business Review has documented these failure modes consistently in its performance management coverage. The fix is structural: role-specific rubrics, layered cadence, calibration, transparent comp tables, and trained managers.

A Layered Review Cadence

Most mature agencies in 2026 run a four-layer review cadence:

| Layer | Cadence | Time | Purpose |
| --- | --- | --- | --- |
| 1:1s | Weekly or biweekly | 30 to 45 min | Continuous feedback, blocker removal, real-time recognition |
| Project retros | Per project | 45 min | Project-level learning, individual contribution surfaced |
| Quarterly check-in | Quarterly | 60 min | Goal progress, rubric preview, course corrections |
| Formal review | Biannually or annually | 90 min | Rubric scoring, calibration, compensation, growth path |

The combination prevents annual surprises and keeps feedback in the system. The formal review becomes a summary of conversations that already happened, not a single high-stakes event.

The quarterly check-in is the most underused layer. It is short, structured, and prevents the annual review from becoming a fire drill. It also gives the employee a chance to act on direction with three months of runway before the formal review.

Role-Specific Rubrics

The single most important change agencies can make to their review process is to ship a role-specific rubric for each major function. Below are rubrics for the five most common agency roles. Each dimension is scored on a 1 to 5 scale, where 3 is "meeting expectations for current level" and 5 indicates promotion readiness.

Creative (Designer, Art Director, Copywriter, Creative Director)

| Dimension | What to evaluate | What a 5 looks like |
| --- | --- | --- |
| Craft quality | Output quality against agency standard, awards, portfolio strength | Work consistently selected as agency reel, multiple industry recognitions |
| Conceptual range | Breadth of solutions, willingness to take creative risk | Pitches multiple distinct concepts, pushes beyond first idea |
| Client presence | Ability to present and defend work, take direction | Owns presentations to senior client stakeholders, recovers from hard feedback |
| Collaboration | Effectiveness with cross-functional partners | Strategists and producers ask to be staffed with this person |
| Mentorship | Development of junior creatives | At least one junior visibly developed under their guidance |
| Velocity | Output relative to project demands | Maintains craft standards under tight timelines |

Strategist (Strategy, Planning, Research)

| Dimension | What to evaluate | What a 5 looks like |
| --- | --- | --- |
| Strategic insight | Depth and originality of recommendations | Insights regularly referenced by client leadership, used in their internal decks |
| Research rigor | Quality of supporting evidence, methodology | Builds repeatable research approaches the agency adopts as standard |
| Client influence | Ability to move client decisions | Client cites this person's recommendations when defending choices internally |
| Written craft | Clarity and persuasion of written work | Decks read like a senior partner's, not a junior analyst's |
| Cross-discipline fluency | Understanding of creative, media, technology | Translates strategy across discipline boundaries effectively |
| Business acumen | Connection of work to client commercial outcomes | Strategy linked to revenue, retention, or category share |

Account Manager (Account Lead, Account Director)

| Dimension | What to evaluate | What a 5 looks like |
| --- | --- | --- |
| Client relationship health | Tenure, NPS, account expansion | Client renews above contract value, refers other clients |
| Revenue ownership | Account revenue growth, gross margin | Account revenue growth above 20 percent year over year, margin maintained |
| Internal team management | Project health on owned accounts | Producers and creatives consistently want to work on this person's accounts |
| Commercial judgment | Scope conversations, pricing decisions | Catches scope creep early, negotiates expansion confidently |
| Strategic partnership | Quality of strategic advice to client | Client treats this person as a senior counselor, not just a vendor liaison |
| Crisis management | Handling of client escalations | Has owned a major client crisis to successful resolution |

Producer / Project Manager

| Dimension | What to evaluate | What a 5 looks like |
| --- | --- | --- |
| Delivery reliability | On-time, on-budget project completion | 95+ percent of projects ship on time and within 5 percent of budget |
| Scope discipline | Catching and managing scope changes | Scope creep flagged within 48 hours, change orders signed routinely |
| Resource judgment | Staffing and capacity decisions | Calls capacity risks 2 to 4 weeks before they materialize |
| Cross-functional leadership | Aligning creative, strategy, engineering | Discipline leads describe this PM as a force multiplier |
| Client confidence | Client trust in delivery | Client copies this PM into strategic decisions, not just status emails |
| Process improvement | Contribution to SOPs, templates, tooling | Has shipped at least one durable process improvement per cycle |

Engineering / Development

| Dimension | What to evaluate | What a 5 looks like |
| --- | --- | --- |
| Technical craft | Code quality, architectural decisions | Their code is referenced as the standard for the agency |
| Velocity | Throughput against estimates | Estimates land within 15 percent on 80 percent of projects |
| Cross-discipline collaboration | Work with design, strategy, PM | Designers describe this engineer as a creative partner |
| Client-facing communication | Translation of technical work for non-engineers | Can run a client technical review without a translator |
| Mentorship | Junior engineer development | Visible upgrade in one or more juniors per cycle |
| Risk management | Identification and mitigation of technical risk | Catches architectural and security risks before they become incidents |

These rubrics are starting points. Tailor the dimensions to your agency's specific work and standards. The key principle is role specificity, not one-size-fits-all.
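One way to keep rubrics role-specific in practice is to store each role's dimensions as data and validate every review against them. The sketch below is illustrative only: the `RUBRICS` dictionary, the `score_review` function, and the choice of an unweighted mean are assumptions, not a prescribed scoring method. The dimension names are copied from the Producer and Engineering tables above.

```python
from statistics import mean

# Dimension names come from the rubric tables; the structure itself is hypothetical.
RUBRICS = {
    "producer": [
        "Delivery reliability", "Scope discipline", "Resource judgment",
        "Cross-functional leadership", "Client confidence", "Process improvement",
    ],
    "engineer": [
        "Technical craft", "Velocity", "Cross-discipline collaboration",
        "Client-facing communication", "Mentorship", "Risk management",
    ],
}


def score_review(role: str, scores: dict[str, int]) -> float:
    """Check 1-to-5 scores against the role's rubric and return their mean."""
    dimensions = RUBRICS[role]
    missing = [d for d in dimensions if d not in scores]
    unexpected = [d for d in scores if d not in dimensions]
    if missing or unexpected:
        raise ValueError(f"missing={missing}, unexpected={unexpected}")
    for dimension, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{dimension}: score {value} is outside 1 to 5")
    return round(mean(scores.values()), 2)


producer_scores = {
    "Delivery reliability": 5, "Scope discipline": 4, "Resource judgment": 3,
    "Cross-functional leadership": 4, "Client confidence": 4, "Process improvement": 3,
}
print(score_review("producer", producer_scores))  # 3.83
```

Rejecting a review that uses the wrong role's dimensions is the point of the check: it makes it hard to quietly fall back to a generic form.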

Utilization-Based Versus Outcome-Based Reviews

A specific failure mode in agency reviews: confusing utilization (hours billed) with outcomes (work delivered). These are different questions and they need to be separated in the rubric.

| Measure | What it tells you | What it does not tell you |
| --- | --- | --- |
| Utilization | Whether the person is working on billable client work | Whether the work is good, whether the client is happy, whether scope was managed |
| Project margin | Whether the engagement is profitable | Whether the individual contributed positively to that margin |
| Client NPS | Whether the client values the team | Whether this specific person drove the NPS |
| Peer review | Whether colleagues find this person effective | Whether colleagues are seeing the full scope of work |
| Manager assessment | A senior-level judgment | Limited by what the manager personally observes |

The right pattern is to use all five, weighted differently by role. A producer should be weighted more on margin and delivery reliability; a creative more on craft and peer review; an account manager more on client NPS and revenue.

A utilization-only review tells designers to grind hours. An outcome-only review without utilization context misses that the team is overworked. Both, together, in different weights per role, is the answer.
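A minimal sketch of weighting the five measures by role follows. The specific weights, the `ROLE_WEIGHTS` name, and the assumption that every measure has already been normalized to the same 1-to-5 scale are all hypothetical; the article only says that the weights should differ by role.

```python
# Hypothetical weights per role; each set sums to 1.0. Tune them to your agency.
ROLE_WEIGHTS = {
    "producer": {
        "utilization": 0.15, "project_margin": 0.35, "client_nps": 0.20,
        "peer_review": 0.15, "manager_assessment": 0.15,
    },
    "creative": {
        "utilization": 0.10, "project_margin": 0.10, "client_nps": 0.15,
        "peer_review": 0.30, "manager_assessment": 0.35,
    },
    "account_manager": {
        "utilization": 0.10, "project_margin": 0.25, "client_nps": 0.35,
        "peer_review": 0.10, "manager_assessment": 0.20,
    },
}


def composite_score(role: str, measures: dict[str, float]) -> float:
    """Weighted average of the five measures, each assumed to be on a 1-to-5 scale."""
    weights = ROLE_WEIGHTS[role]
    if set(measures) != set(weights):
        raise ValueError("measures must cover exactly the five weighted inputs")
    return round(sum(weights[name] * measures[name] for name in weights), 2)


measures = {"utilization": 4, "project_margin": 5, "client_nps": 3,
            "peer_review": 4, "manager_assessment": 4}
for role in ROLE_WEIGHTS:
    print(role, composite_score(role, measures))
```

The same inputs produce a different composite for each role, which is the behavior the weighting is meant to create. A composite like this is an input to the manager's judgment and the calibration session, not a substitute for either.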

Calibration Sessions Across Managers

Calibration is the single highest-leverage practice in agency performance reviews. Without it, manager bias makes ratings non-comparable and promotions feel unfair.

A practical calibration structure:

| Step | What happens | Time |
| --- | --- | --- |
| Manager drafts ratings | Each manager writes ratings for their team using the rubric | 1 to 2 hours per direct report |
| Pre-calibration submission | Ratings submitted to HR or COO before session | Asynchronous |
| Calibration session | All managers for a function plus senior leader review ratings | 2 to 4 hours |
| Discussion of outliers | High ratings, low ratings, and inconsistencies discussed openly | Within session |
| Final ratings | Adjustments made; final ratings documented | End of session |
| Conversation with direct reports | Each manager delivers final rating in the formal review | 1 to 2 weeks post-calibration |

McKinsey's research on performance management has consistently emphasized calibration as the practice that separates effective performance systems from ritual ones. The discipline of putting your ratings on the table in front of peers prevents drift.

Useful calibration norms:

  • Pre-share ratings before the session so managers come prepared.
  • Use the rubric language, not adjectives, when defending a rating.
  • Distribute final ratings across the rubric. If 80 percent of your team scores 4 or 5, the rubric is broken or the managers are lenient.
  • Track manager-level bias over time. Some consistently rate high; some consistently rate low. Note it for management development.
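The last two norms can be checked with simple arithmetic before the session starts. The sketch below is one possible approach, not a standard method: the function names and sample data are hypothetical, and the 0.8 threshold is taken from the 80 percent figure in the list above.

```python
from statistics import mean


def share_of_high_ratings(ratings: list[int]) -> float:
    """Fraction of ratings that are 4 or 5."""
    return sum(1 for r in ratings if r >= 4) / len(ratings)


def manager_offsets(ratings_by_manager: dict[str, list[int]]) -> dict[str, float]:
    """Each manager's mean rating minus the mean of all ratings combined."""
    all_ratings = [r for ratings in ratings_by_manager.values() for r in ratings]
    overall = mean(all_ratings)
    return {
        manager: round(mean(ratings) - overall, 2)
        for manager, ratings in ratings_by_manager.items()
    }


# Hypothetical draft ratings submitted before calibration.
draft = {
    "manager_a": [5, 4, 5, 4, 4],
    "manager_b": [3, 3, 2, 4, 3],
}

everyone = [r for ratings in draft.values() for r in ratings]
print(share_of_high_ratings(everyone))         # 0.6
print(share_of_high_ratings(everyone) >= 0.8)  # False: below the inflation threshold
print(manager_offsets(draft))                  # {'manager_a': 0.7, 'manager_b': -0.7}
```

A large offset does not prove bias, since one manager may simply have a stronger team. It tells the session where to ask for rubric-based evidence first.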

Tying Ratings to Compensation Transparently

The biggest credibility killer in performance reviews is opaque compensation. The fix is a public raise table tied to the rating system.

A representative raise table for an agency:

| Rating | Description | Annual raise range | Promotion eligibility |
| --- | --- | --- | --- |
| 1 | Below expectations, on PIP | 0 percent | No |
| 2 | Inconsistent, development plan required | 0 to 2 percent | No |
| 3 | Meeting expectations for level | 3 to 5 percent | After 12 to 24 months at level |
| 4 | Exceeding expectations | 5 to 8 percent | Eligible at next promotion cycle |
| 5 | Outstanding, promotion-ready | 6 to 10 percent | Promote at next cycle, market-adjustment possible |

The table is public. Every employee knows what their rating maps to. Surprises are eliminated by definition.
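To show how the mapping works in dollars, here is a small sketch that turns a rating and a current salary into a raise range. The percentages are copied from the representative table above; the `RAISE_RANGES` and `raise_range` names and the example salary are hypothetical.

```python
# Raise ranges in percent, copied from the representative raise table.
RAISE_RANGES = {1: (0, 0), 2: (0, 2), 3: (3, 5), 4: (5, 8), 5: (6, 10)}


def raise_range(salary: int, rating: int) -> tuple[int, int]:
    """Return the (low, high) annual raise in whole dollars for a rating."""
    low_pct, high_pct = RAISE_RANGES[rating]
    # Integer arithmetic avoids floating-point rounding on dollar amounts.
    return (salary * low_pct // 100, salary * high_pct // 100)


print(raise_range(90_000, 4))  # (4500, 7200)
print(raise_range(90_000, 1))  # (0, 0)
```

Promotion raises and market adjustments are deliberately left out of this sketch, since they sit outside the base table.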

Some practical refinements:

  • Add a separate market-adjustment budget for employees whose market value has moved faster than their level. Without this, top performers leave for competitors.
  • Promotions trigger a larger raise (often 10 to 15 percent) plus a title and responsibility change.
  • Equity, profit-sharing, and bonus structures sit on top of the base raise table and follow their own rules.

Compensation Benchmarks by Role

Realistic compensation ranges for US-based agency roles in 2026, based on Robert Half, BLS Occupational Employment and Wage Statistics, and agency-specific compensation surveys:

| Role | Junior | Mid | Senior | Director |
| --- | --- | --- | --- | --- |
| Designer | $58,000 to $75,000 | $80,000 to $105,000 | $110,000 to $145,000 | $150,000 to $210,000 |
| Copywriter | $55,000 to $72,000 | $75,000 to $100,000 | $105,000 to $135,000 | $140,000 to $190,000 |
| Strategist | $65,000 to $85,000 | $90,000 to $120,000 | $125,000 to $165,000 | $170,000 to $240,000 |
| Account Manager | $55,000 to $70,000 | $75,000 to $110,000 | $120,000 to $160,000 | $165,000 to $230,000 |
| Producer / PM | $58,000 to $75,000 | $78,000 to $105,000 | $110,000 to $145,000 | $150,000 to $200,000 |
| Engineer | $80,000 to $110,000 | $115,000 to $150,000 | $155,000 to $200,000 | $210,000 to $290,000 |

Adjust for geography (NYC, SF, LA typically run 15 to 30 percent higher; secondary markets 10 to 20 percent lower) and for agency size (top 50 holding-company agencies pay 15 to 25 percent above small independents). These are starting points; benchmark against your specific market annually.
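As a worked example of the geography adjustment, the sketch below applies the percentage ranges from the paragraph above to one band from the table. How to combine a salary band with an adjustment range is an assumption here: this version applies the smaller adjustment to the low end and the larger to the high end, which gives the widest plausible band. The `MARKET_ADJUSTMENT` labels and `adjusted_band` function are hypothetical.

```python
# Percent adjustments copied from the text; the keys are hypothetical labels.
MARKET_ADJUSTMENT = {
    "major_metro": (15, 30),   # NYC, SF, LA: 15 to 30 percent higher
    "secondary": (-20, -10),   # secondary markets: 10 to 20 percent lower
}


def adjusted_band(band: tuple[int, int], market: str) -> tuple[int, int]:
    """Widest plausible band: low end gets the lower pct, high end the higher pct."""
    low, high = band
    low_pct, high_pct = MARKET_ADJUSTMENT[market]
    return (low * (100 + low_pct) // 100, high * (100 + high_pct) // 100)


mid_designer = (80_000, 105_000)  # Mid-level Designer band from the table
print(adjusted_band(mid_designer, "major_metro"))  # (92000, 136500)
print(adjusted_band(mid_designer, "secondary"))    # (64000, 94500)
```

The resulting bands are wide, which is a reminder that the table is a starting point and local benchmarking still has to narrow it.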

The Review Conversation

The review document is the artifact. The conversation is the work. Most managers are bad at the conversation and need training.

A useful conversation structure for a 90-minute formal review:

| Segment | Time | Manager's job | Employee's job |
| --- | --- | --- | --- |
| Opening | 5 min | Frame the conversation, name the rating | Listen, prepare to engage |
| Employee reaction | 15 min | Listen, ask clarifying questions | Share reactions, push back where appropriate |
| Rubric walkthrough | 30 min | Discuss each dimension with specific examples | Confirm or contest specifics |
| Growth and development | 15 min | Discuss next-level criteria, gaps to close | Share career aspirations |
| Compensation | 10 min | Share the rating-to-raise mapping, the actual number | Ask questions |
| Goals for next cycle | 10 min | Set 3 to 5 specific goals | Confirm and commit |
| Close | 5 min | Acknowledge contribution, be human | N/A |

Three patterns that distinguish good review conversations from bad ones:

  • No surprises. Every concern raised in the review has been raised at least once before in a 1:1.
  • Specific examples. Adjectives ("good," "needs improvement") are forbidden. Behaviors and outcomes are required.
  • Two-way. The employee leaves with their own questions answered, not just having received feedback.

Performance Improvement Plans

When an employee is consistently below expectations, a PIP is the right next step. PIPs are sometimes treated as a cover for planned termination; the better use is as a genuine attempt to help the employee succeed.

A working PIP structure:

| Element | Detail |
| --- | --- |
| Specific performance gap | What is missing, with examples from the rubric |
| Measurable improvement goals | 3 to 5 goals with clear success criteria |
| Timeframe | Typically 30, 60, or 90 days |
| Support provided | Coaching, training, peer mentorship, reduced load |
| Check-in cadence | Weekly during the PIP period |
| Outcomes if met | Return to standing, possible plan to address compensation |
| Outcomes if not met | Specific next steps, including possible termination |
| Documentation | Written plan, written check-in notes, written outcome |

SHRM publishes the most widely used guidance on PIPs and progressive discipline; their templates are a reasonable starting point. The single most important PIP discipline is documentation, both for legal protection and for the employee's clarity.

Manager Training That Actually Works

Most agency managers were promoted from individual contributor roles without training. The result is wildly inconsistent review quality. Investment in manager training is the highest-leverage culture spend in most agencies.

A practical manager training curriculum for review cycles:

| Module | Duration | Format |
| --- | --- | --- |
| Rubric calibration | 2 hours | Workshop with examples and group exercises |
| Writing assessments | 2 hours | Workshop with peer review of draft assessments |
| Delivering hard feedback | 3 hours | Role-play with senior coach |
| Compensation conversations | 1 hour | Workshop with scripts and Q&A |
| Goal-setting | 1 hour | Workshop on outcome-based goals |
| Handling emotional reactions | 2 hours | Role-play with senior coach |
| Following up post-review | 1 hour | Workshop on 30-day check-ins and goal tracking |

Run the training before every review cycle, especially in the first two years of a new system. New managers attend the full curriculum; experienced managers do a refresher.

Reviewing Specific Situations

Remote employees

Remote employees can fall out of mind, especially in reviews. Compensate by tracking measurable outputs more carefully, scheduling more frequent 1:1s, and ensuring their work is visible across the team. The rubric is the same; the inputs require more deliberate effort.

Contractors and freelancers

Apply lighter versions of the same framework. Quarterly check-ins with clear scope and quality criteria work well for ongoing contractor relationships. Annual reviews are usually overkill.

Cross-functional partners

For senior roles, structured input from 3 to 5 peers adds essential perspective. Use anonymized written input synthesized by the manager. Avoid 360 reviews for junior roles where the political weight outweighs the data quality.

Departing employees

Skip the formal review and conduct a thorough exit interview instead. Document the patterns; share them in aggregate with leadership.

Measuring Review Effectiveness

Track these metrics across review cycles:

| Metric | What it tells you | Healthy target |
| --- | --- | --- |
| Manager completion rate by deadline | Whether the process is operational | 95+ percent on time |
| Direct report satisfaction | Whether reviews feel useful | 4.0 or higher on 5-point scale |
| Goal achievement in subsequent cycle | Whether reviews drive behavior | 70+ percent of goals achieved |
| Promotion rate by rating | Whether ratings correlate with promotion | 4s and 5s should drive most promotions |
| Voluntary departure by rating | Whether ratings predict retention | Most voluntary departures should come from 1s and 2s; regretted departures of 4s and 5s should be rare |
| Calibration adjustment rate | Whether managers are calibrated | Adjustments under 15 percent of ratings |

Patterns over time tell you whether the review system is improving manager quality and team development or just generating paperwork.
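Two of these metrics are simple ratios that can be computed from data most agencies already hold. The sketch below assumes draft and final ratings are stored per employee and that goals are recorded as met or not met; the function names and sample data are hypothetical, while the 15 percent and 70 percent targets come from the table above.

```python
def calibration_adjustment_rate(draft: dict[str, int], final: dict[str, int]) -> float:
    """Share of ratings that changed between manager draft and calibrated final."""
    changed = sum(1 for person, rating in draft.items() if final[person] != rating)
    return changed / len(draft)


def goal_achievement_rate(goals_met: list[bool]) -> float:
    """Share of goals from the previous cycle that were achieved."""
    return sum(goals_met) / len(goals_met)


# Hypothetical cycle: 20 employees, two ratings adjusted in calibration.
draft = {f"emp_{i}": 3 for i in range(20)}
final = dict(draft)
final["emp_0"] = 4
final["emp_1"] = 2

adjustment_rate = calibration_adjustment_rate(draft, final)
print(adjustment_rate, adjustment_rate < 0.15)  # 0.1 True

goals = [True] * 7 + [False] * 3
print(goal_achievement_rate(goals) >= 0.70)     # True
```

Computing these per cycle and keeping the history is what makes the trend visible; a single cycle's number says little on its own.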

Anonymized Scenario: A 60-Person Agency Calibration Rollout

A 60-person digital agency rolled out role-specific rubrics and calibration sessions in early 2026 after three cycles of inconsistent reviews. The before state:

  • 78 percent of ratings were 4 or 5.
  • 45 percent of employees reported they did not understand how reviews affected compensation.
  • Promotion decisions were widely seen as political.
  • Voluntary turnover was 31 percent.

The interventions:

  • Built five role-specific rubrics (creative, strategy, accounts, production, engineering).
  • Ran quarterly calibration sessions across all managers.
  • Published the rating-to-raise table on the internal wiki.
  • Trained all 11 managers on review conversations.
  • Added a quarterly check-in layer between annual reviews.

Two review cycles later (about 12 months):

  • Rating distribution: 22 percent at 5, 38 percent at 4, 32 percent at 3, 7 percent at 2, 1 percent at 1. A real distribution, not inflation.
  • Employee understanding of comp linkage: 88 percent favorable.
  • Voluntary turnover: 17 percent.
  • Promotion decisions: described as fair by 79 percent in internal survey, up from 41 percent.

The change required no budget increase beyond manager training. The work was structural and process-based.

Frequently Asked Questions

Should we do annual or quarterly performance reviews?

Use a layered cadence: weekly or biweekly 1:1s, project retros, quarterly check-ins, and an annual or biannual formal review. Annual alone is too sparse to drive behavior. Quarterly formal reviews are usually overkill. The quarterly check-in plus continuous 1:1s plus annual formal is the right balance for most agencies.

How do we prevent rating bias across managers?

Run calibration sessions with all managers for a function plus a senior leader before final ratings are shared. Walk through ratings with specific examples, discuss outliers at both ends, and adjust where the discussion reveals inconsistencies. Calibration is the single highest-leverage practice for rating fairness.

Should reviews be tied to compensation transparently?

Yes. Publish the rating-to-raise table on your internal wiki. Define raise ranges by rating. Communicate the framework so employees understand exactly how reviews translate to compensation. Opaque compensation linkage is the single biggest credibility problem in agency reviews.

What should we do when an employee is consistently underperforming?

Move to a documented performance improvement plan with clear goals, defined support, weekly check-ins, and a specific timeframe (30, 60, or 90 days). Document everything. PIPs should be genuine attempts to help the employee succeed, not a procedural cover for planned termination. About 30 to 50 percent of PIPs result in successful return to standing when the system is run honestly.

How do we handle review conversations for remote employees?

Track measurable outputs carefully, schedule more frequent 1:1s, and ensure their work is visible across the team. Use the same role-specific rubric and the same calibration process. The structure of the review conversation is the same as for in-office employees, but the inputs require more deliberate effort to gather.



About the Author

Bilal Azhar

Co-Founder & CEO at AgencyPro. Former agency owner writing about the operational lessons learned from running and scaling service businesses.
