There’s a hard limit in SharePoint Online that nobody talks about until it ruins their morning. 50,000 unique permission scopes per list or library. Once you hit it on a given list, you can’t break inheritance on another item in that list. SharePoint just stops letting you share things in there.
I found out about it on a Tuesday. With my Wordle still open in another tab.
The Setup
UK care provider, around 200 sites, 5,000 residents at any one time. I’d built them a custom permissions automation platform. Each resident folder got its own permission set based on status, role, and which home the staff worked out of. Per-item permissioning, by design.
Important note before we go further: this isn’t the build story, it’s the months-on incident. The build story gets its own post.
The system was working. Had been for ages. The kind of thing you stop thinking about because it’s just running.
Until it wasn’t.
The Morning
Coffee in hand, Wordle about to be opened. Email pinged. Application Insights alert. Then another. Then a few from the service desk because the client had also noticed.
Every Function and script in the system had a custom error handler that logged to App Insights with a [#TICKET] tag. An alert rule on that tag emailed me and the service desk. The tag itself doesn’t matter. [#TICKET], [#ALERT], [#GIRAFFE], whatever, as long as it’s unique enough never to collide with normal log noise. Point an alert at it. Done.
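For anyone who wants the shape of that, here is a minimal sketch, assuming a host (such as Azure Functions) that forwards the PowerShell error stream to Application Insights. `Format-TicketMessage` and `Invoke-WithTicketLogging` are hypothetical names for illustration, not the production handler.

```powershell
# Hypothetical helpers: the names and message layout are illustrative only.
function Format-TicketMessage {
    param(
        [Parameter(Mandatory)] [string]$Source,
        [Parameter(Mandatory)] [string]$Message,
        [string]$Tag = '[#TICKET]'   # any string that never appears in normal log noise
    )
    # Tag goes first so the alert rule only needs a simple "contains" match.
    return ('{0} [{1}] {2}' -f $Tag, $Source, $Message)
}

function Invoke-WithTicketLogging {
    param(
        [Parameter(Mandatory)] [string]$Source,
        [Parameter(Mandatory)] [scriptblock]$Action
    )
    try {
        & $Action
    }
    catch {
        # Put the tagged line on the error stream, then rethrow so the caller still fails.
        $line = Format-TicketMessage -Source $Source -Message $_.Exception.Message
        Write-Error -Message $line -ErrorAction Continue
        throw
    }
}
```

Wrap each unit of work in `Invoke-WithTicketLogging`, then point a log alert at the tag.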
The errors were saying permissions weren’t being applied. PowerShell was complaining about something along the lines of “scope limit reached” / “unable to break inheritance”. The exact wording is in our ticketing system and I don’t have it in front of me. Gist was clear enough.
A bit of Googling. A bit of “wait, what limit?”. And there it was. 50,000 unique permission scopes per list or library. Hard cap. Microsoft documents it. Nobody reads it until they have to.
About an hour from “alerts fired” to “I know what’s wrong”. Another hour to confirm.
What Actually Causes It
A “unique permission scope” is anything that’s broken inheritance from its parent. Items, files, folders, the lot. Every time you give something its own permissions rather than letting it inherit, that’s another scope on the counter for whichever list or library it lives in. Whether you’ve shared it with one person or a thousand. One group or a thousand. It’s not the count of who has access, it’s the fact that this item has its own ACL at all.
Important bit: the limit is per list or library, not per site, not per tenant. Which sounds like good news until you realise that most automation pipes everything into one big library. Mine certainly did. 5,000 residents, multiple statuses each, plus staff-driven changes over time, all sat in one library with the count creeping up. Quietly. With no warning. Until the platform says no.
You can’t know everything about every platform. SharePoint has loads of limits. Sites, subsites, storage, the fact you can change the SharePoint URL of a tenant once and only once. Nobody around me had heard of this 50k one either. The point isn’t “how did you not know”. The point is what you do when you find out.
The Triage
The system had a status that, for the purposes of this post, I’ll call “archived”. Different letters in real life, same role. Residents who’d left, kept for retention, not actively worked on day to day. Their permissions were the easiest to safely strip back.
Wrote a quick PowerShell script to reduce the scopes on 500 of them. Then needed to know whether it had moved the needle. There’s no easy way to ask SharePoint for a current scope count directly, no Graph endpoint that just hands you the number. So the test was crude: try to break inheritance on something. If it succeeds, you’ve got headroom. If it fails with the same error, you don’t.
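A rough sketch of that probe, not the script I ran on the day. `Test-ScopeHeadroom` is a hypothetical helper that takes the break and restore steps as script blocks. The commented wiring assumes PnP.PowerShell, a throwaway item, and a made-up list name.

```powershell
function Test-ScopeHeadroom {
    param(
        # Breaks inheritance on a throwaway item; must throw if SharePoint refuses.
        [Parameter(Mandatory)] [scriptblock]$BreakInheritance,
        # Puts the item back to inheriting, so the probe itself costs no scope.
        [Parameter(Mandatory)] [scriptblock]$RestoreInheritance
    )
    try {
        $null = & $BreakInheritance
    }
    catch {
        # Any failure reads as "no headroom"; check the message really is the scope limit.
        return [pscustomobject]@{ HasHeadroom = $false; Error = $_.Exception.Message }
    }
    $null = & $RestoreInheritance
    return [pscustomobject]@{ HasHeadroom = $true; Error = $null }
}

# Possible wiring (assumed list name and item ID, with $ErrorActionPreference = 'Stop'):
# $item = Get-PnPListItem -List 'Residents' -Id 1
# Test-ScopeHeadroom `
#     -BreakInheritance   { $item.BreakRoleInheritance($true, $false); Invoke-PnPQuery } `
#     -RestoreInheritance { $item.ResetRoleInheritance(); Invoke-PnPQuery }
```

Use an item nobody cares about. The probe changes permissions, even if only for a moment.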
Hit the cap, manually removed one, was allowed to add one back. So the cap held. From the size of the reduction, worked backwards to a rough current count. Not exact, close enough to plan from.
Did the maths on what we’d save by treating the rest of that status, plus another low-traffic one. A – B = C, basic stuff. The point isn’t the calculation, it’s that nobody had been doing it. Including me. Which is the lesson.
Got runway to roughly 2029. Raised it with the client there and then. Cause, fix, plan. Got a thumbs up to do the broader consolidation properly after a planned bit of leave. Quick fix, game plan, sign-off, all in the day.
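The back-of-envelope maths behind a runway figure like that can be sketched as below. `Get-ScopeRunway` is a hypothetical helper, and the numbers in the usage line are illustrative, not the client’s real figures.

```powershell
function Get-ScopeRunway {
    param(
        [Parameter(Mandatory)] [int]$EstimatedCurrent,   # rough current scope count
        [Parameter(Mandatory)] [int]$PlannedSavings,     # scopes the clean-up removes
        [Parameter(Mandatory)] [ValidateRange(1, [int]::MaxValue)] [int]$GrowthPerYear,
        [int]$Cap = 50000
    )
    $after    = $EstimatedCurrent - $PlannedSavings      # A - B = C
    $headroom = $Cap - $after
    [pscustomobject]@{
        CountAfterCleanup = $after
        Headroom          = $headroom
        YearsOfRunway     = [math]::Round([double]$headroom / $GrowthPerYear, 1)
    }
}

# Illustrative numbers only.
Get-ScopeRunway -EstimatedCurrent 50000 -PlannedSavings 6000 -GrowthPerYear 1500
```

The growth rate is the guess that matters most, so it’s worth re-running this whenever the data changes.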
The proper consolidation, post-leave, pulled back somewhere around 10,000 scopes. Ballpark. Two years on, I’m not pretending I remember the exact number.
The Alarm
The whole point of going through this was to not go through it again. So we built one.
Weekly Azure Function. Audits residents and statuses, applies a coefficient per status (twenty if status X, one if status Y, based on how many scopes a resident in that status carries under the current model), totals it. Result goes to an Azure Table so we can watch the trend, not just the current number.
If the total hits 45,000, which is 90% of the cap, it raises the alarm. Same email delivery as the other alerts. Plenty of headroom to plan, not panic.
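The core of that audit is just a weighted sum and a threshold check. A minimal sketch, assuming each resident record exposes a `Status` property. `Get-EstimatedScopeTotal` and the status names `X` and `Y` are placeholders, and the write to Azure Table storage and the timer trigger are left out.

```powershell
function Get-EstimatedScopeTotal {
    param(
        [Parameter(Mandatory)] [object[]]$Residents,       # each needs a Status property
        [Parameter(Mandatory)] [hashtable]$Coefficients,   # status -> scopes per resident
        [int]$Cap = 50000,
        [double]$AlarmFraction = 0.9
    )
    $total = 0
    foreach ($group in ($Residents | Group-Object -Property Status)) {
        if (-not $Coefficients.ContainsKey($group.Name)) {
            # Fail loudly rather than silently under-count an unknown status.
            throw "No coefficient defined for status '$($group.Name)'."
        }
        $total += $group.Count * $Coefficients[$group.Name]
    }
    $threshold = [int]($Cap * $AlarmFraction)
    [pscustomobject]@{
        Timestamp = (Get-Date).ToUniversalTime()   # one row per run gives you the trend
        Total     = $total
        Threshold = $threshold
        Alarm     = $total -ge $threshold
    }
}

# Usage with the example weights from the text: 20 scopes for status X, 1 for status Y.
# Get-EstimatedScopeTotal -Residents $residents -Coefficients @{ X = 20; Y = 1 }
```

It’s an estimate, not a true count, so the coefficients need revisiting whenever the permission model changes.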
The next mitigation, when the alarm eventually does fire, is to split the data across multiple libraries (regional groupings, because the limit is per library). We didn’t build it. By the time it’s needed, the requirement will probably be different to what we’d build today. If it ain’t broke yet, don’t fix it. Just make sure you know when it’s about to break.
Go Check Your Own
If you’re running a SharePoint estate with any kind of automated per-item permissioning (care, case management, HR records, anything where a record gets its own ACL), go and check your count. Today.
Here’s a starter for ten. PnP PowerShell, a few different auth options depending on where you’re running it from. The lifted-from-Stack-Overflow one-liner you’ve probably seen has three bugs in one. It uses -UseWebLogin, which is dead. It only counts list-level items, so it misses files and folders inside libraries. And it reads HasUniqueRoleAssignments directly, which silently returns null unless you explicitly load the property. This one walks items and folders recursively, force-loads the property properly, and totals per list, because the highest single list count is what’ll bite you, not the sum.
[CmdletBinding()]
param(
    [Parameter(Mandatory = $true)]
    [string]$SiteUrl,
    [string]$ListTitle,
    [string]$ClientId,
    [string]$Tenant,
    [string]$ClientSecret,
    [string]$CertificatePath,
    [securestring]$CertificatePassword,
    [string]$Thumbprint,
    [switch]$UseDeviceLogin,
    [switch]$UseOSLogin,
    [switch]$ManagedIdentity
)

$ErrorActionPreference = "Stop"

if (-not (Get-Module -ListAvailable -Name PnP.PowerShell)) {
    throw "PnP.PowerShell is not installed. Install it with: Install-Module PnP.PowerShell -Scope CurrentUser"
}
Import-Module PnP.PowerShell -Force

$connectParams = @{ Url = $SiteUrl }
$usesNonInteractiveAuth = $false

if (-not [string]::IsNullOrWhiteSpace($Tenant)) {
    $connectParams.Tenant = $Tenant
}

if ($ManagedIdentity) {
    $connectParams.ManagedIdentity = $true
    $usesNonInteractiveAuth = $true
}
elseif (-not [string]::IsNullOrWhiteSpace($ClientSecret)) {
    if ([string]::IsNullOrWhiteSpace($ClientId)) {
        throw "ClientId is required with ClientSecret auth."
    }
    $connectParams.ClientId = $ClientId
    $connectParams.ClientSecret = $ClientSecret
    $usesNonInteractiveAuth = $true
}
elseif (-not [string]::IsNullOrWhiteSpace($Thumbprint)) {
    if ([string]::IsNullOrWhiteSpace($ClientId)) {
        throw "ClientId is required with certificate thumbprint auth."
    }
    $connectParams.ClientId = $ClientId
    $connectParams.Thumbprint = $Thumbprint
    $usesNonInteractiveAuth = $true
}
elseif (-not [string]::IsNullOrWhiteSpace($CertificatePath)) {
    if ([string]::IsNullOrWhiteSpace($ClientId)) {
        throw "ClientId is required with certificate file auth."
    }
    $connectParams.ClientId = $ClientId
    $connectParams.CertificatePath = $CertificatePath
    if ($CertificatePassword) {
        $connectParams.CertificatePassword = $CertificatePassword
    }
    $usesNonInteractiveAuth = $true
}
else {
    if ([string]::IsNullOrWhiteSpace($ClientId)) {
        throw "ClientId is required for interactive, device-login, and OS-login auth."
    }
    $connectParams.ClientId = $ClientId
}

if (-not $usesNonInteractiveAuth) {
    if ($UseDeviceLogin) {
        $connectParams.DeviceLogin = $true
        $connectParams.PersistLogin = $true
    }
    elseif ($UseOSLogin) {
        $connectParams.OSLogin = $true
    }
    else {
        $connectParams.Interactive = $true
    }
}

Connect-PnPOnline @connectParams | Out-Null

$recursiveQuery = "<View Scope='RecursiveAll'><Query></Query></View>"

$lists = if (-not [string]::IsNullOrWhiteSpace($ListTitle)) {
    @(Get-PnPList -Identity $ListTitle)
}
else {
    @(Get-PnPList | Where-Object { -not $_.Hidden })
}

$results = foreach ($list in $lists) {
    $count = 0
    $items = Get-PnPListItem -List $list -PageSize 2000 -Query $recursiveQuery
    foreach ($item in $items) {
        $hasUnique = Get-PnPProperty -ClientObject $item -Property "HasUniqueRoleAssignments"
        if ($hasUnique) {
            $count++
        }
    }
    [pscustomobject]@{
        List   = $list.Title
        Scopes = $count
        Status = if ($count -ge 48000) { "AT THE WALL" }
                 elseif ($count -ge 45000) { "ALARM (90%)" }
                 elseif ($count -ge 30000) { "Plan ahead" }
                 else { "OK" }
    }
}

$results | Sort-Object Scopes -Descending
Caveats. Your app registration needs Sites.FullControl.All (or Sites.Selected granted on the target site). Run it on a single big library first to sanity-check the timing before pointing it at the whole estate. PnP enumeration with property-loading is slow on big lists: you’re looking at hours for 100k+ items. If that’s you, the SharePoint REST API is significantly faster (REST returns HasUniqueRoleAssignments directly without a per-item round trip), but the PnP version above is fine for most estates and far easier to drop into a Function. Even read-only enumerations can hammer a tenant, so don’t run it during business hours on the first go.
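If you do go the REST route, the shape is roughly this. It’s an untested-against-your-tenant sketch: `Get-UniqueScopeCountViaRest` is a hypothetical helper, the page fetch is passed in as a script block, and it assumes the next-page link comes back as `odata.nextLink` and that your fetcher accepts that absolute URL. Verify both before trusting the number.

```powershell
function Get-UniqueScopeCountViaRest {
    param(
        [Parameter(Mandatory)] [string]$ListTitle,
        # Fetches one page for a URL and returns the parsed JSON response.
        [Parameter(Mandatory)] [scriptblock]$GetPage,
        [int]$PageSize = 2000
    )
    $escaped = $ListTitle.Replace("'", "''")   # OData escaping for single quotes
    $url = "/_api/web/lists/getbytitle('$escaped')/items?`$select=Id,HasUniqueRoleAssignments&`$top=$PageSize"
    $count = 0
    while ($url) {
        $page = & $GetPage $url
        # The flag arrives with each row, so there is no per-item round trip.
        $count += @($page.value | Where-Object { $_.HasUniqueRoleAssignments }).Count
        # Assumption: the next-page link is exposed as 'odata.nextLink'; absent on the last page.
        $url = $page.'odata.nextLink'
    }
    return $count
}

# Possible wiring with PnP.PowerShell after Connect-PnPOnline (list name is made up):
# Get-UniqueScopeCountViaRest -ListTitle 'Residents' -GetPage { param($u) Invoke-PnPSPRestMethod -Url $u }
```

Sanity-check it against the PnP script on a small library first. If the two counts disagree, trust neither until you know why.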
If you’re at 30k, plan. If you’re at 40k, plan harder. If you’re at 48k, stop reading this and go.
Then add the 90% alarm at go-live, not after the first incident. Costs ten minutes and an alert rule. The first incident costs you a fortnight.
🤷