Photo by Leah Kelley from Pexels

Updates on Common Certificate Authorities

About six weeks ago, I started a project to figure out which of the 171 certificate authorities (CAs) in my browser are actually needed for day-to-day web browsing. It turns out you only need 37. You could probably use just the top 16 and not have any major issues.

Table of Results (as of 2021-10-13)

┌───────────────────────────┬───────────┐
│                        CA │     count │
├───────────────────────────┼───────────┤
│                    CPanel │         1 │
│                     Apple │         1 │
│  Network Solutions L.L.C. │         1 │
│                  GoGetSSL │         1 │
│   Max-Planck-Gesellschaft │         1 │
│          GEANT Vereniging │         1 │
│                    TERENA │         2 │
│                GlobalSign │         2 │
│                    Google │         2 │
│      Buypass AS-983163327 │         2 │
│                Cybertrust │         2 │
│    The Trustico Group Ltd │         2 │
│                 IdenTrust │         2 │
│              SwissSign AG │         4 │
│                   Sectigo │         5 │
│    Starfield Technologies │         8 │
│               GoDaddy.com │         8 │
│                   ZeroSSL │        15 │
│                Apple Inc. │        47 │
│                     Gandi │        58 │
│           SSL Corporation │        85 │
│     Microsoft Corporation │        98 │
│                 Internet2 │       164 │
│                 Starfield │       247 │
│                    cPanel │       267 │
│                   Entrust │       300 │
│     Google Trust Services │       370 │
│         COMODO CA Limited │       539 │
│                   GoDaddy │      1524 │
│                  DigiCert │      1997 │
│ Google Trust Services LLC │      2576 │
│                    Amazon │      3018 │
│          GlobalSign nv-sa │      3073 │
│           Sectigo Limited │      3907 │
│              DigiCert Inc │      6028 │
│                Cloudflare │      7580 │
│             Let's Encrypt │     14300 │
└───────────────────────────┴───────────┘
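To put a rough number on the "top 16" claim, here is a minimal Python sketch that computes cumulative coverage from the counts in the table above. The function names (coverage, cas_needed) are my own, for illustration only, and the sketch assumes each count is one observation of that CA.

```python
# Counts copied from the first table (as of 2021-10-13).
CA_COUNTS = {
    "Let's Encrypt": 14300,
    "Cloudflare": 7580,
    "DigiCert Inc": 6028,
    "Sectigo Limited": 3907,
    "GlobalSign nv-sa": 3073,
    "Amazon": 3018,
    "Google Trust Services LLC": 2576,
    "DigiCert": 1997,
    "GoDaddy": 1524,
    "COMODO CA Limited": 539,
    "Google Trust Services": 370,
    "Entrust": 300,
    "cPanel": 267,
    "Starfield": 247,
    "Internet2": 164,
    "Microsoft Corporation": 98,
    "SSL Corporation": 85,
    "Gandi": 58,
    "Apple Inc.": 47,
    "ZeroSSL": 15,
    "GoDaddy.com": 8,
    "Starfield Technologies": 8,
    "Sectigo": 5,
    "SwissSign AG": 4,
    "IdenTrust": 2,
    "The Trustico Group Ltd": 2,
    "Cybertrust": 2,
    "Buypass AS-983163327": 2,
    "Google": 2,
    "GlobalSign": 2,
    "TERENA": 2,
    "GEANT Vereniging": 1,
    "Max-Planck-Gesellschaft": 1,
    "GoGetSSL": 1,
    "Network Solutions L.L.C.": 1,
    "Apple": 1,
    "CPanel": 1,
}


def ranked_counts(counts):
    """Counts sorted from most to least frequent."""
    return sorted(counts.values(), reverse=True)


def coverage(counts, top_n):
    """Fraction of all observations covered by the top_n most frequent CAs."""
    ranked = ranked_counts(counts)
    return sum(ranked[:top_n]) / sum(ranked)


def cas_needed(counts, target):
    """Smallest number of CAs whose combined share reaches `target` (0 to 1)."""
    ranked = ranked_counts(counts)
    total = sum(ranked)
    running = 0
    for n, c in enumerate(ranked, start=1):
        running += c
        if running / total >= target:
            return n
    return len(ranked)


if __name__ == "__main__":
    # Prints: Top 16 cover 99.46% of observations
    print(f"Top 16 cover {coverage(CA_COUNTS, 16):.2%} of observations")
    # Prints: CAs needed for 99%: 15
    print("CAs needed for 99%:", cas_needed(CA_COUNTS, 0.99))
```

With these numbers, the remaining 21 CAs account for only 250 of the 46,238 observations, which is why dropping them is unlikely to cause major issues.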

As you can see, the table is sorted by count in ascending order, so the most frequently recorded CAs are at the bottom. Some entries are clearly the same organization under slightly different names, such as DigiCert and DigiCert Inc. After experimenting with various string distances in the excellent StringDistances.jl Julia package, here's the list with "similar" CAs combined.

Table of Results (similar CAs combined)

┌───────────────────────────┬───────────┐
│                        CA │     count │
├───────────────────────────┼───────────┤
│                    CPanel │         1 │
│  Network Solutions L.L.C. │         1 │
│                  GoGetSSL │         1 │
│   Max-Planck-Gesellschaft │         1 │
│          GEANT Vereniging │         1 │
│                    TERENA │         2 │
│                GlobalSign │         2 │
│                    Google │         2 │
│      Buypass AS-983163327 │         2 │
│                Cybertrust │         2 │
│    The Trustico Group Ltd │         2 │
│                 IdenTrust │         2 │
│              SwissSign AG │         4 │
│                   Sectigo │         5 │
│    Starfield Technologies │         8 │
│               GoDaddy.com │         8 │
│                   ZeroSSL │        15 │
│                Apple Inc. │        47 │
│                     Gandi │        58 │
│                 Internet2 │       164 │
│                    cPanel │       268 │
│     Google Trust Services │       370 │
│     Microsoft Corporation │       398 │
│         COMODO CA Limited │       539 │
│                   GoDaddy │      1524 │
│ Google Trust Services LLC │      2576 │
│                    Amazon │      3018 │
│                 Starfield │      3405 │
│           Sectigo Limited │      3907 │
│                Cloudflare │      7580 │
│              DigiCert Inc │      8025 │
│             Let's Encrypt │     14300 │
└───────────────────────────┴───────────┘

As you can see, the ranking isn't very different. I used a fairly conservative threshold for the JaroWinkler distance, so some similar names, such as Sectigo and Sectigo Limited, were left separate. There are more CAs that could be combined, but frankly, they aren't going to change the top 10.
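The merging code isn't shown here, so below is a minimal Python sketch of the general idea under my own assumptions: a hand-rolled Jaro-Winkler similarity standing in for StringDistances.jl's JaroWinkler (I'm not reproducing that package's exact defaults), plus a greedy pass that folds each CA name into the closest, more frequent name within a distance threshold. The helper names, the SAMPLE data subset, and the 0.07 threshold are hypothetical, chosen for illustration, not necessarily the values behind the table above.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    j = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4, boost_threshold=0.7):
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars.
    The boost is only applied above boost_threshold (a common convention)."""
    sim = jaro(s1, s2)
    if sim <= boost_threshold:
        return sim
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


def combine_similar(counts, threshold):
    """Visit names from most to least frequent; fold each one into the closest
    already-kept name whose distance (1 - similarity) is <= threshold."""
    combined = {}
    for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        best, best_dist = None, threshold
        for rep in combined:
            dist = 1 - jaro_winkler(name, rep)
            if dist <= best_dist:
                best, best_dist = rep, dist
        if best is None:
            combined[name] = n
        else:
            combined[best] += n
    return combined


# Hypothetical subset of the first table, for demonstration.
SAMPLE = {
    "DigiCert Inc": 6028, "DigiCert": 1997,
    "Sectigo Limited": 3907, "Sectigo": 5,
    "GoDaddy": 1524, "GoDaddy.com": 8,
}

if __name__ == "__main__":
    # {'DigiCert Inc': 8025, 'Sectigo Limited': 3907, 'GoDaddy': 1524,
    #  'GoDaddy.com': 8, 'Sectigo': 5}
    print(combine_similar(SAMPLE, threshold=0.07))
```

Raising the threshold merges more aggressively (at 0.11, Sectigo folds into Sectigo Limited in this sketch). Whatever the threshold, it's worth eyeballing the merged groups, since fuzzy matching can also lump unrelated names together.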

All in all, it was a fun, short project. Of course, my view of the Internet is probably different from yours. If I were in a different country, which CAs would appear, and in what order? One way to find out is to release a browser extension and collect data. Using CouchDB or some other open-source distributed datastore might help. That makes a good future project for anyone wanting to learn browser extension development with JavaScript and back-end distributed data design!