Cookies
Introduction
The following chapter of the Web Almanac 2024 is focused on cookies. Cookies have multiple functionalities and are to some extent essential for the web—for example, for authentication, fraud prevention and security. However, some cookies can track users across websites and are utilized to build behavior profiles.
In this chapter, we measure the prevalence and structure of web cookies encountered while visiting mainly the top one million websites during the HTTP Archive crawl of June 2024.
Additionally, we discuss and measure the adoption of alternative mechanisms to third-party cookies that were introduced by Google on Chrome as part of the Privacy Sandbox initiative to reduce cross-site tracking.
We find that 61% of cookies are set in a third-party context. Generally, third-party cookies can be used for online tracking and targeted advertising. For this reason, Google proposed to phase out all third-party cookies and introduce more privacy-friendly options to replace their functionality with the Privacy Sandbox.
On the other hand, not all third-party cookies are used for online tracking. Browsers such as Chrome include a number of ways to limit how third-party cookies are used. For example, partitioned cookies (CHIPS) cannot be accessed from top-level sites other than the one on which they were originally set, which prevents them from being used to track users across websites. Nonetheless, we find that the most prevalent partitioned cookies are set by domains related to advertising. Another example is the SameSite cookie attribute, which ensures that (first-party) cookies are not included in cross-site requests by default. Trackers can disable this protection by explicitly setting the value of the SameSite attribute to None. In practice, we find that for 11% of observed first-party cookies, SameSite is set to None. Additionally, we observe that the most widely set third-party cookies are used for advertising and analytics, with Google being prevalent on the largest percentage of websites.
First-party cookies can also be used to track recurring users. From our analysis, we conclude that the most prevalent first-party cookies are used for analytics. In theory, because of the same-origin policy, these cookies cannot be used for cross-site tracking. However, by using advanced tracking methods such as cookie syncing and CNAME tracking, trackers can bypass this limitation. We refer to the Privacy chapter for more details on online tracking methods.
Our results indicate both first-party and third-party tracking are common. We show that online tracking by means of cookies is still predominant on the web.
Definitions
First up let’s get a common understanding of some of the terms used in this chapter.
HTTP cookie
When a user visits a website, they interact with a web server that can request the user’s web browser to set and save an HTTP cookie. This cookie corresponds to data saved in a text string on the user’s device, and is sent with subsequent HTTP requests to the web server. Cookies are used to persist stateful information about users across multiple HTTP requests, which can allow authentication, session management, and tracking. Cookies are also associated with privacy and security risks.
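The exchange described above can be sketched with Python's standard http.cookies module. This is only an illustration, not how any particular server or browser is implemented: the cookie name session_id, its value, and the one-hour lifetime are hypothetical.

```python
from http.cookies import SimpleCookie

# Server side: build the value of a Set-Cookie response header.
response_cookies = SimpleCookie()
response_cookies["session_id"] = "abc123"          # hypothetical name and value
response_cookies["session_id"]["path"] = "/"
response_cookies["session_id"]["max-age"] = 3600   # lifetime in seconds (assumed)
set_cookie_value = response_cookies["session_id"].OutputString()

# Browser side: on later requests, only the name=value pairs are sent back
# in the Cookie request header (attributes such as Path stay in the browser).
cookie_header = "; ".join(
    f"{name}={morsel.value}" for name, morsel in response_cookies.items()
)

# Server side again: parse the incoming Cookie header.
request_cookies = SimpleCookie()
request_cookies.load(cookie_header)

print("Set-Cookie:", set_cookie_value)
print("Cookie:", cookie_header)
print("Parsed value:", request_cookies["session_id"].value)
```

Note that the attributes only travel from server to browser; the server never sees them again, which is why the analysis in this chapter reads cookies from the browser's cookie jar rather than from request headers.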
First and third-party cookies
Cookies are set by a web server and there are two types of cookies: first-party and third-party cookies. First-party cookies are set by the same domain as the site the user is visiting, while third-party cookies are set from a different domain.
Third-party cookies may be from a third party, or from a different site or service belonging to the same “first party” as the top-level site. Third-party cookies are really cross-site cookies.
For example, imagine that the owner of the domain example.com also owns example.net and that the following cookies are set for a user visiting https://www.example.com:
Cookie Name | Set by | Type of cookie | Reason |
---|---|---|---|
cookie_a | www.example.com | First-party | Same domain as visited website |
cookie_b | cart.example.com | First-party | Same domain as visited website: subdomains do not matter |
cookie_c | www.example.edu | Third-party | Different domain than visited website |
cookie_d | tracking.example.org | Third-party | Different domain than visited website |
cookie_e | login.example.net | Third-party | Different domain than visited website even if owned by the same owner in this example (cross-site cookie from the same “first party” at the top-level site) |
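The classification in the table can be reproduced with a short sketch. It assumes, as a deliberate simplification, that a site is identified by the last two labels of its hostname; real browsers rely on the Public Suffix List to find the registrable domain, so this shortcut would misclassify hosts under suffixes such as co.uk. The function names are our own.

```python
from urllib.parse import urlsplit


def naive_site(host: str) -> str:
    """Return the last two labels of a hostname (a rough stand-in for eTLD+1)."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def cookie_party(page_url: str, cookie_host: str) -> str:
    """Classify a cookie as first- or third-party relative to the visited page."""
    page_host = urlsplit(page_url).hostname or ""
    if naive_site(page_host) == naive_site(cookie_host):
        return "first-party"
    return "third-party"


visited = "https://www.example.com"
for host in [
    "www.example.com",
    "cart.example.com",
    "www.example.edu",
    "tracking.example.org",
    "login.example.net",
]:
    print(f"{host}: {cookie_party(visited, host)}")
```

Ownership plays no role in this check: login.example.net is classified as third-party even though, in the example, it belongs to the same owner as example.com.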
Privacy & security risks
Web tracking. Cookies are used by third parties to track users across websites and record their browsing behavior and interests. In targeted advertising, this data is leveraged to show users advertisements aligned with their interests. This tracking usually takes place in the following way: third-party code embedded on a site can set a cookie that identifies a user. Then, the same third party can record user activity by reading that cookie back when the user visits other websites where it is embedded as well (see also the Privacy chapter). We note that first-party cookies can also be used for online tracking: methods such as cookie syncing make it possible to bypass the limitation of third-party cookies and track users across different websites.
Cookie theft and session hijacking. Cookies are used to store session information such as credentials (session token) for authentication purposes across several HTTP requests. However, if these cookies were obtained by a malicious actor, that actor could use them to authenticate to the corresponding web servers. If cookies are not properly set by web servers, they could be prone to vulnerabilities such as session hijacking, cross-site request forgery (CSRF), cross-site scripting (XSS), and others (see also the Security chapter).
Caveats
You can learn more about the methodology applied by the HTTP Archive for the Web Almanac in 2024 on the Methodology page. There are limitations to that methodology which may impact the results in this chapter:
- Data is collected by automatically visiting websites in a non-interactive way; user interaction could modify the way websites set and use cookies in practice. For example, HTTP Archive’s tools do not interact with cookie banners (if any) and so cookies that would be set after interaction with these banners are not observed by our study.
- Websites are visited from servers located in the US that have no cookie set when each independent website visit starts; this is quite different from a user accumulating and saving web cookies while browsing the web. The location from which visits are performed can impact cookie behavior due to regulation and legislation such as GDPR.
- For each website, the home page is visited as well as one other page from the same website.
- Most of the results presented in this chapter are based on the top one million most visited websites according to the Chrome User Experience Report (CrUX) that were successfully reached during the HTTP Archive crawl of June 2024.
- The cookies collected for the analysis in this chapter were obtained at the end of the visit of each website page by extracting all cookies stored by the web browser in its cookie jar. As a result, the collected data only contains cookies that are deemed valid by the web browser and successfully set. Thus, if websites attempt to set invalid cookies (too large, attributes mismatch, etc.) they would be missing from our analysis.
Notes
The figures plotted in this chapter indicate in their subtitle (a) the type of client device (desktop or mobile) that was used to access the websites for the plotted data and (b) the top number of websites visited (according to their CrUX rank). If the information is not specified, it must be on one of the axes of the graph.
Prevalence and structure of cookies
In this section, we report on the prevalence of cookies, their type, and their attributes on the web.
First and third-party prevalence
First-party cookies are set by the same domain as the website that the user is visiting, while third-party cookies are set by a different domain (see Definitions). In this analysis, we examine the percentage of cookies set on websites that are first- and third-party across clients (desktop or mobile) and CrUX ranks.
On the top one million most visited websites, about 39% of the cookies are first-party and 61% are third-party cookies. Thus, a majority of the cookies set on the web are third-party cookies. We also observe that this distribution is very similar whether these websites are accessed through a desktop or a mobile client. This indicates that overall there is little to no behavior change based on the type of client used. However, some websites may still behave differently and/or use other tracking methods such as fingerprinting depending on the type of client (see the Privacy chapter for more).
Looking at the prevalence of the type of cookies across website rankings, we observe that more popular websites have a higher proportion of third-party cookies than the ones visited less often. For instance, in comparison to the results reported on the top one million websites, 23% of the cookies on the top one thousand websites are first-party and 77% are third-party. This is likely due to the fact that the websites most visited by users embed more third-party code (that in turn sets more third-party cookies) than less visited ones. Additionally, the prevalence of each cookie type across the ranks is quite similar between desktop and mobile clients; we observe that previous remarks made on the top one million websites also hold across CrUX ranks.
Cookie attributes
Next, we discuss the distribution of different cookie attributes. Furthermore, we zoom into the use of the SameSite cookie attribute. The following two figures show the proportion of first and third-party cookies set on the top one million websites for each client that have one of the following attributes set: Partitioned, Session, HttpOnly, Secure, SameSite. Before diving into more details for each attribute, let’s observe here again the similarity of the distribution of the different attributes between desktop and mobile clients.
Partitioned
Partitioned cookies are stored by compatible browsers using partitioned storage. Cookies that have the Partitioned attribute set can only be accessed by the same third party and from the same top-level site where they were created in the first place. In other words, partitioned cookies cannot be used for third-party tracking across websites and allow for the legitimate use of third-party cookies on a top-level site. For more details see: Cookies Having Independent Partitioned State (CHIPS).
We observe that about 6% of third-party cookies set on desktop or mobile while visiting the top one million websites are partitioned. The next figure shows the most common partitioned cookies (name and domain) that are set in third-party context on the top one million websites. For each client (desktop and mobile), only the top ten partitioned cookies, ranked by the percentage of websites they are seen on, are reported. The two most widely used partitioned cookies are set by youtube.com on 9.9% of desktop and 8.89% of mobile websites. The YSC cookie is used for security purposes, i.e., to prevent fraud and abuse, and expires at the end of the user session, while VISITOR_INFO1_LIVE’s main purpose is analytics (see Google’s documentation). Most of the cookies listed in the graph are set by advertising domains, e.g., adnxs.com, criteo.com, and doubleclick.net.
Perhaps surprisingly, 1% of all the first-party cookies that are set on the top one million websites (desktop and mobile client) are partitioned. However, partitioning cookies in a first-party context appears to be redundant, as first-party cookies are already accessible, by definition, only by that first party on that top-level site. The following figure displays the top ten partitioned cookies set in first-party context for each client. receive-cookie-deprecation is set by domains that participate in the testing phase of Chrome’s Privacy Sandbox. cf_clearance and csrf_token are cookies set by Cloudflare to indicate that the user has successfully completed an anti-bot challenge or to identify trusted web traffic, respectively.
Session
Session cookies are cookies that are only valid for a single user session. In other words, session cookies are temporary and expire once the user quits the corresponding website they were set on, or closes their web browser, whichever happens first. However, note that some web browsers allow users to restore a previous session on startup; in that case, the session cookies set in that previous session are also restored.
The results from our analysis on the top one million websites in June 2024 show that 16% of first-party cookies and only 4% of third-party cookies are session cookies (on both desktop and mobile clients).
HttpOnly
The HttpOnly attribute prevents cookies from being accessed by JavaScript code, which provides some mitigation against cross-site scripting (XSS) attacks. Note that setting the HttpOnly attribute does not prevent cookies from being sent along with XMLHttpRequest or fetch requests initiated from JavaScript.
Only 12% of first-party cookies have the HttpOnly attribute set, while for third-party cookies 19% on desktop and 18% on mobile do.
Secure
Cookies with the Secure attribute are only sent with requests made over HTTPS. This helps protect them against man-in-the-middle attacks.
For first-party cookies, 23% on desktop and 22% on mobile have the Secure attribute, and all third-party cookies observed have the Secure attribute. Indeed, these third-party cookies also have the SameSite=None attribute, which requires Secure to be set (see the next section).
SameSite
The SameSite cookie attribute allows sites to specify when cookies are included with cross-site requests:
- SameSite=Strict: a cookie is only sent with requests that originate from the same site as the cookie’s origin.
- SameSite=Lax: same as SameSite=Strict, except that the browser also sends the cookie on navigation to the cookie’s origin site. This is the default value of SameSite.
- SameSite=None: cookies are sent on same-site or cross-site requests. This means that in order to make third-party tracking with cookies possible, the tracking cookies must have their SameSite attribute set to None.
We observe that for each client about 33% of the first-party cookies and nearly 100% of third-party cookies seen on the top one million websites have a SameSite attribute that is explicitly set when they are created (reminder: SameSite defaults to Lax if not specified). The two bar charts above represent the distribution of this SameSite attribute for first and third-party cookies across clients. We observe that the differences in results across clients are here again negligible. Nearly 100% of third-party cookies have SameSite=None, and so are sent on cross-site requests. For first-party cookies, about 87% have SameSite=Lax (20% explicitly set the attribute, and the remaining 67% fall under the default behavior when SameSite is not set). 11% of first-party cookies have their SameSite attribute explicitly set to None. It’s hard to determine the exact purpose for which cookies are set, but it is likely that a fraction of these cookies are used to track users in a first-party context. Only 2% of first-party cookies have SameSite set to Strict.
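The SameSite rules described in this section can be summarized in a small decision function. This is a simplified sketch rather than a browser implementation: it ignores details such as the restriction of Lax to safe request methods, it assumes that unrecognized values fall back to the default, and the function and parameter names are our own.

```python
from typing import Optional


def effective_same_site(explicit: Optional[str]) -> str:
    """Return the SameSite mode applied to a cookie; a missing attribute means Lax."""
    if explicit is None:
        return "Lax"
    value = explicit.strip().capitalize()
    # Assumption: values other than Strict, Lax, or None fall back to the default.
    return value if value in {"Strict", "Lax", "None"} else "Lax"


def is_cookie_sent(
    explicit: Optional[str],
    same_site_request: bool,
    top_level_navigation: bool = False,
) -> bool:
    """Decide whether a cookie accompanies a request under the simplified rules."""
    mode = effective_same_site(explicit)
    if same_site_request or mode == "None":
        return True
    if mode == "Lax":
        # Cross-site request: Lax only allows navigations to the cookie's site.
        return top_level_navigation
    return False  # Strict never sends the cookie on cross-site requests


# A cookie without SameSite is not attached to a cross-site subresource request...
print(is_cookie_sent(None, same_site_request=False))
# ...but a cookie with SameSite=None is, which is what cross-site tracking needs.
print(is_cookie_sent("None", same_site_request=False))
```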
Cookie prefixes
Two cookie prefixes, __Host- and __Secure-, can be used in the cookie name to indicate that the cookie can only be set or modified by a secure HTTPS origin. This is to defend against session fixation attacks. Cookies with either prefix must be set by a secure HTTPS origin and have the Secure attribute set. Additionally, __Host- cookies must not contain a Domain attribute and must have their Path set to /. As a result, __Host- cookies are only sent back to the exact host they were set on, and not to any parent domain.
We measure that 0.032% and 0.030% of the first-party cookies observed on desktop have the __Host- and __Secure- prefix set, respectively. These numbers are 0.001% for third-party cookies. These results show the very low adoption of these prefixes and the associated defense-in-depth measure since they were first introduced at the end of 2015.
Top first and third-party cookies and domains setting them
In the following section, we report for each client (desktop and mobile) the top ten first-party cookies, third-party cookies, as well as domains that set them. We comment on a few of them using results from Cookiepedia and invite curious readers to refer to this resource for more.
The first two first-party cookies, _ga and _gid, are set by Google Analytics to store client identifiers and statistics for site analytics reports. A majority of websites use Google Analytics: these two cookies are found on more than 60% and 35% of websites, respectively. The third one, _fbp, is set by Facebook and used for targeted advertising on 25% of the websites.
The IDE and test_cookie cookies are set by doubleclick.net (owned by Google) and are the most common third-party cookies observed on the top one million websites; they are used for targeted advertising. DoubleClick checks if a user’s web browser supports third-party cookies by trying to set test_cookie. MUID from Microsoft comes next and is also used in targeted advertising to store the user’s unique identifier for cross-site tracking.
Among the ten most common domains that set cookies on the web, we only find domains involved in search, targeting, and advertising services. This result highlights the coverage that some third parties have of the web. For example, the Google-owned advertising platform DoubleClick sets cookies on more than 44% of the top one million websites, while the others are at about 8% to 12%.
Number of cookies set by websites
Number of cookies (desktop top one million) | First-party | Third-party | All |
---|---|---|---|
min | 1 | 1 | 1 |
p25 | 3 | 2 | 4 |
median | 7 | 5 | 10 |
p75 | 13 | 17 | 24 |
p90 | 22 | 66 | 51 |
p95 | 46 | 331 | 323 |
max | 160 | 632 | 662 |
Number of cookies (mobile top one million) | First-party | Third-party | All |
---|---|---|---|
min | 1 | 1 | 1 |
p25 | 3 | 2 | 4 |
median | 7 | 4 | 9 |
p75 | 12 | 18 | 24 |
p90 | 21 | 64 | 52 |
p95 | 45 | 327 | 316 |
max | 168 | 604 | 645 |
Websites set a median of nine or ten cookies of any type overall, seven first-party cookies, and four or five third-party cookies for mobile and desktop clients, respectively. The tables above report several other statistics about the number of cookies observed per website and the figures below display their cumulative distribution functions (cdf). For example: on desktop a maximum of 160 first-party and 632 third-party cookies are set per website.
We see that more websites have a number of first-party cookies that is closer to the maximum of first-party cookies observed, than for third-party cookies.
Size of cookies
Size of cookies (desktop top one million) in bytes | First-party | Third-party | All |
---|---|---|---|
min | 1 | 1 | 1 |
p25 | 26 | 22 | 23 |
median | 39 | 36 | 37 |
p75 | 59 | 58 | 58 |
p90 | 148 | 114 | 128 |
p95 | 380 | 274 | 348 |
max | 4087 | 4094 | 4094 |
Size of cookies (mobile top one million) in bytes | First-party | Third-party | All |
---|---|---|---|
min | 1 | 1 | 1 |
p25 | 26 | 22 | 23 |
median | 39 | 37 | 38 |
p75 | 59 | 59 | 59 |
p90 | 149 | 114 | 130 |
p95 | 382 | 278 | 352 |
max | 4086 | 4093 | 4093 |
This section focuses on the actual size of these cookies. We find that the median size across all cookies observed on desktop during the HTTP Archive crawl of June 2024 is 37 bytes. This median value is consistent across first and third-party cookies as well as clients. The maximal size that we obtain is at about 4K bytes which is consistent with the limits defined in RFC 6265. Note that because of the way the HTTP Archive tools work and collect the cookies, if websites try to set cookies larger than the limit of 4K bytes this information would be missing from the data analyzed in this chapter.
The smallest cookies that we observe are a single byte in size; they are likely set by error by empty Set-Cookie headers. Additionally, we also report the cumulative distribution function (cdf) of the size of all the cookies seen on the top one million websites for each client.
Most cookies used for tracking have a size greater than 35 bytes. The reason for this is that size is related to the tracking capability of cookies: trackers assign identifiers randomly to users in order to be able to re-identify them. So the larger the identifier (in number of bytes), the more users can each be assigned a distinct value.
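To make the relation between identifier size and tracking capability concrete, here is a back-of-the-envelope sketch. It assumes identifiers are drawn from a fixed alphabet (hexadecimal in the example) and ignores the extra length needed to keep random collisions unlikely; the population of five billion users is an arbitrary illustrative figure.

```python
import secrets


def distinct_identifiers(num_chars: int, alphabet_size: int) -> int:
    """Number of different identifiers of a given length over a given alphabet."""
    return alphabet_size ** num_chars


def min_chars_for(users: int, alphabet_size: int) -> int:
    """Smallest identifier length offering at least one distinct value per user."""
    length = 1
    while distinct_identifiers(length, alphabet_size) < users:
        length += 1
    return length


# Eight hexadecimal characters give 16**8 distinct values.
print(distinct_identifiers(8, 16))
# Hexadecimal length needed to give five billion users one value each.
print(min_chars_for(5_000_000_000, 16))
# A randomly assigned identifier, as a tracker might generate it (32 hex characters).
print(secrets.token_hex(16))
```

In practice identifiers are chosen much longer than this lower bound, because randomly drawn values collide well before the identifier space is exhausted.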
Persistence (expiration)
Age of cookies (desktop top one million) in days | First-party | Third-party | All |
---|---|---|---|
min | 0 | 0 | 0 |
p25 | 1 | 30 | 30 |
median | 183 | 365 | 365 |
p75 | 396 | 365 | 396 |
p90 | 400 | 400 | 400 |
p95 | 400 | 400 | 400 |
max | 400 | 400 | 400 |
Age of cookies (mobile top one million) in days | First-party | Third-party | All |
---|---|---|---|
min | 0 | 0 | 0 |
p25 | 1 | 30 | 30 |
median | 183 | 365 | 365 |
p75 | 396 | 365 | 390 |
p90 | 400 | 400 | 400 |
p95 | 400 | 400 | 400 |
max | 400 | 400 | 400 |
After looking into cookie size, let’s now dive into cookie age. Cookies are assigned an expiration date when they are created. Recall that session cookies expire immediately after the session is over (see previous section). The median age of first-party cookies is about 183 days or roughly 6 months, while the median age of third-party cookies is a full year. 25% of first-party cookies expire within about one day, and 25% of third-party cookies expire within thirty days. The maximum age among the cookies that we can observe with the instrumentation and collection of the HTTP Archive tools is 400 days; this is aligned with the hard limit that Chrome imposes on the cookie Expires and Max-Age attributes. Below are the cumulative distribution functions (cdf) of the age of the cookies set on the top one million websites, for both desktop and mobile clients.
From the graph, we deduce that about 45% of cookies have expired by the 90-day mark. We find the same results for both mobile and desktop clients. Additionally, about half of cookies have a lifespan of at most 1 year (the median is 365 days), while the remaining cookies stay stored in the browser for longer. In theory, the longer the lifespan of the cookies, the longer they can re-identify a recurring user. For this reason, most tracking cookies are typically stored in the browser for a longer time.
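The 400-day ceiling seen in the tables can be illustrated with a sketch of how a requested lifetime is capped. The 400-day limit comes from the text above; the function name, its parameters, and the choice to return None for session cookies are our own assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_LIFETIME = timedelta(days=400)  # upper bound on cookie lifetime cited above


def effective_lifetime(
    now: datetime,
    max_age_seconds: Optional[int] = None,
    expires: Optional[datetime] = None,
) -> Optional[timedelta]:
    """Return the capped lifetime of a cookie, or None for a session cookie."""
    if max_age_seconds is not None:
        # Max-Age takes precedence over Expires when both are present.
        requested = timedelta(seconds=max_age_seconds)
    elif expires is not None:
        requested = expires - now
    else:
        return None  # neither attribute set: session cookie
    requested = max(requested, timedelta(0))  # already expired: zero lifetime
    return min(requested, MAX_LIFETIME)


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
# A two-year Max-Age is reduced to the 400-day cap.
print(effective_lifetime(now, max_age_seconds=2 * 365 * 24 * 3600))
# A six-month Expires date is kept as is.
print(effective_lifetime(now, expires=now + timedelta(days=183)))
```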
Privacy Sandbox initiative
In 2019, Google announced the launch of the Privacy Sandbox initiative to reduce cross-site (web) and cross-app (Android) tracking while retaining utility for advertising and other use cases that historically have relied on third-party cookies and other tracking mechanisms.
What is the Privacy Sandbox initiative?
The Privacy Sandbox is composed of more than 20 different proposals that aim to diminish the use of unique identifiers, limiting covert tracking, fighting spam and fraud, showing relevant ads to users, and measuring ad conversions.
Part of Google’s initial plan with the Privacy Sandbox was to deprecate third-party cookies, but in recent updates Google announced that this was not their intention anymore and that they would rather introduce a “new experience in Chrome that lets people make an informed choice that applies across their web browsing”. At the same time, Google will “continue to make the Privacy Sandbox APIs available and invest in them to further improve privacy and utility”.
We partnered with the Privacy chapter of the Web Almanac 2024 to measure adoption of the Privacy Sandbox APIs on the websites visited by the HTTP Archive crawl and refer interested readers to their chapter for the analysis of the results. Next, we present an overview of the proposed mechanisms that are part of the Privacy Sandbox and aim at replacing a capability provided by cookies so far.
Topics API
The Topics API enables interest-based advertising, without using third-party cookies. The API allows callers (such as ad tech platforms) to access topics of interest that they have observed for a user, but without revealing additional information about the user’s activity.
See the Privacy chapter for some results about the adoption of the Topics API.
Protected Audience
The Protected Audience API enables on-device ad auctions to serve remarketing and custom audiences, without cross-site third-party tracking. Advertisers can add users to interest groups that are saved by the browser while users are navigating on the web. This allows advertisers to perform retargeted advertising by bidding on the available interest groups the user is part of when they visit a website where an ad auction is performed.
See the Privacy chapter for some results about the adoption of the Protected Audience API.
Attribution Reporting API
The Attribution Reporting API allows websites and third parties to measure ad conversion, i.e., when a view or a click on an advertisement leads later for example to a purchase. The Attribution Reporting API aims to enable measurement of ad conversion but without the use of cross-site identifiers and cookies.
See the Privacy chapter for some results about the adoption of the Attribution Reporting API.
CHIPS
Cookies Having Independent Partitioned State (CHIPS) allow web developers to specify that they would like the cookies that they are setting to be saved in a partitioned storage, i.e., in a separate cookie jar per top-level site. CHIPS cookies correspond to the partitioned cookies discussed previously in this chapter, in the partitioned section.
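The idea of a separate cookie jar per top-level site can be modeled with a toy store in which partitioned cookies are keyed by the top-level site they were set under. This is a conceptual sketch only, not Chrome's storage design; the class, its methods, and the .example site names are all hypothetical.

```python
from typing import Dict, Optional, Tuple


class ToyCookieJar:
    """Toy cookie store: partitioned cookies are keyed by top-level site."""

    def __init__(self) -> None:
        # Key: (partition or None, site that set the cookie, cookie name)
        self._store: Dict[Tuple[Optional[str], str, str], str] = {}

    def set(self, top_level_site: str, cookie_site: str, name: str,
            value: str, partitioned: bool = False) -> None:
        partition = top_level_site if partitioned else None
        self._store[(partition, cookie_site, name)] = value

    def get(self, top_level_site: str, cookie_site: str, name: str) -> Optional[str]:
        # Look in the partition of the current top-level site first,
        # then among unpartitioned cookies shared across all top-level sites.
        for partition in (top_level_site, None):
            key = (partition, cookie_site, name)
            if key in self._store:
                return self._store[key]
        return None


jar = ToyCookieJar()
# A widget embedded on news.example sets a partitioned cookie.
jar.set("news.example", "widget.example", "id", "u1", partitioned=True)
print(jar.get("news.example", "widget.example", "id"))  # visible on the same top-level site
print(jar.get("shop.example", "widget.example", "id"))  # not visible elsewhere
```

An unpartitioned third-party cookie, by contrast, would be returned under every top-level site, which is what makes cross-site tracking possible.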
Related Website Sets
Related Website Sets allow websites from the same owner to share cookies among themselves. The creation and submission of a Related Website Set is currently done by opening a pull request on a GitHub repository that Google employees check and merge if deemed valid. Websites that belong to the same related website set must also indicate it by placing a corresponding file at the .well-known URI /.well-known/related-website-set.json.
Chrome ships with a preloaded file containing related website sets validated by the Chrome team; at the time of writing (version 2024.8.10.0), there are 64 distinct related website sets. Each related website set contains a primary domain and a list of other domains related to the primary one under one or more of the following attributes: associatedSites, serviceSites, and ccTLDs. Of these 64 sets, 60 contain associatedSites, 11 serviceSites, and 7 ccTLDs. The following figure reports the number of secondary domains for each set. While a majority of the primary domains are associated with five or fewer secondary domains, https://journaldesfemmes.com, https://ya.ru, and https://mercadolibre.com are linked to 8, 17, and 39 secondary domains, respectively; third-party requests among the domains of a set are handled as if they were all from the first party.
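Counting secondary domains per set, as done for the figure, could look like the sketch below. The JSON shape is an assumption on our part: a top-level sets array whose entries have a primary field, list-valued associatedSites and serviceSites, and a ccTLDs object mapping a site to its country-code variants. The sample data uses made-up .example domains, not entries from the real list.

```python
import json

# Hypothetical sample in the assumed shape of the related website sets file.
SAMPLE = """
{
  "sets": [
    {
      "primary": "https://primary.example",
      "associatedSites": ["https://a.example", "https://b.example"],
      "serviceSites": ["https://cdn.example"],
      "ccTLDs": {"https://primary.example": ["https://primary-uk.example"]}
    },
    {
      "primary": "https://other.example",
      "associatedSites": ["https://other-shop.example"]
    }
  ]
}
"""


def secondary_domains(rws_set: dict) -> set:
    """Collect every domain of a set other than its primary."""
    domains = set(rws_set.get("associatedSites", []))
    domains.update(rws_set.get("serviceSites", []))
    for variants in rws_set.get("ccTLDs", {}).values():
        domains.update(variants)
    domains.discard(rws_set.get("primary"))
    return domains


data = json.loads(SAMPLE)
counts = {s["primary"]: len(secondary_domains(s)) for s in data["sets"]}
for primary, count in counts.items():
    print(primary, count)
```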
Attestation file
In order to use some of the Privacy Sandbox APIs, API callers have to go through an enrollment process to declare that they will not abuse these APIs for cross-site re-identification, but only use them for their intended use cases. The legal implications of not respecting this commitment are quite unclear, but enrollment allows these callers to obtain an attestation file that must be placed at the .well-known URI /.well-known/privacy-sandbox-attestations.json on the domain they registered to call these APIs from.
Chrome ships with a preloaded file containing a list of domains that have an attestation file registered. Currently, this list contains 257 distinct domains (version 2024.10.7.0) that have enrolled to call the following APIs: Attribution Reporting, Protected App Signals (Android only), Private Aggregation (Chrome only), Protected Audience, Shared Storage (Chrome only), and Topics.
We used a custom crawler separate from the HTTP Archive tools to obtain and parse these attestation files. We successfully retrieved attestation files for 232 distinct domains with that crawler (some attestation files may be available but not obtained by this crawler due to networking issues, for example). Next, we report the proportion of domains that are enrolled for each API on Chrome and Android. We observe that the majority of these domains are enrolled to call one of the five Chrome APIs requiring an attestation, while the proportion is much lower for the Android APIs.
Conclusion
In this chapter, we report on the use of cookies on the web. Our analysis allows us to answer multiple questions:
Which type of cookies is set by websites?
We find that the majority of cookies on the web (61%) are third-party. Moreover, more popular websites set significantly more third-party cookies, presumably because they generally include more third-party content. Additionally, we observe that about 6% of third-party cookies are partitioned (CHIPS). Partitioned cookies cannot be used for third-party tracking given that the cookie jar is separate for each website (domain) that the user visits. However, we find that partitioned cookies are predominantly set by advertising domains and are used for analytics.
Which cookie attributes are set?
Out of all cookies set, 16% of first-party cookies and only 4% of third-party cookies are session cookies. The remainder of the cookies are more persistent since they are not deleted when the user closes the browser. Generally, the median lifetime of cookies is 6 months for first-party and 1 year for third-party cookies.
Furthermore, for nearly 100% of third-party cookies the SameSite attribute is explicitly set to None, which allows these cookies to be included in cross-site requests and therefore to be used to track users.
Who sets cookies and what are they used for?
The top first-party cookies are mainly used for analytics. Google Analytics, whose primary function is to report on the use of websites by users, i.e., first-party analytics, is prevalent on at least 60% of websites. Meta follows in its footsteps, setting first-party cookies on 25% of websites.
Third-party cookies are also predominantly set by Google: doubleclick.net sets a cookie on 44% of websites. Other top trackers have a considerably smaller reach of 8-12% of websites. In general, the most popular third-party cookies belong predominantly to the targeted advertising category.
We conclude the chapter with an overview of the Privacy Sandbox, which aims to replace third-party cookies altogether, and refer to the Privacy chapter for more results.