# Be1st.ai | Glossary of Checks

> Complete glossary of all checks from the Be1st.ai audit.

103 checks in total, grouped into 6 categories.

## Web Fingerprint (22)

26 detections of technologies, content, and infrastructure

### F1 — CMS / Content Management System

**What is it:** CMS platform detection based on meta tags, HTML comments, URL structure, cookies, and platform-specific scripts. Recognizes WordPress, Joomla, Drupal, Ghost, Wix, Squarespace, Webflow, Typo3, Nette, Laravel, Django, HubSpot, Stranka.sk, Webnode, Blogger, and more.

**Why it matters:** The CMS is the foundation of web infrastructure — it determines security risks, performance, SEO capabilities, and maintenance costs. WordPress has different vulnerabilities than Webflow, and an e-shop on Shoptet requires different optimization than WooCommerce.

**Real-world example:** A site running WordPress 6.x has access to thousands of plugins but requires regular updates. A Webflow site is maintenance-free but less flexible. Fingerprint detects the CMS even when the operator has removed visible markers.


---

### F2 — E-commerce Platform

**What is it:** Identification of the e-commerce solution — Shoptet, PrestaShop, WooCommerce, Magento, OpenCart, Webareal, Shopify, Shoper, Upgates. Detection is performed via specific URL patterns, cart scripts, payment integrations, and meta tags.

**Why it matters:** The e-commerce platform directly affects conversion rate, product page loading speed, product SEO, and marketplace integrations. Each platform has specific limitations and optimization options.

**Real-world example:** Shoptet has native integration with Heureka.sk and Zbozi.cz, while WooCommerce requires plugins. Magento can handle millions of products but is more demanding on hosting. Fingerprint identifies both the platform and its version.


---

### F3 — JS / CSS Frameworks + CDN

**What is it:** Detection of JavaScript frameworks (jQuery, React, Vue.js, Angular, Alpine.js, HTMX, Turbo, Stimulus, Svelte) with versions, CSS frameworks (Bootstrap, Tailwind, Bulma, Foundation), and CDN providers (Cloudflare, CloudFront, Akamai, Fastly, jsDelivr).

**Why it matters:** The tech stack determines how modern, performant, and maintainable a website is. A modern React 18 + Next.js stack is generally easier to optimize and maintain than unstructured jQuery code. The CDN provider affects latency and availability for end users.

**Real-world example:** A site using React 18 + Next.js + Tailwind CSS via Cloudflare CDN is modern and fast. A site with jQuery 1.x + Bootstrap 3 without a CDN is outdated and slow. Fingerprint also reveals versions, which helps identify security risks.

#### Sources
- [HTTP Archive — Web Technology Report](https://httparchive.org/reports) — HTTP Archive

---

### F4 — Analytics & Marketing

**What is it:** Detection of analytics and marketing tools — Google Analytics (GA4, UA), GTM, Facebook Pixel, Hotjar, Heureka, Sklik, Criteo, Google Ads, SmartSupp, Biano, Luigi's Box, CookieYes, and more.

**Why it matters:** Analytics tools indicate the level of a company's digital maturity. A site without any analytics tool (such as GA4) has no visitor data. The presence of remarketing pixels indicates active online marketing. A consent tool such as CookieYes suggests the operator is addressing GDPR cookie-consent requirements.

**Real-world example:** An e-shop with GA4 + GTM + Facebook Pixel + Heureka tracking has sophisticated analytics. A blog without any analytics has no visibility into traffic. Fingerprint also detects duplicate or conflicting tracking codes.

#### Sources
- [Google Tag Manager](https://tagmanager.google.com/) — Google

---

### F5 — Payment Gateways

**What is it:** Identification of payment gateways and methods — GoPay, Stripe, PayPal, Comgate, Tatrapay, Sporopay, CardPay, Cash on Delivery, Bank Transfer. Detection via JavaScript SDK, checkout URL patterns, and form elements.

**Why it matters:** Payment methods directly affect e-shop conversion rates. Customers expect card payments, bank transfers, and cash on delivery. Missing payment methods result in lost orders.

**Real-world example:** An e-shop with GoPay (card + bank transfer) + cash on delivery covers 90% of Slovak customers. A site with only PayPal loses customers who don't have a PayPal account. Stripe is preferred for international payments.

#### Sources
- [GoPay — Payment Gateway](https://www.gopay.com/) — GoPay

---

### F6 — Fonts

**What is it:** Detection of fonts in use: Google Fonts (with family extraction), Adobe Fonts (Typekit), Font Awesome, and custom WOFF/WOFF2 files. Analysis of the number of font families and their impact on performance.

**Why it matters:** Web fonts are often among the heaviest resources on the critical rendering path. Each font family adds 50-200 KB to download. Too many fonts slow down LCP (Largest Contentful Paint) and degrade Core Web Vitals.

**Real-world example:** A site with 1-2 Google Fonts families has optimal loading. A site with 6+ different fonts and Font Awesome icons can have a first render that is 500ms+ slower. Self-hosted WOFF2 fonts are typically faster than the Google Fonts CDN because they avoid extra connections to third-party origins.

#### Sources
- [Google Fonts](https://fonts.google.com/) — Google

---

### F7 — CDN Provider

**What is it:** CDN provider identification — Cloudflare, Fastly, Akamai, CloudFront, Google CDN. Detection via HTTP headers (cf-ray, x-cache, x-amz-cf-id), DNS records, and certificates.

**Why it matters:** A CDN dramatically reduces latency for end users. A site without a CDN serves content from a single server, resulting in higher latency for distant visitors. Cloudflare also provides DDoS protection and WAF.

**Real-world example:** A site behind Cloudflare can serve cached content with a TTFB under 100ms even to visitors on other continents. A site on shared hosting without a CDN can have a TTFB of 500ms+ for international visitors. A CDN also reduces load on the origin server.
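
The header-based part of this check can be sketched as a lookup table of signatures. This is a minimal illustration, not the audit's actual rule set: `cf-ray` and `x-amz-cf-id` come from the description above, while the other entries (`Via: ... cloudfront`, Fastly's `cache-` prefix in `x-served-by`, `Server: AkamaiGHost`) are assumptions about commonly seen headers. DNS and certificate signals are not covered here.

```python
# Hypothetical signature table: (header name, required substring or None, provider).
CDN_SIGNATURES = [
    ("cf-ray", None, "Cloudflare"),
    ("server", "cloudflare", "Cloudflare"),
    ("x-amz-cf-id", None, "CloudFront"),
    ("via", "cloudfront", "CloudFront"),
    ("x-served-by", "cache-", "Fastly"),
    ("server", "akamaighost", "Akamai"),
]


def detect_cdn(headers: dict[str, str]) -> set[str]:
    """Return every CDN provider whose signature matches the response headers."""
    # HTTP header names are case-insensitive, so normalise names and values first.
    normalised = {name.lower(): value.lower() for name, value in headers.items()}
    found = set()
    for header, needle, provider in CDN_SIGNATURES:
        value = normalised.get(header)
        if value is None:
            continue
        # A signature with no substring matches on the header's presence alone.
        if needle is None or needle in value:
            found.add(provider)
    return found
```

Returning a set rather than a single value lets the check report stacked setups (for example, a CDN in front of another CDN) instead of silently picking one.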

#### Sources
- [Cloudflare — How CDN Works](https://www.cloudflare.com/learning/cdn/what-is-a-cdn/) — Cloudflare

---

### F8 — Hosting / Server Info

**What is it:** Web server and reverse proxy detection: Nginx, Apache, LiteSpeed, IIS, and Tomcat, including version and release year. Reverse proxies: Varnish, BIG-IP, HAProxy, Envoy, Traefik. Identification via the Server header and other product-specific headers.

**Why it matters:** Server software and its version affect performance and security. An outdated Apache version may contain known vulnerabilities. LiteSpeed is faster than Apache for PHP sites. A reverse proxy indicates enterprise infrastructure.

**Real-world example:** A site on Nginx 1.25 + Varnish cache has enterprise-grade infrastructure. A site on Apache 2.2 (EOL since 2018) is a security risk. Fingerprint reveals exact versions, which helps with security audits.
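
Version detection typically starts with the `Server` header, which usually has the form `product/version (comment)`. The sketch below parses that form and flags end-of-life branches. The `EOL_BRANCHES` table is hypothetical: it contains only the Apache 2.2 case from the example above, and a real check would need a maintained release database.

```python
import re

# Product token, optionally followed by "/" and a dotted numeric version.
SERVER_RE = re.compile(r"^\s*([A-Za-z][\w.-]*)(?:/(\d+(?:\.\d+)*))?")

# Hypothetical end-of-life table keyed by (lowercased product, major.minor branch).
EOL_BRANCHES = {("apache", "2.2")}


def parse_server_header(value: str) -> dict:
    """Split a Server header such as 'Apache/2.2.34 (Unix)' into product, version, EOL flag."""
    match = SERVER_RE.match(value)
    if not match:
        return {"product": None, "version": None, "eol": False}
    product, version = match.group(1), match.group(2)
    # Reduce '2.2.34' to the '2.2' branch used for end-of-life lookups.
    branch = ".".join(version.split(".")[:2]) if version else None
    return {
        "product": product,
        "version": version,
        "eol": (product.lower(), branch) in EOL_BRANCHES,
    }
```

Many servers hide or shorten this header (for example, `Server: nginx` without a version), so a missing version should be reported as unknown rather than as safe.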

#### Sources
- [Netcraft — Web Server Survey](https://www.netcraft.com/) — Netcraft

---

### F9 — Website Type Classification

**What is it:** Heuristic website classification based on detections — e-shop, marketplace, blog, forum, social network, aggregator, news portal, wiki, portfolio, catalog, booking, SaaS, streaming. Uses a combination of CMS, e-commerce platforms, and content.

**Why it matters:** The website type determines relevant metrics and benchmarks. An e-shop is evaluated differently than a blog — conversion rate vs. time on page. Classification enables comparison with relevant competitors in the same category.

**Real-world example:** A site with WooCommerce + product pages + a cart is classified as an e-shop. A site with WordPress + articles without products is a blog. A SaaS site has a login page, pricing, and documentation.

#### Sources
- [Schema.org — WebSite Type](https://schema.org/WebSite) — Schema.org

---

### F10 — Social Networks

**What is it:** Detection of social network links — Facebook, Instagram, Twitter/X, LinkedIn, YouTube, TikTok, Pinterest. URL extraction from footer links, meta tags (og:see_also), and JSON-LD.

**Why it matters:** Social media presence indicates a company's digital maturity and marketing strategy. A LinkedIn profile suggests B2B focus, TikTok suggests a younger target audience. Absence of social media may signal an inactive business.

**Real-world example:** A company with Facebook + Instagram + LinkedIn + YouTube has a comprehensive social presence. An e-shop with only a Facebook page uses a minimal set of channels. Fingerprint extracts the exact URL for each platform.
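
Mapping extracted links to platforms mostly comes down to matching hostnames. The sketch below assumes the hrefs have already been collected from the footer, `og:see_also`, and JSON-LD. The domain list is illustrative, and the rule that the first URL found per platform wins is an assumption.

```python
from urllib.parse import urlparse

# Illustrative domain -> platform map; subdomains such as www. or m. also match.
SOCIAL_DOMAINS = {
    "facebook.com": "Facebook",
    "instagram.com": "Instagram",
    "twitter.com": "Twitter/X",
    "x.com": "Twitter/X",
    "linkedin.com": "LinkedIn",
    "youtube.com": "YouTube",
    "tiktok.com": "TikTok",
    "pinterest.com": "Pinterest",
}


def extract_social_links(hrefs: list[str]) -> dict[str, str]:
    """Return {platform: first matching URL} for links pointing to social networks."""
    found: dict[str, str] = {}
    for href in hrefs:
        host = (urlparse(href).hostname or "").lower()
        for domain, platform in SOCIAL_DOMAINS.items():
            # Exact host or a true subdomain; avoids matching e.g. 'notfacebook.com'.
            if host == domain or host.endswith("." + domain):
                found.setdefault(platform, href)
                break
    return found
```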

#### Sources
- [Open Graph Protocol](https://ogp.me/) — Open Graph

---

### C1 — Visible Text Extraction

**What is it:** Removal of HTML tags, scripts, styles, and invisible elements — a clean text representation of the page. Used as input for keyword extraction, embeddings, and AI analysis.

**Why it matters:** Clean text is the foundation for all content analysis. AI models and search engines work with text, not HTML code. Quality extraction filters out navigational noise and preserves only content-relevant text.

**Real-world example:** From an e-shop HTML page, extraction removes the menu, footer, and cookie banner and retains the product description, specifications, and reviews. This clean text is then used to generate embeddings and extract keywords.
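
A minimal version of this extraction can be built on Python's standard `html.parser`. The sketch assumes that everything inside `script`, `style`, `noscript`, `template`, `nav`, `header`, and `footer` is noise. Cookie banners have no standard markup, so removing them would need site-specific rules that are not shown here.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    # Elements treated as non-content in this sketch (an assumption, not a standard).
    SKIP_TAGS = {"script", "style", "noscript", "template", "nav", "header", "footer"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True decodes entities such as &amp;
        self._skip_depth = 0
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside any skipped element.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def extract_visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```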

#### Sources
- [Google Search Essentials — Crawling](https://developers.google.com/search/docs/essentials) — Google

---

### C2 — Word Count

**What is it:** A basic content length metric for the analyzed page. Counts words in the extracted visible text after removing HTML tags and scripts.

**Why it matters:** Content length correlates with information depth and SEO performance. Pages with fewer than 300 words are commonly flagged as 'thin content' by SEO tools. AI models prefer more comprehensive sources when generating responses.

**Real-world example:** A product page with 50 words doesn't have enough information for SEO or AI. An article with 1500+ words has a greater chance of ranking in Google and being cited in AI responses. The optimal length depends on the page type.
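
Counting words on the extracted text can be as simple as a Unicode-aware regular expression. In Python, `\w` matches letters with diacritics, so Slovak text counts correctly. The sketch treats hyphenated and apostrophe-joined forms (such as "e-shop") as one word; this tokenization rule is an assumption. The 300-word threshold is the heuristic mentioned above.

```python
import re

# A word is a run of letters/digits, optionally joined by '-' or "'" (e.g. "e-shop").
WORD_RE = re.compile(r"\w+(?:[-']\w+)*")
THIN_CONTENT_THRESHOLD = 300  # heuristic threshold from the text above


def word_count(text: str) -> int:
    return len(WORD_RE.findall(text))


def is_thin_content(text: str, threshold: int = THIN_CONTENT_THRESHOLD) -> bool:
    return word_count(text) < threshold
```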

#### Sources
- [Creating Helpful Content](https://developers.google.com/search/docs/fundamentals/creating-helpful-content) — Google

---

### KW1 — Keywords — Extraction

**What is it:** Automatic keyword extraction from URL paths, H1, title, meta description, breadcrumbs, category tree, and headings. Scoring: `weight × log2(frequency + 1) × log2(product_count + 2)`.

**Why it matters:** Keywords define the thematic focus of a website and are the foundation for both SEO and AI visibility. Automatic extraction reveals what topics a site actually focuses on — often different from what the owner believes.

**Real-world example:** For an electronics e-shop, the strongest keywords should be 'mobile phone', 'notebook', and 'tablet'. If 'sale' comes out as the strongest word instead, the site communicates discounts rather than products.
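
The scoring formula is easy to reproduce once the inputs are defined. Only the formula itself comes from the description. The per-source weights in `SOURCE_WEIGHTS` are hypothetical, and two choices are assumptions made for this sketch: frequency is taken as the number of places a keyword was found, and a keyword found in several sources takes the weight of its strongest source.

```python
import math

# Hypothetical per-source weights; the audit's real values are not given in this glossary.
SOURCE_WEIGHTS = {
    "h1": 3.0, "title": 3.0, "url_path": 2.0,
    "breadcrumb": 2.0, "heading": 1.5, "meta_description": 1.0,
}


def keyword_score(weight: float, frequency: int, product_count: int) -> float:
    """weight × log2(frequency + 1) × log2(product_count + 2), as defined above."""
    return weight * math.log2(frequency + 1) * math.log2(product_count + 2)


def rank_keywords(occurrences: dict[str, list[str]], product_counts: dict[str, int]):
    """occurrences: keyword -> sources it was found in; returns (keyword, score) best first."""
    scores = {}
    for keyword, sources in occurrences.items():
        weight = max(SOURCE_WEIGHTS.get(s, 1.0) for s in sources)  # assumption: best source wins
        scores[keyword] = keyword_score(weight, len(sources), product_counts.get(keyword, 0))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The `+ 2` inside the second logarithm keeps the factor at 1 for keywords with no associated products. As a result, non-product keywords are not zeroed out, while product-backed keywords are boosted logarithmically.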

#### Sources
- [Google SEO Starter Guide](https://developers.google.com/search/docs/fundamentals/seo-starter-guide) — Google

---

### KW2 — Keywords — Categorization

**What is it:** Classification of extracted keywords into categories — product, service, location, brand. Helps understand the thematic structure of a website and identify content gaps.

**Why it matters:** Keyword categorization shows whether a site covers all important aspects. An e-shop should have strong product keywords, a local business should have local ones. Gaps in categories indicate missing content.

**Real-world example:** A restaurant in Bratislava has strong product keywords ('pizza', 'pasta') but is missing local ones ('Bratislava', 'Old Town'). This means weak local SEO visibility and a low chance of appearing in AI responses to local queries.

#### Sources
- [Google SEO Starter Guide](https://developers.google.com/search/docs/fundamentals/seo-starter-guide) — Google

---

### SM1 — Sitemap Existence

**What is it:** Check whether the site has an accessible sitemap.xml or sitemap index at standard URLs (/sitemap.xml, /sitemap_index.xml). Verification of HTTP status and XML format validity.

**Why it matters:** A sitemap is a map of the website for search engines and AI crawlers. Without a sitemap, crawlers must discover pages through links, which is slower and less reliable. Both Google and AI bots use sitemaps for efficient indexing.

**Real-world example:** An e-shop with 10,000 products and no sitemap risks Google not discovering a significant share of its product pages (potentially 30-50%). With an up-to-date sitemap, new pages are typically discovered much sooner, often within days of publication.
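
The existence check boils down to two steps: probe the standard locations, then confirm that a 200 response really contains a sitemap. The sketch leaves the HTTP fetch out so that the validation logic stays testable. The function names are hypothetical. For untrusted XML, the Python docs recommend a hardened parser such as `defusedxml` over plain `ElementTree`.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urljoin

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
CANDIDATE_PATHS = ["/sitemap.xml", "/sitemap_index.xml"]


def candidate_sitemap_urls(site: str) -> list[str]:
    """Standard sitemap locations at the root of the given site."""
    return [urljoin(site, path) for path in CANDIDATE_PATHS]


def sitemap_kind(status: int, body: bytes) -> Optional[str]:
    """Return 'urlset' or 'sitemapindex' for a valid sitemap response, else None."""
    if status != 200:
        return None
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None  # not well-formed XML (e.g. an HTML error page)
    for kind in ("urlset", "sitemapindex"):
        # ElementTree reports namespaced tags as '{namespace}localname'.
        if root.tag == f"{{{SITEMAP_NS}}}{kind}":
            return kind
    return None
```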

#### Sources
- [Sitemaps — Google Search Central](https://developers.google.com/search/docs/crawling-indexing/sitemaps/overview) — Google

---

### SM2 — URL Count in Sitemap

**What is it:** Count of URLs in the sitemap, which forms the basis for the tier recommendation (FREE = 1 URL, BASIC = 20 URLs, PRO = 50+ URLs). Analysis of URL distribution across subdomains and sections.

**Why it matters:** The URL count determines the website's scope and the recommended audit tier. A small site with 5 URLs only needs a basic audit, while a large e-shop with thousands of products needs the PRO tier for a complete analysis.

**Real-world example:** A personal blog with 10 articles falls into the BASIC tier. An e-shop with 500 product pages needs the PRO tier to analyze all URLs. The number of URLs in the sitemap vs. the actual page count reveals indexing issues.
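
Counting URLs and their distribution takes only a few lines with `ElementTree` and `Counter`. This sketch handles a single `urlset` file. A sitemap index would first need its child sitemaps fetched and merged, which is omitted here. Treating the first path segment as the "section" is an assumption. The tier mapping is not coded because the glossary does not say which tier applies to counts between 20 and 50.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def url_distribution(sitemap_xml: bytes):
    """Return (total URLs, count per host, count per first path segment)."""
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.findall("sm:url/sm:loc", NS) if el.text]
    by_host = Counter(urlparse(u).hostname for u in locs)
    # '/shop/shoes/1' -> 'shop'; the homepage is reported as '/'.
    by_section = Counter(urlparse(u).path.strip("/").split("/")[0] or "/" for u in locs)
    return len(locs), by_host, by_section
```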

#### Sources
- [Sitemaps — Google Search Central](https://developers.google.com/search/docs/crawling-indexing/sitemaps/overview) — Google

---

### SM3 — Sitemap Validity

**What is it:** Verification of sitemap XML format, URL correctness, and accessibility of linked pages. Checks the lastmod, changefreq, and priority elements.

**Why it matters:** An invalid sitemap can cause crawlers to ignore it. Incorrect URLs, missing namespaces, or invalid dates lead to indexing errors. Up-to-date lastmod dates help crawlers re-crawl efficiently.

**Real-world example:** A sitemap with URLs pointing to 404 pages signals a neglected website. A sitemap without lastmod dates doesn't allow crawlers to distinguish new from old content. A valid sitemap with current dates speeds up indexing.
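
Per-entry validation against the sitemaps.org protocol can be sketched as a function that returns a list of problems. The allowed `changefreq` values and the 0.0-1.0 `priority` range come from the protocol. The `lastmod` regex accepts the common W3C Datetime forms (date, or date plus time and zone) but not the year-only or year-month forms. Checking whether each `loc` returns 404 needs HTTP requests and is left out.

```python
import re
from urllib.parse import urlparse

CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}
# YYYY-MM-DD, optionally followed by a time and a timezone (Z or ±hh:mm).
LASTMOD_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?$"
)


def validate_entry(loc, lastmod=None, changefreq=None, priority=None) -> list[str]:
    """Return human-readable problems for one <url> entry (empty list means valid)."""
    problems = []
    parsed = urlparse(loc)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("loc must be an absolute http(s) URL")
    if lastmod is not None and not LASTMOD_RE.match(lastmod):
        problems.append("lastmod is not a W3C Datetime value")
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        problems.append("changefreq has an invalid value")
    if priority is not None:
        try:
            value = float(priority)
        except ValueError:
            value = -1.0  # force the range check below to fail
        if not 0.0 <= value <= 1.0:
            problems.append("priority must be between 0.0 and 1.0")
    return problems
```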

#### Sources
- [Sitemaps XML Format](https://www.sitemaps.org/protocol.html) — sitemaps.org

---

### SSL1 — SSL Certificate — Existence

**What is it:** Verification that the domain uses HTTPS with a valid SSL/TLS certificate. Checks for an HTTP-to-HTTPS redirect and that the certificate is valid for the given domain.

**Why it matters:** Google has used HTTPS as a ranking signal since 2014, and since 2018 Chrome has marked all HTTP pages as 'Not Secure'. Firefox also flags HTTP pages as insecure. SSL is essential for user trust and for protecting data in transit.

**Real-world example:** A site without SSL shows a 'Not Secure' warning in the browser, which immediately deters visitors. An e-shop without HTTPS cannot accept card payments. Every modern website should have a valid SSL certificate.

#### Sources
- [HTTPS as a Ranking Signal](https://developers.google.com/search/blog/2014/08/https-as-ranking-signal) — Google

---

### SSL2 — SSL Certificate — Issuer

**What is it:** Identification of the SSL certificate issuer — Let's Encrypt, DigiCert, Sectigo, GlobalSign, GeoTrust, and others. Certificate type: DV (Domain Validation), OV (Organization Validation), EV (Extended Validation).

**Why it matters:** The certificate type indicates the level of identity verification. DV (Let's Encrypt) only verifies domain ownership. OV and EV also verify the organization. For e-shops and financial services, an OV/EV certificate signals trustworthiness.

**Real-world example:** A bank with an EV certificate (DigiCert) has the highest level of verification. A blog with a Let's Encrypt DV certificate has basic encryption. Both are secure, but EV provides greater trust for sensitive transactions.
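
Python's `ssl` module can fetch and parse the server certificate, and the subject fields give a rough indication of the validation level. The classification below is a heuristic assumption: EV certificates usually carry `businessCategory` and `jurisdictionCountryName`, OV certificates carry `organizationName`, and DV certificates typically have only a common name. A more reliable method reads the CA/Browser Forum policy OIDs (2.23.140.1.1 for EV, 2.23.140.1.2.1 for DV, 2.23.140.1.2.2 for OV). `getpeercert()` does not expose these, so that would require a library such as `cryptography`.

```python
import socket
import ssl
from typing import Optional


def fetch_peer_cert(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Open a verified TLS connection and return the parsed peer certificate."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def _names(rdns) -> dict:
    """Flatten getpeercert()'s tuple-of-RDNs format into {attribute: value}."""
    return {name: value for rdn in rdns for name, value in rdn}


def issuer_organization(cert: dict) -> Optional[str]:
    return _names(cert.get("issuer", ())).get("organizationName")


def certificate_type(cert: dict) -> str:
    """Heuristic DV/OV/EV guess from subject attributes (not from policy OIDs)."""
    subject = _names(cert.get("subject", ()))
    if "businessCategory" in subject or "jurisdictionCountryName" in subject:
        return "EV"
    if "organizationName" in subject:
        return "OV"
    return "DV"
```

Usage: `cert = fetch_peer_cert("example.com")`, then `issuer_organization(cert)` and `certificate_type(cert)`.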

#### Sources
- [Let's Encrypt — How It Works](https://letsencrypt.org/how-it-works/) — Let's Encrypt

---

### SSL3 — SSL Certificate — Validity

**What is it:** Check of the SSL certificate expiration date and the number of days until expiry. Warning for certificates approaching expiration (less than 30 days).

**Why it matters:** An expired SSL certificate causes the browser to block access to the site with an error page. Automatic renewal (Let's Encrypt, Cloudflare) eliminates this risk. Manually managed certificates require monitoring.

**Real-world example:** A certificate with 340 days remaining is fine. A certificate that expires in 5 days requires immediate renewal. Let's Encrypt certificates are valid for 90 days and are typically renewed automatically well before they expire, while commercial certificates are usually renewed annually.
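
The days-until-expiry calculation works directly on the `notAfter` string returned by `ssl.SSLSocket.getpeercert()`. `ssl.cert_time_to_seconds()` converts that string to a UTC epoch timestamp. The 30-day warning threshold comes from the description above. The status labels are hypothetical.

```python
import ssl
import time
from typing import Optional

WARNING_DAYS = 30  # warning threshold from the description above


def days_until_expiry(not_after: str, now: Optional[float] = None) -> int:
    """Whole days left before notAfter (e.g. 'Jan  5 09:34:43 2018 GMT'); negative once expired."""
    expires_at = ssl.cert_time_to_seconds(not_after)  # parsed as UTC
    current = time.time() if now is None else now
    return int((expires_at - current) // 86400)


def expiry_status(not_after: str, now: Optional[float] = None) -> str:
    days = days_until_expiry(not_after, now)
    if days < 0:
        return "expired"
    if days < WARNING_DAYS:
        return "warning"
    return "ok"
```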

#### Sources
- [SSL Labs — SSL Server Test](https://www.ssllabs.com/ssltest/) — Qualys

---

### EMB1 — Vector Embeddings

**What is it:** Generation of 1024-dimensional vector embeddings from extracted text using the BGE-M3 model via OpenRouter. Vectors are stored in PostgreSQL with the pgvector extension for semantic search.

**Why it matters:** Vector embeddings enable semantic comparison of websites — not by keywords, but by content meaning. Two sites with different words but the same focus will have similar vectors.

**Real-world example:** An electronics e-shop and a tech blog about gadgets will have similar embeddings, even though they use different terminology. The cosine similarity between their vectors will be high (>0.8), signaling content relatedness.
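
Cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch uses short toy vectors in place of real 1024-dimensional BGE-M3 embeddings; the arithmetic is the same at any dimension. If the vectors are already L2-normalized, the formula reduces to a plain dot product.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a · b) / (|a| |b|); returns 0.0 when either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With a relatedness threshold like the 0.8 mentioned above, two pages count as related when `cosine_similarity(a, b) > 0.8`.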

#### Sources
- [BGE-M3 — Multi-Lingual Multi-Granularity Embedding Model](https://huggingface.co/BAAI/bge-m3) — BAAI

---

### EMB2 — Competitor Similarity

**What is it:** Cosine similarity search in the embeddings database — finding the most content-similar websites in the Be1st.ai database. Result: TOP N closest domains with similarity percentage.

**Why it matters:** Automatically finding similar websites reveals competitors the owner may not have known about. It also helps benchmark the site against actual competition rather than subjective estimates.

**Real-world example:** A Slovak clothing e-shop gets a list of the 5 closest sites from the database — e.g., ZOOT.sk (92%), About You (88%), Answear.sk (85%). The owner thus discovers who they are actually competing with for customers online.
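
In PostgreSQL, pgvector exposes cosine distance through the `<=>` operator. A nearest-neighbour query orders by `embedding <=> query_vector` with a `LIMIT`, and similarity is `1 - distance`. The in-memory sketch below shows the same top-N logic on a plain dictionary. The domain names and vectors are hypothetical. The analyzed site is excluded so that it does not appear as its own closest competitor.

```python
import heapq
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def top_similar(query, index, n=5, exclude=None):
    """index: domain -> embedding. Returns [(domain, similarity in %), ...], best first."""
    scored = (
        (domain, cosine_similarity(query, vector))
        for domain, vector in index.items()
        if domain != exclude
    )
    best = heapq.nlargest(n, scored, key=lambda item: item[1])
    return [(domain, round(similarity * 100)) for domain, similarity in best]
```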

#### Sources
- [pgvector — Open-Source Vector Similarity Search for Postgres](https://github.com/pgvector/pgvector) — pgvector

---

## Website AI Readiness (20)

20 AI readiness checks

### A1 — Organization schema

**What is it:** Verifies the presence of Organization schema markup that defines core business information — name, logo, contact details, social networks, and legal form. This markup creates a digital identity for the organization.

**Why it matters:** AI systems use Organization schema to build a knowledge graph about a company. When a user asks ChatGPT or Google AI about your business, structured data ensures accurate and complete responses including logo and contact information.

**Real-world example:** When Google displays a Knowledge Panel for a company like Deutsche Bank, Organization schema is one of the sources it can draw on. Companies without this markup are more likely to lack a Knowledge Panel or to show incorrect information sourced from third parties.
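
Organization markup is usually embedded as a JSON-LD `<script>` block. The sketch generates one from a few core schema.org properties (`name`, `url`, `logo`, `sameAs`, `contactPoint`). The company details in the usage example are hypothetical. The escaping of `</` prevents a value from accidentally closing the surrounding script tag.

```python
import json
from typing import Optional


def organization_jsonld(name: str, url: str, logo: str, same_as: list[str],
                        telephone: Optional[str] = None) -> str:
    """Return a <script type="application/ld+json"> block with Organization markup."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        "sameAs": same_as,  # links to the company's social profiles
    }
    if telephone:
        data["contactPoint"] = {
            "@type": "ContactPoint",
            "telephone": telephone,
            "contactType": "customer service",
        }
    payload = json.dumps(data, ensure_ascii=False, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">' + payload + "</script>"
```

Usage (hypothetical company): `organization_jsonld("Example s.r.o.", "https://example.sk", "https://example.sk/logo.png", ["https://www.linkedin.com/company/example"], "+421 2 1234 5678")`.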

#### Sources
- [Local Business Structured Data](https://developers.google.com/search/docs/appearance/structured-data/local-business) — Google

---

### A2 — Product schema

**What is it:** Checks for Product schema on e-commerce product pages and service listings. It includes price, availability, ratings, and product description in a machine-readable format. For non-e-commerce sites, this check is automatically skipped (N/A).

**Why it matters:** Shopping surfaces such as Google Shopping, as well as AI assistants like ChatGPT, rely on structured product data to compare products and generate recommendations. Without this markup, your products are less likely to appear in AI-driven shopping results.

**Real-world example:** Amazon has Product schema on every product page, enabling Google to display price, availability, and ratings directly in search results. E-shops without Product schema lose up to 30% of organic traffic from shopping queries.

#### Sources
- [Product Structured Data](https://developers.google.com/search/docs/appearance/structured-data/product) — Google

---

### A3 — FAQ schema

**What is it:** Verifies the presence of FAQ (Frequently Asked Questions) schema markup on pages with frequently asked questions. For pages without an FAQ section, this check is automatically skipped (N/A).

**Why it matters:** FAQ schema is a direct source for AI answers. When ChatGPT or Google AI Overviews look for an answer to a question, they prefer content marked with FAQPage schema because it's already in a question-answer format.

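**Real-world example:** Cloudflare has FAQ schema on its product pages, which has let its answers appear directly in Google results as expandable questions, increasing SERP real estate and CTR. Since August 2023, however, Google shows FAQ rich results mainly for well-known, authoritative government and health websites. For most sites, the markup therefore mainly serves as machine-readable question-answer content.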
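
FAQPage markup pairs each `Question` (its `name`) with an `acceptedAnswer` of type `Answer` (its `text`). The sketch builds that structure from plain question-answer tuples. The sample question is hypothetical. The resulting dict can be serialized with `json.dumps` and embedded in a JSON-LD script block.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """Build FAQPage structured data from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


if __name__ == "__main__":
    print(json.dumps(faq_jsonld([("Do you ship to Czechia?", "Yes, within 3 business days.")]), indent=2))
```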

#### Sources
- [FAQ Structured Data](https://developers.google.com/search/docs/appearance/structured-data/faq) — Google

---

### A4 — Review/Rating schema

**What is it:** Verifies the presence of Review and AggregateRating schema on pages with product or service reviews. For websites without reviews or ratings, this check is automatically skipped (N/A).

**Why it matters:** AI models use Review schema as a signal of trustworthiness and quality. When AI generates recommendations, it prioritizes sources with verified ratings and reviews from real users.

**Real-world example:** Booking.com uses AggregateRating schema on all hotels, enabling Google to display star ratings directly in search results. Hotels with visible ratings have a 25% higher click-through rate.

#### Sources
- [Review Snippet Structured Data](https://developers.google.com/search/docs/appearance/structured-data/review-snippet) — Google

---

### A5 — AI bot policy

**What is it:** Analyzes robots.txt and meta tags for AI crawlers (GPTBot, ClaudeBot, Google-Extended, and others). The check determines whether the site has an explicit AI bot policy and how clear it is: whether robots.txt contains separate rules for each AI crawler, whether a public page explains the policy, and whether robots.txt and meta tags are consistent with each other. It also weighs the strategic tradeoff: blocking AI bots may protect content but reduces AI visibility.

**Why it matters:** Strategic management of AI bot access is crucial. Complete blocking means your company won't exist in AI answers. Conversely, full access may lead to unwanted training on your content without compensation. An unclear AI bot policy leads to inconsistent crawler behavior.

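**Real-world example:** The New York Times blocked GPTBot in robots.txt, protecting its content from AI training at the cost of potentially reduced visibility in ChatGPT. Note that OpenAI uses separate user agents for training (GPTBot) and for ChatGPT search (OAI-SearchBot), so the two can be controlled independently. Conversely, Stripe allows AI crawlers because it wants to be the primary source of information about payment APIs.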
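
The robots.txt part of this check can be sketched with Python's standard `urllib.robotparser`. For each AI user-agent token, it answers two questions: is the homepage allowed, and does robots.txt contain a group that names this bot explicitly? The bot list is illustrative. Meta-tag rules and the search for a public policy page are not covered here.

```python
from urllib.robotparser import RobotFileParser

# Illustrative list of AI user-agent tokens; not exhaustive.
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "Google-Extended", "PerplexityBot", "CCBot"]


def ai_bot_policy(robots_txt: str, url: str = "https://example.com/") -> dict:
    """Return {agent: {"allowed": bool, "explicit_rule": bool}} for each AI agent."""
    lines = robots_txt.splitlines()
    parser = RobotFileParser()
    parser.parse(lines)  # parse() also marks the rules as loaded for can_fetch()
    # User-agent values named explicitly in robots.txt (the '*' group is not explicit).
    declared = {
        line.split(":", 1)[1].strip().lower()
        for line in lines
        if line.strip().lower().startswith("user-agent:")
    }
    return {
        agent: {
            "allowed": parser.can_fetch(agent, url),
            "explicit_rule": agent.lower() in declared,
        }
        for agent in AI_AGENTS
    }
```

An agent that is allowed only through the `*` group (`explicit_rule` is false) is exactly the "unclear policy" case described above. Access works, but nobody has made a deliberate decision about that crawler.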

#### Sources
- [Overview of OpenAI Crawlers](https://developers.openai.com/api/docs/bots) — OpenAI
- [Anthropic Web Crawler Policy](https://privacy.claude.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler) — Anthropic
- [AI Crawl Control](https://developers.cloudflare.com/ai-crawl-control/) — Cloudflare

---

### A6 — llms.txt existence

**What is it:** Checks for the existence of the /llms.txt file — a new standard that provides AI models with a structured overview of a website in Markdown format. Only the presence of the file is evaluated — content quality is assessed in a separate check A7.

**Why it matters:** The llms.txt file is designed specifically to help LLMs quickly understand a website's purpose, services, and structure. Unlike robots.txt, which controls access, llms.txt actively helps AI understand your content. Its mere existence is a strong signal that the site is AI-ready.

**Real-world example:** Stripe has one of the best llms.txt files — it contains a product overview, links to documentation, and API references. Thanks to this, ChatGPT and Claude can accurately answer questions about Stripe products.

#### Sources
- [The /llms.txt File Specification](https://llmstxt.org/) — llmstxt.org
- [OpenAI llms.txt Documentation](https://developers.openai.com/api/docs/llms.txt) — OpenAI

---

### A7 — llms.txt content quality

**What is it:** Evaluates the content quality of the /llms.txt file — length, website description, number of section links, presence of .md links, and overall informativeness. If llms.txt doesn't exist (A6=fail), this check is automatically N/A.

**Why it matters:** Having an empty or minimal llms.txt is not enough. AI models need quality content — a website description, links to key sections, .md versions of documentation. A high-quality llms.txt significantly improves the accuracy of AI responses about your site.

**Real-world example:** OpenAI has a detailed overview at developers.openai.com/api/docs/llms.txt with links to all API sections including .md versions. A low-quality llms.txt with a single line 'This is our website' barely helps AI models at all.
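
The quality signals listed above can be computed from the raw file. The llms.txt specification describes an H1 title, a blockquote summary, and H2 sections containing Markdown link lists, so the sketch counts each of these plus `.md` links. How these signals are weighted into a score is not stated in the glossary, so the sketch deliberately returns raw metrics only.

```python
import re

# Markdown inline link: [label](url)
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def llms_txt_metrics(text: str) -> dict:
    """Raw quality signals for an llms.txt file, following the llmstxt.org structure."""
    lines = [line.strip() for line in text.splitlines()]
    links = [url for _, url in LINK_RE.findall(text)]
    return {
        "length": len(text),
        "has_title": any(line.startswith("# ") for line in lines),    # H1 project name
        "has_summary": any(line.startswith("> ") for line in lines),  # blockquote summary
        "sections": sum(1 for line in lines if line.startswith("## ")),
        "links": len(links),
        # Ignore any query string or fragment when checking for a .md target.
        "md_links": sum(1 for url in links if url.split("#")[0].split("?")[0].endswith(".md")),
    }
```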

#### Sources
- [The /llms.txt File Specification](https://llmstxt.org/) — llmstxt.org

---

### A8 — llms-full.txt (bonus)

**What is it:** Bonus check for the existence of the /llms-full.txt file — an extended version of llms.txt containing full documentation content in a single Markdown file. This check never penalizes — if the file doesn't exist, it's scored as N/A and doesn't affect the overall score.

**Why it matters:** The llms-full.txt file allows AI models to load entire documentation in a single request without needing to crawl dozens of pages. This dramatically speeds up product comprehension and reduces errors in AI responses. Since this is a specific need of documentation sites, the check works as a bonus — it rewards sites that have it but doesn't penalize those that don't.

**Real-world example:** OpenAI provides llms-full.txt at developers.openai.com/api/llms-full.txt, containing the entire API documentation in a single file. AI tools can thus ingest the OpenAI API documentation without crawling hundreds of pages.

#### Sources
- [The /llms.txt File Specification](https://llmstxt.org/) — llmstxt.org
- [OpenAI llms.txt Documentation](https://developers.openai.com/api/docs/llms.txt) — OpenAI

---

### A9 — BLUF (Bottom Line Up Front)

**What is it:** Evaluates whether the page presents the key information at the beginning of the content, following the BLUF principle. AI models tend to extract answers from the first paragraphs, so placing the main point at the top is critical.

**Why it matters:** AI models tend to give the most weight to the opening paragraphs of a page when generating responses. If key information is buried in the middle or at the end of the text, AI may overlook it and use less relevant information instead.

**Real-world example:** Wikipedia uses the BLUF principle on every article: the first paragraph always contains the definition and the most important facts. This is one reason AI models cite Wikipedia so often, since the key information is always at the beginning.

#### Sources
- [Creating Helpful Content](https://developers.google.com/search/docs/fundamentals/creating-helpful-content) — Google

---

### A10 — Content structure (lists, tables)

**What is it:** Analyzes content structure — use of bulleted and numbered lists, tables, and structured formats. Well-structured content with lists and tables is easier for AI models to parse.

**Why it matters:** AI models process structured content more efficiently than continuous text. Lists enable extraction of bullet-point answers, and tables provide comparable data in a clear format.

**Real-world example:** Cloudflare documentation uses code blocks, lists, and tables on every page. That's why AI assistants can accurately answer technical questions about Cloudflare products with specific parameters from tables.
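
A simple structural profile can be computed by counting the relevant HTML elements. The sketch counts lists, list items, tables, definition lists, preformatted/code blocks, and subheadings with the standard `html.parser`. The choice of tags is an assumption, and turning the counts into a score is left open.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements treated as "structure" in this sketch (an assumption).
STRUCTURE_TAGS = {"ul", "ol", "li", "table", "dl", "pre", "h2", "h3"}


class StructureCounter(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURE_TAGS:
            self.counts[tag] += 1


def structure_profile(html: str) -> dict:
    """Return {tag: count} for structural elements found in the page."""
    parser = StructureCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.counts)
```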

#### Sources
- [Google Search Essentials](https://developers.google.com/search/docs/essentials) — Google

---

### A11 — FAQ section on page

**What is it:** Checks for the presence of an FAQ section directly in the page content (not just schema markup). For pages where an FAQ doesn't make sense (e.g., contact pages), the check is skipped (N/A).

**Why it matters:** FAQ sections are an ideal source for AI answers because they contain question-answer pairs in natural language. AI models can directly use these pairs as responses to user queries.

**Real-world example:** Shopify has an FAQ section on every product page with real customer questions. These answers regularly appear in Google AI Overviews and ChatGPT because they precisely match common user queries.

#### Sources
- [FAQ Structured Data](https://developers.google.com/search/docs/appearance/structured-data/faq) — Google

---

### A12 — Definitions/glossary patterns

**What is it:** Detects definition patterns and glossary sections on the page — for example, 'Term: definition' format, definition lists (dl/dt/dd), or dedicated glossary pages. For pages without specialized terminology, this check is skipped (N/A).

**Why it matters:** AI models actively search for term definitions to build their knowledge base. Pages with clearly marked definitions become an authoritative source for AI when explaining technical terms.

**Real-world example:** MDN Web Docs (Mozilla) uses a consistent definition pattern for every web API and CSS property. That's why when you ask ChatGPT about CSS flexbox, the answer often comes from MDN definitions.

#### Sources
- [Creating Helpful Content](https://developers.google.com/search/docs/fundamentals/creating-helpful-content) — Google

---

### A13 — Knowledge chunkability & content depth

**What is it:** Evaluates how well the page content can be divided into standalone knowledge blocks (chunks). Each chunk should contain one complete idea with a clear heading and context. Content uniqueness, depth, and clarity are also assessed.

**Why it matters:** RAG (Retrieval-Augmented Generation) systems split web content into chunks before storing them in a vector database. If content is poorly segmented, chunks lose context and AI generates inaccurate or incomplete responses.

**Real-world example:** Stripe API documentation has each endpoint in a separate section with a heading, description, parameters, and examples. This allows AI systems to accurately extract information about a specific endpoint without contamination from other sections.

#### Sources
- [Creating Helpful Content](https://developers.google.com/search/docs/fundamentals/creating-helpful-content) — Google
- [Google Search Essentials](https://developers.google.com/search/docs/essentials) — Google

---

### A14 — Reference / evidence signals

**What is it:** Checks for the presence of references, citations, and evidence in the content — external links to studies, statistics with source attribution, expert quotes. These signals increase content credibility for AI systems.

**Why it matters:** AI models evaluate source credibility based on references and evidence. Content backed by verifiable sources has a higher chance of being cited in AI responses.

**Real-world example:** Articles on HubSpot Blog always contain links to research, statistics, and case studies. That's why HubSpot articles are among the most frequently cited sources in AI responses to marketing questions.

#### Sources
- [Google Search Essentials](https://developers.google.com/search/docs/essentials) — Google

---

### A15 — Freshness signals

**What is it:** Detects content freshness signals — publication date, last updated date, document version. For homepages, this check is skipped (N/A) since homepages typically don't have a publication date.

**Why it matters:** AI models prefer current content and use freshness signals to decide which source to cite. Outdated content without an update date has lower priority in AI responses.

**Real-world example:** Google Cloud documentation displays a 'Last updated' date on every page. When AI compares two sources on the same topic, it tends to prefer the one with a more recent update date.

#### Sources
- [Creating Helpful Content](https://developers.google.com/search/docs/fundamentals/creating-helpful-content) — Google

---

### A16 — Entity & author completeness

**What is it:** Checks the completeness of author and entity information on the page: author name, bio, contact, social profiles, and organization affiliation. These are signals of E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness), the quality concept described in Google's Search Quality Rater Guidelines.

**Why it matters:** AI systems build a knowledge graph of authors and organizations. Complete author profiles increase content credibility, and AI models prefer to cite content from identifiable experts in a given field.

**Real-world example:** Mayo Clinic clearly attributes its health content to its own medical staff and institution. This is one reason AI models tend to favor Mayo Clinic over anonymous health websites for medical questions.

#### Sources
- [Google Search Essentials](https://developers.google.com/search/docs/essentials) — Google

---

### A17 — AI-friendly content formats & feeds

**What is it:** Evaluates whether the site provides content in formats optimized for AI processing — Markdown page versions, RSS feed with full content, API or data feeds. For sites where an API doesn't make sense, the API portion of the check is skipped (N/A).

**Why it matters:** AI crawlers and agents prefer clean content without navigational noise. Sites that provide Markdown or clean content versions are processed more efficiently and have a higher chance of being included in AI responses.

**Real-world example:** Cloudflare offers a Markdown for Agents feature: when an AI agent sends a request with `Accept: text/markdown`, it receives a clean Markdown version of the page instead of full HTML. Stripe provides a comprehensive REST API with an OpenAPI specification.
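
The request pattern described above is ordinary HTTP content negotiation on the `Accept` header. A minimal sketch of the server-side decision, assuming a hypothetical `wants_markdown` helper that only compares explicit `text/markdown` and `text/html` entries (wildcards such as `*/*` are ignored); it is not Cloudflare's implementation.

```python
def wants_markdown(accept_header):
    """Return True if the Accept header ranks text/markdown at least as high as text/html.

    Simplified negotiation: parses media types and q-values only,
    ignores wildcards and any other parameters.
    """
    prefs = {}
    for part in accept_header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        q = 1.0  # default quality when no q= parameter is given
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs[media_type] = q
    markdown_q = prefs.get("text/markdown", 0.0)
    return markdown_q > 0 and markdown_q >= prefs.get("text/html", 0.0)
```

A server would call it with the incoming header, return either the Markdown or the HTML rendering of the same page, and ideally send `Vary: Accept` so caches keep the two versions apart.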

#### Sources
- [Cloudflare Markdown Conversion](https://developers.cloudflare.com/workers-ai/features/markdown-conversion/) — Cloudflare
- [The /llms.txt File Specification](https://llmstxt.org/) — llmstxt.org

---

### A18 — Linkability of key facts

**What is it:** Checks whether key facts and information on the page have direct URLs (anchor links) — whether headings contain ID attributes, whether deep links to specific sections exist, and whether specific information can be shared directly.

**Why it matters:** AI systems need to link to specific facts, not just entire pages. When AI cites a source, a deep link to the specific section increases response credibility and allows the user to quickly verify the information.

**Real-world example:** GitHub documentation automatically generates anchor links for every heading, enabling precise citation. When ChatGPT cites GitHub docs with such a link, the user lands directly on the relevant section.
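
A minimal sketch of the heading part of this check using Python's built-in `html.parser`. It assumes the anchor must be a non-empty `id` on the heading element itself; sites that put the anchor on a nested `<a id>` or a wrapping `<section>` would need extra handling. The names are hypothetical.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingIdAudit(HTMLParser):
    """Records each heading tag and whether it carries a non-empty id attribute."""

    def __init__(self):
        super().__init__()
        self.headings = []  # list of (tag, id or None)

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.headings.append((tag, dict(attrs).get("id") or None))


def headings_without_anchor(html):
    """Return the tag names of headings that cannot be deep-linked."""
    audit = HeadingIdAudit()
    audit.feed(html)
    return [tag for tag, anchor in audit.headings if anchor is None]
```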

#### Sources
- [Google SEO Starter Guide](https://developers.google.com/search/docs/fundamentals/seo-starter-guide) — Google

---

### A19 — Changelog / release notes

**What is it:** Detects a changelog or release notes on the site — change history, new features, fixed bugs. For sites where a changelog doesn't make sense (restaurants, local services), the check is skipped (N/A).

**Why it matters:** A changelog is a key freshness signal for AI models. A regularly updated changelog signals an actively maintained product, and AI models use it to verify the currency of product information.

**Real-world example:** Vercel has a public changelog at vercel.com/changelog with dates and detailed change descriptions. When a user asks AI about the latest Vercel features, the structured changelog makes it easier for AI to give a current answer.

#### Sources
- [Creating Helpful Content](https://developers.google.com/search/docs/fundamentals/creating-helpful-content) — Google

---

### A20 — Semantic HTML (article, section, nav, aside)

**What is it:** Checks the use of semantic HTML5 elements — article, section, nav, aside, main, header, footer. These elements provide AI models with contextual information about the role of each part of the page.

**Why it matters:** AI crawlers use semantic HTML elements to identify the main content of a page. The article element marks primary content, nav marks navigation, and aside marks secondary content — this helps AI ignore noise and extract relevant content.

**Real-world example:** Web.dev (Google) consistently uses semantic HTML — main content is in article, navigation in nav, and related links in aside. AI crawlers can thus efficiently extract only the educational content without navigational noise.
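
A minimal sketch of how the presence of these elements could be tallied with Python's `html.parser`. The element set mirrors the list above; how the audit weighs the counts is not documented here, so the sketch only reports them. Names are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

SEMANTIC_TAGS = {"article", "section", "nav", "aside", "main", "header", "footer"}


class SemanticTagCounter(HTMLParser):
    """Counts opening tags of the HTML5 sectioning/landmark elements."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in SEMANTIC_TAGS:
            self.counts[tag] += 1


def semantic_summary(html):
    """Return a dict with a count (possibly 0) for every semantic element."""
    counter = SemanticTagCounter()
    counter.feed(html)
    return {tag: counter.counts[tag] for tag in sorted(SEMANTIC_TAGS)}
```

A page built only from nested `<div>` elements reports zero for every element, which is the "div soup" case the check is meant to catch.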

#### Sources
- [Google Search Essentials](https://developers.google.com/search/docs/essentials) — Google

---

## SEO (30)

30 search engine optimization checks

### S1 — Title Tag

**What is it:** Checks for the presence of an HTML `<title>` tag on the page. The title tag is one of the most important on-page SEO factors, displayed in search results as the main link heading.

**Why it matters:** Without a title tag, search engines cannot properly display your page in results. Google uses the title tag as the primary signal for understanding page content and displaying it in SERPs.

**Real-world example:** Amazon.com uses precise title tags for its products, along the lines of `<title>iPhone 15 Pro Max - Apple Smartphone | Amazon.com</title>`. Pages without a title tag can lose a substantial share of organic traffic.

#### Sources
- [HTML `<title>` Element](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/title) — MDN Web Docs
- [SEO Starter Guide](https://developers.google.com/search/docs/fundamentals/seo-starter-guide) — Google Search Central

---

### S2 — Title - Length

**What is it:** Checks the length of the title tag, which should be 50-60 characters. A title that is too short does not utilize its full potential in search results, while one that is too long gets truncated with an ellipsis.

**Why it matters:** Google displays roughly 50-60 characters of the title tag in search results; the exact cutoff depends on pixel width rather than a fixed character count. A title within this range maximizes visibility and click-through rate (CTR).

**Real-world example:** Wikipedia uses the same concise title pattern on every page. For example, 'Bratislava - Wikipedia' (22 characters) is short but effective. Conversely, a 90-character title gets truncated in Google, so the user cannot see the full information.
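
A minimal sketch of the length check, with the 50-60 range as default parameters. Character count is only a proxy here, since Google truncates by pixel width; `title_length_status` is a hypothetical helper.

```python
def title_length_status(title, min_len=50, max_len=60):
    """Classify a title by character count against the 50-60 character guideline."""
    length = len(title.strip())
    if length == 0:
        return "missing"
    if length < min_len:
        return "too_short"
    if length > max_len:
        return "too_long"
    return "ok"
```

Under these defaults the Wikipedia example above is classified as `"too_short"`, which shows that the range is a guideline rather than a hard rule.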

#### Sources
- [HTML `<title>` Element](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/title) — MDN Web Docs
- [SEO Starter Guide](https://developers.google.com/search/docs/fundamentals/seo-starter-guide) — Google Search Central

---

### S3 — H1 Heading

**What is it:** Checks whether the page contains exactly one H1 heading. The H1 is the main heading of the page, which tells search engines what the page is about. Multiple H1s or no H1 at all are problematic.

**Why it matters:** The H1 heading is one of the most important on-page SEO elements alongside the title tag. It helps search engines and users quickly understand the main topic of the page. Using exactly one H1 per page is a widely followed convention.

**Real-world example:** Well-structured blogs and news sites typically use a single H1 for the article title. Pages with multiple H1 headings make the content hierarchy less clear to search engines, which can lower rankings.
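
A minimal sketch of the "exactly one H1" rule with Python's `html.parser`, which lowercases tag names, so `<H1>` is counted too. The function names are hypothetical.

```python
from html.parser import HTMLParser


class H1Counter(HTMLParser):
    """Counts <h1> opening tags."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.count += 1


def h1_status(html):
    """Return 'missing', 'ok' (exactly one H1) or 'multiple'."""
    counter = H1Counter()
    counter.feed(html)
    if counter.count == 0:
        return "missing"
    return "ok" if counter.count == 1 else "multiple"
```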

#### Sources
- [SEO Starter Guide](https://developers.google.com/search/docs/fundamentals/seo-starter-guide) — Google Search Central
- [Lighthouse SEO Audits](https://web.dev/articles/pass-lighthouse-seo-audit) — web.dev

---

### S4 — Heading Hierarchy

**What is it:** Checks for proper heading hierarchy from H1 through H6 without skipping levels. Correct hierarchy means H1 -> H2 -> H3 without jumping levels (e.g., going from H1 directly to H3).

**Why it matters:** Proper heading hierarchy helps search engines understand the structure and relationships between content sections. It also improves accessibility for screen readers and the overall usability of the page.

**Real-world example:** MDN Web Docs uses flawless heading hierarchy on every documentation page. Conversely, pages that jump from H1 to H4 lose semantic context and search engines cannot properly interpret the content structure.
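
A minimal sketch of the skipped-level rule: moving down the outline may go only one level at a time, while moving back up (for example from H4 to H2) is allowed. It does not check whether the first heading is an H1; names are hypothetical.

```python
from html.parser import HTMLParser


class HeadingLevels(HTMLParser):
    """Collects heading levels (1-6) in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def skipped_levels(html):
    """Return (previous, current) pairs where the outline jumps down more than one level."""
    parser = HeadingLevels()
    parser.feed(html)
    return [
        (prev, cur)
        for prev, cur in zip(parser.levels, parser.levels[1:])
        if cur > prev + 1  # e.g. h2 -> h4 skips h3
    ]
```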

#### Sources
- [SEO Starter Guide](https://developers.google.com/search/docs/fundamentals/seo-starter-guide) — Google Search Central
- [Lighthouse SEO Audits](https://web.dev/articles/pass-lighthouse-seo-audit) — web.dev

---

### S5 — Canonical URL

**What is it:** Checks for the presence of a canonical tag (`<link rel='canonical'>`), which specifies the preferred version of a page. It helps prevent duplicate content issues.

**Why it matters:** Without a canonical tag, search engines may index multiple versions of the same page (with/without www, with/without trailing slash, with UTM parameters), which splits link equity and can lower rankings.

**Real-world example:** Shopify automatically adds canonical tags to all product pages so that URLs with filters (?color=red) do not take ranking away from the main product page. Without this, Google could index hundreds of duplicates.
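
A minimal sketch of reading the canonical URL with Python's `html.parser`. Treating more than one canonical tag as ambiguous (returning `None`) is an assumption of this sketch, as are the names.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects href values of <link> tags whose rel contains 'canonical'."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel_tokens = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_tokens and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonical(html):
    """Return the canonical URL, or None if it is missing or ambiguous."""
    finder = CanonicalFinder()
    finder.feed(html)  # self-closing <link ... /> also reaches handle_starttag
    return finder.canonicals[0] if len(finder.canonicals) == 1 else None
```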

#### Sources
- [Consolidate Duplicate URLs](https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls) — Google Search Central
- [HTML `<link>` Element](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/link) — MDN Web Docs

---

### S6 — Viewport Meta Tag

**What is it:** Checks for the presence of a meta viewport tag that ensures proper page rendering on mobile devices. The standard form is `<meta name='viewport' content='width=device-width, initial-scale=1'>`.

**Why it matters:** Google uses mobile-first indexing, which means the mobile version of the page is primary for ranking. Without a viewport tag, the page displays on mobile as a desktop version scaled down to a small screen.

**Real-world example:** Modern web frameworks such as Next.js, Nuxt, and Angular include the viewport tag by default. Older websites without it render on phones as shrunken desktop pages, fail mobile-friendliness audits such as Lighthouse, and can lose positions in search results.

#### Sources
- [HTML `<meta>` Element](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta) — MDN Web Docs
- [Lighthouse SEO Audits](https://web.dev/articles/pass-lighthouse-seo-audit) — web.dev

---

### S7 — Robots Meta Tag

**What is it:** Checks whether the page contains a meta robots tag with a 'noindex' value, which prevents search engines from indexing the page. Verifies that main pages are accessible for indexing.

**Why it matters:** A meta robots tag with a 'noindex' value completely excludes the page from search results. Developers sometimes forget to remove noindex when moving a site from a staging environment to production.

**Real-world example:** A typical scenario: an e-shop migrates to a new server, a leftover 'noindex' from staging goes live, and organic traffic collapses within days. Google Search Console can alert you to this issue, but with a delay.
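
A minimal sketch of detecting a blocking meta robots tag. It reads `robots` and `googlebot` meta tags and treats both `noindex` and `none` (which Google documents as equivalent to `noindex, nofollow`) as blocking. It ignores the `X-Robots-Tag` HTTP header; names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots|googlebot" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() in {"robots", "googlebot"}:
            content = attr.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def is_noindex(html):
    """True if a meta robots tag prevents indexing of the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return bool(parser.directives & {"noindex", "none"})
```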

#### Sources
- [Robots Meta Tag Specifications](https://developers.google.com/search/docs/crawling-indexing/robots-meta-tag) — Google Search Central
- [Google Search Essentials](https://developers.google.com/search/docs/essentials) — Google Search Central

---

### S8 — Open Graph Tags

**What is it:** Checks for the presence of Open Graph meta tags (og:title, og:description, og:image, og:url). These tags determine how content appears when shared on social networks like Facebook, LinkedIn, and others.

**Why it matters:** Without OG tags, social networks generate previews automatically, which often results in an unattractive display. Properly set OG tags can noticeably increase click-through rates on shared content.

**Real-world example:** The New York Times has precisely configured OG tags for every article, including og:image with an optimized preview image. When shared on Facebook, the article appears with a large image, title, and description.
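
A minimal sketch that reports which of the four basic Open Graph properties are missing or empty. It only reads the `property` attribute as defined by the Open Graph protocol; pages that misuse `name="og:..."` would be reported as missing. Names are hypothetical.

```python
from html.parser import HTMLParser

REQUIRED_OG = ("og:title", "og:description", "og:image", "og:url")


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:*" content="..."> values (first occurrence wins)."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        prop = attr.get("property")
        if tag == "meta" and prop and prop.startswith("og:"):
            self.properties.setdefault(prop, attr.get("content") or "")


def missing_og_tags(html):
    parser = OpenGraphParser()
    parser.feed(html)
    return [name for name in REQUIRED_OG if not parser.properties.get(name)]
```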

#### Sources
- [The Open Graph Protocol](https://ogp.me/) — Open Graph Protocol
- [SEO Starter Guide](https://developers.google.com/search/docs/fundamentals/seo-starter-guide) — Google Search Central

---

### S9 — Twitter Card Tags

**What is it:** Checks for the presence of Twitter (X) Card meta tags (twitter:card, twitter:title, twitter:description, twitter:image). These tags control how content appears when shared on the X platform (formerly Twitter).

**Why it matters:** Twitter Card tags enable richer previews when sharing links on X. They support various formats such as Summary Card, Summary with Large Image, and Player Card for video content.

**Real-world example:** Spotify uses the Twitter Player Card for sharing songs, so users can preview music directly on X. Without Twitter Card tags (and without Open Graph tags that X can fall back on), only a plain link without a preview image would be displayed.

#### Sources
- [Getting Started with Cards](https://developer.x.com/en/docs/x-for-websites/cards/guides/getting-started) — X Developer Platform
- [Cards Markup](https://developer.x.com/en/docs/x-for-websites/cards/overview/markup) — X Developer Platform

---

### S10 — Structured Data

**What is it:** Checks for the presence of structured data in JSON-LD or microdata format (Schema.org). Structured data helps search engines understand page content more precisely and display rich snippets.

**Why it matters:** Pages with structured data can become eligible for rich snippets in results (star ratings, prices, FAQ, recipes, events), although Google does not guarantee they will be shown. These enhancements can noticeably increase visibility and CTR.

**Real-world example:** Amazon uses Product schema on every product, allowing Google to display price, availability, and ratings directly in results. Recipe sites like AllRecipes use Recipe schema to show preparation time and ratings.

#### Sources
- [Introduction to Structured Data](https://developers.google.com/search/docs/appearance/structured-data/intro-structured-data) — Google Search Central
- [Google Search Essentials](https://developers.google.com/search/docs/essentials) — Google Search Central

---

### S11 — Image ALT Texts

**What is it:** Checks whether all images on the page have a filled alt attribute. ALT text describes the content of an image for search engines and assistive technologies (screen readers).

**Why it matters:** Search engines cannot 'see' images and rely on ALT text for indexing. ALT texts are also crucial for web accessibility and are displayed when an image fails to load.

**Real-world example:** Wikipedia consistently uses descriptive ALT texts for all images, e.g. alt='Bratislava Castle at sunset'. E-shops without ALT texts miss out on traffic from Google Images, which can be a meaningful share of their visits.
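
A minimal sketch of the ALT audit with Python's `html.parser`. It separates images with no `alt` attribute at all from images with an empty `alt=""`, since an empty value is legitimate for purely decorative images. Names are hypothetical.

```python
from html.parser import HTMLParser


class ImageAltAudit(HTMLParser):
    """Sorts <img> sources into 'missing alt' and 'empty alt' lists."""

    def __init__(self):
        super().__init__()
        self.missing = []  # <img> without any alt attribute
        self.empty = []    # alt="" (acceptable only for decorative images)

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src", "")
        if "alt" not in attr:
            self.missing.append(src)
        elif not (attr["alt"] or "").strip():  # a bare <img alt> yields None
            self.empty.append(src)
```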

#### Sources
- [SEO Starter Guide](https://developers.google.com/search/docs/fundamentals/seo-starter-guide) — Google Search Central
- [Lighthouse SEO Audits](https://web.dev/articles/pass-lighthouse-seo-audit) — web.dev

---

### S12 — Internal Links

**What is it:** Checks whether the page contains at least 3 internal links to other pages on the same website. Internal linking helps search engines discover content and distribute link equity across the site.

**Why it matters:** Internal links are the foundation of search engine crawling. Pages without internal links are 'orphaned': search engines struggle to discover them and do not assign them sufficient authority.

**Real-world example:** Wikipedia is a master of internal linking: every article contains dozens of links to related pages. Thanks to this, Google can efficiently crawl millions of pages. A blog without internal links leaves much of its ranking potential unused.
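
A minimal sketch of how links could be split into internal and external ones and compared with the 3-link threshold. It resolves relative URLs with `urljoin`, skips same-page `#fragment` links and non-HTTP schemes such as `mailto:`, and treats only the exact same hostname as internal (so www and non-www variants count as different hosts). Names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.hrefs.append(href)


def split_links(html, page_url):
    """Return (internal, external) lists of absolute http(s) URLs."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).hostname
    internal, external = [], []
    for href in collector.hrefs:
        if href.startswith("#"):
            continue  # same-page fragment, not a link to another page
        absolute = urljoin(page_url, href)
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript: links
        (internal if parsed.hostname == host else external).append(absolute)
    return internal, external
```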

#### Sources
- [SEO Starter Guide](https://developers.google.com/search/docs/fundamentals/seo-starter-guide) — Google Search Central
- [Google Search Essentials](https://developers.google.com/search/docs/essentials) — Google Search Central

---

### S13 — External Links

**What is it:** Checks for the presence of external links to trustworthy sources. Linking to high-quality external sources increases the credibility of the page and helps search engines understand the topical context.

**Why it matters:** External links to authoritative sources signal to search engines that your content is grounded in a broader context and is trustworthy. Google perceives this as a content quality signal.

**Real-world example:** Scientific articles always reference sources and studies. Similarly, a quality blog post about health should link to WHO or professional sources. Pages without external links can appear isolated and untrustworthy.

#### Sources
- [SEO Starter Guide](https://developers.google.com/search/docs/fundamentals/seo-starter-guide) — Google Search Central
- [Google Search Essentials](https://developers.google.com/search/docs/essentials) — Google Search Central

---

### S14 — Anchor Text Quality

**What is it:** Checks the quality of anchor texts (link texts). Anchor texts should be descriptive and relevant, not generic like 'click here', 'here', 'more', or 'read more'.

**Why it matters:** Anchor text helps search engines understand the content of the target page. Descriptive anchor texts like 'SEO optimization guide' are far more useful than 'click here', both for users and search engines.

**Real-world example:** Google uses descriptive anchor texts in its own documentation, such as 'see our structured data guide'. Conversely, email marketing campaigns often use 'click here', which is a missed opportunity from an SEO perspective.
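
A minimal sketch of flagging generic anchor texts after normalizing whitespace, case, and trailing punctuation. The `GENERIC_ANCHORS` set is a hypothetical example list (Lighthouse's own list is different and localized), and a real check would also need translations for non-English pages.

```python
import re

# Hypothetical list of generic link texts; extend per language as needed.
GENERIC_ANCHORS = {"click here", "here", "more", "read more", "learn more", "link", "this"}


def is_generic_anchor(text):
    """True if the link text carries no information about the target page."""
    normalized = re.sub(r"\s+", " ", text).strip().lower().rstrip(".!:")
    return normalized in GENERIC_ANCHORS
```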

#### Sources
- [SEO Starter Guide](https://developers.google.com/search/docs/fundamentals/seo-starter-guide) — Google Search Central
- [Lighthouse SEO Audits](https://web.dev/articles/pass-lighthouse-seo-audit) — web.dev

---

### S15 — Clean URLs

**What is it:** Checks whether the page URL is clean and readable - without unnecessary query parameters, session IDs, or random strings. Clean URLs are easier to remember and share.

**Why it matters:** Clean URLs improve user experience and help search engines understand the site structure. A URL like '/products/nike-air-shoes' is better than '/p?id=38291&cat=12&sess=abc123'.

**Real-world example:** Airbnb uses clean URLs like '/rooms/12345' instead of complex query parameters. WordPress with 'pretty permalinks' enabled changes '/?p=123' to '/article-title'. Clean, descriptive URLs also tend to be more clickable in search results.
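
A minimal sketch of URL cleanliness heuristics using `urllib.parse`. Both the `SESSION_PARAMS` names and the "more than two parameters" limit are hypothetical thresholds chosen for illustration, not the audit's documented rules.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical parameter names that usually indicate session IDs.
SESSION_PARAMS = {"sid", "sess", "session", "sessionid", "phpsessid", "jsessionid"}


def url_issues(url, max_params=2):
    """Return a list of reasons why a URL might not count as 'clean'."""
    parsed = urlparse(url)
    params = {key.lower() for key in parse_qs(parsed.query, keep_blank_values=True)}
    issues = []
    if params & SESSION_PARAMS:
        issues.append("session id in query")
    if len(params) > max_params:
        issues.append("many query parameters")
    return issues
```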

#### Sources
- [SEO Starter Guide](https://developers.google.com/search/docs/fundamentals/seo-starter-guide) — Google Search Central
- [Google Search Essentials](https://developers.google.com/search/docs/essentials) — Google Search Central

---

### S16 — HTML lang Attribute

**What is it:** Checks for the presence of the lang attribute on the HTML element (e.g., `<html lang='en'>`). This attribute defines the language of the page content for search engines and assistive technologies.

**Why it matters:** The lang attribute declares the page language to browsers and assistive technologies; screen readers use it to select proper pronunciation. Google determines a page's language from its visible content rather than from the lang attribute, but other search engines and tools may rely on it, so it remains a cheap and correct signal.

**Real-world example:** On a Slovak website without lang='sk', a screen reader may read the text with the wrong language's pronunciation rules. Global companies like IKEA set the lang attribute on every language version of their site (lang='sk', lang='cs', lang='de').

#### Sources
- [HTML lang Global Attribute](https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/lang) — MDN Web Docs
- [Lighthouse SEO Audits](https://web.dev/articles/pass-lighthouse-seo-audit) — web.dev

---

### S17 — Favicon

**What is it:** Checks for the presence of a favicon - a small icon displayed in browser tabs, bookmarks, and search results. A favicon increases brand recognition.

**Why it matters:** Google can display the favicon next to the site name in search results. A favicon increases the credibility and recognizability of a website. Without one, a generic icon is shown, which can lower CTR.

**Real-world example:** GitHub's octocat favicon is instantly recognizable even among many open browser tabs. Websites without a favicon appear less professional, and users have a harder time finding them among open tabs.

#### Sources
- [HTML `<link>` Element](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/link) — MDN Web Docs
- [SEO Starter Guide](https://developers.google.com/search/docs/fundamentals/seo-starter-guide) — Google Search Central

---

### S18 — Status Code and Broken Links

**What is it:** Checks the HTTP status code of the analyzed URL and verifies all links on the page. Every `<a href>` link is checked with an HTTP HEAD request to determine whether it works or is broken (404, 5xx, timeout).

**Why it matters:** Broken links damage user experience, waste crawl budget, and in large numbers signal a poorly maintained page. The analyzed page itself should return a 2xx status; a 4xx or 5xx status prevents it from being indexed reliably and points to technical issues.

**Real-world example:** An e-shop has 85 links on its homepage. After checking, 3 lead to discontinued products (404) and 1 external link points to a partner company that no longer exists (timeout). Fixing these links improves crawl budget and user experience.
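
A minimal sketch of a HEAD-based link check using only Python's standard `urllib`. It is not the audit's implementation: urllib follows most redirects on its own, and some servers reject HEAD (for example with 405), so a production check would likely retry such links with GET. The function names are hypothetical.

```python
import urllib.error
import urllib.request


def classify_status(status):
    """Map an HTTP status code (or None for network errors/timeouts) to a link state."""
    if status is None:
        return "unreachable"
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    return "broken"  # 4xx and 5xx


def check_link(url, timeout=5.0):
    """Send a HEAD request to `url` and classify the response."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:  # 4xx/5xx responses raise HTTPError
        return classify_status(err.code)
    except OSError:  # URLError, timeouts and connection errors
        return classify_status(None)
```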

#### Sources
- [SEO Starter Guide](https://developers.google.com/search/docs/fundamentals/seo-starter-guide) — Google Search Central
- [HTTP Status Codes](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status) — MDN Web Docs

---

### S19 — Robots.txt

**What is it:** Checks for the existence of a robots.txt file in the root of the website. This file tells search engine crawlers which parts of the site they can and cannot crawl.

**Why it matters:** Robots.txt is the first file a search engine crawler reads when visiting a website. A properly configured robots.txt optimizes crawl budget by keeping crawlers out of unimportant sections. It controls crawling, not indexing: a blocked URL can still appear in results if other pages link to it, so use noindex (or password protection for sensitive content) to keep pages out of search.

**Real-world example:** Facebook has an extensive robots.txt that controls which crawlers may access which parts of the site. An incorrect robots.txt can block an entire site: a single `Disallow: /` left in place by mistake can keep a whole e-shop out of search for weeks.
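
Python's standard library ships `urllib.robotparser`, which can evaluate rules like these offline. A minimal sketch with a hypothetical robots.txt for `shop.example`; note that this parser applies rules in file order (first match), while Google uses the most specific matching rule, so the rules below are ordered so that both interpretations agree.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep crawlers out of the checkout, allow everything else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Googlebot", "https://shop.example/products/shoes"))  # True
print(parser.can_fetch("Googlebot", "https://shop.example/checkout/cart"))   # False
```

Replacing the `Disallow` line with `Disallow: /` makes both calls return `False`, which is the whole-site block described above.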

#### Sources
- [Robots.txt Introduction and Guide](https://developers.google.com/search/docs/crawling-indexing/robots/intro) — Google Search Central
- [How Google Interprets robots.txt](https://developers.google.com/search/docs/crawling-indexing/robots/robots_txt) — Google Search Central

---

### S20 — Sitemap.xml

**What is it:** Checks for the existence of an XML sitemap, which provides search engines with a list of all important URLs on the website. A sitemap speeds up the discovery of new and updated content.

**Why it matters:** A sitemap is like a map of the website for search engines. It is especially important for large sites, new sites with few external links, and sites with rich multimedia content.

**Real-world example:** CNN.com has a sitemap index with links to sitemaps for news, video, and images. WordPress generates a sitemap automatically. New websites without a sitemap may wait weeks for Google to discover all their pages.
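
A minimal sketch that writes a sitemap in the sitemaps.org 0.9 format (`<urlset>` with `<url>`, `<loc>` and `<lastmod>`) using Python's `xml.etree.ElementTree`, which also escapes characters such as `&` in URLs. The URLs and dates are placeholders, and `build_sitemap` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """entries: iterable of (absolute_url, last_modified_iso_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

The result is typically saved as `/sitemap.xml` and announced with a `Sitemap:` line in robots.txt or submitted in Google Search Console.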

#### Sources
- [Sitemaps Overview](https://developers.google.com/search/docs/crawling-indexing/sitemaps/overview) — Google Search Central
- [Robots.txt Introduction](https://developers.google.com/search/docs/crawling-indexing/robots/intro) — Google Search Central

---

### S21 — HTTPS

**What is it:** Checks whether the website uses the HTTPS protocol with a valid SSL/TLS certificate. HTTPS encrypts communication between the browser and server and is a ranking factor for Google.

**Why it matters:** Google has used HTTPS as a ranking signal since 2014. Browsers mark HTTP pages as 'Not Secure', which deters users. HTTPS is now an absolute baseline for every website.

**Real-world example:** Let's Encrypt provides free SSL certificates and most hosting providers offer HTTPS automatically. Websites without HTTPS lose user trust: surveys suggest that a large majority of users would leave a site marked as 'Not Secure'.

#### Sources
- [Google Search Essentials](https://developers.google.com/search/docs/essentials) — Google Search Central
- [Lighthouse SEO Audits](https://web.dev/articles/pass-lighthouse-seo-audit) — web.dev

---

### S22 — WWW Consistency

**What is it:** Checks whether the website consistently uses either the www or non-www version and properly redirects one version to the other. Both versions should not work simultaneously without a redirect.

**Why it matters:** For search engines, example.com and www.example.com are two different pages. Without a redirect, link equity is split between both versions, which weakens the overall site ranking.

**Real-world example:** GitHub redirects www.github.com to github.com (without www), while Amazon does the opposite and uses www.amazon.com as its primary version. The important thing is to choose one variant and redirect the other via a 301 redirect.
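
A minimal sketch of computing the 301 target once a preferred host has been chosen. Only the host changes; scheme, path, and query are kept, and ports are dropped for simplicity. `www_redirect_target` and the `example.com` hosts are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def www_redirect_target(url, preferred_host):
    """Return the URL to 301-redirect to, or None if no www/non-www redirect is needed.

    `preferred_host` is the variant the site has chosen, e.g. "example.com"
    (non-www) or "www.example.com" (www).
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if preferred_host.startswith("www."):
        alternate = preferred_host[len("www."):]
    else:
        alternate = "www." + preferred_host
    if host != alternate:
        return None  # already the preferred host, or an unrelated host
    return urlunsplit(parts._replace(netloc=preferred_host))
```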

#### Sources
- [Consolidate Duplicate URLs](https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls) — Google Search Central
- [SEO Starter Guide](https://developers.google.com/search/docs/fundamentals/seo-starter-guide) — Google Search Central

---

### S23 — Trailing Slash Consistency

**What is it:** Checks for consistent use of trailing slashes in URLs. The addresses /page and /page/ are different URLs for search engines and should be unified.

**Why it matters:** Inconsistent use of trailing slashes creates duplicate URLs, which confuses search engines and splits link equity. The site should use one convention and redirect the other.

**Real-world example:** Next.js allows setting `trailingSlash: true` or `false` in its configuration. Apache servers often add trailing slashes automatically for directories. It is important to have a consistent rule and a 301 redirect for the alternate variant.
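
A minimal sketch of the redirect rule for a chosen trailing-slash convention. It leaves the site root and file-like paths (a last segment containing a dot, such as `/logo.png`) untouched, which is an assumption of this sketch; `trailing_slash_target` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit


def trailing_slash_target(url, use_trailing_slash=False):
    """Return the URL to 301-redirect to under the chosen convention, or None if it already complies."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path == "/":
        return None  # the site root keeps its slash under either convention
    if "." in path.rstrip("/").rsplit("/", 1)[-1]:
        return None  # last segment looks like a file, so leave it alone
    if use_trailing_slash and not path.endswith("/"):
        new_path = path + "/"
    elif not use_trailing_slash and path.endswith("/"):
        new_path = path.rstrip("/") or "/"
    else:
        return None
    return urlunsplit(parts._replace(path=new_path))
```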

#### Sources
- [Consolidate Duplicate URLs](https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls) — Google Search Central
- [Google Search Essentials](https://developers.google.com/search/docs/essentials) — Google Search Central

---

### S24 — HTML Size

**What is it:** Checks the size of the HTML document, which should not exceed 100 KB. Overly large HTML slows down parsing, rendering, and increases page load time.

**Why it matters:** Large HTML documents slow down page loading, consume more memory, and can cause issues on mobile devices with limited data plans. Google favors fast pages.

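**Real-world example:** Single Page Applications (SPAs) and pages with heavily inlined CSS/JS sometimes produce HTML exceeding 500 KB. The 100 KB limit is a guideline of this audit rather than an official Google threshold (Googlebot processes up to the first 15 MB of an HTML file), but smaller documents parse and render faster. WordPress themes with page builders often generate unnecessarily large HTML with dozens of superfluous div elements.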

#### Sources
- [Lighthouse SEO Audits](https://web.dev/articles/pass-lighthouse-seo-audit) — web.dev
- [SEO Starter Guide](https://developers.google.com/search/docs/fundamentals/seo-starter-guide) — Google Search Central

---

### S25 — Mobile-Friendly Indicators

**What is it:** Checks basic mobile-friendliness indicators: viewport tag, responsive design, readable font size, sufficiently large touch targets, and no horizontal scrolling.

**Why it matters:** A majority of searches happen on mobile devices. Google uses mobile-first indexing, meaning the mobile version of the page is primary for evaluation and ranking.

**Real-world example:** Google retired its standalone Mobile-Friendly Test in December 2023; Lighthouse in Chrome DevTools remains a practical way to audit mobile usability. Pages that are not mobile-friendly lose positions in mobile search results. Responsive design is today's standard, and frameworks like Bootstrap and Tailwind CSS make it easier with responsive grid and utility classes.

#### Sources
- [SEO Starter Guide](https://developers.google.com/search/docs/fundamentals/seo-starter-guide) — Google Search Central
- [Lighthouse SEO Audits](https://web.dev/articles/pass-lighthouse-seo-audit) — web.dev

---

### S26 — Content Length

**What is it:** Checks whether the main page content contains at least 300 words. Short content cannot sufficiently cover a topic and search engines may consider it 'thin content'.

**Why it matters:** Pages with little content (thin content) have lower rankings because they do not provide sufficient value to users. Google prefers comprehensive, valuable content that thoroughly covers a topic.

**Real-world example:** Some industry studies report that pages in Google's first position average 1,400+ words, although Google states that it has no preferred word count. Wikipedia articles with thousands of words dominate search results. Product pages with only 2-3 sentences lose organic traffic to competitors with detailed descriptions.
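
A minimal sketch of the 300-word threshold. It assumes the main content has already been extracted (navigation and footers removed) and counts Unicode words, so Slovak diacritics work and hyphenated compounds count as one word; both functions are hypothetical helpers.

```python
import re

# Unicode-aware word pattern: "hlavné" is one word, "state-of-the-art" is one word.
WORD_RE = re.compile(r"\w+(?:[-']\w+)*")


def word_count(text):
    """Count words in already-extracted main content."""
    return len(WORD_RE.findall(text))


def is_thin_content(text, min_words=300):
    return word_count(text) < min_words
```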

#### Sources
- [SEO Starter Guide](https://developers.google.com/search/docs/fundamentals/seo-starter-guide) — Google Search Central
- [Google Search Essentials](https://developers.google.com/search/docs/essentials) — Google Search Central

---

### S27 — Keyword Density

**What is it:** Checks the density of keywords in page content. Too high a density (keyword stuffing) is penalized, while too low means the page is not sufficiently relevant for the given keyword.

**Why it matters:** Google penalizes keyword stuffing, i.e. artificially repeating keywords. A natural density of 1-3% is a commonly cited rule of thumb, though Google publishes no target value. Modern search engines use semantic understanding, so synonyms and related terms are equally important.

**Real-world example:** The old SEO approach of 'cheap shoes, shoes cheap, buy cheap shoes' is penalized today. Google's Panda (2011) and Penguin (2012) updates cracked down on low-quality content and webspam such as keyword stuffing. Quality content uses keywords naturally in context.
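
A minimal sketch of keyword density, assuming one common definition (tools differ): density = occurrences × words in the keyword ÷ total words × 100, with the keyword matched as a consecutive word sequence. `keyword_density` is a hypothetical helper.

```python
import re


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def keyword_density(text, keyword):
    """Percentage of words in `text` that belong to occurrences of `keyword`."""
    words = tokenize(text)
    phrase = tokenize(keyword)
    if not words or not phrase:
        return 0.0
    n = len(phrase)
    occurrences = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase)
    return occurrences * n / len(words) * 100
```

For the stuffed example above, "cheap shoes" occurs twice in 7 words, giving about 57%, far beyond the 1-3% rule of thumb.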

#### Sources
- [Google Search Essentials](https://developers.google.com/search/docs/essentials) — Google Search Central
- [SEO Starter Guide](https://developers.google.com/search/docs/fundamentals/seo-starter-guide) — Google Search Central

---

### S28 — Hreflang Tags

**What is it:** Checks for the presence of hreflang tags on multilingual websites. The hreflang attribute informs search engines about language and regional variants of a page (e.g., sk, cs, en).

**Why it matters:** Without hreflang tags, Google may display the Czech version of a page to a Slovak user or vice versa. Hreflang ensures that the correct language version is shown to the right audience.

**Real-world example:** IKEA.com uses hreflang for dozens of countries, e.g. hreflang='sk' (Slovak), hreflang='cs' (Czech), and hreflang='de-AT' (German for Austria). Without hreflang tags, Google could show the German version to Czech users, leading to a high bounce rate.
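
A minimal sketch that generates the `<link rel="alternate" hreflang="...">` set. Every language version should carry the same full set, including a link to itself, plus an optional `x-default` fallback. The URLs are placeholders and `hreflang_links` is a hypothetical helper.

```python
from html import escape


def hreflang_links(variants, x_default=None):
    """variants: dict mapping hreflang codes (e.g. 'sk', 'de-AT') to absolute URLs."""
    tags = [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}">'
        for lang, url in sorted(variants.items())
    ]
    if x_default:
        tags.append(f'<link rel="alternate" hreflang="x-default" href="{escape(x_default)}">')
    return "\n".join(tags)
```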

#### Sources
- [Localized Versions of Pages](https://developers.google.com/search/docs/specialty/international/localized-versions) — Google Search Central
- [HTML `<link>` Element](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/link) — MDN Web Docs

---

### S29 — Pagination (rel prev/next)

**What is it:** Checks for proper use of rel='prev' and rel='next' on paginated pages. These attributes inform search engines about the relationship between pages in a sequence (e.g., page 1, 2, 3).

**Why it matters:** Clear pagination signals help search engines understand that multiple pages form a logical sequence. Without them, paginated pages with similar titles and descriptions may look like duplicate content.

**Real-world example:** E-shops like Alza.sk use pagination for categories with dozens of products, and rel prev/next marks /shoes?page=2 as the continuation of /shoes. In 2019 Google announced that it no longer uses rel prev/next as an indexing signal and recommends linking paginated pages to each other with regular links instead; other search engines may still use the markup, and it does no harm.
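
A minimal sketch that emits rel prev/next tags for one page of a series. The `?page=N` URL scheme follows the /shoes?page=2 example above and is an assumption; page 1 is served at the base URL. `pagination_links` is a hypothetical helper.

```python
def pagination_links(base_url, page, last_page):
    """Return rel="prev"/rel="next" link tags for page `page` of `last_page` (1-based)."""
    def page_url(n):
        return base_url if n == 1 else f"{base_url}?page={n}"

    tags = []
    if page > 1:
        tags.append(f'<link rel="prev" href="{page_url(page - 1)}">')
    if page < last_page:
        tags.append(f'<link rel="next" href="{page_url(page + 1)}">')
    return tags
```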

#### Sources
- [Consolidate Duplicate URLs](https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls) — Google Search Central
- [HTML `<link>` Element](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/link) — MDN Web Docs

---

### S30 — Breadcrumb Navigation

**What is it:** Checks for the presence of a breadcrumb navigation element on the page. Breadcrumbs show the hierarchical position of a page within the site structure (e.g., Home > Category > Product).

**Why it matters:** Breadcrumbs improve navigation, reduce bounce rate, and Google displays them directly in search results instead of the URL. Structured data for breadcrumbs (BreadcrumbList schema) increases visibility in SERPs.

**Real-world example:** Amazon displays breadcrumbs on every product page: 'Electronics > Computers > Laptops > Gaming Laptops'. Google shows these breadcrumbs in search results, helping users understand the page context before clicking.
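
A minimal sketch that serializes a breadcrumb trail as Schema.org `BreadcrumbList` JSON-LD, with `ListItem` positions starting at 1. The names and URLs are placeholders and `breadcrumb_jsonld` is a hypothetical helper; the output would go inside a `<script type="application/ld+json">` element.

```python
import json


def breadcrumb_jsonld(trail):
    """trail: list of (name, absolute_url) pairs from the top level down to the current page."""
    data = {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(trail, start=1)
        ],
    }
    return json.dumps(data, ensure_ascii=False, indent=2)
```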

#### Sources
- [Breadcrumb Structured Data](https://developers.google.com/search/docs/appearance/structured-data/breadcrumb) — Google Search Central
- [SEO Starter Guide](https://developers.google.com/search/docs/fundamentals/seo-starter-guide) — Google Search Central

---

## Security (11)

11 security checks

### SEC1 — HTTPS

**What is it:** Checks whether the website uses the secure HTTPS protocol instead of unencrypted HTTP. HTTPS encrypts communication between the browser and server using a TLS certificate, protecting sensitive data (passwords, payment information) from eavesdropping.

**Why it matters:** Without HTTPS, an attacker on the network can intercept and modify data transmitted between the user and the website (man-in-the-middle attack). Modern browsers mark HTTP pages as 'Not Secure' and many APIs (geolocation, camera, service workers) work exclusively over HTTPS. Google also uses HTTPS as a ranking signal.

**Real-world example:** Stripe.com has HTTPS on every page including marketing subpages — not just on the payment form. Conversely, a local e-shop without HTTPS displays a 'Not Secure' warning in Chrome's address bar, which discourages customers from making a purchase.

#### Sources
- [Why HTTPS Matters](https://web.dev/articles/why-https-matters) — web.dev
- [Transport Layer Security (TLS)](https://developer.mozilla.org/en-US/docs/Web/Security/Defenses/Transport_Layer_Security) — MDN

---

### SEC2 — Strict-Transport-Security (HSTS)

**What is it:** Checks for the presence of the Strict-Transport-Security HTTP header with a max-age value of at least 31536000 (1 year). HSTS instructs the browser to automatically send all future requests to the domain over HTTPS, even if the user types http://.

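**Why it matters:** Without HSTS, any visit that starts over HTTP (a typed http:// address or an old link) is vulnerable to SSL stripping. An attacker can intercept the redirect to HTTPS and keep the victim on an unencrypted connection. Once the browser has received the HSTS header over HTTPS, it upgrades every later request to HTTPS before sending it. The HSTS preload list closes the remaining gap on the very first visit.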

**Real-world example:** GitHub.com sends the header Strict-Transport-Security: max-age=31536000; includeSubdomains; preload. Thanks to this, the domain github.com is included in the HSTS preload list, so the browser never sends an HTTP request even on the first visit.
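
A simplified sketch of how an audit might evaluate this header: split the directives, read `max-age`, and note `includeSubDomains` and `preload`. Directive names are matched case-insensitively, as the HSTS specification requires. The function name and result fields are illustrative.

```python
from typing import Optional


def check_hsts(header: Optional[str], min_max_age: int = 31_536_000) -> dict:
    """Evaluate a Strict-Transport-Security header value (None if absent)."""
    result = {"present": header is not None, "max_age": None,
              "include_subdomains": False, "preload": False, "ok": False}
    if header is None:
        return result
    for directive in header.split(";"):
        name, _, value = directive.strip().partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        if name == "max-age":
            try:
                result["max_age"] = int(value.strip().strip('"'))  # value may be quoted
            except ValueError:
                pass  # malformed max-age: leave it as None
        elif name == "includesubdomains":
            result["include_subdomains"] = True
        elif name == "preload":
            result["preload"] = True
    result["ok"] = result["max_age"] is not None and result["max_age"] >= min_max_age
    return result


print(check_hsts("max-age=31536000; includeSubdomains; preload"))
```

Getting onto the preload list has further requirements (for example, both `includeSubDomains` and `preload` must be set). This sketch reports those flags but does not enforce them.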

#### Sources
- [Strict-Transport-Security header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Strict-Transport-Security) — MDN
- [HTTP Strict Transport Security Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/HTTP_Strict_Transport_Security_Cheat_Sheet.html) — OWASP

---

### SEC3 — Content-Security-Policy (CSP)

**What is it:** Checks for the presence of the Content-Security-Policy HTTP header, which defines from which origins the browser may load scripts, styles, images, and other resources. CSP is one of the most effective defense-in-depth measures against XSS (Cross-Site Scripting) attacks and complements correct output encoding.

**Why it matters:** XSS is one of the most common web vulnerabilities. An attacker can inject malicious JavaScript that steals cookies, redirects to a phishing site, or modifies page content. CSP prevents the execution of unauthorized scripts by precisely defining allowed sources.

**Real-world example:** Cloudflare.com uses a strict CSP with the rule script-src 'self', which only allows scripts from its own domain. If an attacker injects an external script from a foreign domain, the browser blocks it because that domain is not on the allowed list.
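
A sketch of the analysis an audit could run on the header. It parses the directives, falls back from `script-src` to `default-src` as the CSP specification does, and flags sources that undermine XSS protection. The helper names are hypothetical. The check is deliberately simplified: it ignores nonces and hashes (which make browsers disregard `'unsafe-inline'`), `'strict-dynamic'`, and `script-src-elem`.

```python
def parse_csp(header: str) -> dict[str, list[str]]:
    """Split a CSP header into {directive: [source expressions]}."""
    directives: dict[str, list[str]] = {}
    for chunk in header.split(";"):
        tokens = chunk.strip().split()
        if not tokens:
            continue
        # the first occurrence wins; duplicate directives are ignored
        directives.setdefault(tokens[0].lower(), tokens[1:])
    return directives


def script_src_issues(header: str) -> list[str]:
    """Flag weak script sources in a CSP header (simplified)."""
    d = parse_csp(header)
    sources = d.get("script-src", d.get("default-src"))  # script-src falls back to default-src
    if sources is None:
        return ["no script-src or default-src: scripts are unrestricted"]
    lowered = [s.lower() for s in sources]
    issues = []
    if "'unsafe-inline'" in lowered:
        issues.append("'unsafe-inline' allows injected inline scripts")
    if "'unsafe-eval'" in lowered:
        issues.append("'unsafe-eval' allows eval() and similar")
    if "*" in lowered or "https:" in lowered or "http:" in lowered:
        issues.append("wildcard or scheme-only source allows scripts from any host")
    return issues


print(script_src_issues("default-src 'self'; script-src 'self' 'unsafe-inline'"))
```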

#### Sources
- [Content-Security-Policy (CSP) header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Content-Security-Policy) — MDN
- [Content Security Policy Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Content_Security_Policy_Cheat_Sheet.html) — OWASP

---

### SEC4 — X-Frame-Options

**What is it:** Checks for the presence of the X-Frame-Options HTTP header, which determines whether a page can be displayed in an iframe, frame, or embed element. The main values are DENY (embedding forbidden) and SAMEORIGIN (allowed only from the same domain).

**Why it matters:** Without this header, an attacker can embed your page in an invisible iframe on their website and use clickjacking to trick the user into clicking something that actually performs an action on your site — such as approving a payment or changing a password.

**Real-world example:** Banking portals like mBank use X-Frame-Options: DENY, preventing the login form from being embedded in a foreign iframe. An attacker therefore cannot create a fake page that overlays an iframe with the banking application.
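
Modern browsers also honor the CSP `frame-ancestors` directive, which takes precedence over X-Frame-Options when both are present. The hypothetical helper below treats a page as protected in two cases: when `frame-ancestors` allows only `'none'` or `'self'`, or, if that directive is absent, when X-Frame-Options is DENY or SAMEORIGIN. Any other `frame-ancestors` allowlist is reported as unprotected so that it gets reviewed by hand.

```python
def framing_protected(headers: dict[str, str]) -> bool:
    """Conservative clickjacking check over a dict of response headers."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    for directive in h.get("content-security-policy", "").split(";"):
        tokens = directive.strip().split()
        if tokens and tokens[0].lower() == "frame-ancestors":
            # frame-ancestors wins over X-Frame-Options; an empty list means 'none'
            return all(t.lower() in ("'none'", "'self'") for t in tokens[1:])
    xfo = h.get("x-frame-options", "").strip().upper()
    return xfo in ("DENY", "SAMEORIGIN")  # the old ALLOW-FROM value is obsolete


print(framing_protected({"X-Frame-Options": "DENY"}))
```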

#### Sources
- [X-Frame-Options header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/X-Frame-Options) — MDN
- [OWASP Secure Headers Project](https://owasp.org/www-project-secure-headers/) — OWASP

---

### SEC5 — X-Content-Type-Options

**What is it:** Checks for the presence of the X-Content-Type-Options HTTP header with the value nosniff. This header prevents the browser from MIME type sniffing — automatically guessing the content type, which can lead to interpreting a harmless file as an executable script.

**Why it matters:** Without the nosniff header, a browser may interpret an uploaded text file as JavaScript and execute it. An attacker can thus upload malicious code disguised as an image or text file, which the browser will execute instead of displaying.

**Real-world example:** Dropbox.com sends X-Content-Type-Options: nosniff with all responses. Suppose someone uploads a file evil.jpg containing JavaScript code and a page then tries to load it with a script tag. Because of nosniff, the browser honors the declared Content-Type (image/jpeg) and refuses to execute the file as a script.
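
A simplified model of the rule nosniff enforces for scripts: when the header is present, a response requested as a script is blocked unless its Content-Type is a JavaScript MIME type. The MIME set below is a subset of the list in the WHATWG MIME Sniffing standard, and the helper names are hypothetical.

```python
# A subset of the JavaScript MIME type essences listed by the WHATWG MIME Sniffing standard.
JS_MIME_TYPES = {
    "text/javascript", "application/javascript", "application/ecmascript",
    "application/x-javascript", "text/ecmascript",
}


def has_nosniff(headers: dict[str, str]) -> bool:
    h = {k.lower(): v for k, v in headers.items()}
    return h.get("x-content-type-options", "").strip().lower() == "nosniff"


def blocked_by_nosniff(headers: dict[str, str]) -> bool:
    """True if nosniff would stop this response from running as a <script>."""
    if not has_nosniff(headers):
        return False  # nosniff is not in effect, so it blocks nothing
    h = {k.lower(): v for k, v in headers.items()}
    essence = h.get("content-type", "").split(";")[0].strip().lower()  # drop charset etc.
    return essence not in JS_MIME_TYPES


print(blocked_by_nosniff({"Content-Type": "image/jpeg", "X-Content-Type-Options": "nosniff"}))
```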

#### Sources
- [X-Content-Type-Options header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/X-Content-Type-Options) — MDN
- [OWASP Secure Headers Project](https://owasp.org/www-project-secure-headers/) — OWASP

---

### SEC6 — Referrer-Policy

**What is it:** Checks for the presence of the Referrer-Policy HTTP header, which determines how much URL information is sent in the Referer header during navigation or resource loading. The recommended value is strict-origin-when-cross-origin or no-referrer.

**Why it matters:** Without a proper Referrer-Policy, the URL sent in the Referer header can reveal sensitive data — such as tokens in query parameters, internal URL addresses, or search information. External services (analytics, ads) can thus gain access to data that does not belong to them.

**Real-world example:** GitHub.com uses Referrer-Policy: strict-origin-when-cross-origin. When you click an external link from a private repository, the target website receives only https://github.com as the referrer — not the full URL with the private repository name.
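
The decision `strict-origin-when-cross-origin` makes can be sketched in a few lines. Same-origin requests get the full URL, minus fragment and credentials. Cross-origin requests get only the origin. A navigation from HTTPS to HTTP gets nothing. This is a simplified model of the Referrer Policy specification: it treats only `https` as secure, and the helper name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(parts):
    """(scheme, host, port) with the default port filled in."""
    return parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(parts.scheme)


def referrer_for(source_url: str, target_url: str) -> Optional[str]:
    """Referer value under strict-origin-when-cross-origin (simplified)."""
    src, dst = urlsplit(source_url), urlsplit(target_url)
    if src.scheme == "https" and dst.scheme == "http":
        return None  # downgrade: send no referrer at all
    if _origin(src) == _origin(dst):
        host = src.netloc.rsplit("@", 1)[-1]  # strip any user:password@
        return urlunsplit((src.scheme, host, src.path, src.query, ""))  # drop the fragment
    scheme, hostname, port = _origin(src)
    suffix = "" if port == DEFAULT_PORTS.get(scheme) else f":{port}"
    return f"{scheme}://{hostname}{suffix}/"  # origin only


print(referrer_for("https://github.com/acme/private-repo/issues/7", "https://example.com/"))
# prints: https://github.com/
```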

#### Sources
- [Referrer-Policy header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Referrer-Policy) — MDN
- [OWASP Secure Headers Project](https://owasp.org/www-project-secure-headers/) — OWASP

---

### SEC7 — Permissions-Policy

**What is it:** Checks for the presence of the Permissions-Policy HTTP header (formerly Feature-Policy), which restricts the page's and embedded iframes' access to sensitive browser APIs such as camera, microphone, geolocation, payment API, and others.

**Why it matters:** Without Permissions-Policy, a malicious iframe embedded via an ad or third-party widget can access the user's camera, microphone, or geolocation. This header allows you to explicitly disable APIs that the website does not need, thereby reducing the attack surface.

**Real-world example:** Stripe.com sets Permissions-Policy: camera=(), microphone=(), geolocation=(), which disables access to camera, microphone, and geolocation across the entire website. If a malicious script were to get onto the page, it could not activate these sensors.
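
A sketch of reading the header into per-feature allowlists. An empty list `()` disables the feature for the page and all embedded iframes. The parser is a simplified, hypothetical helper. It handles inner lists and single tokens such as `*`, but not structured-field parameters (`;...`) or commas inside quoted strings.

```python
def parse_permissions_policy(header: str) -> dict[str, list[str]]:
    """Parse e.g. 'camera=(), geolocation=(self "https://maps.example")'."""
    policy: dict[str, list[str]] = {}
    for member in header.split(","):
        name, sep, value = member.strip().partition("=")
        if not sep:
            continue  # skip malformed members
        value = value.strip()
        if value.startswith("(") and value.endswith(")"):
            allowlist = value[1:-1].split()  # inner list: space-separated origins
        else:
            allowlist = [value]  # single token such as *
        policy[name.strip().lower()] = allowlist
    return policy


def disabled_features(header: str) -> list[str]:
    """Features with an empty allowlist, i.e. blocked everywhere on the page."""
    return sorted(f for f, allow in parse_permissions_policy(header).items() if not allow)


print(disabled_features("camera=(), microphone=(), geolocation=()"))
# prints: ['camera', 'geolocation', 'microphone']
```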

#### Sources
- [Permissions-Policy header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Permissions-Policy) — MDN
- [Permissions Policy guide](https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/Permissions_Policy) — MDN

---

### SEC8 — Mixed Content

**What is it:** Checks whether an HTTPS page contains resources loaded over the unencrypted HTTP protocol (images, scripts, styles, fonts). Such content is called 'mixed content' and compromises the security of the entire page.

**Why it matters:** Even if the main page is on HTTPS, an attacker on the network can intercept and modify HTTP resources. A script loaded over HTTP and tampered with in transit would run with full access to the page's DOM and cookies, which is why browsers block active mixed content (scripts, iframes, stylesheets) outright. Passive content (images, audio, video) is, depending on the browser, either automatically upgraded to HTTPS or loaded with a warning.

**Real-world example:** An e-shop on HTTPS loads product images from http://cdn.example.com/product.jpg. Chrome logs a 'Mixed Content' warning in the console, and the page is no longer reported as fully secure. After the URLs are changed to https://cdn.example.com/product.jpg, the warning disappears and the page is fully secured.
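
A sketch of a static mixed-content scan using Python's standard `html.parser`. It sorts `http://` resource URLs into active and passive content. The class name is hypothetical. The scan looks only at the listed tags and attributes, so it does not examine `srcset`, CSS `url()` references, or requests made from JavaScript.

```python
from html.parser import HTMLParser

ACTIVE_TAGS = {"script": "src", "iframe": "src", "embed": "src", "object": "data"}
PASSIVE_TAGS = {"img": "src", "audio": "src", "video": "src"}


class MixedContentScanner(HTMLParser):
    """Collect http:// resource URLs on an HTTPS page, split by content type."""

    def __init__(self):
        super().__init__()
        self.active: list[tuple[str, str]] = []   # browsers block these
        self.passive: list[tuple[str, str]] = []  # upgraded or loaded with a warning

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and "stylesheet" in (a.get("rel") or "").lower().split():
            url, bucket = a.get("href"), self.active  # stylesheets count as active
        elif tag in ACTIVE_TAGS:
            url, bucket = a.get(ACTIVE_TAGS[tag]), self.active
        elif tag in PASSIVE_TAGS:
            url, bucket = a.get(PASSIVE_TAGS[tag]), self.passive
        else:
            return
        if url and url.strip().lower().startswith("http://"):
            bucket.append((tag, url))


scanner = MixedContentScanner()
scanner.feed('<img src="http://cdn.example.com/product.jpg">'
             '<script src="https://cdn.example.com/app.js"></script>')
print(scanner.active, scanner.passive)
```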

#### Sources
- [Mixed content](https://developer.mozilla.org/en-US/docs/Web/Security/Defenses/Mixed_content) — MDN
- [Content Security Policy guide](https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/CSP) — MDN

---

### SEC9 — Security Warnings

**What is it:** Checks whether security warnings appear in the browser console — such as missing security headers, invalid certificates, outdated TLS versions, insecure forms, or CORS policy issues.

**Why it matters:** Security warnings in the console signal potential vulnerabilities that an attacker can exploit. Ignoring warnings can lead to data leaks, unauthorized access, or compromise of the entire application. A clean console is a sign of a well-secured website.

**Real-world example:** An HTTPS page whose form submits to an http:// URL triggers a Chrome warning about an insecure form target. Cloudflare.com has a clean console with no security warnings, which indicates consistent security across all parts of the website.

#### Sources
- [Security on the web](https://developer.mozilla.org/en-US/docs/Web/Security) — MDN
- [OWASP Secure Headers Project](https://owasp.org/www-project-secure-headers/) — OWASP

---

### SEC10 — Insecure Cookies

**What is it:** Checks whether cookies have the required security attributes set: Secure (sent only over HTTPS), HttpOnly (inaccessible via JavaScript), and SameSite (protection against CSRF attacks). Missing attributes create serious security vulnerabilities.

**Why it matters:** A cookie without the Secure attribute is also sent over HTTP, where an attacker can intercept it. Without HttpOnly, it can be stolen via an XSS attack through JavaScript cookie access. Without SameSite=Strict or Lax, an attacker can craft a CSRF attack — a fake form that automatically sends a request with the victim's cookie.

**Real-world example:** Stripe Dashboard sets its session cookie with all three attributes: Set-Cookie: session=abc123; Secure; HttpOnly; SameSite=Lax. Conversely, a website without these attributes risks an attacker stealing the session cookie via XSS and taking over the user's account.
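
A minimal sketch of the attribute check on a single `Set-Cookie` header value. `cookie_issues` is a hypothetical helper, and attribute names are compared case-insensitively.

```python
def cookie_issues(set_cookie: str) -> list[str]:
    """List missing security attributes in one Set-Cookie header value."""
    parts = [p.strip() for p in set_cookie.split(";")]
    attrs = {}
    for part in parts[1:]:  # parts[0] is the name=value pair itself
        name, _, value = part.partition("=")
        attrs[name.strip().lower()] = value.strip()  # attribute names are case-insensitive
    issues = []
    if "secure" not in attrs:
        issues.append("missing Secure")
    if "httponly" not in attrs:
        issues.append("missing HttpOnly")
    if attrs.get("samesite", "").lower() not in ("strict", "lax"):
        issues.append("SameSite is not Strict or Lax")
    return issues


print(cookie_issues("session=abc123; Secure; HttpOnly; SameSite=Lax"))  # prints: []
print(cookie_issues("session=abc123"))
```

`SameSite=None; Secure` is legitimate for cookies that must be sent cross-site, for example by embedded widgets. This sketch still flags it, so it can be reviewed rather than treated as an automatic failure.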

#### Sources
- [Set-Cookie header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Set-Cookie) — MDN
- [Secure cookie configuration](https://developer.mozilla.org/en-US/docs/Web/Security/Practical_implementation_guides/Cookies) — MDN

---

### SEC11 — Deprecated APIs

**What is it:** Checks whether the website uses deprecated or legacy APIs that browsers have already removed or plan to remove. These include synchronous XMLHttpRequest on the main thread, AppCache, Web SQL, and other obsolete features.

**Why it matters:** Deprecated APIs no longer receive improvements, and some legacy techniques carry known risks. For example, document.write() can be abused to inject markup and blocks the parser, and synchronous XHR freezes the main thread. Browsers are gradually removing these APIs, which can cause the website to stop working.

**Real-world example:** An older website uses document.write() to inject third-party scripts instead of creating script elements with createElement or using async/defer attributes. For users on slow (2G-like) connections, Chrome may block parser-blocking, cross-origin scripts inserted this way. GitHub and Cloudflare use exclusively modern APIs, ensuring long-term compatibility.
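
A rough, regex-based sketch of how a scanner might flag these calls in JavaScript source. The pattern list and helper name are hypothetical. A regex scan can miss aliased calls and can match text inside comments or strings, so treat any hit as a lead, not proof.

```python
import re

# Hypothetical heuristic patterns for a few deprecated or legacy APIs.
DEPRECATED_PATTERNS = {
    "document.write()": re.compile(r"\bdocument\.write(?:ln)?\s*\("),
    # xhr.open(method, url, false): the third argument false means synchronous
    "synchronous XMLHttpRequest": re.compile(
        r"\.open\s*\(\s*['\"][A-Za-z]+['\"]\s*,[^,()]+,\s*false\s*\)"),
    "Web SQL (openDatabase)": re.compile(r"\bopenDatabase\s*\("),
    "AppCache (applicationCache)": re.compile(r"\bapplicationCache\b"),
}


def find_deprecated_apis(js_source: str) -> list[str]:
    """Names of the patterns that occur at least once in the source."""
    return [name for name, pattern in DEPRECATED_PATTERNS.items()
            if pattern.search(js_source)]


sample = "document.write('<p>ad</p>'); xhr.open('GET', '/api', false);"
print(find_deprecated_apis(sample))
```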

#### Sources
- [Security on the web](https://developer.mozilla.org/en-US/docs/Web/Security) — MDN
- [HTTP Headers Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/HTTP_Headers_Cheat_Sheet.html) — OWASP

---

## Performance (10)

10 performance checks

### P1 — Response Time (TTFB)

**What is it:** Time to First Byte (TTFB) measures the time from sending the HTTP request to receiving the first byte of the server response. The ideal value is under 800 ms. TTFB is a fundamental metric that affects all subsequent metrics such as FCP and LCP. A slow server means the user is waiting before anything even starts rendering.

**Why it matters:** A slow TTFB delays everything that follows. Until the first byte arrives the user sees a blank screen, and FCP and LCP cannot even start. Google uses loading performance, as part of its page experience signals, in ranking. web.dev rates a TTFB of 800 ms or less as good and anything above 1,800 ms as poor.

**Real-world example:** Amazon reportedly found that every 100 ms of added latency cost them 1% in sales. Google typically has a TTFB under 200 ms thanks to distributed servers and edge caching. If your e-shop responds in 2,500 ms, the full page can easily take 4-5 seconds to load, and many visitors will leave before it does.
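
In the browser, the timing values come from `performance.getEntriesByType("navigation")[0]`. The sketch below assumes they have been exported into a plain dict (for example with the entry's `toJSON()` method) and classifies the result with web.dev's thresholds. The function names are illustrative.

```python
def ttfb_ms(nav_timing: dict[str, float]) -> float:
    """TTFB from an exported PerformanceNavigationTiming entry.

    TTFB = responseStart - startTime. startTime is 0 for the navigation
    entry, so redirects, DNS, connection setup and server time all count.
    """
    return nav_timing["responseStart"] - nav_timing.get("startTime", 0.0)


def rate_ttfb(ms: float) -> str:
    """web.dev thresholds: good <= 800 ms, poor > 1800 ms."""
    if ms <= 800:
        return "good"
    if ms <= 1800:
        return "needs improvement"
    return "poor"


entry = {"startTime": 0.0, "requestStart": 120.4, "responseStart": 2500.0}
print(ttfb_ms(entry), rate_ttfb(ttfb_ms(entry)))  # prints: 2500.0 poor
```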

#### Sources
- [Time to First Byte (TTFB)](https://web.dev/articles/ttfb) — web.dev
- [Optimize Time to First Byte](https://web.dev/articles/optimize-ttfb) — web.dev
- [Web Performance](https://developer.mozilla.org/en-US/docs/Web/Performance) — MDN Web Docs

---

### P2 — HTML Size

**What is it:** Checks the size of the page's HTML document. The ideal size is under 100 KB. Large HTML documents slow down parsing and increase the time to first render. Excessively large HTML often results from inline styles, inline scripts, or duplicated content.

**Why it matters:** The browser parses HTML incrementally, but a larger document still takes longer to download, parse, and turn into a DOM, which delays the first render. On mobile networks this is even more pronounced. Smaller HTML also reduces data consumption for users.

**Real-world example:** The Google homepage has HTML under 50 KB, enabling lightning-fast loading. Shopify optimizes their templates' HTML to a minimum. If your blog generates 500 KB of HTML due to inline CSS and unnecessary attributes, loading on a 3G network can take up to 5 seconds.
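
A minimal sketch of the size check. It assumes the 100 KB budget applies to the uncompressed document and also reports the gzip size, which roughly matches what a gzip-compressing server would send. The helper name is hypothetical.

```python
import gzip


def html_size_report(html: str, budget_kb: float = 100.0) -> dict:
    """Uncompressed and gzip size of an HTML document against a budget.

    Assumption: the budget applies to the uncompressed document.
    """
    raw = html.encode("utf-8")
    raw_kb = len(raw) / 1024
    gzip_kb = len(gzip.compress(raw)) / 1024  # approximate transfer size with gzip
    return {"raw_kb": round(raw_kb, 1), "gzip_kb": round(gzip_kb, 1),
            "within_budget": raw_kb <= budget_kb}


page = "<div class='product'>Item</div>\n" * 5000  # repetitive markup compresses well
print(html_size_report(page))
```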

#### Sources
- [Minify and compress network payloads](https://web.dev/reduce-network-payloads-using-text-compression/) — web.dev
- [Avoid enormous network payloads](https://web.dev/total-byte-weight/) — web.dev
- [Web Performance](https://developer.mozilla.org/en-US/docs/Web/Performance) — MDN Web Docs

---

### P3 — External Resources

**What is it:** Counts the number of external scripts and CSS files loaded on the page. The ideal count is fewer than 15. Each external resource requires an HTTP request, and a resource on a new origin also needs a DNS lookup and a TCP/TLS connection, all of which add latency. Render-blocking scripts and styles are particularly problematic.

**Why it matters:** Each external resource adds network latency. Stylesheets and synchronous (non-async, non-defer) scripts in the head are render-blocking: the browser does not display content until they have been downloaded and processed. Too many of them slow down First Contentful Paint and increase the time the user sees a blank page.

**Real-world example:** A typical WordPress site loads 15-30 external resources due to plugins. The Google homepage uses only 3-4 external resources. If your site loads 25 external scripts from various CDNs, each adds 50-200 ms of latency and the total delay can reach 2 seconds.
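
A sketch of the count using Python's `html.parser`. The class name is hypothetical, and the notion of "blocking" is simplified. Every synchronous classic script counts as blocking wherever it appears. Every stylesheet counts as render-blocking unless its `media` attribute is `print`.

```python
from html.parser import HTMLParser


class ResourceCounter(HTMLParser):
    """Count external scripts/stylesheets and how many of them block."""

    def __init__(self):
        super().__init__()
        self.scripts = 0
        self.stylesheets = 0
        self.blocking = 0

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # boolean attributes such as defer map to None
        if tag == "script" and a.get("src"):
            self.scripts += 1
            # classic scripts without async/defer block the parser; modules are deferred
            if ("async" not in a and "defer" not in a
                    and (a.get("type") or "").lower() != "module"):
                self.blocking += 1
        elif (tag == "link" and a.get("href")
              and "stylesheet" in (a.get("rel") or "").lower().split()):
            self.stylesheets += 1
            if (a.get("media") or "all").lower() != "print":  # print CSS does not block rendering
                self.blocking += 1

    @property
    def total(self) -> int:
        return self.scripts + self.stylesheets


counter = ResourceCounter()
counter.feed('<link rel="stylesheet" href="a.css"><script src="a.js"></script>')
print(counter.total, counter.blocking, counter.total < 15)
```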

#### Sources
- [Keep request counts low and transfer sizes small](https://web.dev/resource-summary/) — web.dev
- [Optimize resource loading](https://web.dev/learn/performance/optimize-resource-loading) — web.dev
- [Web Performance](https://developer.mozilla.org/en-US/docs/Web/Performance) — MDN Web Docs

---

### P4 — Inline CSS

**What is it:** Measures the size of inline CSS styles directly in the HTML document compared to external styles. A small amount of critical inline CSS (up to 15 KB) is beneficial for fast rendering of above-the-fold content. However, too much inline CSS increases HTML size and prevents style caching.

**Why it matters:** Inline CSS cannot be cached separately from the HTML, so it is downloaded again with every page load. Large blocks of inline CSS increase the HTML document size and slow down parsing. The correct approach is to inline only critical CSS for above-the-fold content and load the rest asynchronously from an external file.

**Real-world example:** Google inlines critical CSS for above-the-fold content (roughly 10 KB) and loads the rest asynchronously. If your website has 200 KB of inline CSS in every HTML document, the user downloads unnecessary data with every request and the browser cannot cache CSS between pages.
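
A sketch of measuring inline CSS with Python's `html.parser`. It counts bytes inside `<style>` blocks and in `style=""` attributes separately. The class name is hypothetical, and the 15 KB figure in the usage line is the guideline from the text above.

```python
from html.parser import HTMLParser


class InlineCSSMeter(HTMLParser):
    """Measure bytes of CSS in <style> blocks and style="" attributes."""

    def __init__(self):
        super().__init__()
        self.block_bytes = 0
        self.attr_bytes = 0
        self._in_style = False

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._in_style = True
        for name, value in attrs:
            if name == "style" and value:
                self.attr_bytes += len(value.encode("utf-8"))

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        if self._in_style:  # text inside <style> is CSS
            self.block_bytes += len(data.encode("utf-8"))

    def total_kb(self) -> float:
        return (self.block_bytes + self.attr_bytes) / 1024


meter = InlineCSSMeter()
meter.feed('<style>body{margin:0}</style><p style="color:red">Hi</p>')
print(meter.total_kb() <= 15)
```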

#### Sources
- [Extract critical CSS](https://web.dev/articles/extract-critical-css) — web.dev
- [Defer non-critical CSS](https://web.dev/articles/defer-non-critical-css) — web.dev
- [CSS for Web Vitals](https://web.dev/articles/css-web-vitals) — web.dev

---

### P5 — Inline JS

**What is it:** Measures the size of inline JavaScript code directly in the HTML document compared to external scripts. Small inline scripts (up to 5 KB) are acceptable for critical functionality. Large blocks of inline JS increase HTML size, slow down parsing, and prevent script caching and optimization.

**Why it matters:** Inline JavaScript blocks the HTML parser and cannot be cached by the browser. Large inline scripts slow down Time to Interactive and increase the size of every HTML document. External scripts can be cached, minified, and loaded asynchronously using the async or defer attributes.

**Real-world example:** Moving most inline JS into external files loaded with async or defer lets the browser cache those scripts across pages and keeps every HTML response small. If your e-shop has 150 KB of inline JavaScript (trackers, analytics scripts), every page will be 150 KB larger, and those scripts cannot be cached or deferred on their own.
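
A sketch of measuring inline JavaScript with Python's `html.parser`. It skips external scripts (`src=`) and non-JavaScript data blocks such as JSON-LD. The class name is hypothetical. The usage line compares the result with the 5 KB guideline from the text above.

```python
from html.parser import HTMLParser

JS_TYPES = {"", "text/javascript", "application/javascript", "module"}


class InlineJSMeter(HTMLParser):
    """Count inline <script> blocks and their size in bytes."""

    def __init__(self):
        super().__init__()
        self.blocks = 0
        self.inline_bytes = 0
        self._counting = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            a = dict(attrs)
            script_type = (a.get("type") or "").strip().lower()
            # skip external scripts and data blocks such as application/ld+json
            self._counting = "src" not in a and script_type in JS_TYPES
            if self._counting:
                self.blocks += 1

    def handle_endtag(self, tag):
        if tag == "script":
            self._counting = False

    def handle_data(self, data):
        if self._counting:
            self.inline_bytes += len(data.encode("utf-8"))


meter = InlineJSMeter()
meter.feed("<script>window.dataLayer = [];</script>")
print(meter.blocks, meter.inline_bytes / 1024 <= 5)
```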

#### Sources
- [Remove unused JavaScript](https://web.dev/unused-javascript/) — web.dev
- [General HTML performance considerations](https://web.dev/learn/performance/general-html-performance) — web.dev
- [Web Performance](https://developer.mozilla.org/en-US/docs/Web/Performance) — MDN Web Docs

---

### P6 — Image Optimization

**What is it:** Checks whether images use modern formats (WebP, AVIF), lazy loading (loading='lazy'), and responsive sizes via the srcset attribute. Lazy loading defers loading of images outside the viewport, and srcset allows the browser to select the right image size for the given device.

**Why it matters:** Images are typically the largest part of a page (50-70% of data). Without lazy loading, all images are loaded at once, even those the user will never see. Without srcset, mobile devices download unnecessarily large desktop images. Modern formats like WebP are 25-35% smaller than JPEG.

**Real-world example:** Medium uses lazy loading and srcset for all images in articles — it only loads images visible in the viewport and sends smaller versions to mobile. If your blog has 20 images at 500 KB each without lazy loading, the user downloads 10 MB of data immediately on page load.
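
A sketch of a per-image audit with Python's `html.parser`. The class name is hypothetical, and the format check is a heuristic based on the file extension. It does not see `<picture>`/`<source type=...>` alternatives or server-side content negotiation. Also note that above-the-fold images, especially the LCP image, should generally not be lazy-loaded.

```python
from html.parser import HTMLParser

MODERN_EXTENSIONS = (".webp", ".avif")


class ImageAuditor(HTMLParser):
    """Record lazy loading, srcset and (by extension) format for each <img>."""

    def __init__(self):
        super().__init__()
        self.images: list[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        src = a.get("src") or ""
        path = src.split("?", 1)[0].split("#", 1)[0].lower()  # ignore query and fragment
        self.images.append({
            "src": src,
            "lazy": (a.get("loading") or "").lower() == "lazy",
            "srcset": bool(a.get("srcset")),
            "modern_format": path.endswith(MODERN_EXTENSIONS),
        })


auditor = ImageAuditor()
auditor.feed('<img src="photo.jpg" loading="lazy">')
print(auditor.images)
```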

#### Sources
- [Browser-level image lazy loading for the web](https://web.dev/articles/browser-level-image-lazy-loading) — web.dev
- [Serve responsive images](https://web.dev/articles/serve-responsive-images) — web.dev
- [The Image Embed element - img](https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Elements/img) — MDN Web Docs

---

### P7 — Total Page Weight

**What is it:** Measures the total size of all page resources in KB (HTML, CSS, JS, images, fonts). The ideal value is under 2,000 KB. According to HTTP Archive, the median page now weighs roughly 2 MB, and page weight has grown steadily over the years. A large page slows down loading, especially on mobile networks and weaker devices.

**Why it matters:** Total page size directly affects loading time. On a 3G network, downloading a 5 MB page can take well over 15 seconds. web.dev's performance budget guidance suggests keeping critical-path resources under about 170 KB compressed to reach Time to Interactive under 5 seconds on a typical mobile device over 3G.

**Real-world example:** Google.com has a total size under 500 KB. The average e-commerce website is 3-5 MB due to large product images. Amazon optimizes images and uses lazy loading to keep the total size below 2 MB on initial load.
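
The arithmetic behind these figures can be sketched as a budget check plus a raw download-time estimate. The 1.6 Mbit/s throughput is an illustrative slow-mobile assumption, not a measured value. The estimate ignores latency, connection setup, and parsing, so real loads take longer. The helper name is hypothetical.

```python
def page_weight_report(sizes_kb: dict[str, float], budget_kb: float = 2000.0,
                       throughput_mbps: float = 1.6) -> dict:
    """Sum resource sizes and estimate the raw download time.

    throughput_mbps is an assumed slow-mobile figure; latency,
    connection setup and parsing are ignored.
    """
    total_kb = sum(sizes_kb.values())
    bits = total_kb * 1024 * 8
    seconds = bits / (throughput_mbps * 1_000_000)
    return {"total_kb": total_kb, "within_budget": total_kb <= budget_kb,
            "est_download_s": round(seconds, 1)}


print(page_weight_report({"html": 40, "css": 120, "js": 900, "images": 4000, "fonts": 60}))
# prints: {'total_kb': 5120, 'within_budget': False, 'est_download_s': 26.2}
```

A 5 MB page thus needs about 26 seconds of pure transfer time at 1.6 Mbit/s, which is consistent with the "over 15 seconds" figure for 3G.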

#### Sources
- [Avoid enormous network payloads](https://web.dev/total-byte-weight/) — web.dev
- [Your first performance budget](https://web.dev/your-first-performance-budget/) — web.dev
- [Performance budgets 101](https://web.dev/articles/performance-budgets-101) — web.dev

---

### P8 — Request Count

**What is it:** Counts the total number of HTTP requests needed to load the page. The ideal count is fewer than 50. Every request costs at least one round trip, and requests to a new origin also pay for DNS lookup and TCP/TLS handshakes. Even with HTTP/2 multiplexing, a high number of requests has a negative impact on performance.

**Why it matters:** A high number of requests slows down page loading, especially on high-latency mobile networks. Render-blocking requests (CSS, synchronous JS) must complete before content can be displayed. Caching, bundling, and sprite techniques reduce the number of requests.

**Real-world example:** A typical WordPress site with 15 plugins generates 80-120 HTTP requests. Google.com needs fewer than 20 requests. If your website requires 95 requests (30 scripts, 25 images, 15 CSS, 25 other), on a mobile network it can take up to 8 seconds.
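
One way to get this breakdown is to export a HAR file from the browser's Network panel and group its entries. The sketch reads `log.entries[].response.content.mimeType` as defined by the HAR format. The file name `page.har` and the coarse type buckets are assumptions.

```python
import json
from collections import Counter


def count_requests(har: dict) -> Counter:
    """Group the requests in a HAR log by rough resource type (from MIME type)."""
    counts: Counter = Counter()
    for entry in har["log"]["entries"]:
        mime = entry["response"]["content"].get("mimeType", "")
        mime = mime.split(";")[0].strip().lower()  # drop charset parameters
        if "javascript" in mime:
            counts["script"] += 1
        elif mime == "text/css":
            counts["css"] += 1
        elif mime.startswith("image/"):
            counts["image"] += 1
        else:
            counts["other"] += 1
    return counts


def load_har(path: str) -> dict:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Usage (hypothetical file): counts = count_requests(load_har("page.har"))
# then compare sum(counts.values()) with the 50-request guideline.
```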

#### Sources
- [Keep request counts low and transfer sizes small](https://web.dev/resource-summary/) — web.dev
- [Prevent unnecessary network requests with the HTTP Cache](https://web.dev/articles/http-cache) — web.dev
- [Web Performance](https://developer.mozilla.org/en-US/docs/Web/Performance) — MDN Web Docs

---

### P9 — CSS Coverage

**What is it:** Measures the percentage of unused CSS code on the page. The ideal value is less than 50% unused CSS. Unused CSS styles unnecessarily increase file sizes, slow down downloads, and block page rendering because the browser must process all CSS rules.

**Why it matters:** CSS is a render-blocking resource — the browser must process all CSS rules before rendering the page. If 70% of CSS is unused, the browser wastes time parsing it. Removing unused CSS improves FCP and LCP and reduces page size.

**Real-world example:** The Bootstrap framework often loads 90% unused CSS if only a few components are used. Tailwind CSS generates only the classes it finds in your templates (formerly the purge option, now the content configuration), shrinking a multi-megabyte full build to typically under 10 KB. If your website ships 500 KB of CSS but actually needs only 50 KB, you are slowing down rendering for no benefit.
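
Given coverage data, the percentage is straightforward to compute. The sketch assumes entries shaped like the result of Puppeteer's `page.coverage.stopCSSCoverage()`, with `url`, `text` and `ranges` of `{start, end}` offsets. It merges ranges defensively in case they overlap. The function names are hypothetical.

```python
def covered_length(ranges: list[dict]) -> int:
    """Characters covered by possibly overlapping {start, end} ranges."""
    covered, last_end = 0, 0
    for start, end in sorted((r["start"], r["end"]) for r in ranges):
        start = max(start, last_end)  # skip the part already counted
        if end > start:
            covered += end - start
            last_end = end
    return covered


def unused_css_percent(entries: list[dict]) -> float:
    """Share of stylesheet text, across all entries, that no range covers."""
    total = sum(len(e["text"]) for e in entries)
    used = sum(covered_length(e["ranges"]) for e in entries)
    return 100 * (total - used) / total if total else 0.0


sheets = [{"url": "app.css", "text": "a{}b{}", "ranges": [{"start": 0, "end": 3}]}]
print(unused_css_percent(sheets))  # prints: 50.0
```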

#### Sources
- [Remove unused CSS](https://web.dev/unused-css-rules/) — web.dev
- [Defer non-critical CSS](https://web.dev/articles/defer-non-critical-css) — web.dev
- [Coverage: Find unused JavaScript and CSS](https://developer.chrome.com/docs/devtools/coverage) — Chrome DevTools

---

### P10 — JS Coverage

**What is it:** Measures the percentage of unused JavaScript code on the page. The ideal value is less than 50% unused JS. Unused JavaScript unnecessarily increases file sizes, slows down downloads, parsing, and compilation, which directly affects Time to Interactive.

**Why it matters:** JavaScript is the most expensive resource to process — it must be downloaded, parsed, compiled, and executed. Unused JS wastes all of these resources. Lighthouse flags every JS file with more than 20 KiB of unused code. Tree shaking and code splitting are key optimizations.

**Real-world example:** Coverage reports often show that 60% or more of a site's JavaScript is unused on a given page, and Webpack Bundle Analyzer helps identify which modules make up that bundle. Next.js uses automatic code splitting, so each page loads only the JS it needs. If your website loads 2 MB of JavaScript but uses only 400 KB, mobile devices spend seconds parsing unnecessary code.
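
A rough approximation of the per-file rule. It assumes JS coverage entries shaped like Puppeteer's `page.coverage.stopJSCoverage()` result (`url`, `text`, `ranges`). It measures unused code in uncompressed characters, whereas Lighthouse estimates wasted transfer bytes, so the numbers will not match Lighthouse exactly. The function names are hypothetical.

```python
UNUSED_THRESHOLD = 20 * 1024  # 20 KiB per script, as in Lighthouse's rule


def unused_length(entry: dict) -> int:
    """Uncovered characters in one coverage entry (ranges may overlap)."""
    covered, last_end = 0, 0
    for start, end in sorted((r["start"], r["end"]) for r in entry["ranges"]):
        start = max(start, last_end)  # skip the part already counted
        if end > start:
            covered += end - start
            last_end = end
    return len(entry["text"]) - covered


def flag_unused_js(entries: list[dict]) -> list[tuple[str, int]]:
    """(url, unused chars) for scripts over the threshold, largest first."""
    sized = [(e["url"], unused_length(e)) for e in entries]
    return sorted((s for s in sized if s[1] > UNUSED_THRESHOLD), key=lambda s: -s[1])
```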

#### Sources
- [Remove unused JavaScript](https://web.dev/unused-javascript/) — web.dev
- [Remove unused code](https://web.dev/remove-unused-code/) — web.dev
- [Coverage: Find unused JavaScript and CSS](https://developer.chrome.com/docs/devtools/coverage) — Chrome DevTools

---

## AI Visibility (10)

10 AI visibility concepts across LLMs

### K1 — Identity (Knowledge Level)

**What is it:** Measures the level of knowledge AI models (ChatGPT, Claude, Gemini, Perplexity) have about your domain and brand. The result is expressed on a scale: none (AI knows nothing), confused (AI mixes up facts), partial (AI knows basic information), good (AI has accurate knowledge), and excellent (AI knows details, history, and brand context). It is tested through a series of questions about the company, products, and services.

**Why it matters:** The level of AI knowledge about your brand directly determines the quality of answers users receive. If the level is 'none' or 'confused', AI may give potential customers incorrect information or ignore you entirely. The GEO (Generative Engine Optimization) research from Princeton and IIT Delhi showed that targeted content optimizations can boost a source's visibility in generative engine responses by up to 40%. The content you publish can therefore measurably influence how AI answers present your brand.

**Real-world example:** When asked 'What is Amazon.com?', ChatGPT responds: 'Amazon is the world's largest e-commerce company, founded by Jeff Bezos in 1994 and headquartered in Seattle.' This is an 'excellent' level: AI knows the name, focus, founder, founding year, and headquarters. Conversely, a small local business may get a 'none' level if AI has no information about it.

#### Sources
- [GEO: Generative Engine Optimization](https://arxiv.org/abs/2311.09735) — arXiv (IIT Delhi, Princeton)
- [The Complete AI Visibility Guide for SEOs, Marketers, and Site Owners](https://ahrefs.com/blog/ai-visibility/) — Ahrefs
- [How Claude uses the internet — Does Anthropic crawl data from the web?](https://privacy.claude.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler) — Anthropic

---

### K2 — Industry (Brand Mention Rate)

**What is it:** Measures how frequently AI models mention your brand in responses to relevant queries. Expressed as a percentage of responses in which the brand appears out of the total number of relevant queries. For example, if out of 50 queries related to your industry AI mentions your brand in 15, the Brand Mention Rate is 30%.

**Why it matters:** A brand mention in an AI response is the digital equivalent of a word-of-mouth recommendation. According to Ahrefs research, brand mentions across the web are among the factors most strongly correlated with visibility in Google's AI Overviews. A higher mention frequency builds awareness and trust, as users perceive AI responses as objective and trustworthy recommendations.

**Real-world example:** Suppose Amazon appears in 70% of AI responses to electronics queries in the US; that is a high Brand Mention Rate. A small local electronics shop may appear in only 5% of responses. The goal is to systematically increase this ratio by optimizing content, building mentions on third-party sites, and strengthening topical authority.
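
The Brand Mention Rate is a simple ratio. The sketch below computes it with a case-insensitive, whole-word match over a set of collected AI responses. The brand "Acme Shop" and the helper name are hypothetical. A production check would also need to catch spelling variants and names ending in punctuation, which `\b` boundaries do not handle.

```python
import re


def brand_mention_rate(responses: list[str], brand: str) -> float:
    """Percentage of AI responses that mention `brand` as a whole word."""
    if not responses:
        return 0.0
    pattern = re.compile(rf"\b{re.escape(brand)}\b", re.IGNORECASE)
    hits = sum(1 for text in responses if pattern.search(text))
    return 100 * hits / len(responses)


answers = (["You could try Acme Shop or a big retailer."] * 15
           + ["Popular options include BigStore."] * 35)
print(brand_mention_rate(answers, "Acme Shop"))  # prints: 30.0
```

This reproduces the 15-out-of-50 example from the definition above.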

#### Sources
- [7 ways to grow brand mentions, a key metric for AI Overviews visibility](https://searchengineland.com/7-ways-to-grow-brand-mentions-a-key-metric-for-ai-overviews-visibility-458600) — Search Engine Land
- [Ahrefs Brand Radar: See ANY brand's AI visibility](https://ahrefs.com/brand-radar) — Ahrefs

---

### K3 — Competition (Link Presence)

**What is it:** Checks whether AI models provide a direct link (URL) to your website within their responses. Some platforms like Perplexity and Google AI Overviews add citations with links by default, while ChatGPT and Claude may not always provide direct URLs. The metric evaluates link presence across all monitored AI platforms.

**Why it matters:** A brand mention alone builds awareness, but only a direct link generates traffic and conversions. Without a URL, the user must search for your page manually, which makes a visit noticeably less likely. Perplexity and Google AI Overviews add citations automatically, so it is important to be among the cited sources on these platforms.

**Real-world example:** Ask Perplexity for 'best pizzerias in New York' and the response lists restaurants, followed by linked citations such as '[1] yelp.com, [2] tripadvisor.com'. A website cited with a link has positive Link Presence. If your site is mentioned without a link, the user will likely not click through.
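
A sketch of detecting Link Presence in a single response. It assumes links appear as full `http(s)://` URLs in the response text or citation list. A bare "yelp.com" counts as a mention here, not a link. The helper name and URL pattern are hypothetical.

```python
import re
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s<>()\[\]\"']+", re.IGNORECASE)


def has_link_to(response: str, domain: str) -> bool:
    """True if the response contains a full URL on `domain` or one of its subdomains."""
    domain = domain.lower()
    for url in URL_RE.findall(response):
        host = urlsplit(url).hostname or ""  # hostname is returned lowercased
        if host == domain or host.endswith("." + domain):
            return True
    return False


print(has_link_to("Sources: [1] https://www.yelp.com/nyc-pizza", "yelp.com"))  # prints: True
```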

#### Sources
- [Google Search — AI features and your website](https://developers.google.com/search/docs/appearance/ai-features) — Google
- [Perplexity API — Getting Started](https://docs.perplexity.ai/guides/getting-started) — Perplexity
- [How to boost your AI search visibility: 5 key factors](https://searchengineland.com/how-to-boost-your-ai-search-visibility-5-key-factors-464398) — Search Engine Land

---

### K4 — Recommendation (Confidence Score)

**What is it:** Evaluates the level of certainty and conviction with which AI models talk about your brand. High confidence means AI responds with certainty and without hesitation ('Amazon is the world's largest online retailer'), while low confidence manifests through uncertain phrasing ('It seems that Amazon might be...' or 'I'm not sure, but...').

**Why it matters:** The AI's level of certainty directly affects user trust. If AI responds uncertainly, the user will look for further verification and may choose a competitor that AI describes with more confidence. The Confidence Score likely reflects the quantity and quality of sources about your brand that the model saw during training or retrieves when answering.

**Real-world example:** When asked 'Is Shopify a reliable e-commerce platform?', ChatGPT responds: 'Shopify is one of the largest and most reliable e-commerce platforms with over 4 million active stores.' This is high confidence — no hesitation, specific numbers. Conversely, the response 'Shopify is apparently some kind of e-commerce platform' indicates low confidence.

#### Sources
- [GPT-4 Technical Report](https://arxiv.org/abs/2303.08774) — OpenAI
- [The Claude Model Card](https://docs.anthropic.com/en/docs/about-claude/models) — Anthropic
- [LLMO: 10 Ways to Work Your Brand Into AI Answers](https://ahrefs.com/blog/llm-optimization/) — Ahrefs

---

### K5 — Technology (Reputation Sentiment)

**What is it:** Analyzes the sentiment (tone) of AI responses about your brand — whether positive, neutral, or negative. AI models synthesize information from numerous sources, and their responses reflect the overall sentiment that exists about your brand on the internet. The metric evaluates phrasing, descriptors, and the context in which AI mentions you.

**Why it matters:** Negative sentiment in AI responses can damage brand reputation among thousands of users daily. Unlike a single negative review that a person can put into perspective, an AI model presents a synthesis of all available information as objective fact. That's why it's important to monitor and actively influence sentiment through quality PR and content marketing.

**Real-world example:** When asked 'experiences with company XY', AI responds: 'Company XY has predominantly positive reviews; customers praise fast delivery and quality customer support.' This is positive sentiment. If AI responded: 'Company XY has many negative reviews; customers complain about long delivery times,' the sentiment would be negative and the brand should respond.

#### Sources
- [AI Brand Performance: See How AI Talks About Your Brand](https://www.semrush.com/ai-seo/brand-performance/) — Semrush
- [How to measure and maximize visibility in AI search](https://searchengineland.com/how-to-measure-and-maximize-visibility-in-ai-search-462953) — Search Engine Land
- [Gemini Apps Privacy Hub](https://support.google.com/gemini/answer/13594961) — Google AI

---

### K6 — Local awareness (Multi-model Consistency)

**What is it:** Measures the consistency of information about your brand across different AI models (ChatGPT, Claude, Gemini, Perplexity). Each model has a different training dataset, architecture, and information sources, which can lead to contradictory responses. The metric compares key facts (name, focus, products, contact) across models and expresses the level of agreement.

**Why it matters:** If one AI model describes you correctly and another states incorrect information, users become confused and lose trust. Consistency across models strengthens brand credibility. Inconsistency often points to missing structured data on the website, or to contradictory information across sources that different models resolve in different ways.

**Real-world example:** ChatGPT states that your company is based in New York, Claude claims it's in Chicago, and Gemini has no information about the headquarters at all. Multi-model Consistency is low in this case. The fix is to add or update Organization schema markup on the website. You should also unify the data on Google Business Profile (formerly Google My Business), Wikipedia, and business registries, so that all AI models have access to the same facts.
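One plausible way to express "the level of agreement" is the following. For each fact, take the share of models that give the most common answer, then average over facts. This is a minimal sketch under that assumption, not the audit's documented formula. The data layout and the helper names `fact_agreement` and `consistency` are hypothetical. A model with no answer counts as disagreeing.

```python
from collections import Counter
from typing import Optional

# Hypothetical layout: model name -> fact name -> extracted answer (None = no answer).
Answers = dict[str, dict[str, Optional[str]]]


def fact_agreement(values: list[Optional[str]]) -> float:
    """Share of all models that give the most common non-missing answer."""
    known = [v.strip().lower() for v in values if v]
    if not known:
        return 0.0
    _, top_count = Counter(known).most_common(1)[0]
    return top_count / len(values)  # missing answers count as disagreement


def consistency(answers: Answers) -> float:
    """Mean agreement over every fact that at least one model reported on."""
    facts = sorted({fact for per_model in answers.values() for fact in per_model})
    if not facts:
        return 0.0
    scores = [fact_agreement([per_model.get(fact) for per_model in answers.values()])
              for fact in facts]
    return sum(scores) / len(scores)


answers = {
    "ChatGPT": {"headquarters": "New York"},
    "Claude": {"headquarters": "Chicago"},
    "Gemini": {"headquarters": None},
}
print(round(consistency(answers), 2))  # 0.33
```

This score measures agreement, not correctness. Three models that all repeat the same wrong headquarters would score 1.0, which is why factual accuracy needs its own check.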

#### Sources
- [LLM optimization in 2026: Tracking, visibility, and what's next for AI discovery](https://searchengineland.com/llm-optimization-tracking-visibility-ai-discovery-463860) — Search Engine Land
- [Publishers and developers FAQ](https://help.openai.com/en/articles/12627856-publishers-and-developers-faq) — OpenAI
- [15 of the Best AI Search Monitoring Tools](https://ahrefs.com/blog/aeo-tools-optimize-for-llms/) — Ahrefs

---

### K7 — Complete profile (Query Coverage)

**What is it:** Measures coverage across different types of user queries: informational ('what is...'), navigational ('company XY website'), transactional ('buy product XY'), and comparative ('XY vs. competitor'). It expresses the share of these query types in which your brand appears in AI responses. A higher value means AI knows you across different contexts and stages of the buying process.

**Why it matters:** Users ask AI different types of questions at different stages of decision-making. If AI mentions you only for informational queries but not transactional ones, you're losing customers at the crucial purchasing stage. Complete query coverage ensures brand presence across the entire customer journey — from awareness to purchase.

**Real-world example:** An accounting firm tracks 4 query types: informational ('what is a tax return'), navigational ('[firm name] website'), transactional ('hire an accountant online'), and comparative ('best accounting firms in New York'). AI mentions the firm for informational and navigational queries but not for transactional and comparative ones, so query coverage is 2 out of 4 (50%). The firm needs to strengthen its commercial content and reviews.
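The arithmetic in the example can be sketched as follows, assuming a query type counts as covered if the brand is named in at least one AI response of that type. The firm name "XY Accounting", the sample responses, and the `query_coverage` helper are hypothetical. Brand detection here is a plain case-insensitive substring match, which would miss aliases and misspellings.

```python
QUERY_TYPES = ("informational", "navigational", "transactional", "comparative")


def query_coverage(results: list[tuple[str, str]], brand: str) -> float:
    """Percentage of QUERY_TYPES in which the brand is mentioned at least once.

    results: (query_type, ai_response) pairs. Mention detection is a plain
    case-insensitive substring match (a simplifying assumption).
    """
    covered = {qtype for qtype, response in results
               if qtype in QUERY_TYPES and brand.lower() in response.lower()}
    return 100 * len(covered) / len(QUERY_TYPES)


results = [
    ("informational", "A tax return reports your annual income; firms like XY Accounting can help."),
    ("navigational", "The XY Accounting website lists its services and contacts."),
    ("transactional", "You can hire an accountant online through several platforms."),
    ("comparative", "Well-reviewed New York firms include A, B and C."),
]
print(query_coverage(results, "XY Accounting"))  # 50.0
```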

#### Sources
- [AI Visibility Toolkit: Boost Brand Visibility in AI Search](https://www.semrush.com/kb/1493-ai-visibility-toolkit) — Semrush
- [How to optimize for AI search: 12 proven LLM visibility tactics](https://searchengineland.com/optimize-ai-search-llm-visibility-tactics-468106) — Search Engine Land
- [10 Ways to Use Ahrefs' Brand Radar to Grow AI Visibility](https://ahrefs.com/blog/brand-radar-use-cases/) — Ahrefs

---

### D1 — Main topic (Factual Accuracy)

**What is it:** Evaluates the correctness of facts that AI models state about your brand. It checks the accuracy of basic data: company name, founding year, headquarters, products, prices, contact information, number of employees, and other verifiable facts. Expressed as a percentage of correct facts out of the total number of verifiable AI claims about your brand.

**Why it matters:** AI hallucinations are a real problem: models can confidently state incorrect facts about your company. A wrong price, a nonexistent product, or incorrect contact details can deter a potential customer or damage your reputation. Regular monitoring of factual accuracy lets you identify wrong information and correct it by updating structured data and publicly available sources.

**Real-world example:** AI claims about your company: 'Company XY was founded in 2005, is headquartered in Chicago, and offers 3 main products.' In reality, it was founded in 2008, is based in New York, and offers 5 products. Factual Accuracy is therefore 0 out of 3 key facts (0%). The solution is to update information on Wikipedia, in business registries, on LinkedIn, and deploy correct Organization schema.
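The percentage described above can be sketched as a comparison of extracted AI claims against a ground-truth record. The sketch assumes facts arrive as simple key/value strings and are compared after whitespace and case normalization. The field names and the `factual_accuracy` helper are hypothetical. Exact matching is a simplification: "5 products" versus "5" would count as wrong, so a real check would need fuzzier matching or human review.

```python
from typing import Optional


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't count."""
    return " ".join(str(value).lower().split())


def factual_accuracy(claims: dict[str, str],
                     truth: dict[str, str]) -> tuple[Optional[float], list[str]]:
    """Return (% correct among verifiable claims, names of the wrong facts).

    A claim is verifiable only if the same fact also appears in `truth`; if
    nothing is verifiable, the percentage is None rather than 0 or 100.
    """
    verifiable = [fact for fact in claims if fact in truth]
    if not verifiable:
        return None, []
    wrong = [f for f in verifiable if normalize(claims[f]) != normalize(truth[f])]
    return 100 * (len(verifiable) - len(wrong)) / len(verifiable), wrong


ai_claims = {"founded": "2005", "headquarters": "Chicago", "main_products": "3"}
ground_truth = {"founded": "2008", "headquarters": "New York", "main_products": "5"}
print(factual_accuracy(ai_claims, ground_truth))
# (0.0, ['founded', 'headquarters', 'main_products'])
```

Returning the list of wrong facts alongside the score shows which sources to correct first.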

#### Sources
- [GPT-4 Technical Report](https://arxiv.org/abs/2303.08774) — OpenAI
- [The Claude Model Card](https://docs.anthropic.com/en/docs/about-claude/models) — Anthropic
- [Free AI Brand Visibility Tool: Check Your AI Search Presence](https://www.semrush.com/free-tools/ai-search-visibility-checker/) — Semrush

---

### D2 — Category (Competitive Position)

**What is it:** Determines your brand's position relative to competitors in AI responses. It measures your relative share of mentions (share of voice), your ranking in lists and recommendations, and your overall sentiment compared to competing brands. If AI lists a competitor first and you third when asked 'best X in [region]', your competitive position is weaker than theirs.

**Why it matters:** Absolute mention numbers have limited value without competitive context. If you have 50 mentions but your main competitor has 200, you're significantly behind. Benchmarking against competitors allows you to identify specific areas where they outperform you and focus optimization precisely where it has the greatest impact on acquiring customers.

**Real-world example:** Consider three online travel brands. When asked 'best travel agency in the US', AI responds: '1. Expedia, 2. Booking.com, 3. Kayak.' Kayak's competitive position is 3 out of 3. Suppose a campaign focused on building mentions, reviews, and content authority moved it to 2nd place within 6 months. The long-term goal would then be position 1.
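Two of the signals named above, share of voice and list position, are simple to compute. The sketch below reproduces the 50-versus-200 mention comparison and the travel ranking. It does not specify how the audit weights these signals against sentiment, so it keeps them separate. The helper names `share_of_voice` and `list_position` are hypothetical.

```python
from typing import Optional


def share_of_voice(mentions: dict[str, int], brand: str) -> float:
    """Brand's percentage of all mentions across the tracked brands."""
    total = sum(mentions.values())
    return 100 * mentions.get(brand, 0) / total if total else 0.0


def list_position(ranked: list[str], brand: str) -> Optional[int]:
    """1-based position of the brand in an AI-generated ranking, or None if absent."""
    for position, name in enumerate(ranked, start=1):
        if name.lower() == brand.lower():
            return position
    return None


print(share_of_voice({"You": 50, "Main competitor": 200}, "You"))  # 20.0
print(list_position(["Expedia", "Booking.com", "Kayak"], "Kayak"))  # 3
```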

#### Sources
- [The AI Visibility Index: Here's who's winning AI search](https://searchengineland.com/the-ai-visibility-index-heres-whos-winning-ai-search-463319) — Search Engine Land
- [Ahrefs Brand Radar: See ANY brand's AI visibility](https://ahrefs.com/brand-radar) — Ahrefs
- [Semrush AI Visibility — Win Every Search](https://www.semrush.com/ai-seo/overview/) — Semrush

---

### D3 — Keyword combination (Update Freshness)

**What is it:** Measures how current the information AI models state about your brand is. It checks whether AI knows your latest products, services, changes in offerings, new locations, or current pricing policy. Outdated information indicates that the AI model lacks access to fresh data or that your current information isn't sufficiently visible on the web.

**Why it matters:** AI models are trained on data collected up to a fixed date (the knowledge cutoff). If you recently changed your offering, opened a new location, or launched a new product, AI may not know about it. Regular content updates help AI models keep their information current. Update your website, your structured data, and authoritative sources such as Wikipedia, LinkedIn, and Google Business Profile.

**Real-world example:** A restaurant added vegetarian dishes to its menu in 2025 and opened a new location downtown. However, AI still references only the old menu and the original location. Update Freshness is low. Solution: update the website, add LocalBusiness schema for the new location, publish a press release, and update the Google Business Profile. AI models can be expected to reflect the changes within roughly 2 to 4 months. Tools that search the web live may pick them up sooner.
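A crude way to quantify freshness is the share of recent changes that an AI response mentions. The sketch assumes each change can be reduced to a key phrase. The phrases, sample responses, and the `update_freshness` helper are hypothetical, and substring matching is only a stand-in for real claim extraction.

```python
from typing import Optional


def update_freshness(response: str, recent_changes: list[str]) -> Optional[float]:
    """Percentage of recent changes (given as key phrases) that the AI response mentions."""
    if not recent_changes:
        return None  # nothing has changed recently, so freshness is undefined
    text = response.lower()
    mentioned = sum(phrase.lower() in text for phrase in recent_changes)
    return 100 * mentioned / len(recent_changes)


changes = ["vegetarian", "downtown"]  # hypothetical key phrases for the 2025 updates
before = "The restaurant serves its classic menu at its original location."
after = "The restaurant now offers vegetarian dishes and a second location downtown."
print(update_freshness(before, changes))  # 0.0
print(update_freshness(after, changes))   # 100.0
```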

#### Sources
- [Google Search — AI features and your website](https://developers.google.com/search/docs/appearance/ai-features) — Google
- [A 90-day SEO playbook for AI-driven search visibility](https://searchengineland.com/a-90-day-seo-playbook-for-ai-driven-search-visibility-466751) — Search Engine Land
- [LLM Visibility: What It Is and How to Optimize for It](https://ahrefs.com/blog/llm-visibility/) — Ahrefs

---