METHODOLOGY
Independent. Reproducible. Swiss.
Every KI-Assurance evaluation follows the same rigorous methodology – whether we evaluate one model or thirty. No opinions. No black boxes. Only reproducible, evidence-based results.
The Engine
Inspect AI
The evaluation infrastructure of the UK AI Safety Institute, used by leading AI labs including xAI, with contributions from DeepMind and Anthropic. Open source (MIT license), with over 100 evaluation tasks and a proven architecture for systematic AI testing.
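For illustration, a minimal Inspect AI task looks like the following. This is a generic sketch against the public Inspect AI API, not one of our actual task definitions; the task name, sample, and model identifier are placeholders.

```python
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import match
from inspect_ai.solver import generate, system_message

@task
def placeholder_legal_qa() -> Task:
    # A single hard-coded sample stands in for a real dataset.
    return Task(
        dataset=[
            Sample(
                input="Which Swiss federal act governs data protection?",
                target="FADP",
            )
        ],
        solver=[system_message("Answer with the act's abbreviation only."), generate()],
        scorer=match(),
    )

# Run against any supported model provider, e.g.:
# from inspect_ai import eval
# eval(placeholder_legal_qa(), model="openai/gpt-4o-mini")
```

Because tasks, solvers, and scorers are declared in code, the same definition can be re-run unchanged against any model under evaluation.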
Compl-AI
The EU AI Act compliance benchmark suite from ETH Zurich, INSAIT, and LatticeFlow AI. Maps 27+ established benchmarks to the 6 Trustworthy AI principles (EU HLEG). Published methodology (arXiv: 2410.07959).
Swiss-Bench
Our proprietary evaluation scenarios for Swiss languages (German, French, Italian), legal terminology, financial domain language, and domain-specific failure modes in the Swiss regulatory environment.
KIAS Score: 6 Dimensions
Accuracy & Performance
Does the model perform its task correctly?
Robustness & Reliability
Does it behave consistently under stress?
Fairness & Non-Discrimination
Does it treat all groups equitably?
Data Protection
Does it protect personal data?
Transparency & Explainability
Can its decisions be traced and understood?
Swiss Regulatory Alignment
Is it suitable for the Swiss regulatory environment?
Each dimension is scored 0–100, with confidence intervals and sample sizes.
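A dimension score with a confidence interval can be derived directly from per-sample pass/fail results. A minimal sketch (not our production scorer) using the Wilson score interval at 95% confidence:

```python
import math

def dimension_score(results: list[bool], z: float = 1.96) -> tuple[float, float, float]:
    """Score a dimension 0-100 from binary pass/fail results,
    returning (score, ci_low, ci_high) via the Wilson interval.
    Note: the Wilson centre differs slightly from the raw pass rate."""
    n = len(results)
    p = sum(results) / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (
        100 * centre,
        100 * max(0.0, centre - margin),
        100 * min(1.0, centre + margin),
    )

# Example: 87 of 100 samples pass
score, lo, hi = dimension_score([True] * 87 + [False] * 13)
```

Reporting the interval alongside the point score makes clear how much of a gap between two models is signal rather than sampling noise.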
The Process
Scoping
We jointly define evaluation objectives, models, and benchmarks (1 hour).
Configuration
We configure the evaluation pipeline for your specific models and data (2–4 hours).
Evaluation
The engine runs automated benchmarks. No manual intervention. Fully reproducible.
Analysis
We interpret the results, identify failure modes, and map gaps to regulatory requirements.
Report
You receive a standardized evaluation report with KIAS scores, gap analysis, and recommendations.
Handover
You receive the complete evaluation harness. You can rerun every test yourself.
Reproducibility Guarantee
Every evaluation report includes:
- Complete evaluation configuration (Inspect AI task definitions, scorer logic, datasets)
- Model version identifiers and API parameters used
- Seed values and sampling parameters
- Cryptographic timestamp of raw results
- The complete evaluation harness – rerunnable at any time
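The cryptographic timestamp above amounts to hashing the raw result file and recording when the hash was taken. A minimal sketch (the field names and example values are illustrative, not our report schema):

```python
import hashlib
import json
from datetime import datetime, timezone

def result_fingerprint(raw_results: bytes, config: dict) -> dict:
    """Produce a reproducibility record: SHA-256 of the raw results
    plus the exact run configuration and a UTC timestamp."""
    return {
        "results_sha256": hashlib.sha256(raw_results).hexdigest(),
        "config": config,  # model version, seeds, sampling parameters
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
    }

record = result_fingerprint(
    b'{"task": "swiss-bench", "passed": 87, "total": 100}',
    {"model": "example-model-v1", "seed": 42, "temperature": 0.0},
)
print(json.dumps(record, indent=2))
```

Anyone holding the raw results can recompute the hash and confirm nothing was altered after the run; a third-party timestamping authority (RFC 3161) can strengthen this further if required.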
We do not use proprietary, non-reproducible methods.
Independence
We have no commercial relationships with AI model providers. No commissions. No vendor partnerships. No pay-for-score. Every model is evaluated with the same methodology.
Data Sovereignty
Every engagement fits one of four deployment options:
- Remote: you provide an API key; we run the evaluation.
- On-premise: our dockerized engine runs on your infrastructure.
- Air-gapped: we bring dedicated hardware to your site, with a complete air gap.
- Anonymized: you anonymize your data first using our script.
No data leaves Switzerland. No data is retained beyond the engagement.
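Pre-evaluation anonymization can be as simple as replacing direct identifiers with stable pseudonyms before any data reaches the engine. An illustrative sketch (the patterns and salt handling here are examples, not our actual script):

```python
import hashlib
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
AHV = re.compile(r"756\.\d{4}\.\d{4}\.\d{2}")  # Swiss social insurance number

def pseudonym(value: str, salt: str) -> str:
    """Stable pseudonym: the same input always maps to the same token."""
    return "ID_" + hashlib.sha256((salt + value).encode()).hexdigest()[:10]

def anonymize(text: str, salt: str) -> str:
    """Replace emails and AHV numbers with salted pseudonyms."""
    text = EMAIL.sub(lambda m: pseudonym(m.group(), salt), text)
    text = AHV.sub(lambda m: pseudonym(m.group(), salt), text)
    return text

sample = "Contact anna.muster@example.ch, AHV 756.1234.5678.97."
print(anonymize(sample, salt="engagement-2024"))
```

Stable pseudonyms preserve cross-record consistency (the same person maps to the same token), which keeps evaluation results meaningful while the identifiers themselves never leave your environment.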
Ready for an independent evaluation?
Contact us for a no-obligation initial consultation. In a 30-minute call we will clarify your evaluation needs.
Get in touch →