{ "cells": [ { "cell_type": "markdown", "id": "0a03dc77", "metadata": {}, "source": [ "# Aggregation\n", "\n", "Aggregations refer to any data transformation that produces scalar values from arrays. The previous examples used several of them, including `count` and `sum`. You may now be wondering what happens when you call `sum()` on a `GroupBy` object. Many common aggregations, such as those in the following table, have optimized implementations. However, you are not limited to this set of methods.\n", "\n", "Function name | Description\n", ":------------ | :-----------\n", "`any`, `all` | Returns `True` if any (one or more) or all non-NA values are \"truthy\"\n", "`count` | Number of non-NA values\n", "`cummin`, `cummax` | Cumulative minimum and maximum of non-NA values\n", "`cumsum` | Cumulative sum of non-NA values\n", "`cumprod` | Cumulative product of non-NA values\n", "`first`, `last` | First and last non-NA values\n", "`mean` | Mean of non-NA values\n", "`median` | Median of non-NA values\n", "`min`, `max` | Minimum and maximum of non-NA values\n", "`nth` | Retrieve the n-th row of each group (by position, not by size)\n", "`ohlc` | Compute the four *open-high-low-close* statistics for time-series-like data\n", "`prod` | Product of non-NA values\n", "`quantile` | Compute the sample quantile\n", "`rank` | Ordinal ranks of non-NA values, as when calling `Series.rank`\n", "`sum` | Sum of non-NA values\n", "`std`, `var` | Sample standard deviation and variance\n", "\n", "You can use your own aggregations and also call any method that is defined on the grouped object. For example, the `Series` method `nsmallest` selects the requested number of smallest values from the data.\n", "\n", "Although `nsmallest` has no optimized implementation for `GroupBy`, you can still use it through a slower, generic path: internally, `GroupBy` splits the `Series`, calls `nsmallest(n)` on each piece and then assembles the results into the result object:" ] }, { "cell_type": "code", "execution_count": 1, "id": "dcbeb480", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T16:38:57.219456Z", "iopub.status.busy": "2026-05-21T16:38:57.219078Z", "iopub.status.idle": "2026-05-21T16:38:57.437988Z", "shell.execute_reply": "2026-05-21T16:38:57.437657Z", "shell.execute_reply.started": "2026-05-21T16:38:57.219437Z" } }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "id": "0224cd39", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T16:38:57.438544Z", "iopub.status.busy": "2026-05-21T16:38:57.438426Z", "iopub.status.idle": "2026-05-21T16:38:57.445464Z", "shell.execute_reply": "2026-05-21T16:38:57.445156Z", "shell.execute_reply.started": "2026-05-21T16:38:57.438536Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Title2021-122022-012022-02
0Jupyter Tutorial30134.033295.019651.0
1Jupyter Tutorial6073.07716.06547.0
2PyViz Tutorial4873.03930.02573.0
3NoneNaNNaNNaN
4Python Basics427.0276.0525.0
5Python Basics95.0226.0157.0
\n", "
" ], "text/plain": [ " Title 2021-12 2022-01 2022-02\n", "0 Jupyter Tutorial 30134.0 33295.0 19651.0\n", "1 Jupyter Tutorial 6073.0 7716.0 6547.0\n", "2 PyViz Tutorial 4873.0 3930.0 2573.0\n", "3 None NaN NaN NaN\n", "4 Python Basics 427.0 276.0 525.0\n", "5 Python Basics 95.0 226.0 157.0" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(\n", " {\n", " \"Title\": [\n", " \"Jupyter Tutorial\",\n", " \"Jupyter Tutorial\",\n", " \"PyViz Tutorial\",\n", " None,\n", " \"Python Basics\",\n", " \"Python Basics\",\n", " ],\n", " \"2021-12\": [30134, 6073, 4873, None, 427, 95],\n", " \"2022-01\": [33295, 7716, 3930, None, 276, 226],\n", " \"2022-02\": [19651, 6547, 2573, None, 525, 157],\n", " }\n", ")\n", "\n", "df" ] }, { "cell_type": "code", "execution_count": 3, "id": "e87f7e40", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T16:38:57.445894Z", "iopub.status.busy": "2026-05-21T16:38:57.445818Z", "iopub.status.idle": "2026-05-21T16:38:57.447558Z", "shell.execute_reply": "2026-05-21T16:38:57.447325Z", "shell.execute_reply.started": "2026-05-21T16:38:57.445886Z" } }, "outputs": [], "source": [ "grouped = df.groupby(\"Title\")" ] }, { "cell_type": "code", "execution_count": 4, "id": "de944585", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T16:38:57.447916Z", "iopub.status.busy": "2026-05-21T16:38:57.447849Z", "iopub.status.idle": "2026-05-21T16:38:57.452008Z", "shell.execute_reply": "2026-05-21T16:38:57.451780Z", "shell.execute_reply.started": "2026-05-21T16:38:57.447910Z" } }, "outputs": [ { "data": { "text/plain": [ "Title \n", "Jupyter Tutorial 1 7716.0\n", "PyViz Tutorial 2 3930.0\n", "Python Basics 5 226.0\n", "Name: 2022-01, dtype: float64" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grouped[\"2022-01\"].nsmallest(1)" ] }, { "cell_type": "markdown", "id": "7dced4b7", "metadata": {}, "source": [ "To use your own aggregation function, pass any function that aggregates an array to the `aggregate` or `agg` method:" ] }, { "cell_type": "code", "execution_count": 5, "id": "d99f985e", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T16:38:57.452390Z", "iopub.status.busy": "2026-05-21T16:38:57.452301Z", "iopub.status.idle": "2026-05-21T16:38:57.457146Z", "shell.execute_reply": "2026-05-21T16:38:57.456909Z", "shell.execute_reply.started": "2026-05-21T16:38:57.452382Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
2021-122022-012022-02
Title
Jupyter Tutorial24061.025579.013104.0
PyViz Tutorial0.00.00.0
Python Basics332.050.0368.0
\n", "
" ], "text/plain": [ " 2021-12 2022-01 2022-02\n", "Title \n", "Jupyter Tutorial 24061.0 25579.0 13104.0\n", "PyViz Tutorial 0.0 0.0 0.0\n", "Python Basics 332.0 50.0 368.0" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def minmaxrange(arr):\n", " return arr.max() - arr.min()\n", "\n", "\n", "grouped.agg(minmaxrange)" ] }, { "cell_type": "markdown", "id": "66e7a918", "metadata": {}, "source": [ "You will notice that some methods, such as `describe`, also work, even though strictly speaking they are not aggregations:" ] }, { "cell_type": "code", "execution_count": 6, "id": "b4a40696", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T16:38:57.458679Z", "iopub.status.busy": "2026-05-21T16:38:57.458562Z", "iopub.status.idle": "2026-05-21T16:38:57.471915Z", "shell.execute_reply": "2026-05-21T16:38:57.471644Z", "shell.execute_reply.started": "2026-05-21T16:38:57.458666Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
2021-122022-012022-02
countmeanstdmin25%50%75%maxcountmean...75%maxcountmeanstdmin25%50%75%max
Title
Jupyter Tutorial2.018103.517013.6962626073.012088.2518103.524118.7530134.02.020505.5...26900.2533295.02.013099.09265.9272616547.09823.013099.016375.019651.0
PyViz Tutorial1.04873.0NaN4873.04873.004873.04873.004873.01.03930.0...3930.003930.01.02573.0NaN2573.02573.02573.02573.02573.0
Python Basics2.0261.0234.75945195.0178.00261.0344.00427.02.0251.0...263.50276.02.0341.0260.215295157.0249.0341.0433.0525.0
\n", "

3 rows × 24 columns

\n", "
" ], "text/plain": [ " 2021-12 \\\n", " count mean std min 25% 50% \n", "Title \n", "Jupyter Tutorial 2.0 18103.5 17013.696262 6073.0 12088.25 18103.5 \n", "PyViz Tutorial 1.0 4873.0 NaN 4873.0 4873.00 4873.0 \n", "Python Basics 2.0 261.0 234.759451 95.0 178.00 261.0 \n", "\n", " 2022-01 ... \\\n", " 75% max count mean ... 75% max \n", "Title ... \n", "Jupyter Tutorial 24118.75 30134.0 2.0 20505.5 ... 26900.25 33295.0 \n", "PyViz Tutorial 4873.00 4873.0 1.0 3930.0 ... 3930.00 3930.0 \n", "Python Basics 344.00 427.0 2.0 251.0 ... 263.50 276.0 \n", "\n", " 2022-02 \\\n", " count mean std min 25% 50% \n", "Title \n", "Jupyter Tutorial 2.0 13099.0 9265.927261 6547.0 9823.0 13099.0 \n", "PyViz Tutorial 1.0 2573.0 NaN 2573.0 2573.0 2573.0 \n", "Python Basics 2.0 341.0 260.215295 157.0 249.0 341.0 \n", "\n", " \n", " 75% max \n", "Title \n", "Jupyter Tutorial 16375.0 19651.0 \n", "PyViz Tutorial 2573.0 2573.0 \n", "Python Basics 433.0 525.0 \n", "\n", "[3 rows x 24 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grouped.describe()" ] }, { "cell_type": "markdown", "id": "9adf3b68", "metadata": {}, "source": [ "
\n", "\n", "**Note:**\n", "\n", "Custom aggregation functions are generally much slower than the optimized functions in the table above. This is because building the intermediate per-group data involves some extra overhead (function calls, data rearrangement).\n", "
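To illustrate this note, here is a minimal sketch. It assumes the same sample data as above, rebuilt so the block runs on its own. The custom `minmaxrange` aggregation can be expressed with two optimized built-in reductions, `max()` and `min()`. Their difference is then computed in one vectorized step, instead of calling a Python function for every group and column:

```python
import pandas as pd

# Same sample data as above, rebuilt so this block is self-contained
df = pd.DataFrame(
    {
        'Title': ['Jupyter Tutorial', 'Jupyter Tutorial', 'PyViz Tutorial',
                  None, 'Python Basics', 'Python Basics'],
        '2021-12': [30134, 6073, 4873, None, 427, 95],
        '2022-01': [33295, 7716, 3930, None, 276, 226],
        '2022-02': [19651, 6547, 2573, None, 525, 157],
    }
)
grouped = df.groupby('Title')


def minmaxrange(arr):
    # Custom aggregation: called once per group and column
    return arr.max() - arr.min()


# Generic path: a Python-level function is applied to each group
custom = grouped.agg(minmaxrange)

# Optimized path: two built-in reductions combined with vectorized subtraction
builtin = grouped.max() - grouped.min()

# Both approaches produce the same table
pd.testing.assert_frame_equal(custom, builtin)
print(builtin)
```

Both tables are identical. The second variant simply avoids the per-group Python function calls described in the note. How much time this saves depends on the number of groups and the size of your data.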
" ] }, { "cell_type": "markdown", "id": "668065b9", "metadata": {}, "source": [ "## Column-wise and multiple function application\n", "\n", "As we have already seen, aggregating a `Series` or all columns of a `DataFrame` is a matter of using `aggregate` (or `agg`) with the desired function, or of calling a method such as `mean` or `std`. However, you may want to aggregate using a different function depending on the column, or using several functions at once." ] }, { "cell_type": "code", "execution_count": 7, "id": "c5bfa781", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T16:38:57.472442Z", "iopub.status.busy": "2026-05-21T16:38:57.472346Z", "iopub.status.idle": "2026-05-21T16:38:57.475874Z", "shell.execute_reply": "2026-05-21T16:38:57.475419Z", "shell.execute_reply.started": "2026-05-21T16:38:57.472430Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
2021-122022-012022-02
Title
Jupyter Tutorial18103.520505.513099.0
PyViz Tutorial4873.03930.02573.0
Python Basics261.0251.0341.0
\n", "
" ], "text/plain": [ " 2021-12 2022-01 2022-02\n", "Title \n", "Jupyter Tutorial 18103.5 20505.5 13099.0\n", "PyViz Tutorial 4873.0 3930.0 2573.0\n", "Python Basics 261.0 251.0 341.0" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grouped.agg(\"mean\")" ] }, { "cell_type": "markdown", "id": "bd0c9405", "metadata": {}, "source": [ "If you pass a list of functions or function names instead, you get back a `DataFrame` whose column names are taken from the functions:" ] }, { "cell_type": "code", "execution_count": 8, "id": "9113bdf2", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T16:38:57.476512Z", "iopub.status.busy": "2026-05-21T16:38:57.476403Z", "iopub.status.idle": "2026-05-21T16:38:57.482549Z", "shell.execute_reply": "2026-05-21T16:38:57.482192Z", "shell.execute_reply.started": "2026-05-21T16:38:57.476504Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
2021-122022-012022-02
meanstdminmaxrangemeanstdminmaxrangemeanstdminmaxrange
Title
Jupyter Tutorial18103.517013.69626224061.020505.518087.08435625579.013099.09265.92726113104.0
PyViz Tutorial4873.0NaN0.03930.0NaN0.02573.0NaN0.0
Python Basics261.0234.759451332.0251.035.35533950.0341.0260.215295368.0
\n", "
" ], "text/plain": [ " 2021-12 2022-01 \\\n", " mean std minmaxrange mean std \n", "Title \n", "Jupyter Tutorial 18103.5 17013.696262 24061.0 20505.5 18087.084356 \n", "PyViz Tutorial 4873.0 NaN 0.0 3930.0 NaN \n", "Python Basics 261.0 234.759451 332.0 251.0 35.355339 \n", "\n", " 2022-02 \n", " minmaxrange mean std minmaxrange \n", "Title \n", "Jupyter Tutorial 25579.0 13099.0 9265.927261 13104.0 \n", "PyViz Tutorial 0.0 2573.0 NaN 0.0 \n", "Python Basics 50.0 341.0 260.215295 368.0 " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grouped.agg([\"mean\", \"std\", minmaxrange])" ] }, { "cell_type": "markdown", "id": "0e6a9e80", "metadata": {}, "source": [ "Here we passed `agg` a list of aggregation functions, each of which is evaluated independently on the groups." ] }, { "cell_type": "markdown", "id": "b9a475d9", "metadata": {}, "source": [ "You don't have to accept the names that `GroupBy` gives the columns; in particular, lambda functions have the name `<lambda>`, which makes them hard to identify. If you pass a list of `(name, function)` tuples, the first element of each tuple is used as the column name in the `DataFrame`:" ] }, { "cell_type": "code", "execution_count": 9, "id": "d27d19a1", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T16:38:57.483056Z", "iopub.status.busy": "2026-05-21T16:38:57.482972Z", "iopub.status.idle": "2026-05-21T16:38:57.488563Z", "shell.execute_reply": "2026-05-21T16:38:57.488235Z", "shell.execute_reply.started": "2026-05-21T16:38:57.483049Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
2021-122022-012022-02
MeanStandard deviationRangeMeanStandard deviationRangeMeanStandard deviationRange
Title
Jupyter Tutorial18103.517013.69626224061.020505.518087.08435625579.013099.09265.92726113104.0
PyViz Tutorial4873.0NaN0.03930.0NaN0.02573.0NaN0.0
Python Basics261.0234.759451332.0251.035.35533950.0341.0260.215295368.0
\n", "
" ], "text/plain": [ " 2021-12 2022-01 \\\n", " Mean Standard deviation Range Mean \n", "Title \n", "Jupyter Tutorial 18103.5 17013.696262 24061.0 20505.5 \n", "PyViz Tutorial 4873.0 NaN 0.0 3930.0 \n", "Python Basics 261.0 234.759451 332.0 251.0 \n", "\n", " 2022-02 \\\n", " Standard deviation Range Mean Standard deviation \n", "Title \n", "Jupyter Tutorial 18087.084356 25579.0 13099.0 9265.927261 \n", "PyViz Tutorial NaN 0.0 2573.0 NaN \n", "Python Basics 35.355339 50.0 341.0 260.215295 \n", "\n", " \n", " Range \n", "Title \n", "Jupyter Tutorial 13104.0 \n", "PyViz Tutorial 0.0 \n", "Python Basics 368.0 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grouped.agg(\n", " [(\"Mean\", \"mean\"), (\"Standard deviation\", \"std\"), (\"Range\", minmaxrange)],\n", ")" ] }, { "cell_type": "markdown", "id": "eb306ec2", "metadata": {}, "source": [ "With a `DataFrame`, you can either specify a list of functions to apply to all columns, or different functions per column. Suppose we want to compute the same three statistics, `count`, `mean` and `max`, for every column:" ] }, { "cell_type": "code", "execution_count": 10, "id": "55d7b2b1", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T16:38:57.489226Z", "iopub.status.busy": "2026-05-21T16:38:57.489082Z", "iopub.status.idle": "2026-05-21T16:38:57.496172Z", "shell.execute_reply": "2026-05-21T16:38:57.495949Z", "shell.execute_reply.started": "2026-05-21T16:38:57.489216Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
2021-122022-012022-02
countmeanmaxcountmeanmaxcountmeanmax
Title
Jupyter Tutorial218103.530134.0220505.533295.0213099.019651.0
PyViz Tutorial14873.04873.013930.03930.012573.02573.0
Python Basics2261.0427.02251.0276.02341.0525.0
\n", "
" ], "text/plain": [ " 2021-12 2022-01 2022-02 \\\n", " count mean max count mean max count \n", "Title \n", "Jupyter Tutorial 2 18103.5 30134.0 2 20505.5 33295.0 2 \n", "PyViz Tutorial 1 4873.0 4873.0 1 3930.0 3930.0 1 \n", "Python Basics 2 261.0 427.0 2 251.0 276.0 2 \n", "\n", " \n", " mean max \n", "Title \n", "Jupyter Tutorial 13099.0 19651.0 \n", "PyViz Tutorial 2573.0 2573.0 \n", "Python Basics 341.0 525.0 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stats = [\"count\", \"mean\", \"max\"]\n", "evaluations = grouped.agg(stats)\n", "\n", "evaluations" ] }, { "cell_type": "markdown", "id": "454dbc00", "metadata": {}, "source": [ "As you can see, the resulting `DataFrame` has hierarchical columns, just as you would get if you aggregated each column separately and glued the results together with [pandas.concat](https://pandas.pydata.org/docs/reference/api/pandas.concat.html), passing the column names as the `keys` argument. Selecting a top-level label such as `2021-12` returns the statistics for that column alone:" ] }, { "cell_type": "code", "execution_count": 11, "id": "dd307b43", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T16:38:57.496763Z", "iopub.status.busy": "2026-05-21T16:38:57.496611Z", "iopub.status.idle": "2026-05-21T16:38:57.500551Z", "shell.execute_reply": "2026-05-21T16:38:57.500326Z", "shell.execute_reply.started": "2026-05-21T16:38:57.496755Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countmeanmax
Title
Jupyter Tutorial218103.530134.0
PyViz Tutorial14873.04873.0
Python Basics2261.0427.0
\n", "
" ], "text/plain": [ " count mean max\n", "Title \n", "Jupyter Tutorial 2 18103.5 30134.0\n", "PyViz Tutorial 1 4873.0 4873.0\n", "Python Basics 2 261.0 427.0" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "evaluations[\"2021-12\"]" ] }, { "cell_type": "markdown", "id": "245a9179", "metadata": {}, "source": [ "As before, you can pass a list of tuples with custom names:" ] }, { "cell_type": "code", "execution_count": 12, "id": "e80c106a", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T16:38:57.501147Z", "iopub.status.busy": "2026-05-21T16:38:57.501043Z", "iopub.status.idle": "2026-05-21T16:38:57.506419Z", "shell.execute_reply": "2026-05-21T16:38:57.506015Z", "shell.execute_reply.started": "2026-05-21T16:38:57.501140Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
2021-122022-01
MeanVarianceMeanVariance
Title
Jupyter Tutorial18103.5289465860.520505.5327142620.5
PyViz Tutorial4873.0NaN3930.0NaN
Python Basics261.055112.0251.01250.0
\n", "
" ], "text/plain": [ " 2021-12 2022-01 \n", " Mean Variance Mean Variance\n", "Title \n", "Jupyter Tutorial 18103.5 289465860.5 20505.5 327142620.5\n", "PyViz Tutorial 4873.0 NaN 3930.0 NaN\n", "Python Basics 261.0 55112.0 251.0 1250.0" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tuples = [(\"Mean\", \"mean\"), (\"Variance\", \"var\")]\n", "\n", "grouped[[\"2021-12\", \"2022-01\"]].agg(tuples)" ] }, { "cell_type": "markdown", "id": "4f9720af", "metadata": {}, "source": [ "Now suppose you want to apply potentially different functions to one or more of the columns. To do this, pass `agg` a `dict` that maps column names to any of the function specifications used so far (a function name, a function, or a list of them):" ] }, { "cell_type": "code", "execution_count": 13, "id": "d93edadc", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T16:38:57.507201Z", "iopub.status.busy": "2026-05-21T16:38:57.506905Z", "iopub.status.idle": "2026-05-21T16:38:57.510597Z", "shell.execute_reply": "2026-05-21T16:38:57.510329Z", "shell.execute_reply.started": "2026-05-21T16:38:57.507194Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
2021-122022-01
Title
Jupyter Tutorial18103.5327142620.5
PyViz Tutorial4873.0NaN
Python Basics261.01250.0
\n", "
" ], "text/plain": [ " 2021-12 2022-01\n", "Title \n", "Jupyter Tutorial 18103.5 327142620.5\n", "PyViz Tutorial 4873.0 NaN\n", "Python Basics 261.0 1250.0" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grouped.agg({\"2021-12\": \"mean\", \"2022-01\": \"var\"})" ] }, { "cell_type": "code", "execution_count": 14, "id": "01ce2021", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T16:38:57.510963Z", "iopub.status.busy": "2026-05-21T16:38:57.510892Z", "iopub.status.idle": "2026-05-21T16:38:57.515471Z", "shell.execute_reply": "2026-05-21T16:38:57.515213Z", "shell.execute_reply.started": "2026-05-21T16:38:57.510957Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
2021-122022-01
minmaxmeanstdsum
Title
Jupyter Tutorial6073.030134.018103.517013.69626241011.0
PyViz Tutorial4873.04873.04873.0NaN3930.0
Python Basics95.0427.0261.0234.759451502.0
\n", "
" ], "text/plain": [ " 2021-12 2022-01\n", " min max mean std sum\n", "Title \n", "Jupyter Tutorial 6073.0 30134.0 18103.5 17013.696262 41011.0\n", "PyViz Tutorial 4873.0 4873.0 4873.0 NaN 3930.0\n", "Python Basics 95.0 427.0 261.0 234.759451 502.0" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grouped.agg({\"2021-12\": [\"min\", \"max\", \"mean\", \"std\"], \"2022-01\": \"sum\"})" ] }, { "cell_type": "markdown", "id": "64c324f9", "metadata": {}, "source": [ "## Returning aggregated data without row indexes\n", "\n", "In all the examples so far, the aggregated data is returned with an index made up of the unique group keys. Since this is not always desirable, you can disable this behavior in most cases by passing `as_index=False` to `groupby`:" ] }, { "cell_type": "code", "execution_count": 15, "id": "b8e1f80b", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T16:38:57.515993Z", "iopub.status.busy": "2026-05-21T16:38:57.515883Z", "iopub.status.idle": "2026-05-21T16:38:57.519546Z", "shell.execute_reply": "2026-05-21T16:38:57.519279Z", "shell.execute_reply.started": "2026-05-21T16:38:57.515982Z" } }, "outputs": [ { "data": { "text/plain": [ "              Title  2021-12  2022-01  2022-02\n", "0  Jupyter Tutorial  18103.5  20505.5  13099.0\n", "1    PyViz Tutorial   4873.0   3930.0   2573.0\n", "2     Python Basics    261.0    251.0    341.0" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(\"Title\", as_index=False).mean()" ] }, { "cell_type": "markdown", "id": "0d515ab5", "metadata": {}, "source": [ "Passing `as_index=False` avoids some unnecessary computations. Of course, you can always get the result in this form by calling `reset_index` on the indexed result. 
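
The equivalence between `as_index=False` and `reset_index` can be checked with a short sketch. It again assumes the sample data from the beginning of this section, rebuilt here so the block is self-contained:

```python
import pandas as pd

# Same sample data as above, rebuilt so this block runs on its own
df = pd.DataFrame(
    {
        'Title': ['Jupyter Tutorial', 'Jupyter Tutorial', 'PyViz Tutorial',
                  None, 'Python Basics', 'Python Basics'],
        '2021-12': [30134, 6073, 4873, None, 427, 95],
        '2022-01': [33295, 7716, 3930, None, 276, 226],
        '2022-02': [19651, 6547, 2573, None, 525, 157],
    }
)

# Group keys end up in a regular 'Title' column with a default integer index
flat = df.groupby('Title', as_index=False).mean()

# Same result: aggregate with the keys as index, then turn the index into a column
via_reset = df.groupby('Title').mean().reset_index()

pd.testing.assert_frame_equal(flat, via_reset)
print(flat)
```

In both cases the row with a missing `Title` is dropped, because `groupby` excludes NA keys by default.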
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.13 Kernel", "language": "python", "name": "python313" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.0" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }