{ "cells": [ { "cell_type": "markdown", "id": "3714bcce", "metadata": {}, "source": [ "# Converting `dtype`\n", "\n", "Sometimes the pandas data types do not fit the data well. This can be due, for example, to serialisation formats that do not carry any type information. In other cases you should change the type deliberately to gain something: either more ways to manipulate the data or a lower memory footprint. In the following examples we will apply various conversions to a `Series`:" ] }, { "cell_type": "code", "execution_count": 1, "id": "e6bc0dcc", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:25:58.219373Z", "iopub.status.busy": "2026-05-21T14:25:58.219159Z", "iopub.status.idle": "2026-05-21T14:25:58.446137Z", "shell.execute_reply": "2026-05-21T14:25:58.445838Z", "shell.execute_reply.started": "2026-05-21T14:25:58.219349Z" } }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "id": "ebae66af", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:25:58.446624Z", "iopub.status.busy": "2026-05-21T14:25:58.446501Z", "iopub.status.idle": "2026-05-21T14:25:58.449016Z", "shell.execute_reply": "2026-05-21T14:25:58.448723Z", "shell.execute_reply.started": "2026-05-21T14:25:58.446615Z" } }, "outputs": [], "source": [ "rng = np.random.default_rng()\n", "s = pd.Series(rng.normal(size=7))" ] }, { "cell_type": "code", "execution_count": 3, "id": "93f01178", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:25:58.449534Z", "iopub.status.busy": "2026-05-21T14:25:58.449455Z", "iopub.status.idle": "2026-05-21T14:25:58.453401Z", "shell.execute_reply": "2026-05-21T14:25:58.452821Z", "shell.execute_reply.started": "2026-05-21T14:25:58.449527Z" }, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "0 -1.663472\n", "1 0.205134\n", "2 0.333356\n", "3 -0.639533\n", "4 1.519715\n", "5 0.331444\n", "6 
1.551766\n", "dtype: float64" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] }, { "cell_type": "markdown", "id": "1f0016ea", "metadata": {}, "source": [ "## Automatic conversion\n", "\n", "[pandas.Series.convert_dtypes](https://pandas.pydata.org/docs/reference/api/pandas.Series.convert_dtypes.html) tries to convert a `Series` to a type that supports `NA`. For our `Series`, the type changes from `float64` to `Float64`:" ] }, { "cell_type": "code", "execution_count": 4, "id": "70b57e04", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:25:58.453993Z", "iopub.status.busy": "2026-05-21T14:25:58.453888Z", "iopub.status.idle": "2026-05-21T14:25:58.456900Z", "shell.execute_reply": "2026-05-21T14:25:58.456682Z", "shell.execute_reply.started": "2026-05-21T14:25:58.453983Z" } }, "outputs": [ { "data": { "text/plain": [ "0 -1.663472\n", "1 0.205134\n", "2 0.333356\n", "3 -0.639533\n", "4 1.519715\n", "5 0.331444\n", "6 1.551766\n", "dtype: Float64" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.convert_dtypes()" ] }, { "cell_type": "markdown", "id": "b9463886", "metadata": {}, "source": [ "Unfortunately, `convert_dtypes` gives me hardly any control over which data type is chosen. I therefore prefer [pandas.Series.astype](https://pandas.pydata.org/docs/reference/api/pandas.Series.astype.html):" ] },
{ "cell_type": "code", "execution_count": 5, "id": "87b1c7e4", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:25:58.457277Z", "iopub.status.busy": "2026-05-21T14:25:58.457209Z", "iopub.status.idle": "2026-05-21T14:25:58.460227Z", "shell.execute_reply": "2026-05-21T14:25:58.460004Z", "shell.execute_reply.started": "2026-05-21T14:25:58.457269Z" } }, "outputs": [ { "data": { "text/plain": [ "0 -1.663472\n", "1 0.205134\n", "2 0.333356\n", "3 -0.639533\n", "4 1.519715\n", "5 0.331444\n", "6 1.551766\n", "dtype: float32" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.astype(\"float32\")" ] }, { "cell_type": "markdown", "id": "e0b9f452-619c-4667-98f6-7fc4d2639722", "metadata": {}, "source": [ "If the `Series` contains values that cannot be converted to the target type, however, an error is raised:" ] }, { "cell_type": "code", "execution_count": 6, "id": "b95d2906-b5f8-4e8c-854b-229d8afd1b61", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:25:58.462067Z", "iopub.status.busy": "2026-05-21T14:25:58.461980Z", "iopub.status.idle": "2026-05-21T14:25:58.464763Z", "shell.execute_reply": "2026-05-21T14:25:58.464573Z", "shell.execute_reply.started": "2026-05-21T14:25:58.462060Z" } }, "outputs": [ { "data": { "text/plain": [ "0 90.0\n", "1 NaN\n", "2 1.0\n", "dtype: float64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rng = np.random.default_rng()\n", "n = pd.Series([rng.integers(127), np.nan, rng.integers(127)])\n", "n" ] }, { "cell_type": "code", "execution_count": 7, "id": "de312b5b-c7fc-4540-be84-8078be862b3d", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:25:58.465219Z", "iopub.status.busy": "2026-05-21T14:25:58.465092Z", "iopub.status.idle": "2026-05-21T14:25:58.770495Z", "shell.execute_reply": 
"2026-05-21T14:25:58.769548Z", "shell.execute_reply.started": "2026-05-21T14:25:58.465212Z" } }, "outputs": [ { "ename": "IntCastingNaNError", "evalue": "Cannot convert non-finite values (NA or inf) to integer", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mIntCastingNaNError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[7], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mn\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mastype\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mint8\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/cusy/trn/jupyter-tutorial/uvenvs/py313/.venv/lib/python3.13/site-packages/pandas/core/generic.py:6643\u001b[0m, in \u001b[0;36mNDFrame.astype\u001b[0;34m(self, dtype, copy, errors)\u001b[0m\n\u001b[1;32m 6637\u001b[0m results \u001b[38;5;241m=\u001b[39m [\n\u001b[1;32m 6638\u001b[0m ser\u001b[38;5;241m.\u001b[39mastype(dtype, copy\u001b[38;5;241m=\u001b[39mcopy, errors\u001b[38;5;241m=\u001b[39merrors) \u001b[38;5;28;01mfor\u001b[39;00m _, ser \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mitems()\n\u001b[1;32m 6639\u001b[0m ]\n\u001b[1;32m 6641\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 6642\u001b[0m \u001b[38;5;66;03m# else, only a single dtype is given\u001b[39;00m\n\u001b[0;32m-> 6643\u001b[0m new_data \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_mgr\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mastype\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdtype\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mdtype\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcopy\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcopy\u001b[49m\u001b[43m,\u001b[49m\u001b[43m 
\u001b[49m\u001b[43merrors\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43merrors\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 6644\u001b[0m res \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_constructor_from_mgr(new_data, axes\u001b[38;5;241m=\u001b[39mnew_data\u001b[38;5;241m.\u001b[39maxes)\n\u001b[1;32m 6645\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m res\u001b[38;5;241m.\u001b[39m__finalize__(\u001b[38;5;28mself\u001b[39m, method\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mastype\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n", "File \u001b[0;32m~/cusy/trn/jupyter-tutorial/uvenvs/py313/.venv/lib/python3.13/site-packages/pandas/core/internals/managers.py:430\u001b[0m, in \u001b[0;36mBaseBlockManager.astype\u001b[0;34m(self, dtype, copy, errors)\u001b[0m\n\u001b[1;32m 427\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m using_copy_on_write():\n\u001b[1;32m 428\u001b[0m copy \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mFalse\u001b[39;00m\n\u001b[0;32m--> 430\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mapply\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 431\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mastype\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 432\u001b[0m \u001b[43m \u001b[49m\u001b[43mdtype\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mdtype\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 433\u001b[0m \u001b[43m \u001b[49m\u001b[43mcopy\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcopy\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 434\u001b[0m \u001b[43m \u001b[49m\u001b[43merrors\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43merrors\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 435\u001b[0m \u001b[43m 
\u001b[49m\u001b[43musing_cow\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43musing_copy_on_write\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 436\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/cusy/trn/jupyter-tutorial/uvenvs/py313/.venv/lib/python3.13/site-packages/pandas/core/internals/managers.py:363\u001b[0m, in \u001b[0;36mBaseBlockManager.apply\u001b[0;34m(self, f, align_keys, **kwargs)\u001b[0m\n\u001b[1;32m 361\u001b[0m applied \u001b[38;5;241m=\u001b[39m b\u001b[38;5;241m.\u001b[39mapply(f, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs)\n\u001b[1;32m 362\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m--> 363\u001b[0m applied \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mgetattr\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mb\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mf\u001b[49m\u001b[43m)\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 364\u001b[0m result_blocks \u001b[38;5;241m=\u001b[39m extend_blocks(applied, result_blocks)\n\u001b[1;32m 366\u001b[0m out \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mtype\u001b[39m(\u001b[38;5;28mself\u001b[39m)\u001b[38;5;241m.\u001b[39mfrom_blocks(result_blocks, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39maxes)\n", "File \u001b[0;32m~/cusy/trn/jupyter-tutorial/uvenvs/py313/.venv/lib/python3.13/site-packages/pandas/core/internals/blocks.py:758\u001b[0m, in \u001b[0;36mBlock.astype\u001b[0;34m(self, dtype, copy, errors, using_cow, squeeze)\u001b[0m\n\u001b[1;32m 755\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mCan not squeeze with more than one column.\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 756\u001b[0m values \u001b[38;5;241m=\u001b[39m values[\u001b[38;5;241m0\u001b[39m, :] 
\u001b[38;5;66;03m# type: ignore[call-overload]\u001b[39;00m\n\u001b[0;32m--> 758\u001b[0m new_values \u001b[38;5;241m=\u001b[39m \u001b[43mastype_array_safe\u001b[49m\u001b[43m(\u001b[49m\u001b[43mvalues\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdtype\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcopy\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcopy\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43merrors\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43merrors\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 760\u001b[0m new_values \u001b[38;5;241m=\u001b[39m maybe_coerce_values(new_values)\n\u001b[1;32m 762\u001b[0m refs \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n", "File \u001b[0;32m~/cusy/trn/jupyter-tutorial/uvenvs/py313/.venv/lib/python3.13/site-packages/pandas/core/dtypes/astype.py:237\u001b[0m, in \u001b[0;36mastype_array_safe\u001b[0;34m(values, dtype, copy, errors)\u001b[0m\n\u001b[1;32m 234\u001b[0m dtype \u001b[38;5;241m=\u001b[39m dtype\u001b[38;5;241m.\u001b[39mnumpy_dtype\n\u001b[1;32m 236\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 237\u001b[0m new_values \u001b[38;5;241m=\u001b[39m \u001b[43mastype_array\u001b[49m\u001b[43m(\u001b[49m\u001b[43mvalues\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdtype\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcopy\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcopy\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 238\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m (\u001b[38;5;167;01mValueError\u001b[39;00m, \u001b[38;5;167;01mTypeError\u001b[39;00m):\n\u001b[1;32m 239\u001b[0m \u001b[38;5;66;03m# e.g. 
_astype_nansafe can fail on object-dtype of strings\u001b[39;00m\n\u001b[1;32m 240\u001b[0m \u001b[38;5;66;03m# trying to convert to float\u001b[39;00m\n\u001b[1;32m 241\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m errors \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mignore\u001b[39m\u001b[38;5;124m\"\u001b[39m:\n", "File \u001b[0;32m~/cusy/trn/jupyter-tutorial/uvenvs/py313/.venv/lib/python3.13/site-packages/pandas/core/dtypes/astype.py:182\u001b[0m, in \u001b[0;36mastype_array\u001b[0;34m(values, dtype, copy)\u001b[0m\n\u001b[1;32m 179\u001b[0m values \u001b[38;5;241m=\u001b[39m values\u001b[38;5;241m.\u001b[39mastype(dtype, copy\u001b[38;5;241m=\u001b[39mcopy)\n\u001b[1;32m 181\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m--> 182\u001b[0m values \u001b[38;5;241m=\u001b[39m \u001b[43m_astype_nansafe\u001b[49m\u001b[43m(\u001b[49m\u001b[43mvalues\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdtype\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcopy\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcopy\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 184\u001b[0m \u001b[38;5;66;03m# in pandas we don't store numpy str dtypes, so convert to object\u001b[39;00m\n\u001b[1;32m 185\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(dtype, np\u001b[38;5;241m.\u001b[39mdtype) \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;28missubclass\u001b[39m(values\u001b[38;5;241m.\u001b[39mdtype\u001b[38;5;241m.\u001b[39mtype, \u001b[38;5;28mstr\u001b[39m):\n", "File \u001b[0;32m~/cusy/trn/jupyter-tutorial/uvenvs/py313/.venv/lib/python3.13/site-packages/pandas/core/dtypes/astype.py:101\u001b[0m, in \u001b[0;36m_astype_nansafe\u001b[0;34m(arr, dtype, copy, skipna)\u001b[0m\n\u001b[1;32m 96\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m lib\u001b[38;5;241m.\u001b[39mensure_string_array(\n\u001b[1;32m 97\u001b[0m arr, skipna\u001b[38;5;241m=\u001b[39mskipna, 
convert_na_value\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mFalse\u001b[39;00m\n\u001b[1;32m 98\u001b[0m )\u001b[38;5;241m.\u001b[39mreshape(shape)\n\u001b[1;32m 100\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m np\u001b[38;5;241m.\u001b[39missubdtype(arr\u001b[38;5;241m.\u001b[39mdtype, np\u001b[38;5;241m.\u001b[39mfloating) \u001b[38;5;129;01mand\u001b[39;00m dtype\u001b[38;5;241m.\u001b[39mkind \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124miu\u001b[39m\u001b[38;5;124m\"\u001b[39m:\n\u001b[0;32m--> 101\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m_astype_float_to_int_nansafe\u001b[49m\u001b[43m(\u001b[49m\u001b[43marr\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdtype\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcopy\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 103\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m arr\u001b[38;5;241m.\u001b[39mdtype \u001b[38;5;241m==\u001b[39m \u001b[38;5;28mobject\u001b[39m:\n\u001b[1;32m 104\u001b[0m \u001b[38;5;66;03m# if we have a datetime/timedelta array of objects\u001b[39;00m\n\u001b[1;32m 105\u001b[0m \u001b[38;5;66;03m# then coerce to datetime64[ns] and use DatetimeArray.astype\u001b[39;00m\n\u001b[1;32m 107\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m lib\u001b[38;5;241m.\u001b[39mis_np_dtype(dtype, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mM\u001b[39m\u001b[38;5;124m\"\u001b[39m):\n", "File \u001b[0;32m~/cusy/trn/jupyter-tutorial/uvenvs/py313/.venv/lib/python3.13/site-packages/pandas/core/dtypes/astype.py:145\u001b[0m, in \u001b[0;36m_astype_float_to_int_nansafe\u001b[0;34m(values, dtype, copy)\u001b[0m\n\u001b[1;32m 141\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 142\u001b[0m \u001b[38;5;124;03mastype with a check preventing converting NaN to an meaningless integer value.\u001b[39;00m\n\u001b[1;32m 143\u001b[0m \u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 144\u001b[0m 
\u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m np\u001b[38;5;241m.\u001b[39misfinite(values)\u001b[38;5;241m.\u001b[39mall():\n\u001b[0;32m--> 145\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m IntCastingNaNError(\n\u001b[1;32m 146\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mCannot convert non-finite values (NA or inf) to integer\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 147\u001b[0m )\n\u001b[1;32m 148\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m dtype\u001b[38;5;241m.\u001b[39mkind \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mu\u001b[39m\u001b[38;5;124m\"\u001b[39m:\n\u001b[1;32m 149\u001b[0m \u001b[38;5;66;03m# GH#45151\u001b[39;00m\n\u001b[1;32m 150\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m (values \u001b[38;5;241m>\u001b[39m\u001b[38;5;241m=\u001b[39m \u001b[38;5;241m0\u001b[39m)\u001b[38;5;241m.\u001b[39mall():\n", "\u001b[0;31mIntCastingNaNError\u001b[0m: Cannot convert non-finite values (NA or inf) to integer" ] } ], "source": [ "n.astype(\"int8\")" ] }, { "cell_type": "markdown", "id": "7edecbd2-841a-44fe-85f2-776522d16c1b", "metadata": { "execution": { "iopub.execute_input": "2026-03-09T11:29:03.883303Z", "iopub.status.busy": "2026-03-09T11:29:03.882980Z", "iopub.status.idle": "2026-03-09T11:29:03.888305Z", "shell.execute_reply": "2026-03-09T11:29:03.887725Z", "shell.execute_reply.started": "2026-03-09T11:29:03.883281Z" } }, "source": [ "Errors such as this `IntCastingNaNError` can be avoided with `errors=\"ignore\"`, in which case the original data type is kept if the conversion fails:" ] },
{ "cell_type": "code", "execution_count": 8, "id": "31623ec6-337e-4689-afa8-ab4259f29984", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:26:05.865776Z", "iopub.status.busy": "2026-05-21T14:26:05.865476Z", "iopub.status.idle": "2026-05-21T14:26:05.871279Z", "shell.execute_reply": "2026-05-21T14:26:05.870665Z", "shell.execute_reply.started": "2026-05-21T14:26:05.865757Z" } }, "outputs": [ { "data": { "text/plain": [ "0 90.0\n", "1 NaN\n", "2 1.0\n", "dtype: float64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n.astype(\"int8\", errors=\"ignore\")" ] }, { "cell_type": "markdown", "id": "0d36b3e4-c88f-4ee0-96b2-a72d2f22f144", "metadata": {}, "source": [ "Using the right type can save memory. The default is a data type that is 8 bytes wide, i.e. `int64` or `float64`. If you can use a narrower type, memory consumption drops considerably, so you can process more data. You can use NumPy to check the limits of integer and float types:" ] },
{ "cell_type": "code", "execution_count": 9, "id": "a98e0506-97e9-474a-b7bd-00480c663503", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:26:05.872698Z", "iopub.status.busy": "2026-05-21T14:26:05.872421Z", "iopub.status.idle": "2026-05-21T14:26:05.876450Z", "shell.execute_reply": "2026-05-21T14:26:05.876048Z", "shell.execute_reply.started": "2026-05-21T14:26:05.872660Z" } }, "outputs": [ { "data": { "text/plain": [ "iinfo(min=-128, max=127, dtype=int8)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.iinfo(\"int8\")" ] }, { "cell_type": "code", "execution_count": 10, "id": "ab52e51d-6b61-45eb-b030-ea396d87a785", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:26:05.877160Z", "iopub.status.busy": "2026-05-21T14:26:05.877006Z", "iopub.status.idle": "2026-05-21T14:26:05.880612Z", "shell.execute_reply": "2026-05-21T14:26:05.880144Z", "shell.execute_reply.started": "2026-05-21T14:26:05.877146Z" } }, "outputs": [ { "data": { "text/plain": [ "iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.iinfo(\"int64\")" ] }, { "cell_type": "code", "execution_count": 11, "id": "c2134c5a", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:26:05.882231Z", "iopub.status.busy": "2026-05-21T14:26:05.882040Z", "iopub.status.idle": "2026-05-21T14:26:05.885571Z", "shell.execute_reply": "2026-05-21T14:26:05.885143Z", "shell.execute_reply.started": "2026-05-21T14:26:05.882215Z" } }, "outputs": [ { "data": { "text/plain": [ "finfo(resolution=1e-06, min=-3.4028235e+38, max=3.4028235e+38, dtype=float32)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.finfo(\"float32\")" ] }, { "cell_type": "code", "execution_count": 12, "id": "80abda31", 
"metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:26:05.886348Z", "iopub.status.busy": "2026-05-21T14:26:05.886092Z", "iopub.status.idle": "2026-05-21T14:26:05.889914Z", "shell.execute_reply": "2026-05-21T14:26:05.889458Z", "shell.execute_reply.started": "2026-05-21T14:26:05.886329Z" } }, "outputs": [ { "data": { "text/plain": [ "finfo(resolution=1e-15, min=-1.7976931348623157e+308, max=1.7976931348623157e+308, dtype=float64)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.finfo(\"float64\")" ] }, { "cell_type": "markdown", "id": "31113679", "metadata": {}, "source": [ "## Memory usage\n", "\n", "To work out the memory consumption of a `Series`, you can use [pandas.Series.nbytes](https://pandas.pydata.org/docs/reference/api/pandas.Series.nbytes.html), which returns the number of bytes used by the data itself. [pandas.Series.memory_usage](https://pandas.pydata.org/docs/reference/api/pandas.Series.memory_usage.html) additionally includes the memory used by the index. With `deep=True`, the contents of `object` values such as Python strings are inspected as well, which gives the system-level memory consumption." 
] }, { "cell_type": "code", "execution_count": 13, "id": "1f433772", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:26:05.890677Z", "iopub.status.busy": "2026-05-21T14:26:05.890543Z", "iopub.status.idle": "2026-05-21T14:26:05.894181Z", "shell.execute_reply": "2026-05-21T14:26:05.893808Z", "shell.execute_reply.started": "2026-05-21T14:26:05.890663Z" } }, "outputs": [ { "data": { "text/plain": [ "56" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.nbytes" ] }, { "cell_type": "code", "execution_count": 14, "id": "e32acea2", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:26:05.894974Z", "iopub.status.busy": "2026-05-21T14:26:05.894812Z", "iopub.status.idle": "2026-05-21T14:26:05.897909Z", "shell.execute_reply": "2026-05-21T14:26:05.897555Z", "shell.execute_reply.started": "2026-05-21T14:26:05.894964Z" } }, "outputs": [ { "data": { "text/plain": [ "35" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.astype(\"Float32\").nbytes" ] }, { "cell_type": "code", "execution_count": 15, "id": "794f2ddc", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:26:05.898546Z", "iopub.status.busy": "2026-05-21T14:26:05.898445Z", "iopub.status.idle": "2026-05-21T14:26:05.900801Z", "shell.execute_reply": "2026-05-21T14:26:05.900497Z", "shell.execute_reply.started": "2026-05-21T14:26:05.898537Z" } }, "outputs": [ { "data": { "text/plain": [ "188" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.memory_usage()" ] }, { "cell_type": "code", "execution_count": 16, "id": "febe7656", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:26:05.901283Z", "iopub.status.busy": "2026-05-21T14:26:05.901123Z", "iopub.status.idle": "2026-05-21T14:26:05.903534Z", "shell.execute_reply": "2026-05-21T14:26:05.903171Z", "shell.execute_reply.started": "2026-05-21T14:26:05.901274Z" } }, "outputs": [ { 
"data": { "text/plain": [ "167" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.astype(\"Float32\").memory_usage()" ] }, { "cell_type": "code", "execution_count": 17, "id": "55459cf6", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:26:05.905159Z", "iopub.status.busy": "2026-05-21T14:26:05.905065Z", "iopub.status.idle": "2026-05-21T14:26:05.907359Z", "shell.execute_reply": "2026-05-21T14:26:05.907152Z", "shell.execute_reply.started": "2026-05-21T14:26:05.905150Z" } }, "outputs": [ { "data": { "text/plain": [ "188" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.memory_usage(deep=True)" ] }, { "cell_type": "markdown", "id": "1c34f8a2", "metadata": {}, "source": [ "## String and categorical types\n", "\n", "The [pandas.Series.astype](https://pandas.pydata.org/docs/reference/api/pandas.Series.astype.html) method can also convert numeric series to strings if you pass `str`. Note the `dtype` in the following example:" ] },
{ "cell_type": "code", "execution_count": 18, "id": "c771b5ca", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:26:05.907782Z", "iopub.status.busy": "2026-05-21T14:26:05.907690Z", "iopub.status.idle": "2026-05-21T14:26:05.910626Z", "shell.execute_reply": "2026-05-21T14:26:05.910417Z", "shell.execute_reply.started": "2026-05-21T14:26:05.907774Z" } }, "outputs": [ { "data": { "text/plain": [ "0 -1.6634723613898739\n", "1 0.20513361124745808\n", "2 0.3333563040239043\n", "3 -0.6395333353979279\n", "4 1.5197153715827265\n", "5 0.33144403280572465\n", "6 1.5517663730128375\n", "dtype: object" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.astype(str)" ] }, { "cell_type": "code", "execution_count": 19, "id": "01724d65", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:26:05.911255Z", "iopub.status.busy": "2026-05-21T14:26:05.911141Z", "iopub.status.idle": "2026-05-21T14:26:05.913589Z", "shell.execute_reply": "2026-05-21T14:26:05.913330Z", "shell.execute_reply.started": "2026-05-21T14:26:05.911244Z" } }, "outputs": [ { "data": { "text/plain": [ "188" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.astype(str).memory_usage()" ] }, { "cell_type": "code", "execution_count": 20, "id": "47e7f593", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:26:05.914134Z", "iopub.status.busy": "2026-05-21T14:26:05.914050Z", "iopub.status.idle": "2026-05-21T14:26:05.916759Z", "shell.execute_reply": "2026-05-21T14:26:05.916534Z", "shell.execute_reply.started": "2026-05-21T14:26:05.914126Z" } }, "outputs": [ { "data": { "text/plain": [ "605" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.astype(str).memory_usage(deep=True)" ] }, { "cell_type": "markdown", "id": "8ca82137", "metadata": {}, "source": [ "To convert to a categorical type, you can pass `'category'` as the type:" ] },
{ "cell_type": "code", "execution_count": 21, "id": "19ce6064", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:26:05.917257Z", "iopub.status.busy": "2026-05-21T14:26:05.917179Z", "iopub.status.idle": "2026-05-21T14:26:05.920410Z", "shell.execute_reply": "2026-05-21T14:26:05.920197Z", "shell.execute_reply.started": "2026-05-21T14:26:05.917250Z" } }, "outputs": [ { "data": { "text/plain": [ "0 -1.6634723613898739\n", "1 0.20513361124745808\n", "2 0.3333563040239043\n", "3 -0.6395333353979279\n", "4 1.5197153715827265\n", "5 0.33144403280572465\n", "6 1.5517663730128375\n", "dtype: category\n", "Categories (7, object): ['-0.6395333353979279', '-1.6634723613898739', '0.20513361124745808', '0.33144403280572465', '0.3333563040239043', '1.5197153715827265', '1.5517663730128375']" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.astype(str).astype(\"category\")" ] }, { "cell_type": "markdown", "id": "3ffcc4fe", "metadata": {}, "source": [ "A categorical `Series` is useful for string data and can lead to large memory savings. This is because, after conversion to categorical data, pandas no longer stores a Python string for every value; each distinct value is stored only once and repeated values refer to it via integer codes. You still have all the functionality of the `str` accessor, but you save a lot of memory if you have many duplicate values, and performance improves because fewer string operations need to be carried out." 
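, "\n",
"\n",
"The seven distinct values above leave little room for savings, so here is a minimal sketch with many duplicates. It assumes hypothetical example data (`colours` is not part of this notebook) and an explicit `object` dtype; the exact byte counts depend on the pandas version and platform:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical example data: three distinct strings, each repeated 1000 times\n",
"colours = pd.Series([\"red\", \"green\", \"blue\"] * 1000, dtype=object)\n",
"as_category = colours.astype(\"category\")\n",
"\n",
"# deep=True also counts the Python string objects behind the object dtype\n",
"object_bytes = colours.memory_usage(deep=True)\n",
"category_bytes = as_category.memory_usage(deep=True)\n",
"print(object_bytes, category_bytes)\n",
"\n",
"# The str accessor still works on the categorical Series\n",
"upper = as_category.str.upper()\n",
"print(upper.head(3).tolist())\n",
"```\n",
"\n",
"Each value is then stored as a small integer code that points into the three categories, which is why the categorical `Series` should come out much smaller here."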
] }, { "cell_type": "code", "execution_count": 22, "id": "df605fdc", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:26:05.920857Z", "iopub.status.busy": "2026-05-21T14:26:05.920780Z", "iopub.status.idle": "2026-05-21T14:26:05.923310Z", "shell.execute_reply": "2026-05-21T14:26:05.923043Z", "shell.execute_reply.started": "2026-05-21T14:26:05.920851Z" } }, "outputs": [ { "data": { "text/plain": [ "495" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.astype(\"category\").memory_usage(deep=True)" ] }, { "cell_type": "markdown", "id": "211777e8", "metadata": {}, "source": [ "## Ordered categories\n", "\n", "To create ordered categories, you need to define your own [pandas.CategoricalDtype](https://pandas.pydata.org/docs/reference/api/pandas.CategoricalDtype.html):" ] }, { "cell_type": "code", "execution_count": 23, "id": "bcdbd52d", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:26:05.923722Z", "iopub.status.busy": "2026-05-21T14:26:05.923650Z", "iopub.status.idle": "2026-05-21T14:26:05.926633Z", "shell.execute_reply": "2026-05-21T14:26:05.926348Z", "shell.execute_reply.started": "2026-05-21T14:26:05.923714Z" } }, "outputs": [ { "data": { "text/plain": [ "0 -1.663472\n", "1 0.205134\n", "2 0.333356\n", "3 -0.639533\n", "4 1.519715\n", "5 0.331444\n", "6 1.551766\n", "dtype: category\n", "Categories (7, float64): [-1.663472 < -0.639533 < 0.205134 < 0.331444 < 0.333356 < 1.519715 < 1.551766]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pandas.api.types import CategoricalDtype\n", "\n", "\n", "s_sorted = pd.Series(sorted(set(s)))\n", "cat_dtype = CategoricalDtype(categories=s_sorted, ordered=True)\n", "\n", "s.astype(cat_dtype)" ] }, { "cell_type": "code", "execution_count": 24, "id": "2894c502", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:26:05.927012Z", "iopub.status.busy": 
"2026-05-21T14:26:05.926950Z", "iopub.status.idle": "2026-05-21T14:26:05.929460Z", "shell.execute_reply": "2026-05-21T14:26:05.929200Z", "shell.execute_reply.started": "2026-05-21T14:26:05.927005Z" } }, "outputs": [ { "data": { "text/plain": [ "495" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.astype(cat_dtype).memory_usage(deep=True)" ] }, { "cell_type": "markdown", "id": "8ad2da49", "metadata": {}, "source": [ "The following table lists the types that you can pass to `astype`.\n", "\n", "Data type | Description\n", ":-------- | :----------\n", "`str`, `'str'` | convert to Python strings\n", "`'string'` | convert to pandas strings with `pandas.NA`\n", "`int`, `'int'`, `'int64'` | convert to NumPy `int64`\n", "`'int32'`, `'uint32'` | convert to NumPy `int32` or `uint32`\n", "`'Int64'` | convert to pandas `Int64` with `pandas.NA`\n", "`float`, `'float'`, `'float64'` | convert to floats\n", "`'category'` | convert to `CategoricalDtype` with `pandas.NA`" ] }, { "cell_type": "markdown", "id": "32e02ba5", "metadata": {}, "source": [ "## Conversion to other data types\n", "\n", "The [pandas.Series.to_numpy](https://pandas.pydata.org/docs/reference/api/pandas.Series.to_numpy.html) method or the [pandas.Series.values](https://pandas.pydata.org/docs/reference/api/pandas.Series.values.html) property gives us a NumPy array of the values, and [pandas.Series.to_list](https://pandas.pydata.org/docs/reference/api/pandas.Series.to_list.html) returns a Python list of the values. Why would you want to do this? You usually should not: pandas objects are generally much more convenient and the code is easier to read. In addition, Python lists are processed much more slowly. With [pandas.Series.to_frame](https://pandas.pydata.org/docs/reference/api/pandas.Series.to_frame.html) you can create a DataFrame with a single column if needed:" ] },
{ "cell_type": "code", "execution_count": 25, "id": "081a32ba", "metadata": { "execution": { "iopub.execute_input": "2026-05-21T14:26:05.929912Z", "iopub.status.busy": "2026-05-21T14:26:05.929831Z", "iopub.status.idle": "2026-05-21T14:26:05.933941Z", "shell.execute_reply": "2026-05-21T14:26:05.933716Z", "shell.execute_reply.started": "2026-05-21T14:26:05.929905Z" } }, "outputs": [ { "data": { "text/plain": [ " 0\n", "0 -1.663472\n", "1 0.205134\n", "2 0.333356\n", "3 -0.639533\n", "4 1.519715\n", "5 0.331444\n", "6 1.551766" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.to_frame()" ] }, { "cell_type": "markdown", "id": "7aa64c3a", "metadata": {}, "source": [ "The [pandas.to_datetime](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html) function can also be helpful for converting values in pandas to dates and times." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.13 Kernel", "language": "python", "name": "python313" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.0" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }