{ "cells": [ { "cell_type": "markdown", "id": "b6d8e288", "metadata": {}, "source": [ "# Converting Python data structures to pandas\n", "\n", "Python data structures such as lists and arrays can be converted into pandas [Series](#Series) or [DataFrames](#DataFrame)." ] },
{ "cell_type": "code", "execution_count": 1, "id": "362d58a7", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] },
{ "cell_type": "markdown", "id": "11ba3983", "metadata": {}, "source": [ "## Series\n", "\n", "Python [lists](https://docs.python.org/3/tutorial/introduction.html#lists) can be converted directly into a pandas Series. Without further arguments, pandas assigns a default integer index starting at 0:" ] },
{ "cell_type": "code", "execution_count": 2, "id": "d86c6823", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 -0.751442\n", "1 0.816935\n", "2 -0.272546\n", "3 -0.268295\n", "4 -0.296728\n", "5 0.176255\n", "6 -0.322612\n", "dtype: float64" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list1 = [\n", "    -0.751442,\n", "    0.816935,\n", "    -0.272546,\n", "    -0.268295,\n", "    -0.296728,\n", "    0.176255,\n", "    -0.322612,\n", "]\n", "\n", "pd.Series(list1)" ] },
{ "cell_type": "markdown", "id": "29236df2", "metadata": {}, "source": [ "Several lists can also be turned into a single pandas Series by first concatenating them with `+`:" ] },
{ "cell_type": "code", "execution_count": 3, "id": "50689bfc", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 -0.751442\n", "1 0.816935\n", "2 -0.272546\n", "3 -0.268295\n", "4 -0.296728\n", "5 0.176255\n", "6 -0.322612\n", "7 -0.029608\n", "8 -0.277982\n", "9 2.693057\n", "10 -0.850817\n", "11 0.783868\n", "12 -1.137835\n", "13 -0.617132\n", "dtype: float64" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list2 = [\n", "    -0.029608,\n", "    -0.277982,\n", "    2.693057,\n", "    -0.850817,\n", "    0.783868,\n", "    -1.137835,\n", "    -0.617132,\n", "]\n", "\n", "pd.Series(list1 + list2)" ] },
{ "cell_type": "markdown",
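"id": "a1f3c9d2", "metadata": {}, "source": [ "NumPy arrays, which the introduction names alongside lists, are converted in the same way. The following cell is a minimal sketch that builds arrays from `list1` and `list2`. The names `arr1`, `arr2`, `s1` and `s2` are hypothetical and chosen only for illustration. Instead of concatenating the lists, it joins the two resulting Series with `pd.concat`, where `ignore_index=True` renumbers the combined index from 0:" ] },
{ "cell_type": "code", "execution_count": null, "id": "c7e4b815", "metadata": {}, "outputs": [], "source": [ "# Hypothetical names: arr1/arr2 are NumPy arrays built from list1/list2\n", "arr1 = np.array(list1)\n", "arr2 = np.array(list2)\n", "\n", "# A one-dimensional array becomes a Series with a default RangeIndex\n", "s1 = pd.Series(arr1)\n", "s2 = pd.Series(arr2)\n", "\n", "# Join both Series; ignore_index=True numbers the result from 0 to 13\n", "pd.concat([s1, s2], ignore_index=True)" ] },
{ "cell_type": "markdown",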
"id": "4103ed20", "metadata": {}, "source": [ "Es kann auch eine Liste als Index übergeben werden:" ] }, { "cell_type": "code", "execution_count": 4, "id": "6511d214", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2022-01-31 -0.751442\n", "2022-02-01 0.816935\n", "2022-02-02 -0.272546\n", "2022-02-03 -0.268295\n", "2022-02-04 -0.296728\n", "2022-02-05 0.176255\n", "2022-02-06 -0.322612\n", "dtype: float64" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "date = [\n", " \"2022-01-31\",\n", " \"2022-02-01\",\n", " \"2022-02-02\",\n", " \"2022-02-03\",\n", " \"2022-02-04\",\n", " \"2022-02-05\",\n", " \"2022-02-06\",\n", "]\n", "\n", "pd.Series(list1, index=date)" ] }, { "cell_type": "markdown", "id": "6b2764b3", "metadata": {}, "source": [ "Mit Python [Dictionary](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) könnt ihr nicht nur Werte sondern auch die zugehörigen Schlüssel an eine pandas Series übergeben:" ] }, { "cell_type": "code", "execution_count": 5, "id": "cd74ecdd", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2022-01-31 -0.751442\n", "2022-02-01 0.816935\n", "2022-02-02 -0.272546\n", "2022-02-03 -0.268295\n", "2022-02-04 -0.296728\n", "2022-02-05 0.176255\n", "2022-02-06 -0.322612\n", "dtype: float64" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dict1 = {\n", " \"2022-01-31\": -0.751442,\n", " \"2022-02-01\": 0.816935,\n", " \"2022-02-02\": -0.272546,\n", " \"2022-02-03\": -0.268295,\n", " \"2022-02-04\": -0.296728,\n", " \"2022-02-05\": 0.176255,\n", " \"2022-02-06\": -0.322612,\n", "}\n", "\n", "pd.Series(dict1)" ] }, { "cell_type": "markdown", "id": "8fcb9bf5", "metadata": {}, "source": [ "Wenn ihr ein `dict` übergebt, berücksichtigt der Index in der resultierenden pandas Series die Reihenfolge der Schlüssel im Dict." 
] },
{ "cell_type": "markdown", "id": "357db634", "metadata": {}, "source": [ "With [collections.ChainMap](https://docs.python.org/3/library/collections.html#collections.ChainMap) you can also turn several dicts into one pandas Series. A `ChainMap` iterates over its mappings from last to first, so the keys of the dict passed last appear first in the index. If a key occurs in several dicts, the value from the first dict takes precedence.\n", "\n", "First, we define a second dict:" ] },
{ "cell_type": "code", "execution_count": 6, "id": "e2d2be25", "metadata": {}, "outputs": [], "source": [ "dict2 = {\n", "    \"2022-02-07\": -0.029608,\n", "    \"2022-02-08\": -0.277982,\n", "    \"2022-02-09\": 2.693057,\n", "    \"2022-02-10\": -0.850817,\n", "    \"2022-02-11\": 0.783868,\n", "    \"2022-02-12\": -1.137835,\n", "    \"2022-02-13\": -0.617132,\n", "}" ] },
{ "cell_type": "code", "execution_count": 7, "id": "36514ba2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2022-02-07 -0.029608\n", "2022-02-08 -0.277982\n", "2022-02-09 2.693057\n", "2022-02-10 -0.850817\n", "2022-02-11 0.783868\n", "2022-02-12 -1.137835\n", "2022-02-13 -0.617132\n", "2022-01-31 -0.751442\n", "2022-02-01 0.816935\n", "2022-02-02 -0.272546\n", "2022-02-03 -0.268295\n", "2022-02-04 -0.296728\n", "2022-02-05 0.176255\n", "2022-02-06 -0.322612\n", "dtype: float64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from collections import ChainMap\n", "\n", "\n", "pd.Series(ChainMap(dict1, dict2))" ] },
{ "cell_type": "markdown", "id": "faf5f65b", "metadata": {}, "source": [ "## DataFrame\n", "\n", "A list of Python lists can be loaded into a pandas DataFrame. Each inner list becomes one row, and the columns are numbered from 0:" ] },
{ "cell_type": "code", "execution_count": 8, "id": "db421b4d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ " 0 1 2 3 4 5 6\n", "0 -0.751442 0.816935 -0.272546 -0.268295 -0.296728 0.176255 -0.322612\n", "1 -0.029608 -0.277982 2.693057 -0.850817 0.783868 -1.137835 -0.617132" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame([list1, list2])\n", "df" ] },
{ "cell_type": "markdown", "id": "b935eaab", "metadata": {}, "source": [ "You can also pass a list as the DataFrame index, with one label per row:" ] },
{ "cell_type": "code", "execution_count": 9, "id": "641b64a4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ " 0 1 2 3 4 5 \\\n", "2022-01-31 -0.751442 0.816935 -0.272546 -0.268295 -0.296728 0.176255 \n", "2022-02-01 -0.029608 -0.277982 2.693057 -0.850817 0.783868 -1.137835 \n", "\n", " 6 \n", "2022-01-31 -0.322612 \n", "2022-02-01 -0.617132 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame([list1, list2], index=[\"2022-01-31\", \"2022-02-01\"])" ] },
{ "cell_type": "markdown", "id": "3279218a", "metadata": {}, "source": [ "A pandas DataFrame can also be created from a dict whose values are lists. Each key becomes a column name, and each list supplies that column's values:" ] },
{ "cell_type": "code", "execution_count": 10, "id": "45527e34", "metadata": {}, "outputs": [], "source": [ "data = {\n", "    \"Code\": [\"U+0000\", \"U+0001\", \"U+0002\", \"U+0003\", \"U+0004\", \"U+0005\"],\n", "    \"Decimal\": [0, 1, 2, 3, 4, 5],\n", "    \"Octal\": [\"000\", \"001\", \"002\", \"003\", \"004\", \"005\"],\n", "    \"Key\": [\"NUL\", \"Ctrl-A\", \"Ctrl-B\", \"Ctrl-C\", \"Ctrl-D\", \"Ctrl-E\"],\n", "}" ] },
{ "cell_type": "code", "execution_count": 11, "id": "76476364", "metadata": {}, "outputs": [ { "data": { "text/plain": [ " Code Decimal Octal Key\n", "0 U+0000 0 000 NUL\n", "1 U+0001 1 001 Ctrl-A\n", "2 U+0002 2 002 Ctrl-B\n", "3 U+0003 3 003 Ctrl-C\n", "4 U+0004 4 004 Ctrl-D\n", "5 U+0005 5 005 Ctrl-E" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(data)" ] },
{ "cell_type": "markdown", "id": "da3c83b2", "metadata": {}, "source": [ "Another common form of data is a nested dict of dicts. The outer keys become the column labels, and the inner keys become the row index:" ] },
{ "cell_type": "code", "execution_count": 12, "id": "56028e86", "metadata": {}, "outputs": [ { "data": { "text/plain": [ " U+0006 U+0007\n", "Decimal 6 7\n", "Octal 006 007\n", "Key Ctrl-F Ctrl-G" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data2 = {\n", "    \"U+0006\": {\"Decimal\": 6, \"Octal\": \"006\", \"Key\": \"Ctrl-F\"},\n", "    \"U+0007\": {\"Decimal\": 7, \"Octal\": \"007\", \"Key\": \"Ctrl-G\"},\n", "}\n", "\n", "df2 = pd.DataFrame(data2)\n", "\n", "df2" ] },
{ "cell_type": "markdown", "id": "5ab4b3a9", "metadata": {}, "source": [ "Dicts of Series are handled in a similar way. The dict keys become the columns, and the index of each Series becomes the row index. Here, `.iloc[2:]` keeps only the rows from position 2 onwards, that is, `Key`:" ] },
{ "cell_type": "code", "execution_count": 13, "id": "532b4f28", "metadata": {}, "outputs": [ { "data": { "text/plain": [ " U+0006 U+0007\n", "Key Ctrl-F Ctrl-G" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data3 = {\"U+0006\": df2[\"U+0006\"].iloc[2:], \"U+0007\": df2[\"U+0007\"].iloc[2:]}\n", "\n", "pd.DataFrame(data3)" ] }
], "metadata": { "kernelspec": { "display_name": "Python 3.13 Kernel", "language": "python", "name": "python313" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.0" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }