{
"cells": [
{
"cell_type": "markdown",
"id": "b6d8e288",
"metadata": {},
"source": [
"# Converting Python data structures to pandas\n",
"\n",
"Python data structures such as lists and arrays can be converted into pandas [Series](#Series) or [DataFrames](#DataFrame)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "362d58a7",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "11ba3983",
"metadata": {},
"source": [
"## Series\n",
"\n",
"Python [lists](https://docs.python.org/3/tutorial/introduction.html#lists) can easily be converted into pandas Series:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "d86c6823",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 -0.751442\n",
"1 0.816935\n",
"2 -0.272546\n",
"3 -0.268295\n",
"4 -0.296728\n",
"5 0.176255\n",
"6 -0.322612\n",
"dtype: float64"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list1 = [\n",
" -0.751442,\n",
" 0.816935,\n",
" -0.272546,\n",
" -0.268295,\n",
" -0.296728,\n",
" 0.176255,\n",
" -0.322612,\n",
"]\n",
"\n",
"pd.Series(list1)"
]
},
{
"cell_type": "markdown",
"id": "29236df2",
"metadata": {},
"source": [
"Several lists can also be concatenated and converted into a single pandas Series:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "50689bfc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 -0.751442\n",
"1 0.816935\n",
"2 -0.272546\n",
"3 -0.268295\n",
"4 -0.296728\n",
"5 0.176255\n",
"6 -0.322612\n",
"7 -0.029608\n",
"8 -0.277982\n",
"9 2.693057\n",
"10 -0.850817\n",
"11 0.783868\n",
"12 -1.137835\n",
"13 -0.617132\n",
"dtype: float64"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list2 = [\n",
" -0.029608,\n",
" -0.277982,\n",
" 2.693057,\n",
" -0.850817,\n",
" 0.783868,\n",
" -1.137835,\n",
" -0.617132,\n",
"]\n",
"\n",
"pd.Series(list1 + list2)"
]
},
{
"cell_type": "markdown",
"id": "4103ed20",
"metadata": {},
"source": [
"A list can also be passed as the index:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "6511d214",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2022-01-31 -0.751442\n",
"2022-02-01 0.816935\n",
"2022-02-02 -0.272546\n",
"2022-02-03 -0.268295\n",
"2022-02-04 -0.296728\n",
"2022-02-05 0.176255\n",
"2022-02-06 -0.322612\n",
"dtype: float64"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"date = [\n",
" \"2022-01-31\",\n",
" \"2022-02-01\",\n",
" \"2022-02-02\",\n",
" \"2022-02-03\",\n",
" \"2022-02-04\",\n",
" \"2022-02-05\",\n",
" \"2022-02-06\",\n",
"]\n",
"\n",
"pd.Series(list1, index=date)"
]
},
{
"cell_type": "markdown",
"id": "6b2764b3",
"metadata": {},
"source": [
"With a Python [dictionary](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) you can pass not only the values but also the corresponding keys to a pandas Series; the keys become the index:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "cd74ecdd",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2022-01-31 -0.751442\n",
"2022-02-01 0.816935\n",
"2022-02-02 -0.272546\n",
"2022-02-03 -0.268295\n",
"2022-02-04 -0.296728\n",
"2022-02-05 0.176255\n",
"2022-02-06 -0.322612\n",
"dtype: float64"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dict1 = {\n",
" \"2022-01-31\": -0.751442,\n",
" \"2022-02-01\": 0.816935,\n",
" \"2022-02-02\": -0.272546,\n",
" \"2022-02-03\": -0.268295,\n",
" \"2022-02-04\": -0.296728,\n",
" \"2022-02-05\": 0.176255,\n",
" \"2022-02-06\": -0.322612,\n",
"}\n",
"\n",
"pd.Series(dict1)"
]
},
{
"cell_type": "markdown",
"id": "8fcb9bf5",
"metadata": {},
"source": [
"If you pass a `dict`, the index of the resulting pandas Series follows the order of the keys in the dict."
]
},
{
"cell_type": "markdown",
"id": "357db634",
"metadata": {},
"source": [
"With [collections.ChainMap](https://docs.python.org/3/library/collections.html#collections.ChainMap) you can also combine several dicts into a single pandas Series. Note that a `ChainMap` iterates over its mappings from last to first, so the keys of the last dict come first in the resulting index.\n",
"\n",
"To do this, we first define a second dict:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "e2d2be25",
"metadata": {},
"outputs": [],
"source": [
"dict2 = {\n",
" \"2022-02-07\": -0.029608,\n",
" \"2022-02-08\": -0.277982,\n",
" \"2022-02-09\": 2.693057,\n",
" \"2022-02-10\": -0.850817,\n",
" \"2022-02-11\": 0.783868,\n",
" \"2022-02-12\": -1.137835,\n",
" \"2022-02-13\": -0.617132,\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "36514ba2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2022-02-07 -0.029608\n",
"2022-02-08 -0.277982\n",
"2022-02-09 2.693057\n",
"2022-02-10 -0.850817\n",
"2022-02-11 0.783868\n",
"2022-02-12 -1.137835\n",
"2022-02-13 -0.617132\n",
"2022-01-31 -0.751442\n",
"2022-02-01 0.816935\n",
"2022-02-02 -0.272546\n",
"2022-02-03 -0.268295\n",
"2022-02-04 -0.296728\n",
"2022-02-05 0.176255\n",
"2022-02-06 -0.322612\n",
"dtype: float64"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from collections import ChainMap\n",
"\n",
"\n",
"pd.Series(ChainMap(dict1, dict2))"
]
},
{
"cell_type": "markdown",
"id": "faf5f65b",
"metadata": {},
"source": [
"## DataFrame\n",
"\n",
"A list of Python lists can be loaded into a pandas DataFrame, with each inner list becoming one row:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "db421b4d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" 0 1 2 3 4 5 6\n",
"0 -0.751442 0.816935 -0.272546 -0.268295 -0.296728 0.176255 -0.322612\n",
"1 -0.029608 -0.277982 2.693057 -0.850817 0.783868 -1.137835 -0.617132"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame([list1, list2])\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "b935eaab",
"metadata": {},
"source": [
"You can also pass a list as the DataFrame index:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "641b64a4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" 0 1 2 3 4 5 \\\n",
"2022-01-31 -0.751442 0.816935 -0.272546 -0.268295 -0.296728 0.176255 \n",
"2022-02-01 -0.029608 -0.277982 2.693057 -0.850817 0.783868 -1.137835 \n",
"\n",
" 6 \n",
"2022-01-31 -0.322612 \n",
"2022-02-01 -0.617132 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame([list1, list2], index=[\"2022-01-31\", \"2022-02-01\"])"
]
},
{
"cell_type": "markdown",
"id": "3279218a",
"metadata": {},
"source": [
"A pandas DataFrame can be created from a dict whose values are lists; each key becomes a column:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "45527e34",
"metadata": {},
"outputs": [],
"source": [
"data = {\n",
" \"Code\": [\"U+0000\", \"U+0001\", \"U+0002\", \"U+0003\", \"U+0004\", \"U+0005\"],\n",
" \"Decimal\": [0, 1, 2, 3, 4, 5],\n",
" \"Octal\": [\"000\", \"001\", \"002\", \"003\", \"004\", \"005\"],\n",
" \"Key\": [\"NUL\", \"Ctrl-A\", \"Ctrl-B\", \"Ctrl-C\", \"Ctrl-D\", \"Ctrl-E\"],\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "76476364",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Code Decimal Octal Key\n",
"0 U+0000 0 000 NUL\n",
"1 U+0001 1 001 Ctrl-A\n",
"2 U+0002 2 002 Ctrl-B\n",
"3 U+0003 3 003 Ctrl-C\n",
"4 U+0004 4 004 Ctrl-D\n",
"5 U+0005 5 005 Ctrl-E"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame(data)"
]
},
{
"cell_type": "markdown",
"id": "da3c83b2",
"metadata": {},
"source": [
"Another common form of data is a nested dict of dicts. The outer keys become the columns and the inner keys become the row index:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "56028e86",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" U+0006 U+0007\n",
"Decimal 6 7\n",
"Octal 006 007\n",
"Key Ctrl-F Ctrl-G"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data2 = {\n",
" \"U+0006\": {\"Decimal\": \"6\", \"Octal\": \"006\", \"Key\": \"Ctrl-F\"},\n",
" \"U+0007\": {\"Decimal\": \"7\", \"Octal\": \"007\", \"Key\": \"Ctrl-G\"},\n",
"}\n",
"\n",
"df2 = pd.DataFrame(data2)\n",
"\n",
"df2"
]
},
{
"cell_type": "markdown",
"id": "5ab4b3a9",
"metadata": {},
"source": [
"Dicts of Series are treated in a similar way:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "532b4f28",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" U+0006 U+0007\n",
"Key Ctrl-F Ctrl-G"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data3 = {\"U+0006\": df2[\"U+0006\"].iloc[2:], \"U+0007\": df2[\"U+0007\"].iloc[2:]}\n",
"\n",
"pd.DataFrame(data3)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.13 Kernel",
"language": "python",
"name": "python313"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.0"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}