{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# httpx-Installation und Beispielanwendung" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installation\n", "\n", "Für die Kommunikation mit solchen REST-APIs ist die [httpx](https://www.python-httpx.org)-Bibliothek hilfreich. Mit [Spack](https://www.python4data.science/de/latest/productive/envs/spack/index.html) könnt ihr httpx in eurem Kernel bereitstellen:\n", "\n", "``` bash\n", "$ spack env activate python-311\n", "$ spack install py-httpx\n", "```\n", "\n", "Alternativ könnt ihr httpx auch mit anderen Paketmanagern installieren, z.B.\n", "\n", "``` bash\n", "$ uv add httpx\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Beispiel OSM Nominatim-API\n", "\n", "In diesem Beispiel holen wir unsere Daten von der [OpenStreetMap Nominatim-API](https://nominatim.org/release-docs/develop/api/Overview/). Diese ist erreichbar über die URL `https://nominatim.openstreetmap.org/search?`. Um z.B. Informationen über das Berlin Congress Center in Berlin im JSON-Format zu erhalten, sollte die URL `https://nominatim.openstreetmap.org/search.php?q=Alexanderplatz+Berlin&format=json` angegeben werden, und wenn ihr euch den entsprechenden Kartenausschnitt anzeigen\n", "lassen wollt, so müsst ihr einfach nur `&format=json` weglassen\n", "\n", "Anschließend definieren wir die Such-URL und die Parameter. Nominatim erwartet mindestens die folgenden beiden Parameter\n", "\n", "| Schlüssel | Werte |\n", "| --------- | ------------------------------------ |\n", "| `q` | Adressabfrage, die folgende Spezifikationen erlaubt: `street`, `city`, `county`, `state`, `country` und `postalcode`. |\n", "| `format` | Format, in dem die Daten zurückgegeben werden. Möglich Werte sind `html`, `xml`, `json`, `jsonv2`, `geojson` und `geocodejson`. |\n", "\n", "Die Abfrage kann dann gestellt werden mit:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import httpx\n", "\n", "\n", "search_url = \"https://nominatim.openstreetmap.org/search?\"\n", "params = {\n", " \"q\": \"Alexanderplatz, Berlin\",\n", " \"format\": \"json\",\n", "}\n", "r = httpx.get(search_url, params=params)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "200" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "r.status_code" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "[{'place_id': 159000335,\n", " 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright',\n", " 'osm_type': 'way',\n", " 'osm_id': 783052052,\n", " 'lat': '52.5219814',\n", " 'lon': '13.413635717448294',\n", " 'class': 'place',\n", " 'type': 'square',\n", " 'place_rank': 25,\n", " 'importance': 0.47149825263735834,\n", " 'addresstype': 'square',\n", " 'name': 'Alexanderplatz',\n", " 'display_name': 'Alexanderplatz, Mitte, Berlin, 10178, Deutschland',\n", " 'boundingbox': ['52.5201457', '52.5238113', '13.4103097', '13.4160801']},\n", " {'place_id': 159254539,\n", " 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright',\n", " 'osm_type': 'node',\n", " 'osm_id': 3908141014,\n", " 'lat': '52.5215661',\n", " 'lon': '13.4112804',\n", " 'class': 'railway',\n", " 'type': 'station',\n", " 'place_rank': 30,\n", " 'importance': 0.43609907778808027,\n", " 'addresstype': 'railway',\n", " 'name': 'Alexanderplatz',\n", " 'display_name': 'Alexanderplatz, Dircksenstraße, Mitte, Berlin, 10179, Deutschland',\n", " 'boundingbox': ['52.5165661', '52.5265661', '13.4062804', '13.4162804']},\n", " {'place_id': 159367604,\n", " 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright',\n", " 'osm_type': 'way',\n", " 'osm_id': 346206374,\n", " 'lat': '52.5216214',\n", " 'lon': '13.4131913',\n", " 'class': 'highway',\n", " 'type': 'pedestrian',\n", " 'place_rank': 26,\n", " 'importance': 0.10000999999999993,\n", " 'addresstype': 'road',\n", " 'name': 'Alexanderplatz',\n", " 'display_name': 'Alexanderplatz, Mitte, Berlin, 10178, Deutschland',\n", " 'boundingbox': ['52.5216214', '52.5216661', '13.4131913', '13.4131914']},\n", " {'place_id': 159038218,\n", " 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright',\n", " 'osm_type': 'way',\n", " 'osm_id': 301241483,\n", " 'lat': '52.5227066',\n", " 'lon': '13.415336',\n", " 'class': 'highway',\n", " 'type': 'primary',\n", " 'place_rank': 26,\n", " 'importance': 0.10000999999999993,\n", " 'addresstype': 'road',\n", " 'name': 'Alexanderstraße',\n", " 'display_name': 'Alexanderstraße, Mitte, Berlin, 10178, Deutschland',\n", " 'boundingbox': ['52.5226699', '52.5227698', '13.4152008', '13.4154146']}]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "r.json()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Es werden drei verschiedene Orte gefunden, der Platz, eine Bushaltestelle und ein Hotel. Um nun weiter filtern zu können, können wir uns nur den bedeutendsten Ort anzeigen lassen:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'place_id': 159000335,\n", " 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright',\n", " 'osm_type': 'way',\n", " 'osm_id': 783052052,\n", " 'lat': '52.5219814',\n", " 'lon': '13.413635717448294',\n", " 'class': 'place',\n", " 'type': 'square',\n", " 'place_rank': 25,\n", " 'importance': 0.47149825263735834,\n", " 'addresstype': 'square',\n", " 'name': 'Alexanderplatz',\n", " 'display_name': 'Alexanderplatz, Mitte, Berlin, 10178, Deutschland',\n", " 'boundingbox': ['52.5201457', '52.5238113', '13.4103097', '13.4160801']}]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "params = {\"q\": \"Alexanderplatz, Berlin\", \"format\": \"json\", \"limit\": \"1\"}\n", "r = httpx.get(search_url, params=params)\n", "\n", "r.json()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clean Code\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nachdem wir nun wissen, dass der Code funktioniert, wollen wir alles in eine saubere und flexible Funktion umwandeln. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Um sicherzustellen, dass die Interaktion erfolgreich war, verwenden wir die Methode `raise_for_status` von `httpx`, die eine Exception auslöst, wenn der HTTP-Statuscode nicht `200 OK` ist:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "r.raise_for_status()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Da wir die Lastgrenzen der Nominatim-API nicht überschreiten möchten, werden wir unsere Anforderungen mit der Funktion `time.sleep` verzögern:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'place_id': 159000335,\n", " 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright',\n", " 'osm_type': 'way',\n", " 'osm_id': 783052052,\n", " 'lat': '52.5219814',\n", " 'lon': '13.413635717448294',\n", " 'class': 'place',\n", " 'type': 'square',\n", " 'place_rank': 25,\n", " 'importance': 0.47149825263735834,\n", " 'addresstype': 'square',\n", " 'name': 'Alexanderplatz',\n", " 'display_name': 'Alexanderplatz, Mitte, Berlin, 10178, Deutschland',\n", " 'boundingbox': ['52.5201457', '52.5238113', '13.4103097', '13.4160801']}]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from time import sleep\n", "\n", "\n", "sleep(1)\n", "r.json()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Als nächstes deklarieren wir die Funktion selbst. Als Argumente benötigen wir die Adresse, das Format, das Limit der zurückzugebenden Objekte mit dem Standardwert `1` und weitere `kwargs` (**k**ey**w**ord **arg**ument**s**), die als Parameter übergeben werden:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def nominatim_search(address, fmt=\"json\", limit=1, **kwargs):\n", " \"\"\"Thin wrapper around the Nominatim search API.\n", " For the list of parameters see\n", " https://nominatim.org/release-docs/develop/api/Search/#parameters\n", " \"\"\"\n", " search_url = \"https://nominatim.openstreetmap.org/search?\"\n", " params = {\"q\": address, \"format\": fmt, \"limit\": limit, **kwargs}\n", " r = httpx.get(search_url, params=params)\n", " # Raise an exception if the status is unsuccessful\n", " r.raise_for_status()\n", "\n", " sleep(1)\n", " return r.json()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nun können wir die Funktion ausprobieren, z.B. mit" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'place_id': 159000335,\n", " 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright',\n", " 'osm_type': 'way',\n", " 'osm_id': 783052052,\n", " 'lat': '52.5219814',\n", " 'lon': '13.413635717448294',\n", " 'class': 'place',\n", " 'type': 'square',\n", " 'place_rank': 25,\n", " 'importance': 0.47149825263735834,\n", " 'addresstype': 'square',\n", " 'name': 'Alexanderplatz',\n", " 'display_name': 'Alexanderplatz, Mitte, Berlin, 10178, Deutschland',\n", " 'boundingbox': ['52.5201457', '52.5238113', '13.4103097', '13.4160801']}]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nominatim_search(\"Alexanderplatz, Berlin\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Caching\n", "\n", "Falls innerhalb einer Session immer wieder dieselben Abfragen gestellt werden sollen,ist es sinnvoll, diese Daten nur einmal abzurufen und wiederzuverwenden. In Python können wir `lru_cache` aus der `functools`-Standardbibliothek von Python verwenden. `lru_cache` speichert die N letzten Anfragen (**L**east **R**ecent **U**sed) und sobald das Limit überschritten wird, werden die ältesten Werte verworfen. Um dies für die Methode `nominatim_search` zu verwenden, müsst ihr lediglich einen Import und einen *Decorator* defnieren:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "from functools import lru_cache\n", "\n", "\n", "@lru_cache(maxsize=1000)\n", "def nominatim_search(address, fmt=\"json\", limit=1, **kwargs):\n", " \"\"\"…\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`lru_cache` speichert die Ergebnisse jedoch nur während einer Session. Wenn ein Skript wegen einem Timeout oder einer Exception beendet wird, sind die Ergebnisse verloren. Sollen die Daten dauerhafter gespeichert werden, können Tools wie [joblib](https://joblib.readthedocs.io/en/stable/) oder [python-diskcache](https://grantjenks.com/docs/diskcache/) verwendet werden." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.13 Kernel", "language": "python", "name": "python313" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.0" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }