Bond Economics: Loading OECD Data With Python

Wednesday, May 2, 2018

Loading OECD Data With Python

One of the projects I have been juggling is building a package to import data from the OECD. The OECD helpfully provides an API which allows for custom queries into their large data sets: https://data.oecd.org/api/ (for free!). Unless you really like XML or JSON (which I do not), you want to find a wrapper for the query and download protocols. As a Python developer, the solution that worked best for me was the pandaSDMX Python library: https://pandasdmx.readthedocs.io/en/latest/

The package is highly object oriented, which means that I probably should have read the documentation. However, I was able to grab the data I wanted with a bit of experimentation.

This was my initial stab of grabbing the LEI data and then dumping it into a CSV (based on code from a question on Stack Exchange). (The 'df' variable is a Pandas dataframe.)

oecd = pandasdmx.Request('OECD')
data_response = oecd.data(resource_id='MEI_CLI', key='all?startTime=2018')
df = data_response.write(data_response.data.series, parse_time=False)
df.to_csv('c:\\Temp\\test_lei.txt', sep='\t')

The OECD data provides a couple of challenges to work with.

The first issue is that the OECD does not provide some of the meta data navigation tools provided by the other public databases that pandaSDXM connects to. The data navigation calls done in the pandaSDMX documentation will not work on the OECD data (which is another reason I largely skipped over the documentation).

The next challenge with the OECD data is finding out where the data you want live. The leading indicators are found with the "Main Economic Indicators" (MEI) database, but that database is big, and the communication protocol used (SDMX-ML) is bloated. In order to keep the data transmission reasonable, you need to use a smaller subset of the database, which has its own database identifier (such as 'MEI_CLI'). Of course, trying to find that database identifier takes a bit of work on the web page: you need to find the database in question, and then extract the database code.

For example, the Composite Leading Indicators page is: https://www.oecd-ilibrary.org/economics/data/main-economic-indicators/composite-leading-indicators_data-00042-en

If you then go to "Data", you get sent to: http://stats.oecd.org/viewhtml.aspx?datasetcode=MEI_CLI&lang=en, where you can read off the 'MEI_CLI' code (or get the query for data by choosing the appropriate API export option). This is somewhat painful to do if you want data scattered across a variety of data sets.

I have not tested it yet, but the OECD API documentation shows an extremely useful option: you can query for data that has been changed since a particular date. For financial data that is not revised, one can just query for data after the last data points you have locally archived. However, this is not sufficient for economic data, as back history can be revised. According to the documentation, those revised data points will also be returned. This greatly reduces the burden for both sides for an automated update of data sets.

The major challenge of such work is not the raw query, rather marshaling the data so that it can be imported into a database. You need to map the series meta data to a whatever the retrieval mechanism used in your database (typically some internal ticker system). The reason why you want to use a single package for interfacing with all the data sources is that it easier to set up the mapping code, which will (hopefully!) work for all the supported data providers.

One reason I put this up was that finding this package took some time. Before I found pandaSDMX, I looked at a few other options, and all of them had defects. Since I did not want to waste my time seeing whether there were fixes for the problems I ran into, it may be that my decision to reject them was unfair. As a result, I will not name any names with regards to the other options.

(c) Brian Romanchuk 2018

20 comments:

UnknownJuly 20, 2018 at 5:27 AM
This comment has been removed by a blog administrator.
ReplyDelete
Replies
KanhasoftSeptember 24, 2018 at 7:53 AM
This comment has been removed by a blog administrator.
ReplyDelete
Replies
AnonymousJanuary 12, 2019 at 1:39 PM
Hey Brian,

just a short comment and a question. First, thanks for this post, you saved my day. Second; I am using pandaSDMX for CLI data of specific countries but I am not sure hot to get the last known data... well I managed to get the data, I only have problems with extracting it into new dataframe.

Any suggestions?

Best regards, David
ReplyDelete
Replies
Uomo in ammolloJanuary 24, 2019 at 8:56 AM
Hi
Do you know if there is any way to make it work with standard pandas pandas_datareader library?
ReplyDelete
Replies
Richard H. BlackFebruary 13, 2019 at 12:39 PM
This comment has been removed by a blog administrator.
ReplyDelete
Replies
Seo ExpertMarch 23, 2019 at 2:07 PM
This comment has been removed by a blog administrator.
ReplyDelete
Replies
Seo ExpertMarch 23, 2019 at 2:07 PM
This comment has been removed by a blog administrator.
ReplyDelete
Replies
marksonAugust 5, 2019 at 8:01 AM
This comment has been removed by a blog administrator.
ReplyDelete
Replies
123456789September 13, 2019 at 4:34 AM
This comment has been removed by a blog administrator.
ReplyDelete
Replies
Jack sonSeptember 14, 2019 at 8:06 AM
This comment has been removed by a blog administrator.
ReplyDelete
Replies
JennetOctober 3, 2019 at 7:02 AM
This comment has been removed by a blog administrator.
ReplyDelete
Replies
marksonDecember 11, 2019 at 4:33 AM
This comment has been removed by a blog administrator.
ReplyDelete
Replies
AnonymousFebruary 18, 2020 at 8:44 AM
This comment has been removed by a blog administrator.
ReplyDelete
Replies
saivenkatJuly 3, 2020 at 5:58 AM
This comment has been removed by a blog administrator.
ReplyDelete
Replies
Elon MuskSeptember 6, 2020 at 9:29 AM
This comment has been removed by a blog administrator.
ReplyDelete
Replies
logan pualMarch 15, 2021 at 3:52 PM
This comment has been removed by a blog administrator.
ReplyDelete
Replies
waqas bMarch 5, 2022 at 2:49 AM
This comment has been removed by a blog administrator.
ReplyDelete
Replies
berryhenryAugust 13, 2022 at 8:22 AM
This comment has been removed by a blog administrator.
ReplyDelete
Replies

Add comment

Note: Posts are manually moderated, with a varying delay. Some disappear.

The comment section here is largely dead. My Substack or Twitter are better places to have a conversation.

Given that this is largely a backup way to reach me, I am going to reject posts that annoy me. Please post lengthy essays elsewhere.

Bond Economics

Pages

Recent Posts

Wednesday, May 2, 2018

Loading OECD Data With Python

20 comments:

Navigation

Pages

Recent Posts

Wednesday, May 2, 2018

Loading OECD Data With Python

20 comments:

Subscribe To

Navigation