(Editorial note: I was hit by a cold last week, and have to catch up on various things. I expect that I will have a publishing pause until next week. I was working on my PCA tutorial, but I want to take time on that.)
I have been looking at downloading data from official sources using Python, based on the SDMX data protocol. (I was using the pandasdmx package.) Most of the official sources now have an SDMX interface (although not all are configured in the Python package). The problem is that each provider has its own classification scheme, and a series can be defined by up to 10 metadata fields. Mapping that to an existing database time series scheme can be a challenge. Furthermore, each provider implements a slightly different query scheme.
(Another issue is that the downloads can be relatively slow. The web programmers who developed the SDMX protocol managed to find the most inefficient data scheme possible. It may be that the backend implementation is more efficient than meets the eye.)
The beauty of DB.nomics is that there is only a single front end to deal with. Furthermore, the web tools allow researchers without computing skills to browse the DB.nomics website to find the exact query needed to access data.
DB.nomics also has support for research replication, so that it is easy to find the exact data set used by other researchers. This may be increasingly important for anyone who deals with academic economic research.
As a disclaimer, I have only started looking at DB.nomics. The first point of concern I can see is the issue of capacity: how much data can a single user download? For a small research shop, I doubt that this would be a concern, but obviously you would want to save the data locally so that you only have to download each update once. (A typical workflow for economists is to download a time series for some form of publication, either internal or external. These data are put into a chart, and the chart may be re-run hundreds of times before final publication. You do not want to query for the time series in every single chart run.)
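As a rough illustration of the "save the data locally" point, here is a minimal sketch of a file-based cache in Python. Everything named here is my own assumption for the example: `load_series`, the `fetch` callback (which stands in for whatever code actually queries the provider), the JSON file layout, and the one-day default refresh interval. It is not how DB.nomics or its client library works internally.

```python
import json
import time
from pathlib import Path
from typing import Callable, Dict

# Hypothetical representation: a series maps a period label to a value.
Series = Dict[str, float]


def load_series(
    series_id: str,
    fetch: Callable[[str], Series],
    cache_dir: Path,
    max_age_seconds: float = 24 * 3600,
) -> Series:
    """Return a series from the local cache, calling fetch() only when needed."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Replace slashes so that the series ID can be used as a file name.
    cache_file = cache_dir / (series_id.replace("/", "__") + ".json")

    if cache_file.exists():
        age = time.time() - cache_file.stat().st_mtime
        if age < max_age_seconds:
            # Cache is fresh enough: no remote query is made.
            return json.loads(cache_file.read_text())

    data = fetch(series_id)  # the only place a remote query happens
    cache_file.write_text(json.dumps(data))
    return data
```

With something like this in place, a chart script can call `load_series` on every run, and only the first run after the cache expires triggers a download.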
The next issue is the speed of updates. Based on an extremely limited survey, the data appeared to be reasonably up-to-date, although it could easily be a day or two behind official release. As long as you are aware of the data you are using, this is not too big a problem. In the worst case, you just need to patch in the latest value of the series before publication. The real problem is if series stop updating for months at random intervals, which forces users to continuously cross-check data every single time. (I saw this problem on other data providers that I prefer not to name.)
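The cross-checking for stalled series can be partly automated. The sketch below is only an assumption about how one might do it: `find_stale_series` is a hypothetical helper, and it expects that you have already extracted the date of the latest observation for each series from your own database. The acceptable lag is something each user has to pick based on the release frequency of the data.

```python
from datetime import date
from typing import Dict, List


def find_stale_series(
    last_observations: Dict[str, date],
    today: date,
    max_lag_days: int,
) -> List[str]:
    """Return the IDs whose latest observation is older than max_lag_days."""
    return sorted(
        series_id
        for series_id, last_date in last_observations.items()
        if (today - last_date).days > max_lag_days
    )
```

Running this before publication gives a short list of series to compare by hand against the official release, rather than checking everything every time.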
The FRED database of the St. Louis Fed is probably easier to use, at least for working with individual time series. If you need to download tables of data (e.g., national accounts data sets), DB.nomics may be a much easier database to work with (using an API).
Funnily enough, I have been working on the interface to DB.nomics for a client, and do not actually have an interface for my own database. I hope to rectify that sooner or later, and my charts will have "downloaded via DB.nomics" in the caption.
- Easy-to-use web interface that end users can use to track down data themselves for import.
- A single API for the supported data providers.
- Tools for data set replication.
- Data table support.
- Users will have to judge data download speed and capacity limits, as well as timeliness, based on their own needs.
Finally, I guess we should all give a big merci beaucoup to the French agencies involved.
(c) Brian Romanchuk 2018