I have been trying to find interesting datasets on Thailand for a while now. My search often ended at the National Statistical Office of Thailand, which releases reports and surveys, but the datasets were often embedded as tables or pictures in the documents. Later on, I came across the Government Open Data of Thailand website, which hosts datasets in various formats (xls, json, csv) ready to be used. I immediately began looking through the data bank for anything I could play with.
Started in 2013, the data bank now contains 89 datasets related to Thailand. It’s really neat that this is being put together, and I look forward to seeing more nicely formatted Thailand-related datasets. There were a lot of interesting datasets, but I really wanted to work on the household income, spending, and debt datasets because they were relatively clean and could be made into an interesting visualization.
The original datasets were tables in three separate Excel files containing the income, spending, and debt of each region and province from 1996 to 2013, with some years missing in between. There were definitely some errors within the dataset. For example, Samut Songkram Province had average total debt between 8,000 Baht and 10,000 Baht in the years 2006 to 2013. These numbers were 3 to 4 times smaller than the second-smallest numbers from other provinces. I decided not to do anything about the errors since I really wanted to present the original dataset as it was.
Since I would be sending the data to the client side, I wanted to convert it to JSON. I didn’t know if there was a good way to convert CSV to JSON, so I chose to design the structure of the JSON file myself, in a way that would be friendly to use with my code. This was a bit difficult since I only had a rough idea of what the end product would look like back then. I wrote a script in Python to do the work.
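To give a feel for what such a conversion script can look like, here is a minimal sketch. It is not the actual script: the column names (`province`, `year`, `income`, `spending`, `debt`), the nested province-then-year layout, and the sample rows are all made up for illustration.

```python
import csv
import io
import json
from collections import defaultdict


def rows_to_nested(rows):
    """Group flat rows into {province: {year: {measure: value}}}.

    The column names used here are hypothetical, not the real file headers.
    """
    nested = defaultdict(dict)
    for row in rows:
        nested[row["province"]][row["year"]] = {
            "income": float(row["income"]),
            "spending": float(row["spending"]),
            "debt": float(row["debt"]),
        }
    return dict(nested)


# Made-up sample rows standing in for the exported CSV.
sample_csv = """province,year,income,spending,debt
A,2011,20000,15000,120000
A,2013,22000,16000,130000
B,2013,30000,25000,200000
"""

data = rows_to_nested(csv.DictReader(io.StringIO(sample_csv)))

# Serialize the nested structure; in practice this would be written to data.json.
print(json.dumps(data, indent=2))
```

Keying by province first and year second makes a lookup such as `data["A"]["2013"]["income"]` a one-liner on the client side, which is the kind of code-friendly layout I was after.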
#Calculating the Ratios
The ratios were actually the last part to be added. Before starting the project, I wanted to include some sort of ratio based on the given data. In the first iteration, I decided not to because I was afraid that my own calculation of these ratios could mislead the audience. After receiving some feedback, I realized that the ratios could make the presentation much more interesting. The absolute numbers were already interesting on their own, but certain economic behaviors can be inferred from these ratios, which is even more fun to think about. Because they are potentially misleading, the ratios should be thoroughly explained.
Income-Spending (I-S) Ratio
The ratio is a measure of how much income is saved per month. It is similar to the savings rate in economics, but it does not count debt and other liabilities as part of monthly expenditure, so the real savings rates are probably much lower than the I-S Ratio suggests. The ratio is calculated by dividing monthly income by monthly spending.
r = (MonthlyIncome) / (MonthlySpending)
Debt-Income (D-I) Ratio
This measures how much debt is owed relative to income. It is similar to the Debt-to-Income Ratio (DTI), where monthly liabilities are divided by income. Note that in this dataset, only total debt is given. I assumed that the total debt is to be paid off within one year, so I divided the total debt by 12 to come up with something similar. The final ratio is calculated by dividing the monthly debt by monthly income.
r = (TotalDebt/12) / (MonthlyIncome)
Spending-Debt (S-D) Ratio
This measures how much is spent relative to the debt owed. I am not sure if a similar ratio is defined in economics. Following the same idea as the Debt-Income Ratio, I divided monthly spending by monthly debt.
r = (MonthlySpending) / (TotalDebt/12)
To add the ratios, I simply modified the script I had used to create the first version of data.json from the CSV file in the previous step.
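The three formulas above can be sketched in a few lines of Python. This is an illustration rather than the actual script: the function name `add_ratios`, the dictionary keys, and the sample figures are assumptions, and returning `None` when there is no debt is my own choice for the sketch.

```python
def add_ratios(record):
    """Return a copy of record with the three ratios added.

    record is a hypothetical dict holding monthly "income",
    monthly "spending", and total "debt".
    """
    # Assumption from the text: total debt is paid off within one year.
    monthly_debt = record["debt"] / 12

    out = dict(record)
    out["is_ratio"] = record["income"] / record["spending"]
    out["di_ratio"] = monthly_debt / record["income"]
    # The S-D ratio is undefined when there is no debt, so store None.
    out["sd_ratio"] = record["spending"] / monthly_debt if monthly_debt else None
    return out


# Made-up figures, not taken from the dataset.
example = add_ratios({"income": 24000.0, "spending": 16000.0, "debt": 120000.0})
print(example["is_ratio"], example["di_ratio"], example["sd_ratio"])
```

With these made-up numbers the monthly debt is 10,000 Baht, so the I-S Ratio is 1.5 and the S-D Ratio is 1.6.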
#Mapping the provinces
Initially, I looked for a GeoJSON file of Thailand’s provinces, but I came up empty. I only found ones with the outline of Thailand as a country, not down to the province level. After a lot of digging, I found the province layout in shapefile format. I converted the shapefile into GeoJSON, then into TopoJSON, leaving out unused attributes. I made all the files available here. The file size went from around 10MB to 25.1MB and then to 4.6MB. At 4.6MB the file was still very large because the original .shp file was very detailed, so I used mapshaper to simplify the outlines of the provinces. The final TopoJSON file was only 94 KB, but still detailed enough for my use. Once I had the TopoJSON file, it was easy to draw the provinces with d3.
#Coloring the map
Similar to my last project on Uber’s surge pricing, I made a heat map of the provinces based on the selected year and data. When the selected year or data type changes, the map is recolored based on the corresponding dataset. Initially, I applied the quantize method to color the map, but because there is a lot of disparity in some of the data, it just didn’t look very good. To fix it, I read up a little on map coloring and applied what I learned. In the end, I went with the simple and straightforward quantile method. When a dataset is selected, its values are sorted and divided into five quantiles, each holding roughly the same number of provinces. Each quantile is assigned a color, and each province is colored according to the quantile its value falls into. Adding the color legend was slightly tricky, but it turned out great.
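The map itself is drawn with d3, but the idea behind quantile coloring is easy to sketch in Python. The snippet below is only an illustration: the function names, the five-color palette, and the sample values are made up, and the interpolation used for the cut points is one common convention that may differ slightly from what d3 does internally.

```python
from bisect import bisect_right

# Hypothetical light-to-dark palette with one color per class.
PALETTE = ["#eff3ff", "#bdd7e7", "#6baed6", "#3182bd", "#08519c"]


def quantile_thresholds(values, n_classes=5):
    """Return the n_classes - 1 cut points that split values into quantiles."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    thresholds = []
    for k in range(1, n_classes):
        # Fractional position of the k-th quantile, linearly interpolated.
        pos = k * (len(ordered) - 1) / n_classes
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        thresholds.append(ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo))
    return thresholds


def classify(value, thresholds):
    """Return the index of the class that value falls into."""
    return bisect_right(thresholds, value)


# Made-up values with one extreme outlier, like the disparity in the real data.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
cuts = quantile_thresholds(values)
classes = [classify(v, cuts) for v in values]
colors = [PALETTE[c] for c in classes]
print(classes)
```

Even with the outlier at 1000, every class still ends up with two values. A quantize scale would instead split the range from 1 to 1000 into five equal-width bins and put nine of the ten values into the lightest one, which is why the map looked flat before the switch.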