Weather forecasting is a tricky business.
It is a lot better than it used to be, but most of the time, for most places, you can't put much faith in forecasts much beyond 3-5 days - a big improvement on how things once were, but one could still hope for better.
So... I was chatting to a met grad student, as one does, and he told me that current forecasts are running into fine grain limits of initial condition data: the weather stations are too sparse, and not always well located, to provide the initial data needed for medium term (5-10 day) forecasts. Station locations tend to be optimised for transport and habitation, not for forecasting.
Computing power is also a limitation, but one that is increasingly less of an issue (and also solvable by crowdsourcing of course).
So... how about crowdsourcing meteorological sensors?
A useful level of graininess might be trucks, or possibly cars.
Commercial trucks and increasingly also cars come with cell equipped GPS boxes.
It would be trivial to add temperature and relative humidity sensors, and get millions of mobile weather sensors sending back real time data.
Data packets are small, and a useful reporting rate could be as low as a few times per day.
Trucks could also have wind speed and direction sensors on the roof, which ought to be high enough. Modeling out groundspeed should be trivial.
Cloud sensors are also easy I gather.
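To make it concrete, here is a rough sketch of the sort of packet I have in mind - the field names, units and identifier scheme are just my placeholders, not any existing standard (a real system would presumably use something like BUFR):

```python
import json, time
from dataclasses import dataclass, asdict

# A minimal sketch of a mobile observation packet. Field names, units and
# the vehicle_id scheme are placeholders, not an existing standard.

@dataclass
class MobileObservation:
    vehicle_id: str      # anonymised identifier
    unix_time: float
    lat: float
    lon: float
    speed_kmh: float     # from GPS; handy later for quality control (parked vs moving)
    temp_c: float
    rel_humidity: float  # percent

def encode(ob: MobileObservation) -> bytes:
    """Even as plain JSON the packet is well under a kilobyte."""
    return json.dumps(asdict(ob)).encode()

ob = MobileObservation("truck-0042", time.time(), 40.79, -77.86, 88.0, 4.5, 71.0)
print(len(encode(ob)), "bytes")
```

A packet like that is a couple of hundred bytes, so even a few reports an hour from millions of vehicles is trivial traffic by cell network standards.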
This won't provide ideal coverage - the dead spots would still need conventional robotic weather stations - and calibration of the temperature sensors could be tricky, since there is significant systematic bias there. The agglomeration algorithm would also have to be clever enough to understand garages.
But this would seem to be a relatively cheap and easy way to add a large number of consistent sensors, and it is scalable: you can start small, work on the calibration and data analysis issues, and then just pile on more sensors as fast as you can - which can be faster than you can analyze the data, since the historical calibration data is also valuable, not just the real time data.
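For the garage problem, the first-pass rule could be as dumb as "ignore readings from vehicles that have been parked for a while". A crude sketch, where the thresholds and record format are just my guesses:

```python
# Placeholder thresholds; records are assumed to carry GPS speed and a timestamp.
STATIONARY_KMH = 2.0
MIN_PARKED_MINUTES = 10.0

def usable_readings(track):
    """track: time-ordered list of dicts with 'unix_time', 'speed_kmh', 'temp_c'.
    Drop readings from a vehicle that has been stationary for a while, since
    it is then probably measuring a garage or a hot parking lot, not the air."""
    kept, stopped_since = [], None
    for rec in track:
        if rec["speed_kmh"] > STATIONARY_KMH:
            stopped_since = None
            kept.append(rec)
        else:
            if stopped_since is None:
                stopped_since = rec["unix_time"]
            if (rec["unix_time"] - stopped_since) / 60.0 < MIN_PARKED_MINUTES:
                kept.append(rec)  # a brief stop (traffic light) is still fine
    return kept
```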
Further, this need not rely on volunteers, or trip paranoia about surveillance - just doing it on the fleet of federal and state vehicles would provide interesting data, and non-governmental vehicles could join in on a voluntary basis.
The more I think about it, the more I like it.
A key issue is to not try to do a complete system in one swoop.
A full federal standard weather station is not going to be put in a truck-mounted GPS box overnight, although some of the mobile military sensors for UAVs are getting close to being off-the-shelf. It is also not a matter of getting immediate, faultless, comprehensive coverage.
The way to do this is to start with minimal data - temperature and humidity, for example - figuring on expanding and upgrading to other observables when there are the occasion and resources to do so.
Also, any additional data is valuable, the input is scalable, and you don't have to start off trusting the data. Rather, do modeling with known calibrated sensors, then add in the new mobile data and see if forecasting improves - calibrate out systematics, and tag dead and bad sensors. Don't go out and replace them; just drop the data and tag it as bad - ok, archive it in case future calibration can recover good data.
Redundancy is good: you want multiple sensors colocated at different times so they can cross-calibrate and verify each other.
In the long run, you get broadly distributed trusted data with a historical record, and each iteration improves both the forecasting and the historical data set.
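The cross-calibration step could start out very simple: whenever a mobile sensor passes close to a trusted calibrated station, log the pair of readings, and periodically estimate a per-sensor bias and scatter from those pairs. A toy sketch, with made-up thresholds and data layout:

```python
import numpy as np

def calibrate(pairs, max_abs_bias=3.0, max_scatter=2.0):
    """pairs: dict of sensor_id -> list of (mobile_temp, nearby_station_temp)
    readings taken close together in space and time. Thresholds in deg C are
    arbitrary placeholders."""
    report = {}
    for sensor_id, obs in pairs.items():
        diff = np.array([m - s for m, s in obs])
        bias, scatter = diff.mean(), diff.std()
        ok = abs(bias) <= max_abs_bias and scatter <= max_scatter
        report[sensor_id] = {"bias": round(float(bias), 2),
                             "scatter": round(float(scatter), 2),
                             "status": "ok" if ok else "bad"}
    return report

print(calibrate({
    "truck-0042": [(5.1, 4.4), (9.8, 9.0), (0.3, -0.4)],  # small consistent bias -> correctable
    "car-0007":   [(12.0, 4.5), (1.0, 9.0)],              # hopelessly noisy -> tag as bad
}))
```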
Don't many cars already have temperature sensors? I don't know how accurate those are - or how much more accuracy you'd need to use that information for forecasting vs. just wanting to know whether you can leave your jacket in the car - but if I were to get a car with weather tools onboard, I'd definitely volunteer my information to the NWS.
The main problem is the quality of the data; the second is its location: you want good 3-D data, not just surface effects.
As it is, when a national met office (I work with the Irish met. service; we run their operational forecasts on our computers) gets the obs. data, the first task is quality control: removing obs. that claim to be ships in the Sahara, etc... (human input error). When you get automatic data, calibration, etc. can poison the input data too.
Better to have few but accurate data points than bad input data.
The forecast is based on a 3D grid (or a spectral model, but that's equivalent). How do you get initial conditions for all that 3D data? It's done by taking the observations and previous forecasts and applying a "variational assimilation" (3D-Var, or 4D-Var when time is included) - essentially running a forecast for yesterday and correcting it with observations to minimise error. This has two implications:
(1) it's computationally intensive (and not in a massively parallel way that can be outsourced to the internet; it's heavily comms. intensive), and (2) the initial conditions can be better without bad data than with it.
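To give a flavour of what the assimilation step does (this is just a toy linear example, not our operational system): the analysis corrects the background forecast toward the observations, weighted by the assumed background and observation error covariances.

```python
import numpy as np

# Toy linear analysis step on a tiny 1-D "grid". Operational 3D/4D-Var
# minimises the equivalent cost function iteratively; for this linear toy
# the closed form is  x_a = x_b + B H^T (H B H^T + R)^(-1) (y - H x_b).
# All numbers are made up for illustration.

n = 5
x_b = np.array([280.0, 281.0, 282.0, 283.0, 284.0])    # background temps (K) from yesterday's forecast

dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
B = np.exp(-dist / 2.0)                                 # correlated background errors
H = np.zeros((2, n)); H[0, 1] = H[1, 3] = 1.0           # two obs, at grid points 1 and 3
R = 0.25 * np.eye(2)                                    # observation error covariance
y = np.array([282.0, 282.5])                            # the observations (K)

innovation = y - H @ x_b
x_a = x_b + B @ H.T @ np.linalg.solve(H @ B @ H.T + R, innovation)
print(x_a)   # analysis: pulled toward the obs, neighbours adjusted via B
```

If one of those observations is a garbage reading, the correction gets pulled the wrong way, which is why a bad observation can be worse than none.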
In fact, it's worth thinning out data, removing uninteresting data (such as nearby datapoints that give the same ground temp), unless it's near a feature of interest (e.g. a storm). The question becomes: how do you know your data is reliable?
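Thinning itself can be mechanically simple - e.g. keep at most one surface report per grid box unless the box is flagged as interesting; the reliability question is the hard part. A toy sketch (grid size, field names and the "feature of interest" flag are illustrative):

```python
def thin_observations(obs, grid_km=10.0, keep_boxes=frozenset()):
    """Keep at most one report per grid box, unless the box is in keep_boxes
    (i.e. near a feature of interest such as a storm). obs: list of dicts
    with 'x_km', 'y_km' and whatever payload they carry."""
    kept, seen = [], set()
    for ob in obs:
        box = (int(ob["x_km"] // grid_km), int(ob["y_km"] // grid_km))
        if box in keep_boxes or box not in seen:
            kept.append(ob)
            seen.add(box)
    return kept

reports = [
    {"x_km": 1.0,  "y_km": 2.0, "temp_c": 14.1},
    {"x_km": 3.0,  "y_km": 4.0, "temp_c": 14.0},   # same 10 km box -> dropped
    {"x_km": 55.0, "y_km": 2.0, "temp_c": 9.5},
]
print(thin_observations(reports))
```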
For 5-10 day out forecasts, depending on time of year and jet stream patterns, most of the needed data would have to come from the eastern Pacific and northeastern Asia. I don't think you're going to get many cars and trucks reporting from there anytime soon.
Also, as Alastair points out indirectly in his reference to 3-D data, observations from altitudes significantly above the surface are of greatest value. You can't fly weather balloons from cars and trucks.
I have no idea how much of the weather data that goes into building the medium range models comes from aircraft flying over the Pacific. That could potentially be much more valuable.
Somewhat off topic, but related. I think it would be great if a meteorology blogger got added to the ScienceBlogs borg. You've certainly got a great grad program there at Penn State; know of any candidates?
A meteo blogger for sciblogs would be good.
People I know at PSU are mostly climate rather than weather.
It is also hard to know who makes a decent blogger just knowing them in an academic setting.
There are also some political complications due to PSU's relationship with AccuWeather.
If you have a sparse network and are quality limited then the priority should be to get the existing nodes working as well as possible.
But, if you increase the data input by three or four orders of magnitude and have a redundant network with cross-calibration you can afford to include a lot of bad data.
You will need an algorithm to filter the data and correct for systematics, but it has to be automated and self-correcting.
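As a toy example of what I mean by automated and self-correcting: compare each mobile reading to the median of nearby trusted readings, keep a running bad-streak score per sensor, and let sensors flag and un-flag themselves as the score rises and falls. All thresholds here are placeholders:

```python
import numpy as np

FLAG_AFTER = 5      # bad readings in a row before a sensor is ignored
MAX_RESID_C = 4.0   # max allowed departure from the neighbourhood median, deg C
scores = {}         # sensor_id -> running bad-reading streak

def accept(sensor_id, reading_c, neighbour_readings_c):
    """Return True if this reading should go forward to assimilation.
    The streak decays when the sensor behaves again, so a flagged sensor
    un-flags itself automatically - no one has to go out and fix it."""
    resid = abs(reading_c - float(np.median(neighbour_readings_c)))
    streak = scores.get(sensor_id, 0)
    streak = streak + 1 if resid > MAX_RESID_C else max(streak - 1, 0)
    scores[sensor_id] = streak
    return streak < FLAG_AFTER and resid <= MAX_RESID_C

print(accept("truck-0042", 4.8, [4.5, 5.1, 4.9]))   # True
print(accept("car-0007", 21.0, [4.5, 5.1, 4.9]))    # False (hot parking lot?)
```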
3-D is a problem, since most stations are limited to the surface.
You're also not going to crowdsource the Pacific; a major problem there is not just lack of traffic but also lack of communications.
Two separate questions: whether, with more computing resources, having orders of magnitude more data - clumped and heterogeneous though it would be - would help; and whether micro forecasting can be improved - not whether a low is going to roll in off the ocean in four days, but whether the showers next Tuesday will stay north of Bald Eagle ridge or not.
Ok, some details may help put the problem in perspective.
In terms of grid size, most 'limited area' models, in Europe, use grid sizes of 5-10 km, with 40-90 vertical levels. We are now moving to using 'mesoscale' models, with 2.5 km grids horizontally as CPU power increases. These models get their boundary conditions (weather approaching from the sides) from a 'global model', such as the ECMWF model on a 25km grid. The aim is that the regional models (such as HIRLAM and ALADIN in Europe, which I work with) give finer detail than the global model.
Finer models are used in research, but even moving to 2.5 km is not getting the improvements you'd naively expect. This is because the physics is "subscale", i.e. smaller than the grid scale: showers are typically < 10 km in extent. To cope, the models have "parameterized" versions of convection, etc.
While the forecast quality increases year-on-year, it does so slowly; I don't think you'd get much support from "in the field" meteorologists for 10^3 increases in data size and model. As it is, many organisations have issues assimilating all the data we currently have available. These are typically civil service organisations, not as well funded as you might like.
In terms of practical improvements, for example, my colleagues in Met Eireann are implementing better snow models and measurements from Greenland and Newfoundland; the snow cover there affects the weather in Ireland 48 hours later (think heat flux affecting convection and rain). This is considered more important than more surface measurements in Ireland.
NWP is the field that brought us the 'butterfly theory of chaos', after all. In practice, improvements in regional forecasting come from spotting and fixing weaknesses in the current models, with the research focuses being two-fold: "high impact weather" and "ensemble" forecasting.
"High impact" weather is storms, etc. Getting these right is most important, probably the key aim of the forecasters. "Ensemble" forecasting is a proposed solution to he chaos problem: as we have poor knowledge of initial conditions, we run a set of models based on variations of the assumed i.c.s', all effectively consistent with the observations, to see what range of variations we get in the forecasts. Hence instead of a single forecast, that may be wrong, we get several forecasts, one or two of which may show some HIW, such as a storm or 'clear air turbulence', which pilots would like to avoid. CAT can be very destructive and dangerous and pilots would appreciate knowing about it: even a forecast of 1% risk of catastrophic turbulence in an area would tell them to avoid a planned route.
In short, massive sensor networks would need to be in the right place at the right time to be useful, not necessarily on the surface in high population areas. While useful in research on how storms work, they are less so in regional weather forecasting, where local features can be obliterated by incoming fronts brushing them aside.
Ireland is about 70,000 km2, so with 10 km grids you need 700 stations, uniformly distributed, to get a single data point per grid box, at the surface only.
If you go to 2.5 km grids, you need over 10,000 stations.
Otherwise most of your initial data is spline fits to sparsely sampled data.
The US has an area of about 9,000,000 km2, and I don't think they have 100,000 weather stations.
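The back-of-envelope arithmetic, just to make the scaling explicit (areas rounded):

```python
# One observation per surface grid box, areas rounded.
for name, area_km2 in [("Ireland", 70_000), ("US", 9_000_000)]:
    for grid_km in (10.0, 2.5):
        boxes = area_km2 / grid_km ** 2
        print(f"{name}: {grid_km} km grid -> ~{boxes:,.0f} surface boxes")
# Ireland: ~700 boxes at 10 km, ~11,200 at 2.5 km; the US: ~90,000 at 10 km.
```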
The computational capacity to do the models is increasing exponentially, with a short doubling time, still. This rapidly overtakes any power law increase in data or computing requirements.
Chaos is a two-edged sword - as Poincaré taught us long before Lorenz - it actually can allow better short term predictability (and leverage for control), if you have high precision initial data, OR if you can update the data for quality control. Phase space divergences are funny things; for one, there is a convergence somewhere in phase space for every exponential divergence.
For Ireland, the grids are about 300x400. We run multiple models: a "coarse" grid at 11 km from Newfoundland to Britain, and a "fine" one at 5 km over a smaller area. That's 120,000 points horizontally, with 60 levels.
We never have that many observations: there are 12 synoptic stations, plus a lot more reports from ships and aircraft, etc., and a handful of buoys. More data comes from satellite: with ATOVS, for example, we can retrieve a temperature gradient (for the bits a satellite can see: caveat clouds, I think) from reflectance and a radiative-transfer model.
The size of the model is based on having to watch 48-54 hours of weather ahead; you need the model to stretch across the Atlantic, because features from there will be here inside 48 hours (remember jet streams). But the lateral boundary conditions (weather entering the grid) come from a coarser model at e.g. 25 km and are then interpolated. Before you go to much finer internal grids, you need to run your external grid finer.
There is a proposal to 'patch' the different regional models together and use the results from the neighbouring grids as boundary conditions, but I'm not sure of its status.
Beyond that, you also have to consider other corrections to your model before just inserting higher resolution inputs. In particular, surface effects: the NWP models have "climate" models that describe the topography and the nature of plant growth, etc. over terrain. They're fairly coarse: same sized grid, monthly timescale. You need to track moisture content of soil, and leaf growth in forests, as you drop in scale. This is the cutting edge of a lot of the mesoscale modelling, and needs to be done before more (temp, humidity, wind speed, etc.) measurements become useful.
Ultimately, yes, the data will be useful. But there's a lot of science to do first before we get to that scale.
Well, if fine gridded real time data is needed, or helpful, for forecasting,
then crowdsourcing is a potentially effective way to gather it, specifically if you put some basic robust sensor set on off-the-shelf, included-by-default thingies that also pack GPS and cell chips.
Something like a truck, or car, is a possible starting point, and the nice thing is that it is scalable and potentially useful incrementally.
The bad thing is that it is hard work to make use of the data, the data can become a firehose, and it would take decades to get a large pervasive network.
I suspect what will happen is that this will happen anyway, de facto, and then the data will get gathered.
Lots of cars already include temperature sensors - I know I calibrate mine by eye when I drive past road signs showing the temperature (there is a definite "hot parking lot effect") - and as it happens my car also has a rain sensor.
It would not take much to link these to the GPS data and cell chips calling in, as a lot of cars do for security and road assistance anyway.