Commit 2ad210b3 authored by Cecconi Baptiste

Add new file

parent 9fa2aacc
Vocabulary: http://www.ivoa.net/rdf/UCDlist/
Author: Markus Demleitner
Date: 2021-06-24
Number of terms discussed together: 1
New Term: stat.histogram
Action: Addition
Label: A list of counts or ratios in bins
Prefix: P
Description:
Rationale:
Use Case: A column contains not a single value but a
distribution-like entity. For an example, run a query like
```
select
gavo_histogram(phot_g_mean_mag, 5, 15, 10) as dist,
round(100/parallax) as bin
from tgas.main
where parallax>5
group by bin
```
on the TAP service http://dc.g-vo.org/tap and inspect the dist column.
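To make concrete what kind of value ends up in the dist column, here is a minimal Python sketch of an equal-width binning with the same arguments as the query (lower bound 5, upper bound 15, 10 bins). The function name `histogram` and the sample magnitudes are hypothetical, and the convention of keeping underflows in the first slot and overflows in the last is an assumption about the service's behaviour, not something stated here.

```python
def histogram(values, lower, upper, nbins):
    """Count values into nbins equal-width bins on [lower, upper).

    Hypothetical helper, not the service's implementation.
    Assumption: index 0 collects underflows (v < lower) and
    index nbins + 1 collects overflows (v >= upper).
    """
    counts = [0] * (nbins + 2)
    width = (upper - lower) / nbins
    for v in values:
        if v < lower:
            counts[0] += 1
        elif v >= upper:
            counts[-1] += 1
        else:
            idx = int((v - lower) / width)
            # Guard against rounding pushing a value into the overflow slot.
            counts[1 + min(idx, nbins - 1)] += 1
    return counts


# Made-up magnitudes, for illustration only.
mags = [4.2, 5.0, 7.3, 7.9, 11.5, 14.99, 15.0, 16.1]
dist = histogram(mags, 5, 15, 10)
print(dist)
```

Each row of the query result would carry one such array, which is why a single-value UCD does not describe the column well.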
Proposed Solution:
A primary-only atom stat.histogram would work for this. In the example in the
use case, the service could assign a UCD stat.histogram;pos.parallax to the
dist column. A footnote like "This also includes simple nonparametric
distributions" or similar would be appreciated.
Discussion:
The column dist from the example certainly cannot be annotated as pos.parallax,
and while there is stat.likelihood, there is no indication that this is
intended to cover collections of likelihoods. I argue the UCD system should
allow a plausible annotation of the dist column. If so, it appears we need some
new term.
This new term should, I think, cover simple binned aggregations as well as
nonparametric distributions (in the sense of \sum_{x\in\Omega} P(x)=1) in order
to have sufficient generality. Hence, I'd avoid stat.distribution, which would
imply normalisation.
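The difference between the two cases is only a normalisation step, as this small Python sketch illustrates. The function name `normalise` and the counts are hypothetical; the point is that both `counts` and `probs` would be arrays the proposed term is meant to cover, while only `probs` sums to 1.

```python
def normalise(counts):
    """Turn raw bin counts into a discrete distribution that sums to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot normalise an empty histogram")
    return [c / total for c in counts]


# Made-up bin counts, for illustration only.
counts = [1, 0, 2, 0, 0, 0, 1, 0, 0, 1]
probs = normalise(counts)
print(sum(counts), sum(probs))
```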
Also, stat.distribution (or something like that) might, some day, be useful to
annotate a different thing that this would not cover: columns containing
distributions in the sense of some representation of "a Gaussian with mu=3
and sigma=0.3" or "a Poisson distribution with lambda=0.2".
Answer/discussion:
The discussion started on the semantics mailing list. See the starting point
for this discussion here.
The proposed term describes an array-valued element rather than the single
value that a UCD usually tags.
Note that the semantic tag does not describe any property of the array other
than the fact that, statistically, it is a histogram.
Decision:
Examine more use cases in which this UCD would be combined with other terms,
in order to check the P or Q status before approving the addition.
Further discussion:
Being able to tell that a column contains a histogram could be useful.
However, the UCD alone is not sufficient to describe that column.
We suggest that the histogram column be connected to a <GROUP ... /> through
a ref attribute. The GROUP should define the bin size, the number of bins and
the lower and upper bounds of the histogram, with units. Example:
```
[...]
<GROUP ID='histogram'>
<PARAM name='lower_bound' datatype='float' value='5' unit='' ucd='phot.mag;em.opt;stat.mean'/>
<PARAM name='upper_bound' datatype='float' value='15' unit='' ucd='phot.mag;em.opt;stat.mean'/>
<PARAM name='bin_count' datatype='int' value='10' unit='' ucd='meta.number'/>
<PARAM name='bin_size' datatype='float' value='1' unit='' ucd=''/>
</GROUP>
[...]
<FIELD ID='dist' name='dist' datatype='int' ucd='stat.histogram;phot.mag;em.opt;stat.mean' ref='histogram' arraysize='*'/>
[...]
```
The result of the proposed query can't be used out of context, since the
histogram properties are not described. This is beyond the scope of Semantics/UCD,
but it shows that we have to be cautious and may need to propose a Note on how to
deal with histograms in VOTable columns.
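As a rough illustration of why the bounds and bin count matter, the following Python sketch shows how a client could turn the lower_bound, upper_bound and bin_count parameters from the GROUP into bin edges and attach them to the counts. The function names `bin_edges` and `label_bins` and the sample counts are hypothetical, and the sketch assumes the array holds exactly bin_count entries with no extra underflow or overflow slots.

```python
def bin_edges(lower_bound, upper_bound, bin_count):
    """Return the bin_count + 1 edges of equal-width bins."""
    bin_size = (upper_bound - lower_bound) / bin_count
    return [lower_bound + i * bin_size for i in range(bin_count + 1)]


def label_bins(counts, lower_bound, upper_bound, bin_count):
    """Pair each count with its (low, high) interval.

    Assumption: counts has exactly bin_count entries.
    """
    if len(counts) != bin_count:
        raise ValueError("counts does not match the declared bin_count")
    edges = bin_edges(lower_bound, upper_bound, bin_count)
    return [((edges[i], edges[i + 1]), counts[i]) for i in range(bin_count)]


# Values taken from the GROUP example; the counts are made up.
edges = bin_edges(5, 15, 10)
labelled = label_bins([1, 0, 2, 0, 0, 0, 1, 0, 0, 1], 5, 15, 10)
print(labelled[2])
```

Without those three numbers, the same array of counts could describe any magnitude range, which is the out-of-context problem noted above.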
Further discussion: