Forming a "Rest" group

derhenry · 30 November 2016

Hi all,

here is a query along with its result:

Code:

SELECT Ausprägung, count (*) Häufigkeit
  FROM tabelle
GROUP BY Ausprägung
ORDER BY Häufigkeit DESC

Ausprägung Häufigkeit
100203026 2403862
100203033 1959882
100203055 1952693
100203029 1005549
100203056 650497
100203020 276646
100203035 157322
100203049 154300
100203009 101112
100203014 61487
100203016 55849
100203008 55113
100203013 46300
100203051 21555
100203006 13910
100203039 13811
100203022 13677
100203052 6568
100203015 6507
100203004 6215
100203007 4427
100203042 3287
100203019 2994
100203054 2932
100203017 1652
100203041 1114
100203010 918
100203011 587
100203030 325
100203040 193
100203044 157
100203032 114
100203025 55
100203046 44
100203037 29
100203048 26
100203005 25
100203001 8
100203018 2

Now I want to put every value whose count is less than 1% of the count of the first (largest) value into a single "Rest" bucket. The result should look like this:

100203026 2403862
100203033 1959882
100203055 1952693
100203029 1005549
100203056 650497
100203020 276646
100203035 157322
100203049 154300
100203009 101112
100203014 61487
100203016 55849
100203008 55113
100203013 46300
Rest 101132

How can I do that?
Thanks and regards
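Taken literally, the criterion in the question (a count below 1% of the largest count) compares each row only with the first row, so no running total is needed for it. A minimal Python sketch, assuming the (Ausprägung, Häufigkeit) pairs are already sorted by count in descending order as in the result above; `ROWS` and `lump_below_share_of_max` are hypothetical names introduced only for illustration:

```python
# (Ausprägung, Häufigkeit) pairs copied from the query result in this thread,
# sorted by count descending (assumption: same order as the GROUP BY output).
ROWS = [
    (100203026, 2403862), (100203033, 1959882), (100203055, 1952693),
    (100203029, 1005549), (100203056, 650497), (100203020, 276646),
    (100203035, 157322), (100203049, 154300), (100203009, 101112),
    (100203014, 61487), (100203016, 55849), (100203008, 55113),
    (100203013, 46300), (100203051, 21555), (100203006, 13910),
    (100203039, 13811), (100203022, 13677), (100203052, 6568),
    (100203015, 6507), (100203004, 6215), (100203007, 4427),
    (100203042, 3287), (100203019, 2994), (100203054, 2932),
    (100203017, 1652), (100203041, 1114), (100203010, 918),
    (100203011, 587), (100203030, 325), (100203040, 193),
    (100203044, 157), (100203032, 114), (100203025, 55),
    (100203046, 44), (100203037, 29), (100203048, 26),
    (100203005, 25), (100203001, 8), (100203018, 2),
]


def lump_below_share_of_max(rows, percent=1, label="Rest"):
    """Keep rows whose count is at least `percent`% of the largest count and
    sum all other counts into one (label, total) row.

    Hypothetical helper; assumes `rows` is a non-empty list of (value, count)
    pairs sorted by count descending, so rows[0] holds the largest count.
    """
    largest = rows[0][1]
    # Integer test: count < percent% of largest  <=>  100 * count < percent * largest
    kept = [(v, c) for v, c in rows if 100 * c >= percent * largest]
    rest = sum(c for _, c in rows if 100 * c < percent * largest)
    return kept + [(label, rest)]


for value, count in lump_below_share_of_max(ROWS):
    print(value, count)
```

For this data the cutoff is 1% of 2403862, roughly 24039, so 100203013 (46300) is the last value kept, and everything from 100203051 (21555) downwards lands in Rest, which adds up to 101132 as in the desired output.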

akretschmer · 30 November 2016

Via window functions:

Code:

test=*# select * from derhenry ;
     a     |    h
-----------+---------
 100203026 | 2403862
 100203033 | 1959882
 100203055 | 1952693
 100203029 | 1005549
 100203056 |  650497
 100203020 |  276646
 100203035 |  157322
 100203049 |  154300
 100203009 |  101112
 100203014 |   61487
 100203016 |   55849
 100203008 |   55113
 100203013 |   46300
 100203051 |   21555
 100203006 |   13910
 100203039 |   13811
 100203022 |   13677
 100203052 |    6568
 100203015 |    6507
 100203004 |    6215
 100203007 |    4427
 100203042 |    3287
 100203019 |    2994
 100203054 |    2932
 100203017 |    1652
 100203041 |    1114
 100203010 |     918
 100203011 |     587
 100203030 |     325
 100203040 |     193
 100203044 |     157
 100203032 |     114
 100203025 |      55
 100203046 |      44
 100203037 |      29
 100203048 |      26
 100203005 |      25
 100203001 |       8
 100203018 |       2
(39 rows)

test=*# select *, 100 * sum(h) over (order by h desc rows between unbounded preceding and current row) / (sum(h) over (rows between unbounded preceding and unbounded following))::numeric as anteil from derhenry ;
     a     |    h    |        anteil
-----------+---------+----------------------
 100203026 | 2403862 |  26.7638667946893165
 100203033 | 1959882 |  48.5845955974697119
 100203055 | 1952693 |  70.3252842655056746
 100203029 | 1005549 |  81.5207603334051828
 100203056 |  650497 |  88.7631956555430660
 100203020 |  276646 |  91.8432878959810032
 100203035 |  157322 |  93.5948630911769474
 100203049 |  154300 |  95.3127922594988234
 100203009 |  101112 |  96.4385424478809461
 100203014 |   61487 |  97.1231199642296641
 100203016 |   55849 |  97.7449257070787143
 100203008 |   55113 |  98.3585370502655164
 100203013 |   46300 |  98.8740271377140119
 100203051 |   21555 |  99.1140139376049907
 100203006 |   13910 |  99.2688836377434048
 100203039 |   13811 |  99.4226511020576850
 100203022 |   13677 |  99.5749266512160667
 100203052 |    6568 |  99.6480527612454775
 100203015 |    6507 |  99.7204997158680987
 100203004 |    6215 |  99.7896956314942844
 100203007 |    4427 |  99.8389845001148997
 100203042 |    3287 |  99.8755809562151849
 100203019 |    2994 |  99.9089152396238414
 100203054 |    2932 |  99.9415592339305151
 100203017 |    1652 |  99.9599520983898005
 100203041 |    1114 |  99.9723550348351055
 100203010 |     918 |  99.9825757670225293
 100203011 |     587 |  99.9891112461009799
 100203030 |     325 |  99.9927296970387934
 100203040 |     193 |  99.9948785002110949
 100203044 |     157 |  99.9966264903564386
 100203032 |     114 |  99.9978957316084716
 100203025 |      55 |  99.9985080848441015
 100203046 |      44 |  99.9989979674326055
 100203037 |      29 |  99.9993208445932104
 100203048 |      26 |  99.9996103206682355
 100203005 |      25 |  99.9998886630480673
 100203001 |       8 |  99.9999777326096135
 100203018 |       2 | 100.0000000000000000
(39 rows)

test=*# select * from (select *, 100 * sum(h) over (order by h desc rows between unbounded preceding and current row) / (sum(h) over (rows between unbounded preceding and unbounded following))::numeric as anteil from derhenry ) foo where anteil < 99;
     a     |    h    |       anteil
-----------+---------+---------------------
 100203026 | 2403862 | 26.7638667946893165
 100203033 | 1959882 | 48.5845955974697119
 100203055 | 1952693 | 70.3252842655056746
 100203029 | 1005549 | 81.5207603334051828
 100203056 |  650497 | 88.7631956555430660
 100203020 |  276646 | 91.8432878959810032
 100203035 |  157322 | 93.5948630911769474
 100203049 |  154300 | 95.3127922594988234
 100203009 |  101112 | 96.4385424478809461
 100203014 |   61487 | 97.1231199642296641
 100203016 |   55849 | 97.7449257070787143
 100203008 |   55113 | 98.3585370502655164
 100203013 |   46300 | 98.8740271377140119
(13 rows)

test=*#
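The numerator above is a running total: for each row, the frame covers every row from the first one up to the current row, and dividing by the grand total gives the cumulative share. That running total only makes sense in descending order of h; in SQL that order belongs inside OVER (ORDER BY h DESC ...), because without it the rows enter the frame in an unspecified order. A rough Python sketch of the same computation on the counts from this thread, where `running_share` is a hypothetical helper and Python floats stand in for PostgreSQL's numeric:

```python
from itertools import accumulate

# (Ausprägung, Häufigkeit) pairs copied from the query result in this thread.
ROWS = [
    (100203026, 2403862), (100203033, 1959882), (100203055, 1952693),
    (100203029, 1005549), (100203056, 650497), (100203020, 276646),
    (100203035, 157322), (100203049, 154300), (100203009, 101112),
    (100203014, 61487), (100203016, 55849), (100203008, 55113),
    (100203013, 46300), (100203051, 21555), (100203006, 13910),
    (100203039, 13811), (100203022, 13677), (100203052, 6568),
    (100203015, 6507), (100203004, 6215), (100203007, 4427),
    (100203042, 3287), (100203019, 2994), (100203054, 2932),
    (100203017, 1652), (100203041, 1114), (100203010, 918),
    (100203011, 587), (100203030, 325), (100203040, 193),
    (100203044, 157), (100203032, 114), (100203025, 55),
    (100203046, 44), (100203037, 29), (100203048, 26),
    (100203005, 25), (100203001, 8), (100203018, 2),
]


def running_share(rows):
    """Return (value, count, percent) triples, roughly like
    100 * sum(h) OVER (ORDER BY h DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        / sum(h) OVER ()
    """
    # Explicit sort, playing the role of ORDER BY h DESC inside OVER
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    total = sum(c for _, c in ordered)          # grand total (whole frame)
    running = accumulate(c for _, c in ordered)  # running total up to current row
    return [(v, c, 100 * r / total) for (v, c), r in zip(ordered, running)]


shares = running_share(ROWS)
# Counterpart of the outer "where anteil < 99"
top = [row for row in shares if row[2] < 99]
for value, count, pct in top:
    print(f"{value} | {count:>7} | {pct:.4f}")
```

Note that this cutoff is a cumulative 99% rather than the "below 1% of the largest count" rule from the question. For these counts both rules happen to select the same 13 rows, but on other data they can differ.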

derhenry · 30 November 2016

Ah, thanks! But where is the "Rest" row?

akretschmer · 30 November 2016

Code:

test=*# with foo as (select * from (select *, 100 * sum(h) over (order by h desc rows between unbounded preceding and current row) / (sum(h) over (rows between unbounded preceding and unbounded following))::numeric as anteil from derhenry ) foo ) select * from foo where anteil < 99 union all select null, sum(h), null from foo where anteil >= 99;
     a     |    h    |       anteil
-----------+---------+---------------------
 100203026 | 2403862 | 26.7638667946893165
 100203033 | 1959882 | 48.5845955974697119
 100203055 | 1952693 | 70.3252842655056746
 100203029 | 1005549 | 81.5207603334051828
 100203056 |  650497 | 88.7631956555430660
 100203020 |  276646 | 91.8432878959810032
 100203035 |  157322 | 93.5948630911769474
 100203049 |  154300 | 95.3127922594988234
 100203009 |  101112 | 96.4385424478809461
 100203014 |   61487 | 97.1231199642296641
 100203016 |   55849 | 97.7449257070787143
 100203008 |   55113 | 98.3585370502655164
 100203013 |   46300 | 98.8740271377140119
           |  101132 |
(14 rows)

That's one solution. If the row should literally say "Rest", column a has to be cast to text so that both branches of the UNION ALL have the same type:

Code:

test=*# with foo as (select * from (select *, 100 * sum(h) over (order by h desc rows between unbounded preceding and current row) / (sum(h) over (rows between unbounded preceding and unbounded following))::numeric as anteil from derhenry ) foo ) select a::text, h, anteil from foo where anteil < 99 union all select 'Rest', sum(h), null from foo where anteil >= 99;
     a     |    h    |       anteil
-----------+---------+---------------------
 100203026 | 2403862 | 26.7638667946893165
 100203033 | 1959882 | 48.5845955974697119
 100203055 | 1952693 | 70.3252842655056746
 100203029 | 1005549 | 81.5207603334051828
 100203056 |  650497 | 88.7631956555430660
 100203020 |  276646 | 91.8432878959810032
 100203035 |  157322 | 93.5948630911769474
 100203049 |  154300 | 95.3127922594988234
 100203009 |  101112 | 96.4385424478809461
 100203014 |   61487 | 97.1231199642296641
 100203016 |   55849 | 97.7449257070787143
 100203008 |   55113 | 98.3585370502655164
 100203013 |   46300 | 98.8740271377140119
 Rest      |  101132 |
(14 rows)

Easy, right? ;-)
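The UNION ALL query appends one extra row to the filtered result: the first branch keeps the rows with anteil < 99, the second sums h over all remaining rows. The sketch below performs the same split in Python using integer arithmetic, so no rounding is involved; it assumes the counts from this thread, `split_with_rest` is a hypothetical name, and it places the remainder total in the count position, matching the expected output in the question:

```python
# (Ausprägung, Häufigkeit) pairs copied from the query result in this thread.
ROWS = [
    (100203026, 2403862), (100203033, 1959882), (100203055, 1952693),
    (100203029, 1005549), (100203056, 650497), (100203020, 276646),
    (100203035, 157322), (100203049, 154300), (100203009, 101112),
    (100203014, 61487), (100203016, 55849), (100203008, 55113),
    (100203013, 46300), (100203051, 21555), (100203006, 13910),
    (100203039, 13811), (100203022, 13677), (100203052, 6568),
    (100203015, 6507), (100203004, 6215), (100203007, 4427),
    (100203042, 3287), (100203019, 2994), (100203054, 2932),
    (100203017, 1652), (100203041, 1114), (100203010, 918),
    (100203011, 587), (100203030, 325), (100203040, 193),
    (100203044, 157), (100203032, 114), (100203025, 55),
    (100203046, 44), (100203037, 29), (100203048, 26),
    (100203005, 25), (100203001, 8), (100203018, 2),
]


def split_with_rest(rows, percent=99, label="Rest"):
    """Keep a row while the running total up to and including it is below
    `percent`% of the grand total; sum the counts of all other rows into a
    single (label, total) row. Hypothetical helper for illustration."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    total = sum(c for _, c in ordered)
    kept, rest, running = [], 0, 0
    for value, count in ordered:
        running += count
        # running / total < percent / 100, rewritten without division
        if 100 * running < percent * total:
            kept.append((value, count))   # first UNION ALL branch
        else:
            rest += count                 # summed by the second branch
    return kept + [(label, rest)]


for value, count in split_with_rest(ROWS):
    print(value, count)
```

Because the running total only grows, once a row fails the test every later row fails it too, so the kept rows always form a prefix of the sorted list.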

derhenry · 1 December 2016

Hi! Thanks a lot. Unfortunately, I can't make sense of the syntax as the forum displays it. But the OVER part is interesting.
To reproduce your first post, do I first have to compute the frequencies with count(*) and then build the window in an outer query, or can both be done in one go?
What am I doing wrong?

Code:

SELECT *,
       (sum (h)
           OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING))
  FROM (SELECT auspraegung, count (*) h
          FROM tabelle

        GROUP BY auspraegung) a
ORDER BY h DESC
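On doing it in one go: in PostgreSQL, window functions are evaluated after GROUP BY, so they operate on the grouped rows and may take an aggregate as their argument, for example sum(count(*)) OVER (ORDER BY count(*) DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). The outer query is therefore optional. The Python sketch below only mirrors that evaluation order, aggregating first and then running the window over the aggregated rows; `counts_with_running_share` and the sample values are hypothetical:

```python
from collections import Counter
from itertools import accumulate


def counts_with_running_share(values):
    """Group raw values, then compute a running share over the groups."""
    # Step 1 (GROUP BY ... count(*)): (value, count) pairs, most frequent first
    grouped = Counter(values).most_common()
    # Step 2 (window over the grouped rows): running total / grand total
    total = sum(c for _, c in grouped)
    running = accumulate(c for _, c in grouped)
    return [(v, c, 100 * r / total) for (v, c), r in zip(grouped, running)]


# Hypothetical raw column values standing in for tabelle.auspraegung
raw = ["x", "y", "x", "z", "x", "y", "x", "y"]
print(counts_with_running_share(raw))
# [('x', 4, 50.0), ('y', 3, 87.5), ('z', 1, 100.0)]
```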

akretschmer · 1 December 2016

derhenry wrote:
What am I doing wrong?

Just compare the syntax. My query (the very first one) was this; copy & paste mangled it a bit:

Code:

select *, 100 * sum(h) over (order by h desc rows between unbounded preceding and current row) / (sum(h) over (rows between unbounded preceding and unbounded following))::numeric as anteil from derhenry ;
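The difference between the two queries lies in the window frame. ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING makes every row's frame the entire result, so each row receives the same grand total; ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW stops at the current row and yields a running total. A small Python model of ROWS frames, where `rows_frame_sum` is a hypothetical helper, None stands for UNBOUNDED, and the input is assumed to already be in the window's ORDER BY order:

```python
def rows_frame_sum(values, preceding=None, following=0):
    """For each row i, sum the rows from i - preceding to i + following,
    clipped to the list bounds; None means UNBOUNDED on that side."""
    n = len(values)
    result = []
    for i in range(n):
        lo = 0 if preceding is None else max(0, i - preceding)
        hi = n if following is None else min(n, i + following + 1)
        result.append(sum(values[lo:hi]))
    return result


h = [2403862, 1959882, 1952693, 1005549]  # first four counts from the thread

# ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW -> running total
print(rows_frame_sum(h))              # [2403862, 4363744, 6316437, 7321986]
# ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING -> same total everywhere
print(rows_frame_sum(h, None, None))  # [7321986, 7321986, 7321986, 7321986]
```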
