Gentoo Forums
Gentoo Forums
Gentoo Forums
Quick Search: in
bc vs. awk
View unanswered posts
View posts from last 24 hours

 
Reply to topic    Gentoo Forums Forum Index Gentoo Chat
View previous topic :: View next topic  
Author Message
Zucca
Moderator
Moderator


Joined: 14 Jun 2007
Posts: 3618
Location: Rasi, Finland

PostPosted: Wed Aug 21, 2024 7:16 pm    Post subject: bc vs. awk Reply with quote

I found this rather interesting:
Code:
zucca@NBLK-WAX9X ~ $ time echo -e "scale=5\n1.00001^100000" | bc
2.71826

real   0m11.629s
user   0m11.599s
sys   0m0.005s
zucca@NBLK-WAX9X ~ $ time awk 'BEGIN {print 1.00001^100000}'
2.71827

real   0m0.005s
user   0m0.002s
sys   0m0.003s
... My guess is awk takes some shortcuts by rounding while calculating the formula, while bc doesn't drop a single digit of accuracy behind the curtains.
What do other think (or know)?
_________________
..: Zucca :..
Gentoo IRC channels reside on Libera.Chat.
--
Quote:
I am NaN! I am a man!
Back to top
View user's profile Send private message
szatox
Advocate
Advocate


Joined: 27 Aug 2013
Posts: 3333

PostPosted: Wed Aug 21, 2024 8:02 pm    Post subject: Reply with quote

Quote:
echo -e "scale=7\n1.00001^100000" | bc
2.7182682
I think awk wins this round


Anyway, AFAIR bc uses base10 for its calculations, and calculates with arbitrary precision. It basically calculates things the same way a human with a pen and paper would.
Apparently, it just truncated the output without properly rounding it, which kinda makes sense given the above.
_________________
Make Computing Fun Again
Back to top
View user's profile Send private message
Zucca
Moderator
Moderator


Joined: 14 Jun 2007
Posts: 3618
Location: Rasi, Finland

PostPosted: Wed Aug 21, 2024 8:33 pm    Post subject: Reply with quote

szatox wrote:
AFAIR bc uses base10 for its calculations, and calculates with arbitrary precision. It basically calculates things the same way a human with a pen and paper would.
That really would explain the execution time.

Btw... The result is eerily close to Euler's number.
EDIT: Do'h! That's one way to calculate it:
Code:
(1+1/n)^n

_________________
..: Zucca :..
Gentoo IRC channels reside on Libera.Chat.
--
Quote:
I am NaN! I am a man!
Back to top
View user's profile Send private message
pjp
Administrator
Administrator


Joined: 16 Apr 2002
Posts: 20341

PostPosted: Wed Aug 21, 2024 10:32 pm    Post subject: Reply with quote

szatox wrote:
Anyway, AFAIR bc uses base10 for its calculations, and calculates with arbitrary precision. It basically calculates things the same way a human with a pen and paper would.
Apparently, it just truncated the output without properly rounding it, which kinda makes sense given the above.
I presumed most people were taught to round up/down*. It seems odd to me to just ignore "those other numbers."

* I've also learned that there different rules to rounding, which seem very strange. I don't remember what they are now... maybe it had to do with money or data analysis?
_________________
Quis separabit? Quo animo?
Back to top
View user's profile Send private message
flexibeast
Guru
Guru


Joined: 04 Apr 2022
Posts: 384
Location: Naarm/Melbourne, Australia

PostPosted: Thu Aug 22, 2024 2:18 am    Post subject: Reply with quote

pjp wrote:
I've also learned that there different rules to rounding

The 'Rounding' page on Wikipedia has a table of various approaches to rounding to an integer.
pjp wrote:
I don't remember what they are now... maybe it had to do with money or data analysis?

i'd be interested to know as well .... The above Wikipedia page only mentions a couple of meteorological standards.
Back to top
View user's profile Send private message
pjp
Administrator
Administrator


Joined: 16 Apr 2002
Posts: 20341

PostPosted: Thu Aug 22, 2024 4:06 am    Post subject: Reply with quote

Yikes. I had no idea.

I wish I could remember the context of whatever rounding it was.

"Directed rounding to an integer" mentions financial calculations, but I'm almost certain it was decimal rounding.

"Rounding half away from zero" is a possibility. Also called commercial rounding. It mentions "currency conversions and price roundings."

"Rounding half to even" seems very possible. "Banker's rounding" has a certain familiarity to it. Also that it is "the default rounding mode used in IEEE 754 operations for results in binary floating-point formats."

I'm leaning toward the last one, but I really don't recall what rabbit hole I went down.
_________________
Quis separabit? Quo animo?
Back to top
View user's profile Send private message
Zucca
Moderator
Moderator


Joined: 14 Jun 2007
Posts: 3618
Location: Rasi, Finland

PostPosted: Thu Aug 22, 2024 10:38 am    Post subject: Reply with quote

pjp wrote:
szatox wrote:
Anyway, AFAIR bc uses base10 for its calculations, and calculates with arbitrary precision. It basically calculates things the same way a human with a pen and paper would.
Apparently, it just truncated the output without properly rounding it, which kinda makes sense given the above.
I presumed most people were taught to round up/down*. It seems odd to me to just ignore "those other numbers."
So if bc rounds "incorrectly" but awk does it "correctly" and is pretty accurate (anyone know how to add more precision to awk's math operations?), then does the proposed base10 calculating really add up that much overhead? I mean performing around 5500 times slower is quite significant.
_________________
..: Zucca :..
Gentoo IRC channels reside on Libera.Chat.
--
Quote:
I am NaN! I am a man!
Back to top
View user's profile Send private message
lars_the_bear
Guru
Guru


Joined: 05 Jun 2024
Posts: 377

PostPosted: Thu Aug 22, 2024 11:04 am    Post subject: Reply with quote

I think BC calculates powers by repeated multiplication. Since it's working to arbitrary precision, once you have a arbitrary-precision multiplication, you can get an arbitrary-precision power just by repeating it. I'm not sure what other way there is to get a true power to a fixed number of digits without internal rounding. In any event, bc seems to take twice as long if I make the power term twice as large; but I guess you'd need to look at the source code to be sure that's what's going on.

I believe gawk uses regular floating-point math, and (I would guess) calculates powers by multiplying logarithms. Such an operation would take, I guess, a fixed time. You can do the same thing with bc, and it's pretty fast:

Code:

time echo -e "scale=20; e(l(1+ 1/100000)*100000)" | bc -l
2.71826823717448944150

real   0m0.002s
user   0m0.001s
sys   0m0.001s


BR, Lars.
Back to top
View user's profile Send private message
pjp
Administrator
Administrator


Joined: 16 Apr 2002
Posts: 20341

PostPosted: Thu Aug 22, 2024 4:06 pm    Post subject: Reply with quote

Zucca wrote:
So if bc rounds "incorrectly" but awk does it "correctly" and is pretty accurate (anyone know how to add more precision to awk's math operations?), then does the proposed base10 calculating really add up that much overhead? I mean performing around 5500 times slower is quite significant.
awk uses "%.6g" by default:
Code:
$ time awk 'BEGIN {printf "%.20g\n", 1.00001^100000}'
2.7182682371975284141

real    0m0.002s
user    0m0.001s
sys     0m0.000s
And even that only does 19 decimal digits, so that's odd.

I would have guessed bc to be "better" since that's it's sole purpose. Apparently gawk can do 100bits of precision. Compared to lars' use of bc, awk's output differs notably after 10 digits:
Code:
awk
2.7182682371 975284141

bc
2.7182682371 7448944150


And I think that issue is related to the rabbit hole I went down regarding rounding.
_________________
Quis separabit? Quo animo?
Back to top
View user's profile Send private message
wanne32
n00b
n00b


Joined: 11 Nov 2023
Posts: 68

PostPosted: Fri Aug 23, 2024 2:34 pm    Post subject: Reply with quote

You can use maxima to calculate the exact number without rounding:
Code:
maxima --batch-string='(1/100000+1)^100000;'

Or floor to 21 digits:
Code:
maxima --batch-string='floor((1/100000+1)^100000*100000000000000000000)/100000000000000000000;'

So 2.71826823717448966803... would be right.
So bc with 20digits is only slightly better since it also seems to implement the power function on its own, rounding intermediate results.
acalc does a really decent job:
Code:
calc '1.00001^100000'
~2.71826823717448966804

While the glibc is blazing fast but has really bad precision:
Code:
#include <stdio.h>
#include <math.h>
int main()
{ printf("%0.20lf",pow(1.00001,100000.0)); }

Code:
gcc -Wall a.c ; time ./a.out
2.71826823719229748733
real    0m0.001s
user    0m0.001s
sys     0m0.000s

This affects more or less all programming languages,non-math programs like python/libreoffice calc, kcalc...
Back to top
View user's profile Send private message
sublogic
Apprentice
Apprentice


Joined: 21 Mar 2022
Posts: 265
Location: Pennsylvania, USA

PostPosted: Sat Aug 24, 2024 12:21 am    Post subject: Reply with quote

wanne32 wrote:
While the glibc is blazing fast but has really bad precision:
Code:
#include <stdio.h>
#include <math.h>
int main()
{ printf("%0.20lf",pow(1.00001,100000.0)); }

Code:
gcc -Wall a.c ; time ./a.out
2.71826823719229748733
real    0m0.001s
user    0m0.001s
sys     0m0.000s

This affects more or less all programming languages,non-math programs like python/libreoffice calc, kcalc...


There is already a loss of precision in the binary representation of 1.00001 . When taking a power of a number so close to 1 the log1p() function can be useful:
Code:
#include <stdio.h>
#include <math.h>
int main(int argc, char *argv[])
{
   printf("%0.20le", 1.00001-1);
   puts("\t1.00001-1");

   printf("%0.20lf", pow(1.00001, 100000.0));
   puts("\t\tpow(1.00001, 100000.0)");

   printf("%0.20lf", exp(log1p(.00001)*100000.0));
   puts("\t\texp(log1p(.00001)*100000.0)");

   exit(0);
}
Result:
Code:
1.00000000000655120402e-05   1.00001-1
2.71826823719229748733      pow(1.00001, 100000.0)
2.71826823717448995410      exp(log1p(.00001)*100000.0)
The log1p() version gets about 16 decimals right, which is machine precision for double precision floating-point.
Back to top
View user's profile Send private message
dmpogo
Advocate
Advocate


Joined: 02 Sep 2004
Posts: 3364
Location: Canada

PostPosted: Sat Aug 24, 2024 8:38 am    Post subject: Reply with quote

I remember when I was a graduate student in the 80-s, the younger brother of my classmate was writing a bachelors thesis ( kind of diploma thesis) in math department which was about algorithms and precision of computer evaluation of trigonometric functions .... It is all a non-trivial business when you work close to machine precision.
Back to top
View user's profile Send private message
wanne32
n00b
n00b


Joined: 11 Nov 2023
Posts: 68

PostPosted: Sat Aug 24, 2024 10:00 am    Post subject: Reply with quote

sublogic wrote:
Code:
2.71826823717448995410      exp(log1p(.00001)*100000.0)
I see why that works. Being near zero is much more exact than near one. But I would never had the idea by myself. Respect for that.
Back to top
View user's profile Send private message
eccerr0r
Watchman
Watchman


Joined: 01 Jul 2004
Posts: 9752
Location: almost Mile High in the USA

PostPosted: Sat Aug 24, 2024 1:43 pm    Post subject: Reply with quote

Yes, bc is not using the FPU or even CPU register floating or fixed point math (like FDIV or ADDL instructions) because these arbitrary precision numbers can and will exceed 32 or 64-bit registers of the CPU. So it rather manually handle each digit as a character. Larger numbers with more digits will require more CPU time. Similar calculation times should be seen with using GMP.

Awk, perl, C, etc., etc. will handle numbers with CPU/FPU floating point routines and whatever's in glibc. They also make shortcuts but unless you're using a Pentium with the FDIV bug, were designed to be accurate within the precision (number of bits in a cpu register) asked and no more. Some floating point operations require Newton's method to calculate and thus iterative, but it takes roughly the same amount of time no matter how "large" the number is. Yes some operations even use look up tables but that's how it's done to take as little time (and silicon) as possible to do the math.
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
Back to top
View user's profile Send private message
Display posts from previous:   
Reply to topic    Gentoo Forums Forum Index Gentoo Chat All times are GMT
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum