Author |
Message |
Zucca Moderator
Joined: 14 Jun 2007 Posts: 3752 Location: Rasi, Finland
|
Posted: Tue Nov 26, 2024 8:48 pm Post subject: sh - function to pass data stdin -> stdout unchanged |
|
|
I have this function, which takes tsv formatted stdin: Code: | tsv_column() {
    if [ -t 1 ]
    then
        # Columnize tsv formatted input
        column -t -s $'\t'
    else
        # We're in pipe. Output raw tsv.
        # Useless use of cat?
        cat
    fi
} | Is there a way to avoid cat, but to use some shell redirection for example?
I was thinking about exec, but the previous state of the redirections needs to be restored after the function returns (so maybe invoke a subshell?). I'm really not sure which way works and which is the "right" way.
I'm looking for a solution that works in busybox shell and in bash. It surely doesn't hurt to have it working in dash too for example. ;) _________________ ..: Zucca :..
My gentoo installs: | init=/sbin/openrc-init
-systemd -logind -elogind seatd |
Quote: | I am NaN! I am a man! |
Last edited by Zucca on Tue Nov 26, 2024 9:32 pm; edited 1 time in total |
|
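A minimal sketch of the same function for readers who also want it to run in dash. It assumes that `$'\t'` may not be available in every shell and builds the tab with `printf` instead; the variable name `TAB` and the sample data are made up for illustration.

```sh
#!/bin/sh
# Sketch only: same logic as tsv_column above, with the tab built portably.
# TAB is an illustrative name; command substitution keeps the tab character
# because only trailing newlines are stripped.
TAB=$(printf '\t')

tsv_column() {
    if [ -t 1 ]; then
        # stdout is a terminal: columnize for humans
        column -t -s "$TAB"
    else
        # stdout is a pipe or a file: pass the tsv through untouched
        cat
    fi
}

# Usage: whatever stdout is connected to decides the format.
printf 'name\tsize\nfoo\t42\n' | tsv_column
```

The `[ -t 1 ]` test only looks at file descriptor 1 of the function itself, so the same call columnizes on a terminal and stays raw when its output is piped or captured.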
Back to top |
|
|
szatox Advocate
Joined: 27 Aug 2013 Posts: 3465
|
Posted: Tue Nov 26, 2024 9:01 pm Post subject: |
|
|
Can you pass the second command either as a parameter to your function or through env, and then call it from within your function, either with a pipe or directly, depending on which branch you hit?
Actually, do you even need a function for just a single test with one-line paths? _________________ Make Computing Fun Again |
|
|
|
pingtoo Veteran
Joined: 10 Sep 2021 Posts: 1290 Location: Richmond Hill, Canada
|
Posted: Tue Nov 26, 2024 9:10 pm Post subject: |
|
|
Couldn't you use read and echo to replace cat?
like while read line; do
echo $line
done |
|
|
|
Zucca Moderator
Joined: 14 Jun 2007 Posts: 3752 Location: Rasi, Finland
|
Posted: Tue Nov 26, 2024 9:23 pm Post subject: |
|
|
@pingtoo, I think cat performs better than echo inside a loop.
@szatox, I didn't quite follow. The point of this function is to recognize whether the user ran the script inside a pipeline, so that it won't try to columnize the output but will keep the original tsv formatting for easier parsing.
EDIT: My wording in the first post was maybe misleading. EDIT2: Edited the first post. _________________ ..: Zucca :..
Last edited by Zucca on Tue Nov 26, 2024 9:34 pm; edited 2 times in total |
|
|
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 22746
|
Posted: Tue Nov 26, 2024 9:25 pm Post subject: |
|
|
pingtoo wrote: | Couldn't you use read and echo to replace cat?
like while read line; do
echo $line
done | That would be a poor reimplementation of cat, and likely less efficient too. I think the goal is for the unidentified producer to be directly connected to the unidentified consumer. I'm not aware of a way to do what OP wants. I think szatox is on the right track, by avoiding even creating the producer's stdout connected to a pipe that the shell later needs to proxy to the consumer.
Zucca, could you show example uses of the interactive and non-interactive flow? |
|
|
|
pingtoo Veteran
Joined: 10 Sep 2021 Posts: 1290 Location: Richmond Hill, Canada
|
Posted: Tue Nov 26, 2024 9:29 pm Post subject: |
|
|
Zucca wrote: | @pingtoo, I think cat performs better than echo inside a loop. | I have not tested myself so maybe my thinking is wrong.
But what if IFS is changed to nothing (IFS=)? Would read then get the entire stdout from the pipe, so that it is effectively the same as cat? |
|
|
|
Zucca Moderator
Joined: 14 Jun 2007 Posts: 3752 Location: Rasi, Finland
|
Posted: Tue Nov 26, 2024 9:47 pm Post subject: |
|
|
Hu wrote: | Zucca, could you show example uses of the interactive and non-interactive flow? |
Code: | command_which_produces_tsv_data | tsv_column | ... here tsv_column() would format the tsv data to more human readable form by columnizing the data.
Code: | command_which_produces_tsv_data | tsv_column | some_other_command | ... here tsv_column() would pass the tsv data as-is so that some_other_command can easily parse it.
The point is that tsv_column() lives inside a script, which would normally output neatly columnized data, but if the user decides to pipe the stdout of that script into somewhere, then the data formatting would be preserved for later processing.
I intend to add some way to force either of the formats, but that's really not the question here.
Can I have a shell function pass the data from stdin to stdout without cat (not using any command, but some clever redirection maybe)? I was thinking of , but that doesn't work. _________________ ..: Zucca :..
|
|
|
|
szatox Advocate
Joined: 27 Aug 2013 Posts: 3465
|
Posted: Tue Nov 26, 2024 10:21 pm Post subject: |
|
|
Ah, ok, had a brain fart here. After a second read I get what you want.
Quote: | I was thinking about exec, but the previous state of redirections need to be restored after function returns (so maybe invoke subshell?) |
First, if you want it to connect, you need an outlet and a plug. You can't just rub 2 outlets together and expect the power to flow, if you know what I mean, it's like, a part with a hole needs a part with a peg for things to click
Sure, go ahead and prove me wrong, but I say cleanup won't be a problem because redirecting 0 to 1 with "exec" won't even "cat" it.
Second, subshell wouldn't be any better than cat, either of those spawns another process and does all the same magic, so keep it simple. I really doubt you can get anything better than what you started with. You do actually need that cat to serve as a wire connecting your 2 file descriptors
Unless you can cheat your way out of it entirely. But as far as we know, the data comes from stdin. If you want to cut the pipeline short, you'd basically have to figure out whether or not you're in a pipeline outside of your script, and invoke either column or whatever program consumes the data stream instead of cat.
I've been thinking about something like this:
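Code: | $ f () {
    if [ -n "$1" ]
    then
        # A consumer was given: hand stdin straight to it
        command="$1"
        shift
        exec "$command" "$@"
    else
        # No consumer: columnize for the terminal
        column -t -s "$(printf '\t')"
    fi
}
$ f optional_data_consumer possibly with options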
|
_________________ Make Computing Fun Again
Last edited by szatox on Tue Nov 26, 2024 10:42 pm; edited 3 times in total |
|
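The "outlet and plug" point can be checked with a small sketch. It assumes nothing beyond POSIX sh; the function names `passthrough_redirect` and `passthrough_cat` are hypothetical. A redirection only rewires file descriptors for a command, so if that command never reads, no bytes move.

```sh
#!/bin/sh
# Hypothetical name: tries to "connect" stdin to stdout with redirection only.
passthrough_redirect() {
    # ':' is a no-op builtin. The redirections duplicate descriptors for it,
    # but nothing ever reads stdin, so nothing is copied to stdout.
    : <&0 >&1
}

# Hypothetical name: the version with an actual process doing the copying.
passthrough_cat() {
    cat
}

# The first call prints nothing, the second prints "data".
passthrough_redirect <<'EOF'
data
EOF
passthrough_cat <<'EOF'
data
EOF
```

Something has to sit between the two descriptors and perform the read and write calls, which is the role cat plays in the original function.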
|
|
pingtoo Veteran
Joined: 10 Sep 2021 Posts: 1290 Location: Richmond Hill, Canada
|
Posted: Tue Nov 26, 2024 10:35 pm Post subject: |
|
|
I would bet my version of using "read" and "echo" together with 'IFS=' is just efficient as cat in a pipe line. Maybe the while loop can be removed, because the "read" and "echo" happen in the middle of the pipeline anyway.
If I had DTrace set up, I could trace the syscalls in that context and compare them to "cat" in the middle of a pipe.
I am thinking my version saves the cost of invoking an external program, since "read" and "echo" are available in every shell.
However, I admit that if the invoking logic can be changed to detect the pipe condition before invoking tsv_column, then "cat" is more efficient than a read/echo loop. |
|
|
|
Zucca Moderator
Joined: 14 Jun 2007 Posts: 3752 Location: Rasi, Finland
|
Posted: Tue Nov 26, 2024 10:54 pm Post subject: |
|
|
pingtoo wrote: | I would bet my version of using "read" and "echo" together with 'IFS=' is just efficient as cat in a pipe line. |
Code: | zucca@NBLK-WAX9X ~ $ tr -dc 'a-z\t[:space:]A-Z0-9\n' < /dev/urandom | head -c $((1024*1024*512)) > /tmp/test.file
zucca@NBLK-WAX9X ~ $ time cat /tmp/test.file > /dev/null
real 0m0.127s
user 0m0.003s
sys 0m0.122s
zucca@NBLK-WAX9X ~ $ time while IFS= read l; do echo $l; done < /tmp/test.file > /dev/null
^C
real 2m36.665s
user 2m24.975s
sys 0m8.615s | @pingtoo, as you see I gave up after 2m 30s mark. ;) _________________ ..: Zucca :..
|
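Speed aside, a read/echo loop is not a faithful copy either. The sketch below is an illustration under stated assumptions: the function names `naive_copy` and `careful_copy` and the sample line are invented here. With `read` lacking `-r` and the variable expanded unquoted, backslashes are consumed and runs of tabs are collapsed by word splitting, which would damage tsv data with empty fields.

```sh
#!/bin/sh
# Sample line: two adjacent tabs (an empty tsv field) and a literal backslash.
sample=$(printf 'a\t\tb\\n')

# Hypothetical name: the loop roughly as first proposed in the thread.
naive_copy() {
    while read line; do
        echo $line      # unquoted: word splitting collapses the tabs
    done
}

# Hypothetical name: a more careful loop for text lines.
careful_copy() {
    while IFS= read -r line; do
        printf '%s\n' "$line"   # quoted, no backslash processing
    done
}

printf '%s\n' "$sample" | naive_copy
printf '%s\n' "$sample" | careful_copy
```

Even the careful variant still processes the input line by line in the shell, so it does not change the timing picture shown above. It also cannot carry NUL bytes, which matters for NUL separated data.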
|
|
|
Zucca Moderator
Joined: 14 Jun 2007 Posts: 3752 Location: Rasi, Finland
|
Posted: Tue Nov 26, 2024 10:57 pm Post subject: |
|
|
szatox wrote: | First, if you want it to connect, you need an outlet and a plug. You can't just rub 2 outlets together and expect the power to flow, if you know what I mean, it's like, a part with a hole needs a part with a peg for things to click :lol: | Yes, that's my problem. Using cat isn't terribly inefficient, but... _________________ ..: Zucca :..
|
|
|
|
eschwartz Developer
Joined: 29 Oct 2023 Posts: 240
|
Posted: Tue Nov 26, 2024 11:56 pm Post subject: |
|
|
"cat" is an unbelievably efficient program that is extremely good at doing what it does. People have written blog posts about how they tried to rewrite cat in Rust for the fearless concurrency and blazing fast speed, and were shocked to discover that rust was in fact extremely slow and it's because they incorrectly thought all you need to do is "just read data from one end and print it on the other end". It turns out that smart programmers who have spent decades optimizing a program have a few tricks up their sleeves and it has nothing to do with programming languages (yes, bash is a programming language, and bash does NOT provide primitives for the functionality the GNU cat developers have relied on here).
You will also find that "cat" is used all over the place as basically the canonical way to implement a pipeline filter that sometimes does something, but in the "cat" case, does nothing. There is a reason for this. The only faster thing to do would be to complexify a different part of your program by introducing branching that determines whether or not to invoke a pipeline filter. |
|
|
|
pingtoo Veteran
Joined: 10 Sep 2021 Posts: 1290 Location: Richmond Hill, Canada
|
Posted: Wed Nov 27, 2024 1:11 am Post subject: |
|
|
Zucca wrote: | pingtoo wrote: | I would bet my version of using "read" and "echo" together with 'IFS=' is just efficient as cat in a pipe line. |
Code: | zucca@NBLK-WAX9X ~ $ tr -dc 'a-z\t[:space:]A-Z0-9\n' < /dev/urandom | head -c $((1024*1024*512)) > /tmp/test.file
zucca@NBLK-WAX9X ~ $ time cat /tmp/test.file > /dev/null
real 0m0.127s
user 0m0.003s
sys 0m0.122s
zucca@NBLK-WAX9X ~ $ time while IFS= read l; do echo $l; done < /tmp/test.file > /dev/null
^C
real 2m36.665s
user 2m24.975s
sys 0m8.615s | @pingtoo, as you see I gave up after 2m 30s mark. | Thank you. I always overthink things. You gave me a good answer. |
|
|
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 22746
|
Posted: Wed Nov 27, 2024 2:26 am Post subject: |
|
|
The shown pipeline puts everything in a single shell, but if the | tsv_column part is buried in a script, and the potentially present | some_other_command isn't, then that's not the same pipeline as shown.
You can play games with exec to save aside and restore stdout, and if you commit to using Bash, you can use process substitution to conditionally insert your final stage. I think if you want non-Bash Bourne, then you will need to follow szatox's suggestion. Code: | #!/bin/sh
real_work() {
    # ...
    :
}
if [ -t 1 ]; then
    # stdout is a terminal: columnize for humans
    real_work | column -t -s "$(printf '\t')"
else
    # stdout is a pipe or a file: leave the raw tsv untouched
    real_work
fi | Then put everything else inside real_work, so that it is conditionally invoked either with its stdout piped to column or with its stdout preserved. |
|
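The two suggestions (pass the command as a parameter, and branch before building the pipeline) can be combined into one helper. This is only a sketch under assumptions: `tsv_run`, `TAB` and `list_items` are hypothetical names, and the producer is assumed to be a single command or function.

```sh
#!/bin/sh
TAB=$(printf '\t')   # illustrative name; portable way to get a tab

# Hypothetical helper: runs the producer given as arguments and decides
# once whether a column stage is needed. No pass-through process is created.
tsv_run() {
    if [ -t 1 ]; then
        # terminal: pipe the producer into column
        "$@" | column -t -s "$TAB"
    else
        # pipe or file: the producer writes straight to our stdout
        "$@"
    fi
}

# Stand-in producer for the example.
list_items() {
    printf 'name\tsize\nfoo\t42\n'
}

tsv_run list_items
```

In the non-terminal branch the producer inherits the script's stdout directly, so the consumer reads from it without anything in between. A multi-stage producer would need to be wrapped in a function first.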
|
|
Zucca Moderator
Joined: 14 Jun 2007 Posts: 3752 Location: Rasi, Finland
|
Posted: Wed Nov 27, 2024 8:43 am Post subject: |
|
|
Thanks guys.
I'll look at the alternatives, as "skipping cat" seems almost impossible if the function is to just pass the data forward.
My problem is that there are a lot of functions that output tsv (or NUL separated (nsv?)) data.
It has been easy (to write, that is) to just construct the pipeline with this tsv_column at the end.
After some time I decided to go through the scripts and see if I could improve any places where I had left a TODO or BUG comment. This useless use of cat was one. _________________ ..: Zucca :..
|
|
|
|
|