SarahS93 l33t
Joined: 21 Nov 2013 Posts: 728
|
Posted: Fri Feb 23, 2024 3:41 am Post subject: read a file from the end to the beginning |
|
|
How can I read a file from the end to the beginning?
Can "tac" do this?
If I read a file with tac and write the result to a new file, then run tac on the new file again, the output is not the same as the original file.
Code: | dd if=/dev/urandom of=test1gb bs=1G count=1
1+0 Datensätze ein
1+0 Datensätze aus
1073741824 Bytes (1,1 GB, 1,0 GiB) kopiert, 2,26988 s, 473 MB/s
tac test1gb > test1gb_1
tac test1gb_1 > test1gb_2
md5sum test1gb*
09118633eeb78190f81dac3d59ca9b5a test1gb
0c27771a14ab8758d6b6faa378adbd5d test1gb_1
c266a73814411dd824b3416595707896 test1gb_2 |
Is there another program for reading a file from the end to the beginning? |
|
freke Veteran
Joined: 23 Jan 2003 Posts: 1029 Location: Somewhere in Denmark
|
Posted: Fri Feb 23, 2024 9:48 am Post subject: |
|
|
I suspect this has to do with the type of file - works fine for ASCII-text files.
Can be done with binary files with Code: | ns ~ # < test1gb xxd -p -c1 | tac | xxd -p -r > test1gb_1
ns ~ # < test1gb_1 xxd -p -c1 | tac | xxd -p -r > test1gb_2
ns ~ # md5sum test1gb*
06d8ec759353542e1a9ac33c0018a302 test1gb
3202e50b76d5d33506945670f12dd760 test1gb_1
06d8ec759353542e1a9ac33c0018a302 test1gb_2 |
xxd is from app-editors/vim-core |
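For anyone wondering why the pipeline above handles binary data: each byte becomes its own line of hex text, so reversing the lines reverses the bytes. A minimal in-memory Python sketch of the same three steps follows; the function name `reverse_via_hex_lines` is made up for illustration and is not part of xxd or tac.

```python
def reverse_via_hex_lines(data: bytes) -> bytes:
    """Mimic `xxd -p -c1 | tac | xxd -p -r` in memory."""
    # Step 1 (xxd -p -c1): one two-digit hex value per line.
    hex_lines = [f"{byte:02x}" for byte in data]
    # Step 2 (tac): reverse the order of the lines.
    hex_lines.reverse()
    # Step 3 (xxd -p -r): turn the hex text back into raw bytes.
    return bytes.fromhex("".join(hex_lines))


# Sample with NUL bytes and newlines, the cases suspected of upsetting plain tac.
sample = b"\x00abc\n\n\xff"
once = reverse_via_hex_lines(sample)
twice = reverse_via_hex_lines(once)
print(once == sample[::-1], twice == sample)
```

Because every byte is expanded to hex text plus a newline, the intermediate stream is roughly three times the size of the input, which probably contributes to the modest throughput.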
|
Goverp Advocate
Joined: 07 Mar 2007 Posts: 2179
|
Posted: Fri Feb 23, 2024 10:28 am Post subject: Re: read a file from the end to the beginning |
|
|
SarahS93 wrote: | How can I read a file from the end to the beginning?
Can "tac" do this?
If I read a file with tac and write the result to a new file, then run tac on the new file again, the output is not the same as the original file.
Code: | dd if=/dev/urandom of=test1gb bs=1G count=1
1+0 Datensätze ein
1+0 Datensätze aus
1073741824 Bytes (1,1 GB, 1,0 GiB) kopiert, 2,26988 s, 473 MB/s
tac test1gb > test1gb_1
tac test1gb_1 > test1gb_2
md5sum test1gb*
09118633eeb78190f81dac3d59ca9b5a test1gb
0c27771a14ab8758d6b6faa378adbd5d test1gb_1
c266a73814411dd824b3416595707896 test1gb_2 |
Is there another program for reading a file from the end to the beginning? |
My guess is this would be down to handling of trailing linefeeds and/or embedded nulls. I'd suggest this is a meaningless test, since I assume your objective is not to read 1GB of random data, either forwards or backwards. With meaningful data you could run "diff" on the two files to see what the change was, but with 1 GB of random stuff (and therefore random length lines) it will be hard. Using test data with embedded nulls is good for the soul, and for discovering coding mistakes, but it's not necessarily useful for determining how a program should work with the actual application data.
If you are considering reading data in C, then it's possible to write a program to read line by line starting from the end (no, I don't have an example to hand, but I'm sure they're out there); if the objective is to read byte by byte backwards from the end, that's even easier, though unlikely to be much use! If you want to write a shell script, or use *nix utilities to process lines in a proper text file backwards, tac will almost certainly be what you want. _________________ Greybeard |
|
szatox Advocate
Joined: 27 Aug 2013 Posts: 3432
|
Posted: Fri Feb 23, 2024 11:37 am Post subject: |
|
|
Tac reverses the order of lines in a file, is this what you want?
Works fine for text files, but the result won't be even close to what you'd expect from reading a binary file backwards.
Also, when reading a binary file backwards: what is the word size you want to use? Big or little endian?
Quote: | My guess is this would be down to handling of trailing linefeeds and/or embedded nulls | Maybe. Tac allows you to manually set a separator (check man tac), so there is a chance it might kinda work with arbitrary files, but it is still a sketchy scenario. _________________ Make Computing Fun Again |
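To illustrate the line-reversal point, here is a rough Python model of what tac does, assuming its default behaviour of treating the newline as a separator that belongs to the preceding record. The function `reverse_lines` is a hypothetical stand-in, not tac's actual implementation. It shows one way a second pass can fail to restore the input: data that does not end in a newline.

```python
def reverse_lines(data: bytes) -> bytes:
    """Reverse newline-terminated records, keeping each newline with its record."""
    parts = data.split(b"\n")
    # Every part except the last was followed by a newline in the input.
    records = [part + b"\n" for part in parts[:-1]]
    # A non-empty last part is a final record without a trailing newline.
    if parts[-1]:
        records.append(parts[-1])
    return b"".join(reversed(records))


with_newline = b"a\nb\n"
without_newline = b"a\nb"

# Ends in a newline: reversing twice gives the input back.
print(reverse_lines(reverse_lines(with_newline)) == with_newline)
# No trailing newline: the last record fuses with its neighbour and stays fused.
print(reverse_lines(without_newline), reverse_lines(reverse_lines(without_newline)))
```

Random data almost never ends in a newline, so under this model a round trip through two line reversals would not be expected to reproduce the original file.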
|
SarahS93 l33t
Joined: 21 Nov 2013 Posts: 728
|
Posted: Fri Feb 23, 2024 12:23 pm Post subject: |
|
|
The xxd | tac | xxd approach works well for binary files, thanks.
Is there a way to run this through pv to see a status bar? |
|
Genone Retired Dev
Joined: 14 Mar 2003 Posts: 9611 Location: beyond the rim
|
Posted: Fri Feb 23, 2024 12:44 pm Post subject: |
|
|
freke wrote: | I suspect this has to do with the type of file - works fine for ASCII-text files.
Can be done with binary files with Code: | ns ~ # < test1gb xxd -p -c1 | tac | xxd -p -r > test1gb_1
ns ~ # < test1gb_1 xxd -p -c1 | tac | xxd -p -r > test1gb_2
ns ~ # md5sum test1gb*
06d8ec759353542e1a9ac33c0018a302 test1gb
3202e50b76d5d33506945670f12dd760 test1gb_1
06d8ec759353542e1a9ac33c0018a302 test1gb_2 |
xxd is from app-editors/vim-core |
One thing to note here: When you're using tac in a pipe like this it will read the entire pipe content first into memory (there isn't really any option to avoid that) before writing it in reverse order to stdout. So if you want to use this on multi-gigabyte files you can easily run out of memory. If you use tac with a filename argument it can avoid that (as it can check how large the file is and seek to the end), but then you have to use temporary files to store the output of your initial filters which will cost storage space and be slower.
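The seek-from-the-end idea can be sketched in Python as a generator that never holds more than one block in memory. This only works on a regular, seekable file, not on a pipe. The name `iter_blocks_reversed` and the default block size are assumptions made for this example.

```python
import os
import tempfile


def iter_blocks_reversed(path, block_size=1024 * 1024):
    """Yield the file's content back to front, one reversed block at a time."""
    with open(path, "rb") as src:
        pos = os.path.getsize(path)
        while pos > 0:
            # Step back by one block, but never before the start of the file.
            start = max(0, pos - block_size)
            src.seek(start)
            block = src.read(pos - start)
            yield block[::-1]
            pos = start


# Tiny self-check on a temporary file with a deliberately small block size.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(10)))
result = b"".join(iter_blocks_reversed(tmp.name, block_size=4))
os.unlink(tmp.name)
print(result == bytes(range(10))[::-1])
```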
Last edited by Genone on Fri Feb 23, 2024 12:51 pm; edited 1 time in total |
|
Genone Retired Dev
Joined: 14 Mar 2003 Posts: 9611 Location: beyond the rim
|
Posted: Fri Feb 23, 2024 12:50 pm Post subject: Re: read a file from the end to the beginning |
|
|
Goverp wrote: | My guess is this would be down to handling of trailing linefeeds and/or embedded nulls. I'd suggest this is a meaningless test, since I assume your objective is not to read 1GB of random data, either forwards or backwards. With meaningful data you could run "diff" on the two files to see what the change was, but with 1 GB of random stuff (and therefore random length lines) it will be hard. Using test data with embedded nulls is good for the soul, and for discovering coding mistakes, but it's not necessarily useful for determining how a program should work with the actual application data. |
Given OP's history, the application data is likely random (encrypted) binary data, and reversing is just used for obfuscation. |
|
SarahS93 l33t
Joined: 21 Nov 2013 Posts: 728
|
Posted: Fri Feb 23, 2024 11:51 pm Post subject: |
|
|
Quote: | Given OP's history, the application data is likely random (encrypted) binary data, and reversing is just used for obfuscation. |
You are right.
With the tac/xxd approach, the read/write speed is about 100 MB/s. Is the file system used (ext4) the limitation? |
|
freke Veteran
Joined: 23 Jan 2003 Posts: 1029 Location: Somewhere in Denmark
|
Posted: Sat Feb 24, 2024 12:38 am Post subject: |
|
|
I don't think that's limited by the filesystem, but rather by the hardware - I get similar timings on both tmpfs and ext4 (~37-40 secs for the 1 GB file) |
|
SarahS93 l33t
Joined: 21 Nov 2013 Posts: 728
|
Posted: Wed Feb 28, 2024 7:19 am Post subject: |
|
|
Are there other ways to read a file in reverse?
With Python or Perl? Are they faster?
Or how can the xxd and tac approach be sped up? |
|
szatox Advocate
Joined: 27 Aug 2013 Posts: 3432
|
Posted: Wed Feb 28, 2024 12:50 pm Post subject: |
|
|
Quote: | Are there other ways to read a file in reverse?
With Python or Perl? Are they faster?
Or how can the xxd and tac approach be sped up? | You _can_ write a byte reverser yourself in any language you want. You can even do it in C; file access and an array of bytes are simple enough that anyone can get there with a bit of copy-paste.
But, since you're asking about performance, let me ask you something before you start:
Do you know what DMA is?
Do you know how sequential memory access differs from random memory access?
Do you know how many bytes a 64-bit CPU can copy in a single instruction, and how many bytes it can copy when reversing?
Going down the list makes your code slower and slower.
You _can't_ reverse at full speed. It's not going to happen. And since the data is already encrypted, there is no reason to mangle it even more, which means you could completely avoid running this code altogether for the best performance possible. _________________ Make Computing Fun Again |
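As a loose, Python-level analogy only (this does not measure DMA or CPU word size), the sketch below compares reversing a buffer element by element through an iterator with reversing it in one bulk slice. The function names are invented for this example, and the timings depend entirely on the machine, so none are quoted here.

```python
import timeit

data = bytes(range(256)) * 4096  # 1 MiB of sample data


def reverse_bytewise(buf: bytes) -> bytes:
    # Builds the result one element at a time through a reverse iterator.
    return bytes(reversed(buf))


def reverse_slice(buf: bytes) -> bytes:
    # A single bulk slice with a negative step.
    return buf[::-1]


# Both approaches must agree before their timings are worth comparing.
assert reverse_bytewise(data) == reverse_slice(data)
t_bytewise = timeit.timeit(lambda: reverse_bytewise(data), number=5)
t_slice = timeit.timeit(lambda: reverse_slice(data), number=5)
print(f"bytewise: {t_bytewise:.3f}s  slice: {t_slice:.3f}s")
```

Either way, a plain forward copy of the same buffer has less work to do than any reversal, which is the underlying point.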
|
Genone Retired Dev
Joined: 14 Mar 2003 Posts: 9611 Location: beyond the rim
|
Posted: Mon Mar 04, 2024 8:54 am Post subject: |
|
|
SarahS93 wrote: | With the tac/xxd approach, the read/write speed is about 100 MB/s. Is the file system used (ext4) the limitation? |
Unlikely. If you're reading/writing from/to a mechanical HDD that performance is probably as good as you will get. Otherwise you'd have to double-check where the bottleneck is.
Quote: | Are there other ways to read a file in reverse?
With Python or Perl? Are they faster? |
If the bottleneck is hardware, then changing the implementation won't do much. And of course it can be implemented in basically any programming language, e.g. a basic Python implementation (untested) could look like
Code: |
import os

ifile = open("path/to/input/file", "rb")
ofile = open("path/to/output/file", "wb")
isize = os.stat("path/to/input/file").st_size
bs = 1024 * 1024
# offset of the last (possibly partial) block; integer division keeps pos an int
pos = (isize // bs) * bs
while pos > 0:
    ifile.seek(pos)
    data = ifile.read(bs)
    # reverse the bytes within the block
    data = data[::-1]
    ofile.write(data)
    pos -= bs
# process first block of data
ifile.seek(0)
data = ifile.read(bs)[:pos + bs]
data = data[::-1]
ofile.write(data)
ifile.close()
ofile.close()
|
Certainly not the most optimized version, but you get the idea.
EDIT: Removed syntax errors
Last edited by Genone on Wed Jun 12, 2024 8:09 am; edited 2 times in total |
|
SarahS93 l33t
Joined: 21 Nov 2013 Posts: 728
|
Posted: Sun Jun 09, 2024 1:55 am Post subject: |
|
|
Genone,
I put your code into test.py,
changed the input and output files and tried to run it:
Code: | if = open("test1gb", "rb")
of = open("test1gb_8", "wb")
isize = os.stat("test1gb").st_size
bs = 1024 * 1024
pos = (isize / bs) * bs
while pos > 0:
if.seek(pos)
data = if.read(bs)
data = bytes(reversed(data))
of.write(data)
pos -= bs
# process first block of data
if.seek(0)
data = if.read(bs)[:pos + bs]
data = bytes(reversed(data))
of.write(data)
if.close()
of.close() |
Code: | ./test.py: Zeile 1: Syntaxfehler beim unerwarteten Symbol »(«
./test.py: Zeile 1: `if = open("path/to/input/file", "rb")' |
I also found this:
perl -0777pe '$_=reverse $_' test1gb > test1gb_1
It is very fast because Perl reads the whole file into memory and then writes it back out reversed.
How is it possible to add a status bar like pv to this command? |
|
Banana Moderator
Joined: 21 May 2004 Posts: 1735 Location: Germany
|
Posted: Sun Jun 09, 2024 7:58 am Post subject: |
|
|
I have not tested the Python script from Genone either, but please make sure you have not copied some invisible garbage into your script. Make sure the parentheses are real "(" characters and not look-alikes.
Quote: | how is it possible to add a status bar like pv to this command?! |
As far as I know the pv command, it will not work here because there is no pipe involved: there is only one command, not several. https://linux.die.net/man/1/pv _________________ Forum Guidelines
PFL - Portage file list - find which package a file or command belongs to.
My delta-labs.org snippets do expire |
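If a status display is the goal, one option is to do the reversal block by block in a script and report progress from there. The Python sketch below is a minimal illustration under that assumption; `reverse_file`, `print_progress` and the file names are placeholders invented for this example, not an existing tool.

```python
import os
import sys
import tempfile


def reverse_file(src_path, dst_path, block_size=1024 * 1024, report=None):
    """Write src_path to dst_path byte-reversed, calling report(done, total)."""
    total = os.path.getsize(src_path)
    pos = total
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while pos > 0:
            # Read the block that ends at pos and write it out reversed.
            start = max(0, pos - block_size)
            src.seek(start)
            dst.write(src.read(pos - start)[::-1])
            pos = start
            if report is not None:
                report(total - pos, total)


def print_progress(done, total):
    # Overwrite one status line on stderr, roughly like a progress bar.
    sys.stderr.write(f"\r{done * 100 // total:3d}% ({done}/{total} bytes)")
    sys.stderr.flush()


# Demo on a small temporary file; the paths here are placeholders.
workdir = tempfile.mkdtemp()
src_name = os.path.join(workdir, "in.bin")
dst_name = os.path.join(workdir, "out.bin")
with open(src_name, "wb") as f:
    f.write(os.urandom(10_000))
reverse_file(src_name, dst_name, block_size=4096, report=print_progress)
sys.stderr.write("\n")
with open(src_name, "rb") as a, open(dst_name, "rb") as b:
    print(a.read()[::-1] == b.read())
```

The progress callback is kept separate from the copy loop so that it can be swapped for something quieter, or left out entirely.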
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54578 Location: 56N 3W
|
Posted: Sun Jun 09, 2024 8:56 am Post subject: |
|
|
SarahS93,
Reading a file in reverse will always be slow by design.
The HDD tries to keep a cache but that by design reads subsequent blocks, not previous ones.
The kernel filesystem cache does something similar.
You need to defeat the cache systems that have been developed since before the IBM PC was a thing.
To deal with files on a block device, work with block size data elements, not individual bytes. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
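One way to act on the block-size advice in a script is to ask the filesystem for its preferred I/O size and read in multiples of it. The Python sketch below assumes a platform where `os.stat` reports `st_blksize` and falls back to 4096 otherwise; `pick_block_size` and the 1 MiB target are illustrative choices, not tuned values.

```python
import os
import tempfile


def pick_block_size(path, target=1024 * 1024):
    """Return a read size near `target` that is a multiple of the preferred I/O size."""
    # st_blksize is not available on every platform, hence the fallback value.
    preferred = getattr(os.stat(path), "st_blksize", 4096) or 4096
    return max(1, target // preferred) * preferred


with tempfile.NamedTemporaryFile() as tmp:
    size = pick_block_size(tmp.name)
print(size)
```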
|
Genone Retired Dev
Joined: 14 Mar 2003 Posts: 9611 Location: beyond the rim
|
Posted: Tue Jun 11, 2024 1:52 pm Post subject: |
|
|
SarahS93 wrote: |
Code: | ./test.py: Zeile 1: Syntaxfehler beim unerwarteten Symbol »(«
./test.py: Zeile 1: `if = open("path/to/input/file", "rb")' |
|
*sigh* That error looks like you attempted to execute the code using bash instead of python which obviously won't work. |
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 22660
|
Posted: Tue Jun 11, 2024 1:58 pm Post subject: |
|
|
Also, it looks like the environment again does not have LC_MESSAGES=C, so the error messages are localized, which makes it harder for us to understand the problem.
However, the shown statement is not valid Python either, because if is a reserved word in Python. Code: | $ bash -c 'if = open("a", "r")'
bash: -c: line 0: syntax error near unexpected token `('
bash: -c: line 0: `if = open("a", "r")'
$ python -c 'if = open("a", "r")'
File "<string>", line 1
if = open("a", "r")
^
SyntaxError: invalid syntax
| I agree that the shown error looks much more like the bash output than the Python output.
Changing if to a non-keyword, such as input_file, would avoid the SyntaxError. |
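A quick way to check whether a name is reserved before using it as a variable is the standard `keyword` module. The three names tested below are just the ones that came up in this thread.

```python
import keyword

# "if" is reserved; "of" and "input_file" are ordinary identifiers.
for name in ("if", "of", "input_file"):
    status = "reserved keyword" if keyword.iskeyword(name) else "usable as a variable name"
    print(f"{name}: {status}")
```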
|
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2433
|
|