SarahS93 l33t
Joined: 21 Nov 2013 Posts: 728
Posted: Sat Nov 04, 2023 12:02 am Post subject: how to split a file in not equally large parts?
|
|
How do I split a 75GB file into 7 unequally sized parts? For example:
75gb_file.1 = 11GB
75gb_file.2 = 3GB
75gb_file.3 = 26GB
75gb_file.4 = 17GB
75gb_file.5 = 6GB
75gb_file.6 = 8GB
75gb_file.7 = 4GB
With "split" all parts end up the same size, so that does not work.
Last edited by SarahS93 on Wed Nov 08, 2023 10:05 pm; edited 4 times in total
Hu Administrator
Joined: 06 Mar 2007 Posts: 22672
Posted: Sat Nov 04, 2023 1:01 am Post subject:
|
|
Would this work?
Code: | declare -i counter
counter=1
# the redirection attaches the file "75G" to the loop's stdin;
# each head -c${i}G consumes the next ${i} gigabytes from it
for i in 11 3 26 17 6 8 4; do head -c${i}G > "part.$counter"; (( ++counter )); done < 75G |
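To sanity-check the result, the parts can be concatenated back and compared against the original. A minimal check, assuming the part files sit alone in the directory so the glob sorts them in order (true with fewer than ten parts):
Code: | cat part.* | cmp - 75G && echo "parts match the original" |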
SarahS93 l33t
Joined: 21 Nov 2013 Posts: 728
Posted: Sat Nov 04, 2023 1:31 am Post subject:
|
|
It works, that's great, thanks!
toralf Developer
Joined: 01 Feb 2004 Posts: 3940 Location: Hamburg
Posted: Sat Nov 04, 2023 9:09 am Post subject:
|
|
Hu wrote: | Would this work?
Code: | declare -i counter
counter=1
for i in 11 3 26 17 6 8 4; do head -c${i}G > "part.$counter"; (( ++counter )); done < 75G | |
Wow - TIL this.
SarahS93 l33t
Joined: 21 Nov 2013 Posts: 728
Posted: Wed Nov 08, 2023 10:26 pm Post subject:
|
|
With
shuf -i 1-100 -n 4
I get 4 random numbers, but what is the way to randomly split a number into multiple numbers, given the number and n groups?
For instance, if the number is 100 and the number of groups is 4, it should give any random list of 4 numbers that add up to 100:
input number = 100
number of groups = 4
...
25 33 12 30
11 19 47 23
8 17 40 35
(I am looking for another way to get something like the "11 3 26 17 6 8 4" where I do not have to calculate the sizes myself.)
I found
awk 'BEGIN{srand()} {s=int(rand()*7); print > (FILENAME"."s)}' file75G
but the split files together are 1 byte bigger than the unsplit file; I don't know why. And the split files are all around 10.7G each, which is not what I am looking for.
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9824 Location: almost Mile High in the USA
Posted: Wed Nov 08, 2023 10:52 pm Post subject:
|
|
That's an NP problem... hard to construct a solution (though there are a lot of them), but easy to check one (almost like the knapsack problem).
There are lots of heuristics, however. The easiest one: take your max, pick a random number from 1..max, subtract it from the original, then rinse and repeat on the remainder. The later pieces will tend to be unfairly small, however; hence it is a heuristic.
Another heuristic is to start with max/n in each piece and then randomly shuffle a bit between bins until one is happy with the result... ugly, and it could take a lot of iterations.
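A minimal sketch of that shuffle heuristic, assuming n=7 bins, a total of 75, and an arbitrary 50 one-unit transfers (all numbers here are illustrative stand-ins):
Code: | n=7; total=75
avg=$((total/n)); rem=$((total%n))
a=(); for ((b=0; b<n; b++)); do a[b]=$avg; done
a[0]=$((a[0]+rem))                        # park the remainder in bin 0
for ((s=0; s<50; s++)); do                # random one-unit transfers between bins
  from=$((RANDOM%n)); to=$((RANDOM%n))
  if [[ ${a[from]} -gt 1 ]]; then a[from]=$((a[from]-1)); a[to]=$((a[to]+1)); fi
done
echo ${a[@]} |
The guard keeps every bin at 1 or more, and each transfer is sum-preserving, so the sizes always add up to the total.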
I hope this is just a theoretical problem; otherwise, could you disclose why one would need this?
szatox Advocate
Joined: 27 Aug 2013 Posts: 3433
Posted: Wed Nov 08, 2023 11:03 pm Post subject:
|
|
Sure, let's play.
Code: | i=4; j=100; while [[ $i -gt 1 ]]; do k=$((RANDOM%j)); echo $k; j=$((j-k)); i=$((i-1)); done; echo $j |
This will tend to produce bigger blocks at the beginning; it might be a good idea to modify $k before it is used, to keep it within a reasonable range.
Code: | i=4; j=100; ( while [[ $i -gt 1 ]]; do echo $((RANDOM%j)); i=$((i-1)); done ) | ( sort -hr; echo 0 ) | ( k=$j; while read line; do echo $((k-line)); k=$line; done ) |
This one picks random cut points, orders them, and prints the distances between them.
Both work in bash.
eccerr0r: this probably answers your "why" question. Just a gut feeling though:
https://forums.gentoo.org/viewtopic-p-8806711.html#8806711
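Either generator can be sanity-checked by summing its output, for example (paste and bc assumed available):
Code: | i=4; j=100; ( while [[ $i -gt 1 ]]; do echo $((RANDOM%j)); i=$((i-1)); done ) | ( sort -hr; echo 0 ) | ( k=$j; while read line; do echo $((k-line)); k=$line; done ) | paste -sd+ - | bc |
This should always print 100.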
SarahS93 l33t
Joined: 21 Nov 2013 Posts: 728
Posted: Thu Nov 09, 2023 2:02 pm Post subject:
|
|
eccerr0r:
I am looking for a way to split a file into equally large parts. Hu's way in post #2 is great, but I must calculate the part sizes myself.
szatox:
Thanks for your lines, they work!
Sometimes I get a 0 among the numbers. Is there a way to exclude 0?
Bigger blocks at the beginning are not what I am looking for. How do I modify $k? Can you please give me an example?
I am looking for a way that 100:4 =
27 25 33 15 is good
25 23 35 17 is good too
5 7 6 82 is not good
10 10 10 60 is not good
...
szatox Advocate
Joined: 27 Aug 2013 Posts: 3433
Posted: Thu Nov 09, 2023 2:40 pm Post subject:
|
|
Quote: | Sometimes I get a 0 among the numbers. Is there a way to exclude 0?
Bigger blocks at the beginning are not what I am looking for.
How do I modify $k? |
I'd use multiplication, division and addition (in this particular order), a constant, and the value of i. The current loop only holds 3 instructions; it's not difficult to analyze, so try doing it yourself.
Whatever you come up with, it goes right in front of "echo $k" though.
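One possible reading of that hint, as a sketch (my guess at an expression, not necessarily the one szatox had in mind): shift and scale k so each chunk stays near the fair share of what remains, which also rules out 0.
Code: | i=4; j=100
while [[ $i -gt 1 ]]; do
  k=$(( j/(i*2) + RANDOM%(j/i) ))   # roughly between 0.5x and 1.5x the fair share j/i
  echo $k
  j=$((j-k)); i=$((i-1))
done
echo $j |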
Code: | i=4; j=100; a=(); while [[ $j -gt 0 ]]; do k=$((RANDOM%i)); a[$k]=$((a[$k]+1)); j=$((j-1)); done; echo ${a[@]} |
Here: chunks of normally distributed sizes, in random order (each of the 100 units is tossed into one of the 4 bins at random).
I think the first version should result in a Pareto-like distribution, the second is uniform, and this last one is normal.
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9824 Location: almost Mile High in the USA
Posted: Thu Nov 09, 2023 11:48 pm Post subject:
|
|
What I still don't get is why
25 25 25 25
is not the best solution if you want "equally" large parts; it is fast, simple, and deterministic.
pjp Administrator
Joined: 16 Apr 2002 Posts: 20485
Posted: Fri Nov 10, 2023 2:46 am Post subject:
|
|
The title says "not equally large parts".
Hu Administrator
Joined: 06 Mar 2007 Posts: 22672
Posted: Fri Nov 10, 2023 3:34 am Post subject:
|
|
Yes, but a later post in the thread specifies equally large parts. If we want unequally large parts, my preference would be to divide the file into chunks of sizes 1, 1, 1, ... and N-(num chunks). Using one byte for most chunks makes the algorithm easy, and guarantees the parts are not all of equal size. If instead we need to have every chunk use a unique size, then make the first chunk 1 byte, the second 2 bytes, and so on, until the last chunk is the remainder of the file. Since there is no stated purpose for this exercise, it is difficult to determine whether this solution is acceptable.
OP: what are you trying to achieve with this split?
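A minimal sketch of the unique-sizes variant, assuming an input file named "input" and n=7 parts (parts 1 through 6 get 1..6 bytes; the last part gets the rest):
Code: | n=7
{ counter=1
  while [[ $counter -lt $n ]]; do
    head -c$counter > "part.$counter"   # part k is exactly k bytes
    (( ++counter ))
  done
  cat > "part.$n"                       # the remainder of the file
} < input |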
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9824 Location: almost Mile High in the USA
Posted: Fri Nov 10, 2023 9:01 am Post subject:
|
|
Yes, I was wondering because this seems to have no practical purpose other than as a thought experiment... if there is one, I'd like to know.
Does it even have to be random? 24 26 23 27 works, and this is a precooked heuristic: start with equal numbers, then for every other chunk add one, subtract one, add two, subtract two... keep going until you run out. It should work fine while the chunk count stays well below the total divided by n; no randomness at all, every file is a different size, and all are about the size of the average... up until there are a huge number of chunks for a small file.
... why....
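A sketch of that precooked heuristic, with n=7 chunks and a total of 75 as stand-in numbers (this deterministically prints 11 9 12 8 13 7 15, which sums to 75):
Code: | n=7; total=75
avg=$((total/n)); rem=$((total%n))
sizes=()
for ((c=0; c<n; c++)); do
  if (( c == n-1 && n % 2 )); then d=0             # odd count: last chunk stays at the average
  else d=$(( c/2 + 1 )); (( c % 2 )) && d=$((-d))  # +1, -1, +2, -2, ...
  fi
  sizes[c]=$((avg+d))
done
sizes[n-1]=$(( sizes[n-1] + rem ))                 # park the leftover in the last chunk
echo ${sizes[@]} |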
pjp Administrator
Joined: 16 Apr 2002 Posts: 20485
Posted: Sat Nov 11, 2023 4:31 pm Post subject:
|
|
Hu wrote: | Yes, but a later post in the thread specifies equally large parts. [...] OP: what are you trying to achieve with this split? |
I think that use of "equally" was a typo / language issue. The rest of the post clarifies with examples what they are looking for. I interpret "100:4" to mean 100 split into 4 parts. The examples of "good" and "not good" suggest (my words) a "maximum size difference" between the 4 chunks. From the examples, 33 vs 15 and 35 vs 17 only differ by 18; those are "good", whereas 82 vs 7 and 60 vs 10 are "not good". So the acceptable difference is something below 50, with around 18 known to be fine. Obviously SarahS93 should be more specific, otherwise the correct solution may be impossible to achieve.
As for the big picture, and based partly on unrelated posts, I'm guessing the goal is to scramble a file as a means of "security", to then allow a recipient of the scrambled file to reassemble it when given the reconstruction information, presumably kept separate from the file.
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9824 Location: almost Mile High in the USA
Posted: Sat Nov 11, 2023 5:19 pm Post subject:
|
|
Random small pieces are just as difficult/annoying to properly reassemble as larger, similar-sized pieces, so the question of why these particular sizes still remains open...
The file is still plaintext anyway, so important data can still be gleaned from it. If you encrypt it, well, the encryption adds entropy anyway, and the piece scrambling is just another step to reconstruct, where a larger key size would have been just as good with fewer reconstruction steps.
Hence this still just looks like an academic exercise.
pjp Administrator
Joined: 16 Apr 2002 Posts: 20485
Posted: Sat Nov 11, 2023 9:07 pm Post subject:
|
|
Well, if you somehow guess that the first separation is at 10GB and you are able to rejoin each piece at the same offset, that's easier than figuring out a different offset at each step. I'm not saying it's a good idea or at all secure, but past posts lead me in that direction. Maybe it's academic, maybe it isn't.
SarahS93 l33t
Joined: 21 Nov 2013 Posts: 728
Posted: Sat Nov 11, 2023 11:35 pm Post subject:
|
|
The solutions here are very acceptable, thanks!
It does not have to be random; 24, 26, 23 and 27 is good too.
100:4 is only an example.
There is a practical purpose for me! I have encrypted container files from 22GB up to 77GB. I will split them into 5 to 9 parts of unequal size. After that, I split each of the parts again into equal-size pieces and put those into a new file in a wrong sequence.
pjp wrote: | As for the big picture, and based partly on unrelated posts, I'm guessing the goal is to scramble a file as a means of "security", to then allow a recipient of the scrambled file to reassemble it when given the reconstruction information, presumably kept separate from the file. |
Yes, so it is. But I use different options for every file: one file is in 5 parts, another is in 9 parts, yet another in 7 parts, and so on... so there will not be any pattern.
eccerr0r: no, it is not an academic exercise.
szatox:
Code: | i=7; j=75; a=(); while [[ $j -gt 0 ]]; do k=$((RANDOM%i)); a[$k]=$((a[$k]+1)); j=$((j-1)); done; echo ${a[@]} |
works great for me, thanks!
Any idea how to add a filter for lines like
10 10 5 11 5 19 15
where "10" occurs two times? My thought: if any number occurs two or more times, run the command again?
Code: | i=7; j=77; a=(); while [[ $j -gt 0 ]]; do k=$((RANDOM%i)); a[$k]=$((a[$k]+1)); j=$((j-1)); done; echo ${a[@]}
11 17 15 7 9 4 14 |
works! I can use it to generate random "gigabyte" numbers. But if I run it with a megabyte-scale number like 77777 (roughly 77.8GB):
Code: | i=7; j=77777; a=(); while [[ $j -gt 0 ]]; do k=$((RANDOM%i)); a[$k]=$((a[$k]+1)); j=$((j-1)); done; echo ${a[@]}
11127 11267 11064 11214 10972 11005 11128 |
then all the numbers end up much closer together.
I will combine it with:
Code: | declare -i counter
counter=1
for i in $(i=7; j=77; a=(); while [[ $j -gt 0 ]]; do k=$((RANDOM%i)); a[$k]=$((a[$k]+1)); j=$((j-1)); done; echo ${a[@]}); do head -c${i}G > "part$counter"; (( ++counter )); done < file |
szatox Advocate
Joined: 27 Aug 2013 Posts: 3433
Posted: Sun Nov 12, 2023 12:41 am Post subject:
|
|
Quote: | Yes, so it is. But I use different options for every file... so there will not be any pattern.
eccerr0r: no, it is not an academic exercise. |
Well... let me just tell you this: everyone can come up with a security scheme that he himself cannot imagine anyone breaking.
E.g. a monkey might want to hide a banana in a hollow.
Anyway, if the file has a structure that can be traced, whether with a dictionary or by magic numbers or whatever, it can probably be stitched back together, making your security measure ineffective.
If you encrypt it first to get rid of any traces of structure, breaking proper encryption will take literally ages (even if we add currently unknown computing-power boosts from new technologies), making this extra step unnecessary while still inconvenient. Basically, you're doing something stupid, so it would actually be better to keep it in the realm of academic exercises.
Quote: | Any idea how to add a filter for lines like
10 10 5 11 5 19 15
where "10" occurs two times? My thought: if any number occurs two or more times, run the command again? |
Sure, you can run it multiple times and discard the results you don't like, or make it actually deterministic: do integer division with remainder to determine the average chunk size and the desired bias, then in a loop add and subtract a counter to spread the values, skipping some chunks so the resulting bias actually matches your remainder.
Play with the numbers a bit; there is no "one right way to do that". I mean, if you use it for security, all ways will be equally bad, but it's still a fun little exercise, so whatever.
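The rerun-and-discard approach can be wrapped up like this (a sketch built around the generator from the earlier posts; the bins are pre-filled with zeros so a zero-sized chunk is printed and rejected too):
Code: | gen() {
  i=7; j=75
  a=(); for ((b=0; b<i; b++)); do a[b]=0; done
  while [[ $j -gt 0 ]]; do k=$((RANDOM%i)); a[k]=$((a[k]+1)); j=$((j-1)); done
  echo "${a[@]}"
}
# rerun until all sizes are distinct and none is zero
while sizes=$(gen)
      echo "$sizes" | tr ' ' '\n' | sort -n | uniq -d | grep -q . || echo "$sizes" | grep -qw 0
do :; done
echo "$sizes" |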
pjp Administrator
Joined: 16 Apr 2002 Posts: 20485
Posted: Sun Nov 12, 2023 2:16 am Post subject:
|
|
szatox wrote: | Well... let me just tell you this: everyone can come up with a security scheme that he himself cannot imagine anyone breaking. [...] it can probably be stitched back together, making your security measure ineffective. |
Given that the information is encrypted, it seems much less likely that it could be stitched back together. Wouldn't it have to be reassembled correctly _before_ the encryption could be broken? If you have a file that was encrypted and then had its parts disassembled and randomly reassembled, it would not be possible for the person in possession of the file to be of assistance in either correctly reordering the file or decrypting it.
szatox wrote: | If you encrypt it first to get rid of any traces of structure, breaking proper encryption will take literally ages [...] it would actually be better to keep it in the realm of academic exercises. |
I disagree, and I think that perspective is primarily useful in academic theory. No matter how academically unbreakable an encryption might be, it becomes pointless when wrench-decryption is applied.
szatox Advocate
Joined: 27 Aug 2013 Posts: 3433
Posted: Sun Nov 12, 2023 1:09 pm Post subject:
|
|
Quote: | No matter how academically unbreakable an encryption might be, it becomes pointless when wrench-decryption is applied. |
Sure, but the same thing applies to reordering the pieces, and the attack does not become even a tiny bit more difficult. It won't even require upgrading a $5 wrench to an $8 one.
Hu Administrator
Joined: 06 Mar 2007 Posts: 22672
Posted: Sun Nov 12, 2023 4:01 pm Post subject:
|
|
Until the most recent post from OP, it looked like OP was trying to use this split+scramble in lieu of, rather than in addition to, proper cryptography. For the very first post, I was willing to consider that this was just an attempt to deal with weird size caps on an upload service, and splitting the file into pieces allowed them to be uploaded to a service that would not take the full file in a single step.
History shows many examples of people inventing self-perceived "great" designs to use to obfuscate data, calling it encryption, and then having it fail horribly when serious attackers (whether cryptography researchers who publish their attack, or malicious attackers who use the attack for direct gain) examine the algorithm and find that an output "ciphertext" can be converted back to the plaintext form in very reasonable time. Therefore, I (and, I think several other posters) have a strong bias toward assuming that anyone asking for this sort of thing is doing it as part of a homemade encryption scheme that will likely fail the first time a serious attacker tries to analyze it. The responsible thing to do in that case is to warn the requester that this is not a secure way to conceal data, and that using a well verified cryptographic protocol will be both easier and safer.
Yes, physical assault can break even the best cryptographic schemes, but one of the goals of good cryptography is that an attacker's only choices are (1) a brute force search of a large key space, which ideally should take so long on average that the attacker chooses not to bother or (2) attack the endpoint via espionage / assault.
If this is being used in addition to a known-good cryptography scheme, then I think it is still pointless, because it adds complexity with little value. However, if OP wants to overcomplicate a secure system, and does so in a way that does not reduce its security, that is OP's choice.
SarahS93 l33t
Joined: 21 Nov 2013 Posts: 728
Posted: Sun Nov 12, 2023 9:52 pm Post subject:
|
|
If I read it right, in short: scramble and split does not make it worse, but also does not add real security?
What about creating an encrypted file, and inside it another encrypted file, and so on and so on... how deep, 5 times? Different ciphers, no headers on the encrypted files (with storage for all 5 headers kept separate): is this more secure than 1 encrypted file with scramble and split?
An encrypted file (header detached): is it possible to identify such a file as an encrypted file, and specifically as an encrypted LUKS file?
szatox Advocate
Joined: 27 Aug 2013 Posts: 3433
Posted: Sun Nov 12, 2023 10:58 pm Post subject:
|
|
SarahS93, explanation in a nutshell here: https://xkcd.com/538/
The wrench attack does have some disadvantages, like not being exactly stealthy, but brute force is best applied to the weakest link, and that is not AES right now.
Encrypting stuff with different algorithms and different (unrelated) passwords could help against not-yet-discovered vulnerabilities which might make decryption trivial in the future, but it won't protect you from a wrench in the hands of a determined attacker, while still piling up inconveniences for the users. And sufficiently inconvenienced users write passwords down on sticky notes next to their monitors.
Quote: | An encrypted file (header detached): is it possible to identify such a file as an encrypted file, and specifically as an encrypted LUKS file? |
Hard to say. A LUKS container with a detached header should look just like random noise (unless you made a mistake creating it, like making it a sparse file or using a bad encryption mode).
Do you mind explaining why you have a file with 100GB of noise, though? That's sus.
SarahS93 l33t
Joined: 21 Nov 2013 Posts: 728
Posted: Fri Jun 21, 2024 11:29 pm Post subject:
|
|
With
Code: | declare -i counter; counter=1; for i in 11 3 26 17 6 8 4; do head -c${i}G > "part.$counter"; (( ++counter )); done < 75G |
the file is read and split, OK.
How do I run this through pipe viewer?
Code: | declare -i counter; counter=1; for i in 11 3 26 17 6 8 4; do head -c${i}G | pv > "part.$counter"; (( ++counter )); done < 75G |
The output goes through pv, but with each new file that is created, pv starts a new line, which is not nice to watch.
I don't know the way to send the input through pipe viewer. The input comes from "< 75G"; how do I send it through pv first and then to head?
szatox Advocate
Joined: 27 Aug 2013 Posts: 3433
Posted: Sat Jun 22, 2024 12:36 am Post subject:
|
|
Maybe put the whole input through a single pv instead of putting each chunk through a different one?
Instead of
for ... pv ... done < input
use this:
pv input | for ... done
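Spelled out for the command from the earlier post (a sketch; a single pv at the front shows one progress bar for the whole file, and the loop runs in a subshell fed by the pipe):
Code: | pv 75G | { counter=1
  for i in 11 3 26 17 6 8 4; do
    head -c${i}G > "part.$counter"
    (( ++counter ))
  done; } |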