I'm lost with all the text encoding stuff...

Postby peceha » Mon Oct 09, 2017 6:24 pm

Hello,
I have a binary file (from a game) and I would like to replace part of its contents. Easy... or not :D

Code: Select all
tbl = {["a"]   = $58,["c"]   = $68,["e"]= $78,["g"]   = $88,["h"]= $90,["k"]   = $A8,["n"]= $C0,["t"]   = $F0,["u"]= $F8,[" "]= $4E,["_"]= $47}

OpenFile(1,"org_bidat")

Local size=FileSize("org_bidat")
Local data$=ReadBytes(1,size)

Local s$="kann gut tau_chen"
Local hexChain=""

For Local i=0 To StrLen(s$)-1
   Local idx=MidStr(s$,i,1)
   hexChain=hexChain..hexstr(tbl[idx]).." "
Next

Local pos=FindStr(data$,sR,True,0,#ENCODING_RAW)

DebugPrint(hexChain)
DebugPrint(pos)

CloseFile(1)


The output I get is:
$A8 $58 $C0 $C0 $4E $88 $F8 $F0 $4E $F0 $58 $F8 $47 $68 $90 $78 $C0
-1

It means that my string was not found (-1)

But that string really exists in the file:
Image

Later I realized that #ENCODING_RAW in the following line
Code: Select all
Local pos=FindStr(data$,sR,True,0,#ENCODING_RAW)

is changing the string a little (some hex values are not the same anymore; compare the top and bottom lines):
$A8 $58 $C0 $C0 $4E $88 $F8 $F0 $4E $F0 $58 $F8 $47 $68 $90 $78 $C0
$E8 $58 $C0 $C0 $4E $C8 $F8 $F0 $4E $F0 $58 $F8 $47 $68 $D0 $78 $C0


So now I know why nothing was found - it is not looking for my original string anymore.

But when I remove #ENCODING_RAW I get an error:
Error in line 75 (test.hws): Invalid UTF-8 string in argument 1!


How can I find that string?
Please!!! Help!!! :D
peceha
 
Posts: 111
Joined: Tue Dec 13, 2016 9:39 am
Location: Poland

Re: I'm lost with all the text encoding stuff...

Postby lazi » Mon Oct 09, 2017 7:37 pm

Just trying to understand your script.

What is sR, the second argument of FindStr()?
lazi
 
Posts: 285
Joined: Fri Feb 25, 2011 12:08 am

Re: I'm lost with all the text encoding stuff...

Postby peceha » Mon Oct 09, 2017 7:45 pm

one line got lost

Code: Select all
For Local i=0 To StrLen(s$)-1
   Local idx=MidStr(s$,i,1)
   sR=sR..Chr(tbl[idx])   ; I missed this line
   hexChain=hexChain..hexstr(tbl[idx]).." "
Next


Re: I'm lost with all the text encoding stuff...

Postby lazi » Mon Oct 09, 2017 8:04 pm

Chr() has an encoding option too.

Before anything else, please check what happens if UTF-8 is turned off with this line at the top of the source:

@OPTIONS {Encoding = #ENCODING_ISO8859_1}
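
A quick way to see why the encoding matters, sketched in Python rather than Hollywood: in ISO 8859-1 every character code from $00 to $FF is stored as exactly one byte with the same value, while UTF-8 stores codes from $80 upwards as two bytes. Whether Hollywood's Chr() does exactly this internally is an assumption; the sketch only shows the general encoding rule, using three values from the table in the first post.

```python
# Character codes for "k", "a", "n" from the table in the first post.
codes = [0xA8, 0x58, 0xC0]
text = "".join(chr(c) for c in codes)

latin1 = text.encode("iso-8859-1")  # one byte per code, value unchanged
utf8 = text.encode("utf-8")         # codes >= $80 become two bytes each

print(" ".join(f"{b:02X}" for b in latin1))  # A8 58 C0
print(" ".join(f"{b:02X}" for b in utf8))    # C2 A8 58 C3 80
```

A search pattern built as UTF-8 text therefore no longer contains the same bytes as the raw file data, which is consistent with the search failing.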

Re: I'm lost with all the text encoding stuff...

Postby peceha » Mon Oct 09, 2017 8:19 pm

@OPTIONS {Encoding = #ENCODING_ISO8859_1}

This helped! Thanks!

But now there is another problem:
one of the letters ("v") is represented by $00, and when I search for, e.g.:

"ist Fan vom HSV" -> nothing is found
"ist Fan v" -> found
"vom HSV" -> also found (at pos. 232043 <--- important!)
"om HSV" -> found at pos. 232043 too, the same position as the string above

It looks like $00 is ignored.

Re: I'm lost with all the text encoding stuff...

Postby peceha » Mon Oct 09, 2017 9:02 pm

Image

As seen in the picture, the letter [o] is at position 232043,

and it doesn't matter whether I search for "vom HSV" or "om HSV": I always get the same position.
I think it may be related to the problem from my post above (the whole phrase "ist Fan vom HSV" cannot be found because it contains a $00).

Re: I'm lost with all the text encoding stuff...

Postby peceha » Tue Oct 10, 2017 9:17 am

Yet another example:

Code: Select all
opo   = {
   ["$00"]   = "v",
   ["$04"]   = "ö",
   ["$08"]   = "w",
   ["$20"]   = "z",
   ["$47"]   = "_",
   ["$4E"]   = " ",
   ["$57"]   = "A",
   ["$58"]   = "a",
   ["$60"]   = "b",
   ["$68"]   = "c",
   ["$74"]   = "ä",
   ["$78"]   = "e",
   ["$7F"]   = "F",
   ["$80"]   = "f",
   ["$88"]   = "g",
   ["$8F"]   = "H",
   ["$90"]   = "h",
   ["$98"]   = "i",
   ["$A8"]   = "k",
   ["$B0"]   = "l",
   ["$B8"]   = "m",
   ["$C0"]   = "n",
   ["$C8"]   = "o",
   ["$D0"]   = "p",
   ["$E0"]   = "r",
   ["$E7"]   = "S",
   ["$E8"]   = "s",
   ["$F0"]   = "t",
   ["$F8"]   = "u",
   ["$FF"]   = "V"
}

OpenFile(1,"org_bidat")
OpenFile(2,"bidat.txt",#MODE_READWRITE)

inLine=False
text$=""
While Not Eof(1)
   Local c$=ReadByte(1)
   c$=HexStr(c$)
   If RawGet(opo,c$)
      inLine=True
      text$=text$..opo[c$]
   Else
      If inLine=True
         WriteLine(2,text$)
         text$=""
         inLine=False
      EndIf
   EndIf
Wend

CloseFile(2)
CloseFile(1)


That table contains the part of the alphabet I have already figured out.
When I run the program, every byte value found in the table should be written to a separate file, replaced by its corresponding letter.


Look what the program does when it finds $00, which is the letter [v]. I have highlighted this part:
Image
It just skips it.
The "om HSV" on the next line comes from my script: when it cannot find a byte in the alphabet table, it starts writing a new line.
It looks like $00 is being ignored.

The highlighted text should be in one line only and should read:
ist Fan vom HSV



I hope it is now clear what I've been trying to explain (I have some difficulties when it comes to English :) )
How can I solve it?

Re: I'm lost with all the text encoding stuff...

Postby Allanon » Tue Oct 10, 2017 12:46 pm

I think your problem is caused by null-terminated strings. IIRC null is equivalent to $00, so when you find the 'v' and append $00 to the string you are building, you terminate that string.

You have two ways, IMO:
1. Use a memory area where you can poke your character values.
2. Use another symbol for $00 and, when you have finished the conversion, replace that symbol with $00. Remember, however, that you cannot use the result as a normal string, because $00 means end of string.
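
The first option can be illustrated in Python (not Hollywood code, just a language-neutral sketch): a byte buffer carries an explicit length, so a $00 inside it is an ordinary value and a search pattern containing it still matches. The phrase bytes come from the alphabet table in this thread; the padding bytes around them are made up.

```python
# Byte values for "ist Fan vom HSV", taken from the table posted in this thread.
phrase = bytes([0x98, 0xE8, 0xF0, 0x4E, 0x7F, 0x58, 0xC0, 0x4E,
                0x00, 0xC8, 0xB8, 0x4E, 0x8F, 0xE7, 0xFF])

# Hypothetical file contents: arbitrary padding around the phrase.
data = bytearray(b"\x01\x02\x03") + phrase + bytearray(b"\x04\x05")

pos_full = data.find(phrase)      # whole phrase, including the $00 for "v"
pos_vom = data.find(phrase[8:])   # "vom HSV", starts at the $00
pos_om = data.find(phrase[9:])    # "om HSV", starts one byte later
print(pos_full, pos_vom, pos_om)  # 3 11 12
```

When the zero byte is really part of the pattern, "vom HSV" and "om HSV" are found one position apart, not at the same position.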

Here is some info about it

Hope it helps :)
Allanon
 
Posts: 435
Joined: Sun Feb 14, 2010 8:53 pm
Location: Italy

Re: I'm lost with all the text encoding stuff...

Postby peceha » Tue Oct 10, 2017 4:16 pm

That article made my head explode :)

But thanks for the explanation.
I think I will try to catch "$00" and use a substitute somehow. I will see if I can manage that.

Thanks

Re: I'm lost with all the text encoding stuff...

Postby peceha » Tue Oct 10, 2017 8:17 pm

Well...
no need for any substitutes :D
there was a mistake in my script. I looked into the documentation for HexStr() and what I saw at the end was:
This will return the string "$FF".


So I assumed that HexStr() always returns a three-character string, like "$00" or "$0A", but that is not true. It returns "$0" and "$A" instead.
So I changed the first keys of my alphabet table accordingly and everything works like a charm!
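
The same pitfall, sketched in Python for illustration (this is not Hollywood code, and `decode_runs` is a made-up helper name): a hex formatter that does not pad returns "$0" for zero, so a table keyed by "$00" never matches. Keying the table by the numeric byte value avoids depending on the string format at all.

```python
# Unpadded vs. padded hex formatting of the same byte values.
unpadded = ["$" + format(b, "X") for b in (0x00, 0x0A, 0xFF)]   # ['$0', '$A', '$FF']
padded = ["$" + format(b, "02X") for b in (0x00, 0x0A, 0xFF)]   # ['$00', '$0A', '$FF']

# Part of the alphabet from this thread, keyed by byte value instead of text.
ALPHABET = {0x00: "v", 0x4E: " ", 0x58: "a", 0x7F: "F", 0x8F: "H", 0x98: "i",
            0xB8: "m", 0xC0: "n", 0xC8: "o", 0xE7: "S", 0xE8: "s", 0xF0: "t",
            0xFF: "V"}

def decode_runs(data):
    """Return the runs of consecutive known bytes as decoded strings."""
    runs, current = [], ""
    for b in data:
        if b in ALPHABET:
            current += ALPHABET[b]
        elif current:
            # Unknown byte ends the current run.
            runs.append(current)
            current = ""
    if current:
        runs.append(current)
    return runs

# Hypothetical sample: the phrase bytes framed by two unknown bytes.
sample = bytes([0x01, 0x98, 0xE8, 0xF0, 0x4E, 0x7F, 0x58, 0xC0, 0x4E,
                0x00, 0xC8, 0xB8, 0x4E, 0x8F, 0xE7, 0xFF, 0x02])
print(decode_runs(sample))  # ['ist Fan vom HSV']
```

With integer keys the $00 byte is looked up like any other value, so the phrase comes out on one line.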

Thank You all for help.