I'm lost with all the text encoding stuff...


Post by peceha » Mon Oct 09, 2017 6:24 pm

Hello,
I have a binary file (from a game) and I would like to replace part of it. Easy... or not :D


tbl = {["a"] = $58, ["c"] = $68, ["e"] = $78, ["g"] = $88, ["h"] = $90, ["k"] = $A8, ["n"] = $C0, ["t"] = $F0, ["u"] = $F8, [" "] = $4E, ["_"] = $47}

OpenFile(1,"org_bidat")

Local size=FileSize("org_bidat")
Local data$=ReadBytes(1,size)

Local s$="kann gut tau_chen"
Local hexChain=""

For Local i=0 To StrLen(s$)-1
	Local idx=MidStr(s$,i,1)
	hexChain=hexChain..hexstr(tbl[idx]).." "
Next

Local pos=FindStr(data$,sR,True,0,#ENCODING_RAW)

DebugPrint(hexChain)
DebugPrint(pos)

CloseFile(1)
The output I get is:
$A8 $58 $C0 $C0 $4E $88 $F8 $F0 $4E $F0 $58 $F8 $47 $68 $90 $78 $C0
-1
-1 means that my string was not found.

But that string really exists in the file:
[Image]
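For comparison, here is a minimal Python sketch (not Hollywood code) of the same idea done purely at byte level. The text is translated with the character table into raw byte values, and the file contents are searched as bytes, so no text encoding is involved at any point. `TABLE` is copied from `tbl` in the script above. `encode_text` and `find_text` are hypothetical helper names.

```python
# Character -> byte value mapping, copied from tbl in the script above.
TABLE = {"a": 0x58, "c": 0x68, "e": 0x78, "g": 0x88, "h": 0x90, "k": 0xA8,
         "n": 0xC0, "t": 0xF0, "u": 0xF8, " ": 0x4E, "_": 0x47}

def encode_text(text: str) -> bytes:
    """Translate each character into its single game byte (KeyError if unmapped)."""
    return bytes(TABLE[ch] for ch in text)

def find_text(data: bytes, text: str) -> int:
    """Return the offset of the encoded text inside data, or -1 if it is absent."""
    return data.find(encode_text(text))

pattern = encode_text("kann gut tau_chen")
print(" ".join(f"${b:02X}" for b in pattern))

# Usage on the real file (opened in binary mode, so nothing is decoded):
# with open("org_bidat", "rb") as f:
#     print(find_text(f.read(), "kann gut tau_chen"))
```

Because the file is read with `"rb"`, Python never tries to interpret the game data as UTF-8 or any other text encoding.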

Later I realized that #ENCODING_RAW in the following line

Local pos=FindStr(data$,sR,True,0,#ENCODING_RAW)

changes the string slightly (some hex values are no longer the same; compare the top and bottom lines):
$A8 $58 $C0 $C0 $4E $88 $F8 $F0 $4E $F0 $58 $F8 $47 $68 $90 $78 $C0
$E8 $58 $C0 $C0 $4E $C8 $F8 $F0 $4E $F0 $58 $F8 $47 $68 $D0 $78 $C0
So now I know why nothing was found: it is no longer searching for my original string.

But when I remove #ENCODING_RAW, I get this error:
Error in line 75 (test.hws): Invalid UTF-8 string in argument 1!
How can I find that string????
Please!!! Help!!! :D


Post by lazi » Mon Oct 09, 2017 7:37 pm

I'm just trying to understand your script.

Could you tell me what sR is, the second parameter of FindStr()?


Post by peceha » Mon Oct 09, 2017 7:45 pm

One line got lost:


For Local i=0 To StrLen(s$)-1
	Local idx=MidStr(s$,i,1)
	sR=sR..Chr(tbl[idx])   <------------------------------ I missed that line
	hexChain=hexChain..hexstr(tbl[idx]).." "
Next



Post by lazi » Mon Oct 09, 2017 8:04 pm

Chr() has an encoding option too.

Before anything else, please check what happens if UTF-8 is turned off with this line at the top of the source:

@OPTIONS {Encoding = #ENCODING_ISO8859_1}
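The short Python illustration below shows why the encoding setting matters. In UTF-8, every character from $80 to $FF is stored as two bytes, while ISO-8859-1 (Latin-1) stores it as the single byte with the same value. Raw game bytes are also often not valid UTF-8 at all. The sketch assumes, without confirmation from this thread, that Hollywood's Chr() and string functions behave like Python's UTF-8 and Latin-1 codecs here.

```python
def char_bytes(value: int, encoding: str) -> bytes:
    """Encode the single character with code point `value` using `encoding`."""
    return chr(value).encode(encoding)

for value in (0x4E, 0xA8, 0xF8):
    utf8 = char_bytes(value, "utf-8")
    latin1 = char_bytes(value, "latin-1")
    # Below $80 both are one byte; from $80 up, UTF-8 needs two bytes.
    print(f"${value:02X}: utf-8 = {utf8.hex(' ')}, latin-1 = {latin1.hex(' ')}")

# Raw game bytes are usually not valid UTF-8, but always valid Latin-1.
raw = bytes([0xA8, 0x58, 0xC0])
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)
print(len(raw.decode("latin-1")), "characters, one per byte")
```

With a single-byte encoding, every byte value from $00 to $FF corresponds to exactly one character. That one-to-one mapping is what a byte-for-byte search needs.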


Post by peceha » Mon Oct 09, 2017 8:19 pm

@OPTIONS {Encoding = #ENCODING_ISO8859_1}

This helped!!!!!! Thanks!!!

But now there is another problem:
one of the letters, "v", is represented by "$00", and when I search for, e.g.:

"ist Fan vom HSV" -> it finds nothing
"ist Fan v" -> it finds this string
"vom HSV" -> it also finds this string (at pos. 232043 <--- important !!)
"om HSV" -> this is found at pos. 232043!!! (the same position as the string above)

It looks like $00 is ignored.


Post by peceha » Mon Oct 09, 2017 9:02 pm

[Image]

As seen in the picture, the letter [o] is at position 232043.

It doesn't matter whether I search for "vom HSV" or "om HSV": I always get the same position.
I think this may be related to my new problem from the post above (I cannot find the whole phrase "ist Fan vom HSV" when it contains a $00).


Post by peceha » Tue Oct 10, 2017 9:17 am

Yet another example:


opo	= {
	["$00"]	= "v",
	["$04"]	= "ö",
	["$08"]	= "w",
	["$20"]	= "z",
	["$47"]	= "_",
	["$4E"]	= " ",
	["$57"]	= "A",
	["$58"]	= "a",
	["$60"]	= "b",
	["$68"]	= "c",
	["$74"]	= "ä",
	["$78"]	= "e",
	["$7F"]	= "F",
	["$80"]	= "f",
	["$88"]	= "g",
	["$8F"]	= "H",
	["$90"]	= "h",
	["$98"]	= "i",
	["$A8"]	= "k",
	["$B0"]	= "l",
	["$B8"]	= "m",
	["$C0"]	= "n",
	["$C8"]	= "o",
	["$D0"]	= "p",
	["$E0"]	= "r",
	["$E7"]	= "S",
	["$E8"]	= "s",
	["$F0"]	= "t",
	["$F8"]	= "u",
	["$FF"]	= "V"
}

OpenFile(1,"org_bidat")
OpenFile(2,"bidat.txt",#MODE_READWRITE)

inLine=False
text$=""
While Not Eof(1)
	Local c$=ReadByte(1)
	c$=HexStr(c$)
	If RawGet(opo,c$)
		inLine=True
		text$=text$..opo[c$]
	Else
		If inLine=True
			WriteLine(2,text$)
			text$=""
			inLine=False
		EndIf
	EndIf
Wend

CloseFile(2)
CloseFile(1)
That table contains the part of the alphabet I have already figured out.
When I run the program, every value found in that table should be written to a separate file, with each value replaced by its corresponding character.
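The decoding loop can be sketched in Python for comparison. Here the table is keyed by the byte value itself (an integer) rather than by a hex string, so the key never depends on how a number is formatted as text. `ALPHABET` is a subset of `opo`, and `extract_runs` is a hypothetical name. Unlike the Hollywood loop, this version also keeps a run that reaches the end of the file.

```python
# Subset of the opo table above, keyed by byte value instead of a hex string.
ALPHABET = {0x00: "v", 0x4E: " ", 0x57: "A", 0x58: "a", 0x7F: "F", 0x8F: "H",
            0x98: "i", 0xB8: "m", 0xC0: "n", 0xC8: "o", 0xE7: "S",
            0xE8: "s", 0xF0: "t", 0xFF: "V"}

def extract_runs(data: bytes) -> list:
    """Collect runs of known bytes as text; any unknown byte ends the current run."""
    runs, current = [], []
    for byte in data:                 # iterating bytes yields ints 0..255
        if byte in ALPHABET:
            current.append(ALPHABET[byte])
        elif current:
            runs.append("".join(current))
            current = []
    if current:                       # keep a run that reaches the end of the data
        runs.append("".join(current))
    return runs

# Usage on the real file:
# with open("org_bidat", "rb") as f:
#     for line in extract_runs(f.read()):
#         print(line)
```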


Look at what the program does when it finds $00, which is the letter [v]. I have highlighted this part:
[Image]
It just skips it.
The "om HSV" on the line below comes from my script: when it cannot find a symbol in the alphabet table, it starts a new line.
It looks like $00 is being ignored.

The highlighted text should be in one line only and should read:
ist Fan vom HSV

I hope it is now clear what I've been trying to explain (I have some difficulties when it comes to English :) )
How can I solve it?


Post by Allanon » Tue Oct 10, 2017 12:46 pm

I think your problem is caused by null-terminated strings. IIRC, null is equivalent to $00, so when you find the 'v' and add $00 to the string you are building, you terminate that string.

You have two ways, IMO:
1. Use a memory area where you can poke your character values.
2. Use another symbol for $00 and, when you have finished the conversion, replace that symbol with $00. However, remember that you cannot then use the result as a normal string, because $00 means end of string.
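To make the null-termination idea concrete, this Python sketch uses `ctypes` to read a buffer the way C reads a string. It also uses a `bytearray` as the kind of memory area suggested in option 1. It is an analogy only. The thread does not establish how Hollywood stores strings internally.

```python
import ctypes

# " " then "v" ($00), "o", "m", using byte values from the opo table.
data = bytes([0x4E, 0x00, 0xC8, 0xB8])

# Reading the buffer as a C string stops at the first $00.
as_c_string = ctypes.c_char_p(data).value
print(len(data), "bytes stored,", len(as_c_string), "byte(s) seen as a C string")

# Option 1: a memory area (here a bytearray) treats $00 as an ordinary value.
buffer = bytearray(4)   # starts as four $00 bytes
buffer[0] = 0x4E        # poke " "
buffer[2] = 0xC8        # poke "o" (buffer[1] stays $00, i.e. "v")
buffer[3] = 0xB8        # poke "m"
print(buffer.hex(" "))
```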

Here is some info about it

Hope it helps :)
----------------------------
[Allanon] Fabio Falcucci


Post by peceha » Tue Oct 10, 2017 4:16 pm

That article made my head explode :)

But thanks for explanation.
I think I will try to catch "$00" and use a substitute somehow. We'll see if I can manage that.

Thanks


Post by peceha » Tue Oct 10, 2017 8:17 pm

Well...
no need for any substitutes :D
There was a mistake in my script. I looked into the documentation for HexStr(), and the example at the end said:
This will return the string "$FF".
From that I assumed that HexStr() always returns three characters, like "$00" or "$0A", but that is not true: it returns "$0" and "$A" instead.
So I changed the keys below $10 in my alphabet table ($00, $04, $08) accordingly, and everything works like a charm!!!!
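The HexStr() pitfall can be reproduced in Python. An unpadded hex format drops leading zeros, so a key such as "$00" never matches. The two helpers below (hypothetical names) only mimic the output format described in this post. They are not Hollywood functions. Padding to two digits, or keying the table by the number itself, keeps the lookup consistent.

```python
def hex_unpadded(value: int) -> str:
    """No leading zeros, like the HexStr() output described above: 0 -> "$0"."""
    return "$" + format(value, "X")

def hex_padded(value: int) -> str:
    """Always at least two hex digits: 0 -> "$00", 10 -> "$0A"."""
    return "$" + format(value, "02X")

# Three keys taken from the opo table above.
table = {"$00": "v", "$04": "ö", "$FF": "V"}

for value in (0x00, 0x04, 0xFF):
    short, padded = hex_unpadded(value), hex_padded(value)
    print(short, short in table, padded, padded in table)
```

Only values of $10 and above produce the same text in both formats. This is why only the first few entries of the table ($00, $04, $08) were affected.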

Thank You all for help.
