DeserializeTable and EM character

Post by Flinx »

The DeserializeTable function fails if a string contains the character EM (0x19).

Code:

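; Store the EM control character (0x19, decimal 25) in a table
TestTable = {}
TestTable[0] = ByteChr(25)
; Serializing works ...
TestTable$ = SerializeTable(TestTable)
; ... but deserializing the result fails
TestTable2 = DeserializeTable(TestTable$)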

With the Inbuilt adapter it does not fail.
Because the strings in my program come from outside, my question is: must I check for this and possibly other special characters, or is it a bug?
I convert the strings to UTF-8 and check them with ValidateStr(), and I thought this should be enough.
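
For reference, the JSON specification does not allow a raw control character such as EM inside a string; it has to be written as a \u escape. The following is a minimal sketch in Python (not Hollywood), using only Python's standard json module, of how a spec-following implementation treats the character from the test case above. It assumes the failing adapter is the JSON one, and it says nothing about how Hollywood's serializer is implemented internally. The helper name raw_control_char_is_rejected is made up for this illustration.

```python
import json

EM = "\x19"  # the EM control character (decimal 25) from the test case

# A spec-following serializer escapes control characters below 0x20.
encoded = json.dumps({"0": EM})

# Decoding the escaped form round-trips to the original character.
decoded = json.loads(encoded)


def raw_control_char_is_rejected():
    """Return True if JSON text with an unescaped EM character is refused."""
    try:
        json.loads('{"0": "' + EM + '"}')
    except json.JSONDecodeError:
        return True
    return False
```

So if the serialized string carries the character unescaped, a strict deserializer is entitled to reject it; if it carries the escaped form, the deserializer has to understand \u escapes to read it back.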

Ralf

Post by Flinx »

Meanwhile I have found that the character was the result of a bad Unicode conversion, but I think that shouldn't matter for DeserializeTable() if a string can contain arbitrary data.

Post by jPV »

I'm also having problems with some non-alphanumeric characters when deserializing JSON from web APIs. Many of them use the "@" character for hyperlinks, ":" for CURIEs, and "-" elsewhere in table keys.

Here are examples:

Code:

; works
json$ = "{\"test\": \"alphabets\"}"
t = DeserializeTable(json$)
DebugPrint(t["test"])

; fails
json$ = "{\"te:st\": \"colon\"}"
t = DeserializeTable(json$)
DebugPrint(t["te:st"])

; fails
json$ = "{\"@test\": \"at\"}"
t = DeserializeTable(json$)
DebugPrint(t["@test"])

; fails
json$ = "{\"te-st\": \"hyphen\"}"
t = DeserializeTable(json$)
DebugPrint(t["te-st"])

Basically these should work as table keys in Hollywood, because this is fine:

Code:

; all work
t = {}
t["test"] = "alphabets"
t["te:st"] = "colon"
t["@test"] = "at"
t["te-st"] = "hyphen"
DebugPrint(t["test"], t["te:st"], t["@test"], t["te-st"]) 

Any workarounds or solutions? I end up with spaghetti code if I try to replace those characters with placeholders, because that messes up the actual data strings too...
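
For comparison, the JSON specification places no restriction on the characters in an object key; a key is just a string. A minimal sketch in Python (not Hollywood), using only Python's standard json module, parses the three key shapes from the failing examples above. It only illustrates what the specification permits and makes no claim about Hollywood's deserializer.

```python
import json

# The three key shapes from the failing examples, in one JSON object.
doc = '{"te:st": "colon", "@test": "at", "te-st": "hyphen"}'
parsed = json.loads(doc)

# Keys are ordinary strings, so they are looked up with bracket syntax.
values = [parsed["te:st"], parsed["@test"], parsed["te-st"]]
```

None of these keys could be reached with an identifier-style accessor, which is why bracket lookups are used throughout.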

Post by airsoftsoftwair »

Flinx's problem definitely sounds like a Hollywood bug. jPV's issue, however, could be considered a feature. Even though it's possible to use those special characters as table indices, this isn't really supported by the serialization interface, because the idea is to only (de)serialize items that can be addressed using the "." syntax, and special characters like @ or - can't be used with that syntax. I'm not sure, though, whether the serializer should be more tolerant here.

Post by jPV »

I found that Allanon's JSON library does handle these characters, so I'm using it for now (I have a deadline soon for a project where I really need this), but I think it would be really nice if this could be supported internally too, now that we have a built-in function for deserializing anyway. Maybe an option to make it more tolerant?

Post by airsoftsoftwair »

jPV wrote (Sat Aug 14, 2021 8:52 am):
"I found that Allanon's JSON library does handle these characters, so I'm using it for now (have a deadline for a project soon where I really need this), but I would think that it would be really nice if this could be supported internally too now that we have a built-in function for deserializing anyway. Maybe an option to let it be more tolerant?"

The general problem is that I'm not sure if it's possible to make the deserializer handle *any* JSON, because Hollywood also uses some extensions to signal the content of the individual JSON items, e.g. bytecode containing a function, or binary data. So there can always be conflicts.

Post by airsoftsoftwair »

All issues should be fixed now.

Code:

- Fix: Removed some restrictions that Hollywood imposed on JSON key names against the specification; all
  characters are valid now in key names except the space character because Hollywood sometimes uses that
  to specify the type of the binary data
- Fix: JSON deserializer didn't recognize the \u escape sequence
- Fix: Some control characters weren't serialized/deserialized correctly when Hollywood was in UTF-8 mode