XMLParser: possible bug in pos() method

mrupp
Posts: 112
Joined: Sun Jan 31, 2021 7:44 pm
Location: Switzerland

Post by mrupp »

Hi there
I'm using XMLParser to parse an XML document, and my goal is to get the full XML of each listed item. I had quite some trouble with that, but in the end I could pin it down to a possible bug in the pos() method, which seems to have trouble with special characters like ä, ö, ü, é, à, etc.

Here's a heavily simplified example that shows the bug:

Code: Select all

@REQUIRE "RapaGUI", {Link = True}
@REQUIRE "xmlparser", {Link = True}
@APPTITLE "XmlParser-Test"

Global xml1$ = [[<root>
<item id="Q:01">
	<dc:title>A Prayer For England</dc:title>
	<r:narrator>Sinead O'Connor</r:narrator>
</item>
<item id="Q:02">
	<dc:title>Remember</dc:title>
	<r:narrator>aouaou</r:narrator>
</item>
<item id="Q:03">
	<dc:title>Now Is The Time</dc:title>
	<r:narrator>eaeeae</r:narrator>
</item>
</root>]]

Global xml2$ = [[<root>
<item id="Q:01">
	<dc:title>A Prayer For England</dc:title>
	<r:narrator>Sinéad O'Connor</r:narrator>
</item>
<item id="Q:02">
	<dc:title>Remember</dc:title>
	<r:narrator>äöüäöü</r:narrator>
</item>
<item id="Q:03">
	<dc:title>Now Is The Time</dc:title>
	<r:narrator>éàèéàè</r:narrator>
</item>
</root>]]

Function p_EventFunc(msg)
	Switch(msg.ID)
	Case "btnStart1":
		moai.DoMethod("ctrlLog", "clear")
		p_ParseItemList(xml1$)
	Case "btnStart2":
		moai.DoMethod("ctrlLog", "clear")
		p_ParseItemList(xml2$)
	EndSwitch
EndFunction

Function p_Log(text$)
	moai.DoMethod("ctrlLog", "insert", text$ .. "\n", "bottom")
EndFunction

Function p_ParseItemList(xml$)
	p_Log("----------------------------------------\nXML to parse:\n----------------------------------------")
	p_Log(xml$)
	p_Log("\n----------------------------------------\nStart parsing:\n----------------------------------------")

	Local tracks = { }, currentIndex, line, column, currentPos, endPos, currentElement$ = "", currentAttributes$ = ""
	callbacks = {
		StartElement = Function (parser, elementName$, attributes)
			currentElement$ = elementName$
			currentAttributes = attributes
			Switch elementName$
			Case "item":
				currentIndex = ListItems(tracks)
				line, column, currentPos = parser:pos()
				currentPos = currentPos - 1 ; currentPos is on char 'i', subtract 1 to include the opening <
				tracks[currentIndex] = { id = attributes.id }
			EndSwitch
		EndFunction,
		EndElement = Function (parser, elementName$)
			Switch elementName$
			Case "item":
				line, column, endPos = parser:pos()
				endPos = endPos + 6 ; endPos is on char '<', add 6 to include "/item>"
				tracks[currentIndex].item = MidStr(xml$, currentPos, endPos - currentPos)
				p_Log("Start: " .. currentPos .. ", Length: " .. (endPos - currentPos))
				p_Log(tracks[currentIndex].item .. "\n")
			EndSwitch
		EndFunction
	}

	Local p = xmlparser.new(callbacks)
	p:parse(xml$)
	p:close()
	callbacks = Nil

	Return(tracks)
EndFunction

InstallEventHandler({RapaGUI = p_EventFunc})

moai.CreateApp([[<?xml version="1.0" encoding="iso-8859-1"?>
<application id="app">
	<window id="mainWindow" width="400" height="800" title="XmlParser-Test">
		<vgroup>
			<button id="btnStart1">Start parsing XML 1</button>
			<button id="btnStart2">Start parsing XML 2</button>
			<texteditor id="ctrlLog" noWrap="true" />
		</vgroup>
	</window>
</application>]])

Repeat
	 WaitEvent
Forever
The first example contains no special characters and runs just fine, but the second one, which does contain some, does not:
[Two screenshots]

It seems that with every special character the internal position-counter is off by one. This adds up, and as my real XML is quite large, by the end I'm only getting rubbish...

SamuraiCrow
Posts: 423
Joined: Fri May 15, 2015 5:15 pm
Location: Waterville, Minnesota USA

Post by SamuraiCrow »

Doesn't there need to be a header that indicates that the encoding is UTF8 instead of ASCII?
I'm on registered MorphOS using FlowStudio.

Post by mrupp »

SamuraiCrow wrote (Sat Jul 24, 2021 8:04 pm):
"Doesn't there need to be a header that indicates that the encoding is UTF8 instead of ASCII?"

Good idea. Unfortunately, it didn't help. I added

Code: Select all

<?xml version="1.0" encoding="utf-8"?>
to the beginning of the XMLs in my example, but with the same result as before.

airsoftsoftwair
Posts: 4759
Joined: Fri Feb 12, 2010 2:33 pm
Location: Germany

Post by airsoftsoftwair »

That's not a bug. All position values in the xmlparser plugin are in bytes, not in characters. Luaexpat behaves the same.
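
To make the byte/character difference concrete, here is a minimal sketch. It is written in Python rather than Hollywood purely for illustration, and the sample string is made up. It assumes the text is UTF-8, where each of ä, ö, ü takes two bytes, so a byte offset runs ahead of the character offset by one for every such character that precedes it:

```python
# Illustration only: Python, not Hollywood; the sample string is invented.
text = "<n>äöü</n><item>"

# Offset of "<item>" counted in characters.
char_index = text.index("<item>")

# Offset of "<item>" counted in UTF-8 bytes (ä, ö, ü are 2 bytes each).
byte_index = text.encode("utf-8").index(b"<item>")

print(char_index)               # 10
print(byte_index)               # 13
print(byte_index - char_index)  # 3, one per preceding two-byte character
```

This matches the "off by one per special character" drift described in the first post.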

Post by mrupp »

airsoftsoftwair wrote (Wed Jul 28, 2021 10:08 pm):
"That's not a bug. All position values in the xmlparser plugin are in bytes, not in characters. Luaexpat behaves the same."

Hmm, too bad. (It's also not very useful that Luaexpat implemented it that way, imho. What would anybody need the byte count for?) Any ideas on how I could work around this issue? I guess I would need something like MidStr() that takes byte counts instead of character counts as values, right?

Post by mrupp »

"I guess I would need something like MidStr() that uses the byte count instead of character count as values, right?"

GOT IT: I changed the MidStr() call in the EndElement callback from

Code: Select all

tracks[currentIndex].item = MidStr(xml$, currentPos, endPos - currentPos)
to

Code: Select all

tracks[currentIndex].item = MidStr(xml$, currentPos, endPos - currentPos, #ENCODING_RAW)
and now it WORKS! :D
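
For readers wondering why the raw variant helps, here is a rough sketch of the same idea, again in Python rather than Hollywood and with an invented sample string. The helper slice_by_bytes() is hypothetical; it only mimics what the thread reports for MidStr() with #ENCODING_RAW, namely treating start and length as byte counts:

```python
def slice_by_bytes(text: str, start: int, length: int) -> str:
    """Cut `length` bytes starting at byte offset `start`, then decode again."""
    raw = text.encode("utf-8")
    return raw[start:start + length].decode("utf-8")

# Invented sample: two items, each containing one two-byte character.
xml = "<r><item>é</item><item>ö</item></r>"
raw = xml.encode("utf-8")

# Byte offset and byte length of the second item, as a byte-based parser would report them.
start = raw.index(b"<item>", 4)
length = len("<item>ö</item>".encode("utf-8"))

# Byte offsets applied to bytes give the intended fragment.
good = slice_by_bytes(xml, start, length)

# The same numbers applied as character offsets are shifted and overrun.
bad = xml[start:start + length]

print(good)  # <item>ö</item>
print(bad)   # item>ö</item></
```

The point is only that offsets and the slicing function have to agree on the unit; whether that unit is bytes or characters matters less than using the same one on both sides.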
