XML and Erlang

As I have told you many times, I'm working on a side project that involves XMPP and Twitter.

The main goal of the project is to provide a sort of interface between XMPP users and their Twitter accounts.

The first feature I would like to provide to users is the retrieval of their private messages: users registered on an XMPP server will be able to receive Twitter direct messages as normal message stanzas.

In one of my first posts (here you can find it in Italian) I showed you how to retrieve a user's timeline. To retrieve the direct messages you just need to adapt that code slightly, as follows:

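1> application:start(inets).
2> Aut = "Basic " ++ base64:encode_to_string("user:password").
%% Url must already be bound to the address of the direct messages feed.
%% On recent OTP releases the http module is gone: call httpc:request/4 instead.
3> {ok, {Status, Head, Body}} = http:request(get,
                    {Url, [{"Authorization", Aut}]
                }, [], []).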

where the variable Body will be bound to the content returned by the server.
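
If you would rather wrap the request in a module than type it in the shell, a minimal sketch could look like this. The module name dm_fetch and the functions auth_header/2 and fetch/3 are made up for illustration, the sketch assumes a recent OTP release where the HTTP client lives in httpc, and the feed address is simply passed in by the caller.

```erlang
-module(dm_fetch).
-export([auth_header/2, fetch/3]).

%% Build the HTTP Basic Authorization header for the given credentials.
%% encode_to_string/1 returns a plain string, so the value is a flat string.
auth_header(User, Password) ->
    {"Authorization",
     "Basic " ++ base64:encode_to_string(User ++ ":" ++ Password)}.

%% Fetch Url with Basic authentication and return the response body.
%% Url is whatever feed address you are using; none is assumed here.
fetch(Url, User, Password) ->
    {ok, _Started} = application:ensure_all_started(inets),
    Request = {Url, [auth_header(User, Password)]},
    case httpc:request(get, Request, [], []) of
        {ok, {{_Version, 200, _Reason}, _Headers, Body}} ->
            {ok, Body};
        {ok, {{_Version, Code, Reason}, _Headers, _Body}} ->
            {error, {Code, Reason}};
        {error, Why} ->
            {error, Why}
    end.
```

Matching on the status code keeps an error page from being handed to the XML parser as if it were the feed.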

In this case Body will contain a bunch of XML with all our direct messages. What we need now is a parser to extract the useful information: the sender of each message and the message itself.

Erlang provides, among others, two useful modules for working with XML: xmerl and xmerl_scan. They come in handy if you want to parse or validate your XML, or export it to other formats. Parsing an XML document returns a record that mirrors the structure of the document; the record also holds the data of the document and is defined in xmerl.hrl.

Ok, now let's play with them a little bit by using the command xmerl_scan:string(Variable): its result (if the parsing was successful) is a tuple {Xml, Misc}. Xml is the parsed root element, while Misc is the XML production Misc, i.e. the mark-up that comes after the root element of the document.

4> {Xml, Misc} = xmerl_scan:string(Body).

Note that if you stored your XHTML/RSS somewhere as a file you can also use:

5> {Xml, Misc} = xmerl_scan:file(FileName).

Xml is an xmlElement record with several fields (e.g. name, parents, attributes and content), which can be accessed as you usually do with record fields. Unsurprisingly, the name of the element is in the name field, while the parents field holds the names of the parent elements. Parents is a list of tuples where the first element of each tuple is the name of a parent element; the list is in reverse order, so the closest parent comes first.

The xmlAttribute record holds the name and value of an attribute in its name and value fields. All the attributes of an element are stored as a list of xmlAttribute records in the attributes field of the xmlElement record.
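
To make the two records a bit more concrete, here is a small sketch that parses a string and reads the name and attributes fields. The module name xml_fields and the function describe/1 are hypothetical, chosen only for this example.

```erlang
-module(xml_fields).
-export([describe/1]).

%% Record definitions (xmlElement, xmlAttribute, ...) live in xmerl.hrl.
-include_lib("xmerl/include/xmerl.hrl").

%% Parse XmlString and return the name of the root element together
%% with its attributes as a list of {Name, Value} pairs.
describe(XmlString) ->
    {#xmlElement{name = Name, attributes = Attrs}, _Misc} =
        xmerl_scan:string(XmlString),
    {Name, [{A#xmlAttribute.name, A#xmlAttribute.value} || A <- Attrs]}.
```

For instance, xml_fields:describe("<rss version=\"2.0\"/>") gives back {rss, [{version, "2.0"}]}: element and attribute names are atoms, attribute values are strings.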

Taking a look at the actual feed you will see that every message is encapsulated as XML in the following form:

<item>
  <title>Message from Sender to recipient</title>
  <description>Text of the message</description>
  <pubDate>Publishing date</pubDate>
  <guid>a link</guid>
  <link>another link</link>
</item>

It seems quite obvious that the pieces of information we need from this XML are the <description> and <title> elements. We can extract them this way:

6> Descriptions = xmerl_xpath:string("//item/description/text()", Xml).
7> Titles = xmerl_xpath:string("//item/title/text()", Xml).

Now we have two lists of xmlText records: one with the actual messages and one with the titles, which name the senders.

If you are working in the shell as I am, you should load the record definitions from xmerl_scan:

8> rr(xmerl_scan).

while if you are writing your own module you should include the header file xmerl.hrl instead, with the directive -include_lib("xmerl/include/xmerl.hrl").

How can we access the value inside an XML element now? It is quite trivial again! You can just do something like this:

9> [H|T] = Descriptions.
10> H#xmlText.value.
"Text of the message"

Ok, from now on I leave you alone! Just try and retry!
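
As a starting point for your experiments, the steps above could be put together roughly like this. It is only a sketch: the module name dm_messages and its functions are invented for the example, it assumes that every item carries a non-empty title and description (otherwise the two lists have different lengths and lists:zip/2 fails), and it assumes that titles really have the shape "Message from Sender to recipient".

```erlang
-module(dm_messages).
-export([messages/1, sender/1]).

-include_lib("xmerl/include/xmerl.hrl").

%% Return a list of {Sender, Text} pairs, one for each <item> found in
%% the XML string Body.
messages(Body) ->
    {Doc, _Misc} = xmerl_scan:string(Body),
    Titles = values(xmerl_xpath:string("//item/title/text()", Doc)),
    Texts = values(xmerl_xpath:string("//item/description/text()", Doc)),
    [{sender(Title), Text} || {Title, Text} <- lists:zip(Titles, Texts)].

%% Pull the character data out of a list of xmlText records.
values(Nodes) ->
    [Node#xmlText.value || Node <- Nodes].

%% Take the third word of "Message from Sender to recipient".
%% Titles with any other shape are returned unchanged.
sender(Title) ->
    case string:lexemes(Title, " ") of
        ["Message", "from", Sender | _] -> Sender;
        _ -> Title
    end.
```

Calling dm_messages:messages(Body) on the Body fetched earlier should then give you the sender and text of each direct message, ready to be turned into message stanzas.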

Categories: English, Erlang
  1. July 10, 2010 at 9:59 am

    Hi Paolo. Xml and erlang… You have interesting articles. I think there is not much literature about it in the web. But processing files in erlang is slow as hell as far as i can tell, i hope you don’t have to process a lot of data?

    Thanks for the message on twitter about germanys defeat… I hope we will beat Uruguay tonight!!

    • pdincau
      July 11, 2010 at 9:31 pm

      not much data ;D
      actually xml is a pain as well, but somehow i have to deal with it!

      about germany, this year you had a great team!

  2. July 15, 2010 at 7:31 pm

    hi paolo, i have updated my page now… finally! A friend and me have written an article on distributed hashtables and implemented a protocol “chord” in erlang. Document is in german, but we have put some code in the document… nothing special. I will probably translate it to english. 🙂

    • pdincau
      July 16, 2010 at 3:08 pm

      hi, i will take a look at it…at leat at the code 😀 there’s always something to learn!
