Count unique items in a text file using Erlang


Many times in our daily programming routine we have to deal with log files. Most of the log files I have seen so far are plain text files where the useful information is stored line by line.

Let’s say you are implementing a super cool game backend in Erlang. You would probably end up with a bunch of servers handling several actions (e.g. authentication, chat, storing character progress, etc.). I am pretty sure you would not store character data in a text file, but maybe (and I said maybe) you would find it useful to store some of the information coming from the authentication server in a text file.

For example, you could have a log file called auth.log where, for each request handled by the authentication server, you store the username, the IP address, and a timestamp. Let’s keep it simple and imagine the log file looks something like this:

pdincau 193.205.210.66 1256953732
paolod 193.205.210.33 1256999472
pdincau 193.205.230.12 1256999496
michael_m 127.0.0.1 1257592446

As you can see, the log file is very simple (the log file itself is not the point of this post). Each line stores the information related to a single request, and the fields of that request are separated by a space character.
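To make the format concrete, here is a minimal sketch that splits one line into a tuple. The module name `auth_log` and the function `parse_line/1` are hypothetical. The sketch assumes every line has exactly the three space-separated fields shown above and crashes on anything else.

```erlang
-module(auth_log). % hypothetical module name, for illustration only
-export([parse_line/1]).

% Hypothetical helper: split one auth.log line into {Username, IP, Timestamp}.
% Assumes exactly three space-separated fields; "\r" and "\n" are treated as
% separators too, so the line terminator returned by file:read_line/1 is dropped.
parse_line(Line) ->
    [Username, IP, Timestamp] = string:tokens(Line, " \r\n"),
    {Username, IP, list_to_integer(Timestamp)}.
```

For example, `auth_log:parse_line("pdincau 193.205.210.66 1256953732\n")` gives `{"pdincau", "193.205.210.66", 1256953732}`.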

At some point you would probably like to know how many unique users have connected to your supadupa game. A good way to find out is to count how many unique usernames appear in your auth.log file. How can we implement something like this in Erlang?
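Without the file handling, the counting is simple. Put each username into a set, where duplicates collapse, and ask for the set's size. Below is a minimal sketch on the usernames from the sample log above. The module `unique_demo` and the function `count_unique/1` are hypothetical names used only for illustration.

```erlang
-module(unique_demo). % hypothetical module name, for illustration only
-export([count_unique/1]).

% Count the distinct elements of a list by inserting them into a set:
% duplicates collapse, so the set's size is the number of unique items.
count_unique(Items) ->
    sets:size(sets:from_list(Items)).
```

`unique_demo:count_unique(["pdincau", "paolod", "pdincau", "michael_m"])` returns 3, because `"pdincau"` appears twice. The missing piece is getting the usernames out of the file, one line at a time.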

Let’s see some Erlang code for it. But hey! I must be honest here: this code is not mine. This post is based on a Stack Overflow question I found while checking whether somebody had already solved this problem online. All the credit for the following code goes to its real author: Emil Vikström.

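-module(user_count). % module name chosen for illustration
-export([count_users/1]).

% Count the number of distinct users in the file named Filename
count_users(Filename) ->
    {ok, File} = file:open(Filename, [read, raw, read_ahead]),
    Usernames = usernames(File, sets:new()),
    file:close(File),
    sets:size(Usernames).

% Add all users in File, from the current file pointer position and forward,
% to Set. The username is the first field of each line. "\r" and "\n" count
% as separators so they never become part of a name, and blank lines are skipped.
% Side-effects: File is read and the file pointer is moved to the end.
usernames(File, Set) ->
    case file:read_line(File) of
        {ok, Line} ->
            case string:tokens(Line, " \r\n") of
                [Username | _] ->
                    usernames(File, sets:add_element(Username, Set));
                [] ->
                    usernames(File, Set)
            end;
        eof ->
            Set;
        {error, Reason} ->
            erlang:error(Reason)
    end.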

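This code is pretty easy to understand. It uses a set to solve our initial problem: adding a username that is already in the set leaves the set unchanged, so the final size is the number of unique users. Log files can get very large (recently I had to deal with a 56 MB one). That makes Emil’s code interesting, because it never reads the whole file at once. Instead, it reads one line at a time, adds the username to the set, and loops in a tail-recursive function. Only the set of distinct usernames is kept in memory, so memory use grows with the number of unique users rather than with the size of the file.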
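The same streaming pattern works for any field. That is handy if you also want, say, the number of distinct IP addresses. The sketch below is a variation on the code above, not part of Emil's answer. The module name `unique_field` and the function `count_unique/2` are hypothetical. It assumes space-separated fields and a 1-based field index, skips lines that have fewer fields than requested, and closes the file in an `after` clause so the file is closed even if reading fails.

```erlang
-module(unique_field). % hypothetical module name, for illustration only
-export([count_unique/2]).

% Hypothetical generalisation: count the distinct values of the N-th
% (1-based) space-separated field of every line in Filename.
count_unique(Filename, N) when is_integer(N), N >= 1 ->
    {ok, File} = file:open(Filename, [read, raw, read_ahead]),
    try
        sets:size(collect(File, N, sets:new()))
    after
        file:close(File)
    end.

% Read one line at a time (tail recursive) and add field N to Set.
collect(File, N, Set) ->
    case file:read_line(File) of
        {ok, Line} ->
            Fields = string:tokens(Line, " \t\r\n"),
            case length(Fields) >= N of
                true  -> collect(File, N, sets:add_element(lists:nth(N, Fields), Set));
                false -> collect(File, N, Set) % short or blank line: skip it
            end;
        eof ->
            Set;
        {error, Reason} ->
            erlang:error({read_error, Reason})
    end.
```

With the four sample lines saved as auth.log, `unique_field:count_unique("auth.log", 1)` should return 3 (unique usernames). `unique_field:count_unique("auth.log", 2)` should return 4 (unique IP addresses).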

That’s all, folks! I hope you find this post useful! 🙂

Categories: English, Erlang
  1. October 16, 2012 at 8:36 pm

    Do you have a URL to the full source? Gist, GitHub, Stack Overflow? Thanks in advance.

  1. October 17, 2012 at 2:09 pm
