String filtering in Java

Today I needed to match strings in Java against a pattern similar to the ones used in filename matching. Java already has a String.matches() method to test against a regular expression, but it doesn’t have one to match against the more limited filename-like patterns (* and ? wildcards).

At first I thought about implementing the matching by hand, comparing character by character, but I soon found a quicker and simpler approach: transform the pattern into a regular expression. Maybe this source code will be useful to you:

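    // Needs java.util.StringTokenizer and java.util.regex.Pattern.
    private boolean matchFilter(String sample, String filter) {
        // A missing sample or filter means "no filtering": everything matches.
        if (sample == null || filter == null) return true;

        // The leading and trailing ".*" let the filter match anywhere in the sample.
        StringBuilder f = new StringBuilder(".*");

        // Split on the wildcards, keeping them as tokens (returnDelims = true).
        StringTokenizer st = new StringTokenizer(filter, "?*", true);
        while (st.hasMoreTokens()) {
            String t = st.nextToken();
            if (t.equals("?")) f.append(".");          // any single character
            else if (t.equals("*")) f.append(".*");    // any run of characters
            else f.append(Pattern.quote(t));           // literal text, regex-escaped
        }
        f.append(".*");

        return sample.matches(f.toString());
    }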
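
The Javadoc of StringTokenizer itself describes it as a legacy class and points to String.split or java.util.regex instead. As a rough sketch of how the same translation might look with a regex split (the class name WildcardSplit and the method name toRegex are made up for this example, and the surrounding ".*" and the null check are left out to keep it short):

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names WildcardSplit and toRegex are not from the post.
final class WildcardSplit {

    // Zero-width split points right before and right after every '?' or '*',
    // so each wildcard comes out as its own one-character token.
    private static final Pattern AROUND_WILDCARD =
            Pattern.compile("(?<=[?*])|(?=[?*])");

    // Translates a filename-like filter into a regular expression.
    static String toRegex(String filter) {
        StringBuilder regex = new StringBuilder();
        for (String token : AROUND_WILDCARD.split(filter)) {
            if (token.isEmpty()) continue;                  // empty filter
            if (token.equals("?")) regex.append('.');       // any single character
            else if (token.equals("*")) regex.append(".*"); // any run of characters
            else regex.append(Pattern.quote(token));        // literal text
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        System.out.println(toRegex("?*.x?l"));
        System.out.println("file.xml".matches(toRegex("*.xml")));
    }
}
```

The lookbehind and lookahead play the role of the returnDelims flag: the text is cut around each wildcard without consuming it. This relies on Pattern.split not producing an empty leading token for a zero-width match at the start of the input, which is the documented behaviour from Java 8 onwards.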

4 thoughts on “String filtering in Java”

  1. Hi Enrique, this is Luis Barreiro, your old classmate from university… I just happened to come across your weblog… and I’ve just seen your example about String filtering in Java. A small piece of advice: don’t use StringTokenizer to get tokens from a String. StringTokenizer is a legacy class that is kept for compatibility reasons, but its use is discouraged; the recommendation is to use the split method of the String class itself instead.

    Cheers, mate!

  2. Thanks a lot for the advice, I’m very glad to hear from you again. 🙂

    If someday you want to tell me how life is going, send me an email at eocanha [at] igalia.com

    Regards!

  3. Hi,

    Thanks for having this great idea!!

    We have an update, because the code seems to be too happy to match. This is caused by the leading and trailing “.*”, which in the end match too much, e.g.

    filter: “?.x?*”
    file: “file.xml”

    resulted in the regex “.*.\Q.x\E..*.*”, which matches “file.xml” but should not.

    Removing the leading and trailing “.*” now results in the regex
    “.\Q.x\E..*”, which does not match “file.xml”, as intended.

    Also, the “%” that the code as first posted passed to StringTokenizer is of course replaced by “?”, because “?” and “*” are the special cases.

    Also, we return false instead of true if either input parameter is null, since true would mean a match. That seems more logical.

    Here is the updated routine. Put it in a class called FileUtils. Test cases for both true and false results are appended.

        /**
         * Tests if the given filename matches the filter. The filter is a regular
         * filename filter, not a regular expression. Allowed are *.* or ???.xml,
         * etc.
         *
         * @param fileName the filename to test
         * @param filter the filename filter, with * and ? as wildcards
         * @return true if matches and false if either null or no match
         */
        public static boolean matchFilter(String fileName, String filter) {

            if (fileName == null || filter == null)
                return false;

            StringBuilder f = new StringBuilder();

            for (StringTokenizer st = new StringTokenizer(filter, "?*", true); st.hasMoreTokens();) {
                String t = st.nextToken();
                if (t.equals("?"))
                    f.append(".");
                else if (t.equals("*"))
                    f.append(".*");
                else
                    f.append(Pattern.quote(t));
            }
            return fileName.matches(f.toString());
        }

    TEST CASE:

        package nl.remain.core.util;

        import junit.framework.TestCase;

        public class FileUtilsTest extends TestCase {

            public void testMatchFilter() {

                assertTrue(FileUtils.matchFilter("file.xml", "*.xml"));
                assertTrue(FileUtils.matchFilter("file.xml", "file.xml"));
                assertTrue(FileUtils.matchFilter("file.xml", "????.*"));
                assertTrue(FileUtils.matchFilter("file.xml", "?i?e.xml"));
                assertTrue(FileUtils.matchFilter("file.xml", "f???.xml"));
                assertTrue(FileUtils.matchFilter("file.xml", "*.??l"));
                assertTrue(FileUtils.matchFilter("file.xml", "*.x?l"));
                assertTrue(FileUtils.matchFilter("file.xml", "?*.x?*"));
                assertTrue(FileUtils.matchFilter("file.xml", "fi*.xml"));
                assertTrue(FileUtils.matchFilter("file.xml", "file.*ml"));
                assertTrue(FileUtils.matchFilter("file.xml", "?*?.*"));
                assertTrue(FileUtils.matchFilter("file.xml", "f???.???"));
                assertTrue(FileUtils.matchFilter("file.xml", "f???.*??"));
                assertTrue(FileUtils.matchFilter("file.xml", "*???l"));
                assertTrue(FileUtils.matchFilter("file.xml", "*x?l"));
                assertTrue(FileUtils.matchFilter("file.xml", "?*x?*"));

                assertFalse(FileUtils.matchFilter("file.xml", "r*.xml"));
                assertFalse(FileUtils.matchFilter("file.xml", "?????.xml"));
                assertFalse(FileUtils.matchFilter("file.xml", "f*F.*"));
                assertFalse(FileUtils.matchFilter("file.xml", "FILE.xml"));
                assertFalse(FileUtils.matchFilter("file.xml", "File.xml"));
                assertFalse(FileUtils.matchFilter("file.xml", "*.??x"));
                assertFalse(FileUtils.matchFilter("file.xml", "*.x?m"));
                assertFalse(FileUtils.matchFilter("file.xml", "?.x?*"));

            }

        }
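
The anchoring difference described in the third comment can be checked directly. Here is a small sketch, assuming a made-up class name AnchoringDemo; the two expressions are assembled by hand, piece by piece, the way each routine would build them for the filter “?.x?*”:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name is not from the post or the comments.
final class AnchoringDemo {

    // Filter "?.x?*" translated with the extra ".*" at both ends.
    static final String WRAPPED =
            ".*" + "." + Pattern.quote(".x") + "." + ".*" + ".*";

    // The same filter translated without the extra ".*".
    static final String EXACT =
            "." + Pattern.quote(".x") + "." + ".*";

    public static void main(String[] args) {
        String file = "file.xml";
        // The leading ".*" absorbs "fil", so the single-character "?" can land on "e".
        System.out.println(file.matches(WRAPPED)); // true
        // String.matches must cover the whole input, so ".x" has to start at index 1.
        System.out.println(file.matches(EXACT));   // false
    }
}
```

Because String.matches always tests the whole input, the filter only behaves like a real filename pattern when no extra ".*" is added around it.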
