Determining a File Type In Java
Most applications include file upload and download features. During these transfers, we sometimes need to determine the format of a file, or to verify that a file really has the format the user selected. Java offers several approaches for this, and this article walks through four of them.
1. Files.probeContentType(Path)
The probeContentType method of the java.nio.file.Files class, introduced in Java 7, returns the content type of the file at the Path that we pass to it as a parameter.
Below, we pass the name of the example file JPG_Test_File.jpg to the getFileTypeByProbeContentType method. This method calls Files.probeContentType(Path), and we get image/jpeg as the file type.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class FileTypeDetection {
    public static String getFileTypeByProbeContentType(String fileName) {
        String fileType = "Undetermined";
        final File file = new File(fileName);
        try {
            // probeContentType returns null when the type cannot be determined
            fileType = Files.probeContentType(file.toPath());
        } catch (IOException ioException) {
            System.out.println("File type not detected for " + fileName);
        }
        return fileType;
    }

    public static void main(String[] args) {
        System.out.println(getFileTypeByProbeContentType("JPG_Test_File.jpg"));
    }
}
Output: image/jpeg
However, if we only change the extension of the file to PPTX and pass the new file name to the same method, we no longer get the same result:
Output: application/vnd.openxmlformats-officedocument.presentationml.presentation
If we rename the file and remove the extension completely, the same method cannot determine a file type at all:
Output: null
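Because probeContentType can return null, as the last result shows, callers usually need a fallback. The following is a minimal sketch of a null-safe wrapper. The class name ProbeWithFallback, the method probeOrFallback, and the choice of application/octet-stream as the fallback value are assumptions made for illustration. They do not come from the JDK.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ProbeWithFallback {

    // Hypothetical fallback value: the generic "unknown binary" MIME type.
    static final String FALLBACK_TYPE = "application/octet-stream";

    // Returns the probed content type, or FALLBACK_TYPE when probing
    // yields null or fails with an IOException.
    static String probeOrFallback(String fileName) {
        try {
            Path path = Paths.get(fileName);
            String type = Files.probeContentType(path);
            return (type != null) ? type : FALLBACK_TYPE;
        } catch (IOException e) {
            return FALLBACK_TYPE;
        }
    }

    public static void main(String[] args) {
        // The first result depends on the platform's installed file type detectors.
        System.out.println(probeOrFallback("JPG_Test_File.jpg"));
        // A name without an extension is expected to fall back.
        System.out.println(probeOrFallback("JPG_Test_File"));
    }
}
```

The Javadoc for probeContentType describes the detection mechanism as highly implementation specific, so the value returned for a given file can differ between operating systems and JDK versions.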
2. MimetypesFileTypeMap.getContentType(String)
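We can also pass the file's name to the getContentType method of the MimetypesFileTypeMap class, which came with Java 6, to get the file type. This class lives in the javax.activation package, which was removed from the JDK in Java 11, so newer Java versions need it as a separate dependency.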
Here is our method:
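import javax.activation.MimetypesFileTypeMap;

public static String getFileTypeByMimetypesFileTypeMap(final String fileName) {
    // The lookup is based on the file name only; the file content is not read
    final MimetypesFileTypeMap fileTypeMap = new MimetypesFileTypeMap();
    return fileTypeMap.getContentType(fileName);
}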
If we call this method for the file whose extension we changed to PPTX, we get the following file type:
Output: application/octet-stream
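The application/octet-stream result is the documented default of MimetypesFileTypeMap when it has no entry for an extension. The following JDK-only sketch imitates that behavior with a plain map to show why a lookup based only on the name cannot see the real content. The class ExtensionLookupSketch and its tiny table are hypothetical and are not the registry that MimetypesFileTypeMap actually loads.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class ExtensionLookupSketch {

    // Default returned when the extension is missing or unknown.
    private static final String DEFAULT_TYPE = "application/octet-stream";

    // Illustrative table only; a real registry holds many more entries.
    private static final Map<String, String> TYPES = new HashMap<>();
    static {
        TYPES.put("jpg", "image/jpeg");
        TYPES.put("jpeg", "image/jpeg");
        TYPES.put("png", "image/png");
        TYPES.put("txt", "text/plain");
    }

    // Maps a file name to a MIME type using nothing but its extension.
    static String typeForName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return DEFAULT_TYPE; // no extension at all
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return TYPES.getOrDefault(extension, DEFAULT_TYPE);
    }

    public static void main(String[] args) {
        System.out.println(typeForName("JPG_Test_File.jpg"));  // image/jpeg
        System.out.println(typeForName("JPG_Test_File.pptx")); // application/octet-stream
    }
}
```

Since only the name is inspected, renaming a JPEG file to an extension that is absent from the table yields the default type regardless of what the file contains.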
3. URLConnection.getContentType()
With the getContentType method of the URLConnection class, we can get the content type of a file.
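import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public static String getFileTypeByUrlConnectionGetContentType(final String fileName) {
    String fileType = "Undetermined";
    try {
        // Assumes fileName is an absolute path such as /tmp/file.pptx,
        // so that the URL becomes file:///tmp/file.pptx
        final URL url = new URL("file://" + fileName);
        final URLConnection connection = url.openConnection();
        fileType = connection.getContentType();
    } catch (MalformedURLException badUrlEx) {
        System.out.println("ERROR: Bad URL - " + badUrlEx);
    } catch (IOException ioEx) {
        System.out.println("Cannot access URLConnection - " + ioEx);
    }
    return fileType;
}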
If we call this method for the file whose extension we changed to PPTX, we get the following file type:
Output: content/unknown
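URLConnection also exposes two static helpers, guessContentTypeFromName and guessContentTypeFromStream, that can be called without opening a connection. The sketch below wraps them to contrast a guess based on the name with a guess based on the leading bytes. The class name UrlConnectionGuessSketch and the wrapper methods are assumptions for illustration. Both JDK helpers may return null when they recognize nothing.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;

class UrlConnectionGuessSketch {

    // Guess based on the file name (extension) only; may return null.
    static String guessFromName(String fileName) {
        return URLConnection.guessContentTypeFromName(fileName);
    }

    // Guess based on the first bytes of the content; may return null.
    // ByteArrayInputStream supports mark/reset, which the JDK helper requires.
    static String guessFromBytes(byte[] leadingBytes) {
        try (InputStream in = new ByteArrayInputStream(leadingBytes)) {
            return URLConnection.guessContentTypeFromStream(in);
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // "GIF89a" is the signature at the start of a GIF file.
        byte[] gifHeader = {'G', 'I', 'F', '8', '9', 'a'};
        System.out.println(guessFromName("picture.gif"));
        System.out.println(guessFromBytes(gifHeader));
    }
}
```

The stream-based helper recognizes only a small set of formats, so a null result means "not recognized", not "no type".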
4. Apache Tika
The previous three approaches are provided by the JDK. There are also third-party options such as Apache Tika, a well-established library that detects a file's type by analyzing its content, independently of its extension.
Our method takes an InputStream as a parameter and uses the detect method of Apache Tika:
import java.io.IOException;
import java.io.InputStream;
import org.apache.tika.Tika;

public static String getFileTypeByTika(InputStream istream) {
    final Tika tika = new Tika();
    String fileType = "";
    try {
        // Detection relies on the stream content, not on a file name
        fileType = tika.detect(istream);
    } catch (IOException e) {
        System.out.println("*** getFileTypeByTika - Error while detecting mime type from InputStream ***");
        System.out.println("*** getFileTypeByTika - Error message: " + e.getMessage());
        e.printStackTrace();
    }
    return fileType;
}
If we open the file that was originally in JPEG format, but whose extension we changed to PPTX, as a FileInputStream and pass it to getFileTypeByTika, we get the following result:
Output: image/jpeg
Tika detected the type of file correctly.
We can also call the detect method of Apache Tika with a File parameter instead of an InputStream. The following method does that:
import java.io.File;
import java.io.IOException;
import org.apache.tika.Tika;

public static String getFileTypeByTika2(File file) {
    final Tika tika = new Tika();
    String fileTypeDefault = "";
    try {
        fileTypeDefault = tika.detect(file);
    } catch (IOException e) {
        System.out.println("*** getFileTypeByTika2 - Error while detecting file type from File ***");
        System.out.println("*** getFileTypeByTika2 - Error message: " + e.getMessage());
        e.printStackTrace();
    }
    return fileTypeDefault;
}
If we again provide the file that was originally in JPEG format but now has the PPTX extension, Tika detects the file type correctly:
Output: image/jpeg
As we can see, Apache Tika detects the file type correctly despite the changed file extension. Apache Tika is a good choice when the file type is crucial or can affect the flow of an application.
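To illustrate the general idea behind content-based detection, here is a minimal JDK-only sketch that checks the leading "magic number" bytes of a file. It is a simplified illustration and not how Apache Tika is implemented. The class MagicNumberSketch, its methods, and the three signatures it knows are assumptions chosen for this example.

```java
class MagicNumberSketch {

    // Returns true when data begins with the given signature bytes.
    private static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            // Mask to compare the byte as an unsigned value.
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    // Maps the first bytes of a file to a MIME type, ignoring the file name.
    static String sniff(byte[] head) {
        if (startsWith(head, 0xFF, 0xD8, 0xFF)) {
            return "image/jpeg";
        }
        if (startsWith(head, 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A)) {
            return "image/png";
        }
        if (startsWith(head, 'P', 'K', 0x03, 0x04)) {
            // PPTX, DOCX and JAR files are ZIP containers and share this signature.
            return "application/zip";
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        // First bytes of a typical JPEG file, whatever its extension says.
        byte[] jpegHead = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        System.out.println(sniff(jpegHead)); // image/jpeg
    }
}
```

A renamed JPEG still starts with the same bytes, which is why detection based on content is unaffected by the extension. A real PPTX file would instead match the ZIP signature, and telling it apart from other ZIP-based formats requires looking inside the container, which is the kind of work a library such as Tika handles.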